
Utilisateur:Bouziane.Rabab/Brouillon


A distributed file system in the cloud is a file system that allows many clients to access the same data/files and provides the usual operations (create, delete, modify, read, write). Each file may be partitioned into several parts called chunks, and each chunk may be stored on a different remote machine. Typically, data is stored in files organized in a hierarchical tree, where the nodes represent directories. This facilitates the parallel execution of applications. There are several ways to share files in a distributed architecture, and each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured; confidentiality, availability, and integrity are the main keys of a secure system. Nowadays, users can share resources from any computer or device, anywhere, through the Internet, thanks to cloud computing, which is typically characterized by scalable and elastic resources (such as physical servers, applications, and services) that are virtualized and allocated dynamically. Synchronization is thus required to make sure that all devices are up to date. Distributed file systems also enable many large, medium, and small enterprises to store and access their remote data exactly as they do locally, facilitating the use of variable resources.

Overview

History

Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s, and Sun's Network File System became available in the early 1980s. Before that, people who wanted to share files used the sneakernet method. Once computer networks started to spread, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. At first, many users turned to FTP to share files. [1] FTP started running on the PDP-10 at the end of 1973. Even with FTP, files had to be copied from the source computer onto a server, and then from the server onto the destination computer, and users had to know the physical addresses of all the computers involved in the file sharing.[2]

Supporting techniques

Cloud computing uses important techniques to boost the performance of the whole system. Modern data centers provide a huge environment, built on data center networking (DCN), consisting of a large number of computers with varying storage capacities. The MapReduce framework has demonstrated its performance for data-intensive computing applications in parallel and distributed systems. Moreover, virtualization techniques are employed to provide dynamic resource allocation, allowing multiple operating systems to coexist on the same physical server.

Applications

Cloud computing provides large-scale computing thanks to its ability to give the user the needed CPU and storage resources with complete transparency, which makes it very suitable for different types of applications that require large-scale distributed processing. Such data-intensive computing needs a high-performance file system that can share data between virtual machines (VMs)[3].

The cloud computing and cluster computing paradigms are becoming increasingly important in industrial data processing and in scientific applications, such as astronomy or physics, that frequently demand the availability of a huge number of computers in order to run the required experiments. Cloud computing represents a new way of using computing infrastructure: the needed resources are allocated dynamically, released once the work is finished, and users pay only for what they use, instead of paying for resources reserved for a fixed period in advance (the pay-as-you-go model). Such services are often provided in the context of a service-level agreement. [4]

Architectures

Most distributed file systems are built on the client-server architecture, but other, decentralized solutions exist as well.

Client-server architecture

NFS (Network File System) is one of the most widely used file systems based on this architecture. NFS allows files to be shared between a number of machines on a network as if they were located locally, providing a standardized view of the local file system. The NFS protocol allows heterogeneous client processes, possibly running on different operating systems and machines, to access files on a distant server while ignoring the actual location of those files. However, relying on a single server makes the NFS protocol suffer from low availability and poor scalability. Using multiple servers does not solve the problem, since each server works independently. [5] The model of NFS is the remote file service, also called the remote access model, which contrasts with the upload/download model:

  • remote access model: provides transparency; the client has access to a file and can issue requests to the remote file, which remains on the server [6]
  • upload/download model: the client can access the file only locally; it has to download the file, make its modifications, and upload it again so the file can be used by other clients (a small sketch contrasting the two models is given below)
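
The difference between the two models can be illustrated with a minimal Python sketch; the ToyServer class and its methods are purely hypothetical and stand in for a real file server.

```python
# Illustrative sketch of the two file-sharing models; all names are hypothetical.

class ToyServer:
    """In-memory stand-in for a file server."""
    def __init__(self):
        self.files = {"doc.txt": b"hello"}

    # Remote access model: every operation is a request, the file stays on the server.
    def read(self, path, offset, size):
        return self.files[path][offset:offset + size]

    def write(self, path, offset, data):
        buf = bytearray(self.files.get(path, b""))
        buf[offset:offset + len(data)] = data
        self.files[path] = bytes(buf)

    # Upload/download model: the whole file is copied, modified locally, then put back.
    def download(self, path):
        return self.files[path]

    def upload(self, path, data):
        self.files[path] = data


server = ToyServer()

# Remote access model: the client sends requests against the remote file.
server.write("doc.txt", 0, b"Hello")
print(server.read("doc.txt", 0, 5))          # b'Hello'

# Upload/download model: download, modify locally, upload so other clients see it.
local = server.download("doc.txt")
server.upload("doc.txt", local + b", world")
```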

The file system offered by NFS is almost the same as the one offered by UNIX systems. Files are hierarchically organized into a naming graph in which directories and files are represented by nodes.

Cluster-Based architectures

This is mainly an improvement of the client-server architecture, in a way that better supports the execution of parallel applications. The technique used here is file striping: a file is split into several segments so they can be stored on multiple servers, the goal being to access different parts of a file in parallel. If an application does not benefit from this technique, it can be more convenient to simply store different files on different servers. However, when it comes to organizing a distributed file system for large data centers, such as those of Amazon and Google, that offer services to web clients allowing multiple operations (reading, updating, deleting, ...) on a huge number of files distributed among a massive number of computers, cluster-based solutions become more interesting. Note that a massive number of computers opens the door to more hardware failures, because more server machines mean more hardware and thus a higher probability of hardware failure. [7] Two of the most widely used DFSs are the Google File System (GFS) and the Hadoop Distributed File System (HDFS). In both systems, the file system is implemented by user-level processes running on top of a standard operating system (Linux, in the case of GFS) [8].
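
File striping can be illustrated with the following minimal Python sketch, which splits data into fixed-size chunks and spreads them over several servers in a round-robin fashion; the chunk size and the placement policy are simplifying assumptions, not the behaviour of any particular DFS.

```python
# Minimal file-striping sketch: cut the data into fixed-size chunks and
# place them on different servers so that they can be read in parallel.

def stripe(data: bytes, servers: list, chunk_size: int):
    """Return a mapping {chunk_index: (server, chunk_bytes)} (round-robin placement)."""
    placement = {}
    for chunk_index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        server = servers[chunk_index % len(servers)]
        placement[chunk_index] = (server, chunk)
    return placement

# A 10-byte "file" striped into 4-byte chunks over three servers: the first
# three chunks live on three different machines and can be fetched in parallel.
placement = stripe(b"0123456789", ["server-A", "server-B", "server-C"], chunk_size=4)
print({i: srv for i, (srv, _) in placement.items()})
# {0: 'server-A', 1: 'server-B', 2: 'server-C'}
```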

Design principles

GFS and HDFS are specifically built for handling batch processing on very large data sets. For that, the following hypotheses must be taken into account [9]:

  • High availability: The cluster can contain thousands of file servers and some of them can be down at any time
  • Servers belong to a rack, a room, a data center, a country, and a continent, so that their geographical location can be identified precisely
  • The size of a file can vary from many gigabytes to many terabytes, and the file system should be able to support a massive number of files
  • Need to support append operations and allow file contents to be visible even while a file is being written
  • Communication is reliable among working machines: TCP/IP is used with a remote procedure call (RPC) communication abstraction
Examples
GFS

Among the biggest internet companies, Google has created its own distributed file system, named Google File System (GFS), to meet the rapidly growing demands of Google's data processing needs, and it is used for all its cloud services. GFS is a scalable distributed file system for data-intensive applications. It provides a fault-tolerant way to store data and offers high performance to a large number of clients.

GFS uses MapReduce, which allows users to create programs and run them on multiple machines without having to think about parallelization and load-balancing issues. The GFS architecture is based on a single master, multiple chunkservers, and multiple clients. [10]

The master server, running on a dedicated node, is responsible for coordinating storage resources and managing the files' metadata (the equivalent of inodes in classical file systems) [11]. Each file is split into multiple chunks of 64 megabytes, and each chunk is stored on a chunkserver. A chunk is identified by a chunk handle, a globally unique 64-bit number assigned by the master when the chunk is first created.
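
The way a master could split a file into 64 MB chunks and hand out globally unique 64-bit chunk handles is sketched below in Python; this is a simplified illustration of the description above, with invented class names, not GFS code.

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024          # 64 MB chunks, as in GFS
REPLICATION = 3                        # default number of replicas per chunk

class ToyMaster:
    """Illustrative master: keeps file metadata and assigns 64-bit chunk handles."""
    def __init__(self):
        self._next_handle = itertools.count(1)   # source of globally unique handles
        self.files = {}                           # file name -> list of chunk handles
        self.locations = {}                       # chunk handle -> chunkservers

    def create_file(self, name, size, chunkservers):
        handles = []
        n_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE   # ceiling division
        for i in range(n_chunks):
            handle = next(self._next_handle) & (2**64 - 1)  # fits in 64 bits
            # Place this chunk on REPLICATION chunkservers.
            self.locations[handle] = [chunkservers[(i + k) % len(chunkservers)]
                                      for k in range(REPLICATION)]
            handles.append(handle)
        self.files[name] = handles
        return handles

master = ToyMaster()
print(master.create_file("/logs/web.log", 200 * 1024**2, ["cs1", "cs2", "cs3", "cs4"]))
```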

As said previously, the master maintains all of the file metadata, including file names, directories, and the mapping of each file to the list of chunks that contain its data. The metadata is kept in the master's main memory, along with the mapping of files to chunks. Updates to this metadata are logged on disk in an operation log, which is also replicated onto remote machines. When the log becomes too large, a checkpoint is taken and the main-memory data is stored in a B-tree structure to facilitate mapping it back into main memory [12].

For fault tolerance, a chunk is replicated onto multiple chunkservers, three by default [13], so each chunk is available on at least one chunkserver. The advantage of this design is its simplicity: the master is responsible for allocating the chunkservers for each chunk, and it is contacted only for metadata; for all other data, the client interacts directly with the chunkservers. The master also keeps track of where each chunk is located; however, it does not attempt to maintain the chunk locations precisely, but occasionally contacts the chunkservers to see which chunks they have stored. GFS is a scalable distributed file system for data-intensive applications [14]. The master does not become a bottleneck despite all the work it has to accomplish: when a client wants to access data, it communicates with the master to find out which chunkserver holds that data; once done, the communication takes place directly between the client and the chunkserver concerned.
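
The read path just described can be sketched as follows: the client asks the master for the chunk handle and its locations, then fetches the bytes directly from a chunkserver, so the master stays out of the data path. All class and variable names are illustrative.

```python
# Illustrative GFS-style read path: metadata from the master, data from a chunkserver.

class Chunkserver:
    def __init__(self, name):
        self.name = name
        self.chunks = {}                        # chunk handle -> bytes

class Master:
    def __init__(self):
        self.file_chunks = {}                   # (file name, chunk index) -> handle
        self.chunk_locations = {}               # handle -> [Chunkserver, ...]

    def lookup(self, filename, chunk_index):
        handle = self.file_chunks[(filename, chunk_index)]
        return handle, self.chunk_locations[handle]

def read_chunk(master, filename, chunk_index):
    # 1. Ask the master which chunkservers hold the chunk (metadata only).
    handle, replicas = master.lookup(filename, chunk_index)
    # 2. Read the data directly from one replica; the master is no longer involved.
    return replicas[0].chunks[handle]

cs = Chunkserver("cs1")
cs.chunks[42] = b"chunk data"
m = Master()
m.file_chunks[("/f", 0)] = 42
m.chunk_locations[42] = [cs]
print(read_chunk(m, "/f", 0))                   # b'chunk data'
```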

In GFS, most files are modified by appending new data rather than by overwriting existing data. In fact, once written, files are usually only read, and often only sequentially rather than randomly, which makes this DFS most suitable for scenarios in which many large files are created once but read many times [15] [16]. Let us now detail the file access process. When a client wants to write to or update a file, the master assigns a replica for the operation; this replica will be the primary replica, since it is the first one to receive the modification from clients. The writing process is decomposed into two steps [17]:

  • sending: first, and most importantly, the client contacts the master to find out which chunkservers hold the data. The client is given a list of replicas identifying the primary chunkserver and the secondary ones. The client then contacts the nearest replica chunkserver and sends the data to it; this server forwards the data to the next closest one, which then forwards it to yet another replica, and so on. At this point the data has been propagated but not yet written to a file (it sits in a cache)
  • writing: when all the replicas have received the data, the client sends a write request to the primary chunkserver, identifying the data that was sent in the sending phase. The primary then assigns a sequence number to the write operations it has received, applies the writes to the file in serial-number order, and forwards the write requests in that order to the secondaries. Meanwhile, the master is kept out of the loop.

Consequently, we can differentiate two types of flows: the data flow, associated with the sending phase, and the control flow, associated with the writing phase. This ensures that the primary chunkserver takes control of the write order. Note that when the master grants the write operation to a replica, it increments the chunk version number and informs all of the replicas containing that chunk of the new version number. Chunk version numbers make it possible to detect when a replica missed an update because its chunkserver was down [18].
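
The two phases can be sketched in Python as follows: the data flow pushes the bytes along a chain of replicas and leaves them in a cache, and the control flow goes to the primary, which assigns serial numbers and forwards the ordered writes to the secondaries. This is a simplified illustration of the mechanism described above, not the actual GFS protocol.

```python
# Illustrative two-phase write: data flow (pipelining) then control flow (ordering).

class Replica:
    def __init__(self, name, primary=False):
        self.name, self.primary = name, primary
        self.cache = {}            # data pushed during the sending phase
        self.file = []             # writes applied in serial-number order
        self.serial = 0

    def push(self, data_id, data, chain):
        """Sending phase: cache the data and forward it to the next replica."""
        self.cache[data_id] = data
        if chain:
            chain[0].push(data_id, data, chain[1:])

    def write(self, data_id, secondaries):
        """Writing phase: only the primary assigns the order of the writes."""
        assert self.primary
        self.serial += 1
        self.file.append((self.serial, self.cache.pop(data_id)))
        for s in secondaries:                    # same order on every secondary
            s.file.append((self.serial, s.cache.pop(data_id)))

primary = Replica("primary", primary=True)
secondaries = [Replica("sec1"), Replica("sec2")]

# Phase 1 (data flow): the data is pipelined from replica to replica.
primary.push("d1", b"appended record", secondaries)
# Phase 2 (control flow): the client asks the primary to apply the write everywhere.
primary.write("d1", secondaries)
print(primary.file, secondaries[0].file, secondaries[1].file)
```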


Some newer Google applications did not work well with the 64-megabyte chunk size, so in 2004 GFS started to implement the Bigtable approach to address this.[1]

HDFS

HDFS (Hadoop Distributed File System), hosted by the Apache Software Foundation, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to that of GFS, i.e. a master/slave architecture. HDFS is normally installed on a cluster of computers. The design concept of Hadoop draws on Google's work, namely the Google File System, Google MapReduce, and Bigtable; these three techniques map respectively to the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Hadoop Base (HBase) [19].

An HDFS cluster consists of a single NameNode and several DataNode machines. The NameNode, a master server, manages and maintains the metadata of the storage DataNodes in its RAM, while the DataNodes manage the storage attached to the nodes they run on. The NameNode and DataNode are software programs designed to run on everyday machines, which typically use a GNU/Linux OS. HDFS can run on any machine that supports Java, and such a machine can therefore run either the NameNode or the DataNode software [20].

More explicitly, a file is split into one or more equal-size blocks, except the last one, which may be smaller. Each block is stored on multiple DataNodes and may be replicated on several of them to guarantee high availability. By default, each block is replicated three times, a process called "Block Level Replication" [21].

The NameNode manages the file system namespace operations, such as opening, closing, and renaming files and directories, and regulates file access. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients, managing block allocation or deletion, and replicating blocks.

When a client wants to read or write data, it contacts the NameNode, which checks where the data should be read from or written to. The client then knows the location of the DataNode and can send read or write requests to it.
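
This interaction can be sketched with toy NameNode and DataNode classes (not the real Hadoop API): the NameNode chooses the target DataNodes for each block, and the client then writes each block, replicated three times, directly to those DataNodes.

```python
# Toy sketch of an HDFS-style write: block placement from the NameNode,
# block data written directly to the chosen DataNodes (3 replicas by default).

REPLICATION = 3

class DataNode:
    def __init__(self, name):
        self.name, self.blocks = name, {}         # block id -> bytes

class NameNode:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}                        # path -> [(block id, [DataNode])]

    def allocate(self, path, n_blocks):
        """Pick REPLICATION DataNodes for each block of the file."""
        layout = [(f"{path}#blk{i}",
                   [self.datanodes[(i + k) % len(self.datanodes)]
                    for k in range(REPLICATION)])
                  for i in range(n_blocks)]
        self.block_map[path] = layout
        return layout

def hdfs_write(namenode, path, data, block_size):
    n_blocks = max(1, -(-len(data) // block_size))            # ceiling division
    for i, (block_id, targets) in enumerate(namenode.allocate(path, n_blocks)):
        block = data[i * block_size:(i + 1) * block_size]
        for dn in targets:                                     # write every replica
            dn.blocks[block_id] = block

nn = NameNode([DataNode(f"dn{i}") for i in range(4)])
hdfs_write(nn, "/user/demo/file.txt", b"some data to store", block_size=8)
print({dn.name: sorted(dn.blocks) for dn in nn.datanodes})
```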

HDFS is typically characterized by its compatibility with data rebalancing schemes. The problem of data rebalancing is developed in a later section. In general, managing the free space on a DataNode is very important: data must be moved from one DataNode to another if the free space on a node is not satisfactory, and, when additional replicas are created, data should also be moved to keep the system balanced.

Load balancing and rebalancing

Load balancing

Load balancing is essential for efficient operation in distributed environments. It means distributing the work among different nodes so that more work gets done in the same amount of time and clients are served faster. Consider a large-scale distributed file system: the system contains N chunkservers in a cloud (N can be 1000, 10000, or more), where a certain number of files are stored. Each file is split into several parts, or chunks, of fixed size (for example, 64 megabytes). The load of each chunkserver is proportional to the number of chunks hosted by the server.[22] In a load-balanced cloud, resources can be used efficiently while maximizing the performance of MapReduce-based applications.

Load rebalancing

In a cloud computing environment, failure is the norm, and chunkservers may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. All of this leads to load imbalance in a distributed file system, meaning that the file chunks are not distributed equitably among the nodes.

Distributed file systems in clouds such as GFS and HDFS rely on central servers (the master for GFS and the NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if the free space of a server falls below a certain threshold.[23] However, this centralized approach can create a bottleneck at those servers, which become unable to manage a large number of file accesses. Handling the load imbalance problem through the central nodes therefore complicates the situation further, as it increases their already heavy loads. Note that the load rebalancing problem is NP-hard.[24]

In order to get a large number of chunkservers to work in collaboration and to solve the problem of load balancing in distributed file systems, several approaches have been proposed, such as reallocating file chunks so that they are distributed across the system as uniformly as possible while reducing the movement cost as much as possible.[25]
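
A toy greedy rebalancer along these lines is sketched below: it measures the load of each chunkserver as the number of chunks it hosts, then repeatedly moves one chunk from the most loaded server to the least loaded one until the spread is acceptable, so that few chunks are moved. Real schemes are distributed and weigh the movement cost more carefully; this is only an illustration.

```python
# Greedy illustrative rebalancer: load = number of chunks hosted by a chunkserver.

def rebalance(servers, tolerance=1):
    """servers: dict chunkserver name -> set of chunk ids (mutated in place).
    Moves single chunks until the load spread is at most `tolerance`."""
    moves = []
    while True:
        most = max(servers, key=lambda s: len(servers[s]))
        least = min(servers, key=lambda s: len(servers[s]))
        if len(servers[most]) - len(servers[least]) <= tolerance:
            return moves                        # balanced enough; stop moving chunks
        chunk = servers[most].pop()             # each move costs one chunk transfer
        servers[least].add(chunk)
        moves.append((chunk, most, least))

servers = {"cs1": {1, 2, 3, 4, 5, 6}, "cs2": {7}, "cs3": set()}
print(rebalance(servers))                       # the chunk movements performed
print({name: len(chunks) for name, chunks in servers.items()})   # final loads
```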

Communication

High performance in distributed file systems requires efficient communication between computing nodes and fast access to the storage system. Operations such as open, close, read, write, send, and receive need to be fast to ensure that performance. Note that for each read or write request, the remote disk is accessed, which may take a long time due to network latencies. Much work has been done to improve communication between cluster nodes at the application level, as well as storage access, so that a remote node does not become a bottleneck while reading and storing data.

The data communication (send/receive) operation transfers data from the application buffer to the kernel of the machine. TCP controls the process of sending data and is implemented in the kernel. However, in case of network congestion or errors, TCP may not send the data immediately. When transferring data from a buffer in the kernel to the application, the machine does not read the byte stream from the remote machine; in fact, TCP is responsible for buffering the data for the application.[26]

A high level of communication performance can be achieved by choosing the buffer size for file reading and writing, or for file sending and receiving, at the application level. Explicitly, the buffer mechanism is implemented using a circular linked list.[27] It consists of a set of BufferNodes. Each BufferNode has a DataField, which contains the data, and a pointer called NextBufferNode that points to the next BufferNode. To find the current position, two pointers are used, CurrentBufferNode and EndBufferNode, which represent the positions in the BufferNodes of the last written and last read data. If a BufferNode has no free space, it sends a wait signal to the client, telling it to wait until space is available.[28]
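
The buffer mechanism described above can be sketched as a fixed-size ring of BufferNodes. The names DataField, NextBufferNode, CurrentBufferNode, and EndBufferNode come from the description above; the rest of the code is an illustrative guess at how such a ring could behave, not the implementation from the cited work.

```python
# Illustrative ring of BufferNodes: write() fills nodes, read() drains them,
# and write() answers "wait" when no free node is available.

class BufferNode:
    def __init__(self):
        self.DataField = None            # data held by this node (None = free)
        self.NextBufferNode = None       # pointer to the next node of the ring

class CircularBuffer:
    def __init__(self, size):
        nodes = [BufferNode() for _ in range(size)]
        for i, node in enumerate(nodes):
            node.NextBufferNode = nodes[(i + 1) % size]    # close the ring
        self.CurrentBufferNode = nodes[0]    # where the next write goes
        self.EndBufferNode = nodes[0]        # where the next read comes from

    def write(self, data):
        node = self.CurrentBufferNode
        if node.DataField is not None:
            return "wait"                # no free space: the client has to wait
        node.DataField = data
        self.CurrentBufferNode = node.NextBufferNode
        return "ok"

    def read(self):
        node = self.EndBufferNode
        data, node.DataField = node.DataField, None        # free the node
        self.EndBufferNode = node.NextBufferNode
        return data

buf = CircularBuffer(2)
print(buf.write(b"a"), buf.write(b"b"), buf.write(b"c"))   # ok ok wait
print(buf.read(), buf.write(b"c"))                          # b'a' ok
```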

Security keys

In cloud computing, the most important security concepts are confidentiality, availability, and integrity. Confidentiality is indispensable to keep private data from being disclosed and to maintain privacy, while integrity ensures that data is not corrupted [Security and Privacy in Cloud Computing].

Confidentiality

Confidentiality means that data and computation tasks are confidential: neither the cloud provider nor other clients should be able to access the data. Much research has been done on confidentiality, because it is one of the crucial points that still presents challenges for cloud computing; a lack of trust in cloud providers is a related issue [29]. The cloud infrastructure must therefore ensure that no consumer data can be accessed by unauthorized parties. The environment becomes unsecure if the service provider can locate the consumer's data in the cloud, has the privilege to access and retrieve the consumer's data, and can understand the meaning of the data (the types of data, the functionalities and interfaces of the application, and the format of the data). [30] If these three conditions are satisfied simultaneously, the situation becomes very dangerous.

The geographic location of data stores influences privacy and confidentiality, and the location of clients should also be taken into account. Indeed, clients in Europe may not be interested in using data centers located in the United States, because the confidentiality of their data would then not be guaranteed. To address this problem, some cloud computing vendors have included the geographic location of the hosting as a parameter of the service-level agreement made with the customer [31], allowing users to choose for themselves the locations of the servers that will host their data.

One approach that can help address the confidentiality issue is data encryption [32]; otherwise, there are serious risks of unauthorized use. In the same vein, other solutions exist, such as encrypting only sensitive data [33] and supporting only certain operations, in order to simplify computation [34]. Furthermore, cryptographic techniques and tools such as FHE (fully homomorphic encryption) are also used to strengthen privacy preservation in the cloud. [35]
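
As an illustration of client-side encryption before data is handed to an untrusted cloud, the sketch below uses the Fernet symmetric cipher from the third-party Python cryptography package (assumed to be installed); the key stays with the client, so the provider only ever sees ciphertext.

```python
# Client-side encryption sketch: the provider stores only ciphertext,
# the decryption key never leaves the client.
from cryptography.fernet import Fernet

key = Fernet.generate_key()                  # kept by the client, never uploaded
cipher = Fernet(key)

sensitive = b"customer records"
ciphertext = cipher.encrypt(sensitive)       # this is what is sent to the cloud

cloud_storage = {"customers.bin": ciphertext}    # the untrusted provider sees only this

# Later, the client downloads the object and decrypts it locally.
restored = cipher.decrypt(cloud_storage["customers.bin"])
assert restored == sensitive
```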

Availability

Availability is generally addressed by replication, while consistency must also be guaranteed. However, consistency and availability cannot be achieved at the same time: either consistency is relaxed so that the system remains available, or consistency is made a priority and the system is sometimes unavailable. [36] On the other hand, data has an identity, which is a key produced by a one-way cryptographic hash function (e.g. MD5), and its location is derived from the hash of this key. The key space is partitioned into multiple partitions.[37] To maximize data availability and data durability, the replicas are placed on different (geographically distant) servers, because data availability increases with geographical diversity. The replication process includes an evaluation of data availability, which must stay above a certain minimum; otherwise, data is replicated to another chunkserver. Each partition i has an availability value represented by a formula of the following form:
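avail_i = \sum_{s_i} \sum_{s_j \neq s_i} conf_i \cdot conf_j \cdot diversity(s_i, s_j)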

where s_i are the servers hosting the replicas, conf_i and conf_j are the confidence levels of servers i and j (depending on technical factors such as hardware components and on non-technical ones such as the economic and political situation of a country), and the diversity is the geographical distance between s_i and s_j. [38]

Integrity

Integrity in cloud computing implies data integrity as well as computing integrity. Integrity means that data has to be stored correctly on cloud servers and that, in case of failures or incorrect computation, problems have to be detected.

Data integrity is easy to achieve thanks to cryptography (typically through message authentication codes, or MACs, on data blocks).[39]
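
As an illustration, a MAC over a data block can be computed with Python's standard hmac and hashlib modules; the key and block contents below are placeholders.

```python
# Integrity check sketch: compute an HMAC per block before upload,
# recompute it after retrieval and compare the two tags.
import hashlib
import hmac

secret_key = b"client-held MAC key"            # known only to the data owner
block = b"contents of one data block"

tag = hmac.new(secret_key, block, hashlib.sha256).hexdigest()   # stored with the block

# Later, after downloading the block back from the cloud:
retrieved_block = block                        # in a real check, this comes from the server
retrieved_tag = hmac.new(secret_key, retrieved_block, hashlib.sha256).hexdigest()

# compare_digest avoids timing side channels when comparing MACs.
print("intact" if hmac.compare_digest(tag, retrieved_tag) else "corrupted")
```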

Data integrity can be affected in different ways, whether by malicious events or by administration errors (e.g. during backup and restore, data migration, or changing memberships in P2P systems). [40]

Checking mechanisms exist to verify data integrity. For instance, HAIL (High-Availability and Integrity Layer) is a distributed cryptographic system that allows a set of servers to prove to a client that a stored file is intact and retrievable. [41]

Cloud-based Synchronization of Distributed File System

More and more users have multiple devices with ad hoc connectivity, and these devices need to be synchronized. An important point is thus to maintain user data by synchronizing replicated data sets between an arbitrary number of servers. This is useful for backups and also for offline operation: when network conditions are poor, the user device selectively replicates a part of the data, which can then be modified offline. Once network conditions improve, the device synchronizes. [42] Two approaches exist to tackle the distributed synchronization issue: user-controlled peer-to-peer synchronization and the cloud master-replica synchronization approach.[43]

  • user-controlled peer-to-peer: software such as rsync must be installed on all of the users' computers that contain their data. The files are then synchronized by peer-to-peer synchronization, where users have to provide the network addresses of all the devices and the synchronization parameters, which makes it a manual process.
  • cloud master-replica synchronization: widely used by cloud services; a master replica containing all the data to be synchronized is kept as a central copy in the cloud, and all updates and synchronization operations are pushed to this central copy, offering a high level of availability and reliability in case of failures.

Economic aspects

Cloud computing is growing rapidly: the US government, for example, expected its cloud spending to grow at a compound annual growth rate (CAGR) of 40% and to reach 7 billion dollars by 2015, a figure that deserves to be taken into consideration[44].

More and more companies are using cloud computing to manage massive amounts of data and to overcome the lack of storage capacity. It enables companies to use resources as a service to cover their computing needs without having to invest in infrastructure; they pay only for what they use (the pay-as-you-go model).[45]

Every application provider has to periodically pay the cost of each server where replicas of his data are stored. The cost of a server is generally estimated by the quality of the hardware, the storage capacities, and its query processing and communication overhead.[46]

Cloud computing makes it easier for enterprises to scale their services according to client demand. The pay-as-you-go model has also eased the burden on startup companies that wish to benefit from compute-intensive business. Cloud computing also offers a huge opportunity to many third-world countries that otherwise lack such computing resources, enabling IT services there. Cloud computing can lower IT barriers to innovation. [47]

Despite the wide use of cloud computing, efficient sharing of large volumes of data in an untrusted cloud remains a challenging research topic.

References

  1. [[#sun|]], p. 1
  2. Fabio Kon, p. 1
  3. K. Kobayashi, S. Mikami, H. Kimura, O. Tatebe, p. 1
  4. Angabini A, Yazdani N., Mundt T, Hassani F., p. 1
  5. M. Di Sano, A. Di Stefano, G. Morana, D. Zito 2012, p. 2
  6. Andrew S. Tanenbaum, Maarten van Steen, p. 492
  7. Andrew S. Tanenbaum, Maarten van Steen, p. 496
  8. Humbetov, p. 2
  9. Paul Krzyzanowski, p. 2
  10. M. Di Sano, A. Di Stefano, G. Morana, D. Zito, p. 1-2
  11. Paul Krzyzanowski, p. 2
  12. Paul Krzyzanowski, p. 4
  13. M. Di Sano, A. Di Stefano, G. Morana, D. Zito, p. 2
  14. Humbetov, p. 3
  15. Humbetov, p. 25
  16. Andrew S. Tanenbaum, Maarten van Steen, p. 498
  17. Paul Krzyzanowski, p. 2
  18. Paul Krzyzanowski, p. 5
  19. Fan-Hsun, p. 2
  20. Azzedin, p. 2
  21. Abzetdin Adamov, p. 2
  22. Hung-Chang Hsiao, Haiying Shen, Hsueh-Yi Chung, Yu-Chang Chao, p. 2
  23. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, p. 8
  24. Hung-Chang Hsiao, Haiying Shen, Hsueh-Yi Chung, Yu-Chang Chao, p. 3
  25. Hung-Chang Hsiao, Haiying Shen, Hsueh-Yi Chung, Yu-Chang Chao
  26. B. Upadhyaya, Azimov, F., Doan T.T., Eunmi Choi, SangBum Kim, Pilsung Kim, p. 4
  27. B. Upadhyaya, Azimov, F., Doan T.T., Eunmi Choi, SangBum Kim, Pilsung Kim, p. 2
  28. B. Upadhyaya, Azimov, F., Doan T.T., Eunmi Choi, SangBum Kim, Pilsung Kim, p. 3
  29. Xiao Zhifeng, p. 3-4
  30. Stephen S. Yau, Ho G. An, p. 353
  31. Christian Vecchiola, Suraj Pandey,Rajkumar Buyya, p. 14
  32. Stephen S. Yau, Ho G. An, p. 2
  33. Miranda Mowbray, Siani Pearson
  34. Naehrig Michael, Lauter Kristin
  35. Zhifeng Xiao and Yang Xiao, p. 854
  36. Vogels Werner, p. 2
  37. Nicolas Bonvin, Thanasis G. Papaioannou and Karl Aberer, p. 206
  38. Nicolas Bonvin, Thanasis G. Papaioannou and Karl Aberer, p. 208
  39. Ari Juels, p. 4
  40. Zhifeng Xia, p. 5
  41. Kevin D. Bowers, Ari Juels, Alina Oprea
  42. Sandesh Uppoor, Michail D. Flouris, and Angelos Bilas, p. 1
  43. Sandesh Uppoor, Michail D. Flouris, and Angelos Bilas, p. 1
  44. John Harauz, Lori M. Kaufman, Bruce Potter, p. 2
  45. Alireza Angabini, Nasser Yazdani, Thomas Mundt, Fatemeh Hassani, p. 1
  46. Nicolas Bonvin, Thanasis G. Papaioannou, Karl Aberer, p. 3
  47. Sean Marston, Zhi Li, Subhajyoti Bandyopadhyay, Juheng Zhang, Anand Ghalsas, p. 3

Bibliography

  1. Architecture & Structure & design:
    • (en) Zhang Qi-fei, Pan Xue-zeng, Shen Yan et Li Wen-juan, « A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P », Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on[le lien externe a été retiré],‎ (DOI 10.1109/ClusterW.2012.27, lire en ligne)
      Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
    • (en) Farag Azzedin, « Towards A Scalable HDFS Architecture », Collaboration Technologies and Systems (CTS), 2013 International Conference on[le lien externe a été retiré],‎ , p. 155-161 (DOI 10.1109/CTS.2013.6567222, lire en ligne)
      Information and Computer Science Department King Fahd University of Petroleum and Minerals
    • (en) Paul Krzyzanowski, « Distributed File Systems »,
    • (en) K. Kobayashi, S. Mikami, H. Kimura et O. Tatebe, « The Gfarm File System on Compute Clouds », Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on[le lien externe a été retiré],‎ (DOI 10.1109/IPDPS.2011.255, lire en ligne)
      Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan
    • (en) Shamil Humbetov, « Data-Intensive Computing with Map-Reduce and Hadoop », Application of Information and Communication Technologies (AICT), 2012 6th International Conference on[le lien externe a été retiré],‎ , p. 1-5 (DOI 10.1109/ICAICT.2012.6398489, lire en ligne)
      Department of Computer Engineering Qafqaz University Baku, Azerbaijan
    • (en) Hung-Chang Hsiao, Hsueh-Yi Chung, Haiying Shen et Yu-Chang Chao, « Load Rebalancing for Distributed File Systems in Clouds », Parallel and Distributed Systems, IEEE Transactions on (Volume:24 , Issue: 5 )[le lien externe a été retiré],‎ , p. 951 - 962 (DOI 10.1109/TPDS.2012.196, lire en ligne)
      National Cheng Kung University, Tainan
    • (en) Kai Fan, Dayang Zhang, Hui Li et Yintang Yang, « An Adaptive Feedback Load Balancing Algorithm in HDFS », Intelligent Networking and Collaborative Systems (INCoS), 2013 5th International Conference on[le lien externe a été retiré],‎ , p. 23-29 (DOI 10.1109/INCoS.2013.14, lire en ligne)
      State Key Lab. of Integrated Service Networks, Xidian Univ., Xi'an, China
    • (en) B. Upadhyaya, F. Azimov, T.T. Doan, Choi Eunmi, Kim SangBum et Kim Pilsung, « Distributed File System: Efficiency Experiments for Data Access and Communication », Networked Computing and Advanced Information Management, 2008. NCM '08. Fourth International Conference on (Volume:2 )[le lien externe a été retiré],‎ , p. 400-405 (DOI 10.1109/NCM.2008.164, lire en ligne)
      Sch. of Bus. IT, Kookmin Univ., Seoul
    • (en) Abzetdin Adamov, « Distributed File System as a basis of Data-Intensive Computing », Application of Information and Communication Technologies (AICT), 2012 6th International Conference on[le lien externe a été retiré],‎ , p. 1-3 (DOI 10.1109/ICAICT.2012.6398484, lire en ligne)
      Comput. Eng. Dept., Qafqaz Univ., Baku, Azerbaijan
    • (en) S.A. Brandt, E.L. Miller, D.D.E. Long et Lan Xue, « Efficient metadata management in large distributed storage systems », Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings. 20th IEEE/11th NASA Goddard Conference on[le lien externe a été retiré],‎ , p. 290 - 298 (DOI 10.1109/MASS.2003.1194865, lire en ligne)
      Storage Syst. Res. Center, California Univ., Santa Cruz, CA, USA
    • (en) Garth A. Gibson et Rodney MVan Meter, « Network attached storage architecture », COMMUNICATIONS OF THE ACM, vol. 43, no 11,‎ (lire en ligne)
    • (en) Cho Cho Khaing et Thinn Thu Naing, « The efficient data storage management system on cluster-based private cloud data center », Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on[le lien externe a été retiré],‎ , p. 235 - 239 (DOI 10.1109/CCIS.2011.6045066, lire en ligne)
    • (en) S.A. Brandt, E.L. Miller, D.D.E. Long et Lan Xue, « A carrier-grade service-oriented file storage architecture for cloud computing », Web Society (SWS), 2011 3rd Symposium on[le lien externe a été retiré],‎ , p. 16 - 20 (DOI 10.1109/SWS.2011.6101263, lire en ligne)
      PCN&CAD Center, Beijing Univ. of Posts & Telecommun., Beijing, China
    • (en) Sanjay Ghemawat, Howard Gobioff et Shun-Tak Leung, « The Google File System », SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles[le lien externe a été retiré],‎ , p. 29-43 (DOI 10.1145/945445.945450, lire en ligne)
  2. Security Concept
    • (en) C Vecchiola, S. Pandey et R. Buyya, « High-Performance Cloud Computing: A View of Scientific Applications », Pervasive Systems, Algorithms, and Networks (ISPAN), 2009 10th International Symposium on[le lien externe a été retiré],‎ , p. 4-16 (DOI 10.1109/I-SPAN.2009.150, lire en ligne)
      Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia
    • (en) Du Hongtao et Li Zhanhuai, « Efficient metadata management in large distributed storage systems », Measurement, Information and Control (MIC), 2012 International Conference on[le lien externe a été retiré], vol. 1,‎ , p. 327 - 331 (DOI 10.1109/MIC.2012.6273264, lire en ligne)
      Comput. Coll., Northwestern Polytech. Univ., XiAn, China
    • (en) A.Brandt Scott, L.Miller Ethan, D.E.Long Darrell et Xue Lan, « Efficient Metadata Management in Large Distributed Storage Systems », 11th NASA Goddard Conference on Mass Storage Systems and Technologies,SanDiego,CA,‎ (lire en ligne)
      Storage Systems Research Center University of California,Santa Cruz
    • (en) Lori M. Kaufman, « Data Security in the World of Cloud Computing », Security & Privacy, IEEE (Volume:7 , Issue: 4 )[le lien externe a été retiré],‎ , p. 161 - 64 (DOI 10.1109/MSP.2009.87, lire en ligne)
    • (en) Kevin D. Bowers, Ari Juels et Alina Oprea, « HAIL: a high-availability and integrity layer for cloud storageComputing », Proceedings of the 16th ACM conference on Computer and communications security[le lien externe a été retiré],‎ , p. 187-198 (DOI 10.1145/1653662.1653686, lire en ligne)
    • (en) Ari Juels et Alina Oprea, « New approaches to security and availability for cloud data », Magazine Communications of the ACM CACM Homepage archive Volume 56 Issue 2, February 2013[le lien externe a été retiré],‎ , p. 64-73 (DOI 10.1145/2408776.2408793, lire en ligne)
    • (en) Zhang Jing, Wu Gongqing, Hu Xuegang et Wu Xindong, « A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services », Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on[le lien externe a été retiré],‎ , p. 12 - 21 (DOI 10.1109/Grid.2012.17, lire en ligne)
      Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China
    • (en) A. Pan, J.P. Walters, V.S. Pai, D.-I.D. Kang et S.P. Crago, « Integrating High Performance File Systems in a Cloud Computing Environment », High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:[le lien externe a été retiré],‎ , p. 753 - 759 (DOI 10.1109/SC.Companion.2012.103, lire en ligne)
      Dept. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
    • (en) Tseng Fan-Hsun, Chen Chi-Yuan, Chou Li-Der et Chao Han-Chieh, « Implement a reliable and secure cloud distributed file system », Intelligent Signal Processing and Communications Systems (ISPACS), 2012 International Symposium on[le lien externe a été retiré],‎ , p. 227 - 232 (DOI 10.1109/ISPACS.2012.6473485, lire en ligne)
      Dept. of Comput. Sci. & Inf. Eng., Nat. Central Univ., Taoyuan, Taiwan
    • (en) M Di Sano, A. Di Stefano, G. Morana et D. Zito, « File System As-a-Service: Providing Transient and Consistent Views of Files to Cooperating Applications in Clouds », Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), 2012 IEEE 21st International Workshop on[le lien externe a été retiré],‎ , p. 173 - 178 (DOI 10.1109/WETICE.2012.104, lire en ligne)
      Dept. of Electr., Electron. & Comput. Eng., Univ. of Catania, Catania, Italy
    • (en) Sheng Zhonghua, Ma Zhiqiang, Gu Lin et Li Ang, « A privacy-protecting file system on public cloud storage », Cloud and Service Computing (CSC), 2011 International Conference on[le lien externe a été retiré],‎ , p. 141 - 149 (DOI 10.1109/CSC.2011.6138512, lire en ligne)
      Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
    • (en) Zhifeng Xiao, « Security and Privacy in Cloud Computing », Communications Surveys & Tutorials, IEEE (Volume:15 , Issue: 2 )[le lien externe a été retiré],‎ , p. 843 - 859 (DOI 10.1109/SURV.2012.060912.00182, lire en ligne)
    • (en) John B Horrigan, « Use of cloud computing applications and services »,
    • (en) Stephen S. Yau et Ho G. An, « Confidentiality Protection in cloud computing systems », Int J Software Informatics, Vol.4, No.4,,‎ , p. 351 - 365 (lire en ligne)
    • (en) T. Plantard, W. Susilo et Z. Zhang, « Fully Homomorphic Encryption Using Hidden Ideal Lattice », Information Forensics and Security, IEEE Transactions on (Volume:8 , Issue: 12 )[le lien externe a été retiré],‎ , p. 2127 - 2137 (DOI 10.1109/TIFS.2013.2287732, lire en ligne)
    • (en) Naehrig Michael et Lauter Kristin, « Can homomorphic encryption be practical? », CCSW '11 Proceedings of the 3rd ACM workshop on Cloud computing security workshop[le lien externe a été retiré],‎ , p. 113-124 (DOI 10.1145/2046660.2046682, lire en ligne)
    • (en) Mowbray Miranda et Pearson Siani, « A client-based privacy manager for cloud computing », COMSWARE '09 Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE[le lien externe a été retiré],‎ (DOI 10.1145/1621890.1621897, lire en ligne)
    • (en) Vogels Werner, « Eventually consistent », Communications of the ACM - Rural engineering development CACM Volume 52 Issue 1[le lien externe a été retiré],‎ , p. 40 - 44 (DOI 10.1145/1435417.1435432, lire en ligne)
    • (en) Nicolas Bonvin, Thanasis G. Papaioannou et Karl Aberer, « A self-organized, fault-tolerant and scalable replication scheme for cloud storage », SoCC '10 Proceedings of the 1st ACM symposium on Cloud computing[le lien externe a été retiré],‎ , p. 205-216 (DOI 10.1145/1807128.1807162, lire en ligne)
    • (en) Tim Kraska, Martin Hentschel, Gustavo Alonso et Donald Kossma, « Consistency rationing in the cloud: pay only when it matters », Proceedings of the VLDB Endowment VLDB Endowment Hompage archive Volume 2 Issue 1,[le lien externe a été retiré],‎ , p. 253-264 (lire en ligne)
    • (en) Daniel J. Abadi, « Data Management in the Cloud: Limitations and Opportunities », IEEE[le lien externe a été retiré],‎ (lire en ligne)
    • (en) Ari Juels et Alina Oprea, « New Approaches to Security and Availability for Cloud Data », Communications of the ACM CACM Volume 56 Issue 2[le lien externe a été retiré],‎ , p. 64-73 (DOI 10.1145/2408776.2408793, lire en ligne)
    • (en) Ari Juels, S. Burton et Jr Kaliski, « Pors: proofs of retrievability for large files », Communications of the ACM CACM Volume 56 Issue 2[le lien externe a été retiré],‎ , p. 584-597 (DOI 10.1145/1315245.1315317, lire en ligne)
    • (en) Ari Ateniese, Randal Burns, Johns Reza, Curtmola Joseph, Herring Burton, Lea Kissner, Zachary Peterson et Dawn Song, « PDP: Provable data possession at untrusted stores », CCS '07 Proceedings of the 14th ACM conference on Computer and communications security[le lien externe a été retiré],‎ , p. 598-609 (DOI 10.1145/1315245.1315318, lire en ligne)
    • (en) Giuseppe Ateniese, Roberto Di Pietro, Luigi V. Mancini et Gene Tsudik, « SPDP: Scalable and efficient provable data possession », Proceedings of the 4th international conference on Security and privacy in communication netowrks Article No. 9[le lien externe a été retiré],‎ (DOI 10.1145/1460877.1460889, lire en ligne)
    • (en) Chris Erway, Alptekin Küpçü, Charalampos Papamanthou et Roberto Tamassia, « Dynamic provable data possession », CCS '09 Proceedings of the 16th ACM conference on Computer and communications security[le lien externe a été retiré],‎ , p. 213-222 (DOI 10.1145/1653662.1653688, lire en ligne)
  3. synchronization
    • (en) S. Uppoor, M.D. Flouris et A. Bilas, « Cloud-based Synchronization of Distributed File System Hierarchies », Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on[le lien externe a été retiré],‎ , p. 1-4 (DOI 10.1109/CLUSTERWKSP.2010.5613087, lire en ligne)
  4. Economic aspects
    • (en) Lori M. Kaufman, « Data Security in the World of Cloud Computing », Security & Privacy, IEEE (Volume:7 , Issue: 4 )[le lien externe a été retiré],‎ , p. 161 - 64 (DOI 10.1109/MSP.2009.87, lire en ligne)
    • (en) Zhi Lia, Subhajyoti Bandyopadhyaya, Juheng Zhanga et Anand Ghalsasib, « Cloud computing — The business perspective », Decision Support Systems Volume 51, Issue 1,[le lien externe a été retiré],‎ , p. 176–189 (DOI http://dx.doi.org/10.1016/j.dss.2010.12.006, lire en ligne)
    • (en) A. Angabini, N. Yazdani, T. Mundt et F. Hassani, « Suitability of Cloud Computing for Scientific Data Analyzing Applications; An Empirical Study », P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2011 International Conference on[le lien externe a été retiré],‎ , p. 193 - 199 (DOI 10.1109/3PGCIC.2011.37, lire en ligne)
      Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran