AN EFFECTIVE FRAMEWORK FOR MANAGING REPLICA SYSTEM IN DISTRIBUTED FILE SHARING ENVIRONMENTS
Tesmy K Jose, Dr.V.Ulagamuthalvi
Abstract-An enhanced file system called the Probabilistic File Share System is proposed to resolve distributed file update issues. The system comprises three mechanisms: the Lazy Adaptive Synchronization Approach, the Standard Replica System Replay Approach, and a probabilistic method. The adaptive replica synchronization and Standard Replica System Replay approaches are implemented among the Storage Servers (SSs), which frees the Meta Data Server (MDS) from replica synchronization. Furthermore, a probabilistic control system is deployed to manage replica replacement, overloading, and failures: it estimates the probability of replacement, overloading, and failure for every replica according to its communication overhead and physical information. If the communication overhead or the physical failure probability is high, the replica system is replaced in the replica environment and a notification message with the details of the failed system is sent to its neighbor replicas.
Keywords- Metadata Server, Lazy Adaptive Synchronization, Standard Replica System Replay, Probabilistic Control System.
1. Introduction
As the volume of digital data grows, reliable, low-cost storage systems that do not compromise on access performance are increasingly important. A number of storage systems (e.g., tape libraries and optical jukeboxes) provide high reliability coupled with low I/O throughput. However, as throughput requirements grow, using high-end components leads to increasingly costly systems. In general, the client contacts the metadata server (MDS), which handles all the properties of the whole file system, to get authorization to work on the file and to obtain the file’s layout information. Then, after parsing the layout information obtained from the MDS, the client accesses the corresponding storage servers (SSs), which handle file data management on the storage machines, to execute the actual file I/O operations. A number of existing distributed storage systems (e.g., cluster-based and peer-to-peer storage systems) attempt to offer cost-effective, reliable data stores on top of unreliable, commodity, or even donated storage components. To tolerate failures of individual nodes, these systems use data redundancy through replication or erasure coding. This approach faces two problems. First, regardless of the redundancy level used, there is always a non-zero probability of a burst of correlated permanent failures up to the redundancy level used; hence the possibility of permanently losing data always exists. Second, data loss probability increases with the data volume stored when all other characteristics of the system are kept constant.
One disadvantage of clusters is that programs must be partitioned to run on multiple machines, and it is difficult for these partitioned programs to cooperate or share resources. Perhaps the most significant such resource is the file system. In the absence of a cluster file system, the individual components of a partitioned program must share cluster storage in an ad hoc manner. This typically complicates programming, restricts performance, and compromises reliability. In addition, the Meta Data Server is responsible for handling all the information about chunk replicas and for initiating replica synchronization when one of the storage servers has been updated. However, saving the recently written data to disk becomes a bottleneck for the whole file system, because all other threads must wait until the synchronous flush-and-sync procedure started by one of the SSs is completed.
A Probabilistic File Share System is proposed to resolve the above-mentioned issues. It supports lazy and adaptive replica synchronization with replica replacement management among the SSs, and it frees the MDS from replica synchronization and failure maintenance.
2. Literature Survey
Several distributed file systems support chunk replication for reliability and high data bandwidth, and they use similar replica synchronization mechanisms. One class of file system extends the traditional file server architecture to a storage area network (SAN) environment, which allows the file server to access data directly from the disk through the SAN. Examples of SAN file systems are IBM/Tivoli SANergy and Veritas SAN Point Direct [8,9].
GPFS supports chunk replication by allocating space for multiple copies of each data chunk on different Storage Servers and updating all locations synchronously. Before a write operation completes, GPFS tracks which chunk replicas have been updated on the primary SSs and then updates the other replicas [7]. Ceph has a similar replica synchronization policy, i.e., newly written data must be applied to all replicas stored on the different Storage Servers [5].
In the Hadoop file system, the replicated chunks are stored on the Storage Servers. The list of Storage Servers that hold copies of any stripe is produced and managed by the Metadata Server. The Metadata Server therefore handles replica synchronization, which is triggered whenever new data is written to any of the replicas [4]. In GFS, the Metadata Server computes the location and data layout among the various chunk servers. Every chunk is replicated on multiple chunk servers, and replica synchronization is performed by the Metadata Server (MDS) [6]. The Lustre file system, a parallel file system, has a similar chunk replication mechanism [10].
Researchers have successively presented the MinCopysets and Copysets replication techniques, which use a derandomized replica placement policy to improve data durability (i.e., reduce data loss) while retaining the benefits of randomized load balancing. However, they did not describe algorithms for replica synchronization and replica replacement [3,2].
3. Proposed System
3.1 Probabilistic File Share System Architecture
The probabilistic file share system copies and distributes the locations of all replicas belonging to the same file chunk to the Storage Servers (SSs) where the replicas are stored. Fig. 1 shows the architecture of the probabilistic file share system. The probabilistic control system is set up to calculate the failure rate of every replica in the probabilistic file share system environment. To calculate the failure rate, the system examines the communication overhead of each replica and also obtains its CPU and memory utilization. In this way, the proposed system maintains better data consistency in the distributed file sharing environment.
Fig. 1 Probabilistic File Share System Architecture
3.2 Data Updating
Fig. 2 Adaptive Synchronization Approach
When processing a write request, the probabilistic file share system uses lazy replica synchronization. The system first completes the write operation, and each update in the probabilistic file share system storage is then replicated using adaptive replica synchronization. The adaptive replica synchronization approach copies each modification across the storage of the distributed file system: the primary replica propagates the update to replica n, and replica n returns an acknowledgement to the primary replica.
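The write path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `Replica` and `PrimaryReplica`, the `pending` queue, and the `synchronize` method are all hypothetical, and the policy that decides when synchronization runs (the adaptive part) is left to the caller.

```python
from collections import deque


class Replica:
    """A secondary replica holding copies of chunk data (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}

    def apply(self, chunk_id, data):
        # Store the update and return an acknowledgement to the caller.
        self.chunks[chunk_id] = data
        return True


class PrimaryReplica(Replica):
    """Primary replica: completes writes locally and synchronizes lazily."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.pending = deque()  # updates not yet propagated (assumed structure)

    def write(self, chunk_id, data):
        # The write completes on the primary before any replication happens.
        self.chunks[chunk_id] = data
        self.pending.append((chunk_id, data))

    def synchronize(self):
        """Propagate pending updates; return how many were acknowledged by all."""
        done = 0
        while self.pending:
            chunk_id, data = self.pending[0]
            acks = [replica.apply(chunk_id, data) for replica in self.secondaries]
            if not all(acks):
                break  # keep the update queued for a later attempt
            self.pending.popleft()
            done += 1
        return done
```

In this sketch a client sees the write as complete as soon as `write` returns, while the secondaries only receive the data once `synchronize` is called, which is the latency trade-off that lazy synchronization aims for.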
3.3 System Crash Handling
The probabilistic file share system adopts a deferred replica synchronization mechanism for reconstructing lost file updates. That is, to reduce write latency, only the primary Storage Server manages the latest data snapshot, and synchronization of the replicas to the other SSs is carried out later, over time. The Meta Data Server buffers the latest write requests in memory; when the number of cached requests is larger than a predefined threshold, the MDS directs the SSs to perform regular replica synchronization, so that the cached requests can be removed from memory.
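The buffering rule in this section can be illustrated with a short Python sketch. It is only an assumption-laden outline: the class `MetadataServer`, the `sync_callback` hook standing in for the instruction sent to the SSs, and the `requests_to_replay` helper are hypothetical names, and the idea that the cached requests are what gets replayed after a crash is an interpretation of the text rather than something it states in detail.

```python
class MetadataServer:
    """Buffers recent write requests and triggers synchronization (sketch)."""

    def __init__(self, threshold, sync_callback):
        self.threshold = threshold          # predefined threshold from the text
        self.sync_callback = sync_callback  # hypothetical hook: tells SSs to sync
        self.cached_requests = []

    def record_write(self, request):
        # Keep the latest write request in memory.
        self.cached_requests.append(request)
        # When the cache grows larger than the threshold, direct the SSs
        # to synchronize and then drop the cached requests.
        if len(self.cached_requests) > self.threshold:
            self.sync_callback(list(self.cached_requests))
            self.cached_requests.clear()

    def requests_to_replay(self):
        # Assumption: after an SS crash, the still-cached requests are the
        # updates that may not have reached every replica yet.
        return list(self.cached_requests)
```

A larger threshold means fewer synchronization rounds but more requests to replay after a crash, so the threshold is the main tuning knob in this sketch.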
3.4 SS’s Failure and Replacement
The proposed file sharing system includes a probabilistic control system that examines the system details and the communication of every replica. The probabilistic control system keeps a replacement list that stores system details such as CPU utilization and memory utilization. Using this information, the probabilistic control system measures the failure rate of each replica. If the communication overhead or the physical failure probability is high, the replica system is replaced in the replica environment and a notification message with the details of the failed system is sent to its neighbor replicas.
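As a rough illustration of how such a control system might score replicas, the Python sketch below combines the three inputs named in the text into a single value and selects replicas for replacement. Everything specific here is an assumption made for the example and not the authors' function: the weighted sum, the weights, the 0.8 threshold, the normalization of all inputs to the range 0 to 1, and the names `ReplicaStats`, `failure_score`, `replicas_to_replace`, and `build_notifications`.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStats:
    """One entry of the replacement list (field names are hypothetical)."""
    name: str
    communication_overhead: float  # assumed normalized to 0.0-1.0
    cpu_utilization: float         # assumed normalized to 0.0-1.0
    memory_utilization: float      # assumed normalized to 0.0-1.0


def failure_score(stats, weights=(0.4, 0.3, 0.3)):
    # Assumed scoring rule: a weighted sum of the three measurements.
    w_comm, w_cpu, w_mem = weights
    return (w_comm * stats.communication_overhead
            + w_cpu * stats.cpu_utilization
            + w_mem * stats.memory_utilization)


def replicas_to_replace(replacement_list, threshold=0.8):
    # A replica is selected when its score exceeds the assumed threshold.
    return [s for s in replacement_list if failure_score(s) > threshold]


def build_notifications(failed, replacement_list):
    # Each remaining replica receives the details of the failed replicas.
    failed_names = {s.name for s in failed}
    details = [(s.name, failure_score(s)) for s in failed]
    return {s.name: details
            for s in replacement_list if s.name not in failed_names}
```

With these assumed weights, a lightly loaded replica scores low and stays in service, while a replica that is saturated on all three measurements crosses the threshold and its details are sent to the remaining replicas.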
Fig. 3 Illustration of the Probabilistic Control System
Fig. 3 illustrates the replica replacement management process. The following function is used to measure the failure rate of a replica:
5. Conclusion
This research work proposed a new probabilistic file share system. The modified lazy adaptive synchronization approach updates the data in the Storage Servers and is expected to require less I/O execution time, computation, and storage than other approaches. The standard replica system replay approach handles crashes of Storage Servers and can recover the lost data. Finally, a probabilistic control system is deployed in the new probabilistic file share system; it directs both the calculation of replica failure rates and the management of replica replacement.