Hadoop Perfect File: A fast access container for small files with direct in disc metadata access

03/14/2019
by Jude Tchaye-Kondi, et al.

Storing and processing massive numbers of small files is one of the major challenges for the Hadoop Distributed File System (HDFS). To provide fast data access, the NameNode (NN) in HDFS keeps the metadata of all files in main memory. Hadoop performs well with a small number of large files, which require relatively little metadata in the NN's memory. With a large number of small files, however, Hadoop runs into problems such as NN memory overload caused by the sheer volume of metadata for these files. We present a new type of archive file, Hadoop Perfect File (HPF), which addresses HDFS's small-files problem by merging small files into a large file on HDFS. Existing archive files offer limited functionality and perform poorly when accessing a file inside the merged file, because a metadata lookup has to read and process the entire index file(s). In contrast, HPF can access the metadata of a particular file directly from its index file without processing the whole index. The HPF index system uses two hash functions: a dynamic hash function distributes file metadata across the index files and, for each index file, an order-preserving perfect hash function records the position of each file's metadata within that index file. As a result, an access reads only the part of the index file that contains the metadata of the requested file. HPF also supports appending files after the archive has been created. Our experiments show that HPF can be more than 40% faster in file access than the original HDFS. Without the caching effect, HPF's file access is around 179% ... With the caching effect, HPF is around 35% ... 105% ...
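The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the fixed number of index files, and all function names are assumptions made for illustration. A plain modulo hash stands in for the dynamic hash function, and a dictionary stands in for the order-preserving perfect hash function. The point it shows is that a lookup seeks to a single fixed-size record instead of scanning the whole index file.

```python
import hashlib
import io
import struct

# Hypothetical fixed-size index record: 16-byte key digest, then the offset
# and length of the small file inside the merged data file.
RECORD = struct.Struct(">16sQQ")
NUM_INDEX_FILES = 4  # assumption: fixed here; a dynamic hash would grow this


def digest(name):
    """16-byte digest used as the stored key for a file name."""
    return hashlib.sha256(name.encode("utf-8")).digest()[:16]


def bucket_of(name):
    """Stand-in for the dynamic hash function that selects an index file."""
    return int.from_bytes(digest(name)[:4], "big") % NUM_INDEX_FILES


def build(files):
    """Merge small files and build one index file and rank table per bucket."""
    data = io.BytesIO()  # in-memory stand-in for the merged file on HDFS
    entries = [[] for _ in range(NUM_INDEX_FILES)]
    for name, content in files.items():
        offset = data.tell()
        data.write(content)
        entries[bucket_of(name)].append((name, offset, len(content)))

    index_files, ranks = [], []
    for bucket in entries:
        buf, rank = io.BytesIO(), {}
        for position, (name, offset, length) in enumerate(bucket):
            # The dict stands in for the order-preserving perfect hash: it
            # maps each key to the position of its record in the index file.
            rank[name] = position
            buf.write(RECORD.pack(digest(name), offset, length))
        index_files.append(buf)
        ranks.append(rank)
    return data, index_files, ranks


def read_file(name, data, index_files, ranks):
    """Fetch one small file by reading exactly one index record."""
    b = bucket_of(name)
    position = ranks[b][name]  # raises KeyError for unknown names
    index = index_files[b]
    index.seek(position * RECORD.size)  # jump straight to the record
    key, offset, length = RECORD.unpack(index.read(RECORD.size))
    if key != digest(name):
        raise KeyError(name)
    data.seek(offset)
    return data.read(length)
```

For example, `build({"a.txt": b"alpha", "b.txt": b"beta"})` returns the merged data, the index files, and the rank tables, and `read_file("b.txt", ...)` on that result returns `b"beta"` after reading a single 32-byte record. A real perfect hash function would replace the dictionary with a compact structure whose size does not depend on storing the keys.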
