Storing data involves many facets that need to be considered, ranging from cost, performance, and accessibility to the specific requirements of the applications that interact with the data.

Object Storage

Object Storage is one of the three most widely used cloud storage architectures. Objects consist of the data itself, a variable number of metadata attributes, and a globally unique identifier (GUID). Object storage is primarily used for unstructured data, such as Spotify songs or images on Facebook. Services like Amazon S3, Microsoft Azure Blob Storage, OpenStack Swift, and Google Cloud Storage utilize this architecture. Data is accessed via APIs over a network.

Use Cases

Block Storage

Block Storage provides a fixed-size storage volume to an instance (e.g., a virtual machine). The storage typically uses standard file systems such as NTFS, FAT32, or EXT4. Examples of block storage services include AWS EBS, Rackspace Block Storage, and Azure Disk Storage. Access to block storage is only possible when it is attached to a VM and its operating system.

Use Cases

Block storage is used for databases requiring consistent I/O and low-latency connections. It can also be employed for RAID systems, linking multiple block storage devices together to perform operations like striping and mirroring. Additionally, block storage is well-suited for server-side data processing.

Distributed File Systems

Distributed file systems conceptually differ significantly from block and object storage. They need to provide functionality similar to local file systems but operate across multiple systems.

Example: BeeGFS

BeeGFS is a distributed file system developed by a Fraunhofer Institute. It offers a POSIX-like interface, allowing applications to interact with it similarly to local file systems. This enables applications running on physically separate machines to share the same file system. Unlike local file systems, distributed file systems must address challenges such as file synchronization and path resolution.

BeeGFS consists of four components:

The metadata and storage architecture follow a scale-out design, ensuring that adding more instances increases performance.

BeeGFS Architecture

Architecture

Image Source: An Introduction to BeeGFS by Frank Herold, Sven Breuner, June 2018

Other Distributed File Systems

Hadoop Distributed File System (HDFS)

HDFS is optimized for large files with a "write-once, read-many" access pattern. It uses extremely large block sizes (128 MB compared to 4 KB in local file systems). HDFS is not designed for POSIX compatibility, which limits features like modifying existing files.

Ceph FS

Ceph FS is versatile and supports three interfaces:

Data is stored collectively in the Ceph Storage Cluster. Ceph FS is a general-purpose distributed file system that can handle a variety of applications deployed on a virtualized cluster. It is layered on top of a distributed object store.

Sources