Buckets Overview

What is a Bucket?

A Bucket is the second level in the Ozone data hierarchy, residing within a volume. Buckets are analogous to directories or folders in a traditional file system. They serve as containers for keys (data objects).

Key Characteristics:

  • Contained within Volumes: Every bucket must belong to a volume.
  • Container for Keys: A bucket can contain any number of keys.
  • No Nested Buckets: Unlike directories, buckets cannot contain other buckets.

Volume/Bucket Naming Convention

To maintain S3 compatibility, Ozone volume and bucket names follow the S3 naming convention. This means that volume and bucket names in Ozone must satisfy the following rules:

Allowed Characters and Length:

  • Allowed characters: Lowercase letters (a-z), numbers (0-9), dots (.), and hyphens (-)
  • Length: Must be between 3 and 63 characters long
  • Start and End: Must begin and end with a letter or a number

Prohibitions:

  • Cannot contain uppercase letters or underscores (_)
  • Cannot be formatted as an IP address (e.g., 192.168.5.4)
  • Cannot have consecutive dots (e.g., my..bucket) or hyphens adjacent to dots (e.g., my-.bucket)
  • Cannot end with a hyphen

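These restrictions can cause trouble when migrating HDFS workloads to Ozone, because HDFS path names follow POSIX conventions and may contain characters, such as uppercase letters and underscores, that the S3 convention prohibits.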
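
When planning a migration, it can help to check candidate names against the rules listed above before creating anything. The following is a minimal Bash sketch of such a check. The function name is_s3_compliant_name is hypothetical, and the function mirrors only the rules on this page; the validation performed by Ozone itself may differ in detail.

```shell
# Hypothetical helper: returns 0 if the name satisfies the naming rules
# listed on this page, 1 otherwise. Not part of the Ozone CLI.
is_s3_compliant_name() {
  local LC_ALL=C   # make [a-z] match ASCII lowercase only
  local name="$1"
  local charset_re='^[a-z0-9][a-z0-9.-]*[a-z0-9]$'
  local ip_re='^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'

  # Length must be between 3 and 63 characters.
  (( ${#name} >= 3 && ${#name} <= 63 )) || return 1
  # Only lowercase letters, digits, dots, and hyphens are allowed, and the
  # name must begin and end with a letter or a digit.
  [[ "$name" =~ $charset_re ]] || return 1
  # No consecutive dots, and no hyphen adjacent to a dot.
  [[ "$name" != *..* && "$name" != *-.* && "$name" != *.-* ]] || return 1
  # Must not be formatted as an IP address such as 192.168.5.4.
  [[ ! "$name" =~ $ip_re ]] || return 1
  return 0
}

# Example: report which candidate names would need to be renamed.
for candidate in my-bucket My_Bucket 192.168.5.4; do
  if is_s3_compliant_name "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

Of the three sample names, only my-bucket passes: My_Bucket contains an uppercase letter and an underscore, and 192.168.5.4 is formatted as an IP address.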

To relax the compliance check, set the property ozone.om.namespace.s3.strict to false in the ozone-site.xml of the Ozone Manager.
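
As a sketch, the corresponding entry could look like the fragment below. It assumes the standard Hadoop-style layout of ozone-site.xml, in which each property sits inside the top-level configuration element; the Ozone Manager will most likely need a restart before the change takes effect.

```xml
<!-- ozone-site.xml on the Ozone Manager: relax S3 naming checks -->
<property>
  <name>ozone.om.namespace.s3.strict</name>
  <value>false</value>
</property>
```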

Details

Creation and Management

Buckets are created within a specified volume. For example, the following command creates the bucket mybucket in the volume myvolume:

ozone sh bucket create /myvolume/mybucket

For more details on bucket operations, refer to the Ozone CLI documentation.
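
Because every bucket must belong to a volume, the volume has to exist before the bucket can be created. The sketch below wraps that sequence in a hypothetical helper, provision_bucket; it assumes a running cluster and the ozone CLI on the PATH, and the volume and bucket names are placeholders.

```shell
# Hypothetical helper: create a volume, create a bucket inside it, then
# print the bucket's metadata. The chain stops at the first command that
# fails (for example, if the volume cannot be created).
provision_bucket() {
  local volume="$1" bucket="$2"
  ozone sh volume create "/${volume}" &&
    ozone sh bucket create "/${volume}/${bucket}" &&
    ozone sh bucket info "/${volume}/${bucket}"
}

# Usage (against a running cluster):
#   provision_bucket myvolume mybucket
```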

Bucket Layouts (Object Store vs. File System Optimized)

Ozone supports different bucket layouts, primarily:

  • Object Store (OBS): The traditional object storage layout, where keys are stored with their full path names. This is suitable for S3-like access patterns. For more details, refer to the Object Store documentation.
  • File System Optimized (FSO): An optimized layout for Hadoop Compatible File System (HCFS) semantics, where intermediate directories are stored separately, improving performance for file system operations like listing and renaming. For more details, refer to the Prefix FSO documentation.
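
The layout is chosen when the bucket is created. As a minimal sketch, assuming the --layout option of ozone sh bucket create and the layout names OBJECT_STORE and FILE_SYSTEM_OPTIMIZED (check ozone sh bucket create --help for the release in use), a hypothetical wrapper could look like this:

```shell
# Hypothetical helper: create a bucket with an explicit layout.
# Only the two layouts described above are accepted here.
create_bucket_with_layout() {
  local path="$1" layout="$2"
  case "$layout" in
    OBJECT_STORE|FILE_SYSTEM_OPTIMIZED) ;;
    *) echo "unsupported layout: $layout" >&2; return 1 ;;
  esac
  ozone sh bucket create "$path" --layout "$layout"
}

# Usage (against a running cluster):
#   create_bucket_with_layout /myvolume/obs-bucket OBJECT_STORE
#   create_bucket_with_layout /myvolume/fso-bucket FILE_SYSTEM_OPTIMIZED
```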

Erasure Coding

Erasure Coding (EC) can be enabled at the bucket level to define the data redundancy strategy for keys written to that bucket. EC provides redundancy with lower storage overhead than replication, which matters most for large datasets. For more information, see the Erasure Coding documentation.
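
A minimal sketch of creating an EC bucket, assuming the --type and --replication options of ozone sh bucket create. The helper name and the default codec string rs-3-2-1024k (Reed-Solomon with 3 data blocks, 2 parity blocks, and a 1024 KB chunk size) are illustrative choices, not recommendations.

```shell
# Hypothetical helper: create a bucket whose keys use erasure coding.
# The second argument is optional and defaults to rs-3-2-1024k.
create_ec_bucket() {
  local path="$1" ec_config="${2:-rs-3-2-1024k}"
  ozone sh bucket create "$path" --type EC --replication "$ec_config"
}

# Usage (against a running cluster):
#   create_ec_bucket /myvolume/ec-bucket
#   create_ec_bucket /myvolume/ec-bucket rs-6-3-1024k
```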

Snapshots

Ozone's snapshot feature allows users to take point-in-time consistent images of a given bucket. These snapshots are immutable and can be used for backup, recovery, archival, and incremental replication purposes. For more details, refer to the Ozone Snapshot documentation.
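
As a sketch, a snapshot can be taken and then listed with the ozone sh snapshot subcommands. The helper name and the snapshot name snap1 are placeholders, and the commands assume a release in which the snapshot feature is available.

```shell
# Hypothetical helper: take a named snapshot of a bucket, then list the
# bucket's snapshots to confirm that it exists.
snapshot_bucket() {
  local path="$1" name="$2"
  ozone sh snapshot create "$path" "$name" &&
    ozone sh snapshot list "$path"
}

# Usage (against a running cluster):
#   snapshot_bucket /myvolume/mybucket snap1
```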

GDPR Compliance

Ozone provides features to support GDPR compliance, particularly the "right to be forgotten." In a GDPR-compliant bucket, the encryption keys for deleted data are removed immediately, which makes the data unreadable even if the underlying blocks have not yet been physically purged. For more details, refer to the GDPR documentation.
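
GDPR enforcement is requested when the bucket is created. A minimal sketch, assuming the --enforcegdpr option of ozone sh bucket create; the helper name and bucket path are placeholders.

```shell
# Hypothetical helper: create a bucket with GDPR enforcement turned on.
create_gdpr_bucket() {
  local path="$1"
  ozone sh bucket create --enforcegdpr=true "$path"
}

# Usage (against a running cluster):
#   create_gdpr_bucket /myvolume/gdpr-bucket
```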

Bucket Linking

Bucket linking exposes a bucket from one volume (or even another bucket) as if it were in a different location, which is particularly useful for S3 compatibility or cross-tenant access. A link behaves much like a symbolic link in a file system. For more information, see the Bucket Links documentation.
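
As a sketch, a link is created with ozone sh bucket link, which takes the source bucket followed by the link path. The example paths assume that the S3 gateway serves buckets from a volume named s3v; adjust them to the actual deployment.

```shell
# Hypothetical helper: expose an existing bucket under a second path.
link_bucket() {
  local source="$1" link="$2"
  ozone sh bucket link "$source" "$link"
}

# Usage (against a running cluster): make /myvolume/mybucket reachable
# as the bucket "shared-bucket" in the assumed S3 volume s3v.
#   link_bucket /myvolume/mybucket /s3v/shared-bucket
```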

Access Control Lists (ACLs)

ACLs define permissions for buckets, controlling who can list keys, read/write data, or delete the bucket. For more details, refer to the Security ACLs documentation.
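
A minimal sketch of granting a permission and then inspecting the result, assuming the addacl and getacl subcommands of ozone sh bucket and the native ACL format type:name:rights. The user alice and the rights string rw (read and write) are placeholders.

```shell
# Hypothetical helper: add one ACL entry to a bucket, then print the
# bucket's full ACL list.
grant_bucket_acl() {
  local path="$1" acl="$2"
  ozone sh bucket addacl -a "$acl" "$path" &&
    ozone sh bucket getacl "$path"
}

# Usage (against a running cluster):
#   grant_bucket_acl /myvolume/mybucket user:alice:rw
```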