Version: Next

Ozone Manager RocksDB Schema

This page describes the RocksDB layout used by the Ozone Manager (OM). The OM persists namespace and related metadata in RocksDB: volumes, buckets, keys, S3-style multipart uploads, snapshots, tenancy, and supporting system state.

The exact column families and key encodings can evolve between Ozone releases; treat this as a guide to how data is organized, not a stable API.

Database overview

| Property | Value |
|---|---|
| Database directory | Configured by ozone.om.db.dirs (falls back to ozone.metadata.dirs if unset) |
| On-disk name | om.db |
| Engine | RocksDB with multiple column families |
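
The fallback order for the database directory can be sketched as follows. This is a minimal illustration, not OM code: the `resolve_om_db_path` helper and the plain-dict configuration are hypothetical, and it assumes the database directory sits directly under the configured directory.

```python
import posixpath

OM_DB_NAME = "om.db"  # on-disk name from the overview table


def resolve_om_db_path(conf: dict) -> str:
    """Pick the OM DB directory using the documented fallback order.

    `conf` is a plain dict standing in for the Ozone configuration
    (an assumption made for illustration only).
    """
    base = conf.get("ozone.om.db.dirs") or conf.get("ozone.metadata.dirs")
    if not base:
        raise ValueError(
            "neither ozone.om.db.dirs nor ozone.metadata.dirs is set"
        )
    # Assumption: om.db is created directly under the chosen directory.
    return posixpath.join(base, OM_DB_NAME)
```

For example, a configuration that sets only ozone.metadata.dirs to the hypothetical path `/data/meta` would resolve to `/data/meta/om.db` under these assumptions.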

Column families

Column families are grouped by role below. In each table, the Key format column uses placeholders: volume, bucket, and key for path-style keys, and volId, buckId, and parentId for ID-based (FSO) keys. Most namespace keys begin with a leading /, as defined by OzoneConsts.OM_KEY_PREFIX.

1. Hierarchy and ownership

Stores the top levels of the namespace: users, volumes, and buckets.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| userTable | userName | UserVolumeInfo | User to owned volumes |
| volumeTable | /{volume} | OmVolumeArgs | Volume metadata (owner, quota, ACLs) |
| bucketTable | /{volume}/{bucket} | OmBucketInfo | Bucket metadata (layout, quota, ACLs) |

2. Object store (OBS) layout

Used for buckets with LEGACY or OBJECT_STORE layout. Keys are stored under their full path names.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| keyTable | /{volume}/{bucket}/{key} | OmKeyInfo | Committed keys (includes block locations) |
| openKeyTable | /{volume}/{bucket}/{key}/{clientId} | OmKeyInfo | In-progress writes (uncommitted) |
| deletedTable | /{volume}/{bucket}/{key}/{objectID} | RepeatedOmKeyInfo | Keys pending deletion / GC |
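
The path-style key shapes above can be sketched as simple string builders. The function names are hypothetical and this only illustrates the documented layout; the real OM encodes keys through its own codecs.

```python
OM_KEY_PREFIX = "/"  # mirrors OzoneConsts.OM_KEY_PREFIX as described on this page


def obs_key(volume: str, bucket: str, key: str) -> str:
    """keyTable shape: /{volume}/{bucket}/{key}."""
    return f"{OM_KEY_PREFIX}{volume}{OM_KEY_PREFIX}{bucket}{OM_KEY_PREFIX}{key}"


def obs_open_key(volume: str, bucket: str, key: str, client_id: int) -> str:
    """openKeyTable shape: the committed-key path plus /{clientId}."""
    return f"{obs_key(volume, bucket, key)}{OM_KEY_PREFIX}{client_id}"


def obs_deleted_key(volume: str, bucket: str, key: str, object_id: int) -> str:
    """deletedTable shape: the committed-key path plus /{objectID}."""
    return f"{obs_key(volume, bucket, key)}{OM_KEY_PREFIX}{object_id}"
```

Because the key name is embedded verbatim, a key such as `dir/file.txt` keeps its internal slashes, so entries that share a path prefix share a RocksDB key prefix.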

3. File system optimized (FSO) layout

Used for buckets with FILE_SYSTEM_OPTIMIZED layout. Keys use volume ID, bucket ID, and parent object ID so directory operations (for example ls, rename) can avoid scanning full string paths. See also File System Optimized buckets.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| directoryTable | /{volId}/{buckId}/{parentId}/{dirName} | OmDirectoryInfo | Directories |
| fileTable | /{volId}/{buckId}/{parentId}/{fileName} | OmKeyInfo | Committed files |
| openFileTable | /{volId}/{buckId}/{parentId}/{fileName}/{clientId} | OmKeyInfo | Files currently being written |
| deletedDirectoryTable | /{volId}/{buckId}/{parentId}/{dirName}/{objId} | OmKeyInfo | Directories marked for deletion |
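
To illustrate why ID-based keys help directory listing, the sketch below models a table as a sorted list of key strings and lists one directory's children with a prefix scan. The helper names and the sample IDs are hypothetical, and a sorted Python list only approximates RocksDB's ordered iteration.

```python
from bisect import bisect_left


def fso_key(vol_id: int, bucket_id: int, parent_id: int, name: str) -> str:
    """fileTable / directoryTable shape: /{volId}/{buckId}/{parentId}/{name}."""
    return f"/{vol_id}/{bucket_id}/{parent_id}/{name}"


def list_children(sorted_keys, vol_id: int, bucket_id: int, parent_id: int):
    """Return the names stored directly under one parent.

    `sorted_keys` must be sorted; the scan seeks to the parent's prefix
    and stops at the first key that no longer matches it.
    """
    prefix = f"/{vol_id}/{bucket_id}/{parent_id}/"
    start = bisect_left(sorted_keys, prefix)
    children = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        children.append(key[len(prefix):])
    return children


# Hypothetical file table: volume 1, bucket 2, directories with IDs 10 and 11.
file_table = sorted([
    fso_key(1, 2, 2, "a.txt"),
    fso_key(1, 2, 10, "b.txt"),
    fso_key(1, 2, 10, "c.txt"),
    fso_key(1, 2, 11, "d.txt"),
])
```

With this sample data, `list_children(file_table, 1, 2, 10)` touches only the entries whose parent is object 10, regardless of how deep that directory sits in the path hierarchy.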

4. Multipart upload

Metadata for S3-style multipart uploads.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| multipartInfoTable | /{volume}/{bucket}/{key}/{uploadId} | OmMultipartKeyInfo | Overall upload session |
| multipartPartsTable | {uploadId}/{partNumber} | OmMultipartPartInfo | Individual parts (keyed by upload ID and part number, not by a full path) |
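
The two multipart key shapes differ in that only the session key is path-style. A minimal sketch, with hypothetical helper names and assuming the part number is rendered as a plain decimal string:

```python
def multipart_info_key(volume: str, bucket: str, key: str, upload_id: str) -> str:
    """multipartInfoTable shape: /{volume}/{bucket}/{key}/{uploadId}."""
    return f"/{volume}/{bucket}/{key}/{upload_id}"


def multipart_part_key(upload_id: str, part_number: int) -> str:
    """multipartPartsTable shape: {uploadId}/{partNumber} (no leading slash)."""
    return f"{upload_id}/{part_number}"


def parts_for_upload(part_keys, upload_id: str):
    """Collect the part numbers recorded for one upload, in numeric order."""
    prefix = f"{upload_id}/"
    numbers = [
        int(key[len(prefix):]) for key in part_keys if key.startswith(prefix)
    ]
    # Sort numerically: as plain strings, "10" would sort before "2".
    return sorted(numbers)
```

The numeric sort is a property of this sketch only; how part numbers are actually ordered on disk depends on the encoding OM uses.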

5. Snapshots

Snapshot metadata and bookkeeping for snapshot-related garbage collection.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| snapshotInfoTable | /{volume}/{bucket}/{snapshotName} | SnapshotInfo | One snapshot’s metadata |
| snapshotRenamedTable | /{volName}/{buckName}/{objectId} | String | Tracks renames across snapshots for correct GC |
| compactionLogTable | {dbTrxId}-{compactionTime} | CompactionLogEntry | Compaction history used by snapshot services |

6. Multi-tenant and security

Tenants, access mappings, S3 secrets, and delegation tokens.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| tenantStateTable | tenantId | OmDBTenantState | Tenant configuration and state |
| tenantAccessIdTable | accessId | OmDBAccessIdInfo | Access ID to secret and tenant |
| principalToAccessIdsTable | userPrincipal | OmDBUserPrincipalInfo | Kerberos principal to access IDs |
| s3SecretTable | accessKeyId | S3SecretValue | S3 secrets |
| dTokenTable | OzoneTokenID | Long | Delegation tokens and renewal times |

7. Administrative and system

Prefix ACLs, Ratis apply point, and miscellaneous metadata.

| Table name | Key format | Value type | Description |
|---|---|---|---|
| prefixTable | prefix | OmPrefixInfo | Prefix-level ACLs and metadata |
| transactionInfoTable | #TRANSACTIONINFO (literal key) | TransactionInfo | Last applied Ratis transaction index and term |
| metaTable | metaDataKey | String | Misc. system metadata (for example DB layout version) |

Key concepts

  • OBS vs. FSO: OBS encodes the namespace with path strings under volume and bucket names. FSO encodes parents with numeric object IDs (parentId), which makes hierarchical FS operations cheaper at scale.
  • Object ID: A 64-bit identifier for volumes, buckets, keys, and directories. In FSO tables, parentId refers to the parent object’s ID.
  • OM epoch: The high bits of object IDs can encode an epoch so IDs stay unique across OM restarts or metadata migrations.
  • Key prefix: Most hierarchy and object keys start with / as defined by OzoneConsts.OM_KEY_PREFIX. Some tables (multipart parts, compaction log) use other key shapes as noted above.
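
The OBS vs. FSO trade-off can be made concrete with a directory rename. The sketch below models tables as plain Python dicts; the function names, sample keys, and object IDs are hypothetical, and it only counts how many entries each layout would need to rewrite.

```python
def obs_rename_dir(key_table: dict, volume: str, bucket: str,
                   old_dir: str, new_dir: str) -> int:
    """Path-keyed rename: every key under the old prefix is rewritten."""
    old_prefix = f"/{volume}/{bucket}/{old_dir}/"
    new_prefix = f"/{volume}/{bucket}/{new_dir}/"
    # Snapshot the matching keys first so the dict is not mutated mid-scan.
    moved = [k for k in key_table if k.startswith(old_prefix)]
    for k in moved:
        key_table[new_prefix + k[len(old_prefix):]] = key_table.pop(k)
    return len(moved)


def fso_rename_dir(directory_table: dict, vol_id: int, bucket_id: int,
                   parent_id: int, old_name: str, new_name: str) -> int:
    """ID-keyed rename: only the directory's own entry changes.

    Children are keyed by the directory's object ID, which stays the same,
    so their entries are untouched.
    """
    old_key = f"/{vol_id}/{bucket_id}/{parent_id}/{old_name}"
    new_key = f"/{vol_id}/{bucket_id}/{parent_id}/{new_name}"
    directory_table[new_key] = directory_table.pop(old_key)
    return 1
```

In the path-keyed model the cost grows with the number of keys under the directory, while in the ID-keyed model it stays at a single entry, which is the effect the OBS vs. FSO bullet describes.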