ofs: Hadoop Compatible Interface

The Hadoop compatible file system interface allows storage backends like Ozone to be easily integrated into the Hadoop ecosystem. The Ozone file system (ofs) is such a Hadoop compatible file system.

warning

Currently, Ozone supports two schemes: o3fs:// and ofs://.

The biggest difference between o3fs and ofs is that o3fs supports operations only within a single bucket, while ofs supports operations across all volumes and buckets and provides a full view of all volumes and buckets.

The Basics

Examples of valid ofs paths:

ofs://om1/
ofs://om3:9862/
ofs://omservice/
ofs://omservice/volume1/
ofs://omservice/volume1/bucket1/
ofs://omservice/volume1/bucket1/dir1
ofs://omservice/volume1/bucket1/dir1/key1

ofs://omservice/tmp/
ofs://omservice/tmp/key1

Volumes and mounts are located at the root level of an ofs file system. Buckets are listed naturally under volumes. Keys and directories are under each bucket.

Note that for mounts, only the temp mount /tmp is supported at the moment.

Configuration

Please add the following entries to core-site.xml:

<property>
  <name>fs.ofs.impl</name>
  <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>ofs://om-host.example.com/</value>
</property>

This registers the ofs file system type and makes all of Ozone's volumes and buckets accessible through the default Hadoop compatible file system.

You also need to add the Ozone filesystem JAR file to the classpath:

export HADOOP_CLASSPATH=/opt/ozone/share/ozone/lib/ozone-filesystem-hadoop3-*.jar:$HADOOP_CLASSPATH
note

With Hadoop 2.x, use the Hadoop 2.x version of the filesystem JAR instead.

Once the default file system has been set up, users can run commands like ls, put, mkdir, etc. For example:

hdfs dfs -ls /

Note that ofs works on all buckets and volumes. Users can create buckets and volumes using mkdir, for example to create a volume named volume1 and a bucket named bucket1:

hdfs dfs -mkdir /volume1
hdfs dfs -mkdir /volume1/bucket1

Or use the put command to write a file to the bucket.

hdfs dfs -put /etc/hosts /volume1/bucket1/test
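
The same operations can also be done programmatically through the Hadoop FileSystem API. Below is a minimal Java sketch (an illustration, not from the original examples), assuming fs.defaultFS points at ofs:// as configured above and the Ozone filesystem JAR is on the classpath; the volume, bucket and key names are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OfsQuickStart {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (e.g. ofs://om-host.example.com/) from core-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Volumes and buckets are created like directories.
    fs.mkdirs(new Path("/volume1/bucket1"));

    // Write a key (file) under the bucket.
    try (FSDataOutputStream out = fs.create(new Path("/volume1/bucket1/test"))) {
      out.writeBytes("hello ozone\n");
    }

    // List the root, which shows all volumes visible to this user.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}
```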

For more usage, see: https://issues.apache.org/jira/secure/attachment/12987636/Design%20ofs%20v1.pdf

Differences from o3fs

Creating files

ofs doesn't allow creating keys (files) directly under the root or under volumes. Users will receive an error message when they try to do that:

ozone fs -touch /volume1/key1
# Output: touch: Cannot create file under root or volume.

Simplify fs.defaultFS

With ofs, fs.defaultFS (in core-site.xml) no longer needs to have a specific volume and bucket in its path like o3fs did. Simply put the OM host or service ID (in case of HA):

<property>
  <name>fs.defaultFS</name>
  <value>ofs://omservice</value>
</property>

The client would then be able to access every volume and bucket on the cluster without specifying the hostname or service ID.

ozone fs -mkdir -p /volume1/bucket1

Volume and bucket management directly from FileSystem shell

Admins can create and delete volumes and buckets easily with the Hadoop FS shell. Volumes and buckets are treated similarly to directories, so with -p they will be created if they don't exist:

ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/

Note that the supported volume and bucket name character set rules still apply. For instance, bucket and volume names cannot contain underscores (_):

$ ozone fs -mkdir -p /volume_1
mkdir: Bucket or Volume name has an unsupported character : _

Mounts and Configuring /tmp

In order to be compatible with legacy Hadoop applications that use /tmp/, we have a special temp mount located at the root of the FS. This feature may be expanded in the future to support custom mount paths.

Currently Ozone supports two configurations for /tmp. The first (the default) is a tmp directory for each user, comprised of a mount volume with a user-specific temp bucket. The second (configurable through ozone-site.xml) is a sticky-bit-like tmp directory common to all users, comprised of a mount volume and a common temp bucket.

Important: To use it, an admin first needs to create the volume tmp (the volume name is hardcoded for now) and set its ACL to world ALL access, namely:

ozone sh volume create tmp
ozone sh volume setacl tmp -al world::a

These commands only need to be done once per cluster.

For /tmp directory per user (default)

Then, each user needs to run mkdir once to initialize their own temp bucket:

$ ozone fs -mkdir /tmp
2020-06-04 00:00:00,050 [main] INFO rpc.RpcClient: Creating Bucket: tmp/0238 ...

After that, they can write to it just like they would to a regular directory, e.g.:

ozone fs -touch /tmp/key1

For a sharable /tmp directory common to all users

To enable the sticky-bit common /tmp directory, update ozone-site.xml with the following property:

<property>
  <name>ozone.om.enable.ofs.shared.tmp.dir</name>
  <value>true</value>
</property>

Then, after setting up the volume tmp as admin, also configure a tmp bucket that serves as the common /tmp directory for all users, for example:

$ ozone sh bucket create /tmp/tmp
$ ozone sh volume setacl tmp -a user:anyuser:rwlc \
user:adminuser:a,group:anyuser:rwlc,group:adminuser:a tmp/tmp

where anyuser is the username(s) the admin wants to grant access to, and adminuser is the admin username.

Users can then access the tmp directory as:

ozone fs -put ./NOTICE.txt ofs://om/tmp/key1

Delete with trash enabled

In order to enable trash in Ozone, please add these configs to core-site.xml:

<property>
  <name>fs.trash.interval</name>
  <value>10</value>
</property>
<property>
  <name>fs.trash.classname</name>
  <value>org.apache.hadoop.fs.ozone.OzoneTrashPolicy</value>
</property>

When keys are deleted with trash enabled, they are moved to a trash directory under each bucket, because keys aren't allowed to be moved (renamed) between buckets in Ozone.

$ ozone fs -rm /volume1/bucket1/key1
2020-06-04 00:00:00,100 [main] INFO fs.TrashPolicyDefault: Moved: 'ofs://id1/volume1/bucket1/key1' to trash at: ofs://id1/volume1/bucket1/.Trash/hadoop/Current/volume1/bucket1/key1

This is very similar to how the HDFS encryption zone handles trash location.
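
Programmatically, the same move-to-trash behavior can be triggered with Hadoop's Trash helper, which resolves the trash root inside the key's own bucket. A minimal sketch (an illustration, not from the original docs), assuming trash is enabled in core-site.xml as above and using an illustrative path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class OfsTrashExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path key = new Path("/volume1/bucket1/key1");
    FileSystem fs = key.getFileSystem(conf);

    // Moves the key into the trash directory of its own bucket
    // (e.g. /volume1/bucket1/.Trash/<user>/Current/...), since keys
    // cannot be renamed across buckets.
    boolean moved = Trash.moveToAppropriateTrash(fs, key, conf);
    System.out.println("Moved to trash: " + moved);
  }
}
```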

Note

  1. The -skipTrash flag can be used to delete files permanently, without them being moved to trash.

  2. Deletes at the bucket or volume level with trash enabled are not allowed; one must use -skipTrash in such cases. E.g. ozone fs -rm -R ofs://vol1/bucket1 or ozone fs -rm -R o3fs://bucket1.vol1 are not allowed without -skipTrash.

Recursive listing

ofs supports recursive volume, bucket and key listing.

For example, ozone fs -ls -R ofs://omservice/ will recursively list all volumes, buckets and keys the user has LIST permission on when ACLs are enabled. If ACLs are disabled, the command will simply list everything on the cluster.

This feature does not degrade server performance, as the loop runs on the client. Think of it as the client issuing multiple requests to the server to gather all the information.
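
To illustrate that the recursion happens client-side, here is a simplified Java sketch (illustrative only, not the actual ofs implementation) of what a recursive listing amounts to; the service ID is assumed from the earlier examples:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OfsRecursiveList {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("ofs://omservice/"), conf);
    listRecursively(fs, new Path("/"));
  }

  // Each listStatus() call is a separate request to the server;
  // the recursion itself runs entirely on the client.
  static void listRecursively(FileSystem fs, Path path) throws Exception {
    for (FileStatus status : fs.listStatus(path)) {
      System.out.println(status.getPath());
      if (status.isDirectory()) {
        listRecursively(fs, status.getPath());
      }
    }
  }
}
```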

Migrating from HDFS

This guide helps you migrate applications from HDFS to Ozone by detailing the API-level compatibility of the Ozone File System (ofs).

To ensure a smooth transition, you should first verify that your existing applications will work with ofs. You can check for potential incompatibilities by identifying the HDFS APIs your applications use.

  • For Cluster Administrators: Analyze the NameNode audit logs on your source HDFS cluster to identify the operations performed by your applications.
  • For Application Developers: Review your application's source code to identify which FileSystem or DistributedFileSystem APIs are being called.

Once you have a list of APIs, compare it against the tables below to identify any unsupported operations.
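
For example, one quick way to extract that list from a NameNode audit log is to count the cmd= field of each entry. A rough sketch (not from the original guide), assuming the standard hdfs-audit.log format and an illustrative log path:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class AuditLogOps {
  public static void main(String[] args) throws Exception {
    // HDFS audit entries contain a "cmd=<operation>" field, e.g.
    // ... allowed=true ugi=alice ip=/10.0.0.1 cmd=listStatus src=/data ...
    Pattern cmdField = Pattern.compile("cmd=(\\S+)");
    Map<String, Long> counts = new TreeMap<>();

    try (Stream<String> lines =
        Files.lines(Paths.get("/var/log/hadoop/hdfs-audit.log"))) {
      lines.forEach(line -> {
        Matcher m = cmdField.matcher(line);
        if (m.find()) {
          counts.merge(m.group(1), 1L, Long::sum);
        }
      });
    }

    // Compare these operation names against the tables below.
    counts.forEach((op, n) -> System.out.println(op + "\t" + n));
  }
}
```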

Supported FileSystem APIs

The following standard FileSystem APIs are supported by ofs.

| Operation | NameNode audit log | Description | Support |
|---|---|---|---|
| access | checkAccess | Checks if the user can access a path. | Supported |
| create | create | Creates a new file. | Supported |
| open | open | Opens a file for reading. | Supported |
| rename | rename | Renames a file or directory. | Supported [1] |
| delete | delete | Deletes a file or directory. | Supported [2] |
| listStatus | listStatus | Lists the status of files in a directory. | Supported [3] |
| mkdirs | mkdirs | Creates a directory and its parents. | Supported |
| getFileStatus | getfileinfo | Gets the status of a file. | Supported |
| setTimes | setTimes | Sets the modification and access times. | Supported |
| getLinkTarget | getfileinfo | Gets the target of a symbolic link. | Supported [4] |
| getFileChecksum | open | Gets the checksum of a file. | Supported |
| setSafeMode | safemode_leave, safemode_enter, safemode_get, safemode_force_exit | Enters or leaves safe mode. | Supported |
| recoverLease | recoverLease | Recovers a file lease. | Supported |
| isFileClosed | isFileClosed | Checks if a file is closed. | Supported |
| createSnapshot | createSnapshot | Creates a snapshot. | Supported [5] |
| deleteSnapshot | deleteSnapshot | Deletes a snapshot. | Supported [5] |
| renameSnapshot | renameSnapshot | Renames a snapshot. | Supported [5] |
| getSnapshotDiffReport | computeSnapshotDiff | Gets a snapshot diff report. | Supported [5] |
| getContentSummary | contentSummary | Gets the content summary of a path. | Supported |
| getDelegationToken | getDelegationToken | Gets a delegation token. | Supported |
| globStatus | listStatus | Finds files matching a pattern. | Supported |
| copyFromLocalFile | getfileinfo | Copies a file from the local filesystem. | Supported |
| exists | getfileinfo | Checks if a path exists. | Supported |
| getFileBlockLocations | open | Gets file block locations. | Supported |
| getTrashRoot | listSnapshottableDirectory, getEZForPath | Gets the trash root for a path. | Supported |
| getTrashRoots | listStatus, listEncryptionZones | Gets all trash roots. | Supported |
| isDirectory | getfileinfo | Checks if a path is a directory. | Supported |
| isFile | getfileinfo | Checks if a path is a file. | Supported |
| listFiles | listStatus | Returns a remote iterator for files. | Supported |
| listLocatedStatus | listStatus | Returns a remote iterator for located file statuses. | Supported |
| listStatusIterator | listStatus | Returns a remote iterator for file statuses. | Supported |
| getDefaultBlockSize | N/A | Gets the default block size. | Supported |
| getDefaultReplication | N/A | Gets the default replication factor. | Supported |
| getHomeDirectory | N/A | Gets the user's home directory. | Supported |
| getServerDefaults | N/A | Gets the server default values. | Supported |
| getWorkingDirectory | N/A | Gets the current working directory. | Supported |
| hasPathCapability | N/A | Queries for a path capability. | Supported |
| setWorkingDirectory | N/A | Sets the current working directory. | Supported |
| supportsSymlinks | N/A | Checks if symbolic links are supported. | Supported |

Note: An audit log of N/A means the API is client-side only and does not access the NameNode.

Unsupported FileSystem APIs

The following standard FileSystem APIs are not supported by ofs.

| Operation | NameNode audit log | Description |
|---|---|---|
| append | append | Appends to an existing file. |
| setPermission | setPermission | Sets the permission of a file. |
| setOwner | setOwner | Sets the owner of a file. |
| setReplication | setReplication | Sets the replication factor. |
| createSymlink | createSymlink | Creates a symbolic link. |
| resolveLink | getfileinfo | Resolves a symbolic link. |
| setXAttr | setXAttr | Sets an extended attribute. |
| getXAttr | getXAttrs | Gets an extended attribute. |
| getXAttrs | getXAttrs | Gets extended attributes. |
| listXAttrs | listXAttrs | Lists extended attributes. |
| removeXAttr | removeXAttr | Removes an extended attribute. |
| setAcl | setAcl | Sets an ACL. |
| getAclStatus | getAclStatus | Gets an ACL status. |
| modifyAclEntries | modifyAclEntries | Modifies ACL entries. |
| removeAclEntries | removeAclEntries | Removes ACL entries. |
| removeDefaultAcl | removeDefaultAcl | Removes the default ACL. |
| removeAcl | removeAcl | Removes an ACL. |
| truncate | truncate | Truncates a file. |
| concat | concat | Concatenates files. |
| listCorruptFileBlocks | listCorruptFileBlocks | Lists corrupted file blocks. |
| msync | N/A | Does not have a corresponding HDFS audit log. |

Unsupported HDFS-Specific APIs

The following APIs are specific to HDFS's DistributedFileSystem implementation and are not part of the generic org.apache.hadoop.fs.FileSystem API. Therefore, they are not supported by ofs. However, see the footnotes below for equivalent APIs.

| Operation | NameNode audit log | Description | Support |
|---|---|---|---|
| getErasureCodingPolicy | getErasureCodingPolicy | Gets the erasure coding policy of a file/dir. | Unsupported [6] |
| setErasureCodingPolicy | setErasureCodingPolicy | Sets an erasure coding policy on a directory. | Unsupported [6] |
| unsetErasureCodingPolicy | unsetErasureCodingPolicy | Unsets an erasure coding policy on a directory. | Unsupported [6] |
| addErasureCodingPolicies | addErasureCodingPolicies | Adds erasure coding policies. | Unsupported [6] |
| getErasureCodingPolicies | getErasureCodingPolicies | Gets the available erasure coding policies. | Unsupported [6] |
| removeErasureCodingPolicy | removeErasureCodingPolicy | Removes an erasure coding policy. | Unsupported [6] |
| enableErasureCodingPolicy | enableErasureCodingPolicy | Enables an erasure coding policy. | Unsupported [6] |
| disableErasureCodingPolicy | disableErasureCodingPolicy | Disables an erasure coding policy. | Unsupported [6] |
| getErasureCodingCodecs | getErasureCodingCodecs | Lists all erasure coding codecs. | Unsupported [6] |
| getECTopologyResultForPolicies | getECTopologyResultForPolicies | Gets the erasure coding topology result for policies. | Unsupported [6] |
| getSnapshotListing | ListSnapshot | Lists all snapshots of a snapshottable directory. | Unsupported [6] |
| allowSnapshot | allowSnapshot | Allows snapshots to be taken on a directory. | Unsupported [6] |
| disallowSnapshot | disallowSnapshot | Disallows snapshots to be taken on a directory. | Unsupported [6] |
| provisionSnapshotTrash | getfileinfo, mkdirs, setPermission | Provisions trash for a snapshottable directory. | Unsupported [6] |
| createEncryptionZone | createEncryptionZone | Creates an encryption zone. | Unsupported [6] |
| getEZForPath | getEZForPath | Gets the encryption zone for a path. | Unsupported [6] |
| listEncryptionZones | listEncryptionZones | Lists all encryption zones. | Unsupported [6] |
| reencryptEncryptionZone | reencryptEncryptionZone | Re-encrypts an encryption zone. | Unsupported [6] |
| listReencryptionStatus | listReencryptionStatus | Lists re-encryption status. | Unsupported [6] |
| getFileEncryptionInfo | getfileinfo | Gets file encryption info. | Unsupported [6] |
| provisionEZTrash | getEZForPath, getfileinfo, mkdirs, setPermission | Provisions trash for an encryption zone. | Unsupported [6] |
| setQuota | clearQuota or clearSpaceQuota or setQuota or setSpaceQuota | Sets the quota usage for a path. | Unsupported [6] |
| getQuotaUsage | quotaUsage | Gets the quota usage for a path. | Unsupported [6] |
| setQuotaByStorageType | setSpaceQuota | Sets quota by storage type for a path. | Unsupported [6] |
| unsetStoragePolicy | unsetStoragePolicy | Unsets a storage policy on a file or directory. | Unsupported |
| setStoragePolicy | setStoragePolicy | Sets a storage policy on a file or directory. | Unsupported |
| getStoragePolicy | getStoragePolicy | Gets the storage policy of a file or directory. | Unsupported |
| satisfyStoragePolicy | satisfyStoragePolicy | Satisfies the storage policy of a file. | Unsupported |
| addCachePool | addCachePool | Adds a cache pool. | Unsupported |
| modifyCachePool | modifyCachePool | Modifies a cache pool. | Unsupported |
| removeCachePool | removeCachePool | Removes a cache pool. | Unsupported |
| listCachePools | listCachePools | Lists all cache pools. | Unsupported |
| addCacheDirective | addCacheDirective | Adds a cache directive. | Unsupported |
| modifyCacheDirective | modifyCacheDirective | Modifies a cache directive. | Unsupported |
| removeCacheDirective | removeCacheDirective | Removes a cache directive. | Unsupported |
| listCacheDirectives | listCacheDirectives | Lists cache directives. | Unsupported |
| getSlowDatanodeStats | datanodeReport | Gets slow DataNode stats. | Unsupported |
| saveNamespace | saveNamespace | Saves the namespace. | Unsupported |
| restoreFailedStorage | checkRestoreFailedStorage, enableRestoreFailedStorage, disableRestoreFailedStorage | Restores failed storage. | Unsupported |
| refreshNodes | refreshNodes | Refreshes nodes. | Unsupported |
| setBalancerBandwidth | setBalancerBandwidth | Sets the balancer bandwidth. | Unsupported |
| metaSave | metaSave | Saves NameNode metadata (metasave). | Unsupported |
| rollingUpgrade | queryRollingUpgrade, startRollingUpgrade, finalizeRollingUpgrade | Performs a rolling upgrade. | Unsupported |
| finalizeUpgrade | finalizeUpgrade | Finalizes an upgrade. | Unsupported [7] |
| listOpenFiles | listOpenFiles | Lists open files. | Unsupported [7] |

Footnotes:

[1]

Renaming files or directories across different buckets is not supported. Within a File System Optimized (FSO) bucket, rename is an atomic metadata operation. For legacy buckets, renaming a directory is a non-atomic operation that renames each file and subdirectory individually.

[2]

Deleting the filesystem root is not allowed. Recursively deleting a volume is also not supported.

[3]

Recursive listing is not supported at the root or volume level.

[4]

ofs supports "linked buckets," where one bucket serves as a reference to another. However, general-purpose symbolic links for files or directories are not supported.

[5]

Snapshots are supported at the bucket level only.

[6]

For operations related to encryption zones, erasure coding, snapshots, and quotas, use the corresponding Ozone bucket-level APIs instead.

[7]

Replace with OzoneManagerProtocol.finalizeUpgrade() and OzoneManagerProtocol.listOpenFiles().

The following audit logs are typically produced by HDFS internal services and are not relevant for application migration: slowDataNodesReport, getDatanodeStorageReport, rollEditLog, renewDelegationToken, cancelDelegationToken, gcDeletedSnapshot.

The following audit logs are produced by the HDFS output stream: getAdditionalBlock, getAdditionalDatanode, abandonBlock, completeFile, fsync.

The getPreferredBlockSize audit log is used in testing only.