The Hadoop-compatible file system interface allows storage backends like Ozone to be easily integrated into the Hadoop ecosystem. The Ozone file system is a Hadoop-compatible file system.
Currently, Ozone supports two schemes: o3fs and ofs.
The biggest difference between the two is that o3fs supports operations only within a single bucket, while ofs supports operations across all volumes and buckets and provides a full view of all the volumes and buckets.
To create an Ozone file system, we have to choose a bucket where the file system will live. This bucket is used as the backing store for OzoneFileSystem. All files and directories are stored as keys in this bucket.
Please run the following commands to create a volume and bucket, if you don’t have them already.
ozone sh volume create /volume
ozone sh bucket create /volume/bucket
Once these are created, verify that the bucket exists using the list volume or list bucket commands.
Please add the following entries to core-site.xml.
<property>
  <name>fs.AbstractFileSystem.o3fs.impl</name>
  <value>org.apache.hadoop.fs.ozone.OzFs</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>o3fs://bucket.volume</value>
</property>
This makes the bucket the default Hadoop-compatible file system and registers the o3fs file system type.
You also need to add the ozone-filesystem-hadoop3.jar file to the classpath:
(Note: with Hadoop 2.x, use the
Once the default file system has been set up, users can run commands like ls, put, mkdir, etc. For example,
hdfs dfs -ls /
hdfs dfs -mkdir /users
The put command and others work the same way. In other words, programs like Hive, Spark, and Distcp will work against this file system. Please note that any keys created or deleted in the bucket by means other than OzoneFileSystem will also show up as directories and files in the Ozone file system.
Note: Bucket and volume names are not allowed to have a period in them.
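The naming rule above and the o3fs://bucket.volume form used for fs.defaultFS can be sketched as a small shell helper. This is only an illustration: o3fs_root is a hypothetical function name, not something shipped with Ozone, and it checks nothing beyond the period rule stated in the note.

```shell
# Hypothetical helper (not part of Ozone): prints the o3fs root URI for a
# volume/bucket pair. The bucket comes first in the URI, then the volume.
o3fs_root() {
    volume=$1 bucket=$2
    # Names must not contain a period, because the period separates
    # the bucket, the volume, and (optionally) the OM host in the URI.
    case "$volume$bucket" in
        *.*)
            echo "volume and bucket names must not contain a period" >&2
            return 1
            ;;
    esac
    printf 'o3fs://%s.%s\n' "$bucket" "$volume"
}

o3fs_root volume bucket    # prints o3fs://bucket.volume
```

A period inside a name would make the authority part of the URI ambiguous, which is presumably why the restriction exists.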
The filesystem URI can also take a fully qualified form, with the OM host and an optional port following the volume name. For example, you can specify both host and port:
hdfs dfs -ls o3fs://bucket.volume.om-host.example.com:5678/key
When the port number is not specified, it is retrieved from the config key ozone.om.address, if defined; otherwise it falls back to the default port.
For example, suppose ozone.om.address is configured as follows:
<property>
  <name>ozone.om.address</name>
  <value>0.0.0.0:6789</value>
</property>
When we run the command:
hdfs dfs -ls o3fs://bucket.volume.om-host.example.com/key
The above command is essentially equivalent to:
hdfs dfs -ls o3fs://bucket.volume.om-host.example.com:6789/key
Note: Only the port number from the config is used in this case; the host name in ozone.om.address is ignored.
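The port resolution described above can be sketched in plain shell. This is an illustration of the rule only, not how the Ozone client implements it: qualify_o3fs_uri is a hypothetical helper, and the default port is passed in by the caller as an assumption rather than hard-coded.

```shell
# Hypothetical helper (not part of Ozone): prints an o3fs URI with an explicit port.
#   $1 = o3fs URI that includes the OM host, with or without a port
#   $2 = value of ozone.om.address (may be empty if not configured)
#   $3 = default port to fall back to (supplied by the caller)
qualify_o3fs_uri() {
    uri=$1 om_address=$2 default_port=$3
    rest=${uri#o3fs://}        # strip the scheme
    authority=${rest%%/*}      # bucket.volume.host[:port]
    case $rest in
        */*) path=/${rest#*/} ;;
        *)   path= ;;
    esac
    case $authority in
        *:*)
            # A port given in the URI always wins.
            port=${authority##*:}
            authority=${authority%:*}
            ;;
        *)
            case $om_address in
                # Only the port of ozone.om.address is used; its host is ignored.
                *:*) port=${om_address##*:} ;;
                *)   port=$default_port ;;
            esac
            ;;
    esac
    printf '%s\n' "o3fs://${authority}:${port}${path}"
}

qualify_o3fs_uri "o3fs://bucket.volume.om-host.example.com/key" "0.0.0.0:6789" "1111"
# prints o3fs://bucket.volume.om-host.example.com:6789/key
```

The third argument (1111 here) is a placeholder and is only used when neither the URI nor ozone.om.address supplies a port.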