s3a and Ozone
Ozone exposes an S3-compatible REST interface via the S3 Gateway. Hadoop's S3A filesystem (s3a://) is a cloud connector that exposes an S3-compatible object store through the Hadoop FileSystem interface. Data analytics tools from the Hadoop ecosystem, such as Hive, Impala, and Spark, can therefore reach Ozone's S3 interface through the S3A connector, so you can use Ozone buckets from existing tools without application changes.
This page explains how to configure the Hadoop S3A client to use Ozone's S3 Gateway (s3g) and provides sample commands to access Ozone s3g using s3a. For details about the Ozone S3 Gateway itself (supported REST APIs, URL schemes, security), see the S3 Protocol page. For more information about S3A, see the official Hadoop S3A documentation.
Prerequisites
- A running Ozone cluster with the S3 Gateway enabled. You can start a Docker-based cluster (including S3 Gateway) as described in the S3 Protocol documentation.
- Ozone S3 endpoint (for example http://localhost:9878 or a load balancer DNS name); see the quick reachability check below.
- A Hadoop distribution with the hadoop-aws module available. See the official Hadoop S3A documentation.
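A quick way to verify that the gateway is reachable (assuming the default s3g port 9878; adjust the host and port for your deployment) is a plain HTTP request. Any S3-style response, even an error document, confirms the endpoint is up:
curl -i http://localhost:9878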
Configuring S3A for Ozone
Enable the S3A client
Ensure the hadoop-aws module is on the client classpath. In a typical Hadoop installation:
- Set HADOOP_OPTIONAL_TOOLS in hadoop-env.sh to include hadoop-aws (shown below), or
- Add a dependency on org.apache.hadoop:hadoop-aws with the same version as hadoop-common.
See the Hadoop S3A Getting Started section for details.
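For example, on a plain Apache Hadoop installation you could enable the module and then confirm that the hadoop-aws JAR is on the client classpath. This is a sketch; file locations vary by distribution:
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
# Verify that the connector is now on the classpath
hadoop classpath --glob | tr ':' '\n' | grep hadoop-aws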
core-site.xml: point S3A to Ozone
Add the following properties to the Hadoop configuration (for example core-site.xml) so that s3a:// URIs use the Ozone S3 Gateway instead of AWS S3:
<property>
<name>fs.s3a.endpoint</name>
<value>http://ozone-s3g-host:9878</value>
<description>
Ozone S3 Gateway endpoint. Replace with your s3g hostname or load balancer.
</description>
</property>
<property>
<name>fs.s3a.endpoint.region</name>
<value>us-east-1</value>
<description>
Logical region name required by the S3A client. Ozone does not enforce regions,
but this must be a valid-looking value.
</description>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
<description>
Ozone S3 Gateway defaults to path-style URLs (http://host:9878/bucket),
so S3A should use path-style access.
</description>
</property>
These properties follow the official S3A connection settings in Connecting to an S3 store.
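If only some buckets live in Ozone while others stay on AWS S3, the same settings can be scoped to individual buckets using S3A per-bucket configuration. A sketch, assuming an Ozone bucket named bucket1:
<property>
<name>fs.s3a.bucket.bucket1.endpoint</name>
<value>http://ozone-s3g-host:9878</value>
</property>
<property>
<name>fs.s3a.bucket.bucket1.path.style.access</name>
<value>true</value>
</property>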
Recommended settings for Ozone
The Ozone S3 Gateway supports ETags for S3 Multipart Upload (MPU), but object versioning and some other S3 behaviors still differ from AWS S3. To avoid compatibility issues, particularly with older clients or when MPU is not used, set the following options when using S3A with Ozone:
<property>
<name>fs.s3a.bucket.probe</name>
<value>0</value>
<description>
Disable the bucket existence probe at startup. This is the default in recent Hadoop
versions and is recommended for third-party S3-compatible stores such as Ozone.
</description>
</property>
<property>
<name>fs.s3a.change.detection.mode</name>
<value>none</value>
<description>Disable change detection; not applicable to Ozone S3.</description>
</property>
Credentials
Ozone uses the same AWS-style access key and secret key model for the S3 Gateway.
- If security is disabled, any AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY pair can be used.
- If security is enabled, obtain an access key and secret via ozone s3 getsecret (Kerberos authentication is required; see the example after this list). See the S3 Protocol — Security and Securing S3 sections for details.
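On a secure cluster the flow typically looks like the following sketch; the principal name is a placeholder and the output format depends on your Ozone version:
# Authenticate with Kerberos first
kinit testuser@EXAMPLE.COM
# Print an S3 access key / secret pair for the authenticated user
ozone s3 getsecret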
Configure S3A credentials in core-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>your-access-key</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>your-secret-key</value>
</property>
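To avoid keeping the secret in plain text, the keys can also be stored in a Hadoop credential provider and referenced from fs.s3a.security.credential.provider.path. A sketch, assuming a local JCEKS file at /etc/hadoop/ozone-s3.jceks:
hadoop credential create fs.s3a.access.key -value your-access-key \
-provider jceks://file/etc/hadoop/ozone-s3.jceks
hadoop credential create fs.s3a.secret.key -value your-secret-key \
-provider jceks://file/etc/hadoop/ozone-s3.jceks
Then set fs.s3a.security.credential.provider.path to jceks://file/etc/hadoop/ozone-s3.jceks in core-site.xml so S3A reads the keys from the provider instead of the configuration file.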
Alternatively, use environment variables as documented in Authenticating via AWS environment variables:
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
For generating and revoking Ozone S3 secrets, see the Security section of the S3 Protocol page.
If the Ozone S3 Gateway is exposed over HTTPS, the JVM must trust the gateway's TLS certificate. The Hadoop AWS client (hadoop-aws) uses the default Java truststore; if the gateway uses a custom or internal CA, add that CA to JAVA_HOME/lib/security/jssecacerts or configure the JVM truststore accordingly. Otherwise S3A connections to the HTTPS endpoint may fail with certificate errors.
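For example, a custom CA certificate can be imported with keytool; this is a sketch, and the truststore location and password depend on your JDK and security policies:
keytool -importcert -alias ozone-s3g-ca \
-file /path/to/internal-ca.crt \
-keystore "$JAVA_HOME/lib/security/jssecacerts" \
-storepass changeit -noprompt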
Example: using hadoop fs with Ozone via S3A
The examples below assume:
- Ozone S3 Gateway is reachable at http://localhost:9878
- core-site.xml is configured as above
- An S3 bucket (for example bucket1) already exists (you can create it with aws s3api --endpoint http://localhost:9878 create-bucket --bucket bucket1)
S3A URLs use the form s3a://<bucket>/<path>. The bucket corresponds to an Ozone bucket under the /s3v volume or a bucket link.
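You can confirm the mapping from the Ozone side by listing the buckets in the /s3v volume with the Ozone shell (assuming the ozone CLI is available on a cluster node):
ozone sh bucket list /s3v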
List objects in an Ozone S3 bucket
hadoop fs -ls s3a://bucket1/
Upload a local file to Ozone using S3A
hadoop fs -put /data/local-file.txt s3a://bucket1/path/local-file.txt
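To verify the upload, list the path back through S3A, or inspect the same bucket through the Ozone shell, where the object appears as a key under /s3v/bucket1:
hadoop fs -ls s3a://bucket1/path/
ozone sh key list /s3v/bucket1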
Download from Ozone to local or HDFS
# To local filesystem
hadoop fs -copyToLocal s3a://bucket1/path/file.txt /tmp/from-ozone.txt
# Copy to HDFS
hadoop fs -cp s3a://bucket1/path/file.txt hdfs:///user/test/from-ozone.txt
Quick test with inline configuration
If you cannot modify cluster-wide core-site.xml, you can pass S3A options on the command line. Replace the endpoint, bucket, and credentials with your values:
hadoop fs \
-D fs.s3a.endpoint=http://localhost:9878 \
-D fs.s3a.endpoint.region=us-east-1 \
-D fs.s3a.path.style.access=true \
-D fs.s3a.bucket.probe=0 \
-D fs.s3a.change.detection.mode=none \
-D fs.s3a.access.key=your-access-key \
-D fs.s3a.secret.key=your-secret-key \
-ls s3a://bucket1/
Example: using distcp between HDFS and Ozone
You can use S3A as a source or destination for distcp to move data between HDFS and Ozone. Use the same S3A configuration as above.
Copy from HDFS to Ozone:
hadoop distcp hdfs:///data/source/dir s3a://bucket1/backup/dir
Copy from Ozone to HDFS:
hadoop distcp s3a://bucket1/backup/dir hdfs:///data/restore/dir
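If the cluster running distcp does not have the Ozone S3A settings in its core-site.xml, the same properties can be passed inline, as in the quick test above (endpoint and credentials are placeholders):
hadoop distcp \
-D fs.s3a.endpoint=http://localhost:9878 \
-D fs.s3a.endpoint.region=us-east-1 \
-D fs.s3a.path.style.access=true \
-D fs.s3a.access.key=your-access-key \
-D fs.s3a.secret.key=your-secret-key \
hdfs:///data/source/dir s3a://bucket1/backup/dir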
Relation to Ozone S3 documentation
This page describes using Ozone from the Hadoop FileSystem perspective (S3A client). For REST API details, supported S3 operations, bucket linking, and S3 security, see the S3 Protocol and Securing S3 pages.
For advanced S3A options (performance tuning, encryption, retries), refer to the official Hadoop S3A documentation and its sub-pages such as Performance and Encryption.