This document explores how we can improve the Ozone volume semantics, especially with respect to the S3 compatibility layer.
We explore some of these in more detail in subsequent sections.
Currently, when a user enumerates volumes, they see only the volumes that they own. This means that when an unprivileged user enumerates volumes, they always get an empty list. Instead, users should be able to see all volumes to which they have been granted read or write access.
This also has an impact on `ofs`, which makes volumes appear as top-level directories.
Ozone has the semantics of volumes and buckets, while S3 has only buckets. To make it possible to use the same bucket both from the Hadoop world and via S3, we need a mapping between them.
Currently we maintain a map between the S3 buckets and the Ozone volumes + buckets in `OmMetadataManagerImpl`:

```
s3_bucket --> ozone_volume/ozone_bucket
```
The current implementation uses the `"s3" + s3UserName` string as the volume name and `s3BucketName` as the bucket name, where `s3UserName` is `DigestUtils.md5Hex(kerberosUsername.toLowerCase())`.
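For illustration, the naming rule described above can be reproduced in Python. This is a sketch only: it assumes `DigestUtils.md5Hex` is equivalent to a standard lowercase-hex MD5 digest, and it is not the actual Ozone code.

```python
# Sketch of the volume-name derivation described above (not Ozone's code).
# Assumption: DigestUtils.md5Hex == standard MD5, lowercase hex output.
import hashlib

def s3_volume_name(kerberos_username: str) -> str:
    # s3UserName = md5Hex(kerberosUsername.toLowerCase())
    s3_user_name = hashlib.md5(kerberos_username.lower().encode("utf-8")).hexdigest()
    # volume name = "s3" + s3UserName
    return "s3" + s3_user_name

name = s3_volume_name("testuser/scm@EXAMPLE.COM")
print(name)  # "s3" followed by 32 lowercase hex characters
```

Because the principal is lowercased before hashing, different capitalizations of the same principal map to the same volume.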
To create an S3 bucket and use it from o3fs, you should:

```
> kinit -kt /etc/security/keytabs/testuser.keytab testuser/scm
> ozone s3 getsecret
awsAccessKey=testuser/scm@EXAMPLE.COM
awsSecret=7a6d81dbae019085585513757b1e5332289bdbffa849126bcb7c20f2d9852092

> export AWS_ACCESS_KEY_ID=testuser/scm@EXAMPLE.COM
> export AWS_SECRET_ACCESS_KEY=7a6d81dbae019085585513757b1e5332289bdbffa849126bcb7c20f2d9852092
> aws s3api --endpoint http://localhost:9878 create-bucket --bucket=bucket1

> ozone s3 path bucket1
Volume name for S3Bucket is : s3c89e813c80ffcea9543004d57b2a1239
Ozone FileSystem Uri is : o3fs://bucket1.s3c89e813c80ffcea9543004d57b2a1239
```
Problem #5 can easily be supported by improving the `ozone s3` CLI. Ozone has a separate table for the S3 secrets, and the API can be improved to handle multiple secrets for one specific Kerberos user.
Use a default `s3v` volume for all the S3 buckets created from the S3 interface. This is an easy and fast method, but with this approach not all the volumes are available via the S3 interface. We need to provide a method to publish any of the Ozone volumes/buckets.
(For example, expose `o3://vol1/bucketx` as an S3 bucket `s3://foobar`.)

Implementation:
The first part is easy compared to the current implementation: we don’t need any mapping table any more.
To implement the second (expose ozone buckets as s3 buckets) we have multiple options:
The first approach requires a secondary cache table and violates the naming hierarchy. The S3 bucket name is a globally unique name, therefore it’s more than just a single attribute on a specific object; it’s more like an element in the hierarchy. For this reason the second option is proposed:
For example, if the default S3 volume is `s3v`, any Ozone bucket can be exposed by linking it into the `/s3v` volume:

```
ozone sh bucket link /vol1/bucket1 /s3v/s3bucketname
```
Lock contention problem
One possible problem with using just one volume is that all the S3 buckets would share the locks of that single volume (thanks Xiaoyu). But this shouldn’t be a big problem.
Note: Sanjay is added to the authors as the original proposer of this approach.
The `bucket link` operation creates a link bucket. Links are like regular buckets, stored in the DB the same way, but with two new, optional pieces of information: source volume and source bucket. (The bucket being referenced by the link is called “source”, not “target”, to follow symlink terminology.) If a bucket or link with the same name already exists, a `BUCKET_ALREADY_EXISTS` result is returned. Buckets created via the S3 interface are placed in the default `s3v` volume.

To solve the S3 bucket name to Ozone bucket name mapping problem, some other approaches were also considered. They were rejected, but are kept in this section together with the reasons for rejection.
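As a toy model of the link semantics (hypothetical Python, not the OzoneManager implementation), a link bucket is stored like a regular bucket plus an optional source reference, and reads resolve the chain like a symlink:

```python
# Toy in-memory model of link buckets (illustrative only, not Ozone code).
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Bucket:
    name: str
    # For link buckets: (source_volume, source_bucket); None for regular buckets.
    source: Optional[Tuple[str, str]] = None

class BucketTable:
    def __init__(self) -> None:
        self.table: Dict[Tuple[str, str], Bucket] = {}

    def create(self, volume: str, bucket: str,
               source: Optional[Tuple[str, str]] = None) -> None:
        key = (volume, bucket)
        if key in self.table:
            # Mirrors the BUCKET_ALREADY_EXISTS result described above.
            raise KeyError("BUCKET_ALREADY_EXISTS")
        self.table[key] = Bucket(bucket, source)

    def resolve(self, volume: str, bucket: str) -> Bucket:
        b = self.table[(volume, bucket)]
        while b.source is not None:  # follow links, symlink-style
            b = self.table[b.source]
        return b

t = BucketTable()
t.create("vol1", "bucket1")                                  # regular bucket
t.create("s3v", "s3bucketname", source=("vol1", "bucket1"))  # link bucket
print(t.resolve("s3v", "s3bucketname").name)  # the source bucket's name
```

The link is stored in the `s3v` volume, so an S3 client sees `s3bucketname`, while the data lives in `/vol1/bucket1`.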
Another considered approach: support multiple `ACCESS_KEY_ID`s for the same user. For each `ACCESS_KEY_ID` a volume name MUST be defined, and each `ACCESS_KEY_ID` would provide a view of the buckets in the specified volume. With this approach the volume in use would be more visible and – hopefully – more understandable.
Instead of using `ozone s3 getsecret`, the following commands would be used:

 * `ozone s3 secret create --volume=myvolume`: to create a secret and use `myvolume` for all of these buckets
 * `ozone s3 secret list`: to list all of the existing S3 secrets (available for the current user)
 * `ozone s3 secret delete <ACCESS_KEY_ID>`: to delete any secret

The `AWS_ACCESS_KEY_ID` should be a random identifier instead of a Kerberos principal.
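A random access key could be generated with any secure random source; this is a sketch (the exact format Ozone would use is not specified here):

```python
# Sketch: a random ACCESS_KEY_ID instead of a Kerberos principal.
# The 32-hex-character format is an assumption, not Ozone's actual format.
import secrets

def random_access_key_id() -> str:
    return secrets.token_hex(16)  # 32 hex characters

key = random_access_key_id()
print(len(key))  # 32
```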
We can try to make the volume name visible to the S3 world by using structured bucket names. Unfortunately the available separator characters are very limited:
For example we can’t use `/`:

```
aws s3api create-bucket --bucket=vol1/bucket1

Parameter validation failed:
Invalid bucket name "vol1/bucket1": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
```
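The bucket-name regex quoted in the error message above can be checked directly:

```python
# Validate candidate bucket names against the regex from the AWS CLI error.
import re

BUCKET_NAME_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")

print(bool(BUCKET_NAME_RE.match("vol1/bucket1")))  # False: '/' is not allowed
print(bool(BUCKET_NAME_RE.match("vol1-bucket1")))  # True: '-' is allowed
```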
But it’s possible to use a volume-bucket notation:

```
aws s3api create-bucket --bucket=vol1-bucket1
```

The problem here is that the separator character on the S3 side (`-`) differs from the separator on the Ozone side (`/`), and bucket names may themselves contain `-`. It can be confusing.

We can also make volumes a lightweight bucket-group object by removing them from the ozonefs path. With this approach we can use all the benefits of volumes as an administration object, but they would be removed from the `o3fs` path. This conflicts with the `ofs` scheme, where volumes appear as top-level directories.
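The ambiguity of the `volume-bucket` notation mentioned above can be shown with a short sketch: when volume and bucket names may themselves contain the separator, a combined name has several possible splits (the names below are made up):

```python
# Every '-' is a candidate split point between volume and bucket,
# so the mapping back to (volume, bucket) is ambiguous.
name = "my-vol-my-bucket"
candidates = [(name[:i], name[i + 1:]) for i, c in enumerate(name) if c == "-"]
print(len(candidates))  # 3 possible (volume, bucket) splits
```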