Ozone Repair
Ozone Repair (ozone repair) is an advanced tool to repair Ozone. The nodes being repaired must be stopped before the tool is run.
Note: All repair commands support a --dry-run option, which lets a user see what repair the command would perform without actually making any changes to the cluster.
Use the --force flag to override the running-service check when it reports a false positive.
Usage: ozone repair [-hV] [--verbose] [-conf=<configurationPath>]
[-D=<String=String>]... [COMMAND]
Advanced tool to repair Ozone. The nodes being repaired must be stopped before
the tool is run.
-conf=<configurationPath>
-D, --set=<String=String>
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
--verbose More verbose output. Show the stack trace of the errors.
Commands:
datanode Tools to repair Datanode
ldb Operational tool to repair ldb.
om Operational tool to repair OM.
scm Operational tool to repair SCM.
For more detailed usage, see the output of --help for each of the subcommands.
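As a rough illustration of the --dry-run note near the top of this page, the sketch below composes a repair command twice: once with --dry-run for review, and once in the form that would apply the repair. The helper name preview_then_apply and the sample DB path are assumptions made for illustration; the commands are only printed, never executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ozone): prints the dry-run form of a
# repair command, followed by the form that would actually apply the repair.
preview_then_apply() {
  printf 'ozone repair %s --dry-run\n' "$*"
  printf 'ozone repair %s\n' "$*"
}

# Sample arguments; the DB path is a placeholder.
PLAN=$(preview_then_apply om fso-tree --db=/path/to/om.db)
printf '%s\n' "$PLAN"
```

Reviewing the first printed line's output before running the second keeps the change-making step deliberate.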
ozone repair datanode
Operational tool to repair datanode.
upgrade-container-schema
Upgrade all schema V2 containers to schema V3 for a datanode in offline mode.
Optionally takes the --volume option to specify which volume needs the upgrade.
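A minimal sketch of how the optional --volume argument might be wired into a script. The helper name upgrade_schema_cmd and the sample volume path /data/hdds1 are assumptions made for illustration; the exact value format that --volume expects is not stated here, so check the subcommand's --help output. The command is only printed, with --dry-run appended.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ozone): builds the upgrade command,
# adding --volume only when a volume argument is supplied.
upgrade_schema_cmd() {
  if [ -n "$1" ]; then
    printf 'ozone repair datanode upgrade-container-schema --volume=%s --dry-run\n' "$1"
  else
    printf 'ozone repair datanode upgrade-container-schema --dry-run\n'
  fi
}

upgrade_schema_cmd              # no volume restriction
upgrade_schema_cmd /data/hdds1  # placeholder volume value (assumption)
```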
ozone repair ldb
Operational tool to repair ldb.
compact
Compact a column family in the DB to clean up tombstones while the service is offline.
Usage: ozone repair ldb compact [-hV] [--dry-run] [--force] [--verbose]
--cf=<columnFamilyName> --db=<dbPath>
CLI to compact a column-family in the DB while the service is offline.
Note: If om.db is compacted with this tool then it will negatively impact the
Ozone Manager's efficient snapshot diff.
--cf, --column-family, --column_family=<columnFamilyName>
Column family name
--db=<dbPath> Database File Path
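The note in the help text above warns that compacting om.db with this tool hurts snapshot diff efficiency. One way a wrapper script might surface that warning is sketched below; the helper name check_compact_target, the DB paths, and the column family name exampleTable are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Hypothetical guard (not part of Ozone): flags DB paths that point at om.db,
# since the help text warns that compacting om.db affects snapshot diff.
check_compact_target() {
  case "$1" in
    om.db|*/om.db) echo "warning: compacting om.db impacts OM snapshot diff" ;;
    *)             echo "ok" ;;
  esac
}

DB_PATH="/path/to/scm.db"   # placeholder path
CF_NAME="exampleTable"      # placeholder column family name

check_compact_target "$DB_PATH"
COMPACT_CMD="ozone repair ldb compact --db=$DB_PATH --cf=$CF_NAME --dry-run"
printf '%s\n' "$COMPACT_CMD"
```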
ozone repair om
Operational tool to repair OM.
Subcommands under OM
- fso-tree
- snapshot
- update-transaction
- quota
- compact
- skip-ratis-transaction
fso-tree
Identify and repair a disconnected FSO tree by marking unreferenced entries for deletion. Reports the reachable, unreachable (pending delete) and unreferenced (orphaned) directories and files. OM should be stopped while this tool is run.
Usage: ozone repair om fso-tree [-hV] [--dry-run] [--force] [--verbose]
[-b=<bucketFilter>] --db=<omDBPath>
[-v=<volumeFilter>]
Identify and repair a disconnected FSO tree by marking unreferenced entries for
deletion. OM should be stopped while this tool is run.
-b, --bucket=<bucketFilter>
Filter by bucket name
--db=<omDBPath> Path to OM RocksDB
-v, --volume=<volumeFilter>
Filter by volume name. Add '/' before the volume name.
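Because the volume filter expects a leading '/', a script could normalize the name before building the command. The sketch below assumes a hypothetical helper volume_filter plus placeholder volume, bucket, and DB path values; it prints a dry-run command rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ozone): ensures the volume filter
# starts with '/', as the --volume help text asks.
volume_filter() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '/%s\n' "$1" ;;
  esac
}

DB_PATH="/path/to/om.db"          # placeholder path
VOL=$(volume_filter vol1)         # placeholder volume name
FSO_CMD="ozone repair om fso-tree --db=$DB_PATH --volume=$VOL --bucket=bucket1 --dry-run"
printf '%s\n' "$FSO_CMD"
```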
snapshot
Subcommand for all snapshot related repairs.
chain
Update the global and path previous snapshots for a snapshot in case the snapshot chain is corrupted.
Usage: ozone repair om snapshot chain [-hV] [--dry-run] [--force] [--verbose]
--db=<dbPath>
--gp=<globalPreviousSnapshotId>
--pp=<pathPreviousSnapshotId> <value>
<snapshotName>
CLI to update global and path previous snapshot for a snapshot in case snapshot
chain is corrupted.
<value> URI of the bucket (format: volume/bucket).
<snapshotName> Snapshot name to update
--db=<dbPath> Database File Path
--gp, --global-previous=<globalPreviousSnapshotId>
Global previous snapshotId to set for the given snapshot
--pp, --path-previous=<pathPreviousSnapshotId>
Path previous snapshotId to set for the given snapshot
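The usage above mixes named options with two positional arguments (the bucket URI, then the snapshot name), which is easy to get wrong. The sketch below shows one possible ordering; every value is a placeholder, and the UUID-like shape of the snapshot IDs is an assumption rather than something stated on this page.

```shell
#!/bin/sh
# All values below are placeholders (assumptions), not real IDs or paths.
DB_PATH="/path/to/om.db"
BUCKET_URI="vol1/bucket1"     # positional <value>, format: volume/bucket
SNAPSHOT_NAME="snap3"         # positional <snapshotName>
GLOBAL_PREV="00000000-0000-0000-0000-000000000001"
PATH_PREV="00000000-0000-0000-0000-000000000002"

# Options first, then the two positional arguments in the documented order.
CHAIN_CMD="ozone repair om snapshot chain --db=$DB_PATH --gp=$GLOBAL_PREV --pp=$PATH_PREV --dry-run $BUCKET_URI $SNAPSHOT_NAME"
printf '%s\n' "$CHAIN_CMD"
```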
update-transaction
To avoid modifying Ratis logs and only update the latest applied transaction, use the update-transaction command.
This updates the highest transaction index in the OM transaction info table.
Usage: ozone repair om update-transaction [-hV] [--dry-run] [--force]
[--verbose] --db=<dbPath> --index=<highestTransactionIndex>
--term=<highestTransactionTerm>
CLI to update the highest index in transaction info table.
--db=<dbPath> Database File Path
--index=<highestTransactionIndex>
Highest index to set. The input should be non-zero long
integer.
--term=<highestTransactionTerm>
Highest term to set. The input should be non-zero long
integer.
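Since both --index and --term must be non-zero, a script might validate them before composing the command. The sketch below is one way to do that; the helper name is_nonzero_long and all values are assumptions, the check only accepts positive decimal digits, and it does not verify that the value fits in a 64-bit long.

```shell
#!/bin/sh
# Hypothetical validator (not part of Ozone): succeeds only for a string
# of decimal digits whose value is not zero.
is_nonzero_long() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -ne 0 ]
}

DB_PATH="/path/to/om.db"   # placeholder path
TXN_INDEX=1234             # placeholder index
TXN_TERM=5                 # placeholder term

if is_nonzero_long "$TXN_INDEX" && is_nonzero_long "$TXN_TERM"; then
  UPDATE_CMD="ozone repair om update-transaction --db=$DB_PATH --index=$TXN_INDEX --term=$TXN_TERM --dry-run"
  printf '%s\n' "$UPDATE_CMD"
else
  echo "index and term must be non-zero integers" >&2
fi
```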
quota
Operational tool to repair quota in OM DB.
start
To trigger quota repair, use the start command.
Usage: ozone repair om quota start [-hV] [--dry-run] [--force] [--verbose]
[--buckets=<buckets>]
[--service-host=<omHost>]
[--service-id=<omServiceId>]
CLI to trigger quota repair.
--buckets=<buckets> start quota repair for specific buckets. Input will
be list of uri separated by comma as
/<volume>/<bucket>[,...]
--service-host=<omHost>
Ozone Manager Host. If OM HA is enabled, use
--service-id instead. If you must use
--service-host with OM HA, this must point
directly to the leader OM. This option is
required when --service-id is not provided or
when HA is not enabled.
--service-id, --om-service-id=<omServiceId>
Ozone Manager Service ID
status
Get the status of the last triggered quota repair.
Usage: ozone repair om quota status [-hV] [--verbose] [--service-host=<omHost>]
[--service-id=<omServiceId>]
CLI to get the status of last trigger quota repair if available.
--service-host=<omHost>
Ozone Manager Host. If OM HA is enabled, use --service-id
instead. If you must use --service-host with OM HA, this
must point directly to the leader OM. This option is
required when --service-id is not provided or when HA is
not enabled.
--service-id, --om-service-id=<omServiceId>
Ozone Manager Service ID
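The --buckets option takes a comma-separated list of /<volume>/<bucket> URIs. The sketch below builds such a list and composes a start command followed by a status command; the helper name join_buckets, the service ID omservice, and the bucket names are placeholders chosen for illustration, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ozone): joins its arguments with commas.
# The function body runs in a subshell so the IFS change does not leak out.
join_buckets() (
  IFS=,
  printf '%s\n' "$*"
)

SERVICE_ID="omservice"   # placeholder OM service ID (assumption)
BUCKETS=$(join_buckets /vol1/bucket1 /vol1/bucket2)

START_CMD="ozone repair om quota start --service-id=$SERVICE_ID --buckets=$BUCKETS"
STATUS_CMD="ozone repair om quota status --service-id=$SERVICE_ID"
printf '%s\n%s\n' "$START_CMD" "$STATUS_CMD"
```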
compact
Compact a column family in the OM DB to clean up tombstones. The compaction happens asynchronously. Requires admin privileges.
Usage: ozone repair om compact [-hV] [--dry-run] [--force] [--verbose]
--cf=<columnFamilyName> [--node-id=<nodeId>]
[--service-id=<omServiceId>]
CLI to compact a column family in the om.db. The compaction happens
asynchronously. Requires admin privileges.
--cf, --column-family, --column_family=<columnFamilyName>
Column family name
--node-id=<nodeId> NodeID of the OM for which db needs to be compacted.
--service-id, --om-service-id=<omServiceId>
Ozone Manager Service ID
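Unlike ozone repair ldb compact, which takes a --db path, this subcommand addresses an OM by service ID and node ID. A minimal sketch follows, in which the service ID, node ID, and column family name are all placeholders (assumptions); the command is printed for review and keeps --dry-run.

```shell
#!/bin/sh
# Placeholders (assumptions): service ID, node ID and column family name.
SERVICE_ID="omservice"
NODE_ID="om1"
CF_NAME="exampleTable"

OM_COMPACT_CMD="ozone repair om compact --cf=$CF_NAME --service-id=$SERVICE_ID --node-id=$NODE_ID --dry-run"
printf '%s\n' "$OM_COMPACT_CMD"
```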
skip-ratis-transaction, srt
Omit a raft log in a ratis segment file by replacing the specified index with a dummy EchoOM command. This is an offline tool meant to be used only when all 3 OMs crash on the same transaction. If the issue is isolated to one OM, manually copy the DB from a healthy OM instead.
Usage: ozone repair om skip-ratis-transaction [-hV] [--dry-run] [--force]
[--verbose] -b=<backupDir> --index=<index> (-s=<segmentFile> |
-d=<logDir>)
CLI to omit a raft log in a ratis segment file. The raft log at the index
specified is replaced with an EchoOM command (which is a dummy command). It is
an offline command i.e., doesn't require OM to be running. The command should
be run for the same transaction on all 3 OMs only when all the OMs are crashing
while applying the same transaction. If only one OM is crashing and the other
OMs have executed the log successfully, then the DB should be manually copied
from one of the good OMs to the crashing OM instead.
-b, --backup=<backupDir> Directory to put the backup of the original
repaired segment file before the repair.
-d, --ratis-log-dir=<logDir>
Path of the ratis log directory
--index=<index> Index of the failing transaction that should be
removed
-s, --segment-path=<segmentFile>
Path of the input segment file
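The usage above requires exactly one of the segment file or the log directory. A wrapper could enforce that before composing the command, as in the sketch below; the helper name srt_cmd, the index, and all paths are assumptions made for illustration, and the resulting command is printed with --dry-run rather than run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ozone).
# $1 = index, $2 = backup dir, $3 = segment file or "", $4 = log dir or "".
srt_cmd() {
  if [ -n "$3" ] && [ -n "$4" ]; then
    echo "give either a segment file or a log directory, not both" >&2
    return 1
  fi
  if [ -z "$3" ] && [ -z "$4" ]; then
    echo "a segment file or a log directory is required" >&2
    return 1
  fi
  if [ -n "$3" ]; then
    src="--segment-path=$3"
  else
    src="--ratis-log-dir=$4"
  fi
  printf 'ozone repair om skip-ratis-transaction --index=%s --backup=%s %s --dry-run\n' "$1" "$2" "$src"
}

# Placeholder index and paths.
srt_cmd 1234 /path/to/backup "" /path/to/ratis/logs
```

The same index would be passed on each OM, since the tool is meant for the case where all OMs fail on the same transaction.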
ozone repair scm
Operational tool to repair SCM.
Subcommands under SCM
- cert
- update-transaction
cert
Subcommand for all certificate-related repairs on SCM.
recover
Recover Deleted SCM Certificate from RocksDB
Usage: ozone repair scm cert recover [-hV] [--dry-run] [--force] [--verbose]
--db=<dbPath>
Recover Deleted SCM Certificate from RocksDB
--db=<dbPath> SCM DB Path
update-transaction
To avoid modifying Ratis logs and only update the latest applied transaction, use the update-transaction command.
This updates the highest transaction index in the SCM transaction info table.
Usage: ozone repair scm update-transaction [-hV] [--dry-run] [--force]
[--verbose] --db=<dbPath> --index=<highestTransactionIndex>
--term=<highestTransactionTerm>
CLI to update the highest index in transaction info table.
--db=<dbPath> Database File Path
--index=<highestTransactionIndex>
Highest index to set. The input should be non-zero long
integer.
--term=<highestTransactionTerm>
Highest term to set. The input should be non-zero long
integer.