[HDDS-4470] Ozone Manager Prepare for Upgrade (implemented)
 
Abstract
In the context of upgrades, a slow follower presents a problem when there are changes in the OM Ratis request handler logic. This means that when there is a slow follower, it is possible to have different versions of the code processing the request at different OM nodes causing problems from DB divergence to unexpected crashes. For example, a leader could have “applied” a transaction in the older OM version, while a slow follower who has the request in its logs may apply it after the upgrade.
The objective of the ‘Prepare upgrade’ step is to make sure all the OMs use the same version of the software to update their DBs (apply transaction) for a given request.
To ensure this, we need to make sure that
- For every operational OM (at least a quorum of OMs should be operational ), all unapplied transactions should be applied.
- For an OM that is not operational during the prepare step, it should get a Ratis snapshot (entire OM RocksDB) to get up to speed with the rest of the OMs after the upgrade.
Usage
How do you prepare an Ozone manager quorum
ozone admin om -id=<om-sevice-id> prepare
This leaves the Ozone manager in a state where it cannot accept new writes.
How do you cancel a “prepared” Ozone manager quorum
In the case of a cancelled upgrade, the OM can be brought out off the prepared state by using the following command.
ozone admin om -id=<om-sevice-id> cancelprepare
Link
https://issues.apache.org/jira/secure/attachment/13015491/OM%20Prepare%20Upgrade.pdf
https://issues.apache.org/jira/secure/attachment/13015411/OM%20Prepare%20Upgrade%2CDowngrade.jpg