SPIKE Recovery Procedures
SPIKE ensures that your secrets are secure and resilient, aiming for seamless operations even in the most challenging situations. This document outlines the steps required for recovering SPIKE in various scenarios, ensuring you have the right guidance to handle any eventuality.
SPIKE Nexus Crash Recovery
SPIKE is designed to automatically recover SPIKE Nexus from crashes. Here is how this happens:
- SPIKE Nexus crashes.
- New SPIKE Nexus instance starts.
- SPIKE Nexus asks the SPIKE Keepers for their shards.
- Once SPIKE Nexus gathers enough shards, it recreates its root key and resumes normal operations.
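The collect-until-enough idea in the steps above can be sketched as a small shell function. This is only an illustration, not how SPIKE Nexus is actually implemented: `gather_shards`, `fetch_shard`, `RETRY_SECONDS`, and the threshold argument are hypothetical names introduced here, and the sketch stops once the shards are collected because this document does not describe how the root key is recomputed from them.

```shell
#!/bin/sh
# Illustrative sketch only; every name below is hypothetical.
# The caller must define `fetch_shard KEEPER`, which prints that keeper's
# shard and returns non-zero when the keeper cannot be reached.

# gather_shards THRESHOLD KEEPER...
# Polls all keepers repeatedly until at least THRESHOLD of them answer,
# then prints the collected shards, one per line.
gather_shards() {
  threshold="$1"
  shift
  while :; do
    collected=""
    count=0
    for keeper in "$@"; do
      if shard=$(fetch_shard "$keeper"); then
        collected="${collected}${shard}
"
        count=$((count + 1))
      fi
    done
    if [ "$count" -ge "$threshold" ]; then
      break
    fi
    # Not enough keepers answered yet; wait before asking again.
    sleep "${RETRY_SECONDS:-5}"
  done
  printf '%s' "$collected"
}
```

With a suitable `fetch_shard` in place, a call such as `gather_shards 2 keeper-1 keeper-2 keeper-3` would return as soon as any two keepers respond in the same polling round.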
SPIKE Keeper Crash Recovery
SPIKE Keeper recovery is automatic, and does not require any manual intervention.
SPIKE Nexus regularly sends each SPIKE Keeper the shard that it has to store. So, if a SPIKE Keeper instance crashes and restarts, it will eventually receive its shard again.
Complete System Recovery
In the unlikely case that SPIKE Nexus and all SPIKE Keeper instances crash at the same time, SPIKE can remain unavailable for an extended period, because the system may transition to a state from which it cannot recover automatically.
In that case, manual intervention will be necessary. The following sections describe this “break-the-glass” procedure to help restore SPIKE back to its operational state:
1. Before complete system failure:
- Change the SPIFFE ID of SPIKE Pilot to recovery mode by executing `./hack/spire-server-entry-recover-register.sh`.
- Run `spike recover`.
- Save the files generated in the `~/.spike/recover` folder to a safe, encrypted, and password-protected medium.
- Securely erase the `~/.spike/recover` folder.
- Change the SPIFFE ID of SPIKE Pilot back using `./hack/spire-server-entry-su-register.sh`, or delete the registration entry entirely for extra security.
  - You can recreate the entry using `./hack/spire-server-entry-su-register.sh` when you need to use SPIKE Pilot.
2. During complete system failure:
- Change the SPIFFE ID of SPIKE Pilot to restore mode by executing `./hack/spire-server-entry-restore-register.sh`.
- Execute `spike restore` and enter the shards you created in the previous step one by one. Each `spike restore` call accepts a single shard.
- When you provide enough shards, the system will restore itself: SPIKE Nexus will restore its root key, and it will also hydrate its peer SPIKE Keeper instances to protect itself against future crashes.
- Change the SPIFFE ID of SPIKE Pilot back using `./hack/spire-server-entry-su-register.sh`, or delete the registration entry entirely for extra security.
  - You can recreate the entry using `./hack/spire-server-entry-su-register.sh` when you need to use SPIKE Pilot.
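The two phases above could be wrapped in helper functions along the following lines. Treat this as a sketch under stated assumptions rather than an official SPIKE script: it assumes you run it from the SPIKE repository root, that `spike` is on your `PATH`, and that `tar`, `gpg`, and GNU `shred` are installed. The function names, the `RECOVER_DIR` variable, and the choice of `gpg` for encryption are illustrative and do not come from SPIKE itself. The number of shards to enter is passed in by the operator because this document does not state it.

```shell
#!/bin/sh
# Sketch only. Function names and RECOVER_DIR are hypothetical.
# gpg prompts for a passphrase; shred's overwrite guarantees depend on
# the underlying filesystem and storage device.

RECOVER_DIR="${RECOVER_DIR:-$HOME/.spike/recover}"

# Bundle the recovery folder into a tar archive at path $1.
archive_recovery_files() {
  tar -cf "$1" -C "$(dirname "$RECOVER_DIR")" "$(basename "$RECOVER_DIR")"
}

# Overwrite and unlink every file in the recovery folder, then remove it.
erase_recovery_dir() {
  find "$RECOVER_DIR" -type f -exec shred -u {} + &&
    rm -rf -- "$RECOVER_DIR"
}

# Phase 1: create recovery files and keep an encrypted copy at $1.tar.gpg.
backup_recovery_shards() {
  ./hack/spire-server-entry-recover-register.sh &&
    spike recover &&
    archive_recovery_files "$1.tar" &&
    gpg --symmetric --cipher-algo AES256 --output "$1.tar.gpg" "$1.tar" &&
    shred -u "$1.tar" &&
    erase_recovery_dir &&
    ./hack/spire-server-entry-su-register.sh
}

# Phase 2: call `spike restore` $1 times, one shard per call.
restore_from_shards() {
  ./hack/spire-server-entry-restore-register.sh || return 1
  i=1
  while [ "$i" -le "$1" ]; do
    echo "Shard $i of $1:"
    spike restore || return 1
    i=$((i + 1))
  done
  ./hack/spire-server-entry-su-register.sh
}
```

For example, `backup_recovery_shards /mnt/vault/spike-recovery` (a hypothetical mount point) would leave `spike-recovery.tar.gpg` on that medium, and `restore_from_shards 2` would prompt for two shards in turn. If a step fails partway through, the SPIFFE ID of SPIKE Pilot is left in its recovery or restore mode, so switch it back manually.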
- Both SPIKE Nexus and SPIKE Keeper are unavailable, or the system is in another irrecoverable state.
- Admin executes `spike recover`.
- Admin provides their password.
- The encrypted root key is fetched from the database and injected into the memory of SPIKE Nexus.
- SPIKE Nexus syncs the root key with SPIKE Keeper.
- The system resumes normal operation.
Total System Reset
This procedure is for resetting SPIKE to its factory defaults.
The situation:
- Both SPIKE Nexus and all SPIKE Keeper instances have crashed, and there is no way to fetch the root key from SPIKE Keeper(s).
- The system administrator has not used `spike recover` to create recovery shards, or they have lost access to the recovery shards.
- Everyone has learned their lesson, and now it’s time to reset the system and conduct an extensive “what went wrong / what should have been done” analysis.
How to proceed:
- Delete the `~/.spike` folder, which will also delete all the persisted secrets in the SQLite backing store.
- Delete SPIRE Server registration entries.
- Redeploy SPIKE using your preferred method.
  - You can check out `./hack/start.sh` to see a sample startup/deployment script.
- This is a complete system reset; you’ll lose all data and all former configuration, including secret access policies.
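The first two reset steps can be sketched as a shell function. This is a minimal illustration, not a script shipped with SPIKE: `reset_spike` and `SPIKE_DIR` are hypothetical names, and the function expects you to pass the IDs of the SPIRE registration entries you created for SPIKE (for instance, as listed by `spire-server entry show`). Depending on your SPIRE setup, `spire-server` may need extra flags that are not shown here.

```shell
#!/bin/sh
# Sketch only. SPIKE_DIR and reset_spike are hypothetical names.
# This is destructive: it removes all persisted SPIKE data.

SPIKE_DIR="${SPIKE_DIR:-$HOME/.spike}"

# reset_spike ENTRY_ID...
# Deletes the SPIKE data folder, then the given SPIRE registration entries.
reset_spike() {
  # ${SPIKE_DIR:?} aborts instead of running rm on an empty path.
  rm -rf -- "${SPIKE_DIR:?}" || return 1
  for entry_id in "$@"; do
    spire-server entry delete -entryID "$entry_id" || return 1
  done
}
```

After `reset_spike <entry-id> ...` succeeds, redeploy SPIKE with your preferred method; `./hack/start.sh` is the sample script mentioned above.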