SPIKE Recovery Procedures

SPIKE ensures that your secrets are secure and resilient, aiming for seamless operations even in the most challenging situations. This document outlines the steps required for recovering SPIKE in various scenarios, ensuring you have the right guidance to handle any eventuality.

SPIKE Nexus Crash Recovery

SPIKE is designed to automatically recover SPIKE Nexus from crashes. Here is how this happens:

SPIKE Nexus crashes.
New SPIKE Nexus instance starts.
SPIKE Nexus asks for shards from SPIKE Keepers.
Once SPIKE Nexus gathers adequate shards, it recreates its root key and resumes normal operations.

SPIKE Keeper Crash Recovery

SPIKE Keeper recovery is automatic and does not require any manual intervention.

SPIKE Nexus regularly sends each SPIKE Keeper the shard it has to store. So, if a SPIKE Keeper instance crashes and restarts, it will eventually receive its shard again.

Complete System Recovery

In the unlikely case that SPIKE Nexus and all SPIKE Keeper instances crash at the same time, the system may reach a state from which it cannot recover automatically, and SPIKE can remain unavailable for an extended period.

In that case, manual intervention is necessary. The following sections describe the “break-the-glass” procedure that restores SPIKE to an operational state:

1. Before complete system failure:

Change the SPIFFE ID of SPIKE Pilot to recovery mode by executing ./hack/bare-metal/entry/spire-server-entry-recover-register.sh
Run spike recover
Save the files generated in ~/.spike/recover folder to a safe, encrypted, and password-protected medium.
Securely erase the ~/.spike/recover folder.
Change the SPIFFE ID of SPIKE Pilot back using ./hack/bare-metal/entry/spire-server-entry-su-register.sh or delete the registration entry entirely for extra security.
- You can recreate the entry using ./hack/bare-metal/entry/spire-server-entry-su-register.sh when you need to use SPIKE Pilot.
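The steps above can be sketched as a small script. This is only an illustration under stated assumptions: the `run` helper, the `DRY_RUN` switch, and the `BACKUP_DIR` mount point are hypothetical and not part of SPIKE, and using `shred` for the secure-erase step is one possible choice (it may not be effective on SSDs or journaling file systems). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the "before complete system failure" steps.
# Run from the root of the SPIKE source tree.
set -e

# DRY_RUN=1 (the default) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
RECOVER_DIR="${RECOVER_DIR:-$HOME/.spike/recover}"
# Hypothetical mount point of your encrypted, password-protected medium.
BACKUP_DIR="${BACKUP_DIR:-/mnt/encrypted-backup}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

backup_recovery_shards() {
  # Switch SPIKE Pilot's SPIFFE ID to recovery mode.
  run ./hack/bare-metal/entry/spire-server-entry-recover-register.sh
  # Generate the recovery shards under ~/.spike/recover.
  run spike recover
  # Copy the generated files to the encrypted medium.
  run cp -R "$RECOVER_DIR/." "$BACKUP_DIR/"
  # Overwrite and remove the local shard files, then the folder itself.
  run find "$RECOVER_DIR" -type f -exec shred -u {} +
  run rm -rf "$RECOVER_DIR"
  # Switch SPIKE Pilot back to its regular SPIFFE ID.
  run ./hack/bare-metal/entry/spire-server-entry-su-register.sh
}

backup_recovery_shards
```

Review the printed commands first, then rerun with `DRY_RUN=0` once the paths match your environment.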

2. During complete system failure:

Change the SPIFFE ID of SPIKE Pilot to restore mode: ./hack/bare-metal/entry/spire-server-entry-restore-register.sh
Execute spike restore and enter the shards you saved in step 1, one at a time. Each spike restore call accepts a single shard.
When you provide enough shards, the system will restore itself: SPIKE Nexus will restore its root key, and it will also hydrate its peer SPIKE Keeper instances to protect itself against future crashes.
Change the SPIFFE ID of SPIKE Pilot back using ./hack/bare-metal/entry/spire-server-entry-su-register.sh or delete the registration entry entirely for extra security.
- You can recreate the entry using ./hack/bare-metal/entry/spire-server-entry-su-register.sh when you need to use SPIKE Pilot.
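The restore steps above can be sketched in the same way. The `run` helper, `DRY_RUN`, and `SHARD_COUNT` are hypothetical names introduced for illustration; in particular, the number of shards your deployment needs is an assumption you must set yourself. The sketch relies only on what is stated above: each `spike restore` call accepts a single shard.

```shell
#!/bin/sh
# Sketch of the "during complete system failure" steps.
# Run from the root of the SPIKE source tree.
set -e

# DRY_RUN=1 (the default) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: how many shards you intend to enter. Adjust to your setup.
SHARD_COUNT="${SHARD_COUNT:-2}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_from_shards() {
  # Switch SPIKE Pilot's SPIFFE ID to restore mode.
  run ./hack/bare-metal/entry/spire-server-entry-restore-register.sh
  # Call spike restore once per shard; each call takes exactly one shard.
  i=1
  while [ "$i" -le "$SHARD_COUNT" ]; do
    echo "Entering shard $i of $SHARD_COUNT"
    run spike restore
    i=$((i + 1))
  done
  # Switch SPIKE Pilot back to its regular SPIFFE ID.
  run ./hack/bare-metal/entry/spire-server-entry-su-register.sh
}

restore_from_shards
```

Keeping the SPIFFE ID switch at the start and end of one function makes it harder to leave SPIKE Pilot in restore mode by accident.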

Both SPIKE Nexus and SPIKE Keeper are unavailable, or the system is in another unrecoverable state.
Admin executes spike recover.
Admin provides their password.
The encrypted root key is fetched from the database and injected into the memory of SPIKE Nexus.
SPIKE Nexus syncs the root key with SPIKE Keeper.
The system resumes normal operation.

Why Do We Change SVIDs Between Operations?

This approach is similar to “Admin Account Tiering” commonly found in zero trust architectures: Certain operations are forbidden between tiers; for example, a restore account cannot create secrets, and an account that can manage secrets and policies cannot initiate restoration operations.

For operations that need unusual or elevated access, an administrator must explicitly sign off on that elevated privilege.

Total System Reset

This procedure is for resetting SPIKE to its factory defaults.

The situation:

Both SPIKE Nexus and all SPIKE Keeper instances have crashed, so there is no way to fetch the root key from SPIKE Keeper(s).
The system administrator has not used spike recover to create recovery shards, or they have lost access to the recovery shards.
Everyone has learned their lessons, and now it’s time to reset the system and conduct an extensive “what went wrong / what should have been done” analysis.

How to proceed:

Delete the ~/.spike folder, which will also delete all the persisted secrets in the SQLite backing store.
Delete SPIRE Server registration entries.
Redeploy SPIKE using your preferred method.
- You can check out ./hack/bare-metal/startup/start.sh to see a sample startup/deployment script.
This is a complete system reset; you’ll lose all data and all former configuration, including secret access policies.
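As a rough sketch, the reset steps could look like the script below. The `run` helper and `DRY_RUN` switch are hypothetical, the registration entry IDs are passed in by you (for example, after listing them with `spire-server entry show`), and the sample start script stands in for whatever deployment method you prefer. Because the procedure is destructive, the sketch only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of a total system reset. THIS DESTROYS ALL SECRETS AND POLICIES.
# Usage: ./reset.sh <entry-id> [<entry-id> ...]
# Run from the root of the SPIKE source tree.
set -e

# DRY_RUN=1 (the default) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
SPIKE_DIR="${SPIKE_DIR:-$HOME/.spike}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_spike() {
  # Remove SPIKE's local state, including the SQLite backing store.
  run rm -rf "$SPIKE_DIR"
  # Delete each SPIRE Server registration entry given as an argument.
  for id in "$@"; do
    run spire-server entry delete -entryID "$id"
  done
  # Redeploy; the sample start script is used here as a placeholder.
  run ./hack/bare-metal/startup/start.sh
}

reset_spike "$@"
```

Passing the entry IDs explicitly, rather than deleting every entry the SPIRE Server knows about, avoids removing registrations that belong to other workloads.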