
ADR-0030: Minimal Error Messages in API Responses


  • Status: accepted
  • Date: 2025-11-22
  • Tags: Security, API Design, Error Handling, Information Disclosure

Context

SPIKE Nexus exposes an HTTP API that workloads use to interact with the secret management system. When operations fail, the API must communicate errors to clients. The design of error responses involves a fundamental trade-off between security and diagnostics:

  1. Detailed error messages provide rich diagnostic information to clients, making debugging easier, but risk information leakage
  2. Minimal error messages provide only error codes, maintaining security but offering limited client-side diagnostics

For a security-critical secret management system, we need to determine the appropriate balance between these concerns.

Decision

SPIKE Nexus API responses will return error codes only, with no descriptive error messages to clients.

Specifically:

  • API responses contain only structured error codes (e.g., NOT_FOUND, UNAUTHORIZED, BAD_REQUEST)
  • No additional error message fields, stack traces, or diagnostic information
  • All detailed error context is logged server-side with full audit trail
  • Clients must interpret errors based solely on:
    • HTTP status codes (404, 401, 400, 500)
    • Structured error code enumerations
    • Request parameters they provided

Rationale

Security: Preventing Information Leakage

Error messages can reveal sensitive information about the system:

Path existence enumeration:

❌ "Secret 'secrets/admin/root-password' not found"
   → Echoes the full path and confirms that it does not exist

✓  NOT_FOUND
   → Does not reveal whether the path is missing or merely inaccessible

Permission structure disclosure:

❌ "Permission denied for path 'secrets/database'"
   → Leaks information about permission boundaries

✓  UNAUTHORIZED
   → No information about what paths exist or their structure

Implementation details:

❌ "Database query failed: table 'secrets' locked"
   → Leaks internal architecture details

✓  INTERNAL_SERVER_ERROR
   → No information about internal implementation

Stack traces (the “one bad commit” risk):

❌ Adding an ErrMsg field creates risk of accidentally including:
   - File paths
   - Internal function names
   - SQL queries
   - Configuration details

✓  No message field = no risk of accidental disclosure

Defense Against Enumeration Attacks

Minimal errors prevent attackers from probing the system:

Attack Vector | Detailed Messages | Minimal Codes
--- | --- | ---
Path enumeration | “Path X not found” vs “Path Y unauthorized” reveals valid paths | All failures return same code
Permission probing | Messages reveal permission boundaries | No distinction between not-found and unauthorized
Version detection | Stack traces reveal library versions | No version information leaked
Schema discovery | Error messages reveal data structure | No schema information exposed
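
The path-enumeration row can be illustrated with a small sketch. This is not SPIKE's handler code: the `store` map, the `allowed` set, and the `getSecret` and `probe` helpers are hypothetical names chosen for illustration, and only the Go standard library is used. It shows a missing path and an existing but forbidden path producing the same status and the same body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Hypothetical in-memory state, for illustration only.
var store = map[string]string{
	"secrets/db/password": "s3cr3t",
	"secrets/app/key":     "k",
}

// Hypothetical set of paths the caller may read.
var allowed = map[string]bool{"secrets/app/key": true}

// getSecret answers missing and forbidden paths identically.
func getSecret(w http.ResponseWriter, r *http.Request) {
	path := r.URL.Query().Get("path")
	value, exists := store[path]
	w.Header().Set("Content-Type", "application/json")
	if !exists || !allowed[path] {
		// Same status and same body in both cases.
		w.WriteHeader(http.StatusNotFound)
		io.WriteString(w, `{"err":"NOT_FOUND"}`)
		return
	}
	_ = json.NewEncoder(w).Encode(map[string]string{"value": value})
}

// probe sends one request to the handler and returns status and body.
func probe(path string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/secret?path="+path, nil)
	rec := httptest.NewRecorder()
	getSecret(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	s1, b1 := probe("secrets/db/password") // exists, caller not allowed
	s2, b2 := probe("secrets/nope")        // does not exist
	fmt.Println(s1 == s2 && b1 == b2)      // prints true
}
```

An attacker comparing the two responses learns nothing about which of the probed paths exists. Response timing is a separate channel that this sketch does not address.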

API Design: Clean and Stable

Error codes provide better API stability than messages:

Programmatic handling:

// Client can reliably handle specific errors
switch response.Err {
case data.ErrNotFound:
    // Handle missing secret
case data.ErrUnauthorized:
    // Handle permission denied
}

No versioning issues:

  • Error codes remain stable across versions
  • No message format changes breaking clients
  • No localization complexity
  • Consistent parsing and handling

Testability:

  • Deterministic error codes are easy to test
  • No string matching or regex required
  • Clear expected outcomes in test cases
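
As a rough illustration of the testability point, the sketch below runs table-driven cases that compare codes by equality only. The `ErrorCode` type, the `errEntityNotFound` sentinel, and the `codeFor` and `failures` helpers are assumptions made for this example, not SPIKE's actual types; in a real project the cases would live in a `_test.go` file.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorCode stands in for the document's data.ErrorCode (assumed string-based).
type ErrorCode string

const (
	CodeNotFound ErrorCode = "NOT_FOUND"
	CodeInternal ErrorCode = "INTERNAL_SERVER_ERROR"
)

// errEntityNotFound is a hypothetical sentinel error.
var errEntityNotFound = errors.New("entity not found")

// codeFor maps an internal error to the code a client would receive.
func codeFor(err error) ErrorCode {
	if errors.Is(err, errEntityNotFound) {
		return CodeNotFound
	}
	return CodeInternal
}

// failures runs the cases and returns one description per mismatch.
func failures() []string {
	cases := []struct {
		name string
		err  error
		want ErrorCode
	}{
		{"plain sentinel", errEntityNotFound, CodeNotFound},
		{"wrapped with extra wording",
			fmt.Errorf("reading secret: %w", errEntityNotFound), CodeNotFound},
		{"unrelated backend error", errors.New("disk full"), CodeInternal},
	}
	var out []string
	for _, c := range cases {
		if got := codeFor(c.err); got != c.want {
			out = append(out, fmt.Sprintf("%s: got %s, want %s", c.name, got, c.want))
		}
	}
	return out
}

func main() {
	fmt.Println(len(failures())) // prints 0
}
```

Rewording the wrapped message in the second case does not change the result, which is the property that string-matching tests lack.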

Operational Model: Server-Side Context

SPIKE’s architecture provides full diagnostics where they belong:

Audit logging captures everything:

Server log:
[req_abc123] [SPIFFE: spiffe://example.org/workload/app]
Failed to read secret 'secrets/db/password': permission denied
Policy check failed: path pattern '^secrets/admin/.*' required

Client receives:

{
  "err": "UNAUTHORIZED"
}

Clear separation of concerns:

  • Clients: Get actionable error codes for programmatic handling
  • Operators: Have server access and can see full audit logs with context
  • Authorized users: Can correlate their requests with server logs if needed
  • Unauthorized users: Get nothing useful for reconnaissance

Industry Validation

Security-critical systems follow this pattern:

HashiCorp Vault:

API Response: {"errors":["permission denied"]}
Server logs:  Detailed context with paths, policies, tokens

AWS Secrets Manager:

API Response: Generic error codes
CloudTrail:   Full audit trail with all context

Kubernetes Secrets API:

API Response: Standard error codes
Audit logs:   Complete request/response details

All three separate client-facing errors from server-side diagnostics.

Alternatives Considered

Alternative 1: Include Generic Error Messages

Provide generic messages without sensitive details:

{
  "err": "NOT_FOUND",
  "message": "Secret not found"
}

Rejected because:

  • Adds API surface complexity with minimal benefit
  • Generic messages don’t provide actionable information beyond the code
  • Risk of messages accidentally becoming more detailed over time
  • The error code already conveys the same information
  • No clear line between “safe” and “unsafe” detail levels

Alternative 2: Detailed Messages for Authenticated Users

Provide detailed errors only to authenticated, authorized users:

{
  "err": "UNAUTHORIZED",
  "message": "Policy 'db-read' denies access to 'secrets/db/password'"
}

Rejected because:

  • Still risks information leakage (policy names, path details)
  • Adds complexity to determine what details are “safe”
  • Authentication doesn’t mean users should see internal details
  • Creates inconsistent error handling logic
  • Server-side logs already provide this for operators

Alternative 3: Request IDs for Correlation

Include correlation IDs so clients can reference server logs:

{
  "err": "NOT_FOUND",
  "requestId": "req_abc123"
}

Considered acceptable but not required because:

  • SPIKE’s audit logging already provides correlation via SPIFFE ID and timestamp
  • Users with legitimate need for diagnostics have server log access
  • Adding request IDs provides minimal benefit over existing correlation methods
  • Can be added later if operational experience shows clear need
  • Keeping responses minimal is preferred for initial implementation

Status: May be reconsidered based on operational feedback

Consequences

Positive

  • Security by design: Information leakage is prevented at the API layer
  • Enumeration protection: Attackers cannot probe system structure via errors
  • No accidental disclosure: With no message field, responses have no place for stack traces or implementation details to leak into
  • Clean API surface: Simple, stable error code enumeration
  • Programmatic handling: Clients can reliably switch on error codes
  • Stable interface: Error codes don’t change; messages would
  • Clear security model: “If you’re authorized, the code tells you everything. If you’re not, you get nothing.”

Negative

  • Limited client diagnostics: Clients cannot see detailed error reasons
  • Operator workflow: Users must correlate client errors with server logs for debugging
  • Learning curve: New users might expect more detailed error messages
  • Script debugging: Wrapper scripts get less information for error handling

Neutral

  • Consistent with design: SPIKE already has comprehensive audit logging
  • Expected for security systems: Users familiar with Vault and similar systems expect this pattern
  • Operational requirement: Operators need server access anyway for secret management

Implementation Details

Response Structure

All error responses follow this structure:

type ErrorResponse struct {
    Err data.ErrorCode `json:"err"`
    // No message, details, or stack trace fields
}

Error Codes

Standard error codes returned:

HTTP Status | Error Code | Meaning
--- | --- | ---
200 | null | Success
400 | BAD_REQUEST | Invalid request format or parameters
401 | UNAUTHORIZED | Authentication or authorization failure
404 | NOT_FOUND | Resource does not exist (or unauthorized)
500 | INTERNAL_SERVER_ERROR | Backend or server-side failure

Note: 404 is returned both when a resource does not exist and when the caller is not authorized to access it, so the response never confirms which paths exist.
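
The pairing of codes and statuses in the table above could be expressed as a single lookup, sketched below. The `ErrorCode` type and `statusFor` function are hypothetical stand-ins, and treating an unrecognized code as a 500 is an assumption of this sketch rather than documented SPIKE behavior.

```go
package main

import (
	"fmt"
	"net/http"
)

// ErrorCode stands in for the document's data.ErrorCode (assumed string-based).
type ErrorCode string

const (
	CodeNone         ErrorCode = "" // success: no error code set
	CodeBadRequest   ErrorCode = "BAD_REQUEST"
	CodeUnauthorized ErrorCode = "UNAUTHORIZED"
	CodeNotFound     ErrorCode = "NOT_FOUND"
	CodeInternal     ErrorCode = "INTERNAL_SERVER_ERROR"
)

// statusFor returns the HTTP status that the table pairs with each code.
func statusFor(code ErrorCode) int {
	switch code {
	case CodeNone:
		return http.StatusOK
	case CodeBadRequest:
		return http.StatusBadRequest
	case CodeUnauthorized:
		return http.StatusUnauthorized
	case CodeNotFound:
		return http.StatusNotFound
	default:
		// Assumption: anything unrecognized is reported as a server error.
		return http.StatusInternalServerError
	}
}

func main() {
	for _, c := range []ErrorCode{CodeNone, CodeBadRequest, CodeUnauthorized, CodeNotFound, CodeInternal} {
		fmt.Printf("%q -> %d\n", string(c), statusFor(c))
	}
}
```

Keeping the mapping in one place means a handler cannot pair a code with a mismatched status by accident.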

Server-Side Logging

All errors are logged with full context:

// Handler logs detailed context
log.DebugErr(fName, sdkErrors.ErrAPINotFound.Wrap(err))
// Audit trail captures request details
journal.AuditRequest(fName, r, audit, journal.AuditRead)

// Client receives only:
net.Fail(reqres.SecretGetNotFound, w, http.StatusNotFound)
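
The snippet above relies on SPIKE-internal helpers. For readers who want to try the same split in isolation, here is a standalone approximation that uses only the Go standard library (`log/slog` requires Go 1.21 or later). The `fail` and `demo` functions and the `errorResponse` type are hypothetical and do not mirror the signatures of `net.Fail` or `log.DebugErr`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"strings"
)

// errorResponse carries the error code and nothing else.
type errorResponse struct {
	Err string `json:"err"`
}

// fail logs the detailed cause server-side and sends only the code.
func fail(w http.ResponseWriter, logger *slog.Logger, status int, code string, cause error) {
	logger.Error("request failed", "code", code, "cause", cause.Error())
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(errorResponse{Err: code})
}

// demo reports whether the sensitive path shows up in the server log
// and in the client body, and returns the body itself.
func demo() (inLog bool, inBody bool, body string) {
	const secretPath = "secrets/db/password"
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	rec := httptest.NewRecorder()

	cause := errors.New("policy check failed for '" + secretPath + "'")
	fail(rec, logger, http.StatusNotFound, "NOT_FOUND", cause)

	body = rec.Body.String()
	return strings.Contains(buf.String(), secretPath),
		strings.Contains(body, secretPath), body
}

func main() {
	inLog, inBody, body := demo()
	fmt.Println(inLog, inBody) // prints true false
	fmt.Print(body)            // prints {"err":"NOT_FOUND"}
}
```

Because `fail` accepts the cause only for logging, there is no code path by which its text can reach the response writer.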

Error Handling Pattern

All route handlers follow this pattern:

func RouteGetSecret(w http.ResponseWriter, r *http.Request,
    audit *journal.AuditEntry) *sdkErrors.SDKError {
    const fName = "RouteGetSecret"

    // path and version are parsed from the request;
    // request parsing and validation are omitted here.
    secret, err := state.GetSecret(path, version)
    if err != nil {
        // Server-side: Log full context
        log.DebugErr(fName, err)

        // Client-side: Return only code
        if err.Is(sdkErrors.ErrEntityNotFound) {
            net.Fail(reqres.SecretGetNotFound, w, http.StatusNotFound)
        } else {
            net.Fail(reqres.SecretGetInternal, w,
                http.StatusInternalServerError)
        }
        return err
    }

    // Success response (populating it from secret is omitted here)
    net.Success(reqres.SecretGetSuccess, w)
    return nil
}

Client Interpretation

Clients use error codes programmatically:

// Client code
resp, err := nexus.GetSecret(ctx, path, version)
if err != nil {
    switch resp.Err {
    case data.ErrNotFound:
        // Secret doesn't exist or not authorized
    case data.ErrUnauthorized:
        // Authentication failed
    case data.ErrInternal:
        // Server error - retry or escalate
    }
}
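
A client that works from raw response bodies might turn the code into a next step as sketched below. The `action` type and `decide` function are hypothetical; the only behavior taken from this document is that a server error is a candidate for retry while the other codes are not.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorResponse mirrors the single-field response body.
type errorResponse struct {
	Err string `json:"err"`
}

// action is a hypothetical client-side decision.
type action string

const (
	proceed action = "proceed"
	retry   action = "retry"
	giveUp  action = "give up"
)

// decide inspects only the error code; there is no message to parse.
func decide(body []byte) action {
	var resp errorResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return giveUp // malformed body: do not guess
	}
	switch resp.Err {
	case "":
		return proceed // a null or absent code means success
	case "INTERNAL_SERVER_ERROR":
		return retry // server-side failure may be transient
	default:
		return giveUp // NOT_FOUND, UNAUTHORIZED, BAD_REQUEST: retrying will not help
	}
}

func main() {
	fmt.Println(decide([]byte(`{"err":"INTERNAL_SERVER_ERROR"}`))) // prints retry
	fmt.Println(decide([]byte(`{"err":"NOT_FOUND"}`)))             // prints give up
	fmt.Println(decide([]byte(`{"err":null}`)))                    // prints proceed
}
```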

Future Enhancements

Correlation IDs

If operational experience shows a clear need, we may add request correlation IDs to API responses:

{
  "err": "NOT_FOUND",
  "requestId": "req_abc123"
}

Benefits:

  • Users can reference specific requests when asking operators for help
  • Operators can quickly locate relevant log entries
  • No security information is leaked (ID is opaque)
  • Improves support workflow without compromising security

Current status:

  • Not implemented in initial version
  • Existing correlation via SPIFFE ID and timestamp is sufficient
  • Will reconsider based on operational feedback and support burden
  • Can be added non-breaking if needed
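
One way the field could be added without breaking existing clients is sketched below, assuming the response is encoded with Go's `encoding/json`. The `errorResponse` type, the `newRequestID` and `encode` helpers, and the `req_` prefix with 16 hex characters are illustrative choices, not a committed design.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// errorResponse adds an optional requestId; omitempty keeps the
// current wire format unchanged whenever the ID is not set.
type errorResponse struct {
	Err       string `json:"err"`
	RequestID string `json:"requestId,omitempty"`
}

// newRequestID returns an opaque random identifier that encodes
// nothing about the request, the caller, or the server.
func newRequestID() (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return "req_" + hex.EncodeToString(b), nil
}

// encode serializes the response to JSON.
func encode(resp errorResponse) string {
	b, err := json.Marshal(resp)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Without an ID the body matches today's format.
	fmt.Println(encode(errorResponse{Err: "NOT_FOUND"})) // prints {"err":"NOT_FOUND"}

	id, err := newRequestID()
	if err != nil {
		panic(err)
	}
	fmt.Println(encode(errorResponse{Err: "NOT_FOUND", RequestID: id}))
}
```

Clients that decode only the `err` field ignore the extra key by default, which is what makes the change additive.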

Evaluation criteria:

  • Frequency of users needing operator assistance for error diagnosis
  • Time spent by operators correlating client errors with server logs
  • User feedback on debugging difficulty
  • Comparison with alternative approaches (timestamp-based correlation, SPIFFE ID filtering)

References