ADR-0030: Minimal Error Messages in API Responses
- Status: accepted
- Date: 2025-11-22
- Tags: Security, API Design, Error Handling, Information Disclosure
Context
SPIKE Nexus exposes an HTTP API that workloads use to interact with the secret management system. When operations fail, the API must communicate errors to clients. The design of error responses involves a fundamental trade-off between security and diagnostics:
- Detailed error messages provide rich diagnostic information to clients, making debugging easier, but risk information leakage
- Minimal error messages provide only error codes, maintaining security but offering limited client-side diagnostics
For a security-critical secret management system, we need to determine the appropriate balance between these concerns.
Decision
SPIKE Nexus API responses will return error codes only, with no descriptive error messages to clients.
Specifically:
- API responses contain only structured error codes (e.g., NOT_FOUND, UNAUTHORIZED, BAD_REQUEST)
- No additional error message fields, stack traces, or diagnostic information
- All detailed error context is logged server-side with full audit trail
- Clients must interpret errors based solely on:
  - HTTP status codes (404, 401, 400, 500)
  - Structured error code enumerations
  - Request parameters they provided
Rationale
Security: Preventing Information Leakage
Error messages can reveal sensitive information about the system:
Path existence enumeration:
❌ "Secret 'secrets/admin/root-password' not found"
→ Reveals path structure even when denied
✓ NOT_FOUND
→ Reveals nothing about whether path exists or is unauthorized
Permission structure disclosure:
❌ "Permission denied for path 'secrets/database'"
→ Leaks information about permission boundaries
✓ UNAUTHORIZED
→ No information about what paths exist or their structure
Implementation details:
❌ "Database query failed: table 'secrets' locked"
→ Leaks internal architecture details
✓ INTERNAL_SERVER_ERROR
→ No information about internal implementation
Stack traces (the “one bad commit” risk):
❌ Adding an ErrMsg field creates risk of accidentally including:
- File paths
- Internal function names
- SQL queries
- Configuration details
✓ No message field = no risk of accidental disclosure
Defense Against Enumeration Attacks
Minimal errors prevent attackers from probing the system:
| Attack Vector | Detailed Messages | Minimal Codes |
|---|---|---|
| Path enumeration | “Path X not found” vs “Path Y unauthorized” reveals valid paths | Missing and unauthorized paths return the same code |
| Permission probing | Messages reveal permission boundaries | No distinction between not-found and unauthorized |
| Version detection | Stack traces reveal library versions | No version information leaked |
| Schema discovery | Error messages reveal data structure | No schema information exposed |
API Design: Clean and Stable
Error codes provide better API stability than messages:
Programmatic handling:
// Client can reliably handle specific errors
switch response.Err {
case data.ErrNotFound:
// Handle missing secret
case data.ErrUnauthorized:
// Handle permission denied
}
No versioning issues:
- Error codes remain stable across versions
- No message format changes breaking clients
- No localization complexity
- Consistent parsing and handling
Testability:
- Deterministic error codes are easy to test
- No string matching or regex required
- Clear expected outcomes in test cases
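The testability point can be illustrated with a small table-driven check. This is a minimal sketch, not SPIKE code: `ErrorCode`, `getSecret`, and the in-memory `store` are hypothetical stand-ins for the SDK types and the Nexus backend. It shows that every expectation reduces to an exact comparison of codes.

```go
package main

import "fmt"

// ErrorCode is a hypothetical stand-in for the SDK's error code type.
type ErrorCode string

const (
	ErrNone       ErrorCode = ""
	ErrNotFound   ErrorCode = "NOT_FOUND"
	ErrBadRequest ErrorCode = "BAD_REQUEST"
)

// getSecret is a hypothetical lookup against an in-memory store.
// It returns a code only, never a free-form message.
func getSecret(store map[string]string, path string) (string, ErrorCode) {
	if path == "" {
		return "", ErrBadRequest
	}
	value, ok := store[path]
	if !ok {
		return "", ErrNotFound
	}
	return value, ErrNone
}

// failures runs the test table and returns the paths whose code
// did not match the expectation.
func failures() []string {
	store := map[string]string{"secrets/app/key": "s3cr3t"}
	cases := []struct {
		path string
		want ErrorCode
	}{
		{"secrets/app/key", ErrNone},
		{"secrets/missing", ErrNotFound},
		{"", ErrBadRequest},
	}
	var bad []string
	for _, c := range cases {
		// Exact equality on the code; no string matching or regex.
		if _, got := getSecret(store, c.path); got != c.want {
			bad = append(bad, c.path)
		}
	}
	return bad
}

func main() {
	fmt.Println("mismatches:", len(failures()))
}
```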
Operational Model: Server-Side Context
SPIKE’s architecture provides full diagnostics where they belong:
Audit logging captures everything:
Server log:
[req_abc123] [SPIFFE: spiffe://example.org/workload/app]
Failed to read secret 'secrets/db/password': permission denied
Policy check failed: path pattern '^secrets/admin/.*' required
Client receives:
{
"err": "UNAUTHORIZED"
}
Clear separation of concerns:
- Clients: Get actionable error codes for programmatic handling
- Operators: Have server access and can see full audit logs with context
- Authorized users: Can correlate their requests with server logs if needed
- Unauthorized users: Get nothing useful for reconnaissance
Industry Validation
Security-critical systems follow this pattern:
HashiCorp Vault:
API Response: {"errors":["permission denied"]}
Server logs: Detailed context with paths, policies, tokens
AWS Secrets Manager:
API Response: Generic error codes
CloudTrail: Full audit trail with all context
Kubernetes Secrets API:
API Response: Standard error codes
Audit logs: Complete request/response details
All separate client-facing errors from server-side diagnostics.
Alternatives Considered
Alternative 1: Include Generic Error Messages
Provide generic messages without sensitive details:
{
"err": "NOT_FOUND",
"message": "Secret not found"
}
Rejected because:
- Adds API surface complexity with minimal benefit
- Generic messages don’t provide actionable information beyond the code
- Risk of messages accidentally becoming more detailed over time
- The error code already conveys the same information
- No clear line between “safe” and “unsafe” detail levels
Alternative 2: Detailed Messages for Authenticated Users
Provide detailed errors only to authenticated, authorized users:
{
"err": "UNAUTHORIZED",
"message": "Policy 'db-read' denies access to 'secrets/db/password'"
}
Rejected because:
- Still risks information leakage (policy names, path details)
- Adds complexity to determine what details are “safe”
- Authentication doesn’t mean users should see internal details
- Creates inconsistent error handling logic
- Server-side logs already provide this for operators
Alternative 3: Request IDs for Correlation
Include correlation IDs so clients can reference server logs:
{
"err": "NOT_FOUND",
"requestId": "req_abc123"
}
Considered acceptable but not required because:
- SPIKE’s audit logging already provides correlation via SPIFFE ID and timestamp
- Users with legitimate need for diagnostics have server log access
- Adding request IDs provides minimal benefit over existing correlation methods
- Can be added later if operational experience shows clear need
- Keeping responses minimal is preferred for initial implementation
Status: May be reconsidered based on operational feedback
Consequences
Positive
- Security by design: Information leakage is prevented at the API layer
- Enumeration protection: Attackers cannot probe system structure via errors
- No accidental disclosure: With no message field, responses have nowhere to carry stack traces or implementation details
- Clean API surface: Simple, stable error code enumeration
- Programmatic handling: Clients can reliably switch on error codes
- Stable interface: Error codes don’t change; messages would
- Clear security model: “If you’re authorized, the code tells you everything. If you’re not, you get nothing.”
Negative
- Limited client diagnostics: Clients cannot see detailed error reasons
- Operator workflow: Users must correlate client errors with server logs for debugging
- Learning curve: New users might expect more detailed error messages
- Script debugging: Wrapper scripts get less information for error handling
Neutral
- Consistent with design: SPIKE already has comprehensive audit logging
- Expected for security systems: Users familiar with Vault, etc., expect this pattern
- Operational requirement: Operators need server access anyway for secret management
Implementation Details
Response Structure
All error responses follow this structure:
type ErrorResponse struct {
Err data.ErrorCode `json:"err"`
// No message, details, or stack trace fields
}
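To show what this structure looks like on the wire, here is a minimal self-contained sketch. The local `ErrorCode` type and its constants stand in for `data.ErrorCode`, whose real definition lives in the SPIKE SDK and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorCode is a local stand-in for data.ErrorCode (assumed to be
// a string-based enumeration).
type ErrorCode string

const (
	ErrNotFound     ErrorCode = "NOT_FOUND"
	ErrUnauthorized ErrorCode = "UNAUTHORIZED"
)

// ErrorResponse carries the code and nothing else.
type ErrorResponse struct {
	Err ErrorCode `json:"err"`
}

// encode returns the JSON body a client would receive for a code.
func encode(code ErrorCode) string {
	b, err := json.Marshal(ErrorResponse{Err: code})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(ErrNotFound)) // prints {"err":"NOT_FOUND"}
}
```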
Error Codes
Standard error codes returned:
| HTTP Status | Error Code | Meaning |
|---|---|---|
| 200 | null | Success |
| 400 | BAD_REQUEST | Invalid request format or parameters |
| 401 | UNAUTHORIZED | Authentication or authorization failure |
| 404 | NOT_FOUND | Resource does not exist (or unauthorized) |
| 500 | INTERNAL_SERVER_ERROR | Backend or server-side failure |
Note: 404 is used for both “not found” and “not authorized” to prevent enumeration.
Server-Side Logging
All errors are logged with full context:
// Handler logs detailed context
log.DebugErr(fName, sdkErrors.ErrAPINotFound.Wrap(err))
// Audit trail captures request details
journal.AuditRequest(fName, r, audit, journal.AuditRead)
// Client receives only:
net.Fail(reqres.SecretGetNotFound, w, http.StatusNotFound)
Error Handling Pattern
All route handlers follow this pattern:
func RouteGetSecret(w http.ResponseWriter, r *http.Request,
audit *journal.AuditEntry) *sdkErrors.SDKError {
// path and version are parsed from the request (parsing elided)
secret, err := state.GetSecret(path, version)
if err != nil {
// Server-side: Log full context
log.DebugErr("RouteGetSecret", err)
// Client-side: Return only code
if err.Is(sdkErrors.ErrEntityNotFound) {
net.Fail(reqres.SecretGetNotFound, w, http.StatusNotFound)
} else {
net.Fail(reqres.SecretGetInternal, w,
http.StatusInternalServerError)
}
return err
}
// Success response
net.Success(reqres.SecretGetSuccess, w)
return nil
}
Client Interpretation
Clients use error codes programmatically:
// Client code
resp, err := nexus.GetSecret(ctx, path, version)
if err != nil {
switch resp.Err {
case data.ErrNotFound:
// Secret doesn't exist or not authorized
case data.ErrUnauthorized:
// Authentication failed
case data.ErrInternal:
// Server error - retry or escalate
}
}
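For a client that talks to the API without the SDK, the same idea can be sketched directly against the JSON body. The `errorResponse` type, the `action` function, and the returned action strings are hypothetical. The sketch assumes the body has the `{"err": ...}` shape shown earlier, with `null` on success.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorResponse mirrors the assumed wire shape of an API response.
type errorResponse struct {
	Err string `json:"err"`
}

// action decides what the client should do next, based only on the
// error code in the response body.
func action(body []byte) string {
	var resp errorResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "escalate: unreadable response"
	}
	switch resp.Err {
	case "": // a JSON null leaves the field at its zero value
		return "proceed"
	case "NOT_FOUND":
		return "check path and policy"
	case "UNAUTHORIZED":
		return "re-authenticate"
	case "INTERNAL_SERVER_ERROR":
		return "retry with backoff"
	default:
		return "escalate: unknown code"
	}
}

func main() {
	fmt.Println(action([]byte(`{"err":"NOT_FOUND"}`)))
	fmt.Println(action([]byte(`{"err":null}`)))
}
```

The default branch matters: a client that treats unknown codes as a distinct case keeps working if the enumeration grows.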
Future Enhancements
Correlation IDs
If operational experience shows a clear need, we may add request correlation IDs to API responses:
{
"err": "NOT_FOUND",
"requestId": "req_abc123"
}
Benefits:
- Users can reference specific requests when asking operators for help
- Operators can quickly locate relevant log entries
- No security information is leaked (ID is opaque)
- Improves support workflow without compromising security
Current status:
- Not implemented in initial version
- Existing correlation via SPIFFE ID and timestamp is sufficient
- Will reconsider based on operational feedback and support burden
- Can be added non-breaking if needed
Evaluation criteria:
- Frequency of users needing operator assistance for error diagnosis
- Time spent by operators correlating client errors with server logs
- User feedback on debugging difficulty
- Comparison with alternative approaches (timestamp-based correlation, SPIFFE ID filtering)
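If correlation IDs are adopted, the ID should be opaque, meaning it is derived from nothing about the request. A minimal sketch using random bytes follows. The `newRequestID` name and the 8-byte length are assumptions, and the `req_` prefix only mirrors the example above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newRequestID returns an opaque identifier built from random bytes,
// so it encodes no path, identity, or timing information.
func newRequestID() (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return "req_" + hex.EncodeToString(b), nil
}

func main() {
	id, err := newRequestID()
	if err != nil {
		fmt.Println("could not generate request ID:", err)
		return
	}
	fmt.Println(id)
}
```

The server would log this ID alongside the full error context and return it next to the error code, which lets an operator find the log entry from the ID alone.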
References
- OWASP: Information Exposure Through Error Messages
- CWE-209: Information Exposure Through an Error Message
- NIST SP 800-53: Security and Privacy Controls (SI-11: Error Handling)
- HashiCorp Vault API documentation
Related ADRs
- ADR-0028: Use Human-Readable Error Messages in CLI Tools (different audience: humans vs. API clients)
- ADR-0029: Restrict Recovery and Restoration Operations to SPIKE Pilot (related security-critical design decision)