ADR-0027: Separate Audit Logs from Operational Logs
- Status: accepted
- Date: 2025-11-13
- Tags: Security, Compliance, Observability, Kubernetes
Context and Problem Statement
Currently, SPIKE sends both audit logs and operational logs to stdout without differentiation. This creates challenges for:
- Compliance requirements that mandate immutable audit trails with specific retention policies
- Security teams needing to route audit events to SIEM systems
- Different access controls between audit and operational logs
- Performance optimization, because audit logs differ from operational logs in volume and processing requirements
We need to determine the most effective way to separate audit logs from operational logs while maintaining simplicity and Kubernetes-native practices.
Decision Drivers
- Compliance requirements: Audit logs often need years of retention versus days/weeks for operational logs
- Security isolation: Audit logs require stricter access controls and tamper-evident storage
- Operational simplicity: Solution should work seamlessly in Kubernetes environments
- Performance considerations: Different log volumes and processing requirements
- Integration flexibility: Easy routing to different backends (SIEM versus observability stacks)
Current Implementation
SPIKE currently implements a basic audit logging system that outputs to stdout alongside operational logs. The implementation consists of:
Architecture
- Wrapper-Based Auditing (internal/net/handle.go):
  - HandleRoute() wraps all HTTP handlers with audit logging
  - Automatically creates an AuditEntry for each request
  - Logs two events per request: entry (AuditEnter) and exit (AuditExit)
  - Tracks request duration and completion state (AuditSuccess or AuditErrored)
  - Generates unique trail IDs using crypto.ID()
- Route-Level Auditing (e.g., app/keeper/internal/route/store/contribute.go):
  - Each route handler receives an *journal.AuditEntry parameter
  - journal.AuditRequest() logs specific actions (create, read, delete, list, etc.)
  - Updates the audit entry with component name, path, resource, and action
  - Provides fine-grained operation tracking within the request lifecycle
- Audit Entry Structure (internal/journal/audit.go):
  type AuditEntry struct {
      Component string        // Component that performed the action
      TrailID   string        // Unique trail identifier
      Timestamp time.Time     // When the action occurred
      UserID    string        // User identifier
      Action    AuditAction   // Operation performed
      Path      string        // URL path
      Resource  string        // Resource acted upon (query params)
      SessionID string        // Session identifier
      State     AuditState    // Entry state (created/success/errored)
      Err       string        // Error message if failed
      Duration  time.Duration // Request processing time
  }
- Output Mechanism:
  - journal.Audit() marshals entries to JSON
  - Outputs to stdout via fmt.Println()
  - Crashes with log.FatalLn() if JSON marshaling fails (fail-secure)
Audit Actions
The system defines specific audit actions:
- AuditEnter / AuditExit: Request lifecycle
- AuditCreate: Resource creation
- AuditRead: Resource retrieval
- AuditList: Resource listing
- AuditDelete: Resource deletion
- AuditUndelete: Resource restoration
- AuditFallback: Undefined route access
- AuditBlocked: Blocked/unauthorized access
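One common way to model such a set of actions in Go is a named string type with constants. The sketch below assumes that shape. The string values and the IsLifecycle() helper are hypothetical and may not match the definitions in internal/journal.

```go
package main

import "fmt"

// AuditAction is modeled as a string type here; this is an assumption.
type AuditAction string

// The constant names come from the list above; the values are illustrative.
const (
	AuditEnter    AuditAction = "enter"
	AuditExit     AuditAction = "exit"
	AuditCreate   AuditAction = "create"
	AuditRead     AuditAction = "read"
	AuditList     AuditAction = "list"
	AuditDelete   AuditAction = "delete"
	AuditUndelete AuditAction = "undelete"
	AuditFallback AuditAction = "fallback"
	AuditBlocked  AuditAction = "blocked"
)

// IsLifecycle (hypothetical helper) reports whether the action marks request
// entry or exit rather than an operation on a resource.
func (a AuditAction) IsLifecycle() bool {
	return a == AuditEnter || a == AuditExit
}

func main() {
	for _, a := range []AuditAction{AuditEnter, AuditCreate, AuditBlocked} {
		fmt.Printf("%s lifecycle=%v\n", a, a.IsLifecycle())
	}
}
```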
Current Limitations
- No separation: Audit logs mix with operational logs on stdout
- No tamper detection: Events lack HMAC signatures
- No guaranteed delivery: Uses stdout without delivery confirmation
- Limited metadata: Missing SPIFFE ID, source IP, and other security context
- Single output: Cannot route to multiple destinations simultaneously
Considered Options
- Use stderr for audit logs (stdout for operational)
- Structured logging with type field (both to stdout)
- Dedicated audit sidecar pattern
- Direct audit system integration (separate API calls)
- Pluggable audit devices (Vault-style architecture)
Decision
Implement a two-phase approach:
Phase 1 (Immediate): Use stderr for audit logs while keeping operational logs on stdout, with structured JSON format and clear prefixes.
Phase 2 (Future): Evolve to pluggable audit devices.
Rationale
Phase 1 Justification
- Immediate value: Can be implemented quickly with minimal changes
- Kubernetes-native: Works with existing log collectors (Fluentd/Fluent Bit)
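- Clear separation: stdout and stderr are distinct file descriptors, so the two log types are split at the OS level and the container runtime records the stream name with each line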
Phase 2 Justification
- Enterprise readiness: Matches proven patterns for log collection and routing
- Flexibility: Supports file, socket, syslog, and custom backends
- Guaranteed delivery: Can implement blocking behavior when audit fails
- Compliance: Better suits enterprise audit requirements
Why Not Other Options
- Structured logging only: Doesn’t provide strong enough separation for compliance
- Sidecar pattern: Adds complexity without clear benefits over stderr approach
- Direct integration only: Less flexible, harder to adapt to different environments
Implementation Details
Phase 1 Implementation
Phase 1 builds on the existing audit infrastructure by redirecting audit
output to stderr while enhancing the AuditEntry structure:
// Enhanced audit entry for Phase 1
type AuditEntry struct {
    // Existing fields (keep current structure)
    Component string
    TrailID   string
    Timestamp time.Time
    UserID    string
    Action    AuditAction
    Path      string
    Resource  string
    SessionID string
    State     AuditState
    Err       string
    Duration  time.Duration
    // New fields for Phase 1
    SPIFFEID  string `json:"spiffe_id,omitempty"`
    SourceIP  string `json:"src_ip,omitempty"`
    Signature string `json:"sig,omitempty"` // HMAC for tamper detection
}
// Modified audit logger for Phase 1
func Audit(entry AuditEntry) {
    audit := AuditLogLine{
        Timestamp:  time.Now(),
        AuditEntry: entry,
    }
    body, err := json.Marshal(audit)
    if err != nil {
        logger.FatalLn("Audit",
            "message", "Problem marshalling audit entry",
            "err", err.Error())
        return
    }
    // Change from fmt.Println() to stderr output
    fmt.Fprintf(os.Stderr, "%s\n", string(body))
}
// Operational logs continue to stdout via logger.Log()
Key Changes from Current Implementation:
- Change fmt.Println() to fmt.Fprintf(os.Stderr, ...) in journal.Audit()
- Add SPIFFE ID field (extract from request context in HandleRoute())
- Add source IP field (extract from http.Request)
- Optional HMAC signing for tamper detection
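The optional HMAC signing could look something like the sketch below. It assumes HMAC-SHA256 over the entry's JSON encoding with the signature field cleared, hex-encoded into the sig field. The ADR does not fix the algorithm, the canonical encoding, or key management, so sign(), verify(), and the trimmed AuditEntry are illustrative only.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditEntry is a trimmed, hypothetical version of the Phase 1 entry.
type AuditEntry struct {
	TrailID   string `json:"trail_id"`
	Action    string `json:"action"`
	SPIFFEID  string `json:"spiffe_id,omitempty"`
	SourceIP  string `json:"src_ip,omitempty"`
	Signature string `json:"sig,omitempty"`
}

// sign returns the hex HMAC-SHA256 of the entry's JSON encoding, computed
// with Signature cleared so signing and verifying cover the same bytes.
// The entry is passed by value, so the caller's copy is not modified.
func sign(e AuditEntry, key []byte) (string, error) {
	e.Signature = ""
	body, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil)), nil
}

// verify recomputes the signature and compares it in constant time.
func verify(e AuditEntry, key []byte) bool {
	want, err := sign(e, key)
	if err != nil {
		return false
	}
	return hmac.Equal([]byte(want), []byte(e.Signature))
}

func main() {
	key := []byte("example-key-do-not-use") // placeholder; real keys need proper management
	e := AuditEntry{TrailID: "t-1", Action: "read", SourceIP: "10.0.0.7"}

	sig, err := sign(e, key)
	if err != nil {
		panic(err)
	}
	e.Signature = sig
	fmt.Println("untouched entry verifies:", verify(e, key))

	e.Action = "delete" // simulate tampering after signing
	fmt.Println("tampered entry verifies:", verify(e, key))
}
```

A per-entry HMAC detects modification of individual records but not deletion of whole records. Detecting gaps would need something extra, such as sequence numbers or chaining each signature to the previous one.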
Phase 2 Architecture
Phase 2 extends the audit system with pluggable devices while maintaining
backward compatibility with the existing journal.Audit() interface:
// Pluggable audit device interface (future)
type AuditDevice interface {
    Write(entry AuditEntry) error
    Close() error
}
type AuditManager struct {
    devices  []AuditDevice
    blocking bool // If true, operations fail when audit fails
    hmacKey  []byte
}
// Modified Audit function to support multiple devices
func Audit(entry AuditEntry) {
    // Existing stderr output (backward compatibility)
    audit := AuditLogLine{
        Timestamp:  time.Now(),
        AuditEntry: entry,
    }
    body, err := json.Marshal(audit)
    if err != nil {
        logger.FatalLn("Audit",
            "message", "Problem marshalling audit entry",
            "err", err.Error())
        return
    }
    fmt.Fprintf(os.Stderr, "%s\n", string(body))
    // New: Write to pluggable devices
    if auditManager != nil {
        for _, device := range auditManager.devices {
            if err := device.Write(entry); err != nil {
                if auditManager.blocking {
                    logger.FatalLn("Audit",
                        "message", "Critical audit device failure",
                        "err", err.Error())
                }
                logger.Log().Warn("Audit",
                    "message", "Audit device write failed",
                    "err", err.Error())
            }
        }
    }
}
// Device implementations
type FileAuditDevice struct { /* ... */ }
type SocketAuditDevice struct { /* ... */ }
type SyslogAuditDevice struct { /* ... */ }
Migration Path
Current → Phase 1:
- Modify internal/journal/audit.go:
  - Change fmt.Println() to fmt.Fprintf(os.Stderr, ...)
  - Add SPIFFE ID and SourceIP fields to AuditEntry
- Modify internal/net/handle.go:
  - Extract SPIFFE ID from request context
  - Extract source IP from http.Request.RemoteAddr
  - Populate new fields in AuditEntry
- Update Kubernetes log collectors to route stderr separately
- Optional: Implement HMAC signing for tamper detection
Phase 1 → Phase 2:
- Create AuditDevice interface and implementations
- Add AuditManager initialization in service startup code
- Modify journal.Audit() to write to configured devices
- Add configuration for audit device selection and options
- Maintain stderr output for backward compatibility
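One possible shape for the blocking versus non-blocking behavior is sketched below. Unlike the Audit() draft earlier in this ADR, which terminates the process on a device failure in blocking mode, this variant returns an error so the caller can reject the operation. That is a design alternative rather than the decided behavior. The Write() method on AuditManager, the warn callback, memDevice, and failDevice are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// AuditEntry is a trimmed, hypothetical stand-in for journal.AuditEntry.
type AuditEntry struct {
	TrailID string
	Action  string
}

// AuditDevice matches the interface proposed for Phase 2.
type AuditDevice interface {
	Write(entry AuditEntry) error
	Close() error
}

// AuditManager fans entries out to all configured devices.
type AuditManager struct {
	devices  []AuditDevice
	blocking bool
}

// Write attempts every device. In blocking mode any failure is returned so
// the caller can fail the operation; otherwise the failure is passed to warn
// and Write reports success.
func (m *AuditManager) Write(entry AuditEntry, warn func(error)) error {
	failed := 0
	var first error
	for _, d := range m.devices {
		if err := d.Write(entry); err != nil {
			failed++
			if first == nil {
				first = err
			}
		}
	}
	if failed == 0 {
		return nil
	}
	err := fmt.Errorf("%d of %d audit devices failed: %w", failed, len(m.devices), first)
	if m.blocking {
		return err
	}
	warn(err)
	return nil
}

// memDevice keeps entries in memory (for the example only).
type memDevice struct{ entries []AuditEntry }

func (d *memDevice) Write(e AuditEntry) error { d.entries = append(d.entries, e); return nil }
func (d *memDevice) Close() error             { return nil }

// failDevice always fails (for the example only).
type failDevice struct{}

func (failDevice) Write(AuditEntry) error { return errors.New("device unavailable") }
func (failDevice) Close() error           { return nil }

func main() {
	warn := func(err error) { fmt.Println("warning:", err) }
	devices := []AuditDevice{failDevice{}, &memDevice{}}

	lenient := &AuditManager{devices: devices}
	fmt.Println("non-blocking result:", lenient.Write(AuditEntry{TrailID: "t-1"}, warn))

	strict := &AuditManager{devices: devices, blocking: true}
	fmt.Println("blocking result:", strict.Write(AuditEntry{TrailID: "t-2"}, warn))
}
```

Attempting every device before reporting the failure means a healthy device still receives the record even when another one is down.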
Sample Kubernetes Configuration
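# Fluentd routing based on stream (Fluent Bit needs an equivalent configuration in its own syntax)
<source>
  @type tail
  path /var/log/containers/*.log
  tag kube.*
  <parse>
    @type multi_format
    <pattern>
      format regexp
      expression /^(?<time>.+) (?<stream>stdout|stderr) (?<log>.*)$/
    </pattern>
  </parse>
</source>
# Label each record by the stream it was read from
<filter kube.**>
  @type record_transformer
  enable_ruby true
  <record>
    log_type ${record["stream"] == "stderr" ? "audit" : "operational"}
  </record>
</filter>
# Retag by stream so audit records can be matched separately
# (requires fluent-plugin-rewrite-tag-filter)
<match kube.**>
  @type rewrite_tag_filter
  <rule>
    key stream
    pattern /^stderr$/
    tag audit.${tag}
  </rule>
  <rule>
    key stream
    pattern /^stdout$/
    tag operational.${tag}
  </rule>
</match>
<match audit.**>
  @type elasticsearch
  index_name audit-logs
  # Immutable index settings
</match>
# operational.** would be matched to the observability backend here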
Implementation Status
What Works Today (Current)
- ✅ Structured audit logging with AuditEntry and AuditLogLine
- ✅ Automatic request lifecycle tracking (enter/exit)
- ✅ Wrapper-based auditing via HandleRoute()
- ✅ Route-level audit actions (create, read, delete, list, etc.)
- ✅ Unique trail IDs for request correlation
- ✅ Request duration tracking
- ✅ Success/error state tracking
- ✅ JSON-formatted output
- ✅ Fail-secure behavior (crashes on marshal failure)
What Needs Implementation
- ❌ Phase 1:
  - Separation of audit logs to stderr
  - SPIFFE ID capture in audit entries
  - Source IP capture in audit entries
  - HMAC signatures for tamper detection
  - Kubernetes log routing configuration examples
- ❌ Phase 2:
  - Pluggable audit device interface
  - File, socket, and syslog device implementations
  - Blocking/non-blocking device behavior configuration
  - Multi-destination audit delivery
  - Guaranteed delivery mechanisms
Consequences
Positive
- Existing foundation: Current implementation provides solid base for enhancement
- Proven patterns: Wrapper-based and route-level auditing work well
- Immediate compliance improvement: Phase 1 separation enables better audit trail management
- Simple migration path: Changes build incrementally on existing code
- Kubernetes-friendly: Works with existing tooling
- Future-proof: Phase 2 provides enterprise-grade capabilities
- SPIFFE integration: Natural fit with existing SPIFFE-based auth
- Tamper detection: Optional HMAC signatures on audit events
Negative
- Current mixing: Audit and operational logs currently indistinguishable on stdout
- Missing context: SPIFFE ID and source IP not currently captured
- Two-phase complexity: Requires planning for migration
- Stderr convention: Some tools expect only errors on stderr
- Configuration overhead: More complex log routing rules
- Potential performance impact: Audit devices could block operations
References
External
- SPIFFE Audit Considerations: https://spiffe.io/docs/latest/planning/audit/
- Kubernetes Logging Architecture: https://kubernetes.io/docs/concepts/cluster-administration/logging/
Current Implementation Code
- internal/journal/audit.go - Core audit entry structure and logging
- internal/net/handle.go - HTTP handler wrapper with automatic auditing
- app/keeper/internal/route/store/contribute.go - Example of route-level auditing
- app/nexus/internal/route/secret/delete.go - Example of operation-specific audit actions
- app/nexus/internal/route/acl/policy/create.go - Example of policy operation auditing