
ADR-0027: Separate Audit Logs from Operational Logs

  • Status: accepted
  • Date: 2025-11-13
  • Tags: Security, Compliance, Observability, Kubernetes

Context and Problem Statement

Currently, SPIKE sends both audit logs and operational logs to stdout without differentiation. This creates challenges for:

  • Compliance requirements that mandate immutable audit trails with specific retention policies
  • Security teams needing to route audit events to SIEM systems
  • Access control, since audit and operational logs need different permissions
  • Performance tuning, since audit logs differ from operational logs in volume and processing needs

We need to determine the most effective way to separate audit logs from operational logs while maintaining simplicity and Kubernetes-native practices.

Decision Drivers

  • Compliance requirements: Audit logs often need years of retention versus days/weeks for operational logs
  • Security isolation: Audit logs require stricter access controls and tamper-evident storage
  • Operational simplicity: Solution should work seamlessly in Kubernetes environments
  • Performance considerations: Different log volumes and processing requirements
  • Integration flexibility: Easy routing to different backends (SIEM versus observability stacks)

Current Implementation

SPIKE currently implements a basic audit logging system that outputs to stdout alongside operational logs. The implementation consists of:

Architecture

  1. Wrapper-Based Auditing (internal/net/handle.go):

    • HandleRoute() wraps all HTTP handlers with audit logging
    • Automatically creates an AuditEntry for each request
    • Logs two events per request: entry (AuditEnter) and exit (AuditExit)
    • Tracks request duration and completion state (AuditSuccess or AuditErrored)
    • Generates unique trail IDs using crypto.ID()
  2. Route-Level Auditing (e.g., app/keeper/internal/route/store/contribute.go):

    • Each route handler receives a *journal.AuditEntry parameter
    • journal.AuditRequest() logs specific actions (create, read, delete, list, etc.)
    • Updates the audit entry with component name, path, resource, and action
    • Provides fine-grained operation tracking within the request lifecycle
  3. Audit Entry Structure (internal/journal/audit.go):

    type AuditEntry struct {
        Component  string        // Component that performed the action
        TrailID    string        // Unique trail identifier
        Timestamp  time.Time     // When the action occurred
        UserID     string        // User identifier
        Action     AuditAction   // Operation performed
        Path       string        // URL path
        Resource   string        // Resource acted upon (query params)
        SessionID  string        // Session identifier
        State      AuditState    // Entry state (created/success/errored)
        Err        string        // Error message if failed
        Duration   time.Duration // Request processing time
    }
    
  4. Output Mechanism:

    • journal.Audit() marshals entries to JSON
    • Outputs to stdout via fmt.Println()
    • Crashes with log.FatalLn() if JSON marshaling fails (fail-secure)
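
The wrapper-based flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual HandleRoute() code: the auditEvent type, the wrapWithAudit function, the sink callback, and the string values for actions and states are all hypothetical stand-ins chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// auditEvent is a hypothetical, trimmed-down stand-in for AuditEntry.
type auditEvent struct {
	TrailID  string
	Action   string // "enter" or "exit" (illustrative values)
	State    string // "created", "success", or "errored"
	Path     string
	Err      string
	Duration time.Duration
}

// auditedHandler reports failure through its return value so the
// wrapper can record the outcome.
type auditedHandler func(w http.ResponseWriter, r *http.Request) error

// wrapWithAudit emits one event before the handler runs and one after,
// recording the outcome and the elapsed time on the exit event.
func wrapWithAudit(trailID string, h auditedHandler, sink func(auditEvent)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		ev := auditEvent{TrailID: trailID, Action: "enter", State: "created", Path: r.URL.Path}
		sink(ev)

		err := h(w, r)

		ev.Action = "exit"
		ev.State = "success"
		if err != nil {
			ev.State = "errored"
			ev.Err = err.Error()
		}
		ev.Duration = time.Since(start)
		sink(ev)
	}
}

// runDemo drives one failing request through the wrapper and returns
// the "action/state" pairs that were emitted, in order.
func runDemo() []string {
	var seen []string
	sink := func(ev auditEvent) { seen = append(seen, ev.Action+"/"+ev.State) }

	h := wrapWithAudit("trail-1", func(w http.ResponseWriter, r *http.Request) error {
		http.Error(w, "bad request", http.StatusBadRequest)
		return errors.New("bad request")
	}, sink)

	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/v1/demo", nil))
	return seen
}

func main() {
	fmt.Println(runDemo())
}
```

Passing the sink as a parameter keeps the sketch testable without touching stdout; a real wrapper would presumably call journal.Audit() instead.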

Audit Actions

The system defines specific audit actions:

  • AuditEnter / AuditExit: Request lifecycle
  • AuditCreate: Resource creation
  • AuditRead: Resource retrieval
  • AuditList: Resource listing
  • AuditDelete: Resource deletion
  • AuditUndelete: Resource restoration
  • AuditFallback: Undefined route access
  • AuditBlocked: Blocked/unauthorized access

Current Limitations

  1. No separation: Audit logs mix with operational logs on stdout
  2. No tamper detection: Events lack HMAC signatures
  3. No guaranteed delivery: Uses stdout without delivery confirmation
  4. Limited metadata: Missing SPIFFE ID, source IP, and other security context
  5. Single output: Cannot route to multiple destinations simultaneously

Considered Options

  1. Use stderr for audit logs (stdout for operational)
  2. Structured logging with type field (both to stdout)
  3. Dedicated audit sidecar pattern
  4. Direct audit system integration (separate API calls)
  5. Pluggable audit devices (Vault-style architecture)

Decision

Implement a two-phase approach:

Phase 1 (Immediate): Use stderr for audit logs while keeping operational logs on stdout, with structured JSON format and clear prefixes.

Phase 2 (Future): Evolve to pluggable audit devices.

Rationale

Phase 1 Justification

  • Immediate value: Can be implemented quickly with minimal changes
  • Kubernetes-native: Works with existing log collectors (Fluentd/Fluent Bit)
  • Clear separation: stdout and stderr are distinct file descriptors, so log collectors can tell the two streams apart without parsing message content

Phase 2 Justification

  • Enterprise readiness: Matches proven patterns for log collection and routing
  • Flexibility: Supports file, socket, syslog, and custom backends
  • Guaranteed delivery: Can implement blocking behavior when audit fails
  • Compliance: Better suits enterprise audit requirements

Why Not Other Options

  • Structured logging only: Doesn’t provide strong enough separation for compliance
  • Sidecar pattern: Adds complexity without clear benefits over the stderr approach
  • Direct integration only: Less flexible, harder to adapt to different environments

Implementation Details

Phase 1 Implementation

Phase 1 builds on the existing audit infrastructure by redirecting audit output to stderr while enhancing the AuditEntry structure:

// Enhanced audit entry for Phase 1
type AuditEntry struct {
    // Existing fields (keep current structure)
    Component  string
    TrailID    string
    Timestamp  time.Time
    UserID     string
    Action     AuditAction
    Path       string
    Resource   string
    SessionID  string
    State      AuditState
    Err        string
    Duration   time.Duration

    // New fields for Phase 1
    SPIFFEID  string `json:"spiffe_id,omitempty"`
    SourceIP  string `json:"src_ip,omitempty"`
    Signature string `json:"sig,omitempty"` // HMAC for tamper detection
}

// Modified audit logger for Phase 1
func Audit(entry AuditEntry) {
    audit := AuditLogLine{
        Timestamp:  time.Now(),
        AuditEntry: entry,
    }

    body, err := json.Marshal(audit)
    if err != nil {
        logger.FatalLn("Audit",
            "message", "Problem marshalling audit entry",
            "err", err.Error())
        return
    }

    // Change from fmt.Println() to stderr output
    fmt.Fprintf(os.Stderr, "%s\n", string(body))
}

// Operational logs continue to stdout via logger.Log()

Key Changes from Current Implementation:

  1. Change fmt.Println() to fmt.Fprintf(os.Stderr, ...) in journal.Audit()
  2. Add SPIFFE ID field (extract from request context in HandleRoute())
  3. Add source IP field (extract from http.Request)
  4. Optional HMAC signing for tamper detection
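
One way the optional HMAC signing could work is sketched below, under stated assumptions: the signedEntry type and its JSON tags are a reduced, hypothetical version of AuditEntry; the signature is HMAC-SHA256 over the JSON encoding of the entry with the signature field cleared; and key management is out of scope. None of this is confirmed by the current SPIKE code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// signedEntry is a hypothetical, reduced audit entry used only for this sketch.
type signedEntry struct {
	TrailID   string `json:"trail_id"`
	Action    string `json:"action"`
	Path      string `json:"path"`
	Signature string `json:"sig,omitempty"`
}

// mac returns the hex-encoded HMAC-SHA256 of the entry's JSON encoding,
// computed with the Signature field cleared so the value is reproducible.
func mac(key []byte, e signedEntry) (string, error) {
	e.Signature = "" // e is a copy; the caller's value is untouched
	body, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	h := hmac.New(sha256.New, key)
	h.Write(body)
	return hex.EncodeToString(h.Sum(nil)), nil
}

// sign returns a copy of the entry with its Signature field populated.
func sign(key []byte, e signedEntry) (signedEntry, error) {
	sig, err := mac(key, e)
	if err != nil {
		return e, err
	}
	e.Signature = sig
	return e, nil
}

// verify recomputes the MAC and compares it in constant time.
func verify(key []byte, e signedEntry) bool {
	want, err := mac(key, e)
	if err != nil {
		return false
	}
	return hmac.Equal([]byte(want), []byte(e.Signature))
}

func main() {
	key := []byte("demo-key-not-for-production")
	e, err := sign(key, signedEntry{TrailID: "t1", Action: "read", Path: "/v1/secrets"})
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", verify(key, e))

	e.Path = "/v1/other" // simulate tampering after signing
	fmt.Println("valid after tampering:", verify(key, e))
}
```

A per-entry MAC like this detects modification of a single event but not deletion or reordering of events; chaining each signature to the previous one would be a separate design decision.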

Phase 2 Architecture

Phase 2 extends the audit system with pluggable devices while maintaining backward compatibility with the existing journal.Audit() interface:

// Pluggable audit device interface (future)
type AuditDevice interface {
    Write(entry AuditEntry) error
    Close() error
}

type AuditManager struct {
    devices  []AuditDevice
    blocking bool // If true, operations fail when audit fails
    hmacKey  []byte
}

// Modified Audit function to support multiple devices
func Audit(entry AuditEntry) {
    // Existing stderr output (backward compatibility)
    audit := AuditLogLine{
        Timestamp:  time.Now(),
        AuditEntry: entry,
    }

    body, err := json.Marshal(audit)
    if err != nil {
        logger.FatalLn("Audit",
            "message", "Problem marshalling audit entry",
            "err", err.Error())
        return
    }

    fmt.Fprintf(os.Stderr, "%s\n", string(body))

    // New: Write to pluggable devices
    if auditManager != nil {
        for _, device := range auditManager.devices {
            if err := device.Write(entry); err != nil {
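                if auditManager.blocking {
                    // Fail-secure: FatalLn terminates the process, so the
                    // warning below is reached only in non-blocking mode.
                    logger.FatalLn("Audit",
                        "message", "Critical audit device failure",
                        "err", err.Error())
                }
                logger.Log().Warn("Audit",
                    "message", "Audit device write failed",
                    "err", err.Error())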
            }
        }
    }
}

// Device implementations
type FileAuditDevice struct { /* ... */ }
type SocketAuditDevice struct { /* ... */ }
type SyslogAuditDevice struct { /* ... */ }
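
As a rough illustration of what a device implementation might look like, the sketch below writes one JSON object per line to any io.Writer, which could back a file or socket device. The entry, device, writerDevice, brokenDevice, and writeAll names are hypothetical and use a reduced entry type. Unlike the Audit() function above, writeAll returns the joined errors (errors.Join, Go 1.20+) instead of deciding how to react, which is only one possible design.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"sync"
)

// entry is a hypothetical, reduced stand-in for AuditEntry.
type entry struct {
	TrailID string `json:"trail_id"`
	Action  string `json:"action"`
}

// device mirrors the shape of the AuditDevice interface for the reduced entry type.
type device interface {
	Write(e entry) error
	Close() error
}

// writerDevice writes one JSON object per line to the wrapped writer.
// The mutex keeps concurrent writes from interleaving.
type writerDevice struct {
	mu sync.Mutex
	w  io.Writer
}

func (d *writerDevice) Write(e entry) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	return json.NewEncoder(d.w).Encode(e) // Encode appends a newline
}

// Close closes the underlying writer if it supports closing.
func (d *writerDevice) Close() error {
	if c, ok := d.w.(io.Closer); ok {
		return c.Close()
	}
	return nil
}

// brokenDevice always fails; it simulates an unavailable backend.
type brokenDevice struct{}

func (brokenDevice) Write(entry) error { return errors.New("device unavailable") }
func (brokenDevice) Close() error      { return nil }

// writeAll sends the entry to every device and joins any failures,
// so healthy devices still receive the entry when another one fails.
func writeAll(devices []device, e entry) error {
	var errs []error
	for _, d := range devices {
		if err := d.Write(e); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// demo writes one entry to a healthy and a failing device and returns
// what the healthy device received plus whether any device failed.
func demo() (string, bool) {
	var buf bytes.Buffer
	devices := []device{&writerDevice{w: &buf}, brokenDevice{}}
	err := writeAll(devices, entry{TrailID: "t1", Action: "read"})
	return buf.String(), err != nil
}

func main() {
	out, failed := demo()
	fmt.Print(out)
	fmt.Println("any device failed:", failed)
}
```

Returning the error leaves the blocking versus non-blocking decision to the caller, which would map onto the blocking flag of AuditManager.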

Migration Path

Current → Phase 1:

  1. Modify internal/journal/audit.go:
    • Change fmt.Println() to fmt.Fprintf(os.Stderr, ...)
    • Add SPIFFE ID and SourceIP fields to AuditEntry
  2. Modify internal/net/handle.go:
    • Extract SPIFFE ID from request context
    • Extract source IP from http.Request.RemoteAddr
    • Populate new fields in AuditEntry
  3. Update Kubernetes log collectors to route stderr separately
  4. Optional: Implement HMAC signing for tamper detection
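
The field extraction in step 2 might look roughly like the sketch below. The sourceIP and spiffeIDFromRequest helpers are hypothetical names. Reading the SPIFFE ID from the URI SAN of the client's mTLS leaf certificate is an assumption made for this example; SPIKE may instead obtain it from the request context through its SPIFFE library.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"net/http"
	"net/url"
)

// sourceIP strips the port from a RemoteAddr-style "host:port" string.
// If the value cannot be split, it is returned unchanged.
func sourceIP(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// spiffeIDFromRequest returns the first spiffe:// URI SAN found on the
// client's leaf certificate, or "" when the request carries none.
// ASSUMPTION: the peer authenticated with an X.509 SVID over mTLS.
func spiffeIDFromRequest(r *http.Request) string {
	if r.TLS == nil || len(r.TLS.PeerCertificates) == 0 {
		return ""
	}
	for _, u := range r.TLS.PeerCertificates[0].URIs {
		if u.Scheme == "spiffe" {
			return u.String()
		}
	}
	return ""
}

// demoSPIFFEID builds a request with a fake TLS state to exercise the helper.
func demoSPIFFEID() string {
	id, err := url.Parse("spiffe://example.org/workload")
	if err != nil {
		return ""
	}
	req := &http.Request{
		RemoteAddr: "10.0.0.7:52344",
		TLS: &tls.ConnectionState{
			PeerCertificates: []*x509.Certificate{{URIs: []*url.URL{id}}},
		},
	}
	return spiffeIDFromRequest(req)
}

func main() {
	fmt.Println(sourceIP("10.0.0.7:52344"))
	fmt.Println(sourceIP("[::1]:8443"))
	fmt.Println(demoSPIFFEID())
}
```

Note that RemoteAddr is the address of the immediate peer; behind a proxy or load balancer it will not be the original client, and trusting forwarded headers in an audit trail would need its own review.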

Phase 1 → Phase 2:

  1. Create AuditDevice interface and implementations
  2. Add AuditManager initialization in service startup code
  3. Modify journal.Audit() to write to configured devices
  4. Add configuration for audit device selection and options
  5. Maintain stderr output for backward compatibility

Sample Kubernetes Configuration

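# Fluentd routing based on stream (Fluent Bit needs an equivalent in its own syntax)
<source>
  @type tail
  path /var/log/containers/*.log
  tag kubernetes.*
  <parse>
    @type multi_format
    <pattern>
      format regexp
      # CRI log line: <time> <stream> <F|P> <message>
      expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
    </pattern>
  </parse>
</source>

# Label each record using the parsed stream field
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    log_type ${record["stream"] == "stderr" ? "audit" : "operational"}
  </record>
</filter>

# Re-tag records so audit events can be matched separately
# (requires fluent-plugin-rewrite-tag-filter)
<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    key log_type
    pattern /^audit$/
    tag audit.${tag}
  </rule>
  <rule>
    key log_type
    pattern /^operational$/
    tag operational.${tag}
  </rule>
</match>

<match audit.**>
  @type elasticsearch
  index_name audit-logs
  # Immutable index settings
</match>

# Add a <match operational.**> block for the observability backend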

Implementation Status

What Works Today (Current)

  • ✅ Structured audit logging with AuditEntry and AuditLogLine
  • ✅ Automatic request lifecycle tracking (enter/exit)
  • ✅ Wrapper-based auditing via HandleRoute()
  • ✅ Route-level audit actions (create, read, delete, list, etc.)
  • ✅ Unique trail IDs for request correlation
  • ✅ Request duration tracking
  • ✅ Success/error state tracking
  • ✅ JSON-formatted output
  • ✅ Fail-secure behavior (crashes on marshal failure)

What Needs Implementation

  • Phase 1:

    • Separation of audit logs to stderr
    • SPIFFE ID capture in audit entries
    • Source IP capture in audit entries
    • HMAC signatures for tamper detection
    • Kubernetes log routing configuration examples
  • Phase 2:

    • Pluggable audit device interface
    • File, socket, and syslog device implementations
    • Blocking/non-blocking device behavior configuration
    • Multi-destination audit delivery
    • Guaranteed delivery mechanisms

Consequences

Positive

  • Existing foundation: Current implementation provides solid base for enhancement
  • Proven patterns: Wrapper-based and route-level auditing work well
  • Immediate compliance improvement: Phase 1 separation enables better audit trail management
  • Simple migration path: Changes build incrementally on existing code
  • Kubernetes-friendly: Works with existing tooling
  • Future-proof: Phase 2 provides enterprise-grade capabilities
  • SPIFFE integration: Natural fit with existing SPIFFE-based auth
  • Tamper detection: Optional HMAC signatures on audit events

Negative

  • Current mixing: Audit and operational logs remain indistinguishable on stdout until Phase 1 ships
  • Missing context: SPIFFE ID and source IP are not captured until Phase 1 ships
  • Two-phase complexity: Requires planning for migration
  • Stderr convention: Some tools expect only errors on stderr
  • Configuration overhead: More complex log routing rules
  • Potential performance impact: Audit devices could block operations

References

External

  • SPIFFE Audit Considerations: https://spiffe.io/docs/latest/planning/audit/
  • Kubernetes Logging Architecture: https://kubernetes.io/docs/concepts/cluster-administration/logging/

Current Implementation Code

  • internal/journal/audit.go - Core audit entry structure and logging
  • internal/net/handle.go - HTTP handler wrapper with automatic auditing
  • app/keeper/internal/route/store/contribute.go - Example of route-level auditing
  • app/nexus/internal/route/secret/delete.go - Example of operation-specific audit actions
  • app/nexus/internal/route/acl/policy/create.go - Example of policy operation auditing