Log Analysis
Overview

Platform services emit structured JSON logs with consistent fields. This guide covers techniques for finding and analyzing log entries to debug issues.
Log Format
Structured JSON Logs

All platform services use structured logging:

```json
{
  "level": "info",
  "timestamp": "2024-01-15T12:00:00.123Z",
  "service": "user-service",
  "correlationId": "abc-123-def-456",
  "message": "Command received",
  "context": {
    "commandType": "CreateUserCommand",
    "userId": "user-123"
  }
}
```

Standard Fields:

- level: Log level (debug, info, warn, error, fatal)
- timestamp: ISO 8601 timestamp
- service: Service name
- correlationId: Request correlation ID
- message: Human-readable message
- context: Additional metadata
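As emitted, each entry is normally a single compact line; the pretty-printed block above is the same entry reformatted for readability. That is why the grep patterns later in this guide match "field":"value" with no spaces. A representative compact line (illustrative, not captured output):

```json
{"level":"info","timestamp":"2024-01-15T12:00:00.123Z","service":"user-service","correlationId":"abc-123-def-456","message":"Command received","context":{"commandType":"CreateUserCommand","userId":"user-123"}}
```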
Viewing Logs
Docker Logs

View all logs:

```bash
docker logs user-service
```

Follow logs (real-time):

```bash
docker logs -f user-service
```

Last N lines:

```bash
docker logs --tail 100 user-service
```

With timestamps:

```bash
docker logs --timestamps user-service
```

Since timestamp:

```bash
docker logs --since 2024-01-15T12:00:00 user-service
```

All containers:

```bash
docker compose logs -f
```

Multiple services:

```bash
docker compose logs -f user-service api-gateway
```

Searching Logs
Basic Grep

Find errors:

```bash
docker logs user-service | grep -i error
```

Find specific message:

```bash
docker logs user-service | grep "Command received"
```

Case-insensitive search:

```bash
docker logs user-service | grep -i "database"
```

Multiple patterns (OR):

```bash
docker logs user-service | grep -E "error|fail|timeout"
```

Exclude pattern:

```bash
docker logs user-service | grep -v "health check"
```

Search by Correlation ID
Most important search pattern:

```bash
CORR_ID="abc-123-def-456"
docker logs user-service | grep "$CORR_ID"
```

With context (lines before/after):

```bash
docker logs user-service | grep -C 5 "$CORR_ID"
```

All services:

```bash
docker ps --format "{{.Names}}" | xargs -I {} sh -c "echo '=== {} ===' && docker logs {} 2>&1 | grep '$CORR_ID'"
```

Search by Log Level
Errors only:

```bash
docker logs user-service | grep '"level":"error"'
```

Warnings and errors:

```bash
docker logs user-service | grep -E '"level":"(error|warn)"'
```

Debug logs:

```bash
docker logs user-service | grep '"level":"debug"'
```

Search by Time Range
Last hour:

```bash
docker logs --since 1h user-service
```

Specific time range:

```bash
# Since specific time
docker logs --since 2024-01-15T12:00:00 user-service

# Between two times
docker logs --since 2024-01-15T12:00:00 --until 2024-01-15T13:00:00 user-service
```

During deployment:

```bash
# Get deployment time
DEPLOY_TIME="2024-01-15T14:30:00"

# Logs after deployment
docker logs --since "$DEPLOY_TIME" user-service | grep -i error
```

Search by Context Fields
Find specific user:

```bash
docker logs user-service | grep '"userId":"user-123"'
```

Find specific command:

```bash
docker logs user-service | grep '"commandType":"CreateUserCommand"'
```

Find handler execution:

```bash
docker logs user-service | grep '"handlerName":"CreateUserHandler"'
```

Advanced Techniques
Count Occurrences

Count errors:

```bash
docker logs user-service | grep -c "error"
```

Count by error type:
```bash
docker logs user-service | grep error | grep -o '"errorType":"[^"]*"' | cut -d'"' -f4 | sort | uniq -c | sort -rn
```

Example output:
```
  42 DatabaseConnectionError
  15 ValidationError
   8 TimeoutError
   3 UnknownError
```

Extract Correlation IDs
Get all correlation IDs from errors:

```bash
docker logs user-service | grep error | grep -o '"correlationId":"[^"]*"' | cut -d'"' -f4
```

Unique correlation IDs:

```bash
docker logs user-service | grep error | grep -o '"correlationId":"[^"]*"' | cut -d'"' -f4 | sort -u
```

Timeline Analysis
Show timestamps for errors:

```bash
docker logs --timestamps user-service | grep error | cut -d' ' -f1-2
```

Group errors by minute:

```bash
docker logs --timestamps user-service | grep error | cut -d':' -f1-2 | uniq -c
```

Example output:

```
   5 2024-01-15 12:00
  12 2024-01-15 12:01   ← Spike!
   3 2024-01-15 12:02
   2 2024-01-15 12:03
```

Parse JSON Logs
Using jq (if logs are valid JSON):

```bash
# Pretty-print logs
docker logs user-service | jq '.'

# Extract specific field
docker logs user-service | jq -r '.message'

# Filter by level
docker logs user-service | jq 'select(.level == "error")'

# Extract errors with context
docker logs user-service | jq 'select(.level == "error") | {timestamp, message, correlationId}'

# Count errors by type
docker logs user-service | jq -r 'select(.level == "error") | .context.errorType' | sort | uniq -c
```

Note: Some logs may not be pure JSON due to Docker formatting. Use grep for reliable searching.
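If the stream contains non-JSON lines, jq will stop at the first line it cannot parse. A minimal workaround, assuming JSON entries start with "{", is to pre-filter before piping to jq:

```bash
# Drop non-JSON lines (startup banners, stack traces) so jq does not abort
docker logs user-service 2>&1 | grep '^{' | jq 'select(.level == "error")'
```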
Export Logs
Save to file:

```bash
docker logs user-service > user-service.log
```

Save with timestamps:

```bash
docker logs --timestamps user-service > user-service.log
```

Save errors only:

```bash
docker logs user-service | grep error > errors.log
```

Save specific correlation ID:

```bash
CORR_ID="abc-123"
docker logs user-service | grep "$CORR_ID" > "request-$CORR_ID.log"
```

Common Search Patterns
Find Handler Errors

```bash
# Find which handler failed
docker logs user-service | grep error | grep handler

# Extract handler names from errors
docker logs user-service | grep error | grep -o '"handlerName":"[^"]*"' | sort | uniq -c
```

Find Database Errors

```bash
# Find database-related errors
docker logs user-service | grep -i "database\|postgres\|query\|connection"

# Connection errors specifically
docker logs user-service | grep -i "connection refused\|connection timeout\|connection lost"
```

Find Authentication Issues

```bash
# Find auth errors
docker logs api-gateway | grep -i "auth\|unauthorized\|forbidden\|jwt"

# Find specific auth failures
docker logs api-gateway | grep '"status":401'
docker logs api-gateway | grep '"status":403'
```

Find Message Bus Issues

```bash
# Find RabbitMQ errors
docker logs user-service | grep -i "rabbitmq\|amqp\|message bus"

# Find message routing errors
docker logs user-service | grep -i "handler not found\|routing"
```

Find Performance Issues
```bash
# Find slow operations
docker logs user-service | grep -i "slow\|timeout\|duration"

# Find operations above a threshold (duration in ms)
# Note: assumes the duration value is the 8th whitespace-separated field; adjust for your log format
docker logs user-service | grep duration | awk '$8 > 1000'
```
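If durations are logged as a structured numeric field, jq gives a more robust filter than counting whitespace fields. A minimal sketch, assuming a context.durationMs field in milliseconds (adjust the path to whatever your services actually emit):

```bash
# Assumption: slow operations log a numeric duration in ms at context.durationMs
docker logs user-service 2>&1 | grep '^{' \
  | jq 'select((.context.durationMs // 0) > 1000) | {timestamp, message, context}'
```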
Debugging Workflows

Workflow 1: Investigate Error Report

User reports: “Error creating user”

Steps:

```bash
# 1. Get correlation ID from user
CORR_ID="abc-123"

# 2. Search all logs for correlation ID
docker ps --format "{{.Names}}" | xargs -I {} sh -c "docker logs {} | grep -q '$CORR_ID' && echo '=== {} ===' && docker logs {} | grep '$CORR_ID'"

# 3. Find error in results
# Example output:
# === user-service ===
# {"level":"error","correlationId":"abc-123","message":"Email already exists"}

# 4. Get more context
docker logs user-service | grep -C 10 "$CORR_ID"

# 5. Resolution: Tell user email already registered
```

Workflow 2: Investigate Service Failure
Symptom: Service crashed

Steps:

```bash
# 1. Check recent errors before crash
docker logs user-service --tail 200 | grep error

# 2. Look for fatal errors
docker logs user-service | grep fatal

# 3. Check for uncaught exceptions
docker logs user-service | grep -i "uncaught\|unhandled"

# 4. Check startup logs
docker logs user-service --tail 500 | head -50

# 5. Look for resource issues
docker logs user-service | grep -i "memory\|out of memory\|heap"
```

Workflow 3: Investigate Performance Degradation
Symptom: Service slow

Steps:

```bash
# 1. Check for errors causing retries
docker logs user-service --since 1h | grep -c error

# 2. Find slow operations
docker logs user-service --since 1h | grep -i "slow\|duration" | tail -20

# 3. Check database query times
docker logs user-service --since 1h | grep -i "query.*duration"

# 4. Check for queue buildup
docker logs user-service | grep -i "queue\|backlog"

# 5. Correlate with Jaeger traces for detailed analysis
```

Workflow 4: Deployment Validation
After deployment:

```bash
# 1. Check service started
docker logs user-service --tail 50 | grep -i "started\|listening"

# 2. Check for startup errors
docker logs user-service --since 5m | grep error

# 3. Check handler discovery
docker logs user-service | grep "Handler discovery"

# 4. Check service registration
docker logs user-service | grep -i "registered"

# 5. Monitor for errors
docker logs -f user-service | grep error
```

Log Level Management
Set Log Level

Environment variable:

```yaml
services:
  user-service:
    environment:
      - LOG_LEVEL=debug # debug, info, warn, error
```

Runtime (if supported):

```bash
# Some services support runtime log level changes
curl -X POST http://localhost:3000/admin/log-level \
  -H "Content-Type: application/json" \
  -d '{"level":"debug"}'
```

Log Levels
DEBUG:
- Most verbose
- All operations logged
- Use for development/troubleshooting
- Not recommended for production (performance impact)
INFO:
- Key operations
- Handler execution
- Service lifecycle
- Default for production
WARN:
- Potential issues
- Degraded performance
- Retries
ERROR:
- Operation failures
- Exceptions
- Require attention
FATAL:
- Service-level failures
- Unrecoverable errors
- Service will likely crash
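A quick way to see how these levels are distributed in a service's recent output (a sketch, assuming the compact JSON format shown earlier, i.e. "level":"info" with no space after the colon):

```bash
# Count log entries per level over the last hour
docker logs --since 1h user-service 2>&1 \
  | grep -o '"level":"[^"]*"' | cut -d'"' -f4 | sort | uniq -c | sort -rn
```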
Best Practices
1. Always Search by Correlation ID First

Most efficient way to find related logs:

```bash
docker logs user-service | grep "$CORR_ID"
```

2. Use Context Lines for Full Picture

```bash
# See surrounding context
docker logs user-service | grep -C 5 "$SEARCH_TERM"
```

3. Save Logs for Analysis

```bash
# Don't lose logs when container restarts
docker logs user-service > analysis.log
```

4. Search Multiple Services
Requests span multiple services:

```bash
# Search all services for correlation ID
for service in api-gateway user-service email-service; do
  echo "=== $service ==="
  docker logs "$service" | grep "$CORR_ID"
done
```

5. Combine with Jaeger
- Find the error in the logs
- Get the correlation ID
- Search Jaeger for a visual timeline
- Use both for the complete picture, as sketched below
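For example, the correlation IDs from recent errors can be pulled out of the logs and then looked up in Jaeger (a sketch; http://localhost:16686 is Jaeger's default UI port and may differ in your deployment):

```bash
# Unique correlation IDs from errors in the last hour
docker logs --since 1h user-service 2>&1 | grep '"level":"error"' \
  | grep -o '"correlationId":"[^"]*"' | cut -d'"' -f4 | sort -u
# Paste an ID into the Jaeger UI search (default: http://localhost:16686)
```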
Common Patterns
Startup Logs

```bash
# Check service started correctly
docker logs user-service | grep -E "started|listening|ready"
```

Handler Discovery

```bash
# Verify handlers found
docker logs user-service | grep "Handler discovery"
```

Message Processing

```bash
# Track message flow
docker logs user-service | grep -E "received|processing|completed"
```

Database Operations

```bash
# Monitor database queries
docker logs user-service | grep -i "query\|database"
```

Quick Reference
Essential Commands

```bash
# View logs
docker logs user-service

# Follow logs
docker logs -f user-service

# Search by correlation ID
docker logs user-service | grep "$CORR_ID"

# Find errors
docker logs user-service | grep error

# Count errors
docker logs user-service | grep -c error

# Export logs
docker logs user-service > service.log

# All services
docker compose logs -f

# Time range
docker logs --since 1h user-service
```

Related Documentation
- Correlation ID Tracking - Using correlation IDs
- Jaeger Tracing - Visual trace analysis
- Telemetry Architecture - Logging architecture
Summary
Effective log analysis:
- Start with correlation ID - Most efficient search
- Use grep patterns - Find specific issues quickly
- Add context - Use the -C flag for surrounding lines
- Export for analysis - Save logs to files
- Combine with Jaeger - Logs + traces = complete picture
Master these techniques and you can debug any issue by following the log trail from client request to backend error.