Deployment Guide
Overview
The banyan-core platform deploys as a Docker Compose stack with all services containerized. This guide covers local development, staging, and production deployment strategies.
Local Development
Quick Start
```bash
# Clone repository
git clone https://github.com/your-org/banyan-core.git
cd banyan-core

# Start all services
docker compose up

# Verify services are healthy
docker compose ps
```
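Once the containers report healthy, a quick smoke test of the gateway (this assumes the API Gateway publishes port 3003 on the host, as its health check later in this guide suggests):

```bash
# Fails with a non-zero exit code if the gateway is not serving
curl -f http://localhost:3003/health
```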
Development Workflow

```bash
# Start infrastructure only
docker compose up postgres rabbitmq redis jaeger grafana elasticsearch

# Run services locally for debugging
cd platform/services/api-gateway
pnpm run dev

# Watch mode with hot-reload
pnpm run dev
```

Stopping Services
```bash
# Stop all services
docker compose down

# Stop and remove volumes (reset data)
docker compose down -v

# Stop specific service
docker compose stop api-gateway
```

Production Deployment
Prerequisites
- Docker Engine: 20.10+
- Docker Compose: 2.0+
- CPU: 4+ cores recommended
- Memory: 8GB+ RAM recommended
- Disk: 50GB+ available space
Environment Configuration
Create a production .env file:
```bash
# Production environment
NODE_ENV=production

# Database (use secure credentials)
DATABASE_HOST=postgres
DATABASE_NAME=eventstore
DATABASE_USER=prod_user
DATABASE_PASSWORD=<SECURE_PASSWORD>
DATABASE_PORT=5432

# Message Bus
RABBITMQ_URL=amqp://prod_user:<SECURE_PASSWORD>@rabbitmq:5672

# Cache
REDIS_URL=redis://:<SECURE_PASSWORD>@redis:6379

# JWT (use strong secret)
JWT_SECRET=<SECURE_RANDOM_SECRET>
JWT_EXPIRATION=300

# Auth0 (if using external auth)
AUTH0_DOMAIN=your-tenant.us.auth0.com
AUTH0_AUDIENCE=https://api.your-domain.com
AUTH0_ISSUER=https://your-tenant.us.auth0.com/

# Telemetry
JAEGER_ENDPOINT=http://jaeger:4318/v1/traces

# Admin email
ADMIN_EMAIL=admin@your-domain.com
```
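The placeholders above call for strong random values; one simple way to generate them:

```bash
# Generate strong random secrets
openssl rand -base64 48   # e.g. for JWT_SECRET
openssl rand -hex 32      # e.g. for database/broker passwords
```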
Section titled “Security Hardening”1. Change Default Credentials
```yaml
services:
  rabbitmq:
    environment:
      RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER}
      RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}

  postgres:
    environment:
      POSTGRES_USER: ${DATABASE_USER}
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}

  redis:
    command: redis-server --requirepass ${REDIS_PASSWORD}
```

2. Enable TLS
```yaml
services:
  api-gateway:
    environment:
      ENABLE_HTTPS: true
      SSL_CERT_PATH: /certs/server.crt
      SSL_KEY_PATH: /certs/server.key
    volumes:
      - ./certs:/certs:ro
```
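For staging or local testing, a self-signed certificate can be generated with openssl; production should use a CA-issued certificate (e.g. Let's Encrypt):

```bash
# Self-signed certificate for staging/testing only
mkdir -p certs
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout certs/server.key -out certs/server.crt \
  -subj "/CN=api.your-domain.com"
```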
3. Network Isolation

```yaml
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true # No external access

services:
  api-gateway:
    networks:
      - frontend # Exposed
      - backend

  postgres:
    networks:
      - backend # Internal only
```

Resource Limits
Configure resource limits for production:
```yaml
services:
  api-gateway:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G

  postgres:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

  rabbitmq:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G

  elasticsearch:
    environment:
      ES_JAVA_OPTS: "-Xms2g -Xmx2g"
    deploy:
      resources:
        limits:
          memory: 4G
```
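To confirm the limits took effect on a running stack:

```bash
# The MEM USAGE / LIMIT column should reflect the configured caps
docker stats --no-stream
```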
Health Checks

Ensure all services have health checks:
```yaml
services:
  api-gateway:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3003/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DATABASE_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5

  rabbitmq:
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "ping"]
      interval: 30s
      timeout: 10s
      retries: 5
```
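When a container keeps flapping, the recorded health-check results can be inspected directly:

```bash
# Show the last health probes (status, exit codes, output) for a service
docker inspect --format '{{json .State.Health}}' $(docker compose ps -q api-gateway)
```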
Starting Production

```bash
# Pull latest images
docker compose pull

# Start in detached mode
docker compose up -d

# Check service health
docker compose ps

# View logs
docker compose logs -f

# Follow specific service
docker compose logs -f api-gateway
```

Horizontal Scaling
Scaling Services
Scale specific services for load:
```bash
# Scale API Gateway to 3 instances
docker compose up -d --scale api-gateway=3

# Scale business service
docker compose up -d --scale user-service=5
```

Note that a service can only be scaled this way if it does not bind a fixed host port; otherwise the additional instances fail with a port conflict. The load-balancing setup below uses expose instead of ports for exactly this reason.

Load Balancing
Use a reverse proxy for load balancing:
```yaml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api-gateway

  api-gateway:
    deploy:
      replicas: 3
    expose:
      - "3003"
```

nginx.conf:
```nginx
upstream api_gateway {
    least_conn;
    server api-gateway-1:3003;
    server api-gateway-2:3003;
    server api-gateway-3:3003;
}

server {
    listen 80;
    server_name api.your-domain.com;

    location / {
        proxy_pass http://api_gateway;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The per-instance hostnames above assume the replica containers resolve as api-gateway-1 through api-gateway-3; depending on your Compose project name you may need explicit container names, or a single `server api-gateway:3003;` entry that relies on Docker's DNS round-robin.
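After changing nginx.conf, the configuration can be validated and reloaded in place:

```bash
# Check syntax, then reload without dropping connections
docker compose exec nginx nginx -t
docker compose exec nginx nginx -s reload
```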
Message Bus Considerations

RabbitMQ handles load balancing automatically:
- Commands/Queries: round-robined across available handlers
- Events: each subscriber receives its own copy
- Competing Consumers: multiple instances of a service share a single queue (see the check below)
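A quick way to observe these patterns on a running stack (assuming the RabbitMQ container follows the flow-platform-&lt;name&gt; convention used elsewhere in this guide):

```bash
# List queues with message backlog and attached consumer counts;
# a shared command queue should show one consumer per scaled instance
docker exec flow-platform-rabbitmq rabbitmqctl list_queues name messages consumers
```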
Data Persistence
Backup Strategy
PostgreSQL Backup
```bash
# Create backup
docker exec flow-platform-postgres pg_dump -U ${DATABASE_USER} eventstore > backup.sql

# Automated daily backups (add to crontab)
0 2 * * * docker exec flow-platform-postgres pg_dump -U ${DATABASE_USER} eventstore | gzip > /backups/eventstore-$(date +\%Y\%m\%d).sql.gz
```

Volume Backup
```bash
# Backup volumes
docker run --rm -v flow-platform-postgres-data:/data -v $(pwd):/backup alpine tar czf /backup/postgres-backup.tar.gz /data

# Restore volume
docker run --rm -v flow-platform-postgres-data:/data -v $(pwd):/backup alpine tar xzf /backup/postgres-backup.tar.gz -C /
```

Disaster Recovery
- Stop services: `docker compose down`
- Restore volumes from backup
- Restore the database from the SQL dump (see the example below)
- Start services: `docker compose up -d`
- Verify health: `docker compose ps`
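Restoring the database from one of the gzipped nightly dumps created above might look like this (the dated filename is a placeholder):

```bash
# Replay a plain-SQL dump into the running postgres container
gunzip -c /backups/eventstore-YYYYMMDD.sql.gz | \
  docker exec -i flow-platform-postgres psql -U ${DATABASE_USER} -d eventstore
```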
Monitoring
Production Monitoring
Grafana Dashboards
Access: http://your-domain:5005
Key Dashboards:
- Service health and performance
- Message bus metrics
- Database performance
- Error rates and trends
- Business metrics
Jaeger Tracing
Access: http://your-domain:16686
Monitor distributed traces for:
- Slow requests
- Error traces
- Service dependencies
- Performance bottlenecks
Alerting
Configure alerts for critical metrics:
```yaml
# Grafana alert configuration
alerts:
  - name: High Error Rate
    condition: error_rate > 0.05
    for: 5m
    notify: slack, email

  - name: Database Connection Issues
    condition: postgres.connections.active > postgres.connections.max * 0.9
    for: 2m
    notify: pagerduty

  - name: Queue Depth Warning
    condition: rabbitmq.queue.depth > 1000
    for: 5m
    notify: slack
```

Logging
Centralized Logging
Elasticsearch Storage
Logs are stored in Elasticsearch indices:
```bash
# View indices
curl "http://localhost:9200/_cat/indices?v"

# Query logs
curl "http://localhost:9200/logs-*/_search?q=level:error"

# Delete old logs (30+ days); _delete_by_query is a POST endpoint
curl -X POST "http://localhost:9200/logs-*/_delete_by_query" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "timestamp": { "lt": "now-30d" }
    }
  }
}'
```

Log Rotation
Configure an Elasticsearch index lifecycle policy:
```bash
# Create lifecycle policy
curl -X PUT "http://localhost:9200/_ilm/policy/logs-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50GB", "max_age": "7d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```
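The policy only applies to indices that reference it. One way to wire it up, assuming Elasticsearch 7.8+ and that rollover writes through an alias named logs (the template name below is illustrative), is an index template:

```bash
# Apply the policy to future logs-* indices
curl -X PUT "http://localhost:9200/_index_template/logs-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}'
```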
Updates and Maintenance

Rolling Updates
Section titled “Rolling Updates”# Pull new imagedocker compose pull api-gateway
# Restart with zero downtime (if scaled)docker compose up -d --no-deps --scale api-gateway=3 api-gateway
# Verify healthdocker compose psDatabase Migrations
```bash
# Run migrations in devbox container
docker exec flow-platform-devbox pnpm run migrate

# Or run migration service
docker compose run --rm migrations
```

Maintenance Window
For major updates:
```bash
# 1. Notify users of maintenance
# 2. Drain traffic (stop accepting new requests)
# 3. Wait for in-flight requests to complete

# 4. Stop services
docker compose down

# 5. Update configuration

# 6. Pull new images
docker compose pull

# 7. Start services
docker compose up -d

# 8. Verify health
docker compose ps

# 9. Monitor logs
docker compose logs -f
```

Troubleshooting
Service Won’t Start
Section titled “Service Won’t Start”# Check logsdocker compose logs <service-name>
# Check dependenciesdocker compose ps
# Verify configurationdocker compose configHigh Memory Usage
Section titled “High Memory Usage”# Check container memorydocker stats
# Increase limits if needed# Edit docker-compose.yml resources section
# Restart servicedocker compose restart <service-name>Database Connection Pool Exhausted
Section titled “Database Connection Pool Exhausted”# Check active connectionsdocker exec flow-platform-postgres psql -U ${DATABASE_USER} -d eventstore -c "SELECT count(*) FROM pg_stat_activity;"
# Increase pool size in service configurationDATABASE_POOL_MAX=20
# Restart servicesdocker compose restartBest Practices
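If the pool keeps filling up, breaking connections down by state helps distinguish genuine load from leaked idle-in-transaction sessions:

```bash
# Count connections per state (active, idle, idle in transaction, ...)
docker exec flow-platform-postgres psql -U ${DATABASE_USER} -d eventstore -c \
  "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count DESC;"
```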
Best Practices

1. Use Environment Variables
```yaml
# Don't hardcode credentials
services:
  postgres:
    environment:
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
```

2. Enable Health Checks
All services should have health checks for automatic recovery.
3. Set Resource Limits
Prevent resource exhaustion:
```yaml
deploy:
  resources:
    limits:
      memory: 2G
```

4. Use Named Volumes
For data persistence:
```yaml
volumes:
  postgres-data:
    name: production-postgres-data
```

5. Implement Backups
Automated daily backups are essential for production.