# Scalability Guide
## Overview
The banyan-core platform is designed for horizontal scalability. Services scale independently, and the message bus automatically load-balances work across instances.
## Scaling Strategy
### Horizontal Scaling
Add more instances of services rather than increasing single-instance resources:
```sh
# Scale API Gateway to handle more external traffic
docker compose up -d --scale api-gateway=5

# Scale business services for processing capacity
docker compose up -d --scale user-service=10
docker compose up -d --scale order-service=10
```

### Why Horizontal Scaling Works
- Stateless Services: Services don’t maintain local state
- Message Bus: Work distributed automatically via queues
- Event Store: Shared PostgreSQL for persistence
- Cache: Shared Redis for query results
## Message Bus Load Balancing
### Automatic Distribution
RabbitMQ distributes messages automatically:
Commands/Queries (Request-Response):
```
Client → Queue → [Handler 1, Handler 2, Handler 3, ...]
                 Round-robin distribution
```

Events (Publish-Subscribe):

```
Publisher → Exchange → [Subscriber 1, Subscriber 2, Subscriber 3]
                       Each gets a copy
```

### Competing Consumers
Multiple service instances share the same queue:
```
service.user-service.commands.CreateUser
                    ↓
[Instance 1]  [Instance 2]  [Instance 3]
 Message 1     Message 2     Message 3
```

Benefits:
- Automatic load balancing
- Fault tolerance (if one fails, others continue)
- Linear scaling (double instances ≈ double throughput)
### Prefetch Tuning
Control concurrency per instance:
```ts
// CPU-intensive: process one message at a time
await messageBus.registerHandler(ProcessVideoContract, handler, {
  prefetch: 1
});

// I/O-bound: process up to 10 messages concurrently
await messageBus.registerHandler(GetUserContract, handler, {
  prefetch: 10
});
```

Guidelines:
- CPU-intensive: prefetch = 1-2
- I/O-bound: prefetch = 10-20
- Memory-intensive: prefetch = 1-5
## Database Scaling
### Read Replicas
Add PostgreSQL read replicas for queries:
```yaml
services:
  postgres-primary:
    image: postgres:16-alpine
    environment:
      POSTGRES_REPLICATION_MODE: master

  postgres-replica-1:
    image: postgres:16-alpine
    environment:
      POSTGRES_REPLICATION_MODE: slave
      POSTGRES_MASTER_HOST: postgres-primary
```

Route queries to replicas:
```ts
// Write to primary
await primaryDb.execute('INSERT INTO users ...');

// Read from replica
const users = await replicaDb.query('SELECT * FROM users ...');
```

### Connection Pooling
Configure appropriate pool sizes:
```ts
{
  database: {
    pool: {
      min: 2,            // Minimum connections
      max: 10,           // Maximum per instance
      idleTimeout: 30000
    }
  }
}
```

Calculation: `totalConnections = serviceInstances × poolMax`

Example:

- 5 service instances × 10 connections = 50 total connections
- Ensure PostgreSQL `max_connections` > 50
### Query Optimization
- Add Indexes: Speed up frequently queried fields
- Use CQRS: Separate read models for complex queries
- Cache Results: Use Redis for repeated queries
- Pagination: Limit result sets
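Of these, pagination is the easiest to get subtly wrong. A minimal sketch of offset-based pagination (the helper name is ours for illustration, not a banyan-core API):

```typescript
// Build LIMIT/OFFSET values for a paginated query.
// page is 1-based; pageSize is clamped to keep result sets bounded.
function paginate(page: number, pageSize: number, maxPageSize = 100) {
  const size = Math.min(Math.max(pageSize, 1), maxPageSize);
  const safePage = Math.max(page, 1);
  return { limit: size, offset: (safePage - 1) * size };
}

const { limit, offset } = paginate(3, 20);
// e.g. SELECT * FROM users ORDER BY id LIMIT 20 OFFSET 40
console.log(limit, offset); // 20 40
```

Clamping the page size server-side keeps a single bad request from pulling an unbounded result set through the pool.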
## Cache Scaling
### Redis Cluster
For high cache load, use a Redis cluster:
```yaml
services:
  redis-node-1:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes

  redis-node-2:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes

  redis-node-3:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes
```

### Cache Strategies
Cache-Aside Pattern:
```ts
@QueryHandler(GetUserContract)
export class GetUserHandler {
  async handle(input: { id: string }) {
    // Try cache first
    const cached = await this.cache.get(`user:${input.id}`);
    if (cached) return cached;

    // Query database
    const user = await this.userRepository.findById(input.id);

    // Store in cache
    await this.cache.set(`user:${input.id}`, user, 300); // 5 min TTL

    return user;
  }
}
```

Cache Invalidation:
```ts
@CommandHandler(UpdateUserContract)
export class UpdateUserHandler {
  async handle(input: { id: string; ... }) {
    const user = await this.userRepository.update(input);

    // Invalidate cache
    await this.cache.delete(`user:${input.id}`);

    return user;
  }
}
```

## API Gateway Scaling
### Multiple Instances
Scale the API Gateway for external traffic:
```sh
docker compose up -d --scale api-gateway=5
```

### Load Balancer
Add Nginx for load balancing:
```nginx
upstream api_gateway {
    least_conn;  # Route to least busy instance
    server api-gateway-1:3003;
    server api-gateway-2:3003;
    server api-gateway-3:3003;
    server api-gateway-4:3003;
    server api-gateway-5:3003;
}

server {
    listen 80;

    location / {
        proxy_pass http://api_gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
```

### Rate Limiting
Distribute rate limits across instances:
Use shared Redis for rate limiting:
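The idea behind the shared store is that every gateway instance increments the same counter per client and time window. A fixed-window sketch with the store abstracted away (in production the store would be Redis `INCR`/`EXPIRE`; the interface and names here are hypothetical, not the platform's API):

```typescript
interface CounterStore {
  // Increment key and return the new value; the first increment starts a window.
  incr(key: string, windowMs: number): Promise<number>;
}

// In-memory stand-in for Redis, for illustration only
class MemoryStore implements CounterStore {
  private counts = new Map<string, { value: number; expires: number }>();

  async incr(key: string, windowMs: number): Promise<number> {
    const now = Date.now();
    const entry = this.counts.get(key);
    if (!entry || entry.expires <= now) {
      this.counts.set(key, { value: 1, expires: now + windowMs });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }
}

// Fixed-window limiter: allow at most `max` requests per `windowMs`
async function isAllowed(
  store: CounterStore, clientId: string, max: number, windowMs: number
): Promise<boolean> {
  const count = await store.incr(`ratelimit:${clientId}`, windowMs);
  return count <= max;
}
```

Because every instance talks to the same store, a client cannot multiply its quota by spreading requests across gateway instances.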
```ts
{
  rateLimit: {
    store: 'redis',  // Shared across instances
    max: 100,
    windowMs: 60000
  }
}
```

## RabbitMQ Scaling
### Cluster Setup
For high message volume, cluster RabbitMQ:
```yaml
services:
  rabbitmq-1:
    image: rabbitmq:3.13-management-alpine
    environment:
      RABBITMQ_ERLANG_COOKIE: secret_cookie
    volumes:
      - ./cluster-config.conf:/etc/rabbitmq/rabbitmq.conf

  rabbitmq-2:
    image: rabbitmq:3.13-management-alpine
    environment:
      RABBITMQ_ERLANG_COOKIE: secret_cookie
    depends_on:
      - rabbitmq-1

  rabbitmq-3:
    image: rabbitmq:3.13-management-alpine
    environment:
      RABBITMQ_ERLANG_COOKIE: secret_cookie
    depends_on:
      - rabbitmq-1
```

### Quorum Queues
Use quorum queues for high availability:
```ts
{
  queueOptions: {
    durable: true,
    arguments: {
      'x-queue-type': 'quorum'
    }
  }
}
```

### Message Persistence
Balance persistence vs. performance:
```ts
// Critical messages: persistent
await messageBus.send(ProcessPaymentContract, payload, {
  persistent: true
});

// Non-critical: transient (faster)
await messageBus.send(UpdateCacheContract, payload, {
  persistent: false
});
```

## Performance Metrics
### Target Metrics
Section titled “Target Metrics”| Metric | Target | Scaling Action |
|---|---|---|
| Request Latency (p95) | < 200ms | Scale API Gateway |
| Queue Depth | < 100 | Scale service handlers |
| Database Connections | < 80% max | Increase pool or add replicas |
| Cache Hit Rate | > 90% | Increase cache size or TTL |
| Error Rate | < 1% | Investigate and fix |
### Monitoring Thresholds
Set alerts for scaling needs:
```ts
{
  alerts: {
    queueDepth: {
      threshold: 100,
      action: 'Scale service instances'
    },
    responseTime: {
      threshold: 200, // ms
      action: 'Scale API Gateway'
    },
    cpuUsage: {
      threshold: 70, // %
      action: 'Scale containers'
    }
  }
}
```

## Auto-Scaling
### Kubernetes (Future)
For cloud deployment, use a Kubernetes HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Docker Swarm
For Docker Swarm deployments, configure replicas and resource limits:
```yaml
services:
  api-gateway:
    deploy:
      replicas: 3
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
```

## Capacity Planning
### Estimating Capacity
1. Measure Single Instance: Benchmark one instance's throughput
2. Calculate Requirements: Total load ÷ single-instance capacity
3. Add Headroom: Multiply by 1.5-2x for peaks and failures
4. Monitor and Adjust: Track metrics and scale as needed
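The steps above reduce to a small formula (a sketch; the function name is ours for illustration):

```typescript
// Required instance count from measured throughput and expected peak load
function requiredInstances(
  peakLoad: number,            // req/s at peak
  perInstanceCapacity: number, // req/s one instance sustains
  headroom = 2                 // 1.5-2x buffer for spikes and failures
): number {
  return Math.ceil((peakLoad / perInstanceCapacity) * headroom);
}

console.log(requiredInstances(5000, 1000)); // 10
```

Rounding up matters: a fractional instance count always means one more real instance, not one fewer.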
### Example Calculation
```
Single API Gateway instance: 1000 req/s
Expected peak load: 5000 req/s
Required instances: 5000 / 1000 = 5
With headroom (2x): 5 × 2 = 10 instances
```

### Resource Planning
Section titled “Resource Planning”| Component | Small (< 1K req/s) | Medium (1K-10K) | Large (> 10K) |
|---|---|---|---|
| API Gateway | 2 instances | 5 instances | 10+ instances |
| Services | 3 instances | 10 instances | 20+ instances |
| PostgreSQL | 1 primary | 1 primary + 2 replicas | Cluster |
| RabbitMQ | 1 node | 3 node cluster | 5+ node cluster |
| Redis | 1 node | 3 node cluster | 6+ node cluster |
## Best Practices
### 1. Design for Horizontal Scaling
```ts
// Good: Stateless handler
@CommandHandler(CreateUserContract)
export class CreateUserHandler {
  async handle(input: any) {
    return await this.userRepository.create(input);
  }
}

// Avoid: Local state
let localCache = {}; // Don't do this!
```

### 2. Use Idempotent Operations
```ts
@CommandHandler(CreateUserContract)
export class CreateUserHandler {
  async handle(input: { email: string }) {
    // Idempotent: safe to retry
    const existing = await this.userRepository.findByEmail(input.email);
    if (existing) return existing;

    return await this.userRepository.create(input);
  }
}
```

### 3. Implement Circuit Breakers
Prevent cascade failures:
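A circuit breaker tracks consecutive failures, opens after a failure threshold, and probes again after a recovery timeout. A minimal sketch (not the platform's built-in implementation; class and method names are ours):

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,
    private successThreshold = 2,
    private recoveryTimeout = 30000
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.recoveryTimeout) {
        throw new Error('circuit open'); // fail fast, don't hit the dependency
      }
      this.state = 'half-open'; // timeout elapsed: allow a trial request
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess() {
    if (this.state === 'half-open') {
      if (++this.successes >= this.successThreshold) {
        this.state = 'closed';
        this.failures = 0;
        this.successes = 0;
      }
    } else {
      this.failures = 0; // a success in closed state resets the streak
    }
  }

  private onFailure() {
    this.successes = 0;
    if (++this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}
```

Failing fast while open is the point: callers get an immediate error instead of queuing behind a dependency that is already struggling.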
```ts
{
  circuitBreaker: {
    failureThreshold: 5,
    successThreshold: 2,
    recoveryTimeout: 30000
  }
}
```

### 4. Monitor Performance
Track key metrics:
- Request latency (p50, p95, p99)
- Queue depth
- Error rate
- Resource usage (CPU, memory)
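Latency percentiles are worth computing correctly rather than eyeballing. A nearest-rank sketch (one of several common percentile definitions; monitoring systems may use interpolation instead):

```typescript
// Nearest-rank percentile over raw samples, p in (0, 100]
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const latenciesMs = [12, 15, 20, 22, 25, 30, 45, 60, 120, 300];
console.log(percentile(latenciesMs, 50)); // 25
console.log(percentile(latenciesMs, 95)); // 300
```

Note how one slow outlier dominates p95 while leaving p50 untouched, which is exactly why averages alone hide tail latency.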
### 5. Test Under Load
Section titled “5. Test Under Load”# Load test with Apache Benchab -n 10000 -c 100 http://localhost:3003/api/users
# Load test with k6k6 run --vus 100 --duration 30s load-test.jsTroubleshooting
### High Queue Depth
Cause: Handlers can’t keep up with the message rate
Solution:
```sh
# Scale service instances
docker compose up -d --scale user-service=10
```

Or increase prefetch for I/O-bound handlers:

```ts
prefetch: 20
```

### Database Connection Exhaustion
Cause: Too many service instances × pool size
Solution:
```ts
// Reduce pool size per instance
pool: { max: 5 }
```

Or increase PostgreSQL `max_connections`:

```ini
# postgresql.conf
max_connections = 200
```

### Memory Issues
Cause: Unbounded message processing or cache growth
Solution:
```yaml
# Set container memory limits
deploy:
  resources:
    limits:
      memory: 2G
```