Scalability Guide

The banyan-core platform is designed for horizontal scalability. Services scale independently, and the message bus automatically load-balances work across instances.

Add more instances of services rather than increasing single-instance resources:

# Scale API Gateway to handle more external traffic
docker compose up -d --scale api-gateway=5
# Scale business services for processing capacity
docker compose up -d --scale user-service=10
docker compose up -d --scale order-service=10
Horizontal scaling works because the architecture rests on four shared pieces:

  1. Stateless Services: Services don’t maintain local state
  2. Message Bus: Work is distributed automatically via queues
  3. Event Store: Shared PostgreSQL for persistence
  4. Cache: Shared Redis for query results

RabbitMQ distributes messages automatically:

Commands/Queries (Request-Response):

Client → Queue → [Handler 1, Handler 2, Handler 3, ...]
Round-robin distribution

Events (Publish-Subscribe):

Publisher → Exchange → [Subscriber 1, Subscriber 2, Subscriber 3]
Each gets a copy
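
The distinction in code, as a hedged sketch: send() matches the API used later in this guide, while publish() and the contract names are illustrative assumptions.

// Sketch only: send() appears elsewhere in this guide; publish() and the
// contract names are assumed for illustration.
declare const messageBus: {
  send<T>(contract: unknown, payload: unknown): Promise<T>;
  publish(contract: unknown, payload: unknown): Promise<void>;
};
declare const CreateUserContract: unknown, UserCreatedContract: unknown;

// Command: delivered to exactly ONE user-service instance (round-robin)
const user = await messageBus.send<{ id: string }>(CreateUserContract, {
  email: 'a@example.com',
});

// Event: every subscribed service receives its own copy
await messageBus.publish(UserCreatedContract, { userId: user.id });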

Multiple service instances share the same queue:

service.user-service.commands.CreateUser
  [Instance 1]   [Instance 2]   [Instance 3]
   Message 1      Message 2      Message 3

Benefits:

  • Automatic load balancing
  • Fault tolerance (if one fails, others continue)
  • Linear scaling (double instances ≈ double throughput)

Control concurrency per instance:

await messageBus.registerHandler(ProcessVideoContract, handler, {
  prefetch: 1 // Process one message at a time (CPU-intensive work)
});

await messageBus.registerHandler(GetUserContract, handler, {
  prefetch: 10 // Process up to 10 messages concurrently (I/O-bound work)
});

Guidelines:

  • CPU-intensive: prefetch = 1-2
  • I/O-bound: prefetch = 10-20
  • Memory-intensive: prefetch = 1-5

Add PostgreSQL read replicas for queries:

# NOTE: replication-mode env vars like these are provided by images such as
# bitnami/postgresql; the official postgres image needs manual
# streaming-replication setup
services:
  postgres-primary:
    image: postgres:16-alpine
    environment:
      POSTGRES_REPLICATION_MODE: master
  postgres-replica-1:
    image: postgres:16-alpine
    environment:
      POSTGRES_REPLICATION_MODE: slave
      POSTGRES_MASTER_HOST: postgres-primary

Route queries to replicas:

// Write to primary
await primaryDb.execute('INSERT INTO users ...');
// Read from replica
const users = await replicaDb.query('SELECT * FROM users ...');
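
One way to keep call sites simple is a small routing wrapper. This is a sketch against a generic client interface, not a specific driver:

// `Db` is a stand-in for whatever database client the services use.
interface Db {
  query(sql: string, params?: unknown[]): Promise<unknown[]>;
  execute(sql: string, params?: unknown[]): Promise<void>;
}

class RoutedDb {
  private next = 0;

  constructor(private primary: Db, private replicas: Db[]) {}

  // Writes always go to the primary
  execute(sql: string, params?: unknown[]) {
    return this.primary.execute(sql, params);
  }

  // Reads rotate across replicas. Replicas lag slightly behind the
  // primary, so read-your-own-writes queries should still hit the primary.
  query(sql: string, params?: unknown[]) {
    const replica = this.replicas[this.next++ % this.replicas.length];
    return replica.query(sql, params);
  }
}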

Configure appropriate pool sizes:

{
  database: {
    pool: {
      min: 2, // Minimum idle connections kept open
      max: 10, // Maximum connections per service instance
      idleTimeout: 30000 // Close idle connections after 30s
    }
  }
}

Calculation: totalConnections = serviceInstances × poolMax

Example:

  • 5 service instances × 10 connections = 50 total connections
  • Ensure PostgreSQL max_connections > 50

To keep read paths fast as load grows:

  1. Add Indexes: Speed up frequently queried fields
  2. Use CQRS: Separate read models for complex queries
  3. Cache Results: Use Redis for repeated queries
  4. Pagination: Limit result sets (see the sketch below)
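
For the pagination point, a keyset-style sketch (replicaDb is the read-side handle from the replica example above; the schema is assumed):

// Keyset pagination: the WHERE clause seeks on the primary key, so page
// cost stays flat as the table grows (unlike a deep OFFSET).
declare const replicaDb: {
  query(sql: string, params?: unknown[]): Promise<unknown[]>;
};

async function listUsers(afterId: string | null, pageSize = 50) {
  return replicaDb.query(
    `SELECT id, email FROM users
     WHERE ($1::text IS NULL OR id > $1)
     ORDER BY id
     LIMIT $2`,
    [afterId, pageSize],
  );
}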

For high cache load, use a Redis Cluster:

services:
  redis-node-1:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes
  redis-node-2:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes
  redis-node-3:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes
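
Clients must be cluster-aware so keys route to the node owning their hash slot. For example with ioredis (one client option; host names match the compose services above):

import Redis from 'ioredis';

// The client discovers the remaining nodes from any seed node and routes
// each key by hash slot automatically.
const cache = new Redis.Cluster([
  { host: 'redis-node-1', port: 6379 },
  { host: 'redis-node-2', port: 6379 },
  { host: 'redis-node-3', port: 6379 },
]);

await cache.set('user:123', JSON.stringify({ id: '123' }), 'EX', 300);

Note that the nodes still have to be joined into a cluster once (for example with redis-cli --cluster create).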

Cache-Aside Pattern:

@QueryHandler(GetUserContract)
export class GetUserHandler {
  async handle(input: { id: string }) {
    // Try cache first
    const cached = await this.cache.get(`user:${input.id}`);
    if (cached) return cached;

    // Cache miss: query the database
    const user = await this.userRepository.findById(input.id);

    // Store in cache for subsequent reads
    await this.cache.set(`user:${input.id}`, user, 300); // 5 min TTL
    return user;
  }
}

Cache Invalidation:

@CommandHandler(UpdateUserContract)
export class UpdateUserHandler {
  async handle(input: { id: string; ... }) {
    const user = await this.userRepository.update(input);

    // Invalidate the cached entry so the next read sees fresh data
    await this.cache.delete(`user:${input.id}`);
    return user;
  }
}

Scale API Gateway for external traffic:

docker compose up -d --scale api-gateway=5

Add Nginx for load balancing:

upstream api_gateway {
  least_conn; # Route to the least busy instance
  server api-gateway-1:3003;
  server api-gateway-2:3003;
  server api-gateway-3:3003;
  server api-gateway-4:3003;
  server api-gateway-5:3003;
}

server {
  listen 80;

  location / {
    proxy_pass http://api_gateway;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
  }
}

Distribute rate limits across instances by backing the limiter with shared Redis:

{
  rateLimit: {
    store: 'redis', // Shared across all gateway instances
    max: 100, // Requests allowed per window
    windowMs: 60000 // 1-minute window
  }
}
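
Under the hood, a shared counter is all it takes. A minimal fixed-window sketch with ioredis (the client choice, host name, and key scheme are assumptions):

import Redis from 'ioredis';

const redis = new Redis({ host: 'redis', port: 6379 }); // the shared instance

// Because the counter lives in Redis rather than in process memory, five
// gateway instances enforce ONE combined limit of `max` requests per
// window instead of `max` each.
async function isAllowed(clientId: string, max = 100, windowMs = 60000): Promise<boolean> {
  const window = Math.floor(Date.now() / windowMs);
  const key = `ratelimit:${clientId}:${window}`;
  const count = await redis.incr(key);
  if (count === 1) {
    await redis.pexpire(key, windowMs); // expire the window's counter
  }
  return count <= max;
}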

For high message volume, cluster RabbitMQ:

services:
  rabbitmq-1:
    image: rabbitmq:3.13-management-alpine
    environment:
      RABBITMQ_ERLANG_COOKIE: secret_cookie
    volumes:
      - ./cluster-config.conf:/etc/rabbitmq/rabbitmq.conf
  rabbitmq-2:
    image: rabbitmq:3.13-management-alpine
    environment:
      RABBITMQ_ERLANG_COOKIE: secret_cookie
    depends_on:
      - rabbitmq-1
  rabbitmq-3:
    image: rabbitmq:3.13-management-alpine
    environment:
      RABBITMQ_ERLANG_COOKIE: secret_cookie
    depends_on:
      - rabbitmq-1

Use quorum queues for high availability:

{
  queueOptions: {
    durable: true,
    arguments: {
      'x-queue-type': 'quorum'
    }
  }
}
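
Assuming registerHandler forwards queueOptions when declaring its queue (the exact option surface is platform-specific), a handler might opt in like this:

await messageBus.registerHandler(ProcessPaymentContract, handler, {
  queueOptions: {
    durable: true, // quorum queues must be durable
    arguments: { 'x-queue-type': 'quorum' }, // replicated across the cluster
  },
});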

Balance persistence vs performance:

// Critical messages: persistent (survive broker restarts)
await messageBus.send(ProcessPaymentContract, payload, {
  persistent: true
});

// Non-critical messages: transient (faster, lost on broker restart)
await messageBus.send(UpdateCacheContract, payload, {
  persistent: false
});

Watch these metrics and scale when they drift from target:

Metric                   Target         Scaling Action
Request Latency (p95)    < 200 ms       Scale API Gateway
Queue Depth              < 100          Scale service handlers
Database Connections     < 80% of max   Increase pool or add replicas
Cache Hit Rate           > 90%          Increase cache size or TTL
Error Rate               < 1%           Investigate and fix

Set alerts for scaling needs:

{
  alerts: {
    queueDepth: {
      threshold: 100,
      action: 'Scale service instances'
    },
    responseTime: {
      threshold: 200, // ms
      action: 'Scale API Gateway'
    },
    cpuUsage: {
      threshold: 70, // %
      action: 'Scale containers'
    }
  }
}
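
As one way to feed the queueDepth alert, a small poller against the RabbitMQ management API (bundled with the -management images used above; default port, vhost, and credentials are assumptions):

// Hedged sketch: GET /api/queues/{vhost}/{name} is part of the RabbitMQ
// management API; %2F is the default vhost "/". Credentials are the
// broker defaults — replace them in any real deployment.
async function checkQueueDepth(queue: string, threshold = 100): Promise<void> {
  const auth = Buffer.from('guest:guest').toString('base64');
  const res = await fetch(
    `http://rabbitmq-1:15672/api/queues/%2F/${encodeURIComponent(queue)}`,
    { headers: { Authorization: `Basic ${auth}` } },
  );
  const { messages } = (await res.json()) as { messages: number };
  if (messages > threshold) {
    console.warn(`Queue ${queue} depth ${messages} > ${threshold}: scale service instances`);
  }
}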

For cloud deployment, use Kubernetes HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

For Docker Swarm deployments (the deploy key is honored by swarm mode, not plain compose), pin replica counts and resource limits:

services:
  api-gateway:
    deploy:
      replicas: 3
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G

Size a deployment in four steps:

  1. Measure Single Instance: Benchmark one instance’s throughput
  2. Calculate Requirements: Total load ÷ single-instance capacity
  3. Add Headroom: Multiply by 1.5-2x for peaks and failures
  4. Monitor and Adjust: Track metrics and scale as needed

Example:

  Single API Gateway instance: 1000 req/s
  Expected peak load: 5000 req/s
  Required instances: 5000 / 1000 = 5
  With headroom (2x): 5 × 2 = 10 instances
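
The same arithmetic as a helper (a trivial sketch; per-instance capacity and the headroom factor come from your own benchmarks):

function requiredInstances(peakLoad: number, perInstanceCapacity: number, headroom = 2): number {
  // Ceil so fractional needs round up to a whole instance
  return Math.ceil(peakLoad / perInstanceCapacity) * headroom;
}

requiredInstances(5000, 1000); // => 10, matching the worked example above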

Rough sizing by traffic tier:

Component      Small (< 1K req/s)    Medium (1K-10K req/s)     Large (> 10K req/s)
API Gateway    2 instances           5 instances               10+ instances
Services       3 instances           10 instances              20+ instances
PostgreSQL     1 primary             1 primary + 2 replicas    Cluster
RabbitMQ       1 node                3-node cluster            5+ node cluster
Redis          1 node                3-node cluster            6+ node cluster

Keep handlers stateless:

@CommandHandler(CreateUserContract)
export class CreateUserHandler {
  async handle(input: any) {
    // Good: all state lives in shared infrastructure
    return await this.userRepository.create(input);
  }
}

// Avoid: module-level state breaks horizontal scaling, because every
// instance would hold a different copy
let localCache = {}; // Don't do this!

Make handlers idempotent so retries and redeliveries are safe:

@CommandHandler(CreateUserContract)
export class CreateUserHandler {
  async handle(input: { email: string }) {
    // Idempotent: a duplicate message returns the existing user instead
    // of creating a second one
    const existing = await this.userRepository.findByEmail(input.email);
    if (existing) return existing;
    return await this.userRepository.create(input);
  }
}

Prevent cascading failures with a circuit breaker:

{
  circuitBreaker: {
    failureThreshold: 5, // Open after 5 consecutive failures
    successThreshold: 2, // Close after 2 successful probes
    recoveryTimeout: 30000 // Allow a probe request after 30s
  }
}
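
For illustration, a minimal breaker that honors those three settings (the mapping of config fields to behavior is assumed; the platform's built-in breaker may differ):

class CircuitBreaker {
  private failures = 0;
  private successes = 0;
  private openedAt = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private failureThreshold = 5,
    private successThreshold = 2,
    private recoveryTimeout = 30000,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.recoveryTimeout) {
        // Fail fast instead of piling load onto a struggling dependency
        throw new Error('circuit open');
      }
      this.state = 'half-open'; // let a probe request through
    }
    try {
      const result = await fn();
      this.failures = 0;
      if (this.state === 'half-open' && ++this.successes >= this.successThreshold) {
        this.state = 'closed'; // enough probes succeeded; resume normal traffic
        this.successes = 0;
      }
      return result;
    } catch (err) {
      this.successes = 0;
      if (++this.failures >= this.failureThreshold || this.state === 'half-open') {
        this.state = 'open'; // trip (or re-trip after a failed probe)
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}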

Track key metrics:

  • Request latency (p50, p95, p99)
  • Queue depth
  • Error rate
  • Resource usage (CPU, memory)

Load-test to validate capacity and scaling behavior:

# Load test with Apache Bench
ab -n 10000 -c 100 http://localhost:3003/api/users
# Load test with k6
k6 run --vus 100 --duration 30s load-test.js
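
The k6 command references load-test.js. A minimal version might look like this (k6 scripts are JavaScript; the endpoint mirrors the ab example above):

// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export default function () {
  const res = http.get('http://localhost:3003/api/users');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // ~1 req/s per virtual user
}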

Problem: Queue depth keeps growing

Cause: Handlers can’t keep up with the message rate

Solution:

# Scale service instances
docker compose up -d --scale user-service=10
# Or increase prefetch for I/O-bound handlers
prefetch: 20

Problem: PostgreSQL connection limit exhausted

Cause: Too many service instances × pool size exceeds max_connections

Solution:

// Reduce pool size per instance
pool: { max: 5 }

# Or increase max_connections in postgresql.conf
max_connections = 200

Problem: Memory usage grows over time

Cause: Unbounded message processing or cache growth

Solution:

# Set container memory limits
deploy:
  resources:
    limits:
      memory: 2G