Service Discovery Service
The Service Discovery service maintains a registry of all platform services, their contracts, and health status, enabling dynamic service discovery and API generation.
Overview
The Service Discovery service provides:
- Contract Registry - Centralized storage of service contracts
- Health Monitoring - Continuous health checks of all services
- Service Registration - Automatic service registration on startup
- Contract Validation - Schema validation for contracts
- Dependency Tracking - Service dependency graph
- Cache Management - Redis-backed contract caching
- Persistence - PostgreSQL-backed contract storage
Architecture
    ┌─────────────────────────────────────────────────────────┐
    │                    Service Discovery                    │
    │                                                         │
    │  ┌──────────────────────────────────────────────────┐   │
    │  │                Contract Registry                 │   │
    │  │                                                  │   │
    │  │  ┌──────────────┐  ┌─────────────────────────┐   │   │
    │  │  │  PostgreSQL  │  │       Redis Cache       │   │   │
    │  │  │              │  │                         │   │   │
    │  │  │ - Contracts  │  │ - Active contracts      │   │   │
    │  │  │ - Versions   │  │ - TTL: 5 minutes        │   │   │
    │  │  │ - Checksums  │  │ - Quick lookups         │   │   │
    │  │  └──────────────┘  └─────────────────────────┘   │   │
    │  └──────────────────────────────────────────────────┘   │
    │                                                         │
    │  ┌──────────────────────────────────────────────────┐   │
    │  │                  Health Monitor                  │   │
    │  │                                                  │   │
    │  │  - Continuous health checks                      │   │
    │  │  - Service status tracking                       │   │
    │  │  - Dependency health                             │   │
    │  │  - Cascading failure detection                   │   │
    │  └──────────────────────────────────────────────────┘   │
    │                                                         │
    │  ┌──────────────────────────────────────────────────┐   │
    │  │                  Event Handlers                  │   │
    │  │                                                  │   │
    │  │  - ServiceContractsRegistered                    │   │
    │  │  - ServiceHealthCheck                            │   │
    │  │  - ServiceDeregistered                           │   │
    │  └──────────────────────────────────────────────────┘   │
    └─────────────────────────────────────────────────────────┘
Configuration
Environment Variables
    # Port Configuration
    PORT=3002                          # HTTP port

    # Database (Contract Storage)
    DATABASE_HOST=postgres             # PostgreSQL host
    DATABASE_NAME=eventstore           # Database name
    DATABASE_USER=actor_user           # Database user
    DATABASE_PASSWORD=actor_pass123    # Database password
    DATABASE_PORT=5432                 # Database port

    # Legacy database URL (for compatibility)
    SERVICE_DISCOVERY_DATABASE_URL=postgresql://actor_user:actor_pass123@postgres:5432/eventstore

    # Redis (Contract Cache)
    REDIS_URL=redis://:redis123@redis:6379

    # Message Bus
    MESSAGE_BUS_URL=amqp://admin:admin123@rabbitmq:5672

    # Health Monitoring
    HEALTH_CHECK_INTERVAL_MS=10000     # Health check interval (10 seconds)
    HEALTH_CHECK_TIMEOUT_MS=5000       # Health check timeout (5 seconds)
    SERVICE_UNHEALTHY_THRESHOLD=3      # Failed checks before unhealthy

    # Telemetry
    JAEGER_ENDPOINT=http://jaeger:4318/v1/traces
Docker Compose
    service-discovery:
      build: ./platform/services/service-discovery
      ports:
        - "3002:3002"
      environment:
        - PORT=3002
        - DATABASE_HOST=postgres
        - DATABASE_NAME=eventstore
        - DATABASE_USER=actor_user
        - DATABASE_PASSWORD=actor_pass123
        - REDIS_URL=redis://:redis123@redis:6379
        - MESSAGE_BUS_URL=amqp://admin:admin123@rabbitmq:5672
        - HEALTH_CHECK_INTERVAL_MS=10000
      depends_on:
        postgres:
          condition: service_healthy
        redis:
          condition: service_healthy
        rabbitmq:
          condition: service_healthy
Contract Registration
Automatic Registration
Services automatically register their contracts on startup:
    // In service main.ts
    await BaseService.start({
      serviceName: 'my-service',
      serviceVersion: '1.0.0'
    });

    // BaseService automatically:
    // 1. Discovers all contracts from handlers
    // 2. Broadcasts ServiceContractsRegistered event
    // 3. Service Discovery stores contracts
Contract Structure
    interface ContractDefinition {
      messageType: string;              // "CreateUserCommand"
      description?: string;             // Human-readable description
      inputSchema: JSONSchema;          // Input validation schema
      outputSchema: JSONSchema;         // Output schema
      requiredPermissions: string[];    // ["users:create"]
      isPublic: boolean;                // Public endpoint?
      version: string;                  // "1.0.0"
    }
Contract Storage
PostgreSQL Schema:
    CREATE TABLE service_contracts (
      id SERIAL PRIMARY KEY,
      service_name VARCHAR(255) NOT NULL,
      version VARCHAR(50) NOT NULL,
      contracts JSONB NOT NULL,
      checksum VARCHAR(64) NOT NULL,
      registered_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
      last_updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
      UNIQUE(service_name, version)
    );

    CREATE INDEX idx_service_contracts_service_name
      ON service_contracts(service_name);

    CREATE INDEX idx_service_contracts_registered_at
      ON service_contracts(registered_at DESC);

    CREATE INDEX idx_service_contracts_service_version
      ON service_contracts(service_name, version);
Redis Cache:
    Key: contracts:<service-name>:<version>
    Value: JSON array of contracts
    TTL: 300 seconds (5 minutes)
Health Monitoring
Health Check Mechanism
Message-Based Health Checks:
    // Service Discovery sends HealthCheckRequest
    messageBus.publish('HealthCheckRequest', {
      serviceName: 'auth-service',
      timestamp: Date.now()
    });

    // Service responds with HealthCheckResponse
    messageBus.publish('HealthCheckResponse', {
      serviceName: 'auth-service',
      status: 'healthy',
      timestamp: Date.now(),
      dependencies: {
        database: 'healthy',
        messageBus: 'healthy'
      }
    });
Health Check Interval:
- Default: 10 seconds
- Configurable via HEALTH_CHECK_INTERVAL_MS
- Timeout: 5 seconds (configurable via HEALTH_CHECK_TIMEOUT_MS)
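The interval and timeout settings above could be resolved at startup roughly as follows. This is a minimal sketch, not the service's actual code: `loadHealthMonitorConfig` and `readPositiveInt` are hypothetical names, and treating 3 as the default for `SERVICE_UNHEALTHY_THRESHOLD` is an assumption based on the example value in the configuration section.

```typescript
// Hypothetical sketch: resolve health-monitor settings from environment
// variables, falling back to the defaults described in this section.
interface HealthMonitorConfig {
  intervalMs: number;         // HEALTH_CHECK_INTERVAL_MS
  timeoutMs: number;          // HEALTH_CHECK_TIMEOUT_MS
  unhealthyThreshold: number; // SERVICE_UNHEALTHY_THRESHOLD
}

// Returns the parsed value only when it is a positive integer;
// anything else (missing, empty, non-numeric) yields the fallback.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  const isPositiveInt = isFinite(value) && Math.floor(value) === value && value > 0;
  return isPositiveInt ? value : fallback;
}

function loadHealthMonitorConfig(
  env: Record<string, string | undefined>
): HealthMonitorConfig {
  return {
    intervalMs: readPositiveInt(env["HEALTH_CHECK_INTERVAL_MS"], 10000),
    timeoutMs: readPositiveInt(env["HEALTH_CHECK_TIMEOUT_MS"], 5000),
    // Assumption: 3 is used as the default threshold.
    unhealthyThreshold: readPositiveInt(env["SERVICE_UNHEALTHY_THRESHOLD"], 3),
  };
}
```

In a Node.js service this would typically be called once with `process.env`. Falling back on invalid input, rather than throwing, is a design choice of the sketch; a stricter service might refuse to start instead.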
Health Status States
Healthy:
- Service responding to health checks
- Response time < threshold
- All dependencies healthy
Degraded:
- Service responding but slow
- Some dependencies unhealthy
- Partial functionality
Unhealthy:
- Service not responding
- More consecutive failed checks than the configured threshold (SERVICE_UNHEALTHY_THRESHOLD)
- Critical dependencies down
Unknown:
- No health data available
- Service just registered
- Service not yet checked
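The four states above can be read as a small decision function. The sketch below is illustrative only: `classifyHealth`, `ServiceHealthRecord`, `Thresholds`, and the `criticalDependencies` parameter are hypothetical names, and the exact boundary conditions (strictly more failures than the threshold, "slow" meaning at or above a response-time limit) are assumptions rather than documented behaviour.

```typescript
type HealthStatus = "healthy" | "degraded" | "unhealthy" | "unknown";

// Hypothetical bookkeeping kept per monitored service.
interface ServiceHealthRecord {
  checksPerformed: number;            // 0 until the first check completes
  consecutiveFailures: number;        // reset to 0 on a successful check
  lastResponseTimeMs: number | null;  // null if the last check got no answer
  dependencies: Record<string, "healthy" | "unhealthy">;
}

interface Thresholds {
  maxConsecutiveFailures: number; // e.g. SERVICE_UNHEALTHY_THRESHOLD
  slowResponseMs: number;         // assumed response-time limit
}

function classifyHealth(
  record: ServiceHealthRecord,
  thresholds: Thresholds,
  criticalDependencies: string[] = []
): HealthStatus {
  // Unknown: no health data yet.
  if (record.checksPerformed === 0) return "unknown";

  // Unhealthy: more consecutive failures than the threshold allows.
  if (record.consecutiveFailures > thresholds.maxConsecutiveFailures) return "unhealthy";

  const down = Object.keys(record.dependencies).filter(
    (name) => record.dependencies[name] === "unhealthy"
  );

  // Unhealthy: a critical dependency is down.
  if (down.some((name) => criticalDependencies.indexOf(name) !== -1)) return "unhealthy";

  // Degraded: slow, some non-critical dependency down, or (assumption)
  // failing checks that have not yet crossed the threshold.
  const slow =
    record.lastResponseTimeMs !== null &&
    record.lastResponseTimeMs >= thresholds.slowResponseMs;
  if (slow || down.length > 0 || record.consecutiveFailures > 0) return "degraded";

  return "healthy";
}
```

The text does not say how a service with a few failed checks below the threshold should be reported; the sketch reports it as degraded, which is only one reasonable choice.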
Health Check Response
    interface HealthCheckResponse {
      serviceName: string;
      status: 'healthy' | 'degraded' | 'unhealthy';
      timestamp: number;
      responseTime: number;
      dependencies?: {
        database?: 'healthy' | 'unhealthy';
        messageBus?: 'healthy' | 'unhealthy';
        cache?: 'healthy' | 'unhealthy';
        [key: string]: string | undefined;
      };
      errors?: string[];
    }
Queries
GetAllContracts
Get all contracts from all services.
Permission Required: admin:view-contracts
Input:
    {
      includeInternalServices: false
    }
Output:
    [
      {
        serviceName: "auth-service",
        version: "1.0.0",
        contracts: [
          {
            messageType: "CreateUserCommand",
            description: "Create a new user",
            inputSchema: { ... },
            outputSchema: { ... },
            requiredPermissions: ["users:create"]
          },
          ...
        ],
        registeredAt: "2025-11-15T10:00:00Z"
      },
      {
        serviceName: "api-gateway",
        version: "1.0.0",
        contracts: [ ... ]
      }
    ]
GetServiceContracts
Get contracts for a specific service.
Permission Required: admin:view-contracts
Input:
    {
      serviceName: "auth-service"
    }
Output:
    {
      serviceName: "auth-service",
      version: "1.0.0",
      contracts: [
        {
          messageType: "CreateUserCommand",
          description: "Create a new user",
          requiredPermissions: ["users:create"]
        }
      ]
    }
GetServiceHealth
Get health status for a specific service.
Permission Required: admin:view-services
Input:
    {
      serviceName: "auth-service"
    }
Output:
    {
      serviceName: "auth-service",
      status: "healthy",
      lastHealthCheck: "2025-11-15T14:30:00Z",
      responseTime: 12,
      errors: null
    }
GetAllServiceHealth
Get health status for all services.
Permission Required: admin:view-services
Output:
    {
      services: [
        {
          serviceName: "auth-service",
          status: "healthy",
          lastHealthCheck: "2025-11-15T14:30:00Z",
          responseTime: 12
        },
        {
          serviceName: "api-gateway",
          status: "healthy",
          lastHealthCheck: "2025-11-15T14:30:05Z",
          responseTime: 8
        }
      ]
    }
GetDependencyHealth
Get health status for service dependencies (cascading check).
Permission Required: admin:view-services
Input:
    {
      serviceName: "api-gateway"
    }
Output:
    {
      serviceName: "api-gateway",
      status: "healthy",
      dependencies: [
        {
          serviceName: "auth-service",
          status: "healthy",
          isDirect: true
        },
        {
          serviceName: "service-discovery",
          status: "healthy",
          isDirect: true
        }
      ],
      cascadingStatus: "healthy"
    }
Events
ServiceContractsRegistered
Published when a service registers its contracts.
Published By: Services on startup (via BaseService)
Event Data:
    {
      serviceName: "auth-service",
      version: "1.0.0",
      serviceId: "auth-service-v1-abc123",
      contracts: [
        {
          messageType: "CreateUserCommand",
          description: "Create a new user",
          inputSchema: { ... },
          outputSchema: { ... },
          requiredPermissions: ["users:create"]
        }
      ],
      endpoint: "amqp://rabbitmq:5672",
      timestamp: "2025-11-15T14:30:00.123Z"
    }
Handler Actions:
- Validate contracts against schema
- Store in PostgreSQL
- Cache in Redis
- Register service for health monitoring
- Notify API Gateway of new contracts
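As an illustration of the "Cache in Redis" step, the sketch below maps a registration event onto the cache entry format given under Contract Storage (`contracts:<service-name>:<version>`, JSON array value, 300 second TTL). `toCacheEntry`, `RegisteredEvent`, and `CacheEntry` are hypothetical names; the function only builds the entry and leaves the actual Redis write to the caller, since the handler's real implementation is not shown here.

```typescript
// Hypothetical subset of the ServiceContractsRegistered payload.
interface RegisteredEvent {
  serviceName: string;
  version: string;
  contracts: unknown[];
}

// What would be written to Redis for this registration.
interface CacheEntry {
  key: string;
  value: string;      // JSON array of contracts
  ttlSeconds: number; // expiry applied to the key
}

// TTL taken from the Contract Storage section (5 minutes).
const CONTRACT_CACHE_TTL_SECONDS = 300;

function toCacheEntry(event: RegisteredEvent): CacheEntry {
  return {
    key: `contracts:${event.serviceName}:${event.version}`,
    value: JSON.stringify(event.contracts),
    ttlSeconds: CONTRACT_CACHE_TTL_SECONDS,
  };
}
```

Keeping the mapping pure makes it easy to test without a Redis instance; the caller would pass `key`, `value`, and `ttlSeconds` to whatever Redis client the service uses.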
ServiceHealthChanged
Published when service health status changes.
Published By: Service Discovery health monitor
Event Data:
    {
      serviceName: "auth-service",
      status: "unhealthy",
      previousStatus: "healthy",
      healthCheckTime: "2025-11-15T14:35:00Z",
      responseTime: null,
      errors: ["Health check timeout after 5000ms"],
      timestamp: "2025-11-15T14:35:00.456Z"
    }
Subscribers:
- API Gateway (for circuit breaker state)
- Monitoring systems
- Admin dashboards
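A subscriber such as the API Gateway could keep circuit breaker state in step with these events along the following lines. This is a sketch under assumptions: `CircuitStateTracker` is a hypothetical class, and the policy it applies (open the circuit on `unhealthy`, close it on any other status) is illustrative and not taken from the API Gateway's implementation.

```typescript
type ReportedStatus = "healthy" | "degraded" | "unhealthy";

// Subset of the ServiceHealthChanged payload used by the sketch.
interface ServiceHealthChangedEvent {
  serviceName: string;
  status: ReportedStatus;
  previousStatus: ReportedStatus | "unknown";
  timestamp: string;
}

class CircuitStateTracker {
  // Names of services whose circuit is currently open.
  private openCircuits: Record<string, boolean> = Object.create(null);

  // Apply one ServiceHealthChanged event.
  handle(event: ServiceHealthChangedEvent): void {
    if (event.status === "unhealthy") {
      this.openCircuits[event.serviceName] = true;
    } else {
      delete this.openCircuits[event.serviceName];
    }
  }

  // True when requests to the service should be short-circuited.
  isOpen(serviceName: string): boolean {
    return this.openCircuits[serviceName] === true;
  }
}
```

A production circuit breaker would normally add a half-open state and its own failure counting; the sketch only shows how the event stream could drive the open/closed decision.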
ServiceDeregistered
Published when a service is removed.
Event Data:
    {
      serviceName: "auth-service",
      serviceId: "auth-service-v1-abc123",
      reason: "Graceful shutdown",
      timestamp: "2025-11-15T14:40:00.789Z"
    }
Contract Validation
Schema Validation
All contracts are validated on registration:
    interface ContractValidationResult {
      isValid: boolean;
      errors: ValidationError[];
    }

    interface ValidationError {
      field: string;
      message: string;
      severity: 'error' | 'warning';
    }
Validation Rules:
- messageType must be unique per service
- inputSchema must be valid JSON Schema
- outputSchema must be valid JSON Schema
- requiredPermissions must be string array
- version must follow semver
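The rules above could be checked with a function like the one below. It is a simplified sketch: `validateContracts`, `ContractLike`, and `looksLikeSchema` are hypothetical names, the JSON Schema check only tests for an object with a `type` property (a real validator would use a JSON Schema library), and the semver pattern is a simplified approximation of the full grammar.

```typescript
interface ValidationError {
  field: string;
  message: string;
  severity: "error" | "warning";
}

interface ContractValidationResult {
  isValid: boolean;
  errors: ValidationError[];
}

// Loosely typed input, since registration payloads are untrusted.
interface ContractLike {
  messageType: string;
  inputSchema: unknown;
  outputSchema: unknown;
  requiredPermissions: unknown;
  version: string;
}

// Simplified semver: MAJOR.MINOR.PATCH with optional pre-release/build parts.
const SEMVER = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Stand-in for real JSON Schema validation: an object with a 'type' property.
function looksLikeSchema(schema: unknown): boolean {
  return typeof schema === "object" && schema !== null && "type" in schema;
}

function validateContracts(contracts: ContractLike[]): ContractValidationResult {
  const errors: ValidationError[] = [];
  const seen: Record<string, boolean> = Object.create(null);
  const fail = (field: string, message: string): void => {
    errors.push({ field, message, severity: "error" });
  };

  contracts.forEach((contract) => {
    if (seen[contract.messageType]) {
      fail("messageType", `Duplicate message type: ${contract.messageType}`);
    }
    seen[contract.messageType] = true;

    if (!looksLikeSchema(contract.inputSchema)) {
      fail("inputSchema", "Invalid JSON Schema: missing 'type' property");
    }
    if (!looksLikeSchema(contract.outputSchema)) {
      fail("outputSchema", "Invalid JSON Schema: missing 'type' property");
    }

    const permissions = contract.requiredPermissions;
    if (!Array.isArray(permissions) || permissions.some((p) => typeof p !== "string")) {
      fail("requiredPermissions", "Must be an array of strings");
    }

    if (!SEMVER.test(contract.version)) {
      fail("version", `Version does not follow semver: ${contract.version}`);
    }
  });

  return { isValid: errors.length === 0, errors };
}
```

Collecting every error in one pass, rather than stopping at the first, lets a service author fix all contract problems from a single registration attempt.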
Validation Errors:
    {
      isValid: false,
      errors: [
        {
          field: "messageType",
          message: "Duplicate message type: CreateUserCommand",
          severity: "error"
        },
        {
          field: "inputSchema",
          message: "Invalid JSON Schema: missing 'type' property",
          severity: "error"
        }
      ]
    }
API Gateway Integration
Contract Discovery Flow
    1. API Gateway starts
       └─> Queries Service Discovery for all contracts

    2. Service Discovery responds
       └─> Returns all registered contracts from cache/DB

    3. API Gateway generates schemas
       └─> REST routes from contracts
       └─> GraphQL schema from contracts
       └─> WebSocket event subscriptions

    4. Service registers new contracts
       └─> ServiceContractsRegistered event

    5. API Gateway receives event
       └─> Refreshes contract cache
       └─> Regenerates schemas
       └─> Updates routes

    6. Clients see new operations
       └─> New REST endpoints
       └─> New GraphQL operations
       └─> New WebSocket subscriptions
Dynamic Route Generation
    // Service Discovery provides contracts, grouped by service
    // (assumed to match the GetAllContracts output shape)
    const services = await serviceDiscovery.getAllContracts();

    // API Gateway generates routes for every contract of every service
    for (const service of services) {
      for (const contract of service.contracts) {
        if (contract.messageType.endsWith('Command')) {
          // Generate POST endpoint
          router.post(`/api/${toRoute(contract)}`, handler);
        } else if (contract.messageType.endsWith('Query')) {
          // Generate GET endpoint
          router.get(`/api/${toRoute(contract)}`, handler);
        }
      }
    }
Monitoring and Observability
Metrics
    # Contract metrics
    service_discovery_contracts_total
    service_discovery_services_registered_total
    service_discovery_contract_updates_total

    # Health metrics
    service_discovery_health_checks_total{service="auth-service", status="success"}
    service_discovery_unhealthy_services_total
    service_discovery_health_check_duration_seconds{service="auth-service"}

    # Cache metrics
    service_discovery_cache_hits_total
    service_discovery_cache_misses_total
    service_discovery_cache_size
Health Check Endpoint
    curl http://localhost:3002/health
Response:
    {
      "status": "healthy",
      "service": "service-discovery",
      "version": "1.0.0",
      "dependencies": {
        "database": "healthy",
        "redis": "healthy",
        "messageBus": "healthy"
      },
      "registeredServices": 3,
      "healthyServices": 3
    }
Admin Endpoints
List All Services:
    curl http://localhost:3002/admin/services
View Service Details:
    curl http://localhost:3002/admin/services/auth-service
Contract Statistics:
    curl http://localhost:3002/admin/stats
Response:
    {
      "totalServices": 3,
      "totalContracts": 47,
      "healthyServices": 3,
      "unhealthyServices": 0,
      "cacheHitRatio": 0.95,
      "averageResponseTime": 15
    }
Troubleshooting
Service Not Appearing in Registry
Symptoms:
- Service started but not in registry
- Contracts not available to API Gateway
Checks:
    # Check if service published ServiceContractsRegistered event
    docker logs service-discovery | grep "ServiceContractsRegistered"

    # Check PostgreSQL for contracts
    docker exec -it postgres psql -U actor_user -d eventstore \
      -c "SELECT service_name, version, registered_at FROM service_contracts;"

    # Check Redis cache
    docker exec -it redis redis-cli
    > KEYS contracts:*
    > GET contracts:auth-service:1.0.0
Fix:
    # Restart service to re-register
    docker restart auth-service

    # Check service logs for errors
    docker logs auth-service | grep "contract"
Stale Contracts in API Gateway
Symptoms:
- Old operations still visible
- New operations not available
- Schema not updated
Checks:
    # Check cache TTL
    docker exec -it redis redis-cli
    > TTL contracts:auth-service:1.0.0

    # Check contract update timestamp
    curl http://localhost:3002/admin/services/auth-service
Fix:
    # Clear API Gateway schema cache
    curl -X POST http://localhost:3003/admin/clear-cache

    # Restart API Gateway
    docker restart api-gateway

    # Clear Service Discovery cache
    # Caution: FLUSHDB deletes every key in the selected Redis database,
    # not only the contracts:* keys
    docker exec -it redis redis-cli
    > FLUSHDB
Health Check Failures
Symptoms:
- Service marked as unhealthy
- Circuit breakers opening
- Requests failing
Checks:
    # Check service logs
    docker logs auth-service | grep "health"

    # Check health endpoint directly
    curl http://localhost:3001/health

    # Check Service Discovery health status
    curl http://localhost:3002/admin/services/auth-service
Fix:
    # Restart unhealthy service
    docker restart auth-service

    # Check dependencies
    docker ps | grep postgres
    docker ps | grep rabbitmq

    # Adjust health check interval if needed
    # In docker-compose.yml:
    HEALTH_CHECK_INTERVAL_MS=30000  # Longer interval
Database Connection Issues
Symptoms:
- Contract storage failures
- Service registration errors
Checks:
    # Check database connectivity
    docker exec -it postgres pg_isready

    # Check table exists
    docker exec -it postgres psql -U actor_user -d eventstore \
      -c "\dt service_contracts"

    # Check database logs
    docker logs postgres | grep ERROR
Fix:
    # Restart PostgreSQL
    docker restart postgres

    # Recreate tables (if schema issue)
    docker exec service-discovery npm run migrate
Best Practices
Contract Versioning
Section titled “Contract Versioning”// Include version in contract@Command({ description: 'Create user (v2)', version: '2.0.0', permissions: ['users:create']})export class CreateUserCommandV2 { // New fields}
// Keep old version for compatibility@Command({ description: 'Create user (v1 - deprecated)', version: '1.0.0', permissions: ['users:create']})export class CreateUserCommand { // Old fields}Health Check Implementation
Section titled “Health Check Implementation”// In your serviceclass MyService { async healthCheck(): Promise<HealthCheckResponse> { const start = Date.now();
// Check dependencies const dbHealthy = await this.checkDatabase(); const mbHealthy = await this.checkMessageBus();
return { serviceName: 'my-service', status: dbHealthy && mbHealthy ? 'healthy' : 'degraded', timestamp: Date.now(), responseTime: Date.now() - start, dependencies: { database: dbHealthy ? 'healthy' : 'unhealthy', messageBus: mbHealthy ? 'healthy' : 'unhealthy' } }; }}Graceful Deregistration
Section titled “Graceful Deregistration”// On service shutdownprocess.on('SIGTERM', async () => { // Publish deregistration event await messageBus.publish('ServiceDeregistered', { serviceName: 'my-service', serviceId: 'my-service-v1-abc123', reason: 'Graceful shutdown' });
// Wait for message to be sent await new Promise(resolve => setTimeout(resolve, 1000));
// Stop service await service.stop();});Next Steps
Section titled “Next Steps”- API Gateway - How contracts generate APIs
- Auth Service - Service example
- Base Service Package - Service startup
- Service Discovery Concepts - Architecture details