Service Discovery Issues
Component Overview
Service Discovery maintains a registry of all running services and their contracts. It provides:
- Service Registration: Services register on startup
- Health Monitoring: Periodic health checks
- Contract Registry: Stores service operation contracts
- Service Catalog: Queryable service information
- Availability Tracking: Which services are online/offline
Common Issues
1. Service Not Registered
Symptoms:
- Service running but not in registry
- API Gateway returns 404 for service operations
- Service not visible in discovery API
Diagnostic Steps:
# Check registered services
curl http://localhost:3001/api/services | jq '.services[].name'

# Check specific service
curl http://localhost:3001/api/services/my-service | jq

# If not found, check service logs
docker logs my-service | grep -i "register\|discovery"

Common Causes:
A. BaseService.start() Not Called:
// ❌ WRONG: Manual setup without BaseService
const messageBus = new MessageBusClient(config);
await messageBus.connect();
// Service never registers!

// ✓ CORRECT: Use BaseService
import { BaseService } from '@banyanai/platform-base-service';

await BaseService.start({
  name: 'my-service',
  version: '1.0.0'
});
// Automatically registers with discovery

B. Service Discovery Not Running:

# Check service discovery status
docker ps | grep service-discovery

# If not running, start it
docker compose up -d service-discovery

C. Message Bus Connection Failed:
The service can’t connect to RabbitMQ to send its registration message:

# Check message bus connection
docker logs my-service | grep -i "rabbitmq\|connected"
# Should see: "Connected to RabbitMQ"

See Message Bus Issues for RabbitMQ troubleshooting.
D. Registration Message Lost:
# Check service discovery logs
docker logs service-discovery | grep -i "register"
# Should see: "Service registered: my-service"

Solution:
- Ensure BaseService.start() is called
- Verify service discovery is running
- Check message bus connectivity
- Restart the service to re-register:

docker compose restart my-service

# Check registration
curl http://localhost:3001/api/services | jq '.services[] | select(.name=="my-service")'

Prevention:
- Always use BaseService.start() for service initialization
- Monitor service registration in logs
- Add health check dependencies in docker-compose
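
To monitor registration from a script rather than by reading logs, the body of GET /api/services can be checked programmatically. The following is a minimal sketch, assuming the response has the `{ services: [...] }` shape that the jq queries in this guide read. `ServiceEntry`, `ServiceListResponse`, and `findService` are hypothetical names, not part of the platform packages.

```typescript
// Hypothetical shape, inferred from the fields the jq queries in this guide read.
interface ServiceEntry {
  serviceId: string;
  name: string;
  version: string;
  status: string;
}

interface ServiceListResponse {
  services: ServiceEntry[];
}

// Returns every registered instance whose name matches, or an empty array.
function findService(response: ServiceListResponse, name: string): ServiceEntry[] {
  return response.services.filter((entry) => entry.name === name);
}

// Example payload standing in for the body of GET /api/services.
const sample: ServiceListResponse = {
  services: [
    { serviceId: "my-service-uuid-123", name: "my-service", version: "1.0.0", status: "healthy" },
  ],
};

const matches = findService(sample, "my-service");
console.log(matches.length > 0 ? "registered" : "not registered");
```

An empty result corresponds to the "Service Not Registered" case. More than one result may point to the duplicate registrations covered under Service Deregistration Issues.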
2. Health Check Failures
Symptoms:
- Service registered but marked unhealthy
- Service removed from registry after timeout
- Health endpoint returning errors
Diagnostic Steps:
# Check service health status
curl http://localhost:3001/api/services/my-service | jq '.status'

# Test service health endpoint directly
curl http://localhost:3000/health

# Check service discovery logs
docker logs service-discovery | grep -i "health\|unhealthy"

Common Causes:
A. Health Endpoint Not Responding:
# Test health endpoint
curl -i http://localhost:3000/health
# If 404 or error, the health endpoint is not set up

Solution:
BaseService automatically provides a health endpoint at /health. Ensure the service is listening:

await BaseService.start({
  name: 'my-service',
  version: '1.0.0',
  port: 3000 // Ensure port specified
});
// Health endpoint automatically available at /health

B. Service Dependencies Unhealthy:
The service health check fails because dependencies (database, Redis) are unavailable:

// Health check implementation
export class HealthCheck {
  async check(): Promise<HealthStatus> {
    // Check dependencies
    const dbHealthy = await this.checkDatabase();
    const redisHealthy = await this.checkRedis();

    if (!dbHealthy || !redisHealthy) {
      return {
        status: 'unhealthy',
        dependencies: { db: dbHealthy, redis: redisHealthy }
      };
    }

    return { status: 'healthy' };
  }
}

Solution:
Fix dependency connections or implement graceful degradation.
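
One way to approach graceful degradation is to separate dependencies the service cannot run without from those it can temporarily do without. The sketch below is illustrative only: the `degraded` status, the `critical` flag, and the `summarizeHealth` function are assumptions, not part of BaseService.

```typescript
type Status = "healthy" | "degraded" | "unhealthy";

interface DependencyResult {
  name: string;
  healthy: boolean;
  critical: boolean; // hypothetical flag: true if the service cannot work without it
}

// A failed critical dependency makes the service unhealthy;
// a failed optional dependency only degrades it.
function summarizeHealth(results: DependencyResult[]): Status {
  if (results.some((r) => r.critical && !r.healthy)) return "unhealthy";
  if (results.some((r) => !r.healthy)) return "degraded";
  return "healthy";
}

const status = summarizeHealth([
  { name: "database", healthy: true, critical: true },
  { name: "redis", healthy: false, critical: false },
]);
console.log(status); // "degraded"
```

Whether service discovery accepts a status other than "healthy" is not covered in this guide, so check how a degraded service is reported before relying on this pattern.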
C. Health Check Timeout:
Health endpoint takes too long to respond:
# Check health response time
time curl http://localhost:3000/health
# Should be < 1s

Solution:
Optimize health check logic:
// ❌ SLOW: Waits for all checks sequentially
const dbHealth = await checkDatabase();   // 500ms
const redisHealth = await checkRedis();   // 500ms
// Total: 1000ms

// ✓ FAST: Run checks in parallel
const [dbHealth, redisHealth] = await Promise.all([
  checkDatabase(),
  checkRedis()
]);
// Total: 500ms (max of both)

Prevention:
- Keep health checks simple and fast
- Use parallel checks for dependencies
- Set reasonable health check intervals
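
Parallel checks still take as long as the slowest dependency, so a hung connection can stall the whole endpoint. A minimal sketch of bounding each check with a timeout follows. `withTimeout` and `runChecks` are hypothetical helpers written for this example, and the check functions stand in for real dependency probes.

```typescript
// Resolves to false if the check fails, throws, or does not settle within timeoutMs.
function withTimeout(check: () => Promise<boolean>, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    check().then(
      (ok) => { clearTimeout(timer); resolve(ok); },
      () => { clearTimeout(timer); resolve(false); }, // a rejected check counts as failed
    );
  });
}

// Runs all checks in parallel, each bounded by the same timeout.
async function runChecks(
  checks: Record<string, () => Promise<boolean>>,
  timeoutMs: number,
): Promise<Record<string, boolean>> {
  const pairs = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]) => [name, await withTimeout(check, timeoutMs)] as const,
    ),
  );
  const results: Record<string, boolean> = {};
  for (const [name, ok] of pairs) results[name] = ok;
  return results;
}

// Usage with stand-in checks; real checks would ping the database or Redis.
runChecks({ database: async () => true, redis: async () => true }, 1000)
  .then((results) => console.log(results));
```

With this approach the endpoint responds in at most roughly `timeoutMs`, regardless of how many dependencies are checked.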
3. Contracts Not Broadcast
Symptoms:
- Service registered but no contracts
- API Gateway can’t route to service operations
- Operations return 404
Diagnostic Steps:
# Check service contracts
curl http://localhost:3001/api/services/my-service/contracts | jq
# If empty, contracts were not broadcast

# Check service logs
docker logs my-service | grep -i "contract\|broadcast"

# Check service discovery logs
docker logs service-discovery | grep -i "contract.*my-service"

Common Causes:
A. Handlers Not Discovered:
If handlers are not found, no contracts are generated:

# Check handler discovery
docker logs my-service | grep "Handler discovery"

# Should show:
# Handler discovery completed {
#   commandHandlers: 2,
#   queryHandlers: 1,
#   ...
# }

# If totalHandlers: 0, see handlers-not-discovered.md

B. Contract Broadcasting Failed:
The service couldn’t send contracts to discovery:

# Check for broadcast errors
docker logs my-service | grep -i "broadcast.*error\|contract.*fail"

# Check message bus connection
docker logs my-service | grep -i "rabbitmq"

C. Service Discovery Not Receiving Contracts:

# Check service discovery subscription
docker logs service-discovery | grep -i "subscribe\|platform.contracts"
# Should see: "Subscribed to platform.contracts exchange"

Solution:
- Ensure handlers are discovered (check directory structure, decorators)
- Verify the message bus connection is healthy
- Restart the service to rebroadcast contracts:

docker compose restart my-service

- Check that contracts were received:

curl http://localhost:3001/api/services/my-service/contracts | jq

Prevention:
- Verify handler discovery on service startup
- Monitor contract broadcasting in logs
- Use contract schema validation
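
As a rough illustration of contract schema validation, the sketch below checks that a contract object carries fields shown in the contracts API response later in this guide (`name`, `type`, `requiredPermissions`, `inputSchema`). The `validateContract` function and the allowed `type` values are assumptions; the platform's actual validation rules are not documented here.

```typescript
// Returns a list of problems; an empty list means the contract passed these basic checks.
function validateContract(contract: unknown): string[] {
  if (typeof contract !== "object" || contract === null) {
    return ["contract must be an object"];
  }
  const c = contract as Record<string, unknown>;
  const problems: string[] = [];

  if (typeof c.name !== "string" || c.name.length === 0) {
    problems.push("name must be a non-empty string");
  }
  // Assumed set of types; only "command" appears in this guide's sample response.
  if (c.type !== "command" && c.type !== "query") {
    problems.push('type must be "command" or "query"');
  }
  if (!Array.isArray(c.requiredPermissions)) {
    problems.push("requiredPermissions must be an array");
  }
  if (typeof c.inputSchema !== "object" || c.inputSchema === null) {
    problems.push("inputSchema must be an object");
  }
  return problems;
}

const problems = validateContract({
  name: "CreateUser",
  type: "command",
  requiredPermissions: ["users:create"],
  inputSchema: {},
});
console.log(problems.length === 0 ? "contract looks valid" : problems.join("; "));
```

Running a check like this in a unit test or at startup surfaces malformed contracts before they are broadcast.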
4. Service Deregistration Issues
Symptoms:
- Service deregistered while still running
- Service removed after restart
- Duplicate registrations
Diagnostic Steps:
# Check service registry
curl http://localhost:3001/api/services | jq '.services[] | select(.name=="my-service")'

# Check for duplicates
curl http://localhost:3001/api/services | jq '.services[] | select(.name=="my-service") | .serviceId'

# View service discovery logs
docker logs service-discovery | grep -i "deregister\|remove"

Common Causes:
A. Health Check Timeout:
Service discovery removes services that fail health checks:
# Check service discovery health check interval
docker logs service-discovery | grep -i "health check"
# Default: 30s interval, 3 failures = deregister

Solution:
Ensure the service health endpoint is responsive:

# Test health endpoint
while true; do
  curl -s http://localhost:3000/health || echo "FAILED"
  sleep 1
done

B. Graceful Shutdown Not Implemented:
The service exits without deregistering:

// Implement graceful shutdown
process.on('SIGTERM', async () => {
  console.log('Received SIGTERM, shutting down gracefully');

  // BaseService automatically deregisters
  await BaseService.shutdown();

  process.exit(0);
});

C. Service Instance ID Collision:
Multiple instances share the same ID:

# Ensure unique instance IDs
# BaseService generates unique ID automatically
# Format: {serviceName}-{uuid}

Prevention:
- Implement graceful shutdown
- Ensure health endpoint reliable
- Use BaseService auto-generated instance IDs
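
The deregistration rule described under cause A (a check every 30 seconds, removal after 3 failures) can be modeled as a failure counter. This sketch illustrates the rule, not the service discovery implementation. That failures must be consecutive is an assumption, as are the `HealthTracker` name and its methods.

```typescript
class HealthTracker {
  private failures = 0;
  private readonly maxFailures: number;
  private readonly intervalMs: number;

  // Defaults mirror the values quoted in this guide: 3 failures, 30s interval.
  constructor(maxFailures = 3, intervalMs = 30_000) {
    this.maxFailures = maxFailures;
    this.intervalMs = intervalMs;
  }

  // Records one health check result; returns true when the service should be deregistered.
  record(healthy: boolean): boolean {
    this.failures = healthy ? 0 : this.failures + 1;
    return this.failures >= this.maxFailures;
  }

  // Upper bound on the time between a service going down and its removal,
  // ignoring the duration of the health check request itself.
  worstCaseDetectionMs(): number {
    return this.maxFailures * this.intervalMs;
  }
}

const tracker = new HealthTracker();
console.log(tracker.record(false)); // false (1 failure)
console.log(tracker.record(false)); // false (2 failures)
console.log(tracker.record(false)); // true  (3 failures: deregister)
console.log(tracker.worstCaseDetectionMs()); // 90000
```

Under this model a service that stops responding is removed roughly 60 to 90 seconds later, and a single slow response does not cause deregistration on its own.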
5. Contract Version Conflicts
Symptoms:
- Service updated but old contracts cached
- API Gateway using stale contract definitions
- Type mismatches between service versions
Diagnostic Steps:
# Check contract versions
curl http://localhost:3001/api/services/my-service | jq '{
  name: .name,
  version: .version,
  contractCount: (.contracts | length)
}'

# View specific contract
curl http://localhost:3001/api/services/my-service/contracts | jq '.contracts[] | select(.name=="CreateUser")'

Common Causes:
A. Old Service Instances Still Registered:
# Check for multiple versions
curl http://localhost:3001/api/services | jq '.services[] | select(.name=="my-service") | {serviceId, version}'
# If multiple versions appear, old instances were not deregistered

Solution:
Stop old service instances:
# Stop all instances
docker compose down my-service

# Start new version
docker compose up -d my-service

B. Contract Cache Not Invalidated:
Service discovery caches contracts. Restart to clear:

docker compose restart service-discovery
docker compose restart api-gateway

Solution:
Implement contract versioning:

export const CreateUserCommand = {
  name: 'CreateUser',
  version: '2.0.0', // Increment version on breaking changes
  inputSchema: { /* ... */ }
};

Prevention:
- Use semantic versioning for contracts
- Deploy with zero-downtime (blue-green, rolling)
- Implement contract compatibility checks
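
A contract compatibility check can build directly on semantic versioning. The sketch below is a simplified illustration: `parseVersion` and `isCompatible` are hypothetical helpers, and pre-release tags and the special rules for `0.x` versions are deliberately not handled.

```typescript
// Parses "MAJOR.MINOR.PATCH" into numbers; throws on anything else.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`Invalid version: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// A consumer built against `expected` can use a provider at `actual`
// if the major versions match and `actual` is not older than `expected`.
function isCompatible(expected: string, actual: string): boolean {
  const [eMajor, eMinor, ePatch] = parseVersion(expected);
  const [aMajor, aMinor, aPatch] = parseVersion(actual);
  if (eMajor !== aMajor) return false;
  if (aMinor !== eMinor) return aMinor > eMinor;
  return aPatch >= ePatch;
}

console.log(isCompatible("1.2.0", "1.3.1")); // true: same major, newer minor
console.log(isCompatible("1.2.0", "2.0.0")); // false: major version changed
```

A check like this could run in a deployment pipeline against the contracts endpoint, so that a breaking change is caught before old and new instances run side by side.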
Service Discovery API
Section titled “Service Discovery API”List All Services
Section titled “List All Services”curl http://localhost:3001/api/services | jqResponse:
{ "services": [ { "serviceId": "my-service-uuid-123", "name": "my-service", "version": "1.0.0", "status": "healthy", "endpoint": "http://my-service:3000", "registeredAt": "2024-01-15T12:00:00Z", "lastHealthCheck": "2024-01-15T12:05:00Z" } ]}Get Specific Service
Section titled “Get Specific Service”curl http://localhost:3001/api/services/my-service | jqGet Service Contracts
Section titled “Get Service Contracts”curl http://localhost:3001/api/services/my-service/contracts | jqResponse:
{ "service": "my-service", "contracts": [ { "name": "CreateUser", "type": "command", "requiredPermissions": ["users:create"], "inputSchema": { /* ... */ }, "outputSchema": { /* ... */ } } ]}Health Check Configuration
Service Health Endpoint
BaseService provides an automatic health endpoint:

await BaseService.start({
  name: 'my-service',
  version: '1.0.0',
  port: 3000
});

// Health endpoint: GET http://localhost:3000/health
// Returns:
// {
//   "status": "healthy",
//   "service": "my-service",
//   "version": "1.0.0",
//   "timestamp": "2024-01-15T12:00:00Z"
// }

Custom Health Checks

import { HealthCheckManager } from '@banyanai/platform-base-service';

// Add custom health checks
HealthCheckManager.registerCheck('database', async () => {
  const isConnected = await database.ping();
  return { healthy: isConnected };
});

HealthCheckManager.registerCheck('redis', async () => {
  const isConnected = await redis.ping();
  return { healthy: isConnected };
});

Health endpoint returns:

{
  "status": "healthy",
  "service": "my-service",
  "checks": {
    "database": { "healthy": true },
    "redis": { "healthy": true }
  }
}

Debugging Techniques
Monitor Service Registration

# Watch for registrations
docker logs -f service-discovery | grep -i "register"

# Should see:
# Service registered: my-service (version: 1.0.0)
# Contracts received: my-service (3 contracts)

Test Registration Flow

# 1. Stop service
docker compose stop my-service

# 2. Check deregistered
curl http://localhost:3001/api/services | jq '.services[] | select(.name=="my-service")'
# Should return nothing

# 3. Start service
docker compose up -d my-service

# 4. Wait for registration (check logs)
docker logs my-service | grep -i "registered"

# 5. Verify registered
curl http://localhost:3001/api/services/my-service | jq '.status'
# Should return: "healthy"

Check Health Check Frequency

# Monitor health checks
docker logs service-discovery | grep -i "health check.*my-service"

# Should see periodic checks:
# Health check: my-service - status: healthy
# (every 30 seconds by default)

Common Error Messages
“Service not found”
Solution: Ensure the service is registered and running. Check that BaseService.start() is called.
“Health check failed”
Solution: Fix the health endpoint or the service dependencies. Check the /health endpoint.
“Contract validation failed”
Solution: Fix the contract schema. Ensure all required fields are present.
“Service already registered”
Solution: Deregister the old instance or use a unique service instance ID.
Verification Steps
After fixing service discovery issues:
1. Service Registered
curl http://localhost:3001/api/services/my-service | jq '{
  name: .name,
  status: .status,
  version: .version
}'

# Should return:
# {
#   "name": "my-service",
#   "status": "healthy",
#   "version": "1.0.0"
# }

2. Contracts Available

curl http://localhost:3001/api/services/my-service/contracts | jq '.contracts | length'
# Should return count > 0

3. Health Checks Passing

# Test health endpoint
curl http://localhost:3000/health | jq '.status'
# Should return: "healthy"

4. API Gateway Can Route

# Test operation via API Gateway
curl -X POST http://localhost:3000/api/test-operation \
  -H "Content-Type: application/json" \
  -H "X-Dev-User-Id: test" \
  -H "X-Dev-Permissions: *" \
  -d '{}'
# Should return 200 OK (not 404)

Related Documentation
- Service Won’t Start - Startup troubleshooting
- API Calls Failing - API routing issues
- Service Discovery Architecture - How discovery works
- BaseService - Service initialization
Summary
Common service discovery issues:
- Service not registered - Ensure BaseService.start() is called and the message bus is connected
- Health checks failing - Fix health endpoint or dependencies
- Contracts not broadcast - Verify handler discovery and message bus connection
- Deregistration issues - Implement graceful shutdown and reliable health checks
- Version conflicts - Stop old instances and implement contract versioning
Always check service discovery API to verify registration, contracts, and health status.
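
The three checks named above (registration, contracts, health) can be combined into one pass/fail helper. This is a sketch under stated assumptions: the input shapes are inferred from the sample responses in this guide, `verifyService` is a hypothetical function, and fetching the three payloads is left to the caller.

```typescript
interface ServiceInfo { name: string; status: string; version: string; }
interface ContractsInfo { contracts: unknown[]; }
interface HealthInfo { status: string; }

// Returns a list of failed checks; an empty list means all three checks passed.
// Pass null for any payload that could not be fetched.
function verifyService(
  service: ServiceInfo | null,
  contracts: ContractsInfo | null,
  health: HealthInfo | null,
): string[] {
  const failures: string[] = [];
  if (!service) {
    failures.push("service is not registered");
  } else if (service.status !== "healthy") {
    failures.push(`registry status is "${service.status}"`);
  }
  if (!contracts || contracts.contracts.length === 0) {
    failures.push("no contracts available");
  }
  if (!health || health.status !== "healthy") {
    failures.push("health endpoint is not healthy");
  }
  return failures;
}

const failures = verifyService(
  { name: "my-service", status: "healthy", version: "1.0.0" },
  { contracts: [{ name: "CreateUser" }] },
  { status: "healthy" },
);
console.log(failures.length === 0 ? "all checks passed" : failures.join("; "));
```

Each failure message maps back to one of the numbered issues in this guide, which makes the helper a reasonable first step before reading logs.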