Telemetry Infrastructure
Core Idea: OpenTelemetry provides unified tracing, metrics, and logging, with Jaeger for distributed tracing and Prometheus + Grafana for metrics visualization.
Overview
The platform provides complete observability through:
- Distributed Tracing: Jaeger 2.9 with Service Performance Monitoring
- Metrics: Prometheus + Grafana dashboards
- Logging: Structured logs with correlation IDs
- Automatic Instrumentation: Zero telemetry code in business logic
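To make "structured logs with correlation IDs" and "zero telemetry code in business logic" concrete, here is a minimal sketch built on Node's `AsyncLocalStorage`. The names `correlationStore`, `logLine`, `createOrder`, and `handleRequest` are hypothetical and chosen for illustration; they are not the platform's actual API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Hypothetical store holding the correlation ID for the current request.
const correlationStore = new AsyncLocalStorage<string>();

// Build one structured (JSON) log line, stamped with the ambient correlation ID.
function logLine(
  level: "info" | "error",
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    correlationId: correlationStore.getStore() ?? null,
    ...fields,
  });
}

// Business logic: takes no correlation ID parameter and contains no telemetry code.
async function createOrder(orderId: string): Promise<string> {
  await Promise.resolve(); // simulated async work; the context survives the await
  return logLine("info", "order created", { orderId });
}

// Entry point (for example a gateway): establishes the context once per request.
async function handleRequest(
  orderId: string,
  correlationId: string = randomUUID(),
): Promise<string> {
  return correlationStore.run(correlationId, () => createOrder(orderId));
}
```

Because the ID lives in the async context rather than in function arguments, every log line written during a request can carry it without the handler passing it along.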
Components
Jaeger (Distributed Tracing)

    jaeger:
      image: cr.jaegertracing.io/jaegertracing/jaeger:2.9.0
      ports:
        - "16686:16686" # Jaeger UI
        - "4318:4318"   # OTLP HTTP
      environment:
        - SPAN_STORAGE_TYPE=elasticsearch

Access: http://localhost:16686
Grafana (Dashboards)
    grafana:
      image: grafana/grafana:11.5.0
      ports:
        - "5005:3000"
      environment:
        - GF_SECURITY_ADMIN_PASSWORD=admin

Access: http://localhost:5005 (admin/admin)
Elasticsearch (Jaeger Backend)
    elasticsearch:
      image: elasticsearch:8.11.0
      environment:
        - discovery.type=single-node
        - xpack.security.enabled=false
      ports:
        - "9200:9200"

Automatic Correlation
    // Automatic correlation ID propagation
    // API Gateway generates correlation ID
    const correlationId = generateCorrelationId();

    // Set AsyncLocalStorage context
    await correlationContext.run(correlationId, async () => {
      // All message bus calls include correlation ID automatically
      await messageBus.send(CreateOrderContract, data);

      // No manual correlation ID management needed
      // Jaeger automatically links all spans
    });

Key Features
Distributed Tracing
- Complete Request Flow: See entire request path across services
- Performance Breakdown: Time spent in each service
- Error Tracking: Failed requests highlighted
- Dependency Graph: Service communication map
Service Performance Monitoring
- RED Metrics: Rate, Errors, Duration
- Service Latency: P50, P95, P99 percentiles
- Error Rates: Per service and operation
- Call Graphs: Service dependency visualization
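As a rough illustration of what the RED metrics and latency percentiles above measure, the sketch below derives them from a batch of span samples. The `SpanSample` shape and the nearest-rank percentile method are assumptions made for this example; Jaeger's Service Performance Monitoring computes its figures from span-derived metrics in its own pipeline, not with this code.

```typescript
// Hypothetical, simplified view of a finished span.
interface SpanSample {
  durationMs: number;
  error: boolean;
}

interface RedSummary {
  ratePerSec: number; // Rate: requests per second over the window
  errorRate: number;  // Errors: fraction of failed requests (0..1)
  p50: number;        // Duration percentiles, in milliseconds
  p95: number;
  p99: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize one service (or operation) over a time window of `windowSec` seconds.
function summarize(samples: SpanSample[], windowSec: number): RedSummary {
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const errors = samples.filter((s) => s.error).length;
  return {
    ratePerSec: samples.length / windowSec,
    errorRate: samples.length === 0 ? 0 : errors / samples.length,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
  };
}
```

Grouping samples by service and operation before calling `summarize` would give the per-service and per-operation breakdown listed above.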
Custom Metrics
    import { Metrics } from '@banyanai/platform-telemetry';

    // Counter
    Metrics.counter('orders.created').inc();

    // Histogram
    Metrics.histogram('order.processing.time').observe(duration);

    // Gauge
    Metrics.gauge('active.orders').set(count);

Monitoring Dashboards
Pre-configured Grafana Dashboards
- Service overview (latency, throughput, errors)
- Message bus metrics (queue depth, message rate)
- Database metrics (query time, connection pool)
- Cache metrics (hit rate, memory usage)
Best Practices
- Use Correlation IDs
  - Automatically propagated by the platform
  - Include in all logs
  - Track requests across services
- Monitor Key Metrics
  - Command latency (target: <100ms)
  - Query latency (target: <50ms)
  - Error rate (target: <0.1%)
- Set Up Alerts
  - Latency spikes
  - Error rate increases
  - Service unavailability
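The targets listed above can be expressed as a simple threshold check, which is roughly what an alert rule evaluates. This is a minimal sketch: the `ServiceMetrics` shape and `findBreaches` function are hypothetical, and the source does not say which statistic (for example P95 or mean) the latency targets apply to, so the caller decides what to pass in.

```typescript
// Hypothetical snapshot of one service's key metrics.
interface ServiceMetrics {
  commandLatencyMs: number;
  queryLatencyMs: number;
  errorRate: number; // fraction of failed requests, so 0.001 means 0.1%
}

// Targets from the list above; a value must stay strictly below its target.
const TARGETS: ServiceMetrics = {
  commandLatencyMs: 100,
  queryLatencyMs: 50,
  errorRate: 0.001,
};

// Return one human-readable message per metric that misses its target.
function findBreaches(metrics: ServiceMetrics): string[] {
  const breaches: string[] = [];
  for (const key of Object.keys(TARGETS) as (keyof ServiceMetrics)[]) {
    if (metrics[key] >= TARGETS[key]) {
      breaches.push(`${key}: ${metrics[key]} is not below target ${TARGETS[key]}`);
    }
  }
  return breaches;
}
```

In practice these rules would more likely live in Prometheus or Grafana alerting than in application code; the sketch only shows the comparison being made.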
Troubleshooting with Jaeger
Find Slow Requests
- Open Jaeger UI: http://localhost:16686
- Select service
- Set min duration: 1000ms
- View slow traces
- Identify bottleneck spans
Track Errors
- Select service
- Filter by tag: error=true
- View failed traces
- Check error messages in span logs