MCPG emits three streams:
- Prometheus metrics — counters and histograms, fleet-aggregated
- OpenTelemetry traces — per-request spans with W3C trace context
- Audit ledger — every policy decision, with Ed25519 chain signatures
All three are on by default and can be disabled or rerouted independently.
Prometheus
The control plane exposes a Prometheus exposition endpoint at:
GET /v1/orgs/{org}/metrics?granularity=hour
The response contains rolled-up metrics for the entire fleet, labeled by org, workspace, plugin, and tool:
# HELP mcpg_tool_calls_total Total tool invocations
# TYPE mcpg_tool_calls_total counter
mcpg_tool_calls_total{org="default",workspace="prod",plugin="github",tool="list_repos"} 4827
# HELP mcpg_tool_errors_total Total tool errors
# TYPE mcpg_tool_errors_total counter
mcpg_tool_errors_total{org="default",workspace="prod",plugin="github",tool="list_repos"} 12
# HELP mcpg_tool_latency_ms Tool latency quantiles
# TYPE mcpg_tool_latency_ms gauge
mcpg_tool_latency_ms{quantile="0.5",org="default",plugin="github",tool="list_repos"} 84
mcpg_tool_latency_ms{quantile="0.95",org="default",plugin="github",tool="list_repos"} 312
mcpg_tool_latency_ms{quantile="0.99",org="default",plugin="github",tool="list_repos"} 891
Scrape config:
- job_name: mcpg-fleet
  scrape_interval: 30s
  metrics_path: /v1/orgs/default/metrics
  params:
    granularity: [hour]
  static_configs:
    - targets: ['mcpg-cp.svc.cluster.local:7843']
The gateway itself also exposes Prometheus metrics directly at :9090/metrics for
local-only scraping (legacy compatibility). For fleet observability, prefer the
control plane (CP) endpoint.
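To make the exposition format concrete, here is a minimal Python sketch that parses sample lines like the ones shown above and looks up a value by label. The helper names `parse_samples` and `lookup` are illustrative and not part of MCPG. The parser covers only the labeled-sample shape in the example output, so a production consumer would more likely use a full Prometheus client library.

```python
import re

# Sample lines copied from the exposition output above.
SAMPLE = """\
mcpg_tool_calls_total{org="default",workspace="prod",plugin="github",tool="list_repos"} 4827
mcpg_tool_errors_total{org="default",workspace="prod",plugin="github",tool="list_repos"} 12
mcpg_tool_latency_ms{quantile="0.95",org="default",plugin="github",tool="list_repos"} 312
"""

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels, value) for labeled sample lines.

    Comment lines (# HELP, # TYPE) are skipped. This sketch does not handle
    unlabeled samples, timestamps, or escaped quotes inside label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(raw_labels)), float(value)))
    return samples


def lookup(samples, name, **wanted):
    """Return the values of samples with this name whose labels include `wanted`."""
    return [
        value
        for sample_name, labels, value in samples
        if sample_name == name
        and all(labels.get(k) == v for k, v in wanted.items())
    ]


samples = parse_samples(SAMPLE)
p95 = lookup(samples, "mcpg_tool_latency_ms", tool="list_repos", quantile="0.95")
print(p95)  # [312.0]
```

Matching on a subset of labels mirrors how a PromQL selector such as `mcpg_tool_latency_ms{tool="list_repos"}` narrows a series set.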
OpenTelemetry
Every request creates a span. Spans are exported via OTLP to the collector you configure:
observability:
  otlp:
    endpoint: https://otel-collector.example.com:4317
    headers:
      authorization: "Bearer ${OTEL_TOKEN}"
    sample_rate: 0.1 # 10% of requests
W3C trace context propagates in both directions. If your client sends a traceparent
header, MCPG joins that trace. If your upstream tool accepts traceparent, MCPG sends it.
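For readers unfamiliar with the traceparent header, the following Python sketch shows what joining a trace amounts to: keep the incoming trace ID and mint a new span ID for the outbound call. It handles only the version `00` layout from the W3C Trace Context specification. The function names are illustrative and say nothing about how MCPG implements propagation internally.

```python
import re
import secrets

# version "00": 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags
TRACEPARENT_RE = re.compile(r"^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return (trace_id, parent_id, sampled), or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    _, trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def child_traceparent(incoming=None):
    """Build the outbound header: same trace ID, fresh span ID."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # No usable inbound context: start a new trace (sampling decision
        # is simplified to "sampled" in this sketch).
        trace_id, sampled = secrets.token_hex(16), True
    else:
        trace_id, _, sampled = parsed
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


inbound = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outbound = child_traceparent(inbound)
```

Here `outbound` carries the same 32-character trace ID as `inbound`, so a tracing backend can stitch the client, gateway, and upstream spans into one trace.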
Audit ledger
Every policy decision (permit, deny, with-obligations), every plugin lifecycle event, and every operator action writes an audit row. The ledger is:
- Per-org chained — each row references the previous row's hash, forming a per-org Merkle chain. Tampering with any row breaks the chain.
- Ed25519-signed — the chain head is signed periodically. Verifiers can check the signature without trusting the database.
- Retention-bounded — Community: 30 days, Pro: 90 days, Team: 180 days, Enterprise: 7 years.
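The chaining property described in the list above can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: SHA-256 as the hash, JSON as the row encoding, the `prev_hash` / `payload` / `hash` field names, the all-zero genesis value, and the `plugin.install` action name. The text does not specify MCPG's actual row format. The Ed25519 signature over the chain head is omitted because it needs a third-party cryptography library.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical prev_hash for the first row in an org's chain


def row_hash(prev_hash, payload):
    """Hash a row's payload together with the previous row's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append_row(chain, payload):
    """Append a row that links back to the current chain head."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload, "hash": row_hash(prev, payload)})


def verify_chain(chain):
    """Return the index of the first broken row, or None if the chain is intact."""
    prev = GENESIS
    for index, row in enumerate(chain):
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return index
        prev = row["hash"]
    return None


chain = []
append_row(chain, {"action": "tool.call", "decision": "permit"})
append_row(chain, {"action": "tool.call", "decision": "deny"})
append_row(chain, {"action": "plugin.install", "decision": "permit"})
intact = verify_chain(chain)           # None: every link checks out

chain[1]["payload"]["decision"] = "permit"  # tamper with a stored row
broken_at = verify_chain(chain)        # 1: the edited row no longer matches its hash
```

Because each hash covers the previous hash, an attacker who rewrites one row would also have to recompute every later row, and the periodically signed chain head would then no longer match.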
Query via the HTTP API:
curl 'https://mcpg-cp.example.com/v1/orgs/default/audit?action_prefix=tool.&limit=100'
Filter by action, actor, time range, or instance.
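The same query can be issued from a script. The sketch below builds the URL from the two parameters that appear in the curl example (`action_prefix` and `limit`). Authentication and the parameter names for the other filters are not covered in this section, so they are left out rather than guessed. The response body is returned as text because its schema is not described here.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def audit_url(base_url, org, action_prefix=None, limit=100):
    """Build the audit query URL for one org."""
    params = {}
    if action_prefix is not None:
        params["action_prefix"] = action_prefix
    params["limit"] = limit
    path = f"/v1/orgs/{quote(org, safe='')}/audit"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


def fetch_audit(base_url, org, **filters):
    """Fetch one page of audit rows and return the raw response body as text."""
    with urlopen(audit_url(base_url, org, **filters), timeout=10) as response:
        return response.read().decode("utf-8")


url = audit_url("https://mcpg-cp.example.com", "default", action_prefix="tool.")
print(url)
# https://mcpg-cp.example.com/v1/orgs/default/audit?action_prefix=tool.&limit=100
```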
Per-call telemetry
The control plane stores per-call samples in the tool_invocations table. Error
messages are stored only as BLAKE3 hashes, so no plaintext leaks. Hourly and daily
rollups serve the metrics endpoint and the dashboard.
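As a rough picture of what an hourly rollup does, the Python sketch below groups per-call samples by UTC hour, plugin, and tool, then derives call counts, error counts, and a nearest-rank p95 latency. The sample fields (`ts`, `plugin`, `tool`, `ok`, `latency_ms`) are hypothetical stand-ins; the real tool_invocations schema is not shown in this section.

```python
import math
from collections import defaultdict


def hour_bucket(ts):
    """Truncate a Unix timestamp (seconds) to the start of its UTC hour."""
    return ts - (ts % 3600)


def rollup_hourly(samples):
    """Aggregate per-call samples into one record per (hour, plugin, tool)."""
    buckets = defaultdict(lambda: {"calls": 0, "errors": 0, "latencies": []})
    for sample in samples:
        key = (hour_bucket(sample["ts"]), sample["plugin"], sample["tool"])
        bucket = buckets[key]
        bucket["calls"] += 1
        bucket["errors"] += 0 if sample["ok"] else 1
        bucket["latencies"].append(sample["latency_ms"])

    rollups = {}
    for key, bucket in buckets.items():
        latencies = sorted(bucket["latencies"])
        # Nearest-rank percentile: the smallest value covering 95% of samples.
        index = max(0, math.ceil(0.95 * len(latencies)) - 1)
        rollups[key] = {
            "calls": bucket["calls"],
            "errors": bucket["errors"],
            "p95_ms": latencies[index],
        }
    return rollups


samples = [
    {"ts": 3600, "plugin": "github", "tool": "list_repos", "ok": True, "latency_ms": 80},
    {"ts": 3700, "plugin": "github", "tool": "list_repos", "ok": False, "latency_ms": 300},
    {"ts": 7300, "plugin": "github", "tool": "list_repos", "ok": True, "latency_ms": 90},
]
rollups = rollup_hourly(samples)
```

The first two samples fall in the hour starting at 3600 and the third in the hour starting at 7200, so `rollups` holds two records.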
For Enterprise customers entitled to payload_capture, request and response payloads
are encrypted per tenant using AES-256-GCM and stored in tool_invocation_payloads.
The dashboard surfaces them with tenant-bound decrypt-on-view.
Drilling down
The dashboard's Tool activity view lets you:
- Filter by plugin, tool, outcome, or time range
- Drill from a metric anomaly to the specific samples that contributed
- Open an audit trace for any sample showing identity → policy → dispatch
- Compare arrival times to processing times (lag detection)
Alerting
Wire your Prometheus → Alertmanager pipeline as usual. Common alerts:
- alert: MCPGToolErrorRate
  expr: |
    sum(rate(mcpg_tool_errors_total[5m])) by (plugin, tool)
    / sum(rate(mcpg_tool_calls_total[5m])) by (plugin, tool)
    > 0.05
  for: 5m
  annotations:
    summary: "High error rate on {{ $labels.plugin }}.{{ $labels.tool }}"
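To see what the alert expression measures, the Python sketch below approximates it for a single plugin and tool from two counter readings taken five minutes apart. It is a simplification: PromQL's `rate()` also extrapolates to the window boundaries, and the `for: 5m` clause (the condition must hold continuously before firing) is not modeled. The function names and the counter readings in the usage lines are illustrative.

```python
def simple_rate(first, last):
    """Approximate per-second increase between two (unix_ts, counter) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # A counter that went down was reset; count only the growth since the reset.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


def error_ratio_exceeded(calls, errors, threshold=0.05):
    """True when the error rate divided by the call rate is above the threshold."""
    call_rate = simple_rate(*calls)
    if call_rate == 0:
        return False  # no traffic in the window, nothing to alert on
    return simple_rate(*errors) / call_rate > threshold


# (timestamp, counter value) pairs five minutes apart
calls = ((0, 4000), (300, 4827))
noisy = error_ratio_exceeded(calls, ((0, 0), (300, 60)))    # 60 / 827 is about 7.3%
healthy = error_ratio_exceeded(calls, ((0, 0), (300, 12)))  # 12 / 827 is about 1.5%
```

With these readings `noisy` is `True` and `healthy` is `False`, which matches the 5% threshold in the rule.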
See observability deep-dive for the architecture behind the metrics pipeline.