Observability And Analytics
Operate with evidence
Builder Insights needs more than application logs. It needs a clear observability model that tells engineering whether the system is healthy, tells product whether the platform is being used, and tells leadership whether adoption and signal quality are improving.
Observability layers
- reliability telemetry for engineering and operators
- usage analytics for product and leadership
- audit visibility for sensitive actions and access changes
- reporting definitions that stay consistent over time
Four layers of visibility
| Layer | Audience | Primary question |
|---|---|---|
| Reliability telemetry | engineering and operators | is the platform healthy right now? |
| Product analytics | product and leadership | are people using the system in valuable ways? |
| Security and audit visibility | security and access administrators | who changed access or performed sensitive operations? |
| Executive reporting | leadership | what trends and adoption signals matter over time? |
Recommended instrumentation model
Every critical request path and privileged workflow should emit structured logs with enough context to support triage:
- request and route context
- auth and role information where appropriate
- error payload summaries
- sync and retry state transitions
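As a minimal sketch of the structured-log fields listed above, the following uses only the Python standard library to emit one JSON object per log line. The logger name, the `request_context` helper, and the field names (`request_id`, `route`, `role`, `sync_state`, `error_summary`) are illustrative assumptions, not the platform's actual schema, and the approved logging stack is still an open decision.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # A dict passed as extra={"context": {...}} becomes record.context.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str = "builder_insights") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical name)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def request_context(request_id, route, role, sync_state=None, error_summary=None):
    """Build the `extra` argument carrying request, auth, sync, and error context."""
    context = {"request_id": request_id, "route": route, "role": role}
    if sync_state is not None:
        context["sync_state"] = sync_state
    if error_summary is not None:
        context["error_summary"] = error_summary
    return {"context": context}


logger = build_logger()
logger.info(
    "sync retry scheduled",
    extra=request_context("req-123", "/api/sync", "field_rep", sync_state="retrying"),
)
```

Keeping the context in one nested `extra` key avoids collisions with the attribute names that `logging.LogRecord` already reserves.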
Metrics should make health visible before users report that something is broken:
- latency and error rates
- uptime and request success ratios
- sync success and failure counts
- queue drain and reconnect behavior
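To make the latency, error-rate, and success-ratio metrics above concrete, here is a small sketch that summarizes a batch of request samples. The `RequestSample` shape, the choice to count only 5xx responses as failures, and the nearest-rank percentile method are assumptions for illustration; a real metrics stack would normally compute these from histograms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    route: str
    status: int
    latency_ms: float


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Return request count, success ratio, error rate, and p95 latency."""
    total = len(samples)
    if total == 0:
        return {"requests": 0, "success_ratio": None, "error_rate": None, "p95_latency_ms": None}
    # Assumption: only server-side failures (HTTP 5xx) count against reliability.
    failures = sum(1 for s in samples if s.status >= 500)
    return {
        "requests": total,
        "success_ratio": (total - failures) / total,
        "error_rate": failures / total,
        "p95_latency_ms": percentile([s.latency_ms for s in samples], 95),
    }


samples = [
    RequestSample("/api/sync", 200, 10.0),
    RequestSample("/api/sync", 200, 20.0),
    RequestSample("/api/sync", 200, 30.0),
    RequestSample("/api/sync", 503, 400.0),
]
print(summarize(samples))
```

The same summary can be computed per route or per time window by grouping the samples before calling `summarize`.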
Usage analytics should explain whether Builder Relations teams are adopting the workflow and whether the product is creating real operating value:
- active users by role and org slice
- capture volume by event, team, and time period
- offline usage and recovery rates
- dashboard and reporting engagement
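The usage slices above can be sketched as plain aggregations over capture records. The `CaptureEvent` fields and the role and event names in the sample data are hypothetical; the actual definitions of core usage metrics are still to be agreed.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureEvent:
    user_id: str
    role: str      # hypothetical role label
    event: str     # the field event where the capture happened
    day: str       # ISO date, e.g. "2024-05-01"
    offline: bool  # True if captured while offline and synced later


def active_users_by_role(captures):
    """Count distinct users per role."""
    users = defaultdict(set)
    for capture in captures:
        users[capture.role].add(capture.user_id)
    return {role: len(ids) for role, ids in users.items()}


def capture_volume_by_event(captures):
    """Count captures per field event."""
    return dict(Counter(capture.event for capture in captures))


def offline_share(captures):
    """Fraction of captures made offline, or None when there are no captures."""
    if not captures:
        return None
    return sum(1 for capture in captures if capture.offline) / len(captures)


captures = [
    CaptureEvent("u1", "field_rep", "summit", "2024-05-01", False),
    CaptureEvent("u1", "field_rep", "summit", "2024-05-01", True),
    CaptureEvent("u2", "manager", "meetup", "2024-05-02", False),
]
print(active_users_by_role(captures))
print(capture_volume_by_event(captures))
print(offline_share(captures))
```

Writing each metric as a named function keeps its definition explicit, which helps reporting stay consistent over time.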
Sensitive actions should remain traceable long after the original operator has forgotten them:
- login and failed-auth events
- role changes and entitlement changes
- admin-only route access
- exports and high-sensitivity operations
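One possible shape for the audit events listed above is an immutable record serialized as a JSON line. The action names in `SENSITIVE_ACTIONS` and the record fields are assumptions made for this sketch; the real audit-event scope is owned by Security and Application Engineering.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical action vocabulary mirroring the categories in the list above.
SENSITIVE_ACTIONS = {
    "login",
    "login_failed",
    "role_changed",
    "entitlement_changed",
    "admin_route_accessed",
    "export_created",
}


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditEvent:
    actor_id: str
    action: str
    target: str
    outcome: str
    occurred_at: str = field(default_factory=_utc_now)
    detail: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject unknown actions so the audit vocabulary stays controlled.
        if self.action not in SENSITIVE_ACTIONS:
            raise ValueError(f"unknown audit action: {self.action}")


def to_json_line(event: AuditEvent) -> str:
    """Serialize an audit event as one JSON object, suitable for append-only storage."""
    return json.dumps(asdict(event), sort_keys=True)


event = AuditEvent(
    actor_id="admin-7",
    action="role_changed",
    target="user-42",
    outcome="success",
    detail={"from": "viewer", "to": "admin"},
)
print(to_json_line(event))
```

Recording the actor, target, and outcome on every event is what lets a later reviewer answer who changed access without relying on anyone's memory.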
Decision tracker
| Decision | Current recommendation | Owner | Status |
|---|---|---|---|
| Telemetry stack | approved internal logging, metrics, and tracing stack | Platform Engineering and Application Engineering | Needs decision |
| Reliability dashboard ownership | engineering and operators | Application Engineering | Drafted |
| Product usage dashboard ownership | product and leadership-facing analytics owner | Product and Builder Relations Ops | Drafted |
| Audit-event scope | auth, role changes, privileged routes, exports | Security and Application Engineering | Drafted |
Priority dashboards
- Reliability: track auth, sync, API health, and the paths most likely to break user trust first.
- Product usage: track adoption, capture volume, and role-based activity so the team can tell whether the product is actually being used.
- Security and audit: track role changes, privileged actions, and unusual access patterns that matter for operations and for a defensible audit trail.
- Executive reporting: track the higher-level patterns that connect product usage to business value and field intelligence outcomes.
Minimum telemetry baseline
QA check: what should exist before broader internal rollout
- structured logs for the critical request paths
- metrics for auth, sync, queue state, and API health
- an explicit definition of core usage metrics
- audit events for role changes and sensitive operations
- alerting for major reliability regressions
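Alerting on reliability regressions can start as a simple comparison of observed metrics against agreed limits. The sketch below assumes hypothetical metric names and threshold values; the actual thresholds for auth, sync, and internal APIs are among the open stakeholder questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    max_value: float


def evaluate(rules, observed):
    """Return a message for every rule whose metric is missing or above its limit."""
    breaches = []
    for rule in rules:
        value = observed.get(rule.metric)
        if value is None:
            # Missing data is treated as a problem, not as a healthy reading.
            breaches.append(f"{rule.metric}: no data")
        elif value > rule.max_value:
            breaches.append(f"{rule.metric}: {value} exceeds {rule.max_value}")
    return breaches


# Placeholder thresholds for illustration only.
rules = [
    AlertRule("auth_error_rate", 0.02),
    AlertRule("sync_failure_rate", 0.05),
    AlertRule("api_p95_latency_ms", 800.0),
]
observed = {"auth_error_rate": 0.01, "sync_failure_rate": 0.08}
for message in evaluate(rules, observed):
    print(message)
```

Treating a missing metric as a breach keeps a silent telemetry outage from looking like a healthy system.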
Stakeholder questions to answer
Engineering and platform owners
- what logging, metrics, and tracing stack is approved internally?
- what alerting thresholds matter for auth, sync, and internal APIs?
- how should telemetry from Kanopy-hosted services be collected?
Product and leadership owners
- which usage metrics best reflect adoption and value?
- how should Builder Relations activity be sliced by role, event, or reporting line?
- which dashboards need leadership-ready reporting versus operator-only detail?
Security and compliance-minded owners
- which access and privilege events require audit retention?
- what export, admin, or user-management actions count as sensitive?
Common failure mode to avoid
Risk: do not confuse analytics with observability
Usage analytics can tell you whether people are using the platform. They cannot replace the logs, metrics, and alerts needed to keep the platform healthy. Treat reliability telemetry and product analytics as related but distinct layers.