Observability And Analytics

Operate with evidence

Builder Insights needs more than application logs. It needs a clear observability model that tells engineering whether the system is healthy, tells product whether the platform is being used, and tells leadership whether adoption and signal quality are improving.

Observability layers

  • reliability telemetry for engineering and operators
  • usage analytics for product and leadership
  • audit visibility for sensitive actions and access changes
  • reporting definitions that stay consistent over time

Four layers of visibility

Layer | Audience | Primary question
Reliability telemetry | engineering and operators | is the platform healthy right now?
Product analytics | product and leadership | are people using the system in valuable ways?
Security and audit visibility | access and security-minded admins | who changed access or performed sensitive operations?
Executive reporting | leadership | what trends and adoption signals matter over time?

Logs

Every critical request path and privileged workflow should emit structured logs with enough context to support triage.

  • request and route context
  • auth and role information where appropriate
  • error payload summaries
  • sync and retry state transitions
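
As an illustration of the fields listed above, here is a minimal sketch of structured logging using only the Python standard library. It is not the approved telemetry stack, which is still an open decision. The logger name `builder_insights`, the helper `sync_transition_context`, the field names, and the example route and role values are all hypothetical.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Anything passed as extra={"context": {...}} lands on record.context.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str = "builder_insights") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical name)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def sync_transition_context(route, role, sync_state, retry_count, error_summary=None):
    """Assemble triage context: route, role, sync state, and an error summary."""
    context = {
        "route": route,
        "role": role,
        "sync_state": sync_state,
        "retry_count": retry_count,
    }
    if error_summary is not None:
        # Keep a short summary only, never the full error payload.
        context["error_summary"] = error_summary[:200]
    return {"context": context}
```

A call such as `build_logger().warning("sync retry scheduled", extra=sync_transition_context("/api/sync", "field_rep", "retrying", 2, "timeout"))` would emit one JSON line carrying the request, role, and retry state together.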

Metrics

Metrics should make health visible before users tell the team something is broken.

  • latency and error rates
  • uptime and request success ratios
  • sync success and failure counts
  • queue drain and reconnect behavior
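
The metrics listed above can be sketched with a small in-memory collector. This is an illustration only: a real deployment would use whichever metrics stack is approved, and the class name `HealthMetrics` and its counter names are assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HealthMetrics:
    """In-memory stand-in for a metrics backend (hypothetical names)."""

    counts: Counter = field(default_factory=Counter)
    latencies_ms: list = field(default_factory=list)

    def record_request(self, ok: bool, latency_ms: float) -> None:
        self.counts["request_ok" if ok else "request_error"] += 1
        self.latencies_ms.append(latency_ms)

    def record_sync(self, ok: bool) -> None:
        self.counts["sync_ok" if ok else "sync_failed"] += 1

    def request_success_ratio(self) -> float:
        total = self.counts["request_ok"] + self.counts["request_error"]
        # With no traffic there is nothing to report as failing.
        return self.counts["request_ok"] / total if total else 1.0

    def latency_percentile(self, pct: float) -> float:
        """Nearest-rank percentile of recorded request latencies."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]
```

Tracking a high percentile alongside the success ratio matters because a slow but technically successful request path can erode trust well before error counts move.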

Analytics

Usage analytics should explain whether Builder Relations teams are adopting the workflow and whether the product is creating real operating value.

  • active users by role and org slice
  • capture volume by event, team, and time period
  • offline usage and recovery rates
  • dashboard and reporting engagement
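
To make the slices above concrete, here is a minimal sketch of how capture events might be aggregated. The `CaptureEvent` shape, its field names, and the function names are assumptions for illustration; the actual definitions of core usage metrics remain to be agreed.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple


class CaptureEvent(NamedTuple):
    """One captured record (hypothetical shape)."""

    user_id: str
    role: str
    event_name: str
    captured_offline: bool


def active_users_by_role(events: Iterable[CaptureEvent]) -> Dict[str, int]:
    """Count distinct users per role."""
    users = defaultdict(set)
    for e in events:
        users[e.role].add(e.user_id)
    return {role: len(ids) for role, ids in users.items()}


def capture_volume_by_event(events: Iterable[CaptureEvent]) -> Dict[str, int]:
    """Count captures per event name."""
    return dict(Counter(e.event_name for e in events))


def offline_share(events: Iterable[CaptureEvent]) -> float:
    """Fraction of captures made while offline (0.0 when there is no data)."""
    events = list(events)
    if not events:
        return 0.0
    return sum(e.captured_offline for e in events) / len(events)
```

Counting distinct users rather than raw events keeps one very active person from making a role look widely adopted.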

Audit events

Sensitive actions should be traceable long after the original operator has forgotten they happened.

  • login and failed-auth events
  • role changes and entitlement changes
  • admin-only route access
  • exports and high-sensitivity operations
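
One way to keep these events traceable is to record each as an immutable, self-describing entry. The sketch below assumes a hypothetical `AuditEvent` record and an `AuditAction` enum mirroring the list above; retention rules and the storage backend are outside its scope and still need a decision.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuditAction(str, Enum):
    """Action types mirroring the audit list (hypothetical identifiers)."""

    LOGIN = "login"
    FAILED_AUTH = "failed_auth"
    ROLE_CHANGE = "role_change"
    ENTITLEMENT_CHANGE = "entitlement_change"
    ADMIN_ROUTE_ACCESS = "admin_route_access"
    EXPORT = "export"


@dataclass(frozen=True)
class AuditEvent:
    """Immutable once created, so a stored entry cannot be edited in place."""

    action: AuditAction
    actor_id: str
    target: str
    occurred_at: str
    detail: str = ""


def make_audit_event(action: AuditAction, actor_id: str, target: str,
                     detail: str = "", now: Optional[datetime] = None) -> AuditEvent:
    """Build an event stamped with a UTC ISO 8601 time."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return AuditEvent(action, actor_id, target, ts, detail)


def to_json_line(event: AuditEvent) -> str:
    """Serialize for an append-only log; the enum is written as its string value."""
    return json.dumps(asdict(event), sort_keys=True)
```

Recording who acted, on what, and when in a single entry is what lets a reviewer reconstruct a role change months later without relying on anyone's memory.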

Decision tracker

Decision | Current recommendation | Owner | Status
Telemetry stack | approved internal logging, metrics, and tracing stack | Platform Engineering and Application Engineering | Needs decision
Reliability dashboard ownership | engineering and operators | Application Engineering | Drafted
Product usage dashboard ownership | product and leadership-facing analytics owner | Product and Builder Relations Ops | Drafted
Audit-event scope | auth, role changes, privileged routes, exports | Security and Application Engineering | Drafted

Priority dashboards

1. Reliability dashboard

Track auth, sync, API health, and the paths most likely to break user trust first.

2. Usage dashboard

Track adoption, capture volume, and role-based activity so the team can tell whether the product is actually being used.

3. Audit dashboard

Track role changes, privileged actions, and unusual access patterns that matter both for day-to-day operations and for a defensible audit record.

4. Leadership dashboard

Track the higher-level patterns that connect product usage to business value and field intelligence outcomes.

Minimum telemetry baseline

QA check

What should exist before broader internal rollout

  • structured logs for the critical request paths
  • metrics for auth, sync, queue state, and API health
  • an explicit definition of core usage metrics
  • audit events for role changes and sensitive operations
  • alerting for major reliability regressions
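
The alerting item in the baseline above could start as simply as a set of threshold rules checked against current metric values. This is a sketch under stated assumptions: the `AlertRule` type, the `evaluate` function, and the metric names are hypothetical, and any threshold values are placeholders until the owners agree on real ones.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AlertRule:
    """Fire when a metric moves past a threshold (hypothetical shape)."""

    metric: str
    threshold: float
    direction: str  # "above" or "below"

    def __post_init__(self) -> None:
        if self.direction not in ("above", "below"):
            raise ValueError(f"unknown direction: {self.direction}")


def evaluate(rules: Iterable[AlertRule], current: Dict[str, float]) -> List[str]:
    """Return one message per breached rule or missing metric."""
    fired = []
    for rule in rules:
        value = current.get(rule.metric)
        if value is None:
            # Missing telemetry is itself a signal worth surfacing.
            fired.append(f"{rule.metric}: no data")
            continue
        if rule.direction == "above":
            breached = value > rule.threshold
        else:
            breached = value < rule.threshold
        if breached:
            fired.append(f"{rule.metric}={value} is {rule.direction} {rule.threshold}")
    return fired
```

Treating a missing metric as an alert condition guards against the case where the telemetry pipeline fails silently and every dashboard looks calm.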

Stakeholder questions to answer

Engineering and platform owners

  • what logging, metrics, and tracing stack is approved internally?
  • what alerting thresholds matter for auth, sync, and internal APIs?
  • how should telemetry from Kanopy-hosted services be collected?

Product and leadership owners

  • which usage metrics best reflect adoption and value?
  • how should Builder Relations activity be sliced by role, event, or reporting line?
  • which dashboards need leadership-ready reporting versus operator-only detail?

Security and compliance-minded owners

  • which access and privilege events require audit retention?
  • what export, admin, or user-management actions count as sensitive?

Common failure mode to avoid

Risk

Do not confuse analytics with observability

Usage analytics can tell you whether people are using the platform. They cannot replace the logs, metrics, and alerts needed to keep the platform healthy. Treat reliability telemetry and product analytics as related but distinct layers.