
Metrics Platform/Custom Data Monitor

From Wikitech

The schemas/event/secondary repository stores the JSON Schemas used to validate events submitted by analytics instrumentation. It also stores the Metrics Platform per-platform base schemas (herein "the MP base schemas") and the Legacy EventLogging base schema.

Data Products wants to track a number of high-level metrics about the schemas used for analytics instrumentation over time. Those high-level metrics are:

  1. The number of schemas
  2. The number of schemas that include the MP base schemas
  3. The average number of so-called "custom data" properties per schema

Here, Data Products defines custom data as any property defined in a schema that is in neither the MP base schemas nor the Legacy EventLogging base schema.
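The three metrics and the custom data definition above can be sketched in a few lines of Python. This is a minimal illustration, not the actual custom-data-monitor script: it assumes each schema is already loaded as a dict, that a schema pulls in an MP base schema through a `$ref` inside `allOf`, and that only top-level properties are counted. The `$ref` prefix, the base property names, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical $ref prefix marking an MP base schema; the real paths may differ.
MP_BASE_REF_PREFIX = "/fragment/analytics/product_metrics/"


def includes_mp_base(schema: dict) -> bool:
    """Return True if any allOf entry references an MP base schema."""
    return any(
        entry.get("$ref", "").startswith(MP_BASE_REF_PREFIX)
        for entry in schema.get("allOf", [])
    )


def custom_properties(schema: dict, base_properties: set) -> set:
    """Collect top-level property names that no base schema defines."""
    names = set(schema.get("properties", {}))
    for entry in schema.get("allOf", []):
        names |= set(entry.get("properties", {}))
    return names - base_properties


def summarize(schemas: list, base_properties: set) -> dict:
    """Compute the three high-level metrics for a list of schemas."""
    counts = [len(custom_properties(s, base_properties)) for s in schemas]
    return {
        "schemas": len(schemas),
        "schemas_with_mp_base": sum(includes_mp_base(s) for s in schemas),
        "mean_custom_properties": mean(counts) if counts else 0.0,
    }


# Illustrative input only: two made-up schemas and made-up base property names.
example_schemas = [
    {
        "allOf": [
            {"$ref": "/fragment/analytics/product_metrics/web/base/1.0.0#"},
            {"properties": {"action": {}, "funnel_name": {}}},
        ]
    },
    {"properties": {"wiki": {}, "event": {}, "button_label": {}}},
]
base_properties = {"wiki", "event", "agent", "page"}

result = summarize(example_schemas, base_properties)
print(result)
```

In this example the first schema contributes two custom properties and the second contributes one, because `wiki` and `event` are treated as base properties. A real run would need the base property sets to be read from the MP base schemas and the Legacy EventLogging base schema themselves.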

Data Products runs the custom data monitor script and updates the spreadsheet roughly once per quarter. The work is tracked in Phabricator. Since the work is routine, there is a template task that can be duplicated whenever the script needs to be run. Clicking the button below creates such a duplicate task.

Create "Run custom data property count script…" task

  1. https://gitlab.wikimedia.org/repos/data-engineering/custom-data-monitor
  2. T354965: [Epic] SDS 2.5 Establish baselines for Metrics Platform & Experimentation success indicators
  3. T356610: [SPIKE] Determine how to capture number of instruments developed total and via Metrics Platform over time
  4. T356610: Write a script to capture custom data properties counts in secondary schemas