Metrics Platform/Custom schemas

From Wikitech

In the Metrics Platform, schemas are used to validate event data. This page documents how to create a custom schema to use with the Metrics Platform.

When to create a custom schema

Metrics Platform provides reusable base schemas designed to fit the needs of most instruments.

These base schemas provide flexible interaction data properties that you can customize to provide the event data needed for your instrument. Using a base schema is the quickest way to launch an instrument.

If your event requires data that is not supported by the base schemas, follow this guide to create a custom schema. You can also reach out to the Data Products team for support in deciding if a custom schema is needed.

Custom schemas for specific product areas

In addition to the base schemas, there are reusable schemas designed to support the needs of a given product area. Using one of these schemas may provide the event data you need for your instrument and offer a quicker path to launching an instrument than creating a custom schema.

The available schemas for specific product areas are listed in the schema repository.[1]

Ownership

While Metrics Platform owns and maintains the base schemas, product and feature teams who create custom schemas and fragments are responsible for owning, updating, and maintaining them. This includes manually updating the version numbers of included schema fragments to the latest version.

Components of a custom schema

Custom schemas are a combination of schema fragments. At a high level, creating a custom schema involves designing a new schema fragment that meets your specific needs and combining your new fragment with fragments provided by Metrics Platform. This system of fragments helps standardize event data and reduce duplication between schemas.

Common fragment

All custom schemas must include the common fragment. This fragment provides the standard set of properties used by all events, including interaction data and contextual attributes.

In the schema definition, you can see that the common fragment is itself composed of two smaller fragments: /fragment/analytics/common and /fragment/analytics/product_metrics/experiments. (Fragments all the way down!)

Platform fragments

Your custom schema should also include either the web or app fragment. These fragments provide contextual attributes specific to each platform, such as the mobile app version.

Creating a custom schema

To create a custom schema:

  1. Design a new fragment
  2. Create a schema
  3. Configure an event stream to use the new schema
  4. Write instrument code that specifies the new schema

Ensure that you follow the instructions for submitting changes to data-engineering/schemas-event-secondary.

1. Design a new fragment

Once you've planned which data you need in your custom fragment, create a directory in the data-engineering/schemas-event-secondary repository under jsonschema/fragment/analytics/product_metrics. See this example patch.

See the Event Platform data modeling guidelines for help designing your fragment. Keep in mind that any properties marked as required in your fragment must be included in each event in order to validate against your custom schema. For example, if you mark an object property as required, your instrument code must submit that object with every event, even if the object is empty ({}).
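The required-property rule can be illustrated with a minimal Python sketch. The fragment contents (`translation`, `source_language`) and the helper `missing_required` are hypothetical, and the check covers only top-level `required` entries; it is not a full JSON Schema validator.

```python
# Hypothetical fragment, written as a Python dict for illustration only.
# The property names are assumptions, not part of any real schema.
FRAGMENT = {
    "type": "object",
    "properties": {
        "translation": {"type": "object"},
        "source_language": {"type": "string"},
    },
    "required": ["translation"],
}


def missing_required(event, fragment):
    """Return the required top-level properties absent from the event."""
    return [name for name in fragment.get("required", []) if name not in event]


# An empty object still satisfies "required"; leaving the key out does not.
print(missing_required({"translation": {}}, FRAGMENT))
print(missing_required({"source_language": "en"}, FRAGMENT))
```

A check like this is only a quick local aid; the actual validation is performed against the published schema.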

2. Create a schema

Once you've added your custom fragment, create a schema that combines your new fragment, the common fragment, and a platform fragment as a new directory under jsonschema/analytics/product_metrics. (It is unconfirmed whether this path should instead be jsonschema/analytics/mediawiki/product_metrics.) See this example patch.

See the Event Platform fragment guidelines for help setting up a new schema.

For example, this custom schema, named analytics/product_metrics/web/translation, includes:

  • the common fragment
  • the web fragment
  • a custom translation fragment

Note that x.x.x should be replaced with the latest version of each fragment.

title: analytics/product_metrics/web/translation
description: Web schema for wiki translation workflows.
$id: /analytics/product_metrics/web/translation/1.0.0
$schema: 'https://json-schema.org/draft-07/schema#'
type: object
allOf:
  - $ref: /fragment/analytics/product_metrics/common/x.x.x
  - $ref: /fragment/analytics/product_metrics/web/x.x.x
  - $ref: /fragment/analytics/product_metrics/translation/1.0.0
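
Before submitting a patch, it can help to confirm that no x.x.x placeholder remains and that the $id follows the title. The following Python sketch transcribes the example schema as a dict; the helper names `unpinned_refs` and `id_matches_title` are hypothetical, and the `/<title>/<version>` pattern is inferred from this one example rather than from a stated rule.

```python
# The example schema, transcribed as a Python dict for illustration.
SCHEMA = {
    "title": "analytics/product_metrics/web/translation",
    "$id": "/analytics/product_metrics/web/translation/1.0.0",
    "allOf": [
        {"$ref": "/fragment/analytics/product_metrics/common/x.x.x"},
        {"$ref": "/fragment/analytics/product_metrics/web/x.x.x"},
        {"$ref": "/fragment/analytics/product_metrics/translation/1.0.0"},
    ],
}


def unpinned_refs(schema):
    """Return $ref values that still end in the x.x.x placeholder."""
    return [
        entry["$ref"]
        for entry in schema.get("allOf", [])
        if entry.get("$ref", "").endswith("/x.x.x")
    ]


def id_matches_title(schema):
    """Check that $id has the form /<title>/<version>, as in the example."""
    prefix = "/" + schema["title"] + "/"
    return schema["$id"].startswith(prefix) and len(schema["$id"]) > len(prefix)


# Lists the two fragment references that still need a real version number.
print(unpinned_refs(SCHEMA))
```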

3. Configure an event stream

A custom schema can only be used once it is configured as part of an event stream. See Metrics Platform/Stream configuration for information on how to declare a Metrics Platform stream. All instruments using custom schemas must complete this stream configuration step, including instruments that have been configured using the experimentation lab.

If you create a custom concrete schema with a Metrics Platform base fragment, and pass custom data to an API submit method along with the custom schema ID and stream name, the production stream configuration must specify the schema title and associate it with the stream name. It must also identify the event names and/or event name prefixes that allow an instrument to submit events to the corresponding stream.
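
As a rough illustration of how event names or prefixes can gate submission to a stream, consider the Python sketch below. The configuration keys, the stream name, the event prefix, and the `stream_accepts` helper are all hypothetical; the real format is described in Metrics Platform/Stream configuration.

```python
# Hypothetical stream configuration; the key names here are assumptions.
STREAM_CONFIG = {
    "stream": "product_metrics.web_translation",  # hypothetical stream name
    "schema_title": "analytics/product_metrics/web/translation",
    "events": ["translation."],  # hypothetical event name prefix
}


def stream_accepts(event_name, config):
    """Return True if the event name equals or starts with a configured entry."""
    return any(event_name.startswith(entry) for entry in config["events"])


print(stream_accepts("translation.publish", STREAM_CONFIG))
print(stream_accepts("edit.save", STREAM_CONFIG))
```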

Existing instruments that have already collected event data cannot be switched to use a custom schema. In this case, you must create a new stream that uses the custom schema.

4. Write instrument code

Each of the Metrics Platform clients provides methods to submit events. When using a custom schema, the ID of the custom schema must be passed to these methods as a parameter. For the example above, the schema ID is /analytics/product_metrics/web/translation/1.0.0.

We recommend that you follow the process to validate events when writing instrument code that uses a custom schema.

Top-level properties

When you pass custom data objects as parameters to the available submit methods, each client library adds the custom data to the event as top-level properties.
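
The idea of top-level properties can be sketched in Python as follows. The `build_event` helper and the `source_language` property are hypothetical, and the sketch assumes that the schema ID is carried in a `$schema` field on the event; real clients add many more fields and have their own method signatures.

```python
def build_event(schema_id, custom_data=None):
    """Merge custom data into the event as top-level properties.

    Hypothetical helper, not a Metrics Platform client API.
    """
    event = dict(custom_data or {})
    # Set the schema ID last so custom data cannot overwrite it.
    event["$schema"] = schema_id
    return event


event = build_event(
    "/analytics/product_metrics/web/translation/1.0.0",
    {"source_language": "en"},  # hypothetical custom property
)
print(sorted(event))
```

The custom property sits next to `$schema` at the top level rather than being nested under a separate custom data object.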

Formatting requirements

Custom data must be passed in as key-value pairs. Each key is a string that names the custom data property, and each value must be one of the currently allowed types:

  • String
  • Integer
  • Boolean
  • Null

If a custom data value does not conform to one of the allowed types, the client library logs an error and omits the value from the event. If the omitted property is required by the schema, the event will fail validation.
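
The filtering behavior described above can be approximated with this minimal Python sketch. The function names and logger name are hypothetical, and the actual client libraries may differ in detail (for example, in how they word the logged error).

```python
import logging

logger = logging.getLogger("custom_data_sketch")  # hypothetical logger name


def is_allowed(value):
    """Allow only string, integer, boolean, and null values."""
    # bool is a subclass of int in Python, so one isinstance check covers both.
    return value is None or isinstance(value, (str, int))


def filter_custom_data(custom_data):
    """Drop disallowed entries, logging an error for each one."""
    kept = {}
    for key, value in custom_data.items():
        if isinstance(key, str) and is_allowed(value):
            kept[key] = value
        else:
            logger.error("Omitting custom data property %r: unsupported type", key)
    return kept


# The float and the list are omitted; the other four values are kept.
sample = {"a": "x", "b": 1, "c": True, "d": None, "e": 1.5, "f": [1]}
print(filter_custom_data(sample))
```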

Namespacing

By convention, wherever a product or feature team keeps its schemas in the secondary repository, we recommend creating a product_metrics directory to hold the custom schemas that use the Metrics Platform base schemas and fragments.

During onboarding to Metrics Platform, a product or feature team typically ports an existing instrument to submit events via Metrics Platform in parallel with its current submission path, so that the team can perform data parity checks. Placing custom schemas in a product_metrics directory alongside the current schema helps keep Metrics Platform-based schemas organized until the migration is complete. Note that these custom schemas should be considered owned by the product team.

References

  1. See the contents of the web and app directories in analytics/product_metrics