Gerrit/Events ingestion
This was an experiment conducted over the summer of 2022. As of March 2023, there is no plan to complete it.
Gerrit can emit events as JSON data. With the appropriate permission, events can be streamed via the stream-events SSH command, which is how Zuul CI listens for new patches and comments. Internally, an action such as commenting on a patch populates a Java class corresponding to the event, which is serialized with Gson and emitted on the stream-events channel. Each event is represented by a Java class, and all of them extend the abstract class com.google.gerrit.server.events.Event.
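As an illustration of what a consumer of that stream has to do, here is a minimal Python sketch. It assumes the stream delivers one JSON object per line and that every event carries a "type" field; the sample lines are hand-written and heavily abridged, not captured output, and the function names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of stream-events style output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumption: each serialized event exposes its kind in "type".
        if "type" not in event:
            raise ValueError(f"not a Gerrit event: {line!r}")
        yield event


def count_by_type(lines: Iterable[str]) -> Counter:
    """Count events per type, e.g. to see what a consumer would receive."""
    return Counter(event["type"] for event in parse_events(lines))


# Hand-written, abridged sample lines (hypothetical values).
SAMPLE = [
    '{"type": "patchset-created", "eventCreatedOn": 1660000000}',
    '',
    '{"type": "comment-added", "eventCreatedOn": 1660000060}',
    '{"type": "comment-added", "eventCreatedOn": 1660000120}',
]

if __name__ == "__main__":
    print(count_by_type(SAMPLE))
```

In practice the lines would come from the standard output of an SSH session running the stream-events command rather than from a list.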
To publish Gerrit data to our Event Platform, we considered three options:
- Have the platform listen to the events directly, much like CI does. This option has not been investigated.
- Emit events directly to a Kafka topic. This was ruled out because Kafka topics accept arbitrary data, so any change to the format of events would cause side effects in the backend storage.
- Send events to EventGate via HTTP POST. This is the option we retained: it ensures the data is validated and gives the data engineering team a clear view of the data layout.
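The retained option can be sketched roughly as follows. This is not the plugin's code: the host name, stream name and schema URI are hypothetical, and the envelope fields ("$schema", "meta.stream", "dt"), the /v1/events path and the array body are assumptions based on the Event Platform schema guidelines. The request is only built, never sent.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical values, for illustration only.
EVENTGATE_URL = "https://eventgate.example.org/v1/events"
STREAM = "gerrit.comment-added"
SCHEMA_URI = "/gerrit/comment-added/1.0.0"


def wrap_event(gerrit_event: dict, stream: str, schema_uri: str) -> dict:
    """Add the assumed Event Platform envelope fields to a Gerrit event."""
    body = dict(gerrit_event)  # shallow copy, leave the input untouched
    body["$schema"] = schema_uri
    body["meta"] = {"stream": stream}
    created = gerrit_event.get("eventCreatedOn")
    if created is not None:
        # eventCreatedOn is assumed to be seconds since the Unix epoch.
        moment = datetime.fromtimestamp(created, tz=timezone.utc)
        body["dt"] = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return body


def build_request(event: dict, url: str = EVENTGATE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one event."""
    data = json.dumps([event]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    sample = {"type": "comment-added", "eventCreatedOn": 1660000000}
    request = build_request(wrap_event(sample, STREAM, SCHEMA_URI))
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. Posting through a validating endpoint, instead of writing to Kafka directly, is what lets malformed events be rejected before they reach the storage backends.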
EventGate can optionally validate any event it receives against a JSON Schema, and those schemas are required to set up the data backends (Hadoop, MySQL...). We thus had to provide a JSON Schema for each event. Rather than writing them manually, the schemas are generated from the Java event classes. This is done by the events-wikimedia Gerrit plugin, which relies on the jsonschema-generator library. Upon upgrading Gerrit, the schemas are regenerated and sent to the secondary event schema repository.
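The idea of deriving a schema from typed class fields, instead of writing it by hand, can be illustrated with a small Python analogue. This is not how the plugin works internally (it is written in Java and uses jsonschema-generator); the two dataclasses below are simplified, hypothetical stand-ins for the Gerrit event classes, and schema_for is a made-up helper.

```python
import dataclasses
from dataclasses import dataclass
from typing import get_type_hints

# Mapping from Python field types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Event:
    """Simplified stand-in for the abstract base event class."""
    type: str
    eventCreatedOn: int


@dataclass
class CommentAddedEvent(Event):
    """Simplified stand-in for a concrete event class."""
    comment: str = ""


def schema_for(cls) -> dict:
    """Derive a minimal JSON Schema object from a dataclass definition."""
    hints = get_type_hints(cls)
    properties = {}
    required = []
    for field in dataclasses.fields(cls):
        hint = hints[field.name]
        if dataclasses.is_dataclass(hint):
            # Nested dataclasses become nested object schemas.
            properties[field.name] = schema_for(hint)
        else:
            properties[field.name] = {"type": JSON_TYPES[hint]}
        # Fields without any default are treated as required.
        if (field.default is dataclasses.MISSING
                and field.default_factory is dataclasses.MISSING):
            required.append(field.name)
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


if __name__ == "__main__":
    import json
    print(json.dumps(schema_for(CommentAddedEvent), indent=2))
```

Because the schema is computed from the class definitions, a field added to an event class shows up in the schema the next time it is regenerated, which is why regeneration is tied to Gerrit upgrades.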
References
- Task T304947: Send Gerrit events to our data lake
- Task T311615: Audit JSON schemas for Gerrit events
- Event Platform/Schemas
- Event_Platform/Schemas/Guidelines#Required_fields