Event Platform/Flaws
Like any system, Event Platform has its flaws. This page aims to document them.
It has inherited design decisions that may have seemed good at the time but, with hindsight, were not.
meta field
The meta field is very confusing. It was originally created as a single subobject field holding the fields that EventBus events needed to operate. This made it easier to copy and paste the field between schemas, which we had to do before we had jsonschema-tools, schema $refs, and materialization.
Ideally, the fields in meta would be top level and named appropriately. If we could get rid of meta, we would, but doing so would be a lot of work. In 2023-01 the Event Platform team considered doing this work but decided not to (see also Event Platform/Decision Log, decision #006).
meta.domain field
The description of this field (as of 2024-02) is "Domain the event or entity pertains to".
The semantics of this field were never well defined. It is often used to hold a domain name the event pertains to, but it is also sometimes used as the 'business' domain. It is also used by the WMF canary (heartbeat) events system.
dt fields
Every event needs to have an 'event time' field, specifying the time at which the event happened. Ideally, this would be the only field we'd need to require for all events. We would then use this field for Kafka timestamps and Hive hourly partitioning.
However, we accept events from unauthenticated external clients, so we can't totally trust them. A client might send an event with a timestamp in the distant past or future, which would cause issues for data ingestion. In cases where we can't trust the event time, we fall back to using the server side receive time.
So we need two timestamp fields: event time and server receive time. As of 2023-06, the intention is to always use meta.dt for server receive time and dt for event time. These field names are not particularly descriptive, but creating and using new dt fields is a nontrivial amount of work.
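
The fallback between the two timestamps can be illustrated with a minimal Python sketch. This is not the actual ingestion code: the function names, the trust_event_time flag, and the 7-day MAX_SKEW tolerance are all hypothetical, chosen only to show the idea of preferring dt and falling back to meta.dt when the event time is missing, unparseable, or implausibly far from the receive time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the real pipeline's threshold (if any) is not stated here.
MAX_SKEW = timedelta(days=7)


def parse_dt(value):
    """Parse an ISO-8601 timestamp such as '2023-06-01T12:00:00Z'.

    Assumption: timestamps without an offset are treated as UTC.
    """
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def ingestion_time(event, trust_event_time=True, max_skew=MAX_SKEW):
    """Pick the timestamp to use for partitioning (illustrative only).

    meta.dt is the server receive time and is always trusted.
    dt is the event time and is used only if it looks plausible.
    """
    received = parse_dt(event["meta"]["dt"])
    raw = event.get("dt")
    if not trust_event_time or raw is None:
        return received
    try:
        event_time = parse_dt(raw)
    except (TypeError, ValueError, AttributeError):
        # Malformed client-supplied value: fall back to receive time.
        return received
    if abs(event_time - received) > max_skew:
        # Distant past or future: fall back to receive time.
        return received
    return event_time
```

With this sketch, an event whose dt is one minute before meta.dt keeps its own event time, while an event claiming a dt of 1970-01-01 is assigned its meta.dt instead.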