Search Platform/Goals/OKR 2023-2024 Q2

Q2 Overview

Q2 will see the start of the WDQS scaling work, with a first experiment around a semantic split of scholarly articles.

The work on the Search Update Pipeline will continue, with the deployment in production of the pipeline and the migration of wikis to this new pipeline.

The improvements to multilingual zero-results rate will continue as planned.

The Search SLO work might spill over by a few weeks to complete the appropriate dashboards.

Q2 OKRs

Things we ship

Search Update Pipeline (SUP)

Higher level Objective: Users can reliably query and search for content, so that they can depend on Wikimedia Foundation projects as knowledge infrastructure (ERF: Technical Infrastructure, Product Platform)

KR:

  • 90% of search update load is migrated away from JobQueue by the end of the project
  • At least 1 wiki is migrated to the new SUP by the end of Q2
  • Search update lag is < 10 minutes 95% of the time over a 3 month period for the wikis migrated to SUP
  • Revision-based events have < 100 failures per day
  • Page-refresh-based events have < 1000 failures per day

Description: The search update pipeline is currently broken, resulting in updates being sporadically lost often enough that users are reporting bugs. The Saneitizer is supposed to resolve this, but is running with ~4 weeks of lag. Work needs to be done to understand the current rate of lost updates so we can benchmark the SUP against it.

The current update pipeline processes MediaWiki updates as a stream, with < 5 minutes of lag. Most additional data is processed as a batch, with > 1 hour of lag. By using streams for both kinds of sources, we want to reduce the lag of secondary data sources from > 1 hour to < 10 minutes.
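
As a rough illustration of the streaming model (the actual SUP is a Flink application running on k8s; the topic name, brokers, and event fields below are assumptions), a minimal Python sketch that consumes revision events from Kafka and measures update lag against the event timestamp:

  # Illustrative sketch only: the real SUP is a Flink application; this uses
  # kafka-python to show stream-based consumption and lag measurement.
  # Topic name, brokers, and event field names are assumptions.
  import json
  from datetime import datetime, timezone

  from kafka import KafkaConsumer  # pip install kafka-python

  consumer = KafkaConsumer(
      "mediawiki.revision-create",           # hypothetical topic name
      bootstrap_servers=["localhost:9092"],  # placeholder brokers
      group_id="sup-lag-probe",
      value_deserializer=lambda v: json.loads(v.decode("utf-8")),
  )

  for message in consumer:
      event = message.value
      # Assume the event carries an ISO 8601 timestamp under meta.dt.
      event_time = datetime.fromisoformat(event["meta"]["dt"].replace("Z", "+00:00"))
      lag = datetime.now(timezone.utc) - event_time
      print(f"lag: {lag.total_seconds():.0f}s")
      # A production pipeline would apply the update to the search index here
      # and emit the lag as a metric for the "< 10 minutes 95% of the time" KR.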

Docs:

Phab:

Milestones:

  • Feature complete code for the SUP
  • Validation of standard operations in the context of Flink / k8s / ZooKeeper - task T342149
  • Migration of the WDQS update pipeline to target deployment environment as a validation - task T326409
  • Deployment of SUP to Flink / k8s - task T340548
  • Migration of at least 1 wiki to the new SUP by the end of Q2

Improve multilingual zero-results rate

Objective: Searchers of emerging languages can search in their own language

KR: Increase recall (reduce ZRR and/or increase number of results returned) for 75% of relevant languages.

Description: Following the work of unpacking all the language analyzers, we can now work on harmonising language processing across wikis and deploy global improvements.

To ensure that our users can more easily understand how search is working and to ensure that improvements to search are replicated across languages, we want differences in how we treat different languages to be linguistic, not accidental. For example: how we treat CamelCase or apostrophes should be the same in all languages.

In Q2 we will continue to focus on increasing recall (with decreasing zero-results rates and increasing number of results as proxy metrics), assuming that increased recall improves the odds of content discovery, especially on smaller language wikis. Note that this is an imperfect KPI for search relevancy overall.
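
To make "linguistic, not accidental" concrete, one quick way to compare how two analyzers treat the same input is the Elasticsearch _analyze API. The sketch below is only an illustration; the cluster URL and analyzer names are placeholders, not the production CirrusSearch configuration.

  # Sketch: compare tokenisation of the same text under two analyzers via the
  # Elasticsearch _analyze REST API. Cluster URL and analyzer names are
  # placeholders, not the production CirrusSearch configuration.
  import requests

  ES = "http://localhost:9200"  # placeholder cluster

  def tokens(analyzer: str, text: str) -> list[str]:
      """Return the tokens the given analyzer produces for the text."""
      resp = requests.post(f"{ES}/_analyze", json={"analyzer": analyzer, "text": text})
      resp.raise_for_status()
      return [t["token"] for t in resp.json()["tokens"]]

  # Differences between languages should be deliberate, e.g. in how CamelCase
  # and apostrophes come out of each analysis chain.
  for analyzer in ("english", "standard"):
      print(analyzer, tokens(analyzer, "CamelCase l'hôpital NASA"))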

Phab: task T219550

Milestones:

  • Reimplement camelCase and acronym processing as filters (task T346051)
    • Development and testing of new filters in new plugin
    • Enable fallback versions of camelCase and acronym processing in analysis config
  • Complete code to repair multi-script tokens split by the ICU tokenizer (task T332337)
  • Deploy updated new plugin (for both task T346051 and task T332337)
  • Reindex wikis with recent improvements (task T342444)
    • Analysis of impacts of reindexing with apostrophe, camelCase, and acronym/WBH improvements, plus ICU token repair
  • Stretch: Complete ASCII-folding/ICU-folding harmonisation analysis & code (task T332342)

Search SLOs

Description: To ensure that we can understand the quality of our search and invest the appropriate efforts in operating it, we want to have clear SLOs for key aspects of the Search experience.

Doc: Search SLOs

Phab: task T335576

Milestones:

  • Standard SLO dashboard is created
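
As an illustration of the kind of calculation behind such a dashboard (the threshold, target, and data below are made up, not the real SLOs), a sketch that computes SLO attainment and remaining error budget from latency samples:

  # Sketch: SLO attainment and remaining error budget from latency samples.
  # Threshold, target, and sample data are illustrative, not the real SLOs.

  def slo_report(latencies_ms: list[float], threshold_ms: float, target: float) -> dict:
      """Fraction of samples meeting the threshold, compared to the SLO target."""
      good = sum(1 for l in latencies_ms if l <= threshold_ms)
      attainment = good / len(latencies_ms)
      # Error budget: the share of "bad" events the SLO allows, and how much is left.
      allowed_bad = (1 - target) * len(latencies_ms)
      actual_bad = len(latencies_ms) - good
      return {
          "attainment": attainment,
          "target": target,
          "budget_remaining": 1 - actual_bad / allowed_bad if allowed_bad else 0.0,
      }

  # e.g. full-text request latencies against a hypothetical 500 ms / 99% objective
  print(slo_report([120.0, 340.0, 95.0, 870.0, 210.0], threshold_ms=500, target=0.99))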

WDQS Split the Graph

Higher level Objective: SDS 3.1 (rewording of KR still in progress)

Description: By exposing one experimental service with a proposed split of the Wikidata graph (scholarly articles vs the rest), we can get feedback from our communities and measure the performance and functionality of queries federated across graphs. To simplify implementation, these graphs would not be updated and could therefore only be used to validate query patterns. We can identify query types that run better, worse, or not at all in the context of this split. Feedback from our communities will help us understand the limitations of the split and help tune which entities need to be present in which graph.

Success is defined as having a clear definition of the rules used to split the graph, accepted in principle by our communities. The largest subgraph needs to be less than 75% of the full graph, and new federated queries need to account for less than 5% of overall query volume.
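
As an illustration of what a query federated across the two experimental graphs could look like (the endpoint URLs are placeholders and the services do not exist yet; the property and item IDs are standard Wikidata ones), a sketch using SPARQL's SERVICE clause from Python:

  # Sketch: a query federated across the experimental split, assuming scholarly
  # articles live behind one endpoint and everything else behind another.
  # Endpoint URLs are placeholders; the experimental services do not exist yet.
  import requests

  MAIN_ENDPOINT = "https://main-subgraph.example.org/sparql"            # placeholder
  SCHOLARLY_ENDPOINT = "https://scholarly-subgraph.example.org/sparql"  # placeholder

  # Ask the main graph for diseases, then federate into the scholarly graph for
  # articles whose main subject (P921) is one of those diseases.
  QUERY = f"""
  PREFIX wd:  <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  SELECT ?topic ?article WHERE {{
    ?topic wdt:P31 wd:Q12136 .                 # instance of: disease
    SERVICE <{SCHOLARLY_ENDPOINT}> {{
      ?article wdt:P921 ?topic .               # main subject
    }}
  }}
  LIMIT 10
  """

  resp = requests.get(
      MAIN_ENDPOINT,
      params={"query": QUERY},
      headers={"Accept": "application/sparql-results+json"},
  )
  resp.raise_for_status()
  for row in resp.json()["results"]["bindings"]:
      print(row["topic"]["value"], row["article"]["value"])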

Doc:

Phab: task T337013

Milestones:

  • Decision on the initial split strategy
  • Process to split a dump into subgraphs according to the above decision (see the sketch after this list)
  • Start the loading of the full graph on one test server
  • Improve the data loading process to enable faster iteration with fewer human errors during the experiment
  • Define a set of test queries (WMDE)
  • Investigate existing tooling to execute test queries (DE? / WMDE?)
  • Decision on what kind of feedback we want from our communities
  • The running experiment is presented to our communities
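
For the dump-splitting milestone above, a minimal sketch of what the simplest possible process could look like: partition an N-Triples dump by whether the subject entity is in a precomputed set of scholarly-article QIDs. This is only an illustration; the real rules (statement nodes, references, entities shared across subgraphs) depend on the split-strategy decision.

  # Sketch: partition an N-Triples Wikidata dump into two subgraphs based on a
  # precomputed set of scholarly-article QIDs (e.g. items with P31 Q13442814).
  # Real splitting rules (statement nodes, references, shared entities) are the
  # subject of the "initial split strategy" decision and are not handled here.
  import gzip

  ENTITY_PREFIX = "<http://www.wikidata.org/entity/"

  def split_dump(dump_path: str, scholarly_qids: set[str]) -> None:
      with gzip.open(dump_path, "rt", encoding="utf-8") as dump, \
           gzip.open("scholarly.nt.gz", "wt", encoding="utf-8") as scholarly, \
           gzip.open("main.nt.gz", "wt", encoding="utf-8") as main:
          for line in dump:
              subject = line.split(" ", 1)[0]
              # Route the triple by its subject's QID; anything that is not a
              # known scholarly article stays in the main subgraph.
              if subject.startswith(ENTITY_PREFIX) and \
                 subject[len(ENTITY_PREFIX):-1] in scholarly_qids:
                  scholarly.write(line)
              else:
                  main.write(line)

  # Usage sketch, with scholarly_qids built beforehand from a P31 query:
  # split_dump("wikidata-dump.nt.gz", scholarly_qids)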

Dependencies:

  • WMDE for analysis and communication with our communities
  • Data Engineering for analysis and query testing framework
  • Data Platform Engineering SRE for servers and data loading
  • Data Persistence for intermediate storage in the data loading process (Swift?)
  • Movement Communications for communication with our communities