Things we ship

WDQS Split the Graph

Description: The Graph Split project started last fiscal year as a way to scale the Wikidata Query Service. Most of the implementation is done, but parts of the update pipeline still need to be completed.

Our users will migrate their queries to the new endpoints themselves, but they will need support from our team.

Timeline: Q1+Q2+Q3

Doc:

Phab: Splitting the graph in WDQS

Milestones:

Graph split endpoints are available and updated in real time
Graph split endpoints are production ready (redundancy, monitoring, automation)
Traffic using the scholarly subgraph is reduced by X% on the full graph endpoint

Dependencies:

WMDE for analysis and communication with our communities
Data Platform Engineering SRE for servers and data loading

Improve multilingual zero-results rate

Description: Following the work of unpacking all the language analyzers, we can now harmonise language processing across wikis and deploy global improvements.

To ensure that our users can more easily understand how search is working and to ensure that improvements to search are replicated across languages, we want differences in how we treat different languages to be linguistic, not accidental. For example: how we treat CamelCase or apostrophes should be the same in all languages.
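As an illustration of the CamelCase and apostrophe example above, here is a minimal Python sketch of language-independent preprocessing. It is not CirrusSearch or Elasticsearch analyzer code; the function names and the list of apostrophe variants are assumptions chosen for the example.

```python
# Hypothetical helpers for illustration only; not the actual analysis chain.

# Apostrophe-like characters to fold to the ASCII apostrophe (an assumed,
# non-exhaustive list: right/left single quote, modifier letter, fullwidth).
APOSTROPHE_VARIANTS = "\u2019\u2018\u02bc\uff07"
_APOSTROPHE_TABLE = {ord(c): "'" for c in APOSTROPHE_VARIANTS}


def normalize_apostrophes(text):
    """Map every apostrophe variant to the plain ASCII apostrophe."""
    return text.translate(_APOSTROPHE_TABLE)


def split_camel_case(token):
    """Split a token wherever a lowercase letter is followed by an uppercase one."""
    parts, start = [], 0
    for i in range(1, len(token)):
        if token[i - 1].islower() and token[i].isupper():
            parts.append(token[start:i])
            start = i
    parts.append(token[start:])
    return parts


def shared_preprocess(text):
    """Apply the same steps to every language before any language-specific analysis."""
    tokens = []
    for raw in normalize_apostrophes(text).split():
        tokens.extend(split_camel_case(raw))
    return tokens
```

Because the same function runs for every wiki, any remaining differences between languages would come from the language-specific steps (stemming, stop words) rather than from this shared layer.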

We will continue to focus on increasing recall (with decreasing zero-results rates and increasing number of results as proxy metrics), assuming that increased recall improves the odds of content discovery, especially on smaller language wikis. Note that this is an imperfect KPI for search relevancy overall.
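The two proxy metrics can be sketched as simple aggregates over a query log. The record shape below (wiki, query, hits) is an assumption made for this example and does not reflect the actual logging schema.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """One logged search request (hypothetical shape, for illustration)."""
    wiki: str
    query: str
    hits: int


def zero_results_rate(records):
    """Fraction of queries that returned no results; 0.0 for an empty log."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if r.hits == 0) / len(records)


def mean_result_count(records):
    """Average number of results per query; 0.0 for an empty log."""
    records = list(records)
    if not records:
        return 0.0
    return sum(r.hits for r in records) / len(records)
```

A change that lowers `zero_results_rate` or raises `mean_result_count` increases recall by these measures, but neither number says whether the extra results are relevant.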

Phab: https://phabricator.wikimedia.org/T219550

Milestones:

Complete "Standardize ASCII-folding/ICU-folding across analyzers" (https://phabricator.wikimedia.org/T332342)

Reduce Wikidata search lag on edits

Description: The new Search Update Pipeline has increased the indexing lag for Wikidata compared to the previous pipeline. The lag is still well within our usual expectations (we don’t have a formal SLO for indexing lag), and Search is not meant to be updated with low latency. However, Wikidata has some editing workflows that rely on Search and on low update lag. The long-term solution is to reduce the dependency of Wikidata on Search for editing workflows; in the meantime, we will implement a short-term workaround by prioritising Wikidata edits.
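The idea of prioritising edits from one wiki can be illustrated with a small priority queue. This is a generic sketch, not the mechanism used by the Search Update Pipeline; the class name, the priority levels, and the use of "wikidatawiki" as the wiki identifier are assumptions.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # lower number is served first


class UpdateQueue:
    """Toy update queue that serves prioritised wikis before all others."""

    def __init__(self, priority_wikis=("wikidatawiki",)):
        self._heap = []
        # Monotonic counter keeps arrival order within a priority level.
        self._counter = itertools.count()
        self._priority_wikis = set(priority_wikis)

    def push(self, wiki, page_id):
        priority = HIGH if wiki in self._priority_wikis else NORMAL
        heapq.heappush(self._heap, (priority, next(self._counter), wiki, page_id))

    def pop(self):
        _, _, wiki, page_id = heapq.heappop(self._heap)
        return wiki, page_id

    def __len__(self):
        return len(self._heap)
```

Strict prioritisation like this can starve the other wikis under sustained load, which is one reason to treat it as a short-term workaround.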

Timeline: Q1

Phab: https://phabricator.wikimedia.org/T365831

Dependencies:

DPE SRE for deployment

Migrate Private wikis to the new Search Update Pipeline

Description: The new Search Update Pipeline (SUP) is deployed for all public wikis. Private wikis are managed differently and currently don’t provide an update stream or a method for internal services to run read-only API requests against them.

Migrating all wikis to a single update pipeline will simplify operations and allow us to fully remove unused code, reducing complexity.

Timeline: Q1(+Q2?)

Phab: https://phabricator.wikimedia.org/T341332

Dependency: Data Engineering

Support WE3.1

Description: The Web team will experiment with the empty state of the Search box, in particular with article recommendations, which are likely to be based on ArticleRecommendation / MoreLike. We will support that effort with engineering consulting and, as needed, by addressing potential scaling issues.

The exact work required will depend on the needs of the Web team.

Phab: https://phabricator.wikimedia.org/T369632

Convert Graphite metrics to Prometheus

Description: The Observability team is transitioning from Graphite to Prometheus for alerting. This requires us to migrate to Prometheus the metrics that CirrusSearch publishes and that are used for alerting.

Timeline: Q1

Phab: https://phabricator.wikimedia.org/T350597

Things we plan

OpenSearch Migration

Description: Elasticsearch is a dead end for us due to licensing issues (SSPL). We need to eventually migrate to OpenSearch. Beyond the obvious need to run a supported and reasonably recent software stack, the migration to a recent version of OpenSearch will introduce new capabilities, in particular Vector Search, a topic that has been discussed on multiple occasions at the Foundation.

The goal here is to get a clear understanding of what is required to execute a migration to OpenSearch, not to implement that migration.

Define use cases for vector based search

Description: There have been multiple conversations at WMF about the use of Vector Search. We need to clarify what is possible, what might make sense, and what steps are needed to get there.