Search Platform/Goals/OKR 2024-2025 Q1
Things we ship
WDQS Split the Graph
Description: The Graph Split project started last fiscal year as a way to scale Wikidata Query Service. Most of the implementation is done, but part of the update pipeline still needs to be completed.
Migrating queries to the new endpoints will be done by our users, but will require support from our team.
Timeline: Q1+Q2+Q3
Doc:
- SDS 3.1: Wikidata Graph Splitting
- WDQS Graph Splitting Plan
- WDQS Graph Splitting - Analysis needs
- WDQS - Scaling Plan & Risk Mitigation
Phab: Splitting the graph in WDQS
Milestones:
- Graph split endpoints are available and updated in real time
- Graph split endpoints are production ready (redundancy, monitoring, automation)
- Traffic using scholarly subgraph is reduced by X% on the full graph endpoint
Dependencies:
- WMDE for analysis and communication with our communities
- Data Platform Engineering SRE for servers and data loading
Improve multilingual zero-results rate
Description: Following the work of unpacking all the language analyzers, we can now work on harmonising language processing across wikis and deploy global improvements.
To ensure that our users can more easily understand how search is working and to ensure that improvements to search are replicated across languages, we want differences in how we treat different languages to be linguistic, not accidental. For example: how we treat CamelCase or apostrophes should be the same in all languages.
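As an illustration of the CamelCase and apostrophe example, here is a minimal sketch of a normalisation step that is applied identically whatever the wiki language. It is not the CirrusSearch analysis configuration: the function name, the set of apostrophe variants and the ASCII-only CamelCase rule are all assumptions made for the example.

```python
import re

# Assumed set of apostrophe-like characters to map to a plain apostrophe.
APOSTROPHE_VARIANTS = ("\u2019", "\u2018", "\u02bc")

def normalize_token_text(text: str) -> str:
    """Apply the same apostrophe and CamelCase handling for every language."""
    for variant in APOSTROPHE_VARIANTS:
        text = text.replace(variant, "'")
    # Insert a space at each lowercase-to-uppercase boundary (ASCII letters only).
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)

normalized = normalize_token_text("l\u2019CamelCase")
```

The point of the sketch is that no language code appears in the function signature: any difference between wikis would have to be added deliberately, for linguistic reasons.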
We will continue to focus on increasing recall (with decreasing zero-results rates and increasing number of results as proxy metrics), assuming that increased recall improves the odds of content discovery, especially on smaller language wikis. Note that this is an imperfect KPI for search relevancy overall.
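For reference, the zero-results rate used as a proxy metric is the share of queries that return no results. A minimal sketch of the per-wiki calculation follows, assuming a hypothetical input of (wiki, result count) pairs; the sample data is invented and does not reflect real traffic.

```python
from collections import defaultdict

def zero_results_rate(queries):
    """Return {wiki: fraction of queries with zero results}.

    `queries` is an iterable of (wiki, result_count) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for wiki, result_count in queries:
        totals[wiki] += 1
        if result_count == 0:
            zeros[wiki] += 1
    return {wiki: zeros[wiki] / totals[wiki] for wiki in totals}

# Invented sample: three queries per wiki.
sample = [
    ("enwiki", 12), ("enwiki", 0), ("enwiki", 5),
    ("tawiki", 0), ("tawiki", 0), ("tawiki", 3),
]
rates = zero_results_rate(sample)
```

In this sample one of three enwiki queries and two of three tawiki queries return nothing, so a recall improvement on the smaller wiki would show up as a larger drop in its rate.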
Phab: https://phabricator.wikimedia.org/T219550
Milestones:
- Complete "Standardize ASCII-folding/ICU-folding across analyzers" (https://phabricator.wikimedia.org/T332342)
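To give a rough idea of what folding does, the sketch below strips diacritics with Python's standard `unicodedata` module. This is a simplified stand-in, not the ASCII-folding or ICU-folding filters used by the analyzers, which cover many more characters; the function name is hypothetical.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Drop combining marks so accented letters match their base letters."""
    # NFKD splits characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

examples = ["café", "Zürich", "naïve"]
folded = [fold_diacritics(word) for word in examples]
```

Letters with no Unicode decomposition, such as "ø", pass through this sketch unchanged, which is one reason a dedicated folding filter is needed in practice.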
Reduce Wikidata search lag on edits
Description: The new Search Update Pipeline has increased the indexing lag for Wikidata compared to the previous pipeline. The lag is still well within our usual expectations (we don’t have a formal SLO for indexing lag), but Wikidata has some editing workflows that rely on Search and on low update lag, and Search is not designed to be updated with low latency. The long-term solution is to reduce Wikidata’s dependency on Search for editing workflows; in the meantime, we will implement a short-term workaround by prioritising Wikidata edits.
Timeline: Q1
Phab: https://phabricator.wikimedia.org/T365831
Dependencies:
- DPE SRE for deployment
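The idea of prioritising Wikidata edits can be illustrated with a small priority queue. This is only a sketch of the general technique, not the mechanism used by the Search Update Pipeline; the class, the priority table and the page IDs are hypothetical.

```python
import heapq
import itertools

# Hypothetical priority table: lower numbers are processed first.
PRIORITY = {"wikidatawiki": 0}
DEFAULT_PRIORITY = 1

class UpdateQueue:
    """Queue of pending index updates, ordered by wiki priority, then arrival."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps first-in, first-out order within a priority.
        self._counter = itertools.count()

    def push(self, wiki, page_id):
        priority = PRIORITY.get(wiki, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._counter), wiki, page_id))

    def pop(self):
        _, _, wiki, page_id = heapq.heappop(self._heap)
        return wiki, page_id

    def __len__(self):
        return len(self._heap)

queue = UpdateQueue()
queue.push("enwiki", 1)
queue.push("wikidatawiki", 42)
queue.push("frwiki", 7)
queue.push("wikidatawiki", 43)
order = [queue.pop() for _ in range(len(queue))]
```

Both Wikidata updates are popped before the earlier enwiki update, which shows the trade-off of the workaround: lag drops for Wikidata at the cost of slightly later processing for other wikis.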
Migrate Private wikis to the new Search Update Pipeline
Description: The new SUP is deployed for all public wikis. Private wikis are managed differently and currently don’t provide an update stream or a method for internal services to run read-only API requests against them.
Migrating all wikis to a single update pipeline will simplify operations and allow us to fully remove unused code, reducing our complexity.
Timeline: Q1(+Q2?)
Phab: https://phabricator.wikimedia.org/T341332
Dependency: Data Engineering
Support WE3.1
Description: The Web team will experiment with the Search box empty state, in particular with article recommendations that are likely to be based on ArticleRecommendation / MoreLike. We support that effort through engineering consulting and, as needed, by addressing potential scaling issues.
The exact work required will depend on the needs of the Web team.
Phab: https://phabricator.wikimedia.org/T369632
Convert Graphite metrics to Prometheus
Description: The Observability team is transitioning from Graphite to Prometheus for alerting. This requires us to migrate the metrics that CirrusSearch publishes and that are used for alerting to Prometheus.
Timeline: Q1
Phab: https://phabricator.wikimedia.org/T350597
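Part of such a migration is structural: Graphite encodes dimensions as segments of a dotted path, while Prometheus uses a metric name plus labels. The sketch below shows one way to map between the two. The metric path, label names and helper functions are hypothetical and are not the actual CirrusSearch metrics; label-value escaping is omitted.

```python
import re

def graphite_to_prometheus(path, label_positions):
    """Split a dotted Graphite path into a Prometheus metric name and labels.

    `label_positions` maps a segment index to the label name it becomes.
    """
    parts = path.split(".")
    labels = {name: parts[index] for index, name in label_positions.items()}
    name_parts = [p for i, p in enumerate(parts) if i not in label_positions]
    # Prometheus metric names may only contain [a-zA-Z0-9_:].
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", "_".join(name_parts)).lower()
    if name[:1].isdigit():
        name = "_" + name  # names must not start with a digit
    return name, labels

def exposition_line(name, labels, value):
    """Render one sample in the Prometheus text exposition format."""
    label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

# Hypothetical Graphite path with the cluster and query type as segments.
name, labels = graphite_to_prometheus(
    "search.eqiad.full_text.failures", {1: "cluster", 2: "query_type"}
)
line = exposition_line(name, labels, 3)
```

Moving the cluster and query type into labels lets a single alerting rule cover every combination, instead of one rule per Graphite path.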
Things we plan
OpenSearch Migration
Description: Elasticsearch is a dead end for us due to licensing issues (SSPL). We need to eventually migrate to OpenSearch. Beyond the obvious need to run a supported and reasonably recent software stack, migrating to a recent version of OpenSearch will introduce new capabilities, in particular Vector Search, a topic that has been discussed on multiple occasions at the Foundation.
The goal here is to get a clear understanding of what is required to execute a migration to OpenSearch, not to implement that migration.
Define use cases for vector-based search
Description: There have been multiple conversations at WMF about the use of Vector Search. We need to clarify what is possible, what might make sense, and what the steps are to get there.