Search Platform/Goals/OKR 2023-2024 Q3

From Wikitech

Q3 Overview

The work on the Search Update Pipeline will be ramping down, culminating in deployment to all wikis (except private wikis).

WDQS Split the Graph will be our main focus, with work on testing and validating the split, both internally by running test queries and externally in collaboration with our communities.

The improvements to multilingual zero-results rate will continue as planned.

Q3 OKRs

Things we ship

Search Update Pipeline (SUP)

Higher-level Objective: Users can reliably query and search for content, so that they can depend on Wikimedia Foundation projects as knowledge infrastructure (ERF: Technical Infrastructure, Product Platform)

KR:

  • 90% of search update load is migrated away from JobQueue at the end of Q3
  • Search update lag is < 10 minutes 95% of the time over a 3 month period for the wikis migrated to SUP
  • Revision-based events have fewer than 100 failures per day
  • Page-refresh-based events have fewer than 1,000 failures per day

Description: The search update pipeline is currently broken: updates are sporadically lost often enough that users are reporting bugs. The Saneitizer is supposed to resolve this, but is running with ~4 weeks of lag. Work needs to be done to quantify the current rate of lost updates so we can benchmark the SUP against it. The current update pipeline processes MediaWiki updates as a stream, with < 5 minutes of lag. Most additional data is processed in batches, with a lag > 1h. By using streams for both kinds of sources, we want to reduce the lag of secondary data sources from > 1h to < 10 minutes.
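The "lag < 10 minutes 95% of the time" KR above can be illustrated with a small percentile check. This is a hypothetical sketch, not part of the SUP codebase; the function names and the nearest-rank percentile method are assumptions for illustration only.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least a
    fraction q of all samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def lag_within_slo(lag_minutes, threshold=10.0, q=0.95):
    """True when the q-th percentile of observed update lag (in minutes)
    is at or under the threshold."""
    return percentile(lag_minutes, q) <= threshold
```

For example, a window where 95% of updates land within a minute but 5% take 20 minutes would still meet the target, while a window of uniform 12-minute lag would not.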

Docs:

Phab:

Milestones:

  • Migration of all wikis (except private wikis) by the end of Q3

WDQS Split the Graph

Higher-level Objective: SDS 3.1

Description: If we expose 1 experimental service, with a proposed split of the Wikidata graph (scholarly articles vs the rest), we can get feedback from our communities and measure the performance and functionality of queries federated across graphs. To simplify implementation, those graphs would not be updated and thus could only be used to validate query patterns. We can identify query types that run better, worse, or not at all in the context of this split. Feedback from our communities will help us understand the limitations of the split and help tune which entities need to be present in which graph. Success is defined as having a clear definition of the rules used to split the graph, accepted in principle by our communities. The size of the largest subgraph needs to be less than 75% of the full graph, with the number of new federated queries being less than 5% of overall query numbers.
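A query federated across the two subgraphs would use SPARQL's SERVICE clause to reach the remote half of the split. The sketch below only builds the query text; the endpoint URLs are hypothetical placeholders, and the specific properties (wdt:P50 "author", wdt:P106 "occupation") are just one plausible cross-subgraph join.

```python
# Hypothetical endpoint URLs for the two experimental subgraphs.
SCHOLARLY_ENDPOINT = "https://query-scholarly.example.org/sparql"

def federated_query(scholarly_endpoint):
    """Build a SPARQL query joining the main subgraph (the local endpoint)
    with the scholarly-articles subgraph via a SERVICE (federation) clause."""
    return f"""
    SELECT ?article ?author WHERE {{
      # Author items stay in the main subgraph.
      ?author wdt:P106 wd:Q1650915 .
      # Articles about them live in the scholarly subgraph, reached by federation.
      SERVICE <{scholarly_endpoint}> {{
        ?article wdt:P50 ?author .
      }}
    }}
    """
```

Counting how many incoming queries would need such a SERVICE clause is one way to check the "federated queries < 5% of overall query numbers" criterion.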

Doc:

Phab:

Milestones:

  • Data loaded into test servers
  • Test servers exposed to the internet
  • Define a set of test queries (WMDE)
  • Design and execute test plan
  • Refine split based on learnings
  • Decision on what kind of feedback we want from our communities
  • The running experiment is presented to our communities
  • By the end of Q3, we are confident that the proposed split is the right solution moving forward

Dependencies:

  • WMDE for analysis and communication with our communities
  • Data Engineering for analysis and query testing framework
  • Data Platform Engineering SRE for servers and data loading
  • Data Persistence for intermediate storage in the data loading process (Swift?)
  • Movement Communications for communication with our communities

Improve multilingual zero-results rate

Objective: Searchers of emerging languages can search in their own language

KR: Increase recall (reduce ZRR and/or increase number of results returned) for 75% of relevant languages.

Description: Following the work of unpacking all the language analyzers, we can now work on harmonising language processing across wikis and deploying global improvements. To ensure that our users can more easily understand how search is working, and that improvements to search are replicated across languages, we want differences in how we treat different languages to be linguistic, not accidental. For example: how we treat camelCase or apostrophes should be the same in all languages.
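The camelCase example above amounts to splitting a token on lowercase-to-uppercase boundaries. The actual work reimplements this as an Elasticsearch token filter in the analysis plugin; the standalone function below is only a simplified illustration of the behaviour, under the assumption that runs of capitals (acronyms) are kept as a single token.

```python
import re

# Split only where a lowercase letter is followed by an uppercase one,
# so acronyms such as "NASA" remain a single token.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

def split_camel_case(token):
    """Return the sub-tokens of a camelCase token, lowercased for matching."""
    return [part.lower() for part in _CAMEL_BOUNDARY.split(token)]
```

Applying the same rule in every analysis chain is what makes the difference "linguistic, not accidental": a query for "default skin" can then match "wgDefaultSkin" regardless of the wiki's language.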

We will continue to focus on increasing recall (with decreasing zero-results rates and increasing number of results as proxy metrics), assuming that increased recall improves the odds of content discovery, especially on smaller language wikis. Note that this is an imperfect KPI for search relevancy overall.

Phab: https://phabricator.wikimedia.org/T219550

Milestones:

  • Reimplement camelCase and acronym processing as filters (T346051) [Done in Q2, but not deployed yet]
  • Development and testing of new filters in new plugin
  • Enable fallback versions of camelCase and acronym processing in analysis config
  • Complete code to repair multi-script tokens split by the ICU tokenizer (T332337) [Will overflow in Q3]
  • Deploy updated new plugin (for both T346051 and T332337) [Not done]
  • Reindex wikis with recent improvements (T342444) [Not done]
  • Analysis of impacts of reindexing with apostrophe, camelCase, and acronym/WBH improvements, plus ICU token repair
  • Complete ASCII-folding/ICU-folding harmonisation analysis & code (T332342)
  • Reindex for ASCII-folding/ICU-folding

Search SLOs

Description: To ensure that we can understand the quality of our search and invest the appropriate efforts in operating it, we want to have clear SLOs for key aspects of the Search experience.

Spillover work from Q2.

Doc: Search SLOs

Phab: https://phabricator.wikimedia.org/T335576

Milestones:

  • Standard SLO dashboard is created