Search Platform/Weekly Updates/2024-10-11
Summary
Now that the dust has settled on licensing changes between OpenSearch and Elasticsearch, we're starting preliminary work on migrating to OpenSearch.
Exposing the stream of RDF updates for WDQS is moving forward. The main step will be reworking the schema of those updates. We want to get this right before making it public, since the schema will be much harder to change once more clients depend on it. Some level of bikeshedding will occur.
What we've accomplished
WDQS Expose RDF stream publicly
- Working on refactoring the input/output streams to use eventutilities - https://gerrit.wikimedia.org/r/q/topic:%22output-schema-v2%22
Improve multilingual zero-results rate
- ICU folding before-and-after analysis is done! 100% of reasonably-sized targeted samples showed improvement in ZRR! 84% of general query samples showed improvements (mostly ZRR, some increase in results, a few top result changes). [TJ]
- Details in https://phabricator.wikimedia.org/T375557. More details on-wiki:
- Side note: weighted vs unweighted ZRR doesn't matter much, except for Vietnamese, which has some craaaaaazy bot traffic that gets through into the sample.
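To illustrate the weighted vs unweighted distinction in the side note, here is a minimal sketch. It assumes "weighted" means every search request counts toward ZRR and "unweighted" means each distinct query string counts once. The notes above do not spell out these definitions, and the function name and sample data are hypothetical, not taken from the actual analysis.

```python
def zero_results_rates(samples):
    """Return (weighted, unweighted) zero-results rates.

    samples: list of (query, result_count) pairs, one per search request.
    Assumed definitions (not confirmed by the notes above):
      weighted   = zero-result requests / all requests
      unweighted = zero-result distinct queries / all distinct queries
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")

    # Weighted: a query repeated 7 times counts 7 times.
    weighted = sum(1 for _, hits in samples if hits == 0) / len(samples)

    # Unweighted: collapse repeats; assume a repeated query always
    # returns the same result count, so the last one seen is kept.
    distinct = {}
    for query, hits in samples:
        distinct[query] = hits
    unweighted = sum(1 for hits in distinct.values() if hits == 0) / len(distinct)

    return weighted, unweighted


# Hypothetical sample: three ordinary queries plus one bot query
# repeated seven times with zero results.
sample = [("hà nội", 12), ("sài gòn", 7), ("phở", 3)]
sample += [("bot-generated query", 0)] * 7

weighted, unweighted = zero_results_rates(sample)
print(f"weighted ZRR:   {weighted:.2f}")
print(f"unweighted ZRR: {unweighted:.2f}")
```

In this made-up sample a single repeated bot query pulls the two rates far apart, which is the kind of effect that heavy bot traffic in a sample could produce.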
Search backend replacement
- Have the basics of cindy's environment running with a new plugins.deb and cirrus-opensearch image. Getting the basics together: new repos created in Gerrit, new branches where appropriate. https://phabricator.wikimedia.org/T372769
- Re-affirmed decision to migrate to OpenSearch. Update at https://www.mediawiki.org/w/index.php?title=Wikimedia_Search_Platform%2FDecision_Records%2FSearch_backend_replacement_technology&diff=6793828&oldid=6737032 and https://phabricator.wikimedia.org/T370661, with addition of a checklist item in https://phabricator.wikimedia.org/T370148 for planning a migration guide later.