Search Platform/Weekly Updates/2023-12-08
Summary
The end of the year is close and things are slowing down.
We've made good progress on the WDQS Graph Split: a completed split is now being loaded onto our test servers. We also presented the project in a session at the Wikidata Data Modelling Days, where it was well received.
We have a selection of test wikis for the Search Update Pipeline (commons, fr, it, wikidata and testwiki), and we're updating Cloudelastic with the new pipeline as a low-risk first end-to-end deployment.
What we've accomplished
Improve multilingual zero-results rate
- Integration tests and performance validation on the ICU token repair plugin. Performance overhead is between 4.5% and 6.5%, which is acceptable - https://phabricator.wikimedia.org/T332337
Search Update Pipeline
- We're updating indices on Cloudelastic with the new update pipeline for our test wikis (commons, fr, it, wikidata and testwiki) - https://phabricator.wikimedia.org/T352335
WDQS graph splitting
- Creation of split files that can be imported into test servers completed - https://phabricator.wikimedia.org/T350106
- Import of the split graph into test servers started - https://phabricator.wikimedia.org/T350465
- Wrote a tool that converts IGUANA test results into tabular data suitable for analysis - https://phabricator.wikimedia.org/T351894
- We held a session about the graph split at the Wikidata Data Modelling Days (https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023#Sessions). The Graph Split proposal was well received. See the recording: https://www.youtube.com/watch?v=Krk1EcP0TyA
Misc
- WDQS Streaming Updater has good concurrency limit configuration options - https://phabricator.wikimedia.org/T346456
- WDQS Streaming Updater has a way to tag side output events to allow reprocessing - https://phabricator.wikimedia.org/T347515