Search Platform/Weekly Updates/2023-11-10
Summary
Search Update Pipeline: We're resolving issues as they come up in deployment. The data pipeline is not stable yet, but we're getting there.
WDQS graph split: We've started investigating the different test framework options so that we can validate the split on a subset of queries. The graph split itself needs a bit more work on some corner cases, but we're getting there. Collaboration with WMDE is working well; we're getting support with the analysis.
What we've accomplished
Improve multilingual zero-results rate
- Work started on repairing multi-script tokens split by the ICU tokenizer - https://phabricator.wikimedia.org/T332337
Search Update Pipeline
- A number of fixes and improvements to get production-ready:
  - Filter/drop canary events from update stream - https://phabricator.wikimedia.org/T349591
  - Logging: add correlating information - https://phabricator.wikimedia.org/T348211
  - Pick Kafka consumer offset defaults - https://phabricator.wikimedia.org/T348112
  - Use Flink's AsyncRetryStrategy instead of custom retry logic - https://phabricator.wikimedia.org/T347545
  - Fetch: Handle Timeout of AsyncAwaitOperator - https://phabricator.wikimedia.org/T347543
WDQS graph splitting
- Initial data load of the full graph on test servers is still in progress (started on 2023-10-24). Roughly 70% done so far, but we expect the process to slow down over time - https://phabricator.wikimedia.org/T347504
- We will be hosting a session about the graph split at Wikidata Data Modelling Days - https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023#Sessions
- Initial version of the graph split code, needs more work to support corner cases - https://phabricator.wikimedia.org/T347989
Operations
- Java security restarts - https://phabricator.wikimedia.org/T350703
Misc
- rdf-streaming-updater migration: now running in staging - https://phabricator.wikimedia.org/T349095
- Ensure mjolnir works on Python 3.9 or later (as part of the migration to Debian Bullseye) - https://phabricator.wikimedia.org/T346373
- Some of our CI pipelines have been migrated to Java 11 - https://phabricator.wikimedia.org/T350587