Search Platform/Weekly Updates/2024-01-26
Summary
We now have test endpoints for the WDQS Graph Split exposed to the internet (see below for links). We have not yet communicated those to our users, but we're technically ready to get more feedback from our communities about the WDQS graph split. Functional and performance testing is moving forward, and we are identifying a few use cases that will require more attention.
Search Update Pipeline has a few more quality-of-life and performance improvements. Our Cloudelastic cluster is ingesting updates without issues, so we can move to our production search cluster soon!
What we've accomplished
Improve multilingual zero-results rate
- ICU token repair plugin code is done and waiting for a final code review. The write-up is in progress, but most of the information is already available in the project's readme: https://gerrit.wikimedia.org/r/c/search/extra/+/972478/8/docs/icu_tok_repair.md - https://phabricator.wikimedia.org/T332337
WDQS graph splitting
- Meeting with WMDE about the sample of queries to analyze; a new and larger query sample should be ready in a few days
- A few fixes to IGUANA to make tests less flaky and to integrate with WMF build system - https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/merge_requests/4
- Performance test with IGUANA is running with ~17K queries known to work on both the full and main graphs - https://phabricator.wikimedia.org/T355037
- Experimental endpoints with full/main/scholarly graphs are publicly available - https://phabricator.wikimedia.org/T350464 (and subtasks)
Search Update Pipeline
- A few more quality-of-life / performance improvements:
- SUP: Process (large) JSON responses non-blocking to save memory, with a nice reduction in garbage collection after this change (see task for the graph) - https://phabricator.wikimedia.org/T355066
- SUP: deployment: allow passing integers - https://phabricator.wikimedia.org/T354197
- ConsumerApplicationIT should fail when the update request payload changed - https://phabricator.wikimedia.org/T353427
- The consumer job of the SUP does not achieve its expected throughput - https://phabricator.wikimedia.org/T353460