Search Platform/Weekly Updates/2024-06-14
Summary
The Search Update Pipeline is mostly done, with some minor tuning still under way now that all wikis are using it.
The first data reload of the WDQS graph split using the new automated process is ongoing. Work on the split update pipeline is in progress.
What we've accomplished
Search Update Pipeline
- Wrote a small Python script that scans the allpages API and sends rerender events; used it to reindex all 1.3M lexemes - https://phabricator.wikimedia.org/T365692
- Rate limiting:
  - Use a specific user agent in MediaWiki API requests - https://phabricator.wikimedia.org/T366363
  - Tested a back-fill with a production-like volume of updates on staging: requests exceeding 1k/s are blocked - https://phabricator.wikimedia.org/T362310
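For readers unfamiliar with the allpages scan mentioned above, here is a minimal sketch of paging through the MediaWiki `list=allpages` module with a dedicated User-Agent. It is not the script from T365692: the endpoint, the namespace id (146 is assumed to be the Lexeme namespace), the user agent string, and the function names are all assumptions for illustration. Sending the rerender events is left out, since the update does not describe that interface.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

# Assumptions for illustration only; none of these values come from the real script.
API_URL = "https://www.wikidata.org/w/api.php"
LEXEME_NAMESPACE = 146  # assumed id of the Lexeme namespace
USER_AGENT = "example-rerender-scanner/0.1 (hypothetical; contact@example.org)"


def http_fetch(params: Dict[str, str]) -> dict:
    """Run one API request, identifying the client with a specific User-Agent."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_allpages(
    fetch: Callable[[Dict[str, str]], dict] = http_fetch,
    namespace: int = LEXEME_NAMESPACE,
    limit: int = 500,
) -> Iterator[dict]:
    """Yield every page record in a namespace, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": str(namespace),
        "aplimit": str(limit),
    }
    while True:
        data = fetch(dict(params))
        yield from data.get("query", {}).get("allpages", [])
        continuation = data.get("continue")
        if not continuation:
            break  # no "continue" block means the last batch was reached
        # Merge the continuation tokens (e.g. "apcontinue") into the next request.
        params.update(continuation)
```

Each yielded record carries the page id and title, so a caller could build one rerender event per page while throttling itself to stay under the request limit noted above.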
WDQS graph splitting
- Testing of WDQS data reload - https://phabricator.wikimedia.org/P64016
- Ensure that WDQS query throttling does not interfere with federation - https://phabricator.wikimedia.org/T361950
Search Metrics
- Collaboration with Product Analytics to improve metrics on mobile apps - https://phabricator.wikimedia.org/T259883
Misc
- Fixed a bug in wikimedia-eventutilities: ISO 8601 dates ending in a timezone offset rather than a trailing 'Z' could not be parsed, which caused the producer to fail after sending dates in that format
- Made all the subgraph analysis Hive tables readable by others (requested by Product Analytics)
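To illustrate the two timestamp shapes involved in the date-parsing fix above, here is a small Python sketch that accepts both a trailing 'Z' and a numeric offset. It is not the wikimedia-eventutilities code, which is a separate library; the function name `parse_iso8601` is hypothetical and the normalization to UTC is a choice made for this example.

```python
from datetime import datetime, timezone


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' or in an offset such as +02:00."""
    # Older Python versions reject a trailing 'Z' in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no timezone designator: " + value)
    # Normalize so both spellings of the same instant compare equal.
    return parsed.astimezone(timezone.utc)


same_instant = parse_iso8601("2024-06-14T10:00:00Z") == parse_iso8601("2024-06-14T12:00:00+02:00")
```

Both inputs denote the same instant, so `same_instant` is `True`; a parser that only recognizes the 'Z' form would reject the second one.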