Search Platform/Weekly Updates/2023-04-28
Summary
Annual planning took up a fair amount of attention this week. The Search Update Pipeline is moving forward. We also handled some unexpected operations and development support work.
What we've accomplished
Search Analysis
- Blog post is in the queue to be published, hopefully in a couple of weeks.
- Enabling the new Estonian analyzer; this seems to have a significant impact - https://phabricator.wikimedia.org/T332322
Search Update Pipeline
- Wrote a small job that copies the cirrus index data from Avro into a smaller dataset written in Parquet; it will be used to identify inconsistencies between the cirrus index and the MySQL page table.
- Upgrade WDQS Streaming Updater to Flink 1.16: testing is done, deployment to production is still pending - https://phabricator.wikimedia.org/T289836
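As a rough illustration of the consistency check mentioned in the first item above, here is a minimal sketch in plain Python. It assumes both sources have already been reduced to a mapping of page ID to latest revision ID; the function name `compare_pages`, the field names, and the sample data are all hypothetical and are not taken from the actual job.

```python
from typing import Dict, List, NamedTuple


class Inconsistencies(NamedTuple):
    missing_from_index: List[int]  # in the page table but not in the index
    missing_from_table: List[int]  # in the index but not in the page table
    revision_mismatch: List[int]   # in both, but the latest revision differs


def compare_pages(index_pages: Dict[int, int],
                  table_pages: Dict[int, int]) -> Inconsistencies:
    """Compare two hypothetical {page_id: latest_revision_id} mappings."""
    index_ids = set(index_pages)
    table_ids = set(table_pages)
    return Inconsistencies(
        missing_from_index=sorted(table_ids - index_ids),
        missing_from_table=sorted(index_ids - table_ids),
        revision_mismatch=sorted(
            page_id
            for page_id in index_ids & table_ids
            if index_pages[page_id] != table_pages[page_id]
        ),
    )


# Made-up sample data, for illustration only.
index_pages = {1: 100, 2: 205, 4: 400}
table_pages = {1: 100, 2: 210, 3: 300}
result = compare_pages(index_pages, table_pages)
print(result)
```

At production scale the same comparison would more likely be expressed as a join over the two datasets; the set-based version here only shows the three kinds of inconsistency one might look for.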
Search SLO
- Created phab tickets to track the work - https://phabricator.wikimedia.org/T335576
Operations / SRE
- Hardware requests for next fiscal year
Misc
- Add a new keyword to filter pages based on their "length", in support of article suggestion - https://phabricator.wikimedia.org/T328332
- Browser tests for CirrusSearch migrated from Vagrant to Docker. Documentation is updated: https://wikitech.wikimedia.org/wiki/Cindy_The_Browser_Test_Bot. This still does not integrate into our CI environment as well as we would like (the tests run in our own WMCS instance), but it should significantly reduce the ongoing work needed to keep the tests green - https://phabricator.wikimedia.org/T333183
- Search autocomplete was broken on enwiki for a couple of weeks. The problem has been fixed and a new alert has been created - https://phabricator.wikimedia.org/T327199
- Wrote a small patch to clean up error handling of CirrusSearch writes (work initially started by Emmanuel).
- Tried to figure out what is going on with WCQS/commons next year, if anything.
- Met with the Growth team to talk about product collaboration between reader/editor needs and search, discovery, and browsing. There is a lot of shared thinking and overlap, and hopefully there's some room to collaborate with the team on reader discovery work.