Search Platform/Weekly Updates/2023-05-12
Summary
We had a major lag issue: the Wikidata Query Service (WDQS) codfw cluster stopped being updated (see below for details). Resolving it took significant time and focus. Work on the Search Update pipeline continues, including conversations with other consumers of the event stream about the changes Search needs.
What we've accomplished
Search - Analysis
- Started putting in place the infrastructure required to measure the planned improvements.
- Estonian reindexing is complete! It enabled a new stemmer (among other changes) and had a fairly big impact, roughly similar to that of the new Bengali stemmer: 1 in 6 Estonian Wikipedia queries that previously returned zero results now get results, almost 1 in 3 queries that already had results now get more results, and more than 1 in 8 queries had their top result change - https://phabricator.wikimedia.org/T335704
- A user's question about stop words prompted us to write some documentation - https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Stopwords_2023
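The Estonian figures above are before/after comparisons over a set of queries. Below is a minimal sketch of how such ratios could be computed. It assumes each query's outcome is recorded as a total hit count plus the ID of the top-ranked result, once with the old analysis chain and once with the new one. The `Outcome` type, the `impact_metrics` function, and the choice of denominators are hypothetical illustrations, not the team's actual analysis tooling.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-query record: total hits and the top result's ID."""
    total_hits: int
    top_result: Optional[str]  # None when the query has no hits


def ratio(num: int, den: int) -> float:
    """Safe division: an empty group contributes a ratio of 0.0."""
    return num / den if den else 0.0


def impact_metrics(before: Dict[str, Outcome],
                   after: Dict[str, Outcome]) -> Dict[str, float]:
    """Compare per-query outcomes before and after a reindex.

    Assumption: the "more results" and "top result changed" ratios are
    computed over queries that already had results before the reindex.
    """
    queries = sorted(before.keys() & after.keys())  # only queries seen in both runs
    zero = [q for q in queries if before[q].total_hits == 0]
    nonzero = [q for q in queries if before[q].total_hits > 0]

    return {
        # Previously zero-result queries that now return something.
        "zero_now_nonzero": ratio(
            sum(after[q].total_hits > 0 for q in zero), len(zero)),
        # Queries that already had results and now have more hits.
        "nonzero_more_hits": ratio(
            sum(after[q].total_hits > before[q].total_hits for q in nonzero),
            len(nonzero)),
        # Queries whose top-ranked result is different after the reindex.
        "top_result_changed": ratio(
            sum(after[q].top_result != before[q].top_result for q in nonzero),
            len(nonzero)),
    }
```

Splitting the sample into previously zero-result and previously non-zero-result queries keeps the first ratio from overlapping with the other two. Otherwise, a query that goes from no results to some results would also count as "more hits" and "top result changed".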
Operations / SRE
- Ironed out the last performance issues with the WDQS SLO dashboard - https://phabricator.wikimedia.org/T313751
- We had a major lag incident on WDQS: the codfw cluster stopped being updated from Friday to Monday. It kept serving queries during that time, but answered them with outdated data. The incident report is still being written. Our current understanding is that the incident was first triggered by a bug in Flink (https://issues.apache.org/jira/browse/FLINK-22597), which probably led to higher memory usage during recovery and prevented the service from recovering as it should have. We also have a number of lessons learned about managing this kind of incident, where WDQS is still serving traffic but with outdated data. Follow-up is tracked in https://phabricator.wikimedia.org/T336134 and https://wikitech.wikimedia.org/wiki/Incidents/2023-05-05_wdqs_not_updating_in_codfw
- Improvements to the ban cookbook - https://phabricator.wikimedia.org/T331303
- Decommissioning Airflow (WIP) - https://phabricator.wikimedia.org/T333697
- Reviewing hardware requests for the next fiscal year - https://phabricator.wikimedia.org/T334210
Misc
- There were some questions on wikitech-l about why Search isn't using word embeddings / vector search (https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/PJGRN2H2CUSACZH4QSI2VOV3WBDZ3J6O/). The Search Platform team was called out a few times in that thread. We are working on a reply - https://docs.google.com/document/d/1XLzHNqwEyD42mw3Zj4boJb9grh56UWhGYyJrm-g1VOg/edit#