Search Platform/Weekly Updates/2023-06-16
Summary
We're getting close to the end of the quarter. Work on improving the multilingual zero-results rate for Search has made progress: most of the infrastructure to evaluate changes is in place, and the handling of apostrophes is advancing well. The project plan for a first experiment on splitting the WDQS graph is in good shape.
As expected, the Search Update Pipeline work will overflow into next quarter. As identified earlier this quarter, the Search SLIs have been defined, but the full implementation will not fit in this quarter and has been moved to the next one.
The Data Platform Engineering virtual offsite happened this week. This marks a transition point into our new team organization. In particular, have a look at this talk if you want an overview of the systems that the Search Platform manages: https://drive.google.com/file/d/1APye8qEdYP_OKBCnb0jsqSCDemobuUBT/view
What we've accomplished
Search Update Pipeline
- Identified the kind of updates that flowed into cirrusSearchLinksPrioritized (these are "purge" API actions that bots can trigger to force a LinksUpdate). We can now move those updates to a non-prioritized queue - https://phabricator.wikimedia.org/T320408
- Added event-time, meta.dt, meta.domain fields to ML revision-score based events - https://phabricator.wikimedia.org/T267648
Improve multilingual zero-results rate
- Started looking into aggressive_splitting - https://phabricator.wikimedia.org/T219108
Operations / SRE
- Optimized data transfer for WDQS - https://phabricator.wikimedia.org/T321605
- Added federation with UNESCO SPARQL endpoint - https://phabricator.wikimedia.org/T335994
- Added federation with BNCF SPARQL endpoint - https://phabricator.wikimedia.org/T336709
- Migrated W[CD]QS Puppet code to use profile::java - https://phabricator.wikimedia.org/T264181
Misc
- Fixed a capitalization issue on the search results page - https://phabricator.wikimedia.org/T335551
- Updated the current search update pipeline for changes in ML-generated revision scores - https://phabricator.wikimedia.org/T333468