Search Platform/Weekly Updates/2024-01-19
Summary
We've addressed all known performance issues on the Search Update Pipeline and are now generating re-render events for all public wikis. Some more work is required around private wikis, but that's not blocking the rollout.
Query analysis for WDQS Graph Split is showing some results and raising more questions. Some clients seem to have no issues (Pywikibot, SPARQLWrapper), while others show queries returning different results (Listeria, MixNMatch, WikidataIntegrator, ...). More investigation is required to understand whether this is a limitation of our testing strategy or of the graph split itself.
What we've accomplished
Search Update Pipeline
- The new NetworkSession extension is code ready. This is one step towards allowing SUP to also update private wikis. We're going through the "new extension for deployment checklist". This is probably going to be blocked on security review, which is prioritized once per quarter. - https://phabricator.wikimedia.org/T345185
- All known performance issues are addressed, and we are seeing the throughput that we expect.
- The main SUP Grafana dashboard now links to related dashboards - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater / https://phabricator.wikimedia.org/T354322
- All non-private wikis with the CirrusSearch extension now publish page_rerender events by default. - https://phabricator.wikimedia.org/T351503
WDQS graph splitting
- Some progress on analyzing differences in SPARQL query results - https://phabricator.wikimedia.org/T355040
- Our query logs contain more than one kind of SPARQL query, and the SPARQL client used to collect the data has to be adapted to support the other query forms (ASK, CONSTRUCT, DESCRIBE) (https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/991622)
- Getting failures due to response size. Bumped the limit to 16M but still getting problems; I might stop here and simply ignore these queries
- Getting very bad numbers from Listeria and MixNMatch (34% and 17% identical respectively). Their avg result sizes are 1.6k and 8k, which might partly explain why getting identical results is difficult. More investigation is needed to understand the cause
- Getting mediocre numbers for WikidataIntegrator at 88% identical, despite a very small avg result size of 8. More investigation needed
- Pywikibot and SPARQLWrapper are good at 99.4% for both
- Expose 3 new dedicated WDQS endpoints: DNS entries, SSL certificate and microsite configuration are ready, but not yet working. Investigation required, but we're almost there - https://phabricator.wikimedia.org/T351650
- Spark job to export the split graph from HDFS is completed, with appropriate tests - https://phabricator.wikimedia.org/T350106
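As an illustration of the "% identical" figures quoted in the query analysis bullets above, here is a minimal Python sketch of how two responses to the same query (one from the full graph, one from the split graph) could be compared. It is not the tooling used in T355040. The function names are hypothetical, and it assumes responses in the standard SPARQL JSON results format, that row order is not significant, and that CONSTRUCT and DESCRIBE responses are handled elsewhere.

```python
import json
from collections import Counter


def normalize_bindings(result):
    """Return an order-insensitive multiset of the rows in a SELECT result."""
    rows = result.get("results", {}).get("bindings", [])
    # Serialize each row with sorted keys so identical rows compare equal
    # regardless of the order in which variables appear.
    return Counter(json.dumps(row, sort_keys=True) for row in rows)


def results_identical(full, split):
    """Compare two SPARQL JSON responses (hypothetical helper)."""
    # ASK responses carry a top-level "boolean" instead of bindings.
    if "boolean" in full or "boolean" in split:
        return full.get("boolean") == split.get("boolean")
    return normalize_bindings(full) == normalize_bindings(split)


def percent_identical(pairs):
    """Percentage of (full, split) response pairs that are identical."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    same = sum(1 for full, split in pairs if results_identical(full, split))
    return 100.0 * same / len(pairs)


# Made-up example data: one pair differs only in row order, one pair
# is missing a row on the split side.
row_a = {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}}
row_b = {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"}}
full = {"head": {"vars": ["item"]}, "results": {"bindings": [row_a, row_b]}}
reordered = {"head": {"vars": ["item"]}, "results": {"bindings": [row_b, row_a]}}
missing = {"head": {"vars": ["item"]}, "results": {"bindings": [row_a]}}

print(percent_identical([(full, reordered), (full, missing)]))  # 50.0
```

A strict comparison like this treats any single differing row as a mismatch, so larger result sets have more opportunities to differ. It would also report false mismatches for results containing blank nodes, whose labels are not stable between endpoints.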
Operations
- Fixed DAGs affected by the canary events failure. So far it has affected 4 DAGs plus a downstream one on our side - https://phabricator.wikimedia.org/T337055#9470093