Search Platform/Weekly Updates/2023-12-15
Summary
We're fixing a few more bugs on the Search Update Pipeline and improving our understanding of operations.
Graph split is well on track, with data loaded into test servers and the starting point of a test framework in place.
Next week is expected to be very quiet, with multiple team members already starting their end-of-year vacation.
What we've accomplished
Search Update Pipeline
- Discovered a new bug in the Flink Elasticsearch integration: the size of update requests is estimated incorrectly - https://phabricator.wikimedia.org/T353430
- Better understanding of throughput limitations of the pipeline - https://phabricator.wikimedia.org/T353460
- Fixed Envoy telemetry for SUP; we now have Envoy metrics - https://phabricator.wikimedia.org/T353224
- We now capture the HTTP client metrics
Improve multilingual zero-results rate
- Reviewing real-world data for the ICU tokenizer and the required repairs. Most results are good, but we might need to exclude some specific languages or scripts.
- Thinking about internal representation of allowed and denied language combinations for ICU token repair and how to specify them in the config - https://phabricator.wikimedia.org/T332337
WDQS graph splitting
- We have the 3 test machines (wdqs102[234]) loaded with the data we need: full, wikidata-main, scholarly articles - https://phabricator.wikimedia.org/T350465
- Discussion with Traffic team about exposing test servers, we have an agreement on the strategy - https://phabricator.wikimedia.org/T351650
- First step toward comparing queries between the split graph and the full graph: we have a Spark UDF that captures the output of a SPARQL query, and we are writing a small comparison job that emits various stats for a set of queries run against endpoints A and B - https://phabricator.wikimedia.org/T351819
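To illustrate the kind of per-query stats such a comparison job could emit, here is a minimal sketch in plain Python. It is not the team's Spark job: the function names (`compare_results`, `summarize`), the choice of stats, and the representation of a result row as a dict of variable name to value are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical representation: one result row maps variable name -> value.
Row = Dict[str, str]


def _freeze(row: Row) -> tuple:
    """Turn a row into a hashable, order-independent key."""
    return tuple(sorted(row.items()))


def compare_results(rows_a: List[Row], rows_b: List[Row]) -> Dict[str, object]:
    """Compare the rows returned by endpoint A and endpoint B for one query."""
    bag_a = Counter(_freeze(r) for r in rows_a)
    bag_b = Counter(_freeze(r) for r in rows_b)
    return {
        "count_a": len(rows_a),
        "count_b": len(rows_b),
        # Rows (with multiplicity) present on one side only.
        "only_in_a": sum((bag_a - bag_b).values()),
        "only_in_b": sum((bag_b - bag_a).values()),
        # Same rows regardless of order, and same rows in the same order.
        "same_unordered": bag_a == bag_b,
        "same_ordered": rows_a == rows_b,
    }


def summarize(per_query: List[Dict[str, object]]) -> Dict[str, int]:
    """Aggregate per-query stats over a whole set of queries."""
    total = len(per_query)
    matching = sum(1 for stats in per_query if stats["same_unordered"])
    return {"queries": total, "matching": matching, "differing": total - matching}
```

Comparing rows as multisets keeps duplicate rows significant while ignoring ordering, which matters because a query without ORDER BY may legitimately return rows in a different order on each endpoint.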
Operations
- Checked and reset the failed image suggestion Airflow task - https://phabricator.wikimedia.org/T353134
Misc
- Review of Java / Scala documentation completed, as part of the JVM Languages Stewardship effort - https://phabricator.wikimedia.org/T344595