Search Platform/Weekly Updates/2023-12-22
Summary
Very slow week: most of the team is already on holiday.
Search Update Pipeline: We're ready to add 3 more wikis (dewiki, frwiktionary, kuwiktionary) to our current test setup (commonswiki, frwiki, itwiki, testwiki, wikidatawiki). We will wait until after the holiday to actually enable them. Those tests are end to end, updating the production cloudelastic indices, but not yet updating the main search cluster. We are still working on understanding the throughput limitations of the pipeline, but this isn't blocking the rollout yet (see https://phabricator.wikimedia.org/T353460).
WDQS Split the Graph: We had a meeting with WMDE about query analysis work, so that we can have a set of queries to validate the impact of a graph split. This work was delayed due to other priorities, but we should have enough to get started on validation in early January. Since our test framework isn't entirely ready yet either, the delay will have no impact on our timeline.
What we've accomplished
Search Update Pipeline
- Second batch of wikis ready to be enabled on Cloudelastic - https://phabricator.wikimedia.org/T351503
- Pull request sent upstream to allow more flexibility in flushing bulk requests - https://issues.apache.org/jira/browse/FLINK-33857
WDQS graph splitting
- Discussion with WMDE on query log analysis to extract queries for our tests. We should have input data for our tests in January - https://phabricator.wikimedia.org/T349512
- We're seeing instability on our test servers that we do not understand yet. We suspect something low level, since they are running the same code and configuration as the production machines - https://phabricator.wikimedia.org/T352878
- First version of a Jupyter notebook ready to record queries for analysis - https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/981550/7/rdf-spark-tools/docs/query_recorder.md