Search Platform/Weekly Updates/2023-10-13
Summary
The Search Update Pipeline (SUP) is getting ready for a first test deployment. We're working through a list of small improvements needed to make the application production-ready (various dependency upgrades, better routing of requests, ...). The main focus is on getting the WDQS Updater and SUP deployed, using the same underlying principles.
Work on splitting the WDQS graph is starting. The test servers are in place and are only missing data. We have a communication to our users ready to go out by next Monday. We are starting on the mechanics of splitting the graph; the current focus is on getting a development environment in place.
What we've accomplished
Search Update Pipeline
- Almost all operations are tested; we want to run a last test to simulate an upgrade of the k8s cluster itself - https://phabricator.wikimedia.org/T342149
- Ongoing work to deploy a test instance in k8s: we are learning about k8s, tweaking configurations at various levels, checking connectivity, etc. - https://phabricator.wikimedia.org/T347075
- Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink - https://phabricator.wikimedia.org/T326914
- Filtering added to avoid duplicate updates in a cross-datacenter context - https://phabricator.wikimedia.org/T344357
- Make HTTP route configuration more flexible - https://phabricator.wikimedia.org/T345612
- Adding support for page re-render - https://phabricator.wikimedia.org/T325565
- Upgrade to Flink 1.17.1 - https://phabricator.wikimedia.org/T346719
- Reference latest streams/schemas - https://phabricator.wikimedia.org/T346895
WDQS graph splitting
- Ongoing work to get test servers ready for our experiment. Basic configuration is almost done, but no data is loaded yet - https://phabricator.wikimedia.org/T347505
Improve multilingual zero-results rate
- Wrestling with Java to get filter parameters working. I've never actually had params for any of the plugins/filters I've made before. We had talked about it as a future option, but it never actually came up. I still need to put up a patch and a write-up.
Operations
- Manual reconciliation of a few deleted Wikidata items in WDQS - https://phabricator.wikimedia.org/T342593
- Cleanup of Swift buckets used for the WDQS Streaming Updater, recovering almost 1 TB of space - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?from=1696938873511&orgId=1&to=1696952558553&viewPanel=14
- The image suggestion pipeline was blocked due to missing data. The current situation is resolved, but more discussion with the Data Engineering team is needed to reach a coherent strategy for how we deal with data quality and the operability of our data pipelines - https://phabricator.wikimedia.org/T347832