Search Platform/Weekly Updates/2023-10-20
Summary
We've sent our first public communication about the WDQS graph split experiment (https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/October_2023_scaling_update). The project is starting, with infrastructure already well on the way and software engineering making sure we have a working development environment. Things should pick up speed soon.
The Search Update Pipeline is being deployed on our staging wikikube environment. We're ironing out the last configuration and network routes, and optimizing the code for production.
Search analysis work is being optimized. Since deployment and reindexing are expensive operations, we want to bundle all changes together.
What we've accomplished
Search Update Pipeline
- All Flink operators are named and identified (part of the production recommendations) - https://phabricator.wikimedia.org/T346717
- Max parallelism is set on all operators with a state (part of the production recommendations) - https://phabricator.wikimedia.org/T346718
- We decided that we've learned enough about Flink deployment by using the WDQS updater as a test application. We will still need to migrate it, but that migration is no longer a dependency of the Search Update Pipeline work - https://phabricator.wikimedia.org/T326409
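For readers unfamiliar with why max parallelism is pinned on stateful operators: Flink partitions keyed state into a fixed number of key groups equal to the max parallelism, then hands contiguous ranges of key groups to the parallel subtasks. The sketch below illustrates that idea in plain Python. It is a simplified model, not our pipeline code: the CRC32 hash is a stand-in for Flink's own hashing, and `MAX_PARALLELISM = 128` and the function names are hypothetical.

```python
import zlib

MAX_PARALLELISM = 128  # hypothetical value, chosen only for illustration


def key_group(key: str, max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key to a key group in [0, max_parallelism).

    CRC32 is a stand-in hash for this sketch; Flink uses its own hash.
    """
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index(group: int, parallelism: int,
                   max_parallelism: int = MAX_PARALLELISM) -> int:
    """Assign a key group to one of `parallelism` subtasks.

    Each subtask receives a contiguous range of key groups.
    """
    return group * parallelism // max_parallelism


def assignment(keys, parallelism, max_parallelism=MAX_PARALLELISM):
    """Return {key: (key_group, subtask_index)} for the given parallelism."""
    result = {}
    for k in keys:
        g = key_group(k, max_parallelism)
        result[k] = (g, operator_index(g, parallelism, max_parallelism))
    return result
```

In this model, calling `assignment(keys, 2)` and `assignment(keys, 4)` yields the same key group for every key and only changes which subtask owns each group. That stability is what makes rescaling from existing state possible, and it holds only as long as the max parallelism itself does not change.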
WDQS graph splitting
- Communication about the project sent to our communities. No reaction yet - https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/October_2023_scaling_update
- We've contacted some people from Scholia. They are open to discussion, and we will schedule a meeting.
- Test servers are configured, and the initial (full) dataset is being loaded (which could take a few weeks to a few months).
- The development environment is in place, so we can start working on the graph split job itself.
Operations
- The WDQS updater failed after switching to the newer Kafka connector API. The cause is related to a bug in older versions of Kafka, which forget consumer offsets on idle streams - https://phabricator.wikimedia.org/T349147