Search Platform/Weekly Updates/2023-04-07
Summary
We are starting a new quarter, with new goals. The plan is not entirely formalized yet, but:
- Now that the Spark 3 migration is complete, we will focus engineering efforts on the Search Update Pipeline.
- Once all search analysis chains are unpacked (which should be completed shortly), we will address a few long-standing issues, with the goal of reducing the zero-result rate or increasing the number of results returned for most languages.
- Define and implement SLOs for Search.
- Create a plan to split the WDQS graph.
What we've accomplished
Search Analysis
- Brazilian Portuguese unpacking completed, still needs reindex - https://phabricator.wikimedia.org/T325092
Search Update Pipeline
- Demonstrated how to run rdf-streaming-updater using the new flink-app helm chart - https://phabricator.wikimedia.org/T328675
- Started setting up a Java CI pipeline on GitLab. There are multiple roadblocks to getting the proper Docker images in place - https://phabricator.wikimedia.org/T326318
Misc
- Started gathering hardware requests for the next fiscal year - https://phabricator.wikimedia.org/T334207
- Fixed broken multi-byte characters at the end of search snippets (third-party usage of MediaWiki) - https://phabricator.wikimedia.org/T333653 (Thanks Func86 for the fix!)
- Discussed support for the OpenRefine reconciliation API and documentation of the WDQS data reload process during the Search Platform Office Hours.