Search Platform/Weekly Updates/2024-04-05
Summary
A new quarter is starting! We haven't yet published the list of goals for this quarter, but it's mostly a continuation from the previous one.
The last two known patches for the Search Update Pipeline are ready and waiting to be merged.
We have a good write-up on the last batch of multilingual search improvements: https://www.mediawiki.org/w/index.php?title=User%3ATJones_%28WMF%29%2FNotes%2FLanguage_Analyzer_Harmonization_Notes&wvprov=sticky-header#Harmonization_Post-Reindex_Evaluation,_Part_I_(T359100)
What we've accomplished
WDQS graph splitting
- We discussed our current plan for one last iteration on how we split the graph with Scholia, and it seems to be accepted.
- Tentatively, there appears to be a configuration variable that may yield somewhat faster imports. It is being tested against one side of the split graph, which uses a different record format. https://phabricator.wikimedia.org/T359062#9686176
- The limitations of federation are documented: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits
Improve multilingual zero-results rate
- Analyze results of harmonization - https://phabricator.wikimedia.org/T359100
Search Update Pipeline
- Stability improvements to elasticsearch writes - https://phabricator.wikimedia.org/T356933
- Integrate Saneitizer with SUP - https://phabricator.wikimedia.org/T358599
Misc
- Update URLs on MediaWiki:Elastica-desc - https://phabricator.wikimedia.org/T355451
- Unable to find a file by filename while adding a Commons media file statement - https://phabricator.wikimedia.org/T353683