Search Platform/Weekly Updates/2023-06-09
Summary
- Hiring a new senior SRE for the Data Platforms SRE team is our highest priority.
- Work on language analysis and the search update pipeline is moving forward.
What we've accomplished
Language analysis harmonization
- Analysis of the use of multiple apostrophe-like characters on the wikis is complete. A patch will be created to harmonize the language analysis across languages. I encourage you to have a look at the write-up for more context: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Analyzer_Harmonization_Notes
Search Update Pipeline
- We are getting closer to having data on the inconsistencies between MySQL and CirrusSearch. Preliminary data indicates that there are few inconsistencies in general, with some potential issues related to lexemes - https://phabricator.wikimedia.org/T328330 / https://phabricator.wikimedia.org/T338255
- Modified EventBus and related schema to handle page redirects
Operations / SRE
- Improved the WDQS data transfer cookbook, which will make data reloads more stable - https://phabricator.wikimedia.org/T321605
- Completed the reboot of multiple servers for a kernel upgrade - https://phabricator.wikimedia.org/T335835
Misc
- Multiple interviews for the Senior SRE position on the Data Platforms SRE team and the Senior Engineering Manager position on the Data Engineering team - https://boards.greenhouse.io/wikimedia/jobs/5070612?gh_src=aae3b0fb1us / https://boards.greenhouse.io/wikimedia/jobs/4990750?gh_src=4adc4c9d1us
- Discussions on the use of our search engine in the context of a ChatGPT plugin. The initial understanding was that our search engine's ranking was not good enough for this use. After reviewing the queries and results, it seems that it does a good enough job. We could address some of the limitations by improving the queries sent to the search engine.