Search Platform/Weekly Updates/2024-04-12
Summary
Every so often, celestial bodies align just so: you look up and can see the moon pass in front of the sun, with stars visible off in the distance in the middle of the day. One of our team members had a chance to see the eclipse this week.
In other movements, we're looking into some basic search metrics such as number of searches, pageviews from searches, and so on:
https://phabricator.wikimedia.org/T358345
Next week we will review some of the preliminary results.
What we've accomplished
WDQS graph splitting
- Changes to WDQS stream updater design are being specified. https://phabricator.wikimedia.org/T361935
- Started to write a simple model for defining subgraphs and refactored the split Spark job to make it a bit more generic. https://phabricator.wikimedia.org/T362060
- A configuration variable change yields faster imports of the scholarly article entity graph. We plan to try this setting in an upcoming import and are also looking into a faster hard drive, which likewise seems to speed things up. https://phabricator.wikimedia.org/T359062#9704104
Improve multilingual zero-results rate
- Test & fixture refactoring is almost done. https://phabricator.wikimedia.org/T361377
- Started doing data collection and test set extraction for hiragana/katakana mapping. https://phabricator.wikimedia.org/T180387
- Language analysis test case refactoring merged. https://phabricator.wikimedia.org/T361377
- An Azerbaijani admin/volunteer dev (NMW03) found the dotted I (İ) task and was hoping it would take care of more general dotted I lowercasing problems on the user's home wiki. Alas, it will not, but the user gained some insight into the behavior being seen in JavaScript and what to ask for in terms of MediaWiki internationalization. https://phabricator.wikimedia.org/T358495
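
For readers unfamiliar with the two character-level issues mentioned in the list above, here is a minimal Python sketch. It is illustrative only and is not the Search Platform analysis chain: the function names `hiragana_to_katakana` and `lowercase_turkic` are hypothetical, and the sketch assumes precomposed input. It relies on two Unicode facts: hiragana U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts, and default (locale-independent) lowercasing turns İ (U+0130) into "i" followed by a combining dot above (U+0307), whereas Turkish and Azerbaijani expect a plain "i".

```python
# Illustrative sketch only; names are hypothetical, not MediaWiki or CirrusSearch APIs.

KANA_OFFSET = 0x60  # katakana letters sit 0x60 code points above hiragana


def hiragana_to_katakana(text: str) -> str:
    """Map hiragana U+3041..U+3096 to the matching katakana; leave other characters alone."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def lowercase_turkic(text: str) -> str:
    """Lowercase using Turkish/Azerbaijani rules: İ -> i and I -> ı.

    Assumes precomposed input; decomposed I + U+0307 is not handled here.
    """
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


if __name__ == "__main__":
    print(hiragana_to_katakana("ひらがな"))
    # Default lowercasing of İ yields two code points: "i" + U+0307.
    print([hex(ord(c)) for c in "\u0130".lower()])
    print(lowercase_turkic("\u0130STANBUL"))
```

The default lowercasing result is why a fix aimed at one specific İ case does not automatically resolve every dotted I problem: language-specific casing has to be requested explicitly wherever the lowercasing happens.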
Search Update Pipeline
- Saneitizer deployment for Cloudelastic. https://phabricator.wikimedia.org/T358599
Misc
- There is discussion about Kafka upgrades. Search Platform's Search Satisfaction schema would probably be a good early adopter to verify performance characteristics.