Search Platform/Weekly Updates/2024-12-20
Ongoing work
WDQS Expose RDF stream publicly
- Agreed on a strategy to expose "double compute" streams via the EventStreams HTTP service; the patch has been uploaded and merged - https://phabricator.wikimedia.org/T382065
- WDQS update stream definitions deployed to the event stream config
- Adding support for eventstream utilities is almost done; a couple of patches are still in review (this should not block exposing the stream, though) - https://phabricator.wikimedia.org/T374919
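For anyone planning to consume the stream once it is public: EventStreams serves events over HTTP as Server-Sent Events (SSE), with a JSON document in each event's `data:` field. Below is a minimal Python sketch of parsing such a response. It is not the team's tooling. The function name `iter_sse_events` and the sample payload fields are made up for illustration, and the actual stream name and event schema are not specified in this update.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each complete SSE event.

    Hypothetical helper: only the `data` field is used; other fields
    (`event`, `id`, ...) are ignored in this sketch.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon
        if field == "data":
            data_parts.append(value)
    # An event not terminated by a blank line is dropped, as in the SSE spec.


# Invented sample input; real events would follow the stream's own schema.
SAMPLE_LINES = [
    ": keep-alive comment\n",
    "event: message\n",
    'id: [{"offset": 1}]\n',
    'data: {"meta": {"stream": "example.stream"}, "n": 1}\n',
    "\n",
    'data: {"n": 2}\n',
    "\n",
]

for event in iter_sse_events(SAMPLE_LINES):
    print(event)
```

In practice the lines would come from a streaming HTTP response against the EventStreams endpoint rather than from an in-memory list.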
Search Update Pipeline / Weighted tags
- We have a new search keyword "inproject" whose data is populated via weighted_tags. The data is close to being fully populated (I measured the growth at https://docs.google.com/spreadsheets/d/14I8o4lJxqrT11oa1yyDGBbs01A7Lsp0a5QCz8TAuyIs/edit?usp=sharing and it seems to be stabilizing). The work was done by a community member - https://phabricator.wikimedia.org/T378868
Misc
- Migration of cirrus tools based on mwscript to mwscript-k8s done (https://phabricator.wikimedia.org/T378382):
  - Generalized one of our repos into cirrus-toolbox so that it can hold more than one script and be reused as a library: https://gitlab.wikimedia.org/repos/search-platform/cirrus-toolbox
  - cirrus-reindex-orchestrator is almost done but blocked on a race between the php and tls-proxy containers: https://phabricator.wikimedia.org/T382398
- A/B test of new Mjolnir models (https://phabricator.wikimedia.org/T377128) [EB]
  - Test reports have been generated. Results look very good for jawiki and kowiki, which did not have the ML-based ranker yet. Results look either slightly positive or neutral for the other reports I've looked over. Generally it doesn't seem to get worse anywhere, it gets a little better in some places, and it gets much better in places where we were not yet applying the ML models.
  - In the future I think we should publish these via Data Platform/Web publication, but I feel it's currently premature. The next time we run a test we should tighten up the report a bit more and make sure it has what we want. There are probably some extra graphs that are unnecessary in the current report.
  - Reports can be found at: https://people.wikimedia.org/~ebernhardson/T377128/
What we've accomplished
WDQS Graph Split
Search Update Pipeline / Weighted tags
- We have a new search keyword "inproject" whose data is populated via weighted_tags. The data is close to being fully populated (I measured the growth at https://docs.google.com/spreadsheets/d/14I8o4lJxqrT11oa1yyDGBbs01A7Lsp0a5QCz8TAuyIs/edit?usp=sharing and it seems to be stabilizing). The work was done by a community member - https://phabricator.wikimedia.org/T378868
Operations / Misc
- T378097 Investigation: why do statements on Senses and Forms not show up in searches using haswbstatement
- T375641 [ES-M3: Implement label and aliases search for EntitySchemas via the wbsearchentities API] (work done by WMDE)