Search Platform/Weekly Updates/2024-10-25
Summary
We made steady progress on several initiatives, though most remain in progress: the release of a public RDF stream, improvements for Japanese wikis, and the consolidation of data flows for weighted tags.
What's ongoing
WDQS: Public RDF Stream
- Continued migrating the RDF streaming updater (a Flink application) to use the event utilities library.
Language Stuff
- Continued analysis and testing of Kuromoji, a Japanese language analyzer; see T318269.
Spark Kafka Writer (Weighted tags/Dumps 2.0)
- Completed the first test runs of the Kafka wrapper with schema validation. It still needs work to simplify usage; see T374341 and T372912.
What we've completed
- T375557 Reindex all wikis to enable folding harmonization and new functionality
- T372904 Use page_weighted_tags_changed stream
- T376715 TypeError: Argument 3 passed to CirrusSearch\DataSender::sendWeightedTagsUpdate() must be of the type array, null given, called in /srv/mediawiki/php-1.43.0-wmf.25/extensions/CirrusSearch/includes/Job/ElasticaWrite.php on line
- T376161 Classify fulltext search abandonment: sampling