Search Platform/Weekly Updates/2024-05-10

From Wikitech

Summary

Lots of public holidays across the team this week.

Work is ramping down on the Search Update Pipeline and ramping up on the implementation of the WDQS Graph Split.

What we've accomplished

WDQS graph splitting

  • A Signpost update on the graph split and the size of the WDQS graph is drafted and forthcoming - https://en.wikipedia.org/wiki/User:Bluerasberry/signpost_wikicite
  • Benchmarking of the CPU governor and BlazeGraph configuration variable is complete. Loading the scholarly graph took 2.15 days with this configuration, compared with 5.875 days for the original configuration; the "main" (non-scholarly) graph would be slightly slower, but in the same ballpark of 2-3 days. Behavior on another computer suggests that an NVMe drive would likely yield further performance gains, but it isn't possible right now to install one to replicate this. The gain already achieved provides a bigger buffer for completing imports in a timely fashion. https://phabricator.wikimedia.org/T362920
  • Adapted the import_ttl DAG to always perform the graph split and the parquet -> n3 transformation - https://phabricator.wikimedia.org/T362060
  • Working on adapting the data-reload cookbook to source its data from HDFS - https://phabricator.wikimedia.org/T349069
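
For reference, the two load times reported in the benchmarking item above can be compared with a few lines of arithmetic. This is a minimal sketch that uses only the 2.15-day and 5.875-day figures from that item; the function and variable names are illustrative and do not come from any Search Platform tooling.

```python
def compare_load_times(baseline_days: float, tuned_days: float) -> tuple[float, float, float]:
    """Return (speedup factor, days saved, percent reduction) for two load durations."""
    speedup = baseline_days / tuned_days
    saved_days = baseline_days - tuned_days
    reduction_pct = 100.0 * saved_days / baseline_days
    return speedup, saved_days, reduction_pct


# Figures reported for the scholarly graph load.
BASELINE_DAYS = 5.875  # original configuration
TUNED_DAYS = 2.15      # tuned CPU governor and BlazeGraph configuration

speedup, saved_days, reduction_pct = compare_load_times(BASELINE_DAYS, TUNED_DAYS)
print(f"speedup: {speedup:.2f}x")
print(f"time saved: {saved_days:.3f} days ({saved_days * 24:.1f} hours)")
print(f"reduction: {reduction_pct:.1f}%")
```

On these figures the tuned configuration is roughly 2.7 times faster, saving a little over 3.7 days per scholarly graph load, or about 63% of the original load time.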

Search Metrics

Search Update Pipeline

Operations