Commons Impact Metrics/Algorithm

This page describes the algorithm that the Commons Impact Metrics data product uses to traverse the Commons category graph. These slides are from a presentation given by Marcel Ruiz Forns at the 2024 Wikimedia Hackathon.

Because of the computational challenges of traversing the full category graph, we set a few boundaries on the calculation.

Instead of calculating metrics for all Commons categories, we calculate them only for a curated list of categories related to Commons mass uploads, usually from GLAM institutions (galleries, libraries, archives, and museums). We also define a maximum "depth" for the traversal, so that the algorithm does not descend into an effectively endless chain of unrelated subcategories. Finally, we aggregate metrics at a coarse granularity (monthly), which keeps the output data small enough for community members to handle without a distributed computing cluster.
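The boundaries above can be illustrated with a small sketch. It is not the production pipeline; it assumes the category graph fits in memory as a plain dictionary mapping each category to its direct subcategories, and the function names `collect_subcategories` and `aggregate_monthly`, the example category names, and the `(day, category, value)` event shape are all hypothetical. The traversal is a breadth-first search that starts from the curated root categories and stops expanding at `max_depth`. Because a category graph can contain cycles, the sketch records every category it has already reached and never revisits it.

```python
from collections import defaultdict, deque
from datetime import date


def collect_subcategories(graph, roots, max_depth):
    """Depth-limited breadth-first traversal from curated root categories.

    graph: hypothetical in-memory stand-in for the category graph,
           mapping a category name to a list of its direct subcategories.
    Returns a dict mapping each reached category to the depth at which it
    was first reached (BFS guarantees this is its shortest depth).
    """
    depths = {}
    queue = deque()
    for root in roots:
        if root not in depths:
            depths[root] = 0
            queue.append(root)
    while queue:
        category = queue.popleft()
        depth = depths[category]
        if depth == max_depth:
            continue  # the depth limit: do not expand any further
        for child in graph.get(category, []):
            if child not in depths:  # visited check also breaks cycles
                depths[child] = depth + 1
                queue.append(child)
    return depths


def aggregate_monthly(events):
    """Sum hypothetical (day, category, value) events into monthly buckets."""
    totals = defaultdict(int)
    for day, category, value in events:
        totals[(day.strftime("%Y-%m"), category)] += value
    return dict(totals)
```

For example, with `graph = {"Root": ["A", "B"], "A": ["C"], "C": ["D", "Root"]}` and `max_depth=2`, the traversal reaches `Root`, `A`, `B`, and `C`, but not `D`, which sits at depth 3; the edge from `C` back to `Root` does not cause an infinite loop. Bucketing by month means that the size of the output depends on the number of months and categories, not on the number of daily events.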