Annual Report 2024/Story: Important contributions to OpenRefine – a central tool
In this story, we focus on our further development of the software tool OpenRefine as an example of our involvement in external tools that benefit everyone working with open knowledge.
OpenRefine is an open source application for analyzing and editing datasets as well as uploading them to Wikidata and Wikimedia Commons. We use OpenRefine extensively in our various projects. When working with Wikidata, it is a key tool, as it can be used in every step of the process: analyzing a dataset to assess whether it is interesting and relevant for open knowledge platforms; processing data to fit Wikidata's structure; and finally, uploading it either as new Wikidata items or as improvements to existing ones. We have been using OpenRefine for many years, building up our expertise along the way.
For some time now, OpenRefine has also been able to interact with Wikimedia Commons, the media database of the Wikimedia ecosystem, particularly for uploading new files, such as images and audio recordings. Collaboration with GLAM institutions that wish to share their digitized collections is a significant part of our work, so this new development was very valuable to us. However, the software had a limitation: it only allowed uploads of files smaller than 100 MB. This was a drawback for us and for anyone working with GLAM collections, as museums and archives often wish to publish high-resolution images and other large files from their collections on Wikimedia Commons.
This is why we chose to actively work on developing OpenRefine further in 2024. The file size limitation was a technical issue: large files are uploaded differently, in smaller chunks. While this process is invisible to the user, it requires a more complex technical solution behind the scenes. Since we have both skilled programmers and dedicated OpenRefine users on our team, there was no doubt that we could contribute to making the software more useful for everyone. Thanks to our long-standing engagement in the OpenRefine community (as users of the software), we had both an open dialogue with the developers and insights into other minor improvements we could make — which we did.
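To illustrate the idea of uploading in smaller chunks, here is a minimal sketch in Python. It is not the code we contributed to OpenRefine, and it does not talk to Wikimedia Commons. The function names and the chunk size are assumptions chosen for illustration. It only shows how a large file can be cut into ordered pieces, each tagged with its offset, so that the receiving side can put them back together.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical chunk size, for illustration only; a real client picks its own.
CHUNK_SIZE = 5 * 1024 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs that cover the whole stream in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of file reached
        yield offset, data
        offset += len(data)


def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Stand-in for the receiving side: join the chunks in offset order."""
    return b"".join(data for _, data in sorted(chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # 2560 bytes of sample data
    chunks = list(iter_chunks(io.BytesIO(payload), chunk_size=1000))
    print([(offset, len(data)) for offset, data in chunks])
    assert reassemble(chunks) == payload
```

In a real upload, each piece would be sent in a separate request and the server would assemble the file once the last piece arrives. This is the part that stays invisible to the user.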
In addition to using and contributing to OpenRefine, we also help others understand and implement the tool in their work. Cultural heritage institutions, with whom we collaborate daily, also need powerful and flexible tools for handling large datasets, whether they are working with Wikimedia platforms or simply cleaning up their data for internal use. As a small organization, we cannot achieve our mission of free knowledge for all without engaging more people, especially subject matter experts, and helping them become independent.
That is why we develop and offer training sessions on linked open data for GLAM institutions, with OpenRefine as a key component. One example is the project A Network of Places, where we support several institutions (ArkDes, the Swedish National Heritage Board, the Nationalmuseum, and the Swedish National Museum of Science and Technology) in their work with data about built cultural heritage on the Wikimedia platforms. Rather than uploading the data to Wikidata ourselves, we play a supporting role, aiming to help the project group become comfortable on the Wikimedia platforms and capable of spreading knowledge about Wikidata and OpenRefine to their colleagues and other GLAM professionals in Sweden and beyond.
Thanks to OpenRefine, the threshold for working with large datasets and Wikidata has never been lower. The tool is far more user-friendly than the solutions we used just a few years ago. We are proud to have made a small but meaningful contribution to an application that is used by Wikimedians and GLAM professionals worldwide, and that is now a fully capable tool for uploading files to Wikimedia Commons, regardless of size.
With this contribution, we reaffirm Wikimedia Sverige's role in the global Wikimedia community: we understand which tools and support initiatives are needed and we are happy to contribute in areas where we can help the entire open knowledge movement. We see our contribution as an important pilot project for our ongoing work in the thematic hub for content partnerships, which we are currently developing. The purpose of the hub is to make it easier for the global movement to plan and execute successful content partnerships — such as collaborations with GLAM institutions — which require appropriate and well-functioning technical infrastructure. OpenRefine is now a central tool for Wikimedians working with data and media files, which means it is essential to maintain and further develop it to meet users' needs. As a future hub, we are uniquely positioned to contribute our expertise; a single development effort can lead to stronger and more efficient content partnerships worldwide in the future.