Annual Report 2021/Story: A structured effort on Structured Data on Commons
In this Story, we give an overview of our large-scale work with Structured Data on Commons, which included data uploads, outreach and documentation.
Structured Data on Commons (SDC), a modern, structured and machine-readable way of describing the millions of pictures, audio clips and videos on Wikimedia Commons, has come a long way since it was first implemented in 2017. SDC makes the files easier to view, search, edit, organize and re-use, in many languages. It has become something of a household name in the Wikimedia community, with more users actively adding data to the files they upload.
But there is still a lot to do. Out of the 80 million files on Wikimedia Commons, 7 million have a depicts statement and 1.7 million have a creator statement. Many community members have been diligently chipping away at these numbers, but what we really need are tools and workflows for working quickly and efficiently with large numbers of files – both for the Wikimedia community and for our GLAM partners, who can benefit from SDC making their collections easier to find and analyze. When basic information is added by bots, the volunteer community can focus its efforts on the type of work where human intellect is needed.
That's why, at the start of our project Content partnerships support 2021, we committed ourselves to uploading 1 million SDC statements to files on Wikimedia Commons. Our goal, apart from contributing valuable data to the Wikimedia projects, was to strengthen our skills in the related areas in preparation for further work as a hub, as well as to test the available tools and develop efficient workflows for processing and uploading large amounts of structured data to Wikimedia Commons. We achieved this goal in December 2021, halfway through the project, so we are preparing to upload another 1 million SDC statements in 2022, building on this initial success.
We focused our efforts primarily on Wiki Loves Monuments photos from all around the world. Once enriched with SDC, the photos contributed by thousands of volunteer photographers become more valuable, as they are easier to find and understand. Thanks to the development and release of MediaSearch, a new search engine for Wikimedia Commons that makes use of SDC, all the structured data can now be used to deliver better search results to everyone.
To further share our experiences and networks with others interested in SDC, we participated in several events, such as conferences and workshops. We hope that this will spur interest among the participants in contributing to the data more actively. The workload is enormous and we need to work together to create significant value.
Structured Data on Commons is still developing. While the technical side is in place – anyone can add and edit statements – the best practices with regard to modeling, tools etc. are still evolving. One particular tool that will give Wikimedians improved possibilities to contribute to SDC is OpenRefine, which has received a grant to implement Wikimedia Commons functionality. The development work started in mid-2021, and WMSE has been following it closely, participating in design discussions and sharing our experiences from training Wikimedians and GLAM professionals in SDC. As we watch the software grow and improve in real time, we are among the first to see and test the new functionality, which will enable us to participate in developing documentation and training others. We are proud to be able to contribute to the further development of this tool, which we already use a lot for our Wikidata work. We have also helped improve other important software such as the ISA tool and Pattypan, which are both crucial tools for Wikimedians and GLAM professionals who want to contribute to the Wikimedia platforms (see also our Story about support for volunteer developers).