Projekt:Strategisk inkludering av biblioteksdata på Wikidata 2018/Background

Strategic inclusion of library data on Wikidata 2018

Strategic Inclusion of Library Data on Wikidata is a 2018 project by Wikimedia Sverige, done in cooperation with the National Library of Sweden. As part of the project, we will upload large amounts of data to Wikidata using bots.

Our Phabricator tag is #WMSE-Library-Data-2018

See the MySQL statistics for all Wikidata edits made within this project.

Background

The National Library of Sweden maintains a national database of library data, Libris, which is available as Open Data. The main parts of Libris are the works database (books, journals etc. in the collections of libraries in Sweden) as well as the authority database (authority posts for authors represented in the works database). Our goal is to upload parts of this data to Wikidata. Our primary focus is on works that are used as references on Swedish Wikipedia, and their authors.

The National Library of Sweden is currently moving to a new data framework, Bibframe 2.0, and implementing it in a more modern version of Libris. For Wikidata, this means that a new identifier has been introduced, the Libris URI, which will replace the old SELIBR (authorities) and Libris Editions (works) identifiers. Authority posts created in the new Libris are given only a URI, not a SELIBR ID.

Stage 1

Status: The upload ran Aug 31–Sept 2. 60,000 items were edited.

We started by adding Libris URIs to Wikidata items for persons that already had a SELIBR ID, about 60,000 items. This was a relatively straightforward task, as authority posts in the new Libris that have been imported from the old Libris contain both a URI and a SELIBR ID, allowing for a 1:1 match.

The desired output is that every item that has a SELIBR ID also has a Libris URI.
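The 1:1 match described above can be sketched as a simple lookup by SELIBR ID. This is a minimal illustration, not the bot code used in the project: the record layout (`qid`, `selibr`, `libris_uri`, `uri`) and the function name are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def match_selibr_to_uri(
    wikidata_items: Iterable[Dict[str, Optional[str]]],
    libris_records: Iterable[Dict[str, Optional[str]]],
) -> List[Tuple[str, str]]:
    """Return (QID, Libris URI) pairs for items that still lack a Libris URI.

    The dictionary keys are hypothetical; real exports will look different.
    """
    # Index authority records by SELIBR ID. Records created directly in the
    # new Libris have no SELIBR ID and are left out of the index.
    uri_by_selibr = {
        rec["selibr"]: rec["uri"]
        for rec in libris_records
        if rec.get("selibr") and rec.get("uri")
    }

    additions: List[Tuple[str, str]] = []
    for item in wikidata_items:
        if item.get("libris_uri"):
            continue  # the item already has a Libris URI
        uri = uri_by_selibr.get(item.get("selibr"))
        if uri:
            additions.append((item["qid"], uri))
    return additions


# Made-up sample data, for illustration only.
items = [
    {"qid": "Q1", "selibr": "100", "libris_uri": None},
    {"qid": "Q2", "selibr": "200", "libris_uri": "uri-200"},
    {"qid": "Q3", "selibr": "300", "libris_uri": None},
]
records = [
    {"selibr": "100", "uri": "uri-100"},
    {"selibr": "200", "uri": "uri-200"},
    {"selibr": None, "uri": "uri-new"},
]
pairs = match_selibr_to_uri(items, records)
```

With this sample data, only `Q1` receives a new URI: `Q2` already has one, and `Q3` has no matching authority record.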

Stage 2

We imported person information from the new Libris to the authority items that have a Libris URI, such as: full name, nationality, profession, birth/death dates. Note that Libris does not contain data about sex/gender.

The same items as in Stage 1 were targeted, i.e. items with Libris identifier properties (SELIBR ID or Libris URI). Person items with neither of the two properties were not touched, even if the source data contained a corresponding entry. The reason is that the Libris authority database contains many entries that are hard to match automatically to Wikidata: it covers pretty much every Swedish author who has published a book that is included in the National Library's catalog. Many of these entries have sparse information, e.g. only a name, which makes them hard to disambiguate. This also means that no new items were created.
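The selection rule above, which updates only items already linked to Libris and never creates new ones, could look roughly like this. It is a hedged sketch: the mapping `items_by_uri`, the record fields, and the function name are hypothetical and are not taken from the project's code.

```python
from typing import Dict, Iterable, List, Tuple


def plan_person_updates(
    libris_records: Iterable[Dict[str, str]],
    items_by_uri: Dict[str, str],
) -> Tuple[List[Tuple[str, Dict[str, str]]], List[str]]:
    """Split Libris authority records into planned updates and skipped records.

    items_by_uri maps a Libris URI to the QID of the Wikidata item that
    already carries that URI (hypothetical structure for this example).
    """
    updates: List[Tuple[str, Dict[str, str]]] = []
    skipped: List[str] = []
    for rec in libris_records:
        qid = items_by_uri.get(rec["uri"])
        if qid is None:
            # No Wikidata item is linked to this record. It is skipped
            # rather than matched by name, and no new item is created.
            skipped.append(rec["uri"])
        else:
            updates.append((qid, rec))
    return updates, skipped


# Made-up sample data, for illustration only.
sample_records = [
    {"uri": "uri-100", "name": "Example Author"},
    {"uri": "uri-999", "name": "Sparse Entry"},
]
updates, skipped = plan_person_updates(sample_records, {"uri-100": "Q1"})
```

Keeping the skipped records in a separate list makes it possible to review them manually later, without the risk of attaching data to the wrong person.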

We are working on mapping this information to Wikidata.

Stage 3

We imported bibliographical information about books from Libris to Wikidata. Main points of interest:

We learned that, due to copyright issues, we could only upload data from the Swedish National Bibliography (i.e. books published in Sweden), which is licensed CC0. Data about foreign books, even though it is included in Libris, has a different copyright status depending on its source.

We opted for importing data about ca. 500 books (Editions). We selected the books that are most often used as sources on Swedish Wikipedia (and published in Sweden, due to the copyright limitations). In the future, we would also like to import data about the Works, but it is not yet available in Libris.

Query of all Editions that have a Libris URI.
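A query of this kind could be written roughly as follows. The original query is not reproduced on this page, so this is only a sketch: the property ID `P5587` (Libris-URI) and the class `Q3331189` (version, edition or translation) are assumptions that should be checked on Wikidata before use, and the helper name is hypothetical.

```python
from typing import Optional

# Assumed identifiers; verify them on Wikidata before running the query.
LIBRIS_URI_PROPERTY = "P5587"  # assumed property ID for Libris-URI
EDITION_CLASS = "Q3331189"     # assumed item for "version, edition or translation"


def build_editions_query(limit: Optional[int] = None) -> str:
    """Build a SPARQL query listing edition items that have a Libris URI."""
    query = (
        "SELECT ?item ?itemLabel ?librisUri WHERE {\n"
        f"  ?item wdt:P31 wd:{EDITION_CLASS} ;\n"
        f"        wdt:{LIBRIS_URI_PROPERTY} ?librisUri .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "sv,en". }\n'
        "}"
    )
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


query_text = build_editions_query(limit=10)
print(query_text)
```

The printed text can be pasted into the Wikidata Query Service; the `limit` argument is only there to keep test runs small.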

We talked about this import at WikiCite 2018.