Projekt:WFD-data till Wikidata 2016/2016-10-17 Reference group meeting

Notes from Wikidata and WFD reference meeting 2

Place: Av. Beaulieu 5, Brussels
Time: 10.30-16.30, 2016-10-17

Introduction and presentation of participants.

General - information about the project

Presentation (Niklas)

Background to the project, the motivation for why it was initiated, and a bit on what we have done.

Presentation (André)

Information about the Wikimedia wikis, what they are and how they are connected. A look at how the community decides on content based on consensus, and therefore the limits on what can be guaranteed as an outcome of the project. A closer look at Wikidata, how the structure is constructed and what can be done with the info once it is there.

Comments:

  • Manuela (Germany): It is important that IDs don’t differ too much between versions. This is true both for Wikidata and for EEA comparisons across reporting periods.
  • Fernanda (EEA): It is not a [copyright] problem to extract single data points and make these available. The problem arises when all are extracted and we explicitly want to make them CC0.

Latest progress and discussions regarding the project

André showed examples of RBD items created for Finland and Sweden, based on the RBDSUCA files made available in the WFD reporting. A script exists which can automatically import these for any country which publishes its RBDSUCA file and makes it publicly accessible.

Comment: Note that even though RBDSUCA files are publicly available, this does not mean that they have been accepted. It might be better to build the system on top of the EEA processed data. Note that since Competent Authorities are not of great interest to the EEA, these are not included in the compilations. RBDSUCA should soon be available on Eionet[1].

André explained how the property proposal process works and showed the two properties which have been created. It was pointed out that Wikidata prefers to not have too many overly specific properties since this risks fragmenting the system.

Comment: A limitation is that Wikidata does not have a boolean property type. As a result some of the data in the WFD reports needs to be expressed very differently.

André showed the suggested structure for waterbodies (lakes). A discussion was held on Impact types and Pressures.

Impacts: For a (reporting) year there can be either one novalue claim (if there were no significant impacts) or multiple claims (one for each type which was reported as significant). These take the type (a Wikidata item) as a value and carry a (start) date qualifier.
If an impact type has been reported before and is:
  • not present in a later year: add an end date qualifier,
  • present in a later year: no change.
If an impact type “re-appears” it is added as a new claim with a new start date.
Pressures: There is an aggregate level with only 8 entries (instead of the normal 66 entries). These might be a good choice for an initial mapping… if the aggregate groups are ones which are relevant to the Wikidata public.
Measures: Lars (Norway) mentioned that it would be of interest to show measures on waterbodies (mayor in Östfold municipality asked for that). Other participants objected that this might be too specific information and that ensuring that it is easy to get into the national system from the Wikidata entry would serve a similar purpose.
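
The impact rules discussed above can be sketched in code. This is only a minimal illustration, not the project's import script: the names `ImpactClaim` and `apply_report` and the item IDs are hypothetical, and it assumes that the end date is set to the first reporting year in which the type is missing and that a novalue claim is closed in the same way as any other claim (the notes do not specify either point).

```python
from dataclasses import dataclass
from typing import List, Optional

NO_VALUE = None  # stands in for a Wikidata "novalue" claim


@dataclass
class ImpactClaim:
    impact_type: Optional[str]  # hypothetical item ID, or NO_VALUE
    start: int                  # reporting year used as the start date qualifier
    end: Optional[int] = None   # end date qualifier, set once the type disappears


def apply_report(claims: List[ImpactClaim], year: int, reported: List[str]) -> List[ImpactClaim]:
    """Update claims in place with the significant impact types reported for one year."""
    # An empty report means "no significant impacts", i.e. a single novalue claim.
    current = set(reported) if reported else {NO_VALUE}
    still_open = set()
    for claim in claims:
        if claim.end is not None:
            continue  # already closed; a re-appearing type gets a new claim below
        if claim.impact_type in current:
            still_open.add(claim.impact_type)  # present in a later year: no change
        else:
            claim.end = year  # assumption: end date = first year without the type
    # New or re-appearing types become new claims with a new start date.
    for impact_type in sorted(current - still_open, key=str):
        claims.append(ImpactClaim(impact_type, start=year))
    return claims


# Usage with made-up item IDs and reporting years.
claims: List[ImpactClaim] = []
apply_report(claims, 2010, ["Q_acidification", "Q_nutrients"])
apply_report(claims, 2016, ["Q_nutrients"])
apply_report(claims, 2022, ["Q_acidification", "Q_nutrients"])
for claim in claims:
    print(claim)
```

Keeping closed claims in the list, rather than re-opening them, mirrors the rule that a re-appearing impact type is added as a new claim instead of editing the old one.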

André mentioned that any ingested data must explicitly be licensed as CC0. Even though data is made publicly available this does not indicate it is available for re-use or explicitly licensed. If there is no license “all rights reserved” must be assumed.

Comments: Countries might restrict data at Reportnet. One example is monitoring stations for drinking water, where access restraints from member states can occur.
Many agencies have started to open up their data by setting it to CC BY. This is part of the maturity process on the way to CC0.
Fernanda (EEA): Identifiers should be easy to make available and share. EEA might be able to explicitly license these as CC0 (pending internal discussions).
Joaquim (DG Environment): Prepare a briefing for the upcoming Water Directors meeting stating that data paid for by public spending should be made publicly available as CC0 data. If Water Directors could approve/endorse such a statement it would be a powerful tool in getting member countries to explicitly license their data. Niklas will take the lead, Manuela (Germany) offered to help, Wikipedia has one lobbyist who could also be advised.

General discussion

2016 reporting will include links back to national systems. Many good systems of this kind exist, such as the one in France, but it is normally hard to find a way in or to locate a specific water body.

On geographical information: this data is often derived from other national providers, so it might not be possible to freely license it. Some geographical information might also be politically sensitive. Starting with centroids should be fairly safe though.

EEA had a long internal discussion on what metadata to store with regard to versioning. The conclusion was to only store when an entity was created and when it becomes deprecated, and nothing about changes in between.

Lasse (Finland): There is a difference between the management unit of a lake water body and the actual water body. It is important to make a distinction between these in Wikidata. André: Entries would be explicitly marked as lakes and/or administrative water body areas (both only if they overlap).

Niklas: Is anyone aware of a “Wikipedia strategy” for agencies in EU/national states? None so far.

In what format would we like to get the information from Eionet?

There will be a data dictionary for status and impacts definition according to the State of Environment[2] reporting.

Decisions

  1. Start with RBD 2016, timetable with EEA.
    1. Name
    2. Codes
    3. International or not
    4. Area
    5. Centroids
    6. URL CA
  2. Lakes – (when? What countries? What data? How?)
    1. ID
    2. URL
    3. Ecological status
    4. QE
    5. Impacts
    6. Pressures

Tour de table (end of the meeting)

Manuela (Germany): Met with German colleagues prior to attending. Strong support from all areas except groundwater. Good idea to make the public more aware of what we are doing. It brings attention to the information and how we are proceeding with WFD. Taking back home (how they can proceed). Have fact sheets on water bodies; linking them intelligently creates extra value. Need to make it clear that this does not mean an additional workload.

Kirsten (Denmark): There is a need for an internal benefit for the organisation to convince them why they should open up the data. Wants a dummy on what to do.

Jean-Philippe (France): Effective if the objective is to provide information, clear motivation for disseminating data. But what does the governance model (on wiki) look like? The information is already available in so many places, where is the best place for dissemination?

Mária (Hungary): Good idea, will bring it back home.

Lars (Norway): Outreach is a big problem, WFD sounds very dry but is very important. Also important for spreading information internally (within agencies). Increased visibility will actually save us work.

Alberto (Guadaltel, Spain): From the outside, Wikidata is a place where things happen, a good place for the data to be publicly available. Given the complexity of WFD, the data needs to be aggregated in the appropriate way to be useful.

Joaquim (DG Environment): Making people understand the issues is not easy. Yes we are spreading the information to different places, but they have different audiences in mind. Yes, people care about overall (national) levels, but they also care about the levels in their particular river/lake. Wikidata is probably most useful for the latter one.

Lasse (Finland): The idea is nice but we are afraid of how to maintain the data and keep it synchronized. A questionnaire to the public gave the answer that we have too many systems/services/places where the information is available. Also we have our own wiki (Jaarvi wiki). It is also easier to maintain data in a locked system where it cannot be changed; mixing with open data can be problematic.

Piotr (Poland): Good idea, no own public system for dissemination. There is no perfect data structure. Look for the simplest possible solution. Timing is problematic since it will take a long time until all of the data is available on EEA (compared to having access to it now). It also means that the data will become delayed later on. There is already pressure from municipalities wanting to know which water bodies are at risk.

Mihail (Romania): Interesting but needs funding. Nationally there is information at both basin and national level. Provide results and plans. Also a plan to develop a portal for public access to the data (also for INSPIRE) [so why the need for Wikidata].

Fernanda (EEA): The idea is not to replace WISE or any system which is needed to access it or to replace any national portal. The idea is to add value and visibility to the national initiatives which are out there. It also makes the information accessible on a multilingual platform. The question is whether it is doable or would be an insurmountable overhead. It doesn't look like it would be, and I cannot see any downsides to it.

André (Wikimedia): Good to get feedback and learn more. Envision it as a way to expose the data and make it easily available. Not the official information – the reference.

Niklas (Sweden): We got good feedback to finish the work on the Wikidata mappings. We should be able to have properties live and maybe a demo site with some real lakes. EEA and CC0 on names/ids should be possible. And we will prepare a briefing for CC0 and Water Directors. Also we need to have a think about how Wikipedia affects our work.

Conclusions

  • Proposal at EEA: CC0 in Eionet for RBD and lakes regarding: ID, name and relations
  • Proposal water directors: provide parts of the reported data as publicly available in accordance with CC0
  • Data modeling Wikidata – finalise and send to the reference group (Boolean values are an ongoing discussion on Wikidata)
  • Make demo for RBD and a lake in Wikidata
  • One final reference group meeting in spring, end of project with report

Participant list

Name Organisation
Niklas Holmgren South Baltic Water Authority, Sweden
André Costa Wikimedia Sweden
Fernanda Néry EEA
Lars Stalsberg NVE Norway
Lasse Jarvenpaa SYKE, Finland
Joaquim Capitão DG Environment
Kirsten Broch Danish Agency for Water and Nature Management
Piotr Piorkowski Poland
Cécile Gozler Ministry of the Environment, Energy and the Sea (MEEM)
Manuela Pfeiffer German LAWA
Jean-Philippe Goyen French national agency for water and aquatic environments -ONEMA
Mihail Costache Romania/Ministry of Environment, Waters and Forests
José Enrique Soriano Sevilla GUADALTEL (Consultant DG ENV)
Alberto Santamaria Bilbomatica (Consultant DG ENV)
Mária Szomolányi Ritvayné Hungary

Footnotes

  1. http://dd.eionet.europa.eu/
  2. http://www.eea.europa.eu/themes/water/interactive/by-category/status-of-water-quality