Projekt:WFD-data till Wikidata 2016/Mappings

These were initially the notes from the WFD-data to Wikidata start-up meeting held in Kalmar on 12 April 2016. The focus was on the WFD Reporting Guidance 2016 (V6.0.2) and how it can be mapped to Wikidata. Since then the document has evolved further and should be considered living documentation.

Water bodies

In the European Water Framework Directive (WFD), water bodies are defined as follows (Article 2):

  1. "Surface water" means inland waters, except groundwater; transitional waters and coastal waters.
  2. "Groundwater" means all water which is below the surface of the ground in the saturation zone and in direct contact with the ground or subsoil.
  3. "Inland water" means all standing or flowing water on the surface of the land, and all groundwater on the landward side of the baseline from which the breadth of territorial waters is measured (i.e. ground water under land or "territorial land", since actual coastline might change due to varying water level).
  4. "River" means a body of inland water flowing for the most part on the surface of the land but which may flow underground for part of its course.
  5. "Lake" means a body of standing inland surface water.
  6. "Transitional waters" are bodies of surface water in the vicinity of river mouths which are partly saline in character as a result of their proximity to coastal waters but which are substantially influenced by freshwater flows.
  7. "Coastal water" means surface water on the landward side of a line, every point of which is at a distance of one nautical mile on the seaward side from the nearest point of the baseline from which the breadth of territorial waters is measured, extending where appropriate up to the outer limit of transitional waters.

This classification is the basis for a very comprehensive characterisation, pressure analysis and more for all European waters, supported by the commonly agreed WFD Guidance Documents (34 documents) and Technical reports (9). In addition, the member states should report to the European Commission at the water body level once every six years, according to agreed standards.

In this project we aim to add the following water body categories to Wikidata, listed in order of priority:

  1. Lakes
  2. Coastal water bodies
  3. Rivers
  4. Groundwaters (requires the ability to show their geographical extent, since this is hard for the general public to comprehend).

The properties mentioned are those which have been deemed suitable (or potentially suitable) for Wikidata. They are designated as:

  • Needed (a requirement for the project)
  • Desired (should be implemented)
  • Future (interesting but not implemented at this stage)
  • Unsure (could be of interest but unclear if it is desired on Wikidata)

Known Open Issues

An SWB (Surface Water Body) need not correspond to what we normally consider to be a water body. E.g. Vänern is a well-known concept with its own Qid, but it contains several SWBs since the lake is divided into different natural basins.

In Ecological status we have both global levels and levels per quality element. It is unclear how this should be structured on Wikidata so as to avoid either too many properties or properties where the values are hard to grasp. Two suggestions are:

1) A property per element - example "swEcologicalStatusOrPotentialValue" (page 49-50 in the Guidance Document)
2) A value per element

While 2) is more attractive, it becomes problematic when there are 19 values (one has to look at the qualifier to figure out which is which), and even more so when multiple years are added.

With value definitions:

1 = High status or maximum potential.

2 = Good status or potential.

3 = Moderate status or potential.

4 = Poor status or potential.

5 = Bad status or potential.

Unknown = Unknown status or potential.

Not applicable = Not applicable

These definitions for swEcologicalStatusOrPotentialValue could then be reused for the 19 quality elements (Annex 8h in the Reporting Guidance), such as River continuity conditions, Nitrogen conditions, Fish, Benthic invertebrates and Macroalgae.
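The value definitions above can be written down as a small lookup table, which also makes it easy to reject unexpected input. The following is only a minimal sketch in Python; the names `STATUS_LABELS` and `describe_status` are illustrative and are not taken from the Reporting Guidance or from Wikidata.

```python
# Illustrative lookup of the reported status values (names are assumptions).
STATUS_LABELS = {
    "1": "High status or maximum potential",
    "2": "Good status or potential",
    "3": "Moderate status or potential",
    "4": "Poor status or potential",
    "5": "Bad status or potential",
    "Unknown": "Unknown status or potential",
    "Not applicable": "Not applicable",
}


def describe_status(value):
    """Return the definition for a reported status value.

    Accepts integers or strings; raises ValueError for anything
    outside the list of allowed values.
    """
    key = str(value).strip()
    if key not in STATUS_LABELS:
        raise ValueError(f"Unexpected status value: {value!r}")
    return STATUS_LABELS[key]
```

The same table could serve both the overall status and the individual quality elements, since they share the value definitions.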

Surface Water Body (SWB)

Basic data

euSurfaceWaterBodyCode (Needed)

This identifier is required since it is the key to all other info.

Use P2856 = euSurfaceWaterBodyCode

Individual countries may also have their own properties which can additionally be used:

  • Sweden: P761 = euSurfaceWaterBodyCode minus the SE-prefix
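As a rough illustration of the Swedish case, the P761 value can be derived from the P2856 value by dropping the country prefix. This sketch assumes the prefix is exactly the two letters "SE"; the function name `swedish_local_id` and the example code are made up for illustration.

```python
def swedish_local_id(eu_code):
    """Derive a P761 value from a euSurfaceWaterBodyCode (P2856).

    Assumption: Swedish codes start with the two-letter prefix "SE"
    and the remainder is the national identifier.
    """
    prefix = "SE"
    if not eu_code.startswith(prefix):
        raise ValueError(f"Not a Swedish code: {eu_code!r}")
    return eu_code[len(prefix):]


# "SE000000-000000" is a made-up placeholder, not a real water body code.
print(swedish_local_id("SE000000-000000"))
```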

euSubUnitCode (Desired)

This is covered by P4614 (watershed) where the target is an instance of a River Basin District (RBD) or River Basin District Sub-unit <P31 = Q132017 / Q25344201>.

For the structure of each RBD (which may also need to be created) see #River Basin District (RBD).

surfaceWaterBodyCategory (Desired)

This is covered by P31 (instance of) where the target is one of the four (five) allowed values:

  • RW: River Water Body
  • LW: Lake Water Body
  • TW: Transitional Water Body
  • CW: Coastal Water Body
  • (TeW: Territorial water Body) This is not included in WFD

These have been mapped in surfaceWaterBodyCategory. Each one is a subclass of the specially created Q30091952.

naturalAWBHMWB (Future)

Whether the water body is natural or not (Natural/Artificial/Heavily modified).

This could possibly be a qualifier to surfaceWaterBodyCategory; if so, it would likely require a new property.

hmwbWaterUse (Future)

The use for which the water body has been designated (e.g. industry, energy, tourism, transport etc.)

Only used for Heavily modified Water bodies and as such connected to naturalAWBHMWB.

hmwbPhysicalAlteration (Future)

Physical alterations of the water body (locks, dam, dredging etc.)

Only used for Heavily modified Water bodies and as such connected to naturalAWBHMWB.

reservoir (Future)

Whether it is a reservoir and, if so, whether it was originally a lake or a river.

Only used for Heavily modified Water bodies and as such connected to naturalAWBHMWB.

swAssociatedProtectedArea (Future)

Whether the water body is associated with a protected area (boolean).

Since there isn’t a system for protected areas on Wikidata this will have to wait.

Pressure and impact

swSignificantPressureType (Future)

What are the main sources of pressure (e.g. sources of pollution)

This would require mapping the enumerated list of answers (Annex 1A p.304) to Wikidata objects.

swSignificantImpactType (Needed)

This is handled by P3643 (significant environmental impact). The allowed values (Annex 1b, page 307) have been mapped to appropriate Wikidata items (new items created where needed).

Advanced use cases are described on the property discussion page but in short:

  • If there are no impacts, novalue is set; otherwise one or more of the mapped impact types are added.
  • For the first year, the reporting year is added as P585 (point in time).
  • For later years:
    • If a claim is no longer present, a P582 (end time) is added to the old claim.
    • If a claim is still present, no change is needed, but P585 could be changed to P580 (start time).
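The update rules above can be sketched with plain dictionaries, without any Wikidata library. This is only an illustration under stated assumptions: the claim structure, the `NOVALUE` stand-in and the function name `update_impact_claims` are invented here, the item ids in the usage note are placeholders, and adding a new claim with P585 for an impact that first appears in a later year is an assumption not covered by the rules above.

```python
NOVALUE = "novalue"  # stand-in for Wikidata's "no value" marker


def update_impact_claims(claims, reported, year):
    """Return a new list of P3643 claims after one reporting year.

    claims   -- list of dicts like {"value": "Q...", "qualifiers": {"P585": 2010}}
    reported -- mapped impact items reported for `year` (may be empty)
    """
    current = set(reported) or {NOVALUE}
    updated = []
    still_present = set()
    for old in claims:
        # Copy so that the input list is left untouched.
        claim = {"value": old["value"], "qualifiers": dict(old["qualifiers"])}
        quals = claim["qualifiers"]
        if "P582" not in quals:  # only open claims are affected
            if claim["value"] in current:
                still_present.add(claim["value"])
                # Optional step: turn the point in time into a start time.
                if "P585" in quals:
                    quals["P580"] = quals.pop("P585")
            else:
                quals["P582"] = year  # claim no longer present
        updated.append(claim)
    # Assumption: values not seen before get a fresh claim for this year.
    for value in sorted(current - still_present):
        updated.append({"value": value, "qualifiers": {"P585": year}})
    return updated
```

For example, calling the function first with `["Q_A", "Q_B"]` for 2010 and then with `["Q_B"]` for 2016 (both placeholder ids) closes the first claim with P582 and turns the P585 of the second into P580.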

Ecological status

The measures named “sw…” are reported as the global levels of the SWB.

Additionally, each measure named “qe…” is repeated for each of the 19 Quality elements (QE). Examples of QEs are fish, fauna, continuity, nutrient status etc.

As such we need to decide how overarching or specific we want to be for each of these and build the mapping in such a way that the properties can be reused.

swEcologicalStatusOrPotentialValue (Needed)

Ecological status of the Water Body (1-5)

Original suggestion: introduce a new property, EU-EcoStatus, which takes integer values (1-5).

This is handled by P4002 (Ecological status). Instead of using integer values an item was created for each of the values (see the list). This makes it easier to validate the input and attach explanations to the different statuses.

The exact structure of the qualifiers is still under discussion[1], but for now the reporting year is added as P585 (point in time).

swEcologicalAssessmentYear (Needed)

Year (year range) when the status was assessed.

Not currently used but could/should be a qualifier for swEcologicalStatusOrPotentialValue.

The exact setup (and interaction with the reporting year) is still unclear but the discussion[1] suggests that new properties may be needed for the data collection span.

swEcologicalAssessmentConfidence (Unsure)

Confidence in the swEcologicalStatusOrPotentialValue on a scale of 0-4.

Qualifier for swEcologicalStatusOrPotentialValue (if we choose to include this). If so, we would need a new property for it.

swFailingRBSP (Unsure)

RBSP = River Basin Specific Pollutants

Identifies which pollutant(s) (enumerated) is causing the RBSP QE to fail (i.e. be reported as less than good).

This would require mapping the enumerated list of answers to Wikidata objects, and possibly proposing a property such as “high level pollutants”.

qeCode (Needed)

This is the code given to each quality element.

A mapping of these has been started.

Ideally each of the mapped items would be an entity with a qeCode property.

qeStatusOrPotentialValue (Needed)

Status of the QE for the Water Body (1-5)

During one of the reference group meetings it was decided to split the QE property from the (overall) Ecological status P4002 (swEcologicalStatusOrPotentialValue).

The motivation is that the Ecological status value is an overall value supported by the values for the QEs, in such a way that any failing QE results in a failing overall status.

Thus a new property should be proposed. The complication here will be to clearly communicate why one cannot simply use the same property, where a claim without a qualifier would indicate the overall status. The second complication is the sheer number of quality elements which could be applied to a single SWB (each reporting cycle), which might be a deterrent to getting it accepted.

The new property could possibly reuse the same target values as developed for P4002.
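The relation between the overall status and the QE values described above can be expressed as a simple consistency check. This is a minimal sketch, assuming that "failing" means a value of 3-5 (less than good) and that Unknown and Not applicable are ignored; the function names and the QE names in the example are illustrative only.

```python
def is_failing(value):
    """A status counts as failing when it is less than good (3, 4 or 5)."""
    return value in (3, 4, 5)


def overall_is_consistent(overall, qe_values):
    """Check that a failing QE is reflected in a failing overall status.

    overall   -- overall ecological status (1-5)
    qe_values -- dict mapping a QE name or code to its status value
    """
    if any(is_failing(v) for v in qe_values.values()):
        return is_failing(overall)
    return True


# A failing QE with a non-failing overall status is flagged as inconsistent.
print(overall_is_consistent(2, {"Fish": 4, "Macroalgae": 2}))
```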

qeMonitoringPeriod (Needed)

Qualifier on qeStatusOrPotentialValue (see comments for swEcologicalAssessmentYear)

Chemical Status

The structure here is similar to that for Ecological Status.

swChemicalStatusValue (Needed)

Chemical status of the Water body (2-3)

The allowed values have been mapped and a property has been proposed (same structure as for P4002).

swChemicalAssesmentYear (Needed)

Qualifier on swChemicalStatusValue (see discussion for swEcologicalAssessmentYear)

Ground Water Body (GWB)

This will not be included, as the GWB concept is considered not to be of interest to the Wikidata community at this point.

River Basin District (RBD)

The mapping below is based on the information available in RBDSUCA_2016.

RBD

unhandled

  • <internationalRBDName> — Need a clearer example to determine need/use

RBD sub-unit

to propose / under proposal

  • P?: <euSubUnitCode>

unhandled

  • <primeCompetentAuthority> — might not be needed since this is added to the parent RBD
  • <internationalRBDName> — Need a clearer example to determine need/use

CompetentAuthority

  • label(en): <competentAuthorityName>
  • label(<competentAuthorityNameNLLanguage>): <competentAuthorityNameNL>[3]
  • alias(en): <acronym>
  • P17: <country>
  • P2541: <list of RBDs for which this is CA>
  • P856: <linkToCompetentAuthority>

to propose / under proposal

  • P?: <euCACode>

SE-specific

Geographical/GIS info

With each reported SWB/RBD there is also a GIS file containing info about the water body. The guidance document for this is WISE_GISGuidance.

Local names

The GIS files contain the local (non-English) names of both RBDs and SWBs.

wfdgml:nameText

The local name of the object.

Note that the field can sometimes contain multiple names separated by "/".

Added as a Label (and/or aliases) in the language specified by wfdgml:nameLanguage.

wfdgml:nameLanguage

The (three-letter) language code of the language used for wfdgml:nameText.

Must be mapped to the two-letter codes used on Wikidata and is then used in conjunction with wfdgml:nameText.
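How the two name fields could be combined is sketched below. The mapping table covers only a few codes and would need to be extended; treating the first name as the label and the rest as aliases is an assumption, and `LANGUAGE_CODES` and `labels_from_gis` are illustrative names.

```python
# Partial mapping from three-letter codes to the two-letter codes
# used on Wikidata (only a few examples; extend as needed).
LANGUAGE_CODES = {"swe": "sv", "eng": "en", "fin": "fi", "deu": "de"}


def labels_from_gis(name_text, name_language):
    """Split wfdgml:nameText and pair it with a Wikidata language code.

    Assumption: the first name becomes the label, any further names
    (separated by "/") become aliases.
    """
    lang = LANGUAGE_CODES[name_language.lower()]  # KeyError if unmapped
    names = [part.strip() for part in name_text.split("/") if part.strip()]
    if not names:
        raise ValueError("No name found in wfdgml:nameText")
    return {"language": lang, "label": names[0], "aliases": names[1:]}


# "Namn A / Namn B" is a made-up example value.
print(labels_from_gis("Namn A / Namn B", "swe"))
```

Names that legitimately contain a slash would be split incorrectly by this approach, so the output should be reviewed before upload.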

Area (SWB only)

For SWBs the area of the object can be found in the GIS file.

wfdgml:sizeValue

The surface area of the object.

Added as a Quantity (with unit specified by wfdgml:sizeUom) to P2046.
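A possible shape for the resulting P2046 claim is sketched below. The unit abbreviation "km2" is an assumption about what wfdgml:sizeUom contains, the unit item id should be verified against Wikidata before use, and `UNIT_ITEMS` and `area_claim` are illustrative names.

```python
from decimal import Decimal

# Assumed abbreviation -> Wikidata unit item (verify before use).
UNIT_ITEMS = {"km2": "Q712226"}  # square kilometre


def area_claim(size_value, size_uom):
    """Build a simple representation of a P2046 (area) quantity.

    size_value -- wfdgml:sizeValue as a string, e.g. "12.5"
    size_uom   -- wfdgml:sizeUom, the abbreviated unit
    """
    unit = UNIT_ITEMS.get(size_uom)
    if unit is None:
        raise ValueError(f"Unmapped unit: {size_uom!r}")
    # Decimal keeps the reported precision instead of a binary float.
    return {"property": "P2046", "amount": Decimal(size_value), "unit": unit}


print(area_claim("12.5", "km2"))
```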

wfdgml:sizeUom

The abbreviated unit used for wfdgml:sizeValue.

Must be mapped to the corresponding Wikidata item and is then used in conjunction with wfdgml:sizeValue.

wfdgml:nameTextInternational

This should always be the English name of the object.

Not yet used, as it is unclear how this interacts with surfaceWaterBodyName / rbdName (in the SWB/RBD xml), which should also be the English name.

Useful examples

Useful resources on Wikidata for SWB geo-properties can be found at Wikidata:WikiProject_Rivers. See also the following two examples:

Footnotes

  1. Wikidata:Project_chat#Date_qualifiers
  2. See discussion in relation to rejection of more specific property
  3. Although language codes are not the same as on Wikidata