
WDPA validation 9.3

Background

UNEP-WCMC are responsible for the collation and management of the World Database on Protected Areas (WDPA), which includes information on protected areas throughout the world. This information has been collated over many years from contributions made by a wide range of individuals and organisations. UNEP-WCMC do not create any data themselves but help build capacity in other organisations and provide them with tools so that they can contribute their own data more easily.

To make contributing as easy as possible and to provide the widest possible access to the data, UNEP-WCMC have developed the WDPA website (http://www.wdpa.org). The website provides access to information on all protected areas and provides tools for contributors to upload and document their own protected area data. Uploaded data is sent to UNEP-WCMC, where it enters the WDPA workflow, which takes the raw data through a number of stages before it is finally published on the website.

WDPA Workflow

Once data providers have contributed their protected area data and metadata to the website, the data goes through a number of stages, which together make up the WDPA workflow, before being published on the website.

This tool represents the third stage in the WDPA workflow, the validation routines, and is described here in detail. The other stages of the workflow are documented elsewhere.

Validation routines

The validation routines run through a set of steps which check the data quality of files submitted to the website once they have been standardised to a particular schema. This process analyses the data, identifies and flags any data quality issues at the feature level, and summarises these errors at the dataset level. The main purpose of the routines is to help improve the data quality to a minimum standard which will allow subsequent analysis and downstream products. Currently the validation tools are only available at UNEP-WCMC, but it is hoped that in future they will be made available over the internet so that data providers can check the quality of their data before submission.
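The feature-level flagging and dataset-level summary described above can be sketched as follows. The field names and issue labels here are illustrative assumptions, not the tool's actual schema:

```python
from collections import Counter

# Hypothetical per-feature issue flags, as the routines might record them
# (field and issue names are illustrative, not the tool's actual schema).
features = [
    {"wdpa_id": 1, "issues": []},
    {"wdpa_id": 2, "issues": ["self_intersection"]},
    {"wdpa_id": 3, "issues": ["sliver_polygon", "outside_country"]},
    {"wdpa_id": 4, "issues": ["sliver_polygon"]},
]

# Dataset-level summary: count how many features raised each issue type.
summary = Counter(issue for f in features for issue in f["issues"])
print(dict(summary))  # {'self_intersection': 1, 'sliver_polygon': 2, 'outside_country': 1}
```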

One of the other purposes of the validation routines is to rectify some common data errors. In some cases the sources of data errors are well understood and can be automatically corrected (e.g. lat/long values entered the wrong way round). However, in most cases the source of a data error may not be known and the data will need to be manually cleaned (the next step in the WDPA workflow). This manual cleaning is likely to involve the data provider and may require them to work on the data and resubmit it.
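The transposed lat/long case can be illustrated with a simple heuristic. This is a sketch only, not the tool's actual correction logic:

```python
def fix_transposed_coords(lat, lon):
    """Swap latitude and longitude when they appear to have been entered
    the wrong way round. Illustrative heuristic only: a latitude outside
    +/-90 degrees is impossible, so if the 'latitude' is out of range but
    the 'longitude' would be valid as a latitude, swap the two values."""
    if abs(lat) > 90 and abs(lon) <= 90:
        return lon, lat
    return lat, lon

print(fix_transposed_coords(24.94, 60.17))  # plausible latitude -> unchanged
print(fix_transposed_coords(124.5, 45.0))   # impossible latitude -> swapped
```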

The validation routines themselves are part of a single geoprocessing tool which can be run from ArcToolbox, from the command line or from a script. Whichever environment is used, the overall process is the same.

The validation routines can be grouped into the following categories: geometry validation, geographic validation, topological validation and attribute validation. These are described in detail in the sections below.

Geometry validation

The geometry validation routines are a set of rules which check that the data is spatially consistent.

Geographic validation

Geographic validation is a single rule which checks that the data is situated within the correct country.
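The real routine presumably tests features against country boundary data; the idea can be sketched with a simple bounding-box test (the box below is a rough, illustrative extent for Finland, not an authoritative boundary):

```python
# Approximate bounding box for Finland: (min_lon, min_lat, max_lon, max_lat).
# Values are illustrative only; a real check would use boundary polygons.
FIN_BBOX = (19.0, 59.0, 32.0, 71.0)

def in_country_bbox(lon, lat, bbox):
    """Return True if the point falls inside the country's bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

print(in_country_bbox(24.94, 60.17, FIN_BBOX))  # Helsinki -> True
print(in_country_bbox(2.35, 48.86, FIN_BBOX))   # Paris -> False
```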

Topological validation (Polygons only)

This set of validation steps checks the relationships of the data to itself to make sure that the data is topologically consistent. Currently there are two rules within this step, and they apply to polygons only.

Attribute validation

Attribute validation checks that the values in the feature classes conform to a set of values defined at the field level. For most of the fields in the feature classes, values are constrained by reference to a coded value domain, which is a set of lookup values within ArcGIS; any data which cannot be matched to one of these lookup values will fail at the schema standardisation stage of the workflow. Within the validation routines, additional checks are made on the data.


Usage Tips


Command line syntax

WDPA_Validation <Input_folder> <Output_Folder> <Sliver_vertices> <Sliver_thickness> <Sliver_area>

Parameters
Expression Explanation
<Input_folder>

The input folder is the network folder where the unvalidated file geodatabases are located. These file geodatabases will be validated by this geoprocessing tool. In order for the validation routines to validate a particular file geodatabase, it must be named with the ISO country code; e.g. the file geodatabase for Finland would be called 'FIN.gdb'. If the name of the geodatabase does not match a country ISO code then it will not be validated.

Within the file geodatabase the feature classes must be named according to the following naming conventions in order to be validated:

  • Point feature classes - should be named 'points_unvalidated'. Any feature class with this name will undergo point validation.
  • Polygon feature classes - should be named 'polygons_unvalidated'. Any feature class with this name will undergo polygon validation.
<Output_Folder>

The output folder is the folder location where the validated file geodatabase will be copied to at the end of the validation process. The file geodatabase will be given a name according to the following naming convention: '<ISO>_Validated_<DateTime>.gdb', where <ISO> is the ISO country code and <DateTime> is the time when the new file geodatabase is created. Each of the feature classes will be renamed to points_validated or polygons_validated.

<Sliver_vertices>

Sliver vertices is a threshold value that is used to help flag sliver polygons. Polygons with fewer vertices than this value (and with an area less than the sliver area value) will be flagged as possible sliver polygons. The default value should be adequate for most situations.

<Sliver_thickness>

Sliver thickness is a threshold value that is used to help flag sliver polygons. Polygons with a thickness less than this value (and with an area less than the sliver area value) will be flagged as possible sliver polygons. The default value should be adequate for most situations.

<Sliver_area>

Sliver area is a threshold value that is used to help flag sliver polygons. Polygons with an area less than this value (and either with fewer vertices than 'Sliver vertices' or a thickness less than 'Sliver thickness') will be flagged as possible sliver polygons. The default value should be adequate for most situations.
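Taken together, the three thresholds combine as follows; this is a sketch of the rule as described above, and the threshold values shown are illustrative, not the tool's defaults:

```python
def is_possible_sliver(area, vertex_count, thickness,
                       sliver_area, sliver_vertices, sliver_thickness):
    """Flag a polygon as a possible sliver: its area must be below the
    sliver area threshold AND it must either have fewer vertices than
    the vertex threshold or a thickness below the thickness threshold."""
    if area >= sliver_area:
        return False
    return vertex_count < sliver_vertices or thickness < sliver_thickness

# Illustrative thresholds, not the tool's defaults:
print(is_possible_sliver(area=50.0, vertex_count=4, thickness=0.5,
                         sliver_area=100.0, sliver_vertices=10,
                         sliver_thickness=0.1))  # True: small area, few vertices
```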

Command Line Example
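A hypothetical invocation is shown below. The folder paths and threshold values are illustrative only; substitute your own:

```
WDPA_Validation C:\WDPA\unvalidated C:\WDPA\validated 10 0.05 1000
```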

Scripting syntax

WDPA_Validation (Input_folder, Output_Folder, Sliver_vertices, Sliver_thickness, Sliver_area)

Parameters
Expression Explanation
Input folder (Required)

The input folder is the network folder where the unvalidated file geodatabases are located. These file geodatabases will be validated by this geoprocessing tool. In order for the validation routines to validate a particular file geodatabase, it must be named with the ISO country code; e.g. the file geodatabase for Finland would be called 'FIN.gdb'. If the name of the geodatabase does not match a country ISO code then it will not be validated.

Within the file geodatabase the feature classes must be named according to the following naming conventions in order to be validated:

  • Point feature classes - should be named 'points_unvalidated'. Any feature class with this name will undergo point validation.
  • Polygon feature classes - should be named 'polygons_unvalidated'. Any feature class with this name will undergo polygon validation.
Output Folder (Required)

The output folder is the folder location where the validated file geodatabase will be copied to at the end of the validation process. The file geodatabase will be given a name according to the following naming convention: '<ISO>_Validated_<DateTime>.gdb', where <ISO> is the ISO country code and <DateTime> is the time when the new file geodatabase is created. Each of the feature classes will be renamed to points_validated or polygons_validated.

Sliver vertices (Required)

Sliver vertices is a threshold value that is used to help flag sliver polygons. Polygons with fewer vertices than this value (and with an area less than the sliver area value) will be flagged as possible sliver polygons. The default value should be adequate for most situations.

Sliver thickness (Required)

Sliver thickness is a threshold value that is used to help flag sliver polygons. Polygons with a thickness less than this value (and with an area less than the sliver area value) will be flagged as possible sliver polygons. The default value should be adequate for most situations.

Sliver area (Required)

Sliver area is a threshold value that is used to help flag sliver polygons. Polygons with an area less than this value (and either with fewer vertices than 'Sliver vertices' or a thickness less than 'Sliver thickness') will be flagged as possible sliver polygons. The default value should be adequate for most situations.

Script Example
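A sketch of calling the tool from Python via the ArcGIS 9.3 geoprocessor is shown below. The toolbox path and all parameter values are assumptions for illustration; adjust them to your installation:

```python
# Sketch only: toolbox location and parameter values are assumed,
# not the tool's actual defaults.
import arcgisscripting

# Create the ArcGIS 9.3 geoprocessor and load the toolbox containing
# the WDPA_Validation tool (path is an assumption).
gp = arcgisscripting.create(9.3)
gp.AddToolbox("C:/WDPA/Toolboxes/WDPA.tbx")

gp.WDPA_Validation("C:/WDPA/unvalidated",  # Input_folder
                   "C:/WDPA/validated",    # Output_Folder
                   10,                     # Sliver_vertices
                   0.05,                   # Sliver_thickness
                   1000)                   # Sliver_area
print(gp.GetMessages())
```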