Geog 303 - Intro GIS

Week 6 Lectures


Data Quality Measurement and Assessment

(Data Quality: this section modified from the NCGIA core curriculum in GIScience:)
Written by Howard Veregin, Department of Geography, University of Minnesota, Room 414, 267 19th Avenue South, Minneapolis, MN 55455, USA. veregin@atlas.socsci.umn.edu

This section was edited by Gary Hunter, Department of Geomatics, University of Melbourne, Australia.

This unit is part of the NCGIA Core Curriculum in Geographic Information Science. These materials may be used for study, research, and education, but please credit the authors Howard Veregin, and the project, NCGIA Core Curriculum in GIScience. All commercial rights reserved. Copyright 1998 by Howard Veregin.


1. Data Quality

2. Accuracy

2.1. Spatial Accuracy

2.2. Temporal accuracy

2.3. Thematic Accuracy

3. Resolution (precision)

3.1. Spatial Resolution

3.2. Temporal Resolution

3.3. Thematic Resolution

4. Consistency

5. Completeness

6. Summary of Important Points

7. References and Bibliography

Citation

To reference this material use the appropriate variation of the following format: The correct URL for this page is: http://www.ncgia.ucsb.edu/giscc/units/u100/u100_f.html.
Created March 23, 1998.

_________________________________________________________________________________

Wednesday was the midterm exam.

________________________________________________________________________________

Process of data capture

The process of data capture must involve both spatial and attribute data.  If they are considered separately, then more effort usually is required to integrate both components within a GIS database.  It is better to consider both components in planning the capture process to ensure that appropriate IDs and tags are assigned to enable attributes to be correctly attached to the appropriate spatial units.
  Essentially three general operations are required for the capture of geographic data:

  1. Entering the spatial data
  2. Entering the non-spatial (attribute) data
  3. Linking the spatial data to the non-spatial data
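
The third operation, linking, can be illustrated with a minimal sketch. It assumes that each spatial feature and each attribute record carries the same integer ID. The IDs, coordinates, and the land_use field are invented for illustration and do not come from any particular GIS package.

```python
# Hypothetical example data: IDs, coordinates and attribute values are invented.
spatial = {
    101: (512300.0, 4201150.0),   # feature ID -> (x, y)
    102: (512480.0, 4201010.0),
    103: (512615.0, 4200890.0),   # no attribute record entered yet
}
attributes = {
    101: {"land_use": "forest"},
    102: {"land_use": "pasture"},
    104: {"land_use": "urban"},   # no matching spatial feature
}


def link(spatial, attributes):
    """Join spatial features to attribute records on their shared ID."""
    linked = {
        fid: {"xy": xy, **attributes[fid]}
        for fid, xy in spatial.items()
        if fid in attributes
    }
    # IDs present on one side only point to data entry problems.
    missing_attrs = sorted(set(spatial) - set(attributes))
    orphan_attrs = sorted(set(attributes) - set(spatial))
    return linked, missing_attrs, orphan_attrs


linked, missing_attrs, orphan_attrs = link(spatial, attributes)
```

Reporting the unmatched IDs on both sides is one way to catch tagging mistakes early, before the attributes are attached to the wrong spatial units.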

Manual data capture

Manual data capture includes the entry of data from field notes and maps, data entry sheets, data recording sheets, graphs, etc.  The process of manual entry can be sped up by bypassing the intermediate paper records and recording data directly with laptop computers in the field, electronic surveying instruments, automated GPS position recording, etc.

Most attribute data is entered via a keyboard into a database.  Often, much attribute data exists prior to the GIS being built.  In terms of volume, usually the vast majority of a GIS database consists of attribute data.  The entry of attribute data and its pitfalls are well known, so we will concentrate primarily on the spatial component of geographic data.

The process of manually entering spatial data is dependent on whether a grid-based or a vector-based database is to be generated. 
 

Vector
  • Type in the coordinates of points, lines, areas
  • Coordinates can be two dimensional (X and Y coordinates) or three dimensional (X, Y and Z)
  • Usually an integer ID must be attached to each coordinate (used to attach attributes)
Grid
  • Determine the grid cell size
  • Overlay the grid on the data
  • Determine the grid cell values
  • Type in the values
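
The grid steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the grid origin is its top-left corner, row 0 is the northern row, and the sample points, cell size, and function names are all hypothetical.

```python
import math


def cell_index(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the grid cell containing point (x, y).

    Row 0 is the top (northern) row, as in most raster layouts.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    return row, col


def points_to_grid(points, x_min, y_max, cell_size, n_rows, n_cols, nodata=0):
    """Overlay a grid on (x, y, value) points and fill in the cell values."""
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for x, y, value in points:
        row, col = cell_index(x, y, x_min, y_max, cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = value   # the last point entered for a cell wins
    return grid


# Invented sample observations: (x, y, value)
samples = [(5.0, 25.0, 7), (15.0, 15.0, 3), (29.0, 1.0, 9)]
grid = points_to_grid(samples, x_min=0.0, y_max=30.0, cell_size=10.0,
                      n_rows=3, n_cols=3)
```

The choice of cell size fixes how many values must be determined and typed in: halving the cell size quadruples the number of cells.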

Global positioning system


The Global Positioning System, more commonly referred to as GPS, is used to compute positions in two- or three-dimensional space from signals received from a constellation of NAVSTAR satellites. It is owned by the U.S. Department of Defense.
 

Satellite details:
  • 21 satellites (and 3 spares) provide continuous coverage of the earth
  • 6 orbital planes at an altitude of about 20,200 km
  • each satellite completes one orbit roughly every 12 hours
  • 6 satellites are in view at all times at a given location on the earth
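
The 12-hour figure follows from the orbital altitude. As a rough check, Kepler's third law for a circular orbit gives the period from the orbit radius. The sketch below assumes a mean Earth radius of 6371 km and the standard value of Earth's gravitational parameter; neither number appears in these notes.

```python
import math

MU_EARTH = 398600.4418      # km^3/s^2, Earth's gravitational parameter (assumed value)
EARTH_RADIUS_KM = 6371.0    # mean Earth radius (assumed value)


def orbital_period_hours(altitude_km):
    """Period of a circular orbit at the given altitude above the surface."""
    a = EARTH_RADIUS_KM + altitude_km                     # orbit radius in km
    period_s = 2 * math.pi * math.sqrt(a ** 3 / MU_EARTH)
    return period_s / 3600.0


period = orbital_period_hours(20200.0)   # a little under 12 hours
```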

The Russian equivalent is GLONASS - Global Navigation Satellite System.

GPS Accuracies

  • "Standard" (uncorrected): x and y to 15 meters; z is a bit worse
  • Differential GPS (corrected using base station data): x and y to 1-5 meters
  • RTK (real-time kinematic): accuracies to sub-centimeter
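
The idea behind differential correction can be shown with a deliberately simplified sketch. It assumes the correction is applied directly to positions: the base station sits on a point with known coordinates, and its measured error is assumed to apply equally to a nearby rover receiver. Real differential GPS usually corrects the individual satellite range measurements instead, and all coordinates below are invented.

```python
def differential_correction(base_known, base_measured, rover_measured):
    """Shift a rover position by the error observed at the base station.

    Simplified position-domain version, for illustration only.
    """
    dx = base_known[0] - base_measured[0]
    dy = base_known[1] - base_measured[1]
    return (rover_measured[0] + dx, rover_measured[1] + dy)


base_known = (1000.0, 2000.0)       # surveyed base station coordinates
base_measured = (1008.0, 1994.0)    # what GPS reports at the base station
rover_measured = (1508.0, 2494.0)   # what GPS reports at the rover

corrected = differential_correction(base_known, base_measured, rover_measured)
```

The approach only helps when the base and rover see nearly the same errors, which is why the rover should be close to the base station and observe at the same time.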

 
Digitising

A digitiser consists of a tablet with an electronic mesh and a cursor (also known as a puck or mouse). A map sheet is mounted on the digitiser and the cursor is used to enter required points or trace the desired lines on the map.  Lines are digitised as a series of points which can then be processed by the software (e.g. editing, conversion to raster format, etc.).

Digitising is largely a manual process that depends on the concentration and skill of the digitising operator.  It is both time-consuming and tedious.

Accuracy of digitising

The accuracy of digitising is limited by a number of factors related to the data source (map), the equipment being used and human factors. During the digitising process, a number of errors may occur. Some of these errors can be prevented or alleviated during the digitising process, while others can be corrected/edited in a following step (either automatically or manually).  Such potential errors (not all are necessarily incorrect) are discussed in the following sections.

Data input errors (Dangles)

 
Dangles, also referred to as Dangling Nodes, are identified as nodes with only one line (arc) attached.  They include both overshoots (extended too far past another line) and undershoots (not quite reaching another line).  In such cases, the line was intended to connect with another line, and hence overshoots and undershoots are errors.

However, dangles are also identified on lines that are legitimately not connected to another line at one end, such as cul-de-sacs or dead-end streets in a road network. Obviously, such dangles are NOT errors.

Correcting dangles (that are errors!) can be partially automated by setting a dangle length tolerance value which specifies the maximum distance within which nodes will automatically be "snapped" to a line. Any dangles falling outside this tolerance level will have to be corrected manually in the editing process.
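
Dangle detection and tolerance-based snapping can be sketched as follows. This is a minimal illustration, not the method of any particular GIS package: each arc is assumed to be a single straight segment given as a (start, end) pair, the function names are hypothetical, and the target arc is not split at the new junction as a real "clean" operation would do.

```python
import math
from collections import Counter


def find_dangles(arcs):
    """Return nodes (arc endpoints) that have exactly one arc attached."""
    degree = Counter()
    for start, end in arcs:
        degree[start] += 1
        degree[end] += 1
    return [node for node, n in degree.items() if n == 1]


def nearest_point_on_segment(p, a, b):
    """Closest point to p on the straight segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def snap_dangles(arcs, tolerance):
    """Snap dangling nodes to the nearest other arc within the tolerance.

    Returns the adjusted arcs and the dangles left for manual editing.
    """
    dangles = set(find_dangles(arcs))
    snapped, unresolved = [], []
    for i, arc in enumerate(arcs):
        new_ends = []
        for node in arc:
            if node in dangles:
                best, best_dist = None, tolerance
                for j, (a, b) in enumerate(arcs):
                    if j == i:
                        continue
                    q = nearest_point_on_segment(node, a, b)
                    d = math.dist(node, q)
                    if d <= best_dist:
                        best, best_dist = q, d
                if best is None:
                    unresolved.append(node)
                else:
                    node = best
            new_ends.append(node)
        snapped.append((new_ends[0], new_ends[1]))
    return snapped, unresolved


# Invented example: the second arc undershoots the first by 0.2 map units.
arcs = [((0.0, 0.0), (10.0, 0.0)),
        ((5.0, 5.0), (5.0, 0.2)),
        ((20.0, 0.0), (20.0, 8.0))]
snapped, unresolved = snap_dangles(arcs, tolerance=0.5)
```

Note that the sketch cannot tell a genuine dead end from an undershoot: a cul-de-sac ending within the tolerance of another line would be snapped too, which is why the tolerance should be kept small and the unresolved list reviewed by hand.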

Data input errors (Pseudo nodes)


Pseudo nodes are nodes that have only two arcs attached to them.  This often occurs when two lines are digitised and connected together at one end, but the connection point does not occur at a junction with other lines (i.e. the node breaks up a long and complex line such as a contour line).  If the lines are intended to represent the same entity with the same attributes, then the pseudo node is an error and should be removed.  In other cases, pseudo nodes do not cause any problems, but can be removed to simplify the storage and provide a "cleaner" representation.

In certain cases, pseudo nodes are REQUIRED and therefore certainly are not errors.  The most obvious example is an island polygon where the bounding line begins and ends at the SAME node.  

Removing a pseudo node involves joining the two (different!) arcs on either side of it into one.  This usually means ensuring that the attributes of each arc are the same so that there is no problem in merging them.  It may also be possible to specify a "snapping distance" tolerance value which will cause all nodes within a specified distance to "snap" together, thereby eliminating some pseudo nodes.
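
The rules for pseudo nodes can be expressed as a short sketch. It assumes arcs are stored as a dictionary of arc ID to (start node, end node, attributes). The arc IDs, coordinates, and attribute names are invented for illustration.

```python
from collections import defaultdict


def find_pseudo_nodes(arcs):
    """Return {node: (arc_id, arc_id)} for nodes joining exactly two different arcs.

    An island polygon, whose single arc starts and ends at the same node,
    is not reported because both attachments belong to the same arc.
    """
    touching = defaultdict(list)
    for arc_id, (start, end, _attrs) in arcs.items():
        touching[start].append(arc_id)
        touching[end].append(arc_id)
    return {node: (ids[0], ids[1])
            for node, ids in touching.items()
            if len(ids) == 2 and ids[0] != ids[1]}


def removable_pseudo_nodes(arcs, pseudo_nodes):
    """Pseudo nodes whose two arcs carry identical attributes can be merged away."""
    return [node for node, (a, b) in pseudo_nodes.items()
            if arcs[a][2] == arcs[b][2]]


# Invented example data
arcs = {
    1: ((0, 0), (5, 0), {"elev": 100}),
    2: ((5, 0), (9, 3), {"elev": 100}),         # same contour as arc 1
    3: ((9, 3), (12, 3), {"elev": 120}),        # attributes differ from arc 2
    4: ((20, 20), (20, 20), {"type": "island"}),
}
pseudo = find_pseudo_nodes(arcs)
removable = removable_pseudo_nodes(arcs, pseudo)
```

Here the node at (5, 0) can be removed by merging arcs 1 and 2, while the node at (9, 3) has to stay because the attributes on either side differ.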

Weird polygons are tiny "knots" usually caused by digitising errors where lines are accidentally crossed.  In the "cleaning" process, the intersection points are identified, resulting in additional nodes and lines (and the weird polygon).  They can be removed by deleting the offending line AND node.
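
Finding where a digitised line accidentally crosses itself is the first step in locating such knots. The sketch below uses the standard orientation (cross product) test for segment crossings. It only reports proper crossings, ignores touching or overlapping segments, and uses invented coordinates.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def self_crossings(line):
    """Return index pairs (i, j) of non-adjacent segments of a line that cross."""
    segs = list(zip(line, line[1:]))
    return [(i, j)
            for i in range(len(segs))
            for j in range(i + 2, len(segs))
            if segments_cross(*segs[i], *segs[j])]


# Invented example: the third segment doubles back across the first,
# leaving a small knot near (6, 0).
line = [(0, 0), (6, 0), (6, 2), (5, -1), (10, -1)]
crossings = self_crossings(line)
```

Each reported pair marks a place where cleaning would insert a new node, so a short list of pairs is a quick way to find the knots that need editing.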

Tolerance values


For the digitising process in a GIS a number of tolerance values can usually be specified to  prevent some errors from occurring:

 


week 7