Last Updated:  14 January 2004

 

LTER and U.S. Forest Service Climate/Hydrology Database Guidelines

 


 

 

i. Table of Contents

 

 

1.0.      Introduction

1.1.    ClimDB Overview

1.2.    HydroDB Overview

1.3.    USGS Data Overview

1.4.    Project History

1.5.    Required Steps for Site Participation

1.6.    Implementation

1.6.1.   Do-it-yourself harvest

1.6.2.   Do-it-yourself metadata

1.6.3.   Measurement Parameters

1.7.    Disclaimer and Caveats

2.0.      Exchange Format

2.1.    Exchange Format Specification

2.2.    Data Quality Flags

2.3.    Detailed Notes and Examples of the Exchange Format

2.3.1.   Exchange Format Header Line

2.3.2.   Missing Data

2.3.3.   Exchange Data Format Rules, Errors and Warnings (Also see Appendix A.)

2.4.    Data Aggregation Rules

2.5.    Guidelines for Units of Measurement and Precision for Each Variable

3.0.      Quality Assurance and Control

3.1.    Guidelines for General Network QA

3.2.    Guidelines for Parameter-Specific Range Checking

3.3.    Parameter-Specific Default QC Threshold Values

3.4.    Changing Threshold Values for QC Checks

3.5.    Implementation of QA/QC Guidelines

3.5.1.   General Harvest

3.5.2.   Parameter-Specific Guidelines

3.5.3.   QA Warnings Using Data Quality Flags

4.0.      Metadata Database

4.1.    Metadata Categories

5.0.      Variable Naming Conventions

6.0.      Literature Cited

7.0.      Appendix: Errors, Warnings, and Fatal Errors

7.1.    Fatal Error Messages - program halts, data is not accepted.

7.2.    Error Messages - Program continues, data point or record is not harvested.

7.3.    Warning Messages - Program continues, data points and records may be accepted or ignored.

 

 


1.0.                        Introduction

 

The National Science Foundation's Long-Term Ecological Research (LTER) program and many U. S. Forest Service Experimental Research Stations collect and maintain extensive, long-term ecological databases, including streamflow and meteorological measurements.  These databases have been widely used in intersite comparisons, modeling studies, and land-management studies.  To facilitate intersite research among the network of LTER sites, information managers have developed a prototype that provides climatic summaries dynamically over the Internet (http://www.fsl.orst.edu/climhy/) and serves as one model for improving access to data across sites (Baker et al. 2000, Henshaw et al. 1998).  Individual sites maintain climate data in their local information systems, while a central site continually harvests, updates, and provides access to all sites' data through a common database.  Common report formats and graphical displays have been established to meet the specific needs of climate data users.

 

Funding from the U. S. Forest Service has allowed the climate data prototype (ClimDB) to be improved and expanded to include hydrologic variables (HydroDB).  Mechanisms for capturing the metadata essential for discovery and interpretation of the hydroclimatological records have also been developed.  Report formats and graphical displays have been updated for the hydrological data.  Enhancements to the existing harvester allow the prototype module to function as a true production module.  The most recent enhancements have combined the two modules and made capturing and accessing the data seamless.

 

1.1.      ClimDB Overview

 

Long-Term Ecological Research (LTER) sites have generally followed established LTER Climate Committee guidelines (Greenland 1986) for collecting baseline meteorological data.  Standardized measurements provide a basis for coordinating meteorological measurements at two or more sites and enable intersite comparisons.  However, access to comparable datasets from multiple sites is problematic.  While most sites make climate data accessible via the World-Wide Web (WWW), the data are displayed in a variety of formats, are aggregated using different methods, and are often not easily located.  

 

A project to conduct climatic analyses of the LTER sites (CLIMDES) gathered individual site temperature and precipitation data (1960-1990) and created online monthly summaries for each site (Greenland et al. 1997).  While the CLIMDES project satisfied an immediate need for access to monthly site climate data, no mechanisms were established for updating these summaries.  Because synthesis groups needed ready access to current climatic summaries, a system to provide climatic summaries dynamically over the WWW was needed.  ClimDB was developed in response to this science-driven need.

 

1.2.      HydroDB Overview

 

Twenty-three Forest Service experimental sites with long-term hydrologic and associated meteorologic data have been funded to establish web access to existing long-term data sets.  This access will facilitate use of these data to improve estimates of postfire flood risk, among other scientific and practical uses.  Forest Health Monitoring is seeking to increase the accessibility of long-term data online by funding the linking of long-term electronic data sets to a central “web harvester”.  This central portal can provide direct access to long-term data sets via the World Wide Web for a variety of uses, including Fire Evaluation Monitoring.

 

Long-term data sets of interest include streamflow (l/sec) for gaged watersheds, with corresponding precipitation (mm) and ambient air temperature (°C) data that were collected simultaneously with the hydrologic data and represent conditions in the watershed.  Data collected at daily or more frequent intervals with a record longer than ten years are preferred.  Shorter records will be considered if they are part of a current program designed to collect data for longer than ten years.  Metadata describing site conditions and methods of data collection and processing will also be required and must conform to specific content and format standards that are under development.

 

1.3.      USGS Data Overview

ClimDB/HydroDB now has the ability to harvest streamflow data from any real-time USGS gauging station and to process it for submission on a weekly basis.  For more information, visit http://gce-lter.marsci.uga.edu/lter/research/tools/usgs_harvester.htm.

 

1.4.      Project History

 

In fall 2003, it was decided to merge ClimDB and HydroDB.  The back-end database had always been unified, but the front-end interfaces had been different.  As a result of the merger, there is now one place to go to get data and another for participants to harvest data and update metadata.

 

1.5.      Required Steps for Site Participation

 

To participate, a site will:

 

1)      Provide the names of research areas, meteorological stations, gauged watersheds, and gauging station names and code names to the ClimDB/HydroDB administrator.  These names must be in the central database before any test harvest can proceed.  Additionally, provide the names, addresses and email addresses for a data contact person as well as all interested principal investigators.

2)      Use the online metadata forms to provide metadata for the overall research area, every weather station, every parameter measured at each station, the watershed characteristics of each gauged watershed, and every gauging station.  (See Section 4.0 for metadata categories and descriptors.)

3)      Provide appropriate quality assurance parameters for every measured parameter as part of the metadata for central database validation checking (See Section 3.0).  Otherwise the global defaults are assumed (section 3.1).

4)      Restructure local site data into the standardized daily exchange format (see Section 2.0).  The exchange file can be produced on a schedule as a static file, or created dynamically during the harvest process.

5)      Provide an Internet address (URL) to identify the location of the exchange format data file.  The address will link to a static file or a dynamic script.  This is entered using the online metadata forms under the research area category.

6)      Harvest data.  (The data must be in the exchange format and located at, or generated from, one of the harvest URLs.)  A web page is provided for self-harvesting.  Please resolve any reported error or warning messages, and then re-harvest.  Contact the ClimDB/HydroDB administrator if problems cannot be resolved.

 

Sites wishing to add their USGS-maintained stations need to:

1)      Provide the USGS station number and name (see attached file for a complete listing of USGS stations)

2)      Provide a station code (10 characters or less; the USGS number may be used but is not required)

3)      Provide a watershed name and watershed code for streamflow sites (these may be the same as the station name and code)

4)      Provide a list of the parameters measured at this USGS site (if no list is provided, we will screen for any valid HydroDB parameters, e.g., precipitation, stream temperature, etc.)

5)      Adjust the QC min-max ranges in the metadata web pages to prevent harvest failures due to excessive WARNING(101) messages (Section 7.3).

 

Note: General quality assurance criteria (min-max ranges) for all stations, by variable, can be entered in the metadata.  Before a station is set up in the automated system, its data are pre-screened against broader upper limits on gage height, discharge, precipitation, air temperature, etc., if provided.  This pre-screening eliminates bad values that might otherwise cause the ClimDB/HydroDB harvest to fail.
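The pre-screening described in the note above amounts to a min-max range check for each variable.  The following is a minimal Python sketch, assuming a hypothetical QC_RANGES table; the limits shown are placeholders for illustration only, not ClimDB/HydroDB defaults, which come from the site's metadata (Section 3.0).

```python
# Hypothetical min-max limits for illustration; real limits are entered in the site's metadata.
QC_RANGES = {
    "Daily_AirTemp_Mean_C": (-50.0, 50.0),
    "Daily_Precip_Total_mm": (0.0, 500.0),
}


def prescreen(variable, values, ranges=QC_RANGES):
    """Split values into those inside and those outside the variable's min-max range."""
    low, high = ranges[variable]
    kept, rejected = [], []
    for value in values:
        if low <= value <= high:
            kept.append(value)
        else:
            rejected.append(value)
    return kept, rejected
```

For example, prescreen("Daily_Precip_Total_mm", [0.0, 12.5, -1.0]) keeps 0.0 and 12.5 and sets -1.0 aside for review before the station is added to the automated harvest.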

 

Visit the USGS NWISWeb data page (http://waterdata.usgs.gov/nwis/rt) to see USGS-maintained stations with real-time data.

 

1.6.      Implementation

 

1.6.1.  Do-it-yourself harvest

 

ClimDB/HydroDB allows participating sites to trigger a harvest of their site’s data from the central site webpage.  The newest implementation lets sites control their data harvest URL from the online metadata forms, and two harvest URLs are allowed per site, so sites will need to specify which harvest URL they would like to use.  Sites can also wait and watch any error or warning messages appear directly on the screen.  The success or failure of the harvest is known immediately, and data files can be harvested iteratively until all changes or corrections have been made.  The error log is posted to the screen at the conclusion of the entire process, which might take several minutes.  The error log file is also automatically emailed to the site’s data set contact person and the ClimDB/HydroDB database administrator.

 

Another change in this implementation is the preservation of previously harvested data.  If data has been previously harvested, it does not have to be harvested again (although it is ok to do so).  However, if changes need to be made, simply re-harvest an edited exchange file containing corrected data or both new and corrected data. 

 

The harvester mechanics are divided into 3 phases: harvest, ingestion, and population.

  1. In the harvest portion, the central harvester checks to make sure the URL address is valid and active, and then captures the exchange file from this URL address.  An error message will be logged if the harvest fails.
  2. The ingestion phase does all of the data screening for errors and writes warning and error messages to the log file.  Header line and data set compatibility and consistency, as well as quality assurance, are checked here.  This process is usually quick unless there are very large numbers (tens of thousands) of data records.
  3. The population phase takes a few minutes.  Here, the data is placed into the relational database.  Even if you abandon the web page during this process, the process will complete itself, and the resultant error log file will be posted to the screen as well as emailed to the data set contact and the ClimDB/HydroDB database administrator.  The log file should be checked to make sure the data was successfully harvested.  The data will be instantly available on the download portion of the webpage for users to check.
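The three phases can be sketched in Python as follows.  This is a simplified illustration, not the ClimDB/HydroDB harvester: the function names, the single obs table, and its columns are hypothetical, line continuations and QC checks are omitted, and SQLite stands in for the central relational database.  Keying each row on site, station, date, and variable means that re-harvesting a corrected record replaces the earlier value, which mirrors the re-harvest behavior described above.

```python
import sqlite3
import urllib.request


def harvest(url):
    """Phase 1: capture the exchange file from the site's harvest URL."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("ascii")


def ingest(text):
    """Phase 2: turn header and data lines into
    (site, station, date, variable, value, flag) rows."""
    header, rows = None, []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if line.startswith("!"):
            header = [fields[0].lstrip("!")] + fields[1:]
        elif header and len(fields) == len(header):
            site, station, date = fields[:3]
            # After the three key fields, each variable is followed by its flag.
            for i in range(3, len(header) - 1, 2):
                rows.append((site, station, date, header[i], fields[i], fields[i + 1]))
        # Other lines are skipped in this sketch; ClimDB/HydroDB logs ERROR(101)
        # when a data record does not match its header.
    return rows


def populate(conn, rows):
    """Phase 3: load rows; a re-harvested record replaces the stored value
    for the same site, station, date, and variable."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS obs (site TEXT, station TEXT, date TEXT, "
        "variable TEXT, value TEXT, flag TEXT, "
        "PRIMARY KEY (site, station, date, variable))"
    )
    conn.executemany("INSERT OR REPLACE INTO obs VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

A full run would chain the phases, e.g. populate(sqlite3.connect("climhy.db"), ingest(harvest(url))), where the database file name is hypothetical and url is the site's harvest URL.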

 

1.6.2.  Do-it-yourself metadata

 

Site climatic and hydrologic metadata descriptors are entered using a password-protected web entry form.  Metadata can be entered piecemeal and edited again at a later time.  Metadata is separated into categories by descriptor.  (See Section 4.0 and the associated webpages for more on metadata.)

 

1.6.3.  Measurement Parameters

 

The variables currently implemented are listed below.  Please refer to Section 3.1 for the valid variable names to be used in the exchange format.

  1. Air temperature; daily minimum, maximum, and mean in degrees Celsius (°C)
  2. Atmospheric pressure; daily mean in hectopascals (hPa)
  3. Dewpoint temperature; daily mean in degrees Celsius (°C)
  4. Global solar radiation; daily total in megajoules per square meter (MJ m-2)
  5. Precipitation; daily total in millimeters (mm)
  6. Relative humidity; daily mean in percent (%)
  7. Snow depth (water equivalent); daily instantaneous observation in millimeters (mm of water)
  8. Soil moisture; daily mean in megapascals (MPa)
  9. Soil temperature; daily mean in degrees Celsius (°C)
  10. Stream discharge; daily mean in liters per second (l/sec)
  11. Vapor pressure; daily mean in hectopascals (hPa)
  12. Water temperature; daily minimum, maximum, and mean in degrees Celsius (°C)
  13. Wind direction and resultant wind direction; daily mean in degrees azimuth (deg)
  14. Wind speed and resultant wind speed; daily mean in meters per second (m/sec)

 

1.7.      Disclaimer and Caveats

 

While every effort will be made to assure the integrity of the ClimDB/HydroDB central database, complete accuracy cannot be guaranteed.  Users of ClimDB/HydroDB will take responsibility for subsequent use of any data retrieved.  Data providers understand that ClimDB/HydroDB datasets are public.

 

 

2.0.                        Exchange Format

 

2.1.      Exchange Format Specification

 

The exchange file is fundamental to the operation of ClimDB/HydroDB.   The following are some basic guidelines for the exchange file:

  1. The exchange file is comma-delimited ASCII.
  2. The exchange file must be accessible over the Internet at a specific, publicly accessible URL at the local site.
  3. The exchange file can be static (created ahead of time) or dynamically created on the fly by a web script. 
  4. The exchange file contains a header line that describes the sequence of variables and their associated flag name contained in the data.  The "!" (bang or exclamation point) character must initiate the header line and is reserved for this use only. 
  5. The header line is followed by the comma-delimited data. 
  6. A data quality flag directly follows each variable. 

Note: Current valid variable names are listed in Section 3.1 along with their data limits.  Each data quality flag name is the corresponding variable name preceded by “Flag_”.

 

Here is an example header line for air temperature and precipitation data.  Note, the header line could be one long continuous line, but this example uses continuation characters (further described in section 2.3.1):

!LTER_Site, Station, Date, Daily_AirTemp_Mean_C, Flag_Daily_AirTemp_Mean_C, \

#Daily_AirTemp_AbsMax_C, Flag_Daily_AirTemp_AbsMax_C, Daily_AirTemp_AbsMin_C, \

#Flag_Daily_AirTemp_AbsMin_C, Daily_Precip_Total_mm, Flag_Daily_Precip_Total_mm

 

Examples of variable names are defined as follows:

 

LTER_Site

A three-letter LTER/Research Area site code assigned by the ClimDB/HydroDB database administrator

Station

Local site name for the weather station or gauging station (10 characters maximum)

Date

An 8-character date field, yyyymmdd

Daily_AirTemp_Mean_C

Mean daily air temperature

Flag_Daily_AirTemp_Mean_C

Data quality flag for mean daily air temperature.

Daily_AirTemp_AbsMax_C

Daily absolute maximum air temperature.

Flag_Daily_AirTemp_AbsMax_C

Data quality flag for daily absolute maximum air temperature

Daily_AirTemp_AbsMin_C

Daily absolute minimum air temperature.

Flag_Daily_AirTemp_AbsMin_C

Data quality flag for daily absolute minimum air temperature

Daily_Precip_Total_mm

Daily total precipitation

Flag_Daily_Precip_Total_mm

Data quality flag for daily total precipitation

Daily_Discharge_Mean_Lps

Mean daily discharge

Flag_Daily_Discharge_Mean_Lps

Data quality flag for mean daily discharge
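The three key fields that begin every record can be checked before a file is posted.  A minimal Python sketch follows; check_key_fields is a hypothetical helper, and it assumes the site code consists of letters only.

```python
from datetime import datetime


def check_key_fields(site, station, date):
    """Return a list of problems found in a record's LTER_Site, Station, and Date fields."""
    problems = []
    if len(site) != 3 or not site.isalpha():
        problems.append("LTER_Site must be a three-letter code")
    if not 1 <= len(station) <= 10:
        problems.append("Station must be 1 to 10 characters")
    # strptime alone accepts some short dates (e.g. '1996011'), so check the length too.
    try:
        if len(date) != 8:
            raise ValueError(date)
        datetime.strptime(date, "%Y%m%d")
    except ValueError:
        problems.append("Date must be a valid yyyymmdd value")
    return problems
```

An empty list means the key fields look well formed; an impossible date such as 19960230 is reported.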

 

2.2.      Data Quality Flags

 

Here is the list of valid codes for data quality flags:

 

G or blank

Value is a good value (blank is preferred)

E

Value is estimated

Q

Value is questionable

M

Value is missing.  The preferred practice is to leave the value field null or blank with the data quality flag set to “M”.  Assigning the value “9999” to the data field with the flag set to “M” is allowed but not preferred.

T

Trace value (precipitation only).  A value must be assigned to the data field (e.g., 0 or 0.1).  DO NOT leave the data field null or blank.
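The flag rules above can be expressed as a small check.  This Python sketch is hypothetical; it assumes that a value other than blank or 9999 under an “M” flag should be reported, and it identifies precipitation variables by the substring “precip” in the variable name, which may not hold for every future variable name.

```python
VALID_FLAGS = {"", "G", "E", "Q", "M", "T"}


def check_flag(variable, value, flag):
    """Return None if a value/flag pair follows the flag rules, else a short message."""
    value, flag = value.strip(), flag.strip()
    if flag not in VALID_FLAGS:
        return f"invalid flag {flag!r}"
    if flag == "M" and value not in ("", "9999"):
        # Assumption: a missing value should be blank (preferred) or 9999.
        return "missing (M) values should be blank or 9999"
    if flag == "T":
        # Assumption: precipitation variables contain 'precip' in their names.
        if "precip" not in variable.lower():
            return "T (trace) is for precipitation only"
        if value == "":
            return "T (trace) requires a value such as 0 or 0.1"
    return None
```

For example, check_flag("Daily_Precip_Total_mm", " 0.0", "T") returns None, while the same flag on an air temperature variable is reported.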

 

 

2.3.      Detailed Notes and Examples of the Exchange Format

 

Here is a precise example of the daily exchange format including the header line from the Andrews Forest (AND) site’s Primary Meteorological Station (PRIMET).  Note: (1) The data has been aligned for readability, but fill spaces are not necessary; (2) the header line could be one long continuous line, but this example uses continuation characters (described below in section 2.3.1).

    
!LTER_Site, Station, Date, Daily_AirTemp_Mean_C, Flag_Daily_AirTemp_Mean_C, \

#Daily_AirTemp_AbsMax_C, Flag_Daily_AirTemp_AbsMax_C, \

#Daily_AirTemp_AbsMin_C,Flag_Daily_AirTemp_AbsMin_C,  \

#Daily_Precip_Total_mm, Flag_Daily_Precip_Total_mm

AND,PRIMET,19960101,6.8, ,10.8,Q,4.5, , 0.0,T

AND,PRIMET,19960102,5.3, ,10.6,Q,0.8, , 4.3,

AND,PRIMET,19960103,7.7, , 9.7, ,4.1, ,20.6,

AND,PRIMET,19960104,4.2, , 6.7, ,2.4, ,11.4,

AND,PRIMET,19960105,4.8,E, 7.4,E,2.7,E,    ,M

AND,PRIMET,19960106,5.7,E, 9.7,E,1.3,E,    ,M
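Because the spaces used for alignment are not significant, a program reading this file can simply trim each field.  The Python sketch below, using a hypothetical parse_record function, maps each header name to its field; it assumes the header has already been joined into one line, and it raises an error in the situation where ClimDB/HydroDB would log ERROR(101) for a mismatched record.

```python
def parse_record(header_line, data_line):
    """Map each header name to its field in a data line, trimming alignment spaces."""
    names = [n.strip() for n in header_line.lstrip("!").split(",")]
    values = [v.strip() for v in data_line.split(",")]
    if len(values) != len(names):
        raise ValueError("header and data do not match")
    return dict(zip(names, values))
```

Applied to the 19960105 record above, the Daily_Precip_Total_mm entry is an empty string and its flag is “M”.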

 

One comma-delimited header line is followed by an indefinite number of comma-delimited data records (lines).  ClimDB/HydroDB interprets each data record (line) according to the immediately preceding header line.  Here is a more generic example of the exchange format file.

 

!Lter_site, station, date, field1, flag_field1, field2, flag_field2, \

#field3, flag_field3, field4, flag_field4

ABC,MY_STATION,19970228,111.1,,222.22,E,333.3,,444.4,

ABC,MY_STATION,19970304,,,,,,,34,Q

 (Note: the next line will cause an ERROR(101) to be logged and this one data record will be ignored because the header and data do not match.)

ABC,MY_STATION,19970305,27,E  

 

In this example, the field names are fictional to demonstrate the generality of the format.  In practice the field names would be valid names such as daily_airtemp_mean_c in place of field1.  See Section 3.1 for the current valid variable names.

 

Also in this example, the value 222.22 corresponds to the variable name “field2” for 19970228 due to the number of commas that precede 222.22 on that line.  According to the format, there must be 5 commas before field2.
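The comma-counting rule can be checked directly.  In this Python sketch (both helper names are hypothetical), the position of a name in the header equals the number of commas that must precede its value in each data line; names are compared without regard to case.

```python
def commas_before(header_line, name):
    """Number of commas that must precede `name`'s value in a data line."""
    names = [n.strip().lower() for n in header_line.lstrip("!").split(",")]
    return names.index(name.lower())


def value_at(data_line, position):
    """Return the field that follows exactly `position` commas."""
    return data_line.split(",")[position].strip()


header = ("!Lter_site, station, date, field1, flag_field1, field2, flag_field2, "
          "field3, flag_field3, field4, flag_field4")
line = "ABC,MY_STATION,19970228,111.1,,222.22,E,333.3,,444.4,"
print(commas_before(header, "field2"))                  # 5
print(value_at(line, commas_before(header, "field2")))  # 222.22
```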

 

2.3.1.  Exchange Format Header Line

 

The generic example above has a format header line, denoted by the reserved character "!" (bang or exclamation point) followed by data lines.  This header specifies 11 comma-separated fields.

 

Note: you can continue lines (any lines, header and/or data) if you end the previous line with a ‘\’ and then begin the next line with ‘#’.
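Rejoining continued lines is a simple pre-processing step before any header or data line is interpreted.  A minimal Python sketch follows; join_continuations is a hypothetical name, and the sketch assumes trailing whitespace after the ‘\’ can be ignored.  A line ending in ‘\’ that is not followed by a ‘#’ line is left unchanged.

```python
def join_continuations(lines):
    """Merge each line ending in a backslash with the following line that begins with '#'."""
    joined = []
    for line in lines:
        line = line.rstrip()
        if joined and joined[-1].endswith("\\") and line.startswith("#"):
            # Drop the trailing backslash and the leading '#', then concatenate.
            joined[-1] = joined[-1][:-1] + line[1:]
        else:
            joined.append(line)
    return joined
```

For example, the three header pieces "!LTER_Site, Station, \", "#Date, X, \" and "#Flag_X" become the single line "!LTER_Site, Station, Date, X, Flag_X".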

 

Multiple header lines can appear within an exchange file.  That is, if the data variables in the data set change (e.g., different variables included, a change in the order of variables, variables added or removed, station changes, etc.), a new header line can be inserted followed by the corresponding data.  If only the station name changes and the variable list remains the same, a new header line is not necessary; this produces a WARNING(107), but the data will be harvested successfully.  However, for easier interpretation of the log file, a new header line should be included for each station.
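The header-tracking behavior can be sketched as follows.  This is a hypothetical Python illustration, not the harvester's code: it groups data lines under the most recent header and records a note wherever the station changes without a new header, the situation for which ClimDB/HydroDB logs WARNING(107).  It assumes continued lines have already been joined.

```python
def split_by_header(lines):
    """Group data lines under the most recent '!' header line and note station
    changes that occur without a new header."""
    groups, notes = [], []
    station = None
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("!"):
            groups.append((line, []))
            station = None  # a new header resets station tracking
            continue
        fields = line.split(",")
        if not groups or len(fields) < 2:
            continue  # data before any header, or a line without a station field
        current = fields[1].strip()
        if station is not None and current != station:
            notes.append(f"station changed to {current} without a new header")
        station = current
        groups[-1][1].append(line)
    return groups, notes
```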

 

Note: the comma is the only permitted data delimiter.  All variable names that appear in a format header are pre-assigned names, and variable names are restricted to the characters A-Z, 0-9, and underscore.  (Non-standard characters such as %, /, etc. will not be accepted.)  For convenience, however, case, underscores, and spaces are ignored when variable names are evaluated.  (Thus daily_airtemp_mean_c could be written as DailyAirTempMeanC and still be recognized.)
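A minimal Python sketch of this name matching, using a hypothetical normalize_name helper: a name is first checked for disallowed characters, then reduced to a comparison form with case, underscores, and spaces removed.

```python
import re

# Letters, digits, underscores, and (ignored) spaces are the only characters allowed.
ALLOWED = re.compile(r"[A-Za-z0-9_ ]+")


def normalize_name(name):
    """Return the comparison form of a header name: case, underscores, and spaces ignored."""
    name = name.strip()
    if not ALLOWED.fullmatch(name):
        raise ValueError(f"invalid character in variable name {name!r}")
    return name.replace("_", "").replace(" ", "").lower()
```

Both daily_airtemp_mean_c and DailyAirTempMeanC reduce to dailyairtempmeanc, so they match the same variable.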

 

2.3.2.  Missing Data

 

It is recommended that small gaps in the record be filled in with records carrying ‘M’ (missing) flags as appropriate.  However, it is not necessary to pad the fields and flags with M where data are missing.  If all the data fields specified in the format are missing for a date, the record does not have to appear at all, as in the gap between 28 February and 4 March in the generic example above.  Large gaps should be noted in the metadata comment field.  If only some data fields are missing, the specified data fields must appear with the appropriate number of preceding commas, but missing values can be blank or null.  In the second data line of the generic example, the program can tell that field1, field2, and field3 are missing, but field4 is present with a value of 34 and a flag of Q.

 

Note that 9999 may be supplied as a placeholder for a missing value, even if the variable is not numeric.  (We may have non-numeric data variables in the future.)
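Both practices in this section can be sketched in Python.  gap_records, a hypothetical helper, writes the recommended ‘M’ records for each day between two dates that have data, and read_value, also hypothetical, treats a blank value, or 9999 flagged ‘M’, as missing.  Both assume numeric variables.

```python
from datetime import datetime, timedelta


def gap_records(site, station, last_date, next_date, n_vars):
    """Exchange lines for each day strictly between two yyyymmdd dates,
    with every value blank and every flag 'M'."""
    day = datetime.strptime(last_date, "%Y%m%d") + timedelta(days=1)
    end = datetime.strptime(next_date, "%Y%m%d")
    lines = []
    while day < end:
        fields = [site, station, day.strftime("%Y%m%d")] + ["", "M"] * n_vars
        lines.append(",".join(fields))
        day += timedelta(days=1)
    return lines


def read_value(value, flag):
    """Return the value as a float, or None if it is missing."""
    value = value.strip()
    if value == "" or (value == "9999" and flag.strip() == "M"):
        return None
    return float(value)
```

For the gap between 19970228 and 19970304 in the generic example, gap_records would produce M records for 19970301 through 19970303.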