LTER and
i. Table of Contents
1.5. Required Steps for Site Participation
1.6.2. Do-it-yourself metadata
2.1. Exchange Format Specification
2.3. Detailed Notes and Examples of the Exchange Format
2.3.1. Exchange Format Header Line
2.3.3. Exchange Data Format Rules, Errors and Warnings (Also see Appendix A.)
2.5. Guidelines for Units of Measurement and Precision for Each Variable
3.0. Quality Assurance and Control
3.1. Guidelines for General Network QA
3.2. Guidelines for Parameter-Specific Range Checking
3.3. Parameter-Specific Default QC Threshold Values
3.4. Changing Threshold Values for QC Checks
3.5. Implementation of QA/QC Guidelines
3.5.2. Parameter-Specific Guidelines
3.5.3. QA Warnings Using Data Quality Flags
5.0. Variable Naming Conventions
7.0. Appendix: Errors, Warnings, and Fatal Errors
7.1. Fatal Error Messages - Program halts, data is not accepted.
7.2. Error Messages - Program continues, data point or record is not harvested.
7.3. Warning Messages - Program continues, data points and records may be accepted or ignored.
The National Science Foundation's Long-Term Ecological Research (LTER) program and many U.S. Forest Service Experimental Research Stations collect and maintain extensive, long-term ecological databases, including streamflow and meteorological measurements. These databases have been widely used in intersite comparisons, modeling studies, and land management-related studies. To facilitate intersite research among the network of LTER sites, information managers have developed a prototype that provides climatic summaries dynamically over the Internet (http://www.fsl.orst.edu/climhy/) and serves as one model for improving access to data across sites (Baker et al. 2000, Henshaw et al. 1998). Individual sites maintain local climate data in local information systems, while a centralized site continually harvests, updates, and provides access to all sites' data through a common database. Common distribution report formats and graphical displays have been established to meet specific needs of climate data users.
Funding from the U.S. Forest Service has allowed the climate data prototype (ClimDB) to be improved and expanded to include hydrologic variables (HydroDB). Mechanisms for capturing the metadata essential for discovery and interpretation of the hydroclimatological records have also been developed. Report formats and graphical displays have been updated for the hydrological data. Enhancements to the existing harvester allow the prototype module to function as a production module. The most recent enhancements have combined the two modules and made capturing and accessing the data seamless.
Long-Term
Ecological Research (LTER) sites have generally followed established LTER
Climate Committee guidelines (
A project to conduct climatic analyses of the LTER sites (CLIMDES) gathered individual site temperature and precipitation data (1960-1990) and created on-line monthly summaries for each site (Greenland et al. 1997). While the CLIMDES project satisfied an immediate need for access to monthly site climate data, no mechanisms were established for updating these summaries. With synthesis groups needing ready access to current climatic summaries, a system to provide climatic summaries dynamically over the WWW was needed. ClimDB was developed in response to this science-driven need.
Twenty-three Forest Service experimental sites with long-term hydrologic and associated meteorologic data have been funded to establish web access to existing long-term data sets. Access will facilitate use of these data to improve estimates of postfire flood risk and for other scientific and practical uses. Forest Health Monitoring is seeking to increase the accessibility of long-term data on-line by funding the linking of long-term electronic data sets to a central “web harvester”. This central portal can provide direct access to long-term data sets via the World Wide Web for a variety of uses, including Fire Evaluation Monitoring.
Long-term data sets of interest include streamflow (l/sec) for gaged watersheds with corresponding precipitation (mm) and ambient air temperature (°C) data that were collected simultaneously with the hydrologic data and represent conditions in the watershed. Data collected at daily or more frequent intervals with a data record longer than ten years are preferred. Shorter data records will be considered if they are part of a current program designed to collect data for longer than ten years. Metadata describing site conditions and methods of data collection and processing will also be required and must conform to specific content and format standards that are under development.
ClimDB/HydroDB now has the ability to harvest streamflow data from any real-time USGS gauging station and process it for submission on a weekly basis. For more information, visit http://gce-lter.marsci.uga.edu/lter/research/tools/usgs_harvester.htm.
In Fall 2003, it was decided to merge ClimDB and HydroDB. The back-end database has always been seamless, but the front-end interfaces have been different. As a result of the merger, there is now only one place to go to get data and another for participants to harvest data and update metadata.
To participate, a site will:
1) Provide the names of research areas, meteorological stations, gauged watersheds, and gauging stations, along with their code names, to the ClimDB/HydroDB administrator. These names must be in the central database before any test harvest can proceed. Additionally, provide the names, addresses and email addresses for a data contact person as well as all interested principal investigators.
2) Use the online metadata forms to provide metadata for the overall research area, for every weather station and every parameter measured at each station, for the watershed characteristics of gauged watersheds, and for every gauging station. (See section 4 for metadata categories and descriptors.)
3) Provide appropriate quality assurance parameters for every measured parameter as part of the metadata for central database validation checking (see Section 3.0). Otherwise the global defaults are assumed (section 3.1).
4) Restructure local site data into a standardized daily exchange format (see section 1.3). The exchange file can be written to static files on a scheduled basis, or it can be created dynamically during the harvest process.
5) Provide an Internet address (URL) to identify the location of the exchange format data file. The address will link to a static file or a dynamic script. This is entered using the online metadata forms under the research area category.
6) Harvest data. (Data is in the exchange format and located at or generated from one of the harvest URLs.) A web page provides a mechanism for self-harvest. Please resolve any error or warning messages that are reported, and then re-harvest. The ClimDB/HydroDB administrator can be contacted if there are problems that cannot be resolved.
Sites wishing to add their USGS-maintained stations need to:
1) Provide the USGS station number and name (see attached file for a complete listing of USGS stations)
2) Provide a station code (10 characters or less; the USGS number may be used, but this is not required)
3) Provide a watershed name and watershed code (which may or may not be the same as the station's) for streamflow sites
4) Provide a list of measured parameters at this USGS site (or we will screen for any valid HydroDB parameters, e.g., precipitation, stream temperature, etc.)
5) Adjust the QC min-max ranges in the metadata web pages to prevent harvest failures due to excessive WARNING(101) warnings (section 7.2).
Note: General quality assurance criteria (min-max ranges) can be entered in the metadata for every variable at every station. Before a station is set up in the automated system, its data are pre-screened with broader upper limits on gage height, discharge, precipitation, air temperature, etc., if provided. This is a mechanism for eliminating bad values that might cause the ClimDB/HydroDB harvest to fail.
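A min-max range check of the kind described in this note could be sketched as follows. The limits shown are hypothetical placeholders chosen only for illustration; the real thresholds are the ones entered in a site's metadata or the network defaults, and the function name `in_range` is not part of ClimDB/HydroDB.

```python
from typing import Dict, Optional, Tuple

# Hypothetical limits for illustration only. Real thresholds come from the
# site's metadata or the network defaults, not from this sketch.
EXAMPLE_LIMITS: Dict[str, Tuple[float, float]] = {
    "daily_airtemp_mean_c": (-60.0, 60.0),
    "daily_precip_total_mm": (0.0, 500.0),
}


def in_range(variable: str, value: Optional[float],
             limits: Dict[str, Tuple[float, float]] = EXAMPLE_LIMITS) -> bool:
    """Return True when the value is missing, has no listed limits,
    or lies within the inclusive [minimum, maximum] range."""
    if value is None or variable not in limits:
        return True
    low, high = limits[variable]
    return low <= value <= high


if __name__ == "__main__":
    print(in_range("daily_airtemp_mean_c", 6.8))
    print(in_range("daily_precip_total_mm", -1.0))
```

Running a check like this locally before a harvest may help catch values that would otherwise generate range warnings.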
Visit the USGS NWISWeb data page (http://waterdata.usgs.gov/nwis/rt) to see USGS-maintained stations with real-time data.
ClimDB/HydroDB allows participating sites to trigger a harvest of their site’s data from the central site webpage. The newest implementation allows sites to control their data harvest URL from the online metadata forms, and two options per site are allowed. Therefore, sites will need to specify which harvest URL they would like to use. Additionally, the site will be able to see any error or warning messages appear directly on the screen. The success or failure of the harvest will be known immediately, and data files can be harvested in an iterative process until all changes or corrections have been made. The error log will be posted to the screen at the conclusion of the entire process, which might take several minutes. Additionally, the error log file is automatically emailed to the site’s data set contact person and the ClimDB/HydroDB database administrator.
Another change in this implementation is the preservation of previously harvested data. If data has been previously harvested, it does not have to be harvested again (although it is acceptable to do so). However, if changes need to be made, simply re-harvest an edited exchange file containing the corrected data, or both new and corrected data.
The
harvester mechanics are divided into 3 phases: harvest, ingestion, and
population.
Site climatic and hydrologic metadata descriptors
are entered using a password protected web entry form. Metadata can be entered in piecemeal fashion
and edited again at a later time.
Metadata is separated into various categories by their descriptors. (See
Section 4.0. and associated webpages for more on metadata.)
The
valid implementation variables follow.
Please refer to section 3.1 for the valid variable names to be used in
the exchange format.
While every effort will be made to assure the
integrity of the ClimDB/HydroDB central database, complete accuracy cannot be
guaranteed. Users of ClimDB/HydroDB will
take responsibility for subsequent use of any data retrieved. Data providers understand that ClimDB/HydroDB
datasets are public.
The
exchange file is fundamental to the operation of ClimDB/HydroDB. The following are some basic guidelines for
the exchange file:
Note: Current
valid variable names are listed in Section 3.1 along with their data limits. The data quality flag uses the same variable
name preceded by the word “Flag_”
Here is an example header line for air temperature
and precipitation data. Note, the header
line could be one long continuous line, but this example uses continuation
characters (further described in section 2.3.1):
!LTER_Site, Station, Date, Daily_AirTemp_Mean_C, Flag_Daily_AirTemp_Mean_C, \
#Daily_AirTemp_AbsMax_C, Flag_Daily_AirTemp_AbsMax_C, Daily_AirTemp_AbsMin_C, \
#Flag_Daily_AirTemp_AbsMin_C, Daily_Precip_Total_mm, Flag_Daily_Precip_Total_mm
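As an illustration of how such a header line is put together, the sketch below builds one from a list of variable names, pairing each variable with its “Flag_” column. The function name `build_header` is hypothetical, and the sketch emits the header as one long line rather than using continuation characters.

```python
from typing import List


def build_header(variables: List[str]) -> str:
    """Build a single-line exchange header: the '!' marker, the three key
    fields, then each variable followed by its Flag_ column."""
    fields = ["LTER_Site", "Station", "Date"]
    for name in variables:
        fields.append(name)
        fields.append("Flag_" + name)
    return "!" + ", ".join(fields)


if __name__ == "__main__":
    print(build_header(["Daily_AirTemp_Mean_C", "Daily_Precip_Total_mm"]))
```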
Examples of variable names are defined as follows:
| LTER_Site | A three-letter LTER/Research Area site code assigned by ClimDB/HydroDB database administrator |
| Station | Local site name for the weather station or gauging station (10 character max) |
| Date | An 8 character field, yyyymmdd |
| Daily_AirTemp_Mean_C | Mean daily air temperature |
| Flag_Daily_AirTemp_Mean_C | Data quality flag for mean daily air temperature |
| Daily_AirTemp_AbsMax_C | Daily absolute maximum air temperature |
| Flag_Daily_AirTemp_AbsMax_C | Data quality flag for daily absolute maximum air temperature |
| Daily_AirTemp_AbsMin_C | Daily absolute minimum air temperature |
| Flag_Daily_AirTemp_AbsMin_C | Data quality flag for daily absolute minimum air temperature |
| Daily_Precip_Total_mm | Daily total precipitation |
| Flag_Daily_Precip_Total_mm | Data quality flag for daily total precipitation |
| Daily_Discharge_Mean_Lps | Mean daily discharge |
| Flag_Daily_Discharge_Mean_Lps | Data quality flag for mean daily discharge |
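The constraints on the three key fields (a three-letter site code, a station name of at most 10 characters, and an 8-character yyyymmdd date) can be checked before a file is submitted. The sketch below is one way to do that; the function name and the message wording are hypothetical, and it assumes the site code consists of letters only.

```python
from datetime import datetime
from typing import List


def check_key_fields(site: str, station: str, date: str) -> List[str]:
    """Return a list of problems found in the three key fields (empty if none)."""
    problems = []
    # Assumption: the three-letter site code contains letters only.
    if len(site) != 3 or not site.isalpha():
        problems.append("site code is not three letters")
    if not 1 <= len(station) <= 10:
        problems.append("station name is empty or longer than 10 characters")
    if len(date) != 8 or not date.isdigit():
        problems.append("date is not an 8-digit yyyymmdd value")
    else:
        try:
            datetime.strptime(date, "%Y%m%d")  # rejects e.g. February 30
        except ValueError:
            problems.append("date is not a valid calendar day")
    return problems


if __name__ == "__main__":
    print(check_key_fields("AND", "PRIMET", "19960101"))
    print(check_key_fields("AND", "PRIMET", "19960230"))
```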
Here is the list of valid codes for data quality flags:
| G or blank | Value is a good value (blank is preferred) |
| E | Value is estimated |
| Q | Value is questionable |
| M | Value is missing (in this case, it is preferred to leave value field null or blank with the data quality flag = “M”. It will be allowed to assign the value of “9999” to the data field with the data quality flag = “M”, but not preferred.) |
| T | Trace value (For precipitation only. Values must be assigned to the data field (e.g., assign a zero or 0.1). DO NOT leave the data field null or blank.) |
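The flag rules in this table can be expressed as a small check on a value/flag pair. This is only a sketch of a local pre-submission check, not the harvester's own logic: the function name is hypothetical, it assumes precipitation variables can be recognized by “precip” in their name, and it treats a non-blank, non-9999 value with an “M” flag as worth reporting even though the table only states a preference.

```python
from typing import Optional

VALID_FLAGS = {"", "G", "E", "Q", "M", "T"}


def check_value_and_flag(variable: str, value: str, flag: str) -> Optional[str]:
    """Return a short description of the problem, or None if the pair looks fine."""
    value = value.strip()
    flag = flag.strip()
    if flag not in VALID_FLAGS:
        return "unknown data quality flag"
    if flag == "M" and value not in ("", "9999"):
        # Assumption: a real value alongside an M flag is probably a mistake.
        return "value present although flag marks it missing"
    if flag == "T":
        # Assumption: precipitation variables contain 'precip' in their name.
        if "precip" not in variable.lower():
            return "trace flag used on a non-precipitation variable"
        if value == "":
            return "trace flag requires a value such as 0 or 0.1"
    return None


if __name__ == "__main__":
    print(check_value_and_flag("Daily_Precip_Total_mm", "0.0", "T"))
    print(check_value_and_flag("Daily_Precip_Total_mm", "", "T"))
```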
Here is a precise example of the daily exchange format including
the header line from the
!LTER_Site, Station, Date, Daily_AirTemp_Mean_C, Flag_Daily_AirTemp_Mean_C, \
#Daily_AirTemp_AbsMax_C, Flag_Daily_AirTemp_AbsMax_C, \
#Daily_AirTemp_AbsMin_C,Flag_Daily_AirTemp_AbsMin_C, \
#Daily_Precip_Total_mm, Flag_Daily_Precip_Total_mm
AND,PRIMET,19960101,6.8, ,10.8,Q,4.5, , 0.0,T
AND,PRIMET,19960102,5.3, ,10.6,Q,0.8, , 4.3,
AND,PRIMET,19960103,7.7, , 9.7, ,4.1, ,20.6,
AND,PRIMET,19960104,4.2, , 6.7, ,2.4, ,11.4,
AND,PRIMET,19960105,4.8,E, 7.4,E,2.7,E, ,M
AND,PRIMET,19960106,5.7,E, 9.7,E,1.3,E, ,M
One comma-delimited header line is followed by an indefinite number of comma-delimited data records (lines). ClimDB/HydroDB is coded so that each value in a data record (line) is interpreted according to the immediately preceding header line. Here is a more generic example of the exchange format file.
!Lter_site, station, date, field1, flag_field1, field2, flag_field2,\
#field3, flag_field3, field4, flag_field4
ABC,MY_STATION,19970228,111.1,,222.22,E,333.3,,444.4,
ABC,MY_STATION,19970304,,,,,,,34,Q
(Note: the next line will cause an ERROR(101) to be logged and this one data record will be
ignored because the header and data do not match.)
ABC,MY_STATION,19970305,27,E
In this example, field names are fictional to demonstrate the generality of the format. In practice the field names would be known names such as daily_airtemp_mean_c in place of field1. See section 3.1 for the current valid variable names.
Also in this example, the value 222.22 corresponds
to the variable name “field2” for 19970228 due to the number of commas that
precede 222.22 on that line. According
to the format, there must be 5 commas before field2.
The generic example above has a format header line,
denoted by the reserved character "!" (bang
or exclamation point) followed by data lines.
This header specifies 11 comma-separated fields.
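The positional matching between header and data described above can be sketched in a few lines. The function names are hypothetical, and the rule that a record must have exactly as many fields as the header is an assumption drawn from the mismatch example, not a statement of how the harvester is implemented.

```python
from typing import Dict, List, Optional


def parse_header(line: str) -> List[str]:
    """Split a header line (already joined into one line) into field names."""
    if not line.startswith("!"):
        raise ValueError("header lines must begin with '!'")
    return [name.strip() for name in line[1:].split(",")]


def parse_record(fields: List[str], line: str) -> Optional[Dict[str, str]]:
    """Map a data line onto the header's fields by position.
    Returns None when the field counts differ (assumed mismatch rule)."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(fields):
        return None
    return dict(zip(fields, values))


if __name__ == "__main__":
    header = parse_header(
        "!Lter_site, station, date, field1, flag_field1, field2, flag_field2,"
        " field3, flag_field3, field4, flag_field4"
    )
    print(len(header))
    print(parse_record(header, "ABC,MY_STATION,19970228,111.1,,222.22,E,333.3,,444.4,"))
    print(parse_record(header, "ABC,MY_STATION,19970305,27,E"))
```

With the generic example, the header yields 11 fields, the first data line maps 222.22 to field2, and the short line for 19970305 is rejected.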
Note: you can
continue lines (any lines, header and/or data) if you end the previous line
with a ‘\’ and then begin the next line with ‘#’.
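A reader for this continuation convention might look like the sketch below, which joins a line ending in a backslash with the following line after dropping that line's leading '#'. The function name is hypothetical, and the handling of edge cases (such as a continuation line that lacks the '#') is an assumption.

```python
from typing import Iterable, List


def join_continuations(lines: Iterable[str]) -> List[str]:
    """Merge physical lines into logical lines using the '\\' ... '#' convention."""
    joined: List[str] = []
    pending = None  # text carried over from a line that ended with a backslash
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            continuation = line.lstrip()
            if continuation.startswith("#"):
                continuation = continuation[1:]  # drop the continuation marker
            line = pending + continuation
            pending = None
        if line.rstrip().endswith("\\"):
            pending = line.rstrip()[:-1]  # keep the text, drop the backslash
        else:
            joined.append(line)
    if pending is not None:
        joined.append(pending)  # file ended right after a backslash
    return joined


if __name__ == "__main__":
    sample = [
        "!Lter_site, station, date, field1, flag_field1, field2, flag_field2,\\",
        "#field3, flag_field3, field4, flag_field4",
        "ABC,MY_STATION,19970228,111.1,,222.22,E,333.3,,444.4,",
    ]
    for logical_line in join_continuations(sample):
        print(logical_line)
```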
Multiple header lines can appear within an exchange file. That is, if the data variables in the data set change (e.g., different variables included, a change in the order of variables, variables added or removed, station changes, etc.), a new header line can be inserted, followed by the corresponding data set. If only the station name changes while the variable list remains the same, a new header line is not necessary. This will produce a WARNING(107) and the data will be successfully harvested. However, for better interpretation of the log file, multiple headers should be included, with a new header line for each station.
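To illustrate how a file with several header lines can be read, the sketch below keeps track of the most recent header and interprets each data line against it. The function name is hypothetical, continuation lines are assumed to have been joined already, and silently skipping mismatched or header-less lines is a simplification; the harvester logs errors and warnings instead.

```python
from typing import Dict, Iterable, List


def read_exchange(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse logical lines, switching field lists whenever a '!' header appears."""
    fields = None
    records: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("!"):
            # A new header replaces the previous one for all following records.
            fields = [name.strip().lower() for name in line[1:].split(",")]
            continue
        if fields is None:
            continue  # data before any header cannot be interpreted
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(fields):
            records.append(dict(zip(fields, values)))
    return records


if __name__ == "__main__":
    sample = [
        "!Lter_site, station, date, field1, flag_field1",
        "ABC,STATION_A,19970228,111.1,",
        "!Lter_site, station, date, field2, flag_field2",
        "ABC,STATION_B,19970228,5.0,E",
    ]
    for record in read_exchange(sample):
        print(record)
```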
Note: no data delimiter other than a comma may be used. All variable names that appear in a format header are pre-assigned names. Variable names are restricted to the characters A-Z, 0-9, and underscore. (No non-standard characters such as %, /, etc. will be accepted.) However, for convenience, case, underscores, and spaces are ignored when variable names are evaluated. (Thus daily_airtemp_mean_c could be represented as DailyAirTempMeanC if desired and still be recognized.)
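The name-matching rule above (ignore case, underscores, and spaces) can be sketched as a normalization step applied to both the header name and the list of valid names. The function names are hypothetical and the character check reflects only the restriction stated in this note.

```python
import re


def normalize_name(name: str) -> str:
    """Reduce a variable name to a comparison key: no underscores or spaces, lower case."""
    return re.sub(r"[_\s]", "", name).lower()


def has_only_allowed_characters(name: str) -> bool:
    """True if the name uses only letters, digits, underscores, and spaces."""
    return re.fullmatch(r"[A-Za-z0-9_ ]+", name) is not None


if __name__ == "__main__":
    print(normalize_name("daily_airtemp_mean_c") == normalize_name("DailyAirTempMeanC"))
    print(has_only_allowed_characters("Daily_Precip_%"))
```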
It is recommended that small gaps in the record be filled in with records carrying ‘M’ (missing) flags as appropriate. However, it is not necessary to pad the fields and flags with M where data is missing. If all the data fields specified in the format are missing for a date, the record does not have to appear at all, as in the gap between Feb 28 and March 4 in the example. Large gaps should be noted in the metadata comment field. If only some data fields are missing, the specified data fields must appear with the appropriate number of preceding commas, but missing values can be blank or null. In the second data line of the example, the program can tell that field1, field2, and field3 are missing, but field4 is present with a value of 34 and a flag of Q.
Note that 9999
may be supplied as a placeholder for a missing value, even if the variable is
not numeric. (We may have non-numeric
data variables in the future.)
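For sites that choose to fill small gaps explicitly, the sketch below generates one record per missing day, leaving each value blank and setting each flag to “M”. The function name and its parameters are hypothetical; explicit gap records are optional, as noted above.

```python
from datetime import date, timedelta
from typing import List, Set


def missing_records(site: str, station: str, first: date, last: date,
                    present: Set[str], n_variables: int) -> List[str]:
    """Return exchange-format lines for every day in [first, last] that is not
    in `present` (a set of yyyymmdd strings), with blank values and M flags."""
    rows = []
    day = first
    while day <= last:
        stamp = day.strftime("%Y%m%d")
        if stamp not in present:
            # One (blank value, 'M' flag) pair per variable in the header.
            rows.append(",".join([site, station, stamp] + ["", "M"] * n_variables))
        day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    gap = missing_records("ABC", "MY_STATION", date(1997, 2, 28), date(1997, 3, 4),
                          {"19970228", "19970304"}, 4)
    for row in gap:
        print(row)
```

Applied to the generic example, this produces records for March 1 through March 3, 1997.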