9. DATA COLLECTION AND MANAGEMENT
Achieving the GCIP major science objectives requires the development of a comprehensive and accessible database for the Mississippi River basin. Volume I of the GCIP Implementation Plan (IGPO 1993) contains information that (1) identifies the sources of observations from existing and planned networks; (2) further enhances those networks where necessary; and (3) assists in developing data sets accumulated from existing observational systems and derived from operational model outputs, such as the NOAA/NCEP Eta regional mesoscale model. The strategic portion of the data management planning (IGPO 1994b) establishes the implementation strategies needed to achieve the data collection and management objective:
* Provide access to comprehensive in-situ, remote sensing and model output data sets for use in GCIP research and as a benchmark for future studies.
The GCIP Data Management and Service System (DMSS) is shown in Figure 9-1 as a user service configuration based on accessing the GCIP Home Page on the World Wide Web through the URL address:
Figure 9-1 GCIP DMSS user services configuration.
The GCIP Data Management and Service System (DMSS)
implementation strategy makes maximum use of existing data
centers, which are made an integral part of the GCIP-DMSS through
four data source modules that specialize by data type (i.e., in
situ, model output, satellite remote sensing, and GCIP special
data). These four data source modules are connected to a GCIP
central information source that provides "single-point access" to
the GCIP-DMSS. The primary responsibilities for the data source
modules along with their major functions and activities were
described in Volume III of the GCIP Implementation Plan
(IGPO 1994b).
9.1 Overall Objectives
The goal of the DMSS is to make GCIP data available to GCIP
investigators and to the international scientific community
interested in GCIP. The data services are provided through a
system which will have multiyear data set information that will
be of continuing research use after GCIP is completed. These two
aims led to the following overall objectives for the DMSS:
(1) During the course of GCIP, the GCIP data management
system will compile information on the data that are
collected in the data centers to produce special data
sets for GCIP users and to provide single-point
access for servicing user requests for GCIP data.
(2) At the completion of GCIP, the GCIP data management
system will turn over the composite data set
documentation (metadata) to a permanent archiving
agency for continuing use in climate-related studies.
The topic of GCIP data management is divided into strategic
and tactical planning efforts. The strategic portion of the GCIP
data management plan is covered in Volume III of the GCIP
Implementation Plan (IGPO 1994b). A tactical data management plan
is prepared for each definable data set produced by the DMSS.
9.2 Data Availability and Costs
The GCIP Science Plan (WMO 1992) recognized that the success
of the Project depends on scientists and agency participants
sharing their data with each other. The timely archival of data
collected or processed by GCIP researchers, along with mechanisms
to ensure open and minimal-cost distribution to all researchers,
requires a clearly stated and implementable data policy. Such a
policy on access to GCIP data was given in the GCIP Science Plan
(WMO 1992).
Data management will incur costs primarily for collecting
information on the data and for the reproduction needed
to compile data sets. The costs incurred for the initial
compilation of information on the data will be borne by the
Project. Costs for data sets that are compiled for general use
by researchers involved in the Project will also be borne by the
Project. Costs for data sets compiled to individual specifications
will, in general, be borne by the user making the request for the data.
This topic is described further in Section 10 and was also
described in Section 3 of Volume III of the GCIP Implementation
Plan (IGPO 1994b).
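The cost policy above reduces to a simple rule, which can be
sketched in Python. This is an illustration only: the category
names and the who_pays function are hypothetical and are not part
of the DMSS, and the qualifier "in general" in the text means that
individual cases may be handled differently.

```python
from enum import Enum


class Payer(Enum):
    PROJECT = "project"
    REQUESTING_USER = "requesting user"


# Hypothetical request categories; the names are illustrative, not GCIP terms.
INITIAL_COMPILATION = "initial_compilation"    # compiling information on the data
GENERAL_USE_DATA_SET = "general_use_data_set"  # compiled for Project researchers at large
CUSTOM_DATA_SET = "custom_data_set"            # compiled to one user's specifications


def who_pays(request_kind: str) -> Payer:
    """Return the party that, in general, bears the cost of a request."""
    if request_kind in (INITIAL_COMPILATION, GENERAL_USE_DATA_SET):
        return Payer.PROJECT
    if request_kind == CUSTOM_DATA_SET:
        return Payer.REQUESTING_USER
    raise ValueError(f"unknown request kind: {request_kind!r}")
```

For example, who_pays(CUSTOM_DATA_SET) returns Payer.REQUESTING_USER,
while both Project-wide categories return Payer.PROJECT.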
9.3 System and Services Approach
To the extent possible, GCIP relies upon existing or planned
observing programs that are operational, or at least systematic,
and that operate over the Mississippi River basin, including
space-based observations. The essential task is to assemble information
about relevant data sets and implement a data management system
to support the scientific program. The DMSS takes advantage of
the ongoing data management activities of related projects and
programs such as Atmospheric Radiation Measurement (ARM), Earth
Observing System Data and Information System (EOSDIS), U.S.
Weather Research Program (USWRP), and others. Data sets and data
management infrastructure under development for these programs
are being used by the DMSS to the fullest extent possible. Each
of these programs has, or is developing, a data management system
with GCIP-relevant data that can be accessed through the GCIP-DMSS.
9.4 DMSS Overall Design
The data management strategy of GCIP relies fundamentally on
working with and through existing data centers. A variety of
organizations, including the National Climatic Data Center (NCDC)
of the National Oceanic and Atmospheric Administration (NOAA) and
the National Water Information System (NWIS) of the USGS, already
have extensive capabilities for processing, validating, storing,
cataloging, retrieving, and disseminating environmental data.
The DMSS in use during the first two to three years of the
EOP is labeled the Prototype system and will not contain all the
features that are technically feasible. The DMSS will
incorporate improvements and new developments as these become
operational at the existing centers to evolve to an Advanced
system. It is envisioned that once the system is more fully
operational, users will be able to sign onto a central computer
and examine the GCIP master catalog to determine the data set(s)
that best meet their requirements. If they desire additional
information on a selected data set, the access software will
route them to the data source module for the particular data type
for more specific information. They will then be able to examine
detailed data guides or discuss their data needs with someone
knowledgeable about the GCIP data sets who can assist them in
searching and ordering the data from the correct existing data
center. The users can, if desired, go directly from the master
catalog to the existing data center to place an order for data.
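The access path described above (master catalog, then the data
source module for the data type, then the existing data center)
can be sketched as a small lookup. This is a minimal Python
illustration under stated assumptions, not the DMSS software: the
CatalogEntry type, the function names, the module label for GCIP
special data, and the catalog entries are all invented for the
example.

```python
from dataclasses import dataclass

# The four data types named in the text. The dictionary keys and the label
# used for the GCIP special data module are assumptions for illustration.
DATA_SOURCE_MODULES = {
    "in_situ": "In Situ Data Source Module",
    "model_output": "Model Output Data Source Module",
    "satellite": "Satellite Remote Sensing Data Source Module",
    "special": "GCIP Special Data Source Module",
}


@dataclass(frozen=True)
class CatalogEntry:
    title: str        # hypothetical data set title
    data_type: str    # key into DATA_SOURCE_MODULES
    data_center: str  # existing center that holds and distributes the data


def search(catalog, keyword):
    """Step 1: examine the master catalog for matching data sets."""
    return [e for e in catalog if keyword.lower() in e.title.lower()]


def module_for(entry):
    """Step 2: route the user to the module for that data type."""
    return DATA_SOURCE_MODULES[entry.data_type]


def order_target(entry):
    """Step 3: orders are placed with the existing data center."""
    return entry.data_center


# Invented entries, for illustration only.
catalog = [
    CatalogEntry("Example surface composite", "in_situ", "Example Center A"),
    CatalogEntry("Example regional model output", "model_output", "Example Center B"),
]

hits = search(catalog, "model")
```

A user who needs no further detail can skip step 2 and call
order_target directly on a catalog entry, which mirrors the direct
route from the master catalog to the data center.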
To develop the distributed data management system envisioned
for GCIP in the most cost-effective manner, the DMSS Data Source
Modules will strive to make the best use of current and planned
capabilities of each pertinent data center. The DACOM recognized
that the specific data service policies and procedures can vary
among the existing data centers and the Project will need to
adapt its "GCIP specific" portion of the DMSS, shown in
Figure 9-1, to these variations.
The principal GCIP data centers form the backbone of the
data management system. A principal data center is responsible
for a significant volume of data pertinent to GCIP and has the
capability to provide on-line access to data catalogs,
inventories, and ordering systems. The center's on-line access
system will be connected to and accessible through an electronic
link to the DMSS. Because a center's designation as a principal
data center depends on its technical capabilities, some
supplementary centers will become principal centers as GCIP
evolves during the EOP.
9.5 Near-Term Improvements
The flexibility of the DMSS configuration shown in Figure 9-1
makes it possible for each of the modules to evolve at
different rates, which can be closely related to the specific data
centers connected to the module. A summary of the projected
improvements by each of the modules is given in the following
paragraphs:
GCIP Central Information Source
Responsible Agency: GCIP Project Office, hosted by NOAA Office of
Global Programs; Silver Spring, MD
Contact: Adrienne Calhoun
The GCIP Central Information Source (GCIS) is responsible
for a variety of major functions as listed in Section 5, Volume
III of the GCIP Implementation Plan. The DACOM will be asked to
review these functions and make recommendations on how they can
best be implemented in light of the experience gained from using
the World Wide Web as a communications medium for information
about GCIP data.
The World Wide Web enables the GCIS to provide information
about all the significant items in GCIP, in addition to serving
as the central contact for information about the DMSS. The GCIP
Project Office is compiling information about GCIP to provide
through the GCIP Home Page.
The GCIS will provide a mechanism for feedback from the
users and incorporate these suggestions in its attempts to make
this new medium a useful tool for the GCIP users.
In Situ Data Source Module
Responsible Agency: Joint Office for Science Support (JOSS); UCAR; Boulder, CO
Contact: S. Williams
The In-situ Module is responsible for providing data
management and information resources for surface, upper air,
radar, and land surface characteristics data of interest to GCIP.
The Module uses the UCAR/JOSS Data Management System (CODIAC)
which has been the GCIP DMSS "on-line demonstration" system. A
number of activities are planned for the DMSS In-Situ Module
during the next two years:
1) Continue in-situ data collection for the 5-year GCIP
Enhanced Observing Period (EOP), which began
in October 1995. Also select and publish appropriate
subsets of EOP data using CD-ROM media.
2) Complete the in-situ data collection process for the
1997 Enhanced Seasonal Observing Period (ESOP-97),
October 1996 through May 1997 in the Arkansas-Red River
Basin. Also select and publish appropriate subsets of
ESOP data using CD-ROM media.
3) Continue to provide and add preliminary GCIP "Quick
Response" data sets (i.e., 2-month lag) to the GCIP
Scientific Community via CODIAC. These data sets would
be available for both the EOP as well as the Enhanced
Annual and Seasonal Observing Periods.
4) Continue to provide GCIP Initial Data Sets (GIDS) to
the GCIP Scientific Community via on-line access and CD-ROM media.
5) Continue development of World Wide Web (WWW)
enhancements to the Module and data access links to
CODIAC as well as coordination of such development with the other Modules.
6) Continue establishment of on-line data links to other
in-situ GCIP primary data centers as well as improved
links to other NCDC data sets (e.g., WSR-88D Level II radar data).
7) Set up and execute the in-situ data collection process
for the ESOP-98, October 1997 through May 1998 in the
LSA-NC. Also select and publish appropriate subsets of ESOP data using CD-ROM media.
8) Set up and execute the in-situ data collection process for the
Enhanced Annual Observing Period (EAOP-98),
October 1997 through September 1998 in the LSA-E. Also
select and publish appropriate subsets of EAOP data using CD-ROM media.
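The observing periods in the list above overlap, so a given
observation date can fall within several of them. The Python
sketch below is illustrative only. The months and regions are
taken from the list; the day-of-month values, the EOP end date
(assumed to be five years after October 1995), and the function
names are assumptions, and the list does not state a region for
the EOP.

```python
from datetime import date

# (name, first day, last day, region). Day-of-month values and the EOP end
# date are assumptions; None means no region is given in the list.
OBSERVING_PERIODS = [
    ("EOP", date(1995, 10, 1), date(2000, 9, 30), None),
    ("ESOP-97", date(1996, 10, 1), date(1997, 5, 31), "Arkansas-Red River Basin"),
    ("ESOP-98", date(1997, 10, 1), date(1998, 5, 31), "LSA-NC"),
    ("EAOP-98", date(1997, 10, 1), date(1998, 9, 30), "LSA-E"),
]


def periods_covering(day):
    """Names of all observing periods whose date range includes `day`."""
    return [name for name, start, end, _ in OBSERVING_PERIODS if start <= day <= end]


def quick_response_month(observed, lag_months=2):
    """First day of the month that is `lag_months` after the observation month.

    Models the nominal 2-month lag of the "Quick Response" data sets; the
    actual release schedule is not specified in the text.
    """
    month_index = observed.month - 1 + lag_months
    year = observed.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, 1)
```

For an observation on 15 December 1997, periods_covering returns
the EOP, ESOP-98, and EAOP-98, and quick_response_month returns
1 February 1998.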
Model Output Data Source Module
Responsible Agency: Scientific Data Services; NCAR; Boulder, CO
Contact: R. Jenne
The Model Output Data Source Module is responsible for
providing data management and information resources for GCIP-relevant
model output data and products. The Module uses the
NCAR Scientific Data Services as the infrastructure and expertise for GCIP support.
During the next three years this Module will concentrate on
establishing a data archive for the output from three different
regional models:
* Eta Model output from NOAA/NMC
* RFE (now GEM) Model output from AES/CMC
* MAPS Model output from NOAA/FSL
The data management plans for this large volume of model
output are evolving as an ongoing effort to balance the
investigator needs with the resources available as described in
Section 2.4 and Appendix B.
Satellite Remote Sensing
Responsible Agency: Global Hydrology and Climate Center (DAAC); NASA/MSFC Huntsville, AL
Contact: A. Ritchie
The GCIP Satellite Remote Sensing Data Source Module is
responsible for providing data management and information
resources for GCIP-relevant satellite data and products. The
satellite module participates in several coordinating functions
within the GCIP project, primarily through DACOM.
The WWW is the implementation choice of the DMSS and allows
the satellite module to provide information and easily link to
other existing information at the various data centers. The
satellite module continues to compile information about the GCIP
data requirements to coordinate readily available data sets as
specified by the Principal Research Areas, the DACOM, and other
GCIP-related inputs.
The evolution of the satellite home page begins with the
initial prototype configuration. The prototype provides an
overview, high-level data access to existing archives, CD-ROM
information, and links with the other active modules. The
prototype home page also provides a mechanism to solicit inputs
from the entire GCIP science community.