9. DATA COLLECTION AND MANAGEMENT

Accomplishment of the GCIP major science objectives involves the development of a comprehensive and accessible database for the Mississippi River basin. Volume I of the GCIP Implementation Plan (IGPO 1993) contains information that (1) identifies the sources of observations from existing and planned networks; (2) further enhances those networks where necessary; and (3) assists in developing data sets accumulated from existing observational systems and derived from operational model outputs, such as the NOAA/NCEP Eta regional mesoscale model. The strategic portion of the data management planning (IGPO 1994b) establishes the implementation strategies needed to achieve the data collection and management objective:

* Provide access to comprehensive in-situ, remote sensing and model output data sets for use in GCIP research and as a benchmark for future studies.

The GCIP Data Management and Service System (DMSS) is shown in Figure 9-1 as a user service configuration based on accessing the GCIP Home Page on the World Wide Web through the URL address:

http://www.ogp.noaa.gov/gcip/



Figure 9-1 GCIP DMSS user services configuration.


The GCIP Data Management and Service System (DMSS) implementation strategy makes maximum use of existing data centers, which are made an integral part of the GCIP-DMSS through four data source modules, each specializing in one data type (in situ, model output, satellite remote sensing, and GCIP special data). These four data source modules are connected to a GCIP central information source that provides "single-point access" to the GCIP-DMSS. The primary responsibilities for the data source modules along with their major functions and activities were described in Volume III of the GCIP Implementation Plan (IGPO 1994b).

9.1 Overall Objectives

The goal of the DMSS is to make GCIP data available to GCIP investigators and to the international scientific community interested in GCIP. The data services are provided through a system which will have multiyear data set information that will be of continuing research use after GCIP is completed. These two items led to the following overall objectives for the DMSS:

The topic of GCIP data management is divided into strategic and tactical planning efforts. The strategic portion of the GCIP data management plan is covered in Volume III of the GCIP Implementation Plan (IGPO 1994b). A tactical data management plan is prepared for each definable data set produced by the DMSS.

9.2 Data Availability and Costs

The GCIP Science Plan (WMO 1992) recognized that the success of the Project depends on scientists and agency participants sharing their data with each other. The timely archival of data collected or processed by GCIP researchers, along with mechanisms to ensure open and minimal-cost distribution to all researchers, requires a clearly stated and implementable data policy. A GCIP data policy on data access was given in the GCIP Science Plan (WMO 1992).

Data management will incur costs primarily for the collection of information on the data and the reproduction costs to compile data sets. The costs incurred for the initial compilation of information on the data will be borne by the Project. Costs for data sets that are compiled for general use by researchers involved in the Project will also be borne by the Project. Costs for data sets to individual specifications will, in general, be borne by the user making the request for the data. This topic is described further in Section 10 and was also described in Section 3 of Volume III of the GCIP Implementation Plan (IGPO 1994b).
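The cost allocation described above can be read as a simple decision rule. The following is a minimal sketch of that rule only; the names `DataSetKind` and `cost_bearer` are hypothetical and do not come from the Plan, and the phrase "in general" in the text means exceptions to the third case are possible.

```python
from enum import Enum


class DataSetKind(Enum):
    """Cost categories named in the text (labels are illustrative)."""
    INITIAL_INFORMATION = "initial compilation of information on the data"
    GENERAL_USE = "data set compiled for general use by Project researchers"
    INDIVIDUAL_SPEC = "data set compiled to an individual's specifications"


def cost_bearer(kind: DataSetKind) -> str:
    """Return who bears the cost for a given kind of request.

    The first two categories are borne by the Project; requests to
    individual specifications are, in general, borne by the requester.
    """
    if kind in (DataSetKind.INITIAL_INFORMATION, DataSetKind.GENERAL_USE):
        return "Project"
    return "requesting user"


for kind in DataSetKind:
    print(f"{kind.value}: {cost_bearer(kind)}")
```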

9.3 System and Services Approach

To the extent possible, GCIP relies upon existing or planned operational, or at least systematic, observing programs operating over the Mississippi River basin, including space-based observations. The essential task is to assemble information about relevant data sets and implement a data management system to support the scientific program. The DMSS takes advantage of the ongoing data management activities of related projects and programs such as Atmospheric Radiation Measurement (ARM), Earth Observing System Data and Information System (EOSDIS), U.S. Weather Research Program (USWRP), and others. Data sets and data management infrastructure under development for these programs are being used by the DMSS to the fullest extent possible. Each of these programs has, or is developing, data management systems with GCIP-relevant data that can be accessed through the GCIP-DMSS.

9.4 DMSS Overall Design

The data management strategy of GCIP relies fundamentally on working with and through existing data centers. A variety of organizations, including the National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA) and the National Water Information System (NWIS) of the USGS, already have extensive capabilities for processing, validating, storing, cataloging, retrieving, and disseminating environmental data.

The DMSS in use during the first two to three years of the EOP is labeled the Prototype system and will not contain all the features that are technically feasible. The DMSS will incorporate improvements and new developments as these become operational at the existing centers to evolve to an Advanced system. It is envisioned that once the system is more fully operational, users will be able to sign onto a central computer and examine the GCIP master catalog to determine the data set(s) that best meet their requirements. If they desire additional information on a selected data set, the access software will route them to the data source module for the particular data type for more specific information. They will then be able to examine detailed data guides or discuss their data needs with someone knowledgeable about the GCIP data sets who can assist them in searching and ordering the data from the correct existing data center. The users can, if desired, go directly from the master catalog to the existing data center to place an order for data.
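The envisioned access path (search the master catalog, then go either to the data source module for that data type or directly to the existing data center) can be sketched as follows. This is an illustration under stated assumptions, not the DMSS software: `CatalogEntry`, `search_catalog`, `route`, the catalog titles, and the "Example Center" names are all hypothetical, and the label used for the special-data module is assumed.

```python
from dataclasses import dataclass

# The four data types named in the plan, mapped to their modules.
# The label for the special-data module is an assumption.
MODULES = {
    "in_situ": "In Situ Data Source Module",
    "model_output": "Model Output Data Source Module",
    "satellite": "Satellite Remote Sensing Data Source Module",
    "special": "GCIP Special Data Source Module",
}


@dataclass(frozen=True)
class CatalogEntry:
    title: str        # hypothetical data set title
    data_type: str    # one of the MODULES keys
    data_center: str  # existing center that holds and distributes the data


def search_catalog(catalog, keyword):
    """Return entries whose title contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [entry for entry in catalog if needle in entry.title.lower()]


def route(entry, go_direct=False):
    """Return the next stop for a user who has selected a catalog entry.

    By default the user is sent to the data source module for the entry's
    data type; with go_direct=True the user goes straight to the data center.
    """
    if go_direct:
        return entry.data_center
    return MODULES[entry.data_type]


catalog = [
    CatalogEntry("Hourly surface composite", "in_situ", "Example Center A"),
    CatalogEntry("Regional model output archive", "model_output", "Example Center B"),
]

hits = search_catalog(catalog, "surface")
print(route(hits[0]))                  # module for more specific information
print(route(hits[0], go_direct=True))  # existing data center, for ordering
```

Keeping the type-to-module mapping in one table mirrors the "single-point access" idea: the central catalog only needs to know each entry's data type and holding center, while the detailed data guides stay with the modules and centers.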

To develop the distributed data management system envisioned for GCIP in the most cost-effective manner, the DMSS Data Source Modules will strive to make the best use of current and planned capabilities of each pertinent data center. The DACOM recognized that the specific data service policies and procedures can vary among the existing data centers and that the Project will need to adapt its "GCIP specific" portion of the DMSS, shown in Figure 9-1, to these variations.

The principal GCIP data centers form the backbone of the data management system. A principal data center is responsible for a significant volume of data pertinent to GCIP and has the capability to provide on-line access to data catalogs, inventories, and ordering systems. The center's on-line access system will be connected to and accessible through an electronic link to the DMSS. Since a center's designation as a principal data center depends upon its technical capabilities, some supplementary centers will be redesignated as principal centers as GCIP evolves during the EOP.
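The two criteria for a principal data center (a significant volume of GCIP-pertinent data, plus on-line access to catalogs, inventories, and ordering) can be expressed as a small check. This is only a sketch: the Plan does not quantify "significant volume", so the threshold below is an arbitrary assumption, and `DataCenter` and `designation` are hypothetical names.

```python
from dataclasses import dataclass, replace

# Assumed threshold; the Plan does not define "significant volume".
SIGNIFICANT_VOLUME_GB = 100.0


@dataclass(frozen=True)
class DataCenter:
    name: str
    gcip_volume_gb: float   # volume of data pertinent to GCIP
    online_catalog: bool
    online_inventory: bool
    online_ordering: bool


def designation(center, threshold_gb=SIGNIFICANT_VOLUME_GB):
    """Classify a center as 'principal' or 'supplementary'."""
    has_online_access = (
        center.online_catalog
        and center.online_inventory
        and center.online_ordering
    )
    if center.gcip_volume_gb >= threshold_gb and has_online_access:
        return "principal"
    return "supplementary"


# A center that holds enough data but lacks on-line ordering.
center = DataCenter("Example Center", 500.0, True, True, False)
before = designation(center)

# Once on-line ordering becomes available, the designation changes.
upgraded = replace(center, online_ordering=True)
after = designation(upgraded)
print(before, "->", after)
```

The example reflects the point made in the text that a supplementary center can become a principal center when its technical capabilities improve.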

9.5 Near-Term Improvements

The flexibility of the DMSS configuration shown in Figure 9-1 makes it possible for each of the modules to evolve at different rates which can be closely related to the specific data centers connected to the module. A summary of the projected improvements by each of the modules is given in the following paragraphs:

GCIP Central Information Source

Responsible Agency: GCIP Project Office hosted by NOAA Office of Global Programs; Silver Spring, MD

Contact: Adrienne Calhoun

The GCIP Central Information Source (GCIS) is responsible for a variety of major functions as listed in Section 5, Volume III of the GCIP Implementation Plan. The DACOM will be asked to review these functions and make recommendations on how they can best be implemented in light of the experience gained from using the World Wide Web as a communications medium for information about GCIP data.

The World Wide Web enables the GCIS to provide information about all the significant items in GCIP, in addition to serving as the central contact for information about the DMSS. The GCIP Project Office is compiling information about GCIP to provide through the GCIP Home Page.

The GCIS will provide a mechanism for feedback from the users and incorporate these suggestions in its attempts to make this new medium a useful tool for the GCIP users.

In Situ Data Source Module

Responsible Agency: Joint Office for Science Support (JOSS); UCAR; Boulder, CO

Contact: S. Williams

The In Situ Module is responsible for providing data management and information resources for surface, upper air, radar, and land surface characteristics data of interest to GCIP. The Module uses the UCAR/JOSS Data Management System (CODIAC), which has been the GCIP DMSS "on-line demonstration" system. A number of activities are planned for the DMSS In Situ Module during the next two years:

Model Output Data Source Module

Responsible Agency: Scientific Data Services; NCAR; Boulder, CO

Contact: R. Jenne

The Model Output Data Source Module is responsible for providing data management and information resources for GCIP-relevant model output data and products. The Module uses the NCAR Scientific Data Services as the infrastructure and expertise for GCIP support.

During the next three years this Module will concentrate on establishing a data archive for the output from three different regional models:

The data management plans for this large volume of model output are evolving as an ongoing effort to balance the investigator needs with the resources available as described in Section 2.4 and Appendix B.

Satellite Remote Sensing

Responsible Agency: Global Hydrology and Climate Center (DAAC); NASA/MSFC Huntsville, AL

Contact: A. Ritchie

The GCIP Satellite Remote Sensing Data Source Module is responsible for providing data management and information resources for GCIP-relevant satellite data and products. The satellite module participates in several coordinating functions within the GCIP project, primarily through the DACOM.

The WWW is the implementation choice of the DMSS and allows the satellite module to provide information and easily link to other existing information at the various data centers. The satellite module continues to compile information about the GCIP data requirements to coordinate readily available data sets as specified by the Principal Research Areas, the DACOM, and other GCIP-related inputs.

The evolution of the satellite home page begins with the initial prototype configuration. The prototype provided an overview, high-level data access to existing archives, CD-ROM information, and links with the other active modules. The prototype home page provides a mechanism to solicit inputs from the entire GCIP science community.