The general approach to data management support for
INDOEX is summarized in a data flow diagram (see Fig.
1). It is important that the INDOEX data management strategy be responsive
to the needs of the investigators, assuring that data are accurate and
disseminated in a timely fashion. It is also important that the investigators
know what is expected of them in this process. A time line of critical
dates in the sequence of INDOEX data management tasks is included in Fig.
2. After a description of the Data Archive and Analysis Centers (Section
3.1), each step in the INDOEX data management process is discussed in more
detail.
The U.S. Data Archive Center will be located at UCAR/JOSS in Boulder, CO, USA. All U.S. data sets collected for INDOEX will be available through the existing JOSS Data Management System (CODIAC). CODIAC offers scientists access to research and operational data. It provides the means to identify data sets of interest, facilities to view data and associated metadata, and the ability to automatically obtain data via Internet file transfer or magnetic media. The user may browse data to preview selected data sets prior to retrieval. Data displays include time series plots for surface parameters, skew-T/log-P diagrams for upper air soundings, and GIF images for model analysis and satellite imagery. CODIAC users can download data via the Internet directly to their workstation or personal computer, or request delivery of data on magnetic media. Data may be selected by time or location and can be converted to one of several formats before delivery. CODIAC automatically includes associated documentation describing the data, the processing steps, and the quality control procedures.
Only data for the INDOEX IFP will be available on CODIAC. Ship cruise and supporting data from the Pre-INDOEX data collection periods are available from the SIO/C4 Data Analysis Center (see Section 3.1.4).
Contact Information:
The India Data Archive Center will be located at NPL in New Delhi, India. This center will be responsible for the archival and dissemination of all operational and research data sets collected by the Indian INDOEX Program. Both data from the IFP and previous INDOEX Indian ship cruises will be available through this Center. NPL may also function as a proposed Data Analysis Center in coordination with SIO/C4.
The India Data Archive Center is in the process of setting up its computer structure and links to the WWW. It is envisioned that INDOEX data sets will be cataloged and selected through a WWW page. Limited browse products will be available. Links to data sets not physically in the Data Archive Center will be provided. The Data Archive Center can be contacted by:
Contact Information:
The European Data Archive Center will be located at the LMD in Paris, France. This Center will be responsible for the archival and dissemination of research data sets collected by the European participants of the INDOEX Program. LMD has also been designated as the final archive location for the METEOSAT-5 satellite data for INDOEX PIs.
The European Data Archive Center is in the process of setting up its computer structure and links to the WWW. It is envisioned that INDOEX data sets will be cataloged and selected through a WWW page. Limited browse products will be available. Links to data sets not physically in the Center will be provided. The Data Archive Center can be contacted by:
Contact Information:
The SIO/C4 Data Analysis Center, located in La Jolla, California, USA, developed and maintains the C4 Integrated Data System (CIDS) as a data integration and analysis tool. CIDS was developed to promote multi-disciplinary research by providing a common interface to complex and heterogeneous data sets. One of the greatest strengths of CIDS is its utility as a research and analysis tool to collocate, integrate, and overlay various types of data sets. CIDS was designed to facilitate research by disseminating data and derived products among PIs and other INDOEX Analysis Centers in a single common data format, NetCDF (http://www.unidata.ucar.edu/packages/netcdf/).
Contact Information:
The Airborne Platform for Earth (APE) Third European Stratospheric Experiment on Ozone (THESEO) Project will conduct aircraft operations from the Seychelles (located in the western Indian Ocean) that will overlap with part of the INDOEX flight operations. Some flight coordination between the projects may take place. The APE-THESEO Project has its own data protocol, which will strive for the same openness and free access to data as described for INDOEX. Data format will typically follow NASA guidelines. Data from this field campaign will be archived with other THESEO Program information at the Norwegian Institute for Air Research (NILU) in Norway.
Contact Information:
The first step in organizing the data management support is to understand what data are anticipated from the various components of the program. JOSS developed and distributed an initial questionnaire (November 1997) to survey the INDOEX participants and is in the process of documenting this information from the individual PIs. The questionnaire (and future requested sample data sets) will be used to obtain detailed information regarding the various data sets (e.g. data format, data set size, data frequency and resolution, and real-time operational requirements). This will assist the INDOEX Data Archive Centers in handling and processing the data as well as developing any format converters necessary. Appendix A provides results from the questionnaire for the research data sets and Appendix B documents similar information for the operational data to be obtained for INDOEX.
The INDOEX International Steering Committee has agreed that tasks associated with INDOEX data acquisition (e.g. in-field record keeping, backing up field data, data documentation [for catalog purposes], provision of data to data processing locations, and processing of raw data into geophysical parameters) are the responsibility of, and will be performed by, the participating PIs (see section 2.2). The PIs will be requested to adequately document data sets in accordance with the following documentation guidelines so that the data (and associated metadata) can be included in the INDOEX On-line Field Catalog (section 3.3.1) and in the Data Archive and Analysis Centers.
The initial (or "raw") field data sets produced by the PIs' instrumentation will be recorded in a variety of formats (WMO level I and IIA data). It is important that processed and calibrated data be converted to an easily accessible format (using engineering units) for dissemination to the INDOEX scientific community and eventually the larger scientific community. JOSS will work with the PIs to establish format standards for data submitted first to the On-line Field Catalog and then to the respective Data Archive Centers. It is important to set the format convention prior to data collection so that the Centers can plan for data conversion software and storage requirements. However, there may be certain situations where conversion to a final format must occur after the data are received at the Center(s) and prior to dissemination.
The INDOEX International Steering Committee and members of the INDOEX operations and data management team met in Utrecht, Netherlands on 20-23 June 1998 to discuss aspects of the upcoming field season (IFP). Out of these discussions came a list of requirements for all INDOEX PIs to follow when collecting and providing data for the INDOEX Data Archive. In order to develop the archive plans (and any associated software), the PIs will be requested to send sample data sets to the respective Data Archive Centers. This allows for clarification of the formats to be received during the program. Abiding by these requirements will also ensure consistency, flexibility, and utility of the data for operations planning and review, as well as inter-comparisons and preliminary analysis activities. The following are specific requirements for the PIs to follow when submitting their data:
HEADER SPECIFICATIONS
A sample data file is listed below, delineated by rows of dashes. The
following data set is an example of an ideal data set in which the header
precedes the reported data, and the data are organized in columns separated
by spaces. Each column is identified by parameter, and each parameter's
units of measure are listed in the respective column. Each row also has
a date/time of observation reported in Coordinated Universal Time (UTC).
This data set organization is ideal for plotting and inter-comparison of
data in the field (On-line Field Catalog). This data set format should
be used whenever possible and could easily be produced automatically from
a spreadsheet program.
------------------------------------------------------------------------------------------------------------
DID = 100
FILE START TIME (UTC) = 0821133500
FILE STOP TIME (UTC) = 0821135500
PI NAME = Doe, John (NCAR)
PLATFORM = C130
INSTRUMENT = C-130 External Sampler Data
LOCATION = mobile
DATA VERSION = 1.0
REMARKS = National Center for Atmospheric Research, INDOEX
REMARKS = ppm values are mole fraction
REMARKS = nM/m3 at 25c and 101.3 kPa; DMS and NH4 in Parts per million (PPM)
REMARKS = Missing data = 99.9; Bad data = 88.8
REMARKS = Data point Date/Time provided in UTC
##END OF HEADER##
DATE/TIME     SAMPLE    NO2      CO      DMS     NH4
UTC           NUMBER    nM/m3    PPM     PPM     PPM
0821133500.0  E1.160.1  1000.65  200.67  345.98  2342.980
0821133510.0  E1.160.2  1003.45  200.60  349.76  2353.345
etc.
------------------------------------------------------------------------------------------------------------
**NOTE** This type of header information cannot be contained within
GIF and PostScript files. Such files will need to be submitted with attached
files or separate documentation containing this information.
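As an illustration of how a file laid out like the sample above could be read back, the following Python sketch splits a submission into header fields and data rows. It is not part of the INDOEX requirements: the function name parse_indoex_file and the shortened SAMPLE text are hypothetical, and the sketch assumes that each header entry sits on a single "KEY = value" line, that the header ends with ##END OF HEADER##, and that the first two lines after the header hold the column names and units.

```python
def parse_indoex_file(text):
    """Split a submission into header fields, column names, units, and rows.

    Assumes (not specified by the plan) one "KEY = value" header entry per
    line and a names row followed by a units row after the header.
    """
    header = {}
    lines = iter(text.splitlines())
    for line in lines:
        line = line.strip()
        if line == "##END OF HEADER##":
            break
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # REMARKS may repeat, so every key maps to a list of values.
        header.setdefault(key, []).append(value)

    # Everything after the header marker is whitespace-separated columns.
    body = [ln.split() for ln in lines if ln.strip()]
    columns, units, rows = [], [], []
    if len(body) >= 2:
        columns, units = body[0], body[1]
        rows = [dict(zip(columns, fields)) for fields in body[2:]]
    return header, columns, units, rows


# Shortened, hypothetical copy of the sample file for demonstration.
SAMPLE = """\
DID = 100
FILE START TIME (UTC) = 0821133500
PLATFORM = C130
REMARKS = ppm values are mole fraction
REMARKS = Missing data = 99.9; Bad data = 88.8
##END OF HEADER##
DATE/TIME     SAMPLE    NO2      CO      DMS     NH4
UTC           NUMBER    nM/m3    PPM     PPM     PPM
0821133500.0  E1.160.1  1000.65  200.67  345.98  2342.980
0821133510.0  E1.160.2  1003.45  200.60  349.76  2353.345
"""

header, columns, units, rows = parse_indoex_file(SAMPLE)
print(header["PLATFORM"], columns, rows[0]["CO"])
```

Values are kept as strings so that the missing-data and bad-data flags named in the REMARKS lines can be compared exactly before any numeric conversion.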
DATA SPECIFICATIONS
There are several periods that include the data collection tasks for INDOEX. Operational data sets will be collected during an Enhanced Observation Phase (EOP) for 1 to 3 years, depending on the platform, beginning in 1997. During this period, data will be collected to support a series of Pre-INDOEX cruises on the Indian R/V Sagar Kanya. SIO/C4 and NPL will have primary responsibility for coordinating receipt of research-quality operational data sets during this entire period. The three-month IFP data collection will begin in the INDOEX region on 1 January 1999 and conclude approximately 31 March 1999. During this period, a 6-8 week period of research aircraft flights and ship cruises will be conducted. This period will also include data collected during ferry flights. JOSS, NPL, and LMD will have primary responsibility for coordinating receipt and archival of research-quality operational data sets during this entire period. Other data and metadata for complete documentation of field season activities (i.e. status reports and mission summaries) will be collected from the PIs by JOSS as operations dictate. All this information, along with selected research and operational data sets, will be entered into the On-line Field Catalog on a near real-time basis.
JOSS will develop and maintain an On-line Field Catalog that will be functional during the INDOEX IFP. The catalog will be implemented using a WWW browser interface and will be operational at the Male Operations Center [http://nimbus.indoex.edu.mv/catalog] with a "mirror" site in Boulder, CO [http://catalog.eol.ucar.edu/indoex]. Data collection information about both operational and research data sets (including metadata and overview documentation) will be entered into the system in near real-time beginning 1 January 1999. The catalog will permit data entry (data collection details, field summary notes, certain operational data, etc.), data browsing (listings, plots), and limited catalog information distribution. Daily summaries (see section 4.4) will be prepared and will contain information regarding operations (aircraft flight times, major instrument systems sampling times, POES overpasses, etc.). These summaries will be entered into the On-line Field Catalog as they become available. PIs are strongly encouraged to contribute graphics (e.g. plots in GIF or PostScript format) and/or data for retention on the catalog whenever possible. Updates of the status of instrumentation and data collection (on a daily basis or more often, depending on the platform) will be available. Input requirements for the On-line Field Catalog used during the IFP for status updating are discussed below (Section 3.3.1.1). The full On-line Field Catalog will ONLY be accessible from the Local Area Network at the INDOEX Field Operations Center (Male) for data security reasons. Scientific community access to status information, mission summaries, and selected data sets (primarily operational data sets) will be available on the fully open mirrored On-line Field Catalog system in Boulder. Following the IFP, the entire catalog will be available through the JOSS address (with password-protected data sets where applicable).
NOTE - If any specialized plots of data are required in the field, such plots must be pre-defined and communicated to JOSS. It is highly desirable to have this input prior to 15 November 1998.
Data files and products can be transferred within the INDOEX network to the On-line Field Catalog Server [nimbus.indoex.edu.mv (IP address: 202.1.195.195)]. Data files and products generated outside the INDOEX network at the Male Operations Center should be sent to the On-line Field Catalog server in Boulder, CO [catalog.eol.ucar.edu (IP address: 128.117.150.208)]. Users should connect via FTP with user name anonymous and their e-mail address as the password, then cd (change directory) to /pub/incoming/indoex. Place the file in this directory using the FTP put command (or mput for multiple files). Users may not be able to list files in this subdirectory for computer security reasons. File names MUST meet the following format criteria:
General file structure is: product_source.YYMMDDHHmm.product_name.ext
Submissions via e-mail are possible. Status reports and html and text data products to be put on the catalog may be e-mailed to catalog@nimbus.indoex.edu.mv. This address is valid only for the IFP field phase; following the closure of the Operations Center, all data will be available through the JOSS catalog in Boulder (http://catalog.eol.ucar.edu/indoex). The proper SUBJECT line must be used in the e-mail to direct the message to the proper location in the catalog. The following subject lines should be used (case does not matter):
To: catalog@nimbus.indoex.edu.mv
Subject: report YYMMDD.project-type
-- where project-type is an appropriate name of your choosing (the name
will be automatically added after submission);
Or for product submissions:
To: catalog@nimbus.indoex.edu.mv
Subject: product YYMMDDHHMM.product-name
-- where product-name is the name of the product such as
surface-forecast, radar_image, wind_plot, etc.
(no spaces in the name please).
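The two subject-line patterns above can be generated programmatically, which avoids typing errors in the date stamp. The Python sketch below is illustrative only: the helper names report_subject and product_subject are hypothetical, and it assumes the date stamp is expressed in UTC, which the text does not state explicitly for subject lines.

```python
from datetime import datetime


def _check_name(name):
    """Reject empty names and names containing whitespace."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"name must be non-empty with no spaces: {name!r}")
    return name


def report_subject(when, report_name):
    """Build 'report YYMMDD.name' for a status report."""
    return f"report {when:%y%m%d}.{_check_name(report_name)}"


def product_subject(when, product_name):
    """Build 'product YYMMDDHHMM.name' for a product submission."""
    return f"product {when:%y%m%d%H%M}.{_check_name(product_name)}"


# Hypothetical submission time during the IFP (assumed to be UTC).
stamp = datetime(1999, 2, 14, 6, 30)
print(report_subject(stamp, "daily-summary"))
print(product_subject(stamp, "radar_image"))
```

Names containing spaces raise a ValueError, mirroring the request above that product names contain no spaces.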
C. Floppy disk
Products or information provided on floppy disks should be named for recognition by the catalog. The user/provider is responsible for naming the file to meet the criteria specified in section B above. The floppy should be given to an INDOEX Project Office staff member to load into the catalog.
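Because the provider is responsible for naming files correctly, a name could be checked against the general file structure product_source.YYMMDDHHmm.product_name.ext before it is handed over or transferred. The Python sketch below is one possible check, not a JOSS tool: the function name parse_catalog_name and the example file names are hypothetical, and it assumes that no field contains a dot or a space and that the extension is alphanumeric.

```python
import re
from datetime import datetime

# Assumed pattern: four dot-separated fields, with a 10-digit date stamp.
_NAME = re.compile(
    r"^(?P<source>[^.\s]+)\.(?P<stamp>\d{10})\."
    r"(?P<name>[^.\s]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_catalog_name(filename):
    """Return the parsed fields of a catalog file name, or None if invalid."""
    match = _NAME.match(filename)
    if match is None:
        return None
    try:
        # YYMMDDHHmm: two-digit year, month, day, hour, minute.
        when = datetime.strptime(match["stamp"], "%y%m%d%H%M")
    except ValueError:
        return None  # digits present but not a real date/time
    return {
        "source": match["source"],
        "time": when,
        "name": match["name"],
        "ext": match["ext"],
    }


good = parse_catalog_name("c130.9902140630.wind_plot.gif")
bad_date = parse_catalog_name("c130.9913400630.wind_plot.gif")
bad_shape = parse_catalog_name("wind plot.gif")
print(good, bad_date, bad_shape)
```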
D. Hard copy via scanner
Maps, charts and other paper products can be scanned in using the scanner system in the Operations Center. A member of the INDOEX Project Office staff can assist you initially with this if needed.
E. Form submission for entering Status reports
WWW/html "fill-in" forms are also available for entering reports by following the "OTHER Information" link on the main catalog page. There are two general types. The first is used to prepare a daily summary of INDOEX activities. The second is an instrumentation status form for each of the major measuring systems. Each status and daily summary form requires a password to submit information. Passwords will be provided to individuals as required.
It is important that all INDOEX PIs concentrate on post-field-season data processing activities to assure timely availability of data sets to the INDOEX Data Archive and Analysis Centers. The PIs will have complete responsibility for the processing and delivery of their data to the respective Data Archive and Analysis Centers. Following the wishes of the INDOEX International Steering Committee, the PIs will release their field-related databases for public access as soon as possible and within 12 months of the end of the INDOEX IFP, or approximately by 1 April 2000. Any departures from this schedule must be strongly justified by the PI, and will be considered on a case-by-case basis by the INDOEX Project Office or the INDOEX International Steering Committee. As data sets are received by the respective Data Archive Centers, they will be promptly made available to the scientific community. All operational data (see Appendix B) will be staged and freely accessible by the entire scientific community as soon as possible after the field season (6 months or earlier following the IFP).
The impact of timely receipt of the data on further steps in the data processing scheme is summarized with the time line in Fig. 2. The "preliminary" data will be in "native" resolution and format, that is, in the format and resolution the PI produces in their initial data processing. It is hoped that most preliminary research and certainly all operational data sets will become available within 6 months of the end of the IFP. If the PI requests, "preliminary" data sets submitted to the Data Archive Centers could be password protected prior to the 12-month release date. Between the end of the IFP and the time the PI submits data to the respective INDOEX Data Archive Centers, each PI will be individually responsible for the distribution and support of their data sets.
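The 6-month and 12-month targets can be worked out from the planned end of the IFP. The Python sketch below is only a rough check of that arithmetic: it assumes the IFP ends on 31 March 1999 (the text says "approximately"), the helper add_months is hypothetical, and the authoritative dates remain those in Fig. 2.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


IFP_END = date(1999, 3, 31)  # assumed end of the IFP

preliminary_target = add_months(IFP_END, 6)   # preliminary and operational data
public_release = add_months(IFP_END, 12)      # latest public release

print(preliminary_target.isoformat(), public_release.isoformat())
```

Under this assumption the 12-month limit falls on 31 March 2000, consistent with the release date of approximately 1 April 2000 stated above.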
The INDOEX investigators have expressed a clear need for an integrated analysis of a subset of the data that combines measurements from different instrumentation or platforms into a single consistent database. Data integration will be performed primarily at the Data Analysis Centers, using CIDS (see Section 3.1.4). C4 will evaluate candidate data sets for potential use in such an integrated database, based on consistent spatial and temporal measurements. However, the PIs must provide to C4 (and the Data Archive Centers) all final spatial and temporal information in the data sets as part of the post-field data processing (see Section 3.2.2). CIDS could then, for example, merge location (e.g. aircraft position) with other data as appropriate (e.g. satellite pixel radiance). It is clear that some data lend themselves to this type of integration while others do not. C4 will consult with the PIs during the field and post-processing phases of the experiment to determine which data sets should be considered for use in a potential integrated data set.
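To make the idea of merging location with other data concrete, the Python sketch below attaches the nearest-in-time aircraft position to each measurement. It is not the CIDS method, which is not described here: the function names, the nearest-neighbour matching, the 30-second tolerance, and the sample values are all assumptions chosen for illustration.

```python
from bisect import bisect_left


def nearest_index(times, t):
    """Index of the entry in the sorted, non-empty list `times` closest to t."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1


def collocate(track, samples, max_gap_s=30.0):
    """Attach the nearest aircraft position to each sample.

    track:   list of (time_s, lat, lon), sorted by time
    samples: list of (time_s, value)
    Samples farther than max_gap_s from any position fix are dropped.
    """
    if not track:
        return []
    times = [fix[0] for fix in track]
    merged = []
    for t, value in samples:
        j = nearest_index(times, t)
        if abs(times[j] - t) <= max_gap_s:
            _, lat, lon = track[j]
            merged.append((t, lat, lon, value))
    return merged


# Hypothetical position fixes and measurements (seconds since file start).
track = [(0.0, 4.2, 73.5), (10.0, 4.3, 73.6), (20.0, 4.4, 73.7)]
samples = [(9.0, 200.67), (14.0, 200.60), (100.0, 201.0)]
merged = collocate(track, samples)
print(merged)
```

Matching of this kind only works when every data set carries complete time and position information, which is why the PIs are asked to supply all final spatial and temporal information with their data.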
The INDOEX data sets will be archived and distributed through the various INDOEX Data Archive Centers. These Data Archive Centers will be linked and will contain all "shared" data sets, that is, all research and operational data that will eventually be accessible by the general scientific community. JOSS has the responsibility for getting all U.S. data sets, NPL the Indian data sets, and LMD the European data sets into a long-term archive. As directed by the INDOEX International Steering Committee, research data sets will be available, on a restricted basis, as PIs provide processed data to the respective Data Archive and Analysis Centers (Section 3.1). As shown in Fig. 2, data will be accessible to all INDOEX PIs within 6 to 12 months of the completion of the IFP. Then, following the schedule described in Chapter 2.0, the data sets will be freely available to the general scientific community no later than 1 April 2000.