3. INDOEX Data Management Functional Description and Strategy

The general approach to data management support for INDOEX is summarized in a data flow diagram (see Fig. 1). It is important that the INDOEX data management strategy be responsive to the needs of the investigators, assuring that data are accurate and disseminated in a timely fashion. It is also important that the investigators know what is expected of them in this process. A time line of critical dates in the sequence of INDOEX data management tasks is included in Fig. 2. After a description of the Data Archive and Analysis Centers (Section 3.1), each step in the INDOEX data management process is discussed in more detail.







3.1 Data Archive and Analysis Centers

3.1.1 U.S. Data Archive Center

The U.S. Data Archive Center will be located at UCAR/JOSS in Boulder, CO, USA. All U.S. data sets collected for INDOEX will be available through the existing JOSS Data Management System (CODIAC). CODIAC offers scientists access to research and operational data. It provides the means to identify data sets of interest, facilities to view data and associated metadata, and the ability to automatically obtain data via internet file transfer or magnetic media. The user may browse data to preview selected data sets prior to retrieval. Data displays include time series plots for surface parameters, skew-T/log-P diagrams for upper air soundings, and GIF images for model analysis and satellite imagery. CODIAC users can directly retrieve data. They can download data via Internet directly to their workstation or personal computer or request delivery of data on magnetic media. Data may be selected by time or location and can be converted to one of several formats before delivery. CODIAC automatically includes associated documentation concerning the data itself, processing steps, and quality control procedures.

Only data for the INDOEX IFP will be available on CODIAC. Ship cruise and supporting data from the Pre-INDOEX data collection periods are available from the SIO/C4 Data Analysis Center (see Section 3.1.4).

Contact Information:
Contact:                     (webmaster@eol.ucar.edu).
Mailing Address:         P.O. Box 3000, Boulder, CO, USA, 80307
Shipping Address:       3300 Mitchell Lane (Suite 175), Boulder, CO, USA, 80307
Telephone:                  (303)497-8987 [FAX (303)497-8158]
Internet Access:           http://data.eol.ucar.edu/codiac/

3.1.2 India Data Archive Center

The India Data Archive Center will be located at NPL in New Delhi, India. This center will be responsible for the archival and dissemination of all operational and research data sets collected by the Indian INDOEX Program. Both data from the IFP and previous INDOEX Indian ship cruises will be available through this Center. NPL may also function as a proposed Data Analysis Center in coordination with SIO/C4.

The India Data Archive Center is in the process of setting up its computer structure and links to the WWW. It is envisioned that INDOEX data sets will be cataloged and selected through a WWW page. Limited browse products will be available. Links to data sets not physically in the Data Archive Center will be provided. The Data Archive Center can be contacted by:

Contact Information:
Contact:                     Ms. Sumana Bhattacharya [sumana@csnpl.ren.nic.in]
Mailing Address:        NPL, Dr. K.S. Krishnan Road, New Delhi - 110012, INDIA
Telephone:                 091-11-5745298 [FAX: 091-11-4690108]
Internet Access:         http://npl-cgc.ernet.in/

3.1.3 European Data Archive Center

The European Data Archive Center will be located at the LMD in Paris, France. This Center will be responsible for the archival and dissemination of research data sets collected by the European participants of the INDOEX Program. LMD has also been designated as the final archive location for the METEOSAT-5 satellite data for INDOEX PIs.

The European Data Archive Center is in the process of setting up its computer structure and links to the WWW. It is envisioned that INDOEX data sets will be cataloged and selected through a WWW page. Limited browse products will be available. Links to data sets not physically in the Center will be provided. The Data Archive Center can be contacted by:

Contact Information:
Contact:                      Dr. Michel DesBois [desbois@lmd.polytechnique.fr]
Mailing Address:         LMD/CNRS, Ecole Polytechnique, 91128 Palaiseau Cedex, Paris, FRANCE
Telephone:                  33 (1) 69 33 45 53 [FAX: 33 (1) 69 33 30 05]
Internet Access:          To be determined

3.1.4 SIO/C4 Data Analysis Center

The SIO/C4 Data Analysis Center, located in La Jolla, California, USA, developed and maintains the C4 Integrated Data System (CIDS) as a data integration and analysis tool. CIDS was developed to promote multi-disciplinary research by providing a common interface to complex and heterogeneous data sets. One of the greatest strengths of CIDS is its utility as a research and analysis tool to collocate, integrate, and overlay various types of data sets. CIDS was designed to facilitate research by disseminating data and derived products among PIs and other INDOEX Analysis Centers in a single common data format, NetCDF ( http://www.unidata.ucar.edu/packages/netcdf/ ).

Contact Information:
Contact:                       CIDS (cids@fiji.ucsd.edu)
Mailing Address:          Center for Clouds, Chemistry, and Climate
                                    Scripps Institution of Oceanography, University of California, San Diego
                                    9500 Gilman Drive #0221,
                                    La Jolla, CA 92093-0221, USA.
Shipping Address:         8605 La Jolla Shores Drive, Nierenberg Hall, Room 219C
                                    La Jolla, CA 92037-0239, USA
Telephone:                    (619)534-7513 [FAX: (619)534-7452]
Internet Access:            http://www-c4.ucsd.edu/~cids/

3.1.5 APE-THESEO Data Archive and Analysis Center

The Airborne Platform for Earth (APE) Third European Stratospheric Experiment on Ozone (THESEO) Project will conduct aircraft operations from the Seychelles (located in the western Indian Ocean) that will overlap with part of the INDOEX flight operations. Some flight coordination between the projects may take place. The APE-THESEO Project has its own data protocol, which will strive for the same openness and free access to data as described for INDOEX. Data formats will typically follow NASA guidelines. Data from this field campaign will be archived with other THESEO Program information located at the Norwegian Institute for Air Research (NILU) in Norway.

Contact Information:
Contact:                       Dr. Geir Braathen (geir@nilu.no)
Mailing Address:          NILU
                                    P.O. Box 100, N-2007 Kjeller, NORWAY
Telephone:                    (47) 63 89 8180 [FAX: (47) 63 89 8650]
Internet Access:            http://www.nilu.no/first-e.html/

3.2 Investigator Requirements

The first step in organizing the data management support is to understand what data are anticipated from the various components of the program. JOSS developed and distributed an initial questionnaire (November 1997) to survey the INDOEX participants and is in the process of documenting this information from the individual PIs. The questionnaire (and future requested sample data sets) will be used to obtain detailed information regarding the various data sets (e.g. data format, data set size, data frequency and resolution, and real-time operational requirements). This will assist the INDOEX Data Archive Centers in handling and processing the data as well as developing any necessary format converters. Appendix A provides results from the questionnaire for the research data sets, and Appendix B documents similar information for the operational data to be obtained for INDOEX. The INDOEX International Steering Committee has agreed that tasks associated with INDOEX data acquisition (e.g. in-field record keeping, backing up field data, data documentation [for catalog purposes], provision of data to data processing locations, and processing of raw data into geophysical parameters) are the responsibility of, and will be performed by, the participating PIs (see Section 2.2). The PIs will be requested to document data sets adequately, in accordance with the following documentation guidelines, so that the data (and associated metadata) can be included in the INDOEX On-line Field Catalog (Section 3.3.1) and in the Data Archive and Analysis Centers.

3.2.1 Data Format Conventions

The initial (or "raw") field data sets produced by the PIs' instrumentation will be recorded in a variety of formats (WMO level I and IIA data). It is important that processed and calibrated data be converted to an easily accessible format (using engineering units) for dissemination to the INDOEX scientific community and eventually the larger scientific community. JOSS will work with the PIs to establish format standards for data submitted first to the On-line Field Catalog and then to the respective Data Archive Centers. It is important to set the format convention prior to data collection so that the Centers can plan for data conversion software and storage requirements. However, there may be certain situations where conversion to a final format must occur after the data are received at the Center(s) and prior to dissemination.

3.2.2 Data Submission Requirements

The INDOEX International Steering Committee and members of the INDOEX operations and data management team met in Utrecht, Netherlands on 20-23 June 1998 to discuss aspects of the upcoming field season (IFP). Out of these discussions came a list of requirements for all INDOEX PIs to follow when collecting and providing data for the INDOEX Data Archive. In order to develop the archive plans (and any associated software), the PIs will be requested to send sample data sets to the respective Data Archive Centers. This allows for clarification of the formats to be received during the program. Abiding by these requirements will also ensure consistency, flexibility, and utility of the data for operations planning and review, as well as inter-comparisons and preliminary analysis activities. The following are specific requirements for the PIs to follow when submitting their data:

HEADER SPECIFICATIONS

Standard header records MUST be attached to the file, either as a label on a diskette or as text within the file itself (preferred). JOSS will have a utility script available in Male to automate the addition of these header records to the ASCII files. The header records must contain the following information:

FILE START TIME (UTC) =     YYYYMMDDHHMMSS.S (YYYY is numeric year, MM is numeric
                            month 01-12, DD is calendar day following UTC clock,
                            HHMMSS.S is time in UTC)
                            (e.g. year=1999, month=01, day=15, hour=13,
                            minute=55, seconds=30.5 gives 19990115135530.5)
FILE STOP TIME (UTC) =      YYYYMMDDHHMMSS.S (See above)
PI NAME =                   Text (PI name and affiliation)
PLATFORM =                  Text (e.g. C130, CITATION, BROWN, SK, KCO, DG, etc.)
INSTRUMENT =                Text (instrument name)
LOCATION =                  Text (Fixed site coordinates or "mobile")
DATA VERSION =              Text or integer (unique ID for latest version) JOSS will
                            add date/time stamp upon receipt
REMARKS =                   Text (PI remarks that aid in understanding data file
                            structure and contents. Items such as file type, how
                            missing and/or bad data are denoted)
##END OF HEADER##           Required last line of the header

NOTE - Other information that should be contained in the header:

Missing Value indicator =        Text or integer (value used for missing data,
                                 e.g. -99 or 999.99)
Below Measurement Threshold =    Text or integer (value used to signify a reading
                                 below the instrument detection threshold,
                                 e.g. <0.00005)
Above Measurement Threshold =    Text or integer (value used to signify a reading
                                 above instrument saturation)
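
The header layout above can be generated programmatically. The following Python sketch is illustrative only and is not the JOSS utility script mentioned above; the function names (format_utc, build_header) and their arguments are hypothetical, and fractional seconds are assumed to be truncated (not rounded) to tenths.

```python
from datetime import datetime, timezone

END_OF_HEADER = "##END OF HEADER##"


def format_utc(t: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDDHHMMSS.S in UTC."""
    if t.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    t = t.astimezone(timezone.utc)
    tenths = t.microsecond // 100_000  # assumption: truncate to tenths of a second
    return f"{t:%Y%m%d%H%M%S}.{tenths}"


def build_header(start, stop, pi_name, platform, instrument,
                 location, version, remarks=()):
    """Return the standard header records as text, ending with the required last line."""
    lines = [
        f"FILE START TIME (UTC) = {format_utc(start)}",
        f"FILE STOP TIME (UTC) = {format_utc(stop)}",
        f"PI NAME = {pi_name}",
        f"PLATFORM = {platform}",
        f"INSTRUMENT = {instrument}",
        f"LOCATION = {location}",
        f"DATA VERSION = {version}",
    ]
    # One REMARKS record per remark, as in the sample data file.
    lines += [f"REMARKS = {remark}" for remark in remarks]
    lines.append(END_OF_HEADER)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    start = datetime(1999, 1, 15, 13, 55, 30, 500000, tzinfo=timezone.utc)
    stop = datetime(1999, 1, 15, 14, 15, 30, tzinfo=timezone.utc)
    print(build_header(start, stop, "Doe, John (NCAR)", "C130",
                       "C-130 External Sampler Data", "mobile", "1.0",
                       remarks=["Missing data = 99.9; Bad data = 88.8"]))
```

The returned text can be written ahead of the column headings and data rows of an ASCII file.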
 

A sample data file is listed below, delineated by rows of dashes. It is an example of an ideal data set in which the header precedes the reported data, and the data are organized in columns separated by spaces. Each column is identified by parameter, and each parameter's units of measure are listed in the respective column. Each row also has a date/time of observation reported in Coordinated Universal Time (UTC). This organization is ideal for plotting and inter-comparison of data in the field (On-line Field Catalog). This format should be used whenever possible and can easily be produced automatically from a spreadsheet program.

------------------------------------------------------------------------------------------------------------
DID = 100
FILE START TIME (UTC) = 0821133500
FILE STOP TIME (UTC) = 0821135500
PI NAME = Doe, John (NCAR)
PLATFORM = C130
INSTRUMENT = C-130 External Sampler Data
LOCATION = mobile
DATA VERSION = 1.0
REMARKS = National Center for Atmospheric Research, INDOEX
REMARKS = ppm values are mole fraction
REMARKS = nM/m3 at 25c and 101.3 kPa; DMS and NH4 in Parts per million (PPM)
REMARKS = Missing data = 99.9; Bad data = 88.8
REMARKS = Data point Date/Time provided in UTC
##END OF HEADER##
DATE/TIME     SAMPLE     NO2      CO       DMS     NH4
   UTC        NUMBER    nM/m3     PPM      PPM     PPM

0821133500.0  E1.160.1  1000.65  200.67  345.98  2342.980
0821133510.0  E1.160.2  1003.45  200.60  349.76  2353.345
etc.
------------------------------------------------------------------------------------------------------------
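
A file organized like the sample above can be read with a few lines of code. The following Python sketch is one possible reader, not an official INDOEX tool; the function name parse_data_file is hypothetical, and it assumes that column names and units contain no embedded spaces and that the two rows after the header are the column names and the units. Values are kept as text so that PI-defined missing or bad data flags are not altered.

```python
END_OF_HEADER = "##END OF HEADER##"

# Shortened copy of the sample file, used only to exercise the sketch.
SAMPLE = """\
PI NAME = Doe, John (NCAR)
PLATFORM = C130
REMARKS = ppm values are mole fraction
REMARKS = Missing data = 99.9; Bad data = 88.8
##END OF HEADER##
DATE/TIME     SAMPLE     NO2      CO       DMS     NH4
   UTC        NUMBER    nM/m3     PPM      PPM     PPM

0821133500.0  E1.160.1  1000.65  200.67  345.98  2342.980
0821133510.0  E1.160.2  1003.45  200.60  349.76  2353.345
"""


def parse_data_file(text):
    """Split a file into header records, column names, units, and data rows."""
    lines = text.splitlines()
    header = {}
    body_start = None
    for i, line in enumerate(lines):
        if line.strip() == END_OF_HEADER:
            body_start = i + 1
            break
        if "=" in line:
            # Split on the first "=" only, so remarks may contain "=" themselves.
            key, value = line.split("=", 1)
            header.setdefault(key.strip(), []).append(value.strip())
    if body_start is None:
        raise ValueError("missing required line: " + END_OF_HEADER)

    body = [ln.split() for ln in lines[body_start:] if ln.strip()]
    if len(body) < 2:
        raise ValueError("expected a column-name row and a units row")
    names, units = body[0], body[1]
    rows = [dict(zip(names, fields)) for fields in body[2:]]
    return header, names, units, rows


if __name__ == "__main__":
    header, names, units, rows = parse_data_file(SAMPLE)
    print(header["PLATFORM"], names, units)
    for row in rows:
        print(row["DATE/TIME"], row["CO"])
```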
 

**NOTE** This type of header information cannot be contained within GIF and Postscript files. Such files must be submitted with an attached file or separate documentation containing this information.

DATA SPECIFICATIONS

  1. For every mobile platform data set, the PIs are responsible for providing location information (i.e. latitude, longitude, altitude) and time of collection for each data point. This may be done by: (a) providing latitude, longitude, time, and altitude (for aircraft data) with each point in the submitted data file; or (b) providing time of collection of each data point in the submitted file, with an associated file containing time and location either from the platform navigation database or GPS file.
  2. If the data in the file are comma delimited, the decimal separator must be a period, not a comma.
  3. Files should be ASCII, NetCDF, GIF, or Postscript whenever possible. Preferred format for ASCII data files is tab, space, or comma delimited columns, with a UTC date/time stamp at the beginning of each line.
  4. All data files must contain variable names and units of measurements as column headings (if applicable).
  5. If, for some reason, the PI cannot provide the date/time in the format shown above, it is important that the time be given in UTC. If local time is also supplied, a conversion to UTC must be provided. In addition to UTC and/or local time, time may also be given in seconds since the start of instrument collection (shown in the file header).
  6. The internal format structure of the file should remain constant after the first submission of data to ensure continuity and permit plotting and graphing.
  7. Only COMPLETE replacement files will be accepted.
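
Option (b) of item 1 implies that someone must later join each data point to a platform position by time. The following Python sketch shows one way this could be done, by taking the navigation fix nearest in time to each data point. It is illustrative only: the function name attach_positions, the tuple layouts, and the 5-second tolerance are assumptions, and times are taken to be seconds on a common UTC-based clock.

```python
from bisect import bisect_left


def attach_positions(data, nav, max_gap_s=5.0):
    """Attach the nearest-in-time navigation fix to each data point.

    data: list of (time_s, value) tuples.
    nav:  list of (time_s, lat, lon, alt) tuples sorted by time_s.
    Returns (time_s, value, lat, lon, alt) tuples; the position fields are
    None when no fix lies within max_gap_s seconds of the data point.
    """
    if not nav:
        raise ValueError("navigation file has no fixes")
    nav_times = [fix[0] for fix in nav]
    merged = []
    for t, value in data:
        i = bisect_left(nav_times, t)
        # The nearest fix is either just before or just after the data time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(nav)]
        j = min(candidates, key=lambda k: abs(nav_times[k] - t))
        if abs(nav_times[j] - t) <= max_gap_s:
            _, lat, lon, alt = nav[j]
            merged.append((t, value, lat, lon, alt))
        else:
            merged.append((t, value, None, None, None))
    return merged


if __name__ == "__main__":
    nav = [(0.0, 4.0, 73.0, 100.0), (10.0, 4.1, 73.1, 200.0)]
    data = [(2.0, 1.5), (9.0, 2.5), (30.0, 3.5)]
    for record in attach_positions(data, nav):
        print(record)
```

Interpolating between fixes may be more appropriate for fast-moving platforms; nearest-fix matching is used here only because it is the simplest choice.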

3.3 Data Collection Schedule

INDOEX data collection tasks span several periods. Operational data sets will be collected during an Enhanced Observation Phase (EOP) beginning in 1997 and lasting 1 to 3 years, depending on the platform. During this period, data will be collected to support a series of Pre-INDOEX cruises on the Indian R/V Sagar Kanya. C4/SIO and NPL will have primary responsibility for coordinating receipt of research-quality operational data sets during this entire period. The three-month IFP data collection will begin in the INDOEX region on 1 January 1999 and conclude approximately 31 March 1999. During this period, a 6-8 week period of research aircraft flights and ship cruises will be conducted. This period will also include data collected during ferry flights. JOSS, NPL, and LMD will have primary responsibility for coordinating receipt and archival of research-quality operational data sets during this entire period. Other data and metadata for complete documentation of field season activities (i.e. status reports and mission summaries) will be collected from the PIs by JOSS as operations dictate. All this information, along with selected research and operational data sets, will be entered into the On-line Field Catalog on a near real-time basis.

3.3.1 On-line Field Catalog

JOSS will develop and maintain an On-line Field Catalog that will be functional during the INDOEX IFP. The catalog will be implemented using a WWW browser interface and will be operational at the Male Operations Center [http://nimbus.indoex.edu.mv/catalog ] with a "mirror" site in Boulder, CO [http://catalog.eol.ucar.edu/indoex ]. Data collection information about both operational and research data sets (including metadata and overview documentation) will be entered into the system in near real-time beginning 1 January 1999. The catalog will permit data entry (data collection details, field summary notes, certain operational data, etc.), data browsing (listings, plots), and limited catalog information distribution. Daily summaries (see Section 4.4) will be prepared and will contain information regarding operations (aircraft flight times, major instrument systems sampling times, POES overpasses, etc.). These summaries will be entered into the On-line Field Catalog as they become available. It is important and desirable for the PIs to contribute graphics (e.g. plots in GIF or Postscript format) and/or data for retention on the catalog whenever possible. Updates of the status of instrumentation and data collection (on a daily basis or more often, depending on the platforms) will be available. Input requirements for the On-line Field Catalog used during the IFP for status updating are discussed below (Section 3.3.1.1). The full On-line Field Catalog will ONLY be accessible from the Local Area Network at the INDOEX Field Operations Center (Male) for data security reasons. Scientific community access to status information, mission summaries, and selected data sets (primarily operational data sets) will be available on the fully open mirrored On-line Field Catalog system in Boulder. Following the IFP, the entire catalog will be available through the JOSS address (with password-protected data sets where applicable).
NOTE - If any specialized plots of data are required in the field, such plots must be pre-defined and communicated to JOSS. It is highly desirable to have this input prior to 15 November 1998.

3.3.1.1. Submitting Data/Products to the On-line Field Catalog

A. FTP

Data files and products can be transferred within the INDOEX network to the On-line Field Catalog server [nimbus.indoex.edu.mv (IP address: 202.1.195.195)]. Data files and products generated outside the INDOEX network at the Male Operations Center should be sent to the On-line Field Catalog server in Boulder, CO [catalog.eol.ucar.edu (IP address: 128.117.150.208)]. Users should FTP with user name anonymous and their e-mail address as the password, then cd (change directory) to /pub/incoming/indoex. Place the file in this directory using the FTP put command (or mput for multiple files). Users may not be able to list files in this subdirectory for computer security reasons. File names MUST meet the following format criteria:

General file structure is: product_source.YYMMDDHHmm.product_name.ext

Where:

Product_source = Name of source or platform (e.g. C130, FSU, KCO, SK, BROWN, etc.)

YYMMDDHHmm = 2-digit year, 2-digit month, 2-digit day, and 4-digit time (hour and minute) in UTC (e.g. 9901301200 ).

Product_name = The name of the instrument or data/product (e.g. satellite_composite, KCO_radiation, SK_GLASS, etc.)

ext = File extension (.gif = GIF image; .jpg and .jpeg = JPEG image; .ps = Postscript format [image files only, no text or word-processed documents]; .txt = ASCII text; .html = HTML-formatted text)

(NOTE - Other image formats (.tiff, .miff, .bmp, .pict, .pcx, .xpm, .xbm) will also be accepted and converted to .gif by the process-files script.)
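
The file naming convention and the anonymous FTP steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the catalog's own software: the function names are hypothetical, source and product names are assumed to contain no periods or spaces, and the two-digit year is assumed to pivot at 69 (69-99 read as 19xx, 00-68 as 20xx). The upload function only mirrors the login, cd, and put sequence and is not exercised here.

```python
import re
from datetime import datetime
from ftplib import FTP

# Extensions named in the convention, plus image formats converted to .gif.
DIRECT_EXT = {"gif", "jpg", "jpeg", "ps", "txt", "html"}
CONVERTED_EXT = {"tiff", "miff", "bmp", "pict", "pcx", "xpm", "xbm"}
NAME_RE = re.compile(r"([^.\s]+)\.(\d{10})\.([^.\s]+)\.([A-Za-z]+)")


def parse_stamp(stamp: str) -> datetime:
    """Decode YYMMDDHHmm; raises ValueError for impossible dates or times."""
    yy, mo, dd, hh, mi = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
    year = 1900 + yy if yy >= 69 else 2000 + yy  # assumed century pivot
    return datetime(year, mo, dd, hh, mi)


def is_valid_name(name: str) -> bool:
    """Check product_source.YYMMDDHHmm.product_name.ext."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        return False
    try:
        parse_stamp(m.group(2))
    except ValueError:
        return False
    return m.group(4).lower() in DIRECT_EXT | CONVERTED_EXT


def build_name(source: str, when: datetime, product: str, ext: str) -> str:
    """Assemble a catalog file name; 'when' is assumed to already be in UTC."""
    name = f"{source}.{when:%y%m%d%H%M}.{product}.{ext.lstrip('.').lower()}"
    if not is_valid_name(name):
        raise ValueError(f"name does not meet the convention: {name}")
    return name


def upload(host: str, local_path: str, remote_name: str, email: str) -> None:
    """Anonymous FTP put into /pub/incoming/indoex, following the steps above."""
    with FTP(host) as ftp:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd("/pub/incoming/indoex")
        with open(local_path, "rb") as fh:
            ftp.storbinary(f"STOR {remote_name}", fh)


if __name__ == "__main__":
    print(build_name("C130", datetime(1999, 1, 30, 12, 0),
                     "satellite_composite", "gif"))
```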

B. E-mail

Submissions via e-mail are possible. Status reports and HTML or text data products to be put on the catalog may be e-mailed to catalog@nimbus.indoex.edu.mv. This address is valid only for the IFP field phase. Following the closure of the Operations Center, all data will be available through the JOSS catalog in Boulder (http://catalog.eol.ucar.edu/indoex ). The proper SUBJECT line must be used in the e-mail to direct the message to the proper location in the catalog. The following subject lines should be used (case does not matter):

      To: catalog@nimbus.indoex.edu.mv
      Subject: report YYMMDD.project-type

      -- where project-type is an appropriate name of your choosing (the name
         will be automatically added after submission).

Or for product submissions:

      To: catalog@nimbus.indoex.edu.mv
      Subject: product YYMMDDHHMM.product-name

      -- where product-name is the name of the product, such as
         surface-forecast, radar_image, wind_plot, etc.
         (no spaces in the name please).
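
The subject lines above can be assembled in code so that the date stamp and name are always well formed. The following Python sketch is illustrative only; the function names and the sender address are hypothetical, and dates are assumed to be in UTC, as elsewhere in this document. It builds the message but does not send it.

```python
from datetime import date, datetime
from email.message import EmailMessage

CATALOG_ADDRESS = "catalog@nimbus.indoex.edu.mv"


def _check_name(name: str) -> str:
    """Names must be non-empty and must not contain spaces."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("name must be non-empty and contain no spaces")
    return name


def report_subject(day: date, name: str) -> str:
    """Subject line for a status report: report YYMMDD.name"""
    return f"report {day:%y%m%d}.{_check_name(name)}"


def product_subject(when: datetime, name: str) -> str:
    """Subject line for a product: product YYMMDDHHMM.name"""
    return f"product {when:%y%m%d%H%M}.{_check_name(name)}"


def catalog_message(subject: str, body: str, sender: str) -> EmailMessage:
    """Build (but do not send) a catalog submission message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = CATALOG_ADDRESS
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    subject = product_subject(datetime(1999, 1, 30, 12, 0), "wind_plot")
    print(catalog_message(subject, "Wind plot attached.", "pi@example.org"))
```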

C. Floppy disk

Products or information provided on floppy disks should be named for recognition by the catalog. The user/provider is responsible for naming the file to meet the criteria specified in section B above. The floppy should be given to an INDOEX Project Office staff member to load into the catalog.

D. Hard copy via scanner

Maps, charts and other paper products can be scanned in using the scanner system in the Operations Center. A member of the INDOEX Project Office staff can assist you initially with this if needed.

E. Form submission for entering Status reports

WWW/html "fill-in" forms are also available for entering reports by following the "OTHER Information" link on the main catalog page. There are two general types. The first is used to prepare a daily summary of INDOEX activities. The second is an instrumentation status form, one for each of the major measuring systems. Each status and daily summary form requires a password in order to submit information. Passwords will be provided to individuals as required.

3.4 Data Processing following the IFP

It is important that all INDOEX PIs concentrate on post field season data processing activities to assure timely availability of data sets to the INDOEX Data Archive and Analysis Centers. The PIs will have complete responsibility for the processing and delivery of their data to the respective Data Archive and Analysis Centers. Following the wishes of the INDOEX International Steering Committee, the PIs will release for public access their field related databases as soon as possible and within 12 months of the end of the INDOEX IFP, or approximately by 1 April 2000. Any departures from this schedule must be strongly justified by the PI, and will be considered on a case-by-case basis by the INDOEX Project Office or the INDOEX International Steering Committee. As data sets are received by the respective Data Archive Centers, they will be promptly made available to the scientific community. All operational data (see Appendix B) will be staged and freely accessible by the entire scientific community as soon as possible after the field season (6 months or earlier following the IFP).

The impact of timely receipt of the data on further steps in the data processing scheme is summarized with the time line in Fig. 2. The "preliminary" data will be in "native" resolution and format, that is, in the format and resolution the PI produces in their initial data processing. It is hoped that most preliminary research and certainly all operational data sets will become available within 6 months of the end of the IFP. If the PI requests, "preliminary" data sets submitted to the Data Archive Centers could be password protected prior to the 12-month release date. Between the end of the IFP and the time the PI submits data to the respective INDOEX Data Archive Centers, each PI will be individually responsible for the distribution and support of their data sets.

3.5 Data Integration

The INDOEX investigators have expressed a clear need for an integrated analysis of a subset of the data that combines measurements from different instrumentation or platforms into a single consistent database. Data integration will be performed primarily at the Data Analysis Centers, using CIDS (see Section 3.1.4). C4 will evaluate candidate data sets for potential use in such an integrated database, based on consistent spatial and temporal measurements. However, the PIs must provide to C4 (and the Data Archive Centers) all final spatial and temporal information in the data sets as part of the post-field data processing (see Section 3.2.2). CIDS could then, for example, merge location (e.g. aircraft position) with other data as appropriate (e.g. satellite pixel radiance). It is clear that some data lend themselves to this type of integration while others do not. C4 will be consulting with the PIs during the field and post-processing phases of the experiment to determine which data sets should be considered for use in a potential integrated data set.
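
To make the aircraft position and satellite pixel example concrete, the following Python sketch looks up the nearest pixel of a regular latitude/longitude grid for each point along a flight track. It is a simplified illustration and does not describe how CIDS performs collocation; the Grid class, the collocate function, and the grid layout (lat0 and lon0 as the center of the first pixel, fixed spacing) are assumptions, and matching in time is ignored.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    """Regular lat/lon grid; (lat0, lon0) is the center of pixel [0][0]."""
    lat0: float
    lon0: float
    dlat: float
    dlon: float
    radiance: list  # list of rows, each a list of pixel values

    def nearest(self, lat: float, lon: float):
        """Return the value of the pixel nearest (lat, lon), or None if off-grid."""
        row = round((lat - self.lat0) / self.dlat)
        col = round((lon - self.lon0) / self.dlon)
        if 0 <= row < len(self.radiance) and 0 <= col < len(self.radiance[0]):
            return self.radiance[row][col]
        return None


def collocate(track, grid: Grid):
    """Append the nearest pixel value to each (time, lat, lon, alt) track point."""
    return [(t, lat, lon, alt, grid.nearest(lat, lon))
            for t, lat, lon, alt in track]


if __name__ == "__main__":
    grid = Grid(-10.0, 60.0, 0.5, 0.5, [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    track = [(0.0, -9.6, 60.9, 3000.0), (10.0, 5.0, 60.0, 3000.0)]
    for record in collocate(track, grid):
        print(record)
```

This only works if every data point carries final position and time information, which is why that information is requested from the PIs.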

3.6 Data Archival and Long-term Access

The INDOEX data sets will be archived and distributed through the various INDOEX Data Archive Centers. These Data Archive Centers will be linked and will contain all "shared" data sets, that is, all research and operational data that will eventually be accessible by the general scientific community. JOSS has the responsibility for getting all U.S. data sets, NPL for the Indian data sets, and LMD for the European data sets into a long-term archive. As directed by the INDOEX International Steering Committee, research data sets will be available, on a restricted basis, as PIs provide processed data to the respective Data Archive and Analysis Centers (Section 3.1). As shown in Fig. 2, data will be accessible to all INDOEX PIs within 6 to 12 months of the completion of the IFP. Then, following the schedule described in Chapter 2.0, the data sets will be freely available to the general scientific community no later than 1 April 2000.