README for Accessing 3B42RT from the Experimental TRMM Real-Time
Multi-Satellite Precipitation Analysis (MPA-RT) Data Set

George J. Huffman
20 June 2003

Introduction

The system to produce the "TRMM and Other Data" estimates in real time was developed to apply new concepts in merging quasi-global precipitation estimates and to take advantage of the increasing availability of input data sets in near real time. The product is produced quasi-operationally on a best-effort basis at TSDIS, with ongoing scientific development by a research team in the GSFC Laboratory for Atmospheres. As such, users are encouraged to report their experiences with the data, and they should expect episodic upgrades or outages as the system develops.

Product Definition

3B42RT (Merger of HQ and VAR)

A merger of 3B40RT (HQ) and 3B41RT (VAR). The current scheme is simple replacement: for each gridbox the HQ value is used if available, and otherwise the VAR value is used.

File Contents and Format

Table 1  File layout for 3B40RT, 3B41RT, 3B42RT.

      -------------------------
      |        3B42RT
Block |Byte Count| Field
------+----------+-------
  1   |    2880  | header
  2   | 1382400& | precip
  3   | 1382400& | error
  4   |  691200@ | source
-------------------------
& INTEGER*2, 60 deg N-S
@ INTEGER*1, 60 deg N-S

Header

Each file starts with a header that is one 2-byte-integer row (1440 grid boxes x 2 bytes) in length, or 2880 bytes. The header is ASCII in a "PARAMETER=VALUE" format that makes the file self-documenting (e.g., "algorithm_ID=3B40RT"). As such, the header can be read with standard text editors, dumped as text with simple application programs, or parsed for input into applications. Successive "PARAMETER=VALUE" sets are separated by spaces, and no spaces or "=" are permitted in PARAMETER or VALUE.

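
The "PARAMETER=VALUE" layout described above lends itself to a very short parser. The following is a minimal sketch in Python, not one of the distributed example programs. The function name parse_header and the sample header fragment are hypothetical, and the sketch assumes that any unused bytes at the end of the 2880-byte header are spaces or NULs.

```python
HEADER_BYTES = 2880  # header length stated in Table 1


def parse_header(raw: bytes) -> dict:
    """Split the ASCII header into a {PARAMETER: VALUE} dictionary."""
    # Assumption: padding after the last pair consists of spaces or NULs.
    text = raw[:HEADER_BYTES].decode("ascii", errors="replace")
    text = text.replace("\x00", " ")
    fields = {}
    for token in text.split():
        # Neither PARAMETER nor VALUE may contain "=", so one split suffices.
        name, sep, value = token.partition("=")
        if sep:  # ignore any token that is not a PARAMETER=VALUE pair
            fields[name] = value
    return fields


# Hypothetical header fragment for illustration, padded to 2880 bytes.
sample = b"algorithm_ID=3B42RT byte_order=big_endian header_byte_length=2880"
sample = sample.ljust(HEADER_BYTES)
header = parse_header(sample)
print(header["algorithm_ID"], header["byte_order"])
```

All values are returned as strings; list-valued entries such as variable_name can be split on commas by the caller.
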
The current PARAMETER entries and definitions are:

PARAMETER                 Definition
algorithm_ID              TRMM algorithm identifier (e.g., "3B40RT")
algorithm_version         Version of the science algorithm
granule_ID                TSDIS granule identifier
                          (e.g., "3B40RT.2001121809.bin")
header_byte_length        Number of bytes in the header
file_byte_length          Number of bytes in the file, expressed as a
                          formula describing the file structure
nominal_YYYYMMDD          Nominal UTC year, month, and day of the month
nominal_HHMMSS            Nominal UTC hour, minute, and second
begin_YYYYMMDD            Start UTC year, month, and day of the month
begin_HHMMSS              Start UTC hour, minute, and second
end_YYYYMMDD              End UTC year, month, and day of the month
end_HHMMSS                End UTC hour, minute, and second
creation_YYYYMMDD         Date the file was created as year, month, and
                          day of the month
west_boundary             Longitude of the western edge of the data domain
east_boundary             Longitude of the eastern edge of the data domain
north_boundary            Latitude of the northern edge of the data domain
south_boundary            Latitude of the southern edge of the data domain
origin                    Geographical direction of the first grid box
                          from the grid center
number_of_latitude_bins   Number of grid boxes in the meridional direction
number_of_longitude_bins  Number of grid boxes in the zonal direction
grid                      Size of one grid box
first_box_center          Geolocation of the first grid box center
second_box_center         Geolocation of the second grid box center
last_box_center           Geolocation of the last grid box center
number_of_variables       Number of data fields
variable_name             List of the data field names, separated by commas
variable_units            List of data field units, separated by commas,
                          in the same order as the variable_name list
variable_scale            List of data field scaling factors, separated by
                          commas, in the same order as the variable_name
                          list
variable_type             List of data field word types, separated by
                          commas, in the same order as the variable_name
                          list
byte_order                Order of bytes in a data word ("big_endian" or
                          "little_endian")
flag_value                List of special values, separated by commas
flag_name                 List of special value names, separated by commas,
                          in the same order as the flag_value list
contact_name              Name of the person to contact with questions
contact_address           Postal address of the contact_name
contact_telephone         Telephone number of the contact_name
contact_facsimile         Facsimile number of the contact_name
contact_email             Email address of the contact_name

Thereafter the data fields follow. All the fields are on a 0.25-deg lat./long. grid that increments most rapidly to the east (from the Prime Meridian) and then to the south (from the northern edge). Grid box edges are on multiples of 0.25 deg. The data fields are written as binary data in big-endian byte order.

3B42RT

Following the header, 3 data fields appear:

   precipitation        (2-byte integer)
   precipitation_error  (2-byte integer)
   source               (1-byte integer; -1, 0, 100 stand for none, HQ, VAR)

All fields are 1440x480 gridboxes (0-360 deg. E, 60 deg. N-S). The first grid box center is at (0.125 deg. E, 59.875 deg. N). Files are produced every 3 hours on synoptic observation hours (00 UTC, 03 UTC, ..., 21 UTC) using that hour's 3B40RT and 3B41RT data sets. Valid estimates are only provided in the band 50 deg. N-S.

Note that we use the term "gridbox" to denote the values on Level 3 data (i.e., gridded data), while we use the term "pixel" to denote individual values of Level 2 data (i.e., instrument footprints). Thus, there can be many pixels contributing to a gridbox.

Both precipitation and random error are scaled by 100 before conversion to 2-byte integer. Thus, units are 0.01 mm/h. To recover the original floating-point values in mm/h, divide by 100. Missing values are assigned the 2-byte-integer missing value, -31999. The remaining fields are in numbers of pixels, except the source variable, which is dimensionless.

Currently the random error fields are all set to the 2-byte-integer missing value, -31999. This placeholder will be replaced with actual estimates as development proceeds.

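
The grid ordering and scaling described above can be sketched as follows. This is a minimal illustration in Python under the stated conventions (longitude varies fastest, rows run south from 60 deg. N, values are big-endian 2-byte integers scaled by 100). The function names box_center and unpack_row and the use of 0-based positions are assumptions made for the example.

```python
import struct

N_LON, N_LAT = 1440, 480   # grid dimensions given in the text
BOX = 0.25                 # grid box size in degrees
MISSING = -31999           # 2-byte-integer missing value


def box_center(index: int) -> tuple:
    """Return (lat, lon) of the box at a 0-based position within a field."""
    row, col = divmod(index, N_LON)   # longitude increments most rapidly
    lat = 60.0 - BOX * (row + 0.5)    # rows run south from the northern edge
    lon = BOX * (col + 0.5)           # columns run east from the Prime Meridian
    return lat, lon


def unpack_row(raw: bytes) -> list:
    """Decode big-endian 2-byte integers into mm/h, with None for missing.

    Negative values other than MISSING are passed through unchanged.
    """
    counts = struct.unpack(">%dh" % (len(raw) // 2), raw)
    return [None if c == MISSING else c / 100.0 for c in counts]


print(box_center(0))                  # first grid box center
print(box_center(N_LON * N_LAT - 1))  # last grid box center
print(unpack_row(struct.pack(">3h", 250, 0, MISSING)))
```

The first call reproduces the first grid box center quoted in the text, which is a quick way to confirm that the row and column conventions have been applied in the right order.
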
The variable ambiguous_pixels is the count of pixels for which the algorithm cannot determine whether the scene has valid or invalid data. It is a subset of the total_pixel, and many, but not all, of these pixels are included in raining_pixels. In general, a "high" fraction of ambiguous_pixels indicates that the grid box value is invalid.

The originating machine on which the data files are written is a Silicon Graphics, Inc. Unix workstation, which uses the "big-endian" IEEE 754-1985 representation of 4-byte floating-point unformatted binary numbers. Some CPUs, including PCs and DEC machines, might require a change of representation (i.e., byte swapping) before using the data. In some cases, the gunzip routine, used to uncompress the data, will change representations automatically.

Special Values

All of the scaled 2-byte-integer precipitation and random error fields have one value with special meaning. Any grid box with insufficient valid data to make an estimate is assigned the 2-byte-integer value -31999. In addition, the scaled 2-byte-integer precipitation and random error fields are clipped to [-31998,31998] to prevent duplication of the missing value (at the negative end) or overflows (at both the positive and negative ends). Any instances of clipping should be reported immediately to the dataset developers.

The 3B42RT precipitation values outside the 50 deg. N-S latitude band are considered experimental and are encoded as (-p - 0.01), where "p" is the original precipitation value, before conversion to scaled 2-byte-integer. Thus, users can recover the estimated value of such gridboxes if desired, but the usual scheme of requiring precipitation to be non-negative will filter out these suspect values.

The 3B42RT "source of estimate" field has only three discrete values, -1, 0, and 100, which correspond to "no estimate", "HQ", and "VAR", respectively.

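
The special-value rules above can be sketched as a small decoder. This is a minimal illustration in Python, not the developers' code. It assumes that the (-p - 0.01) encoding is applied in mm/h before the scaling by 100, so that a stored value s < 0 corresponds to p = (-s - 1) / 100. The names decode_precip and SOURCE_NAMES and the status strings are hypothetical.

```python
MISSING = -31999
SOURCE_NAMES = {-1: "no estimate", 0: "HQ", 100: "VAR"}


def decode_precip(scaled: int):
    """Return (mm_per_hour, status) for one scaled 2-byte precipitation value."""
    if scaled == MISSING:
        return None, "missing"
    if scaled < 0:
        # Assumed encoding outside 50 deg. N-S:
        #   stored = (-p - 0.01) * 100 = -(100 * p) - 1
        # so the original estimate is p = (-stored - 1) / 100.
        return (-scaled - 1) / 100.0, "experimental"
    return scaled / 100.0, "valid"


for value in (250, -251, -1, MISSING):
    print(value, decode_precip(value))
print(SOURCE_NAMES[100])
```

Callers that only want the validated 50 deg. N-S estimates can keep the values whose status is "valid", which matches the usual non-negative filter mentioned in the text.
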
Note that any negative values in the various "number of" fields indicate a processing error that should be reported immediately to the dataset developers.

Dataset Validation

These datasets represent a new initiative and should be considered experimental. Formal validation studies are planned, but are not yet available. The infrared results are designed to emulate the microwave results as closely as possible, so known deficiencies in the microwave will likely be reflected in the infrared as well. In addition, it is well known that infrared algorithms of the kind used here have large random errors at the fine time and space scales provided. However, we expect the infrared estimates to match the histogram of microwave estimates, so that user-specified averaging should yield approximately unbiased results. We encourage early users to report successes and problems in applying these datasets to their particular applications.

Dataset Status

Beta testing began in early December 2001. An official (experimental) version was instituted in late January 2002. Processing changes occurred on 6 February and 12 March 2002. The ambiguous screening was upgraded for the HQ (3B40RT) as of 09Z 28 February 2003 and for the VAR (3B41RT) as of 00Z 2 March 2003.

Users should anticipate a series of versions as the algorithm is developed further. We definitely plan to transition to the new TRMM versions of input (which governs calibration of the SSM/I and IR) when they become available in late 2003. In addition, an improvement of the real-time system will be instituted in the official Version 6 TRMM operational product 3B42.

Currently, access is by anonymous ftp to aeolus.nascom.nasa.gov. Under subdirectory pub/merged there are three directories:

   CombinedMicro  (3B40RT)
   CalibratedIR   (3B41RT)
   MergeIRMicro   (3B42RT)

Example Programs

The data fields are all written with C-language code as blocks of bytes, so there are no extraneous bytes in the files.
Because the first two fields are 2-byte integers and the rest are 1-byte integers in each file (to save space), users must exercise care in using FORTRAN direct access to read the data. Both example programs read all fields in FORTRAN with a single OPEN. Alternatively, the files can be opened with different logical record sizes depending on whether one is reading 2-byte-integer or 1-byte-integer fields. Note well that the unit of the logical record size is not specified by the FORTRAN 77 standard. On SGI machines it is in 4-byte words, but some other systems expect it in bytes.

Also, to repeat an earlier comment, the originating machine on which the data files are written is a Silicon Graphics, Inc. Unix workstation. It uses the "big-endian" IEEE 754-1985 representation of 4-byte floating-point unformatted binary numbers, and some CPUs, such as PCs and DEC machines, might require a change of representation (i.e., byte swapping) before using the data.

The FTP site ftp://agnes.gsfc.nasa.gov/pub/huffman/rt_examples/docs provides several example programs:

   read3B4XRT.c      C example
   read_header.f     FORTRAN header-read example
   read_rt_file.f    FORTRAN single-read example
   read_rt_file.pro  IDL example
   read_rt_lines.f   FORTRAN line-by-line example

Example Images and Movies

Users may obtain example GIF images and QuickTime movies via FTP at ftp://agnes.gsfc.nasa.gov/pub/huffman/rt_examples/images_and_movies. See the index page (3B4XRT_index) for details.

Additional Documentation

Users should refer to the detailed documentation (3B4XRT_doc) and programming examples at the FTP site ftp://agnes.gsfc.nasa.gov/pub/huffman/rt_examples/docs for additional details.
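
For users working outside FORTRAN, C, or IDL, the mixed word sizes and the byte order can be handled explicitly when reading. The following is a minimal sketch in Python and is not one of the programs distributed at the FTP site. The function name read_3b42rt is hypothetical, the sketch assumes the file has already been uncompressed, and the round-trip check at the end uses a synthetic file rather than real data.

```python
import os
import struct
import tempfile

N_BOXES = 1440 * 480   # grid boxes per field
HEADER_BYTES = 2880    # header length from Table 1


def read_3b42rt(path: str):
    """Read the header text and the three data fields from one file."""
    with open(path, "rb") as f:
        header = f.read(HEADER_BYTES).decode("ascii", errors="replace")
        # ">" requests big-endian decoding regardless of the host CPU,
        # so no separate byte-swapping step is needed.
        precip = struct.unpack(">%dh" % N_BOXES, f.read(2 * N_BOXES))
        error = struct.unpack(">%dh" % N_BOXES, f.read(2 * N_BOXES))
        source = struct.unpack(">%db" % N_BOXES, f.read(N_BOXES))
    return header, precip, error, source


# Round-trip check on a synthetic file (constant fields, not real data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.bin")
    with open(path, "wb") as f:
        f.write(b"algorithm_ID=3B42RT".ljust(HEADER_BYTES))
        f.write(struct.pack(">h", 125) * N_BOXES)     # precipitation
        f.write(struct.pack(">h", -31999) * N_BOXES)  # error placeholder
        f.write(struct.pack(">b", 100) * N_BOXES)     # source
    header, precip, error, source = read_3b42rt(path)
    file_size = os.path.getsize(path)

print(file_size, precip[0] / 100.0, error[0], source[0])
```

Reading each field with its own format code sidesteps the record-size question that arises with FORTRAN direct access, at the cost of holding each field in memory as a Python tuple.
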