This section gives some additional background, ideas, tools, and routine checks for verifying and monitoring the operation of the ISS workstation.
The following list mentions a few programs which are useful for diagnostic and maintenance work. Most of the programs belong to the Zebra system, so full descriptions may be found in the Zebra documentation, such as the Advanced User's Manual. Most of the commands will provide a detailed usage summary if invoked only with the -help option.
dsdump [platform...]
This will produce a summary of all data files in the data store, for either the platforms named on the command line or for all platforms. The summary includes details on the data times and number of observations the file contains.
dsdelete {platform} {time}
Use this command to delete data from the data store prior to a given time. This command actually removes data files from the disk, so obviously it should be used carefully and only if the data to be deleted have been archived elsewhere. Run dsdelete -h, or just dsdelete with no options, to see the full summary of options.
dsrescan platform...
dsrescan -all
This command tells the data store daemon to rescan data directories and update its file database. With the -all option, the daemon rescans directories for all of the platforms. Without -all, the daemon rescans each of the platforms named as parameters on the command line. Use dsrescan to notify the data store of additions or deletions of files made directly to a data store directory, as opposed to through the Zebra commands.
mstatus
This command returns a listing of all processes connected to the Zebra message system. This is a handy way to check whether all of the correct programs are running. The listing also contains some useful information, including the process identifier of each program and the name by which the program is known to the message system. The name is needed to query the program with the zquery command.
ncdump {file}
The ncdump command can be used to examine the contents of datastore files stored in the netCDF format. Files in netCDF usually have a filename extension of .cdf or .nc.
zquery {process}
Some of the programs connected to the Zebra message system can be queried for their status. The zquery command performs this query and writes the response to the terminal window. Specify the name of the process to query in the process parameter. The Data Checks section gives an example of using zquery to check on the datastore daemon.
The usual first step in verifying proper ISS operation is to examine the graphical data displays to see that data products are being ingested, stored, and displayed properly. If data measurements are making it to the data plots, then it is a good bet they are being ingested and stored as well. To check all of the data platforms and their plots, cycle through each of the graphical display entries in the DISPLAY menu on the icon bar. Look carefully for data gaps, noisy values, or values that exceed reasonable ranges. Trust your intuition on which values look reasonable.
Next, check the Data Available display. Open this window using the DATA menu on the icon bar. For each platform, check that the end time of the data period includes the most recent data which should have been recorded. Click on a platform to display detailed information on the observations stored in that platform. In particular, check to see that data files exist at the expected frequencies and times.
Sometimes it can be reassuring to query the datastore daemon directly and ask it for its status. Among other things, a response confirms that the daemon is actually running and able to respond to other Zebra programs. In the Zebra message system, the daemon goes by the name DS_Daemon, so the query is sent with the command zquery DS_Daemon.
Use the UNIX df command to display a disk usage table. The table shows the disk space used and the disk space remaining for each partition in the system. The absolute space remaining in a partition is more important than the percentage of space remaining. The df command is almost always invoked with the -k option so that it reports disk usage in kilobytes.
If the Jaz disk is currently mounted, it will also appear in the disk usage table.
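The advice to watch absolute free space rather than percentages can be turned into a quick check. The following is only a sketch and not part of the ISS software: it assumes df -k prints the six columns shown in this chapter (filesystem, kbytes, used, avail, capacity, mount point), and the function name low_space and the 100000 KB default threshold are arbitrary choices made for illustration.

```shell
# Print the mount point and available kilobytes of every partition whose
# free space (column 4 of df -k output) is below a threshold.
# Reads df -k output on standard input. Pseudo-filesystems that report a
# size of 0 kbytes (such as /proc) are skipped.
# low_space is a hypothetical helper, not an ISS or Zebra command.
low_space() {
    awk -v min="${1:-100000}" '
        NR > 1 && $2 + 0 > 0 && $4 ~ /^[0-9]+$/ && $4 + 0 < min + 0 { print $6, $4 }
    '
}

# Typical use on a running system:
#   df -k | low_space 100000
```

A suitable threshold depends on how quickly each partition fills during a deployment, so treat the default as a placeholder.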
When the ISS transfers data back to ATD, it also sends along a summary of the system's status from the init.iss status script. The script runs several commands consecutively to generate a single status report. Some of the commands have already been mentioned, and some of the information is not very interesting for casual operations. However, if you ever encounter an unexplainable problem and seek assistance from someone else, especially over the phone, you may be asked to run this script and help analyze the output.
The command line is simply the name of the script and the method to run. The output, however, is large and not so simple, so either pipe the output into more, as shown below, or be prepared to use the terminal window scroll bars to scroll up to the beginning of the output:
105 iss2:/iss/home/iss> init.iss status | more
The output begins by identifying the host producing the output and the time of the output. Then the first section gives the output of several system status commands.
For example, the very first part of this section is the output from the disk space command df -k, which is mentioned above. In the output below, there is an entry for the Jaz drive because the device /dev/jaz was mounted on the directory /jaz at the time. The system mounts the Jaz drive automatically when it needs to access the /jaz directory and unmounts it after a minute without access, so the /dev/jaz entry will not always appear in df -k output.
The last part of this section is the output from the ifconfig command, which lists important parameters for each network interface in the system. The interfaces have names like lo0, le0, and ipdptp0. See the chapter Networking for more information on networking.
In the particular example below, you can tell that the PPP connection has been established over the ipdptp3 interface because the 1.1.1.n dummy addresses have been replaced with real 128.117.80.n internet addresses.
System status for iss1 on Thu Apr 27 04:05:10 GMT 2000
-----
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t3d0s0     357638  260151   61724    81%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
/dev/dsk/c0t1d0s0     388998  288727   61372    83%    /u2
swap                   13352     212   13140     2%    /tmp
/dev/jaz             1894319  679525 1157965    37%    /jaz
-----
 10:05pm  up 118 day(s),  4:58,  1 user,  load average: 0.35, 0.10, 0.05
-----
Users: iss
-----
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s1 s3 s4   in   sy   cs us sy id
 0 0 0    516  1400   0   8  0  0  0  0  0  0  0  0  0   19   55   34  1  2 98
-----
      tty          fd0           sd1           sd3           sd4          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt id
   2    6   0   0    0    2   0   46    2   0  133    0   0  718   1  2  0 97
-----
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
le0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 128.117.80.84 netmask ffffff00 broadcast 128.117.80.255
ipdptp0: flags=8d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 8232
        inet 1.1.1.1 --> 1.1.1.2 netmask ffffff00
ipdptp1: flags=8d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 8232
        inet 1.1.1.3 --> 1.1.1.4 netmask ffffff00
ipdptp2: flags=8d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 8232
        inet 1.1.1.5 --> 1.1.1.6 netmask ffffff00
ipdptp3: flags=28d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST,UNNUMBERED> mtu 1500
        inet 128.117.80.84 --> 128.117.80.29 netmask ffffff00
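The PPP check described above, looking for an ipdptp interface whose 1.1.1.n placeholder has been replaced by a real address, can also be done mechanically. The sketch below is an illustration only: ppp_links is a hypothetical helper, not an ISS tool, and it assumes ifconfig output in the usual layout, with an interface header line followed by an inet line.

```shell
# List ipdptp (PPP) interfaces whose local address is no longer a
# 1.1.1.n placeholder. Reads ifconfig output on standard input and prints
# the interface name, local address, and remote address.
# ppp_links is a hypothetical helper, not an ISS or Zebra command.
ppp_links() {
    awk '
        # Interface header line, for example "ipdptp3: flags=..."
        /^[a-z0-9]+:/ { iface = $1; sub(/:$/, "", iface) }
        # Address line, for example "inet LOCAL --> REMOTE netmask ..."
        $1 == "inet" && iface ~ /^ipdptp/ && $2 !~ /^1\.1\.1\./ {
            print iface, $2, $4
        }
    '
}

# Typical use on a running system:
#   ifconfig -a | ppp_links
```

No output means that every ipdptp interface still carries a placeholder address, which suggests that no PPP connection is currently established.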
Next, there is a short section listing the current crontab entries for user iss. This section will usually look something like the example below, containing only an entry to run the data transfers periodically. In the example, the entry schedules the transfer at 10:05 and 22:05 every day. If data transfers have been disabled, then the crontab output may contain only the single line at the top.
-----
Crontab:
# ISS boot crontab generated Thu Sep 2 22:38:53 GMT 1999
# datasend:
# datasend: periodic data transfer installed Wed Apr 26 20:40:17 GMT 2000
# datasend:
5 10,22 * * * /iss/etc/init.d/datasend run
While the first few sections provide general system status, the next section finally provides some feedback about the ISS software itself, particularly all of the Zebra processes.
Finally, the init.iss status script appends the last several lines of the system log file, /var/adm/messages. Most of the messages in this file should be informational only, such as notices of successful adjustments of the system clock by ntpdate.
-----
System log:
Apr 23 10:05:11 iss1 ntpdate[15518]: step time server 192.52.106.6 offset 1.253221 sec
Apr 23 22:05:10 iss1 ntpdate[21699]: step time server 192.52.106.6 offset 1.386205 sec
Apr 24 06:00:24 iss1 named[25886]: starting. named 4.9.4-P1
Apr 24 06:00:24 iss1 named[25887]: Ready to answer queries.
Apr 24 10:05:10 iss1 ntpdate[28000]: step time server 192.52.106.6 offset 1.308586 sec
Apr 24 22:05:11 iss1 ntpdate[4179]: step time server 192.52.106.6 offset 1.446363 sec
Apr 25 06:00:24 iss1 named[8363]: starting. named 4.9.4-P1
Apr 25 06:00:24 iss1 named[8364]: Ready to answer queries.
Apr 25 10:05:10 iss1 ntpdate[10475]: step time server 192.52.106.6 offset 1.116340 sec
Apr 25 22:05:10 iss1 ntpdate[16627]: step time server 192.52.106.6 offset 1.448122 sec
Apr 26 06:00:24 iss1 named[20840]: starting. named 4.9.4-P1
Apr 26 06:00:25 iss1 named[20841]: Ready to answer queries.
Apr 26 06:01:03 iss1 unix: ip_rput: DL_ERROR_ACK for 7, errno 3, unix 0
Apr 26 06:01:38 iss1 last message repeated 3 times
Apr 26 06:02:18 iss1 unix: ip_rput: DL_ERROR_ACK for 7, errno 3, unix 0
Apr 26 06:08:41 iss1 last message repeated 24 times
Apr 26 06:08:48 iss1 unix: ip_rput: DL_ERROR_ACK for 7, errno 3, unix 0
Apr 26 06:10:03 iss1 last message repeated 8 times
Apr 26 10:05:15 iss1 ntpdate[22960]: step time server 192.52.106.6 offset 1.354962 sec
Apr 26 22:05:10 iss1 ntpdate[29109]: step time server 192.52.106.6 offset 1.224128 sec
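Because most entries in the system log are routine notices, it can help to hide them so that anything else stands out. The sketch below is an illustration only: unusual_messages is a hypothetical helper, and the choice to treat ntpdate and named entries as routine is an assumption based on the example log above. It does not judge whether the remaining lines indicate a real problem.

```shell
# Hide routine ntpdate and named notices from syslog-style input so that
# the remaining messages are easier to spot. In each log line the fifth
# field is the reporting program, for example "ntpdate[15518]:".
# unusual_messages is a hypothetical helper, not an ISS or Zebra command.
unusual_messages() {
    awk '$5 !~ /^(ntpdate|named)\[[0-9]+\]:$/'
}

# Typical use on a running system:
#   unusual_messages < /var/adm/messages
```

Applied to the example log, this would leave only the unix: ip_rput lines and their "last message repeated" companions.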