TREX: Logbook Entries

TREX: Site base Messages, 20 Entries

Entry	Date	Title	Site	Author
246	Wed 26-Apr-2006	eantE DLR lidar Etherant returned	base	poulos
245	Wed 26-Apr-2006	Aster outage when docking station hardware suddenly and permanently failed	base	poulos
230	Sun 23-Apr-2006	changed network route to west	base	maclean
173	Tue 11-Apr-2006	Removed network equipment from school	base	maclean
138	Sun 02-Apr-2006	how to restart covars	base	horst
133	Sat 01-Apr-2006	restarted wireless network and dsm server	base	horst
108	Sun 26-Mar-2006	dsm server down	base	horst
107	Sun 26-Mar-2006	PAM mast broke last night	base	horst
93	Wed 22-Mar-2006	Network disconnected	base	semmer
92	Wed 22-Mar-2006	CRASH!	base	semmer
84	Mon 20-Mar-2006	How to restart the dsm_server	base	semmer
75	Sun 19-Mar-2006	HotFilm status	base	semmer
70	Sat 18-Mar-2006	hot film run in base	base	semmer
35	Fri 10-Mar-2006	dsm crash	base	oncley
32	Thu 09-Mar-2006	another TPOP crash	base	oncley
31	Thu 09-Mar-2006	TPOP crash	base	oncley
29	Mon 06-Mar-2006	school connection weak	base	oncley
26	Sun 05-Mar-2006	strange stuff	base	oncley
10	Mon 27-Feb-2006	monitoring tools	base	oncley
3	Mon 27-Feb-2006	school internet contacts	base	oncley

246: wireless-network, Site base, Wed 26-Apr-2006 12:56:35 PDT, eantE DLR lidar Etherant returned

The DLR lidar tower and etherant (eantE or 14.37) have been returned and are placed under the Base trailer. The antenna will be detached and placed in the back of the base.

245: base, Site base, Wed 26-Apr-2006 11:40:53 PDT, Aster outage when docking station hardware suddenly and permanently failed

At 6:15 am local time Aster at the Base trailer was observed to be off. The author had never seen Aster shut down during the entire experiment and had not shut it down when departing the trailer at approximately 2000 on 25 Apr. Approximately 2 minutes later the sounding team, Bob Street (Stanford) and Serena Chew (DRI), showed up for the morning sounding during the current IOP of T-REX. They noted that lightning had occurred overnight, near MAPR, after midnight local time.

An unsuccessful attempt was made to turn on the system at ~0620 after checking power cords and connectors (noting that the power was on to all drives, the monitor, the modems, the Terabeam, the lights, the Leeds sounding computer and various surge protectors). Thus, it appeared that either Aster or the computer port had experienced a catastrophic failure of some kind. Further investigation revealed that
1) the Aster machine would not turn on with battery power (it had drained after power to it had failed),
2) the computer Docking Station would not turn on when powered from any plug, including known functional power,
3) there was no access to a fuse on the Docking Station through any panels etc.,
4) the original 90W power cable for the Aster (Dell Latitude D800) was not in the trailer (two Poulos searches, one Chew search),
5) the power cord for Poulos' Dell Latitude D410 would apply power to Aster, but was only 65W and therefore subject to lesser performance,
6) internet access was down in the trailer,
7) Agee reported from online in Bishop that the data plots had been down since before midnight (~2300 local), which, when combined with the fact that no other electronic equipment was down, largely eliminated the 'lightning' theory,
8) #5 allowed some power to the battery and Aster could be functional, leading to the conclusion that the Docking Station had indeed had a catastrophic failure.

After lengthy troubleshooting, and after Semmer looked up the viability of running Aster on 65W power, we decided to bring up Aster on 65W power. To limit power usage, the only peripherals plugged in were the 12.0 and 14.0 connections.

Fortunately, this worked after a second hard boot. The GPS time unit was subsequently plugged in as a crucial element of timekeeping after consultation with Gordon. This created no power or performance issues on the sub-optimally powered Aster.

While no data had been written to Aster over the wireless, pings, file listings and data_stats of the 3 towers indicated that indeed the local data storage was complete overnight during the Aster outage. No data had been lost and soon thereafter files were filling on Aster with on-line data plots updating.

Subsequently, it was decided after discussions with Gordon/Steve S. to one-by-one attach various peripherals to Aster in its weak power state in hopes of retaining near normal functionality. An attempt was first made to enable the use of the 19" computer screen rather than the Aster laptop screen - this was unsuccessful - with electronic gibberish being the onscreen result. We thus are using Aster without the big screen until Golubieski arrives with the 90W power supply from Boulder (none were available at Schat.net in Bishop when Agee arrived first thing this morning and it would be 4-5 days to ship).

Agee also used a multi-meter to assess the circuits on the Docking Station power supply. He found 'beeps' (circuit completion) when the meter was attached to the plugged-in cord itself, but not when attached to the male pins at the back of the Docking Station. Our preliminary conclusion is that the internal power supply has failed or a fuse has blown within the Docking Station. Once that occurred, apparently randomly, battery power allowed Aster to function temporarily until the battery ran down completely. Hence the state at 0615 on 26 Apr.

Gordon believes that, since exthd7 and 9 are powered separately, their draw from the USB will be limited when they are plugged in. Steve S. has suggested doing this in half-hour stages to ensure Aster functionality at each step.

We also agreed that, since the data acquisition is quite solid onsite, an IOP is underway and the Central tower wireless is still potentially shaky after yesterday's tests, off-loading local storage to fill in the online data plot gaps is not advisable at this time.

230: wireless-network, Site base, Sun 23-Apr-2006 13:16:29 PDT, changed network route to west

Written from Boulder...

central and west systems have not been reporting on the network for
the last few hours, but south is fine.  Looks like TPOP, which serves
192.168.12.0 is sick again.

west (192.168.13.71) is on the AP24 ethernet interface. The base has an 
etherant on both the .12 and .14 networks, so there are two ways to
route traffic to west, either through 192.168.12.41 (AP24 internal
antenna), or 192.168.14.1 (AP24 external antenna).

Up to this point we were routing through 192.168.12.41, which requires
the TPOP:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.14.0    *               255.255.255.0   U     0      0        0 eth1
192.168.13.0    ap24int         255.255.255.0   UG    0      0        0 eth0
192.168.12.0    *               255.255.255.0   U     0      0        0 eth0

I think it would improve things to route through the .14 network and
avoid the TPOP.

Changed the route using these commands:

route delete -net 192.168.13.0/24 gw 192.168.12.41
route add -net 192.168.13.0/24 gw 192.168.14.1

Also changed /etc/sysconfig/static-routes so that it takes effect on
the next reboot:

any net 192.168.13.0/24 gw 192.168.14.1


route command now shows:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.14.0    *               255.255.255.0   U     0      0        0 eth1
192.168.13.0    ap24ext         255.255.255.0   UG    0      0        0 eth1
192.168.12.0    *               255.255.255.0   U     0      0        0 eth0

west is now responding.

On west, also changed the address for server in /etc/hosts
from 192.168.12.1 to 192.168.14.21.

In the /usr/local/isff/projects/TREX/ISFF/ops/ops*/ads3.xml, 
changed the destination address to 14.21.

On west, did /etc/init.d/nids restart and it is now sending data.
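
The route change in this entry can be sketched as a small dry-run helper that prints the two commands for review before anyone runs them as root. This is only an illustration: the function name print_route_switch is hypothetical, and it assumes the same net-tools route syntax used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ISFF tools): print, but do not
# execute, the commands that move the route for one subnet from an old
# gateway to a new one.
print_route_switch() {
    net="$1"      # destination network, e.g. 192.168.13.0/24
    old_gw="$2"   # gateway currently in use
    new_gw="$3"   # gateway to switch to
    echo "route delete -net $net gw $old_gw"
    echo "route add -net $net gw $new_gw"
}

# Addresses taken from the entry above: stop routing west's subnet
# through the .12 network (TPOP) and use the .14 network instead.
print_route_switch 192.168.13.0/24 192.168.12.41 192.168.14.1
```

Printing rather than executing keeps the sketch safe to try on any machine; the printed lines still have to be run as root and mirrored in /etc/sysconfig/static-routes to survive a reboot.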

173: wireless-network, Site base, Tue 11-Apr-2006 11:31:58 PDT, Removed network equipment from school

John and I removed all the network equipment from the Independence
school yesterday.  That link was not being used and it gives us
a spare etherant WIFI antenna.

138: base, Site base, Sun 02-Apr-2006 11:08:32 PDT, how to restart covars

To restart covar/netcdf process:

[aster@aster ~]$ cd ~aster
[aster@aster ~]$ ./runstats_sock
18612: old priority 0, new priority 19
[aster@aster ~]$ ps -ef |grep stats
aster    18615     1  3 10:49 ?        00:00:00 statsproc sock:localhost:30000

To catch up covar/netcdf, edit runstats.sh in ~aster for the days to rerun, e.g.

statsproc isff_20060402_*.dat

Then:

[aster@aster ~]$ batch
at> ./runstats.sh
at> ^D
job 18 at 2006-04-02 10:52
[aster@aster ~]$ at -l
18      2006-04-02 10:52 = aster
[aster@aster ~]$ ps -ef | grep stats
aster    18615     1  2 10:49 ?        00:00:07 statsproc sock:localhost:30000
aster    18642 18641  0 10:52 ?        00:00:00 /bin/sh ./runstats.sh
aster    18644 18642 93 10:52 ?        00:02:23 statsproc isff_20060401_000000.dat isff_20060401_040000.dat isff_20060401_080000.dat isff_20060401_120000.dat isff_20060401_160000.dat isff_20060401_221636.dat isff_20060401_224518.dat
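
The catch-up step amounts to pointing statsproc at one day's files. A minimal sketch, assuming the isff_YYYYMMDD_*.dat naming seen in the listing above; catchup_cmd is a hypothetical helper that only prints the line to put in runstats.sh and does not run statsproc itself:

```shell
#!/bin/sh
# Hypothetical helper: print the statsproc command line for one day.
# $1 is the day as YYYYMMDD. The pattern is printed unexpanded so it
# can be pasted into runstats.sh, where the shell expands it in ~aster.
catchup_cmd() {
    printf 'statsproc isff_%s_*.dat\n' "$1"
}

# Example: the day used in the entry above.
catchup_cmd 20060402
```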

133: wireless-network, Site base, Sat 01-Apr-2006 14:45:57 PST, restarted wireless network and dsm server

Spent the morning trying to restart communications with central with no luck.
Then also lost communications with west and south and the dsm server stopped
(do not know in what order this occurred).

Could not restart dsm server until communications were restored.  Steve
recycled power on tpop at base, restored communications to all three sites,
and then the dsm server was able to be restarted.

Wireless link to sodar is intermittent.

108: data-system, Site base, Sun 26-Mar-2006 09:47:08 PST, dsm server down

dsm server stopped yesterday at 0225 GMT, March 25; restarted this morning.

107: wireless-network, Site base, Sun 26-Mar-2006 09:33:24 PST, PAM mast broke last night

The PAM mast broke in an apparent high wind event last night; guy wire 
turnbuckles were not tied-off with cable ties!

Lost communication with central.

Repaired mast this morning at ~0930 PST.  Steve restarted etherant communication by recycling power at central tower.

93: base, Site base, Wed 22-Mar-2006 20:36:58 PST, Network disconnected

Charlie called Cebridge and found out that they had disabled our modem.
After a few minutes they activated it again. It may get disabled again
so we will leave it on the school link until tomorrow.

92: base, Site base, Wed 22-Mar-2006 19:35:16 PST, CRASH!

Around 7:00pm we had a major crash. The first thing to go down was the
network interface. Charlie believes it was a Cebridge problem. I then
noticed that the aster system was not working correctly. A hard reboot of
aster had to be done. It refused to allow su login. It would not come
up properly, hung at swap disk space. Recycled power again and it came up ok.
Charlie is still working on the network link.

84: LOG, Site base, Mon 20-Mar-2006 16:51:26 PST, How to restart the dsm_server

If you get a response of connection refused to data_stats sock:localhost,
check ps -ef |grep dsm_server. If not running, log in as root and run
/etc/init.d/nids restart. This should get the dsm_server running again.

NOTE: if the wireless link is down, dsm_server will not get going.
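
The check described above can be sketched as a short script. It is an illustration only: it assumes pgrep is installed and that the process name is exactly dsm_server, and the helper name is_running is hypothetical. It reports the state and leaves the restart to the operator.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if a process whose name
# is exactly "$1" is running. Assumes pgrep is available.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

if is_running dsm_server; then
    echo "dsm_server is running"
else
    # The restart itself is left to the operator, as in the entry above.
    echo "dsm_server is not running; as root, run: /etc/init.d/nids restart"
fi
```

The script only reports because, as noted above, a restart will not succeed while the wireless link is down, so an automatic retry would not help.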

75: HotFilm, Site base, Sun 19-Mar-2006 08:44:25 PST, HotFilm status

The hotfilm stayed up last night. It was configured in the base with ripple
tied directly to the prometheus via a swapper ethernet cable. The plan is
to move it out to the site later today with ripple housed inside the
Prometheus box. The external hard drive on ripple will not be used since
the local USB drive files look good. We may try to get ripple hooked up to the
wireless link so we can monitor things back at the base.

70: HotFilm, Site base, Sat 18-Mar-2006 08:00:32 PST, hot film run in base

The hot film ran for about 4 hours in the base then the A/D died. I will restart daisy for another run.

35: base, Site base, Fri 10-Mar-2006 08:35:26 PST, dsm crash

The dsm process crashed again last night at 1720 local (actually while we were 
here, but we didn't catch it :( ).  This probably was associated with a 
TPOP crash, of which we had lots yesterday.  All stations stored data locally.

32: wireless-network, Site base, Thu 09-Mar-2006 10:23:26 PST, another TPOP crash

Another TPOP crash, requiring a manual reboot, just occurred.  This time 
everybody came back up (though west took a while).

31: wireless-network, Site base, Thu 09-Mar-2006 08:49:43 PST, TPOP crash

The TPOP died at 05:15 UTC (9:15PM last night), so there are no local data from
any stations until I rebooted the TPOP just now.

central is down, though the etherant is okay now.  Will have to visit it.

Both south and west recorded data locally through this outage.

P.S. central was working except for the network.  "reboot" brought everything
(including the network) back up.  All data were recorded on the usb, so none
were lost during this network outage. (Also, power stayed up overnight.)

29: wireless-network, Site base, Mon 06-Mar-2006 12:20:31 PST, school connection weak

The school connection has been very weak.  I spent an hour on their roof this
morning (in part to help DRI with their 915 Freewave problems).  The best S/N
I could get was 4-6 dB -- pretty lousy.  Sometimes I can now ping, sometimes not.
On average ~30% of packets get through.

I'm convinced that I need to reconfigure the network to put the TPOP at west.
Guess we'll work on that tomorrow (when Brad is back).

Prior to this, I implemented Gordon's suggestion to lower the RTS/CTS packet 
size from 500 to 32 bytes, but of course this didn't fix school's problem.

26: wireless-network, Site base, Sun 05-Mar-2006 16:29:40 PST, strange stuff

lots of network outages now/dropped data.  have changed TPOP to chan 2 and 
rebooted several times. ?????

10: base, Site base, Mon 27-Feb-2006 21:34:47 PST, monitoring tools

data_stats sock:localhost [gives all nids channels]

Alico AP24 winbox (on Toughbook) [gives West's status]

TeraBeam navigator/configure/click on 192...address/"configure remote"/
"AP Associated ..." (on Toughbook) [gives Etherant/AP24 internal status]

More Etherant status on their own WWW sites.  (see /etc/hosts for addresses)
[set up as pull-down bookmarks from mozilla on aster base]
passwords should be saved by mozilla.

3: wireless-network, Site base, Mon 27-Feb-2006 11:03:25 PST, school internet contacts

Independence: Joel Hampton superintendent 760-878-2405
Bishop: Joe Griego IT director 760-920-1800 (c) 760-872-7027 (office)
        Justin Norcross Network manager 760-920-8959 (c)