[wxqc] cwop.aprs.net

Ted Lum gladstonefamily.net at tedworld.com
Mon Jun 2 11:39:42 CDT 2008


Linux/Unix will send back a TCP reset (RST), which the client reports as 
"connection refused", in response to an attempt to connect to a port 
where _nothing is listening_. This differs slightly from a Windows box 
with its firewall on, which just drops the packet so you end up getting 
a socket timeout. Connection refused on a box is characteristic of a 
dead process. Routers, on the other hand, return ICMP "destination 
unreachable" (network or host unreachable) because they can't route a 
packet, usually due to either route table errors or a route that has 
gone down. Servers do not send those; only routers do.
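
Seen from the client, these cases surface as different socket errors. 
A minimal Python sketch, assuming a plain TCP client; the function name 
classify_connect and the 5 second default timeout are illustrative and 
are not taken from WeatherLink or any CWOP software:

```python
import errno
import socket


def classify_connect(host, port, timeout=5.0):
    """Try one TCP connection and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # Nothing came back at all: host down, or packets silently dropped.
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
            # The packet could not be routed to the destination.
            return "unreachable"
        # Anything else (for example a failed name lookup).
        return "other: %s" % exc
```

Whatever the label, only "connected" means the upload can proceed; the 
other outcomes are all reasons to try a different server.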

Regardless, WeatherLink was architected poorly and would not try any 
other server no matter what comes back, and in some cases the 
application hangs and you have to restart it. This is the response I got 
back from Davis on this topic:

"I forwarded your email to our WeatherLink server person. He said you are
entirely right.  CWOP lists several servers and Weatherlink.com does only try
to upload to the first one.  He said he thought it would be a good idea to
try the others but haven't coded it yet.  He is aware of the problem and it
has been added to the our bug list. Thank you for the input."

Davis are the only ones that can fix WeatherLink. There is no simple 
workaround for this. The best that could be hoped for would be a 
solution that dynamically updates DNS when a server goes down. But due 
to how DNS works and how the internet is put together, with DNS caching 
all over the place and not necessarily playing by all the same rules, 
it probably would still not be 100% effective. The server likely would 
be back up before all the stale records got aged out.

Basically it goes like this. If the internet circuit goes down the 
gateway router will probably return "destination unreachable". There are 
a whole lot of reasons you might get "destination unreachable" but with 
the internet circuit usually being the most brittle that's the most 
likely cause. If the circuit is up and the path is good through the last 
router, down to the subnet, but the server operating system or network 
stack is down (or you're talking to a Windows box or there is a firewall 
in the way) you'll get a local socket timeout because nothing at all 
came back and your client got tired of waiting. If the O/S and network 
are up but the process that services requests on the port is either not 
started or dead, you'll get "connection refused" (unless it's Windows or 
there is a firewall in the way) - "connection refused" = "no one is 
listening". These messages are implemented deep in the internet 
protocols (TCP, ICMP and IP); they are not arbitrary messages that 
applications create or send. It's the responsibility of the client 
application to understand how the internet protocols work and do the 
right thing.
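
As a rough illustration of "the right thing", here is a minimal Python 
sketch of a client that walks every address behind a name and moves on 
whenever an attempt is refused, times out, or is unreachable. The 
function names resolve_all and connect_first_live are hypothetical; the 
30 second default follows the timeout suggested elsewhere in this 
thread. This is not how WeatherLink is implemented.

```python
import socket


def resolve_all(host, port):
    """Return every (address, port) pair the name resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # sockaddr is (addr, port) for IPv4 and a 4-tuple for IPv6; keep 2.
    return [info[4][:2] for info in infos]


def connect_first_live(candidates, timeout=30.0):
    """Return a socket connected to the first candidate that answers."""
    last_error = None
    for address in candidates:
        try:
            return socket.create_connection(address, timeout=timeout)
        except OSError as exc:
            # Refused, timed out, unreachable: all mean "try the next one".
            last_error = exc
    raise ConnectionError("no server answered, last error: %s" % last_error)
```

A caller would do something like 
connect_first_live(resolve_all("cwop.aprs.net", 14580)), where the port 
number is an assumption not stated in this thread. Treating every 
OSError the same way is deliberate: the client does not need to know 
why a server failed in order to move on to the next one.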

-Ted-

Merton Campbell Crockett wrote:
> The primary problem with the CWOP-2 failure yesterday is the response  
> that was returned on a connection request.  Something was listening on  
> the port and returning a connection refused.  Had a port unreachable  
> been returned, the user's application software would have tried one of  
> the other addresses.
>
> Merton Campbell Crockett
>
>
> On 02 Jun 2008, at 06:10:45, Gerry Creager wrote:
>
>   
>> We're working to get better monitoring going, but realize that all
>> three dedicated servers already run in professional environments.
>>
>> As for last evening, CWOP-2 suffered a power supply failure.
>> Although a redundant power supply system, it was deemed necessary to
>> replace the failed unit.  The box was down for ~1 minute according to
>> the system administrator, although I recorded 22 minutes of downtime
>> on my network monitoring system.
>>
>> What we appear to have is a set of ongoing problems that we've
>> discussed here before:
>> 1.  OS caching DNS lookups?  Don't know, always a possibility.
>> 2.  Local routers caching DNS data and not refreshing as needed.
>> 3.  ISPs caching DNS and not obeying TTL setting in Bind.
>> 4.  Client software not responding to a failure appropriately and
>> re-requesting DNS lookup.
>>
>> The connection request should time out after 30 sec and another server
>> should be queried at that point.  That'd be problematic if we had
>> consistent 30 sec latencies but we shouldn't have that.
>>
>> I've started investigating a proxy method in place of round-robin DNS
>> lookups to direct all clients to a live server on a load-leveling
>> basis.  I've not found one I like yet, as most such systems expect the
>> servers to be co-located instead of geographically diverse, as we are.
>> As more develops in this area I'll report.
>>
>> Gerry
>>
>> Merton Campbell Crockett wrote:
>>     
>>> You might want to check the status of the system again.  I show it
>>> dropping connections starting at 2121z and continuing until 0001z.
>>>
>>> I defined APRS.NET as a forward zone so it's a little hard to tell
>>> if the problem is continuing as its address tends to be at the end
>>> of the list.
>>>
>>> Merton Campbell Crockett
>>>
>>>
>>>
>>> On 01 Jun 2008, at 17:04:18, Ted Lum wrote:
>>>
>>>       
>>>> CWOP-2 is back @ Sun Jun 1 16:00:57 UTC 2008
>>>>
>>>> Ted Lum wrote:
>>>>         
>>>>> Yes, CWOP-2 @ Sun Jun 1 13:37:10 UTC 2008. Think it's just the
>>>>> service on the box.
>>>>>
>>>>> tim.mcmanus at mac.com wrote:
>>>>>
>>>>>           
>>>>>> I think one of these servers went down again.  Every time a server
>>>>>> drops in that rotation, WeatherLink locks up sending data.  It
>>>>>> locked up for 45 minutes this time.
>>>>>>
>>>>>> Is there a better way to do this?  I think this is the fifth or
>>>>>> sixth time a dead server in the rotation locked up WeatherLink
>>>>>> since we changed the URL to cwop.aprs.net.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tim McManus
>>>>>> tim.mcmanus at mac.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> wxqc mailing list
>>>>>> Post messages to wxqc at lists.gladstonefamily.net
>>>>>> To unsubcribe or change delivery options, please go to:
>>>>>> http://server.gladstonefamily.net/mailman/listinfo/wxqc
>>>>>> To search the archives: http://www.google.com/coop/cse?cx=008314629403309390388%3Aknlfnptih9u
>>>>>>
>>>>>> The contents of this message are the responsibility of the author.
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>           
>>>> -- 
>>>> This message has been scanned for viruses and
>>>> dangerous content by *MailScanner* <http://www.mailscanner.info/>,  
>>>> and is
>>>> believed to be clean.  
>>>>         
>>> Merton Campbell Crockett
>>> m.c.crockett at roadrunner.com <mailto:m.c.crockett at roadrunner.com>
>>>
>>>
>>>
>>>
>>>       
>> -- 
>> Gerry Creager -- gerry.creager at tamu.edu
>> Texas Mesonet -- AATLT, Texas A&M University	
>> Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
>> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
>>     
>
> Merton Campbell Crockett
> m.c.crockett at roadrunner.com
>
>
>
>
>   


