[nas] NAS or Solaris deadlock...

Jon Trulson jon at radscan.com
Wed Feb 13 20:52:22 MST 2002


On Thu, 14 Feb 2002, Erik Inge Bolsø wrote:

> Date: Thu, 14 Feb 2002 01:13:05 +0100 (CET)
> From: Erik Inge Bolsø <knan at mo.himolde.no>
> To: nas at radscan.com
> Cc: Ziying Sherwin <sherwin at nlm.nih.gov>,
>      "R. P. Channing [\"Rick\"] Rodgers" <rodgers at nlm.nih.gov>
> Subject: [nas] NAS or Solaris deadlock...
> 
> Greetings Jon and all.
> 
> After some rather extensive debugging, it seems that we've hit either a
> NAS solaris server bug, or a solaris OS bug.
> 
> Means of triggering: run nasd on a solaris box, then run mpg123 by way of
> libaudiooss. This triggers a very long loop of open/close of connections
> to the NAS server, and halfway through, nasd and client mpg123/libaudiooss
> hangs.
> 
> Have not managed to reproduce it against nasd running on linux.
> 
> NASD version: 1.4.2, 1.5
> 
> The last part of our debugging session attached below. Including backtrace
> of mpg123/libaudiooss and the hung nasd process.
> 
> Suggestions?
> 

	Strange... The stacktrace of mpg123 just shows that it's
waiting for a response from the server - normal if the server is
otherwise wedged.

	The second trace just appears to be libaudiooss waiting on
nasd again, also normal if nasd is hung.

	The first trace (nasd) seems to be the issue - it's waiting in
open(), which it should never be doing for any significant amount of
time.  This - at first glance anyway - looks like a kernel problem.
If the open() never returns, what can nasd do?  Out of curiosity, how
many opens via libaudio are occurring?
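
	As a rough illustration of what a process can do about an
open() that does not return: arm an alarm before the call and let the
signal interrupt it.  This is only a sketch in Python, not nasd code
(nasd is C), and open_with_timeout is a made-up name.  It also assumes
the kernel sleep is interruptible; if the open() is stuck in an
uninterruptible wait, the alarm will not help.

```python
import os
import signal


def open_with_timeout(path, flags, seconds):
    """Call os.open(), giving up with TimeoutError after `seconds` (an int).

    Hypothetical helper for illustration; must run in the main thread.
    """

    def on_alarm(signum, frame):
        # Raising here stops Python from retrying the interrupted open().
        raise TimeoutError(f"open({path!r}) still blocked after {seconds}s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        return os.open(path, flags)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

	A caller would catch TimeoutError, log it, and carry on serving
other clients instead of hanging on the one device.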

	When this happens, can you kill nasd and restart it and then
have everything working again?  What does a 'truss' show on the nasd
process?

	Does it only happen with 1.4.2d and 1.5?  Did earlier versions
show this behavior?

	Since mpg123 has native NAS support - does that work?  Would
be interesting since it would be using the same libaudio library as
libaudiooss is using... Just throwing things out here based on what
I've seen so far.  
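
	Another way to take both mpg123 and libaudiooss out of the
picture is a bare connect/close loop that counts how many connections
succeed before one fails or times out.  A minimal sketch follows;
count_connects is a made-up name, and it assumes nasd is listening on
TCP (port 8000 for server 0 is my assumption here).  It does not speak
the NAS protocol, so it may not exercise the same server path as a
real client.

```python
import socket


def count_connects(host, port, attempts, timeout=5.0):
    """Open and close up to `attempts` TCP connections.

    Returns how many succeeded before the first failure or timeout.
    Hypothetical helper for illustration only.
    """
    succeeded = 0
    for _ in range(attempts):
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except OSError:  # includes refused connections and timeouts
            break
        conn.close()
        succeeded += 1
    return succeeded
```

	Something like count_connects("solaris-box", 8000, 1000), with
the hostname replaced by the real one, would give a number to compare
against the point where mpg123 hangs.  Note that a wedged server can
still appear to accept connections while its listen backlog has room,
so treat the count as a hint only.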


> From: Ziying Sherwin <sherwin at nlm.nih.gov>
> To: Erik Inge Bolsø <knan at mo.himolde.no>
> Cc: "R. P. Channing [\"Rick\"] Rodgers" <rodgers at nlm.nih.gov>
> Subject: Re: audiooss?
> 

> Erik,
> 
> Congratulations on finding the problem. I actually had several questions sent
> to Jon about the new release of nas (version 1.5). Hopefully, we can get
> libaudiooss working in the next nas release.

	I believe I did respond to those, were there any other issues
besides the STARTSERVER problem?

	Also, did libaudiooss on solaris stop working with 1.5?  I
wasn't made aware of that...  That would be bad, and if true, the
first thing I'd start looking at is the mutex code, as that was the
major change between 1.4.2 and 1.5.

[...]

-- 
Jon Trulson    mailto:jon at radscan.com
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include <std/disclaimer.h>
Bad Color Temperature, Too much Peach.



