Re: [nottingham] Panics, kernel 2.4 (fwd)

From: Matthew Sackman (matthew@sackman.co.uk)
Date: Fri 04 Jan 2002 - 12:47:49 GMT


Well, if you want to get the swap onto another machine then you could try
using enbd, the Enhanced Network Block Device. That way you access the swap
from your application machine, but the swap is actually a collection of
blocks on a hard drive in another machine. The only problem is that the
exported block device is a plain file, so there's a 2GB limit on it,
though by striping several exports together with RAID you can easily crank
this up. (Or use a patch to get past the 2GB maximum file size on Linux;
I'm not sure of the state of this with 2.4.)
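
To put numbers on the striping idea, here is a rough sketch of the arithmetic. It assumes the 2GB limit means each exported file must stay below 2^31 bytes, so at most 2047 whole MiB per file. The names `exports_needed` and `make_backing_file` are made up for illustration and are not part of enbd.

```shell
#!/bin/sh
# Sketch only. Assumption: a file must stay below 2^31 bytes, so at most
# 2047 whole MiB can go into each exported file.
MAX_EXPORT_MIB=2047

# Hypothetical helper: number of exported files (RAID stripe members)
# needed to reach a given total swap size, using integer ceiling division.
exports_needed() {
    total_mib=$1
    echo $(( (total_mib + MAX_EXPORT_MIB - 1) / MAX_EXPORT_MIB ))
}

# Hypothetical helper: create one zero-filled backing file on the exporting
# machine, refusing any size that would run into the 2GB limit.
make_backing_file() {
    file=$1
    size_mib=$2
    if [ "$size_mib" -gt "$MAX_EXPORT_MIB" ]; then
        echo "refusing: $size_mib MiB is over the 2GB file limit" >&2
        return 1
    fi
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mib" 2>/dev/null
}
```

For the 14.5GB of swap mentioned in the quoted message below (14848 MiB), `exports_needed 14848` prints 8, i.e. eight files of up to 2047 MiB each, striped together.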

You'd need to compile nbd for your kernel as a module, then delete that
module and compile and install enbd in its place. The principal site is:
http://www.it.uc3m.es/~ptb/nbd/
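
Once enbd is installed and its client has attached the remote export on the application machine, turning that device into swap only needs the standard `mkswap` and `swapon` tools (run as root). A minimal sketch: the device path is a placeholder, since the name the enbd client presents isn't given here (check enbd's own documentation), and `activate_network_swap` is a made-up helper.

```shell
#!/bin/sh
# Hypothetical helper. Pass the block device(s) the enbd client presents,
# or a single RAID-0 device if you have striped several exports together.
activate_network_swap() {
    for dev in "$@"; do
        # Write a swap signature to the device, then enable it as swap.
        mkswap "$dev" || return 1
        swapon "$dev" || return 1
    done
}

# Example invocation (placeholder path, needs root):
# activate_network_swap /dev/YOUR_ENBD_DEVICE
```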

On the other hand, if you really need that much swap then you might find
that your network gets saturated, and that you need gigabit Ethernet to
get any performance at all. I don't know for sure, as I've not played with
enbd as swap, though I know it can be done.
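
A back-of-envelope way to see why the network becomes the bottleneck is to work out how long it takes just to move a given amount of paged memory over the wire. This sketch ignores Ethernet and TCP overhead and latency entirely, so real figures would be worse; `seconds_to_move` is a made-up helper.

```shell
#!/bin/sh
# Hypothetical helper: seconds (rounded down) to push SIZE_MIB mebibytes
# over a link of LINK_MBIT megabits per second, ignoring all overhead.
# Relies on 64-bit shell arithmetic for the bit count.
seconds_to_move() {
    size_mib=$1
    link_mbit=$2
    bits=$(( size_mib * 1048576 * 8 ))
    echo $(( bits / (link_mbit * 1000000) ))
}

seconds_to_move 1024 100    # 1 GiB over 100 Mbit Ethernet: prints 85
seconds_to_move 1024 1000   # 1 GiB over gigabit Ethernet: prints 8
```

So paging a single GiB in or out takes well over a minute on 100 Mbit Ethernet even in this ideal case, against roughly 8 seconds on gigabit.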

The other thing you might want to try is to have klogd pipe its output
through nc to another machine. That way you'd have a complete log of the
crash on the other machine. On your application machine, stop klogd and
then do:

    klogd -d -f - | nc stable_machine 7777

and on stable_machine, run this first (before the above):

    nc -l -p 7777 > log.txt

Like all good programs, klogd understands an output file of - to mean stdout.
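
One catch on the receiving side: `nc -l` exits once the sender disconnects, and if you simply rerun the same command for the next session, the `>` redirection truncates log.txt and throws away the previous crash. A minimal sketch of a receiver that keeps listening and appends instead. It assumes the traditional netcat that accepts `-l -p PORT`, as in the commands above (OpenBSD-style nc wants `nc -l PORT`), and `receive_kernel_log` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper, run on stable_machine before starting klogd on the
# application machine. Arguments: port (default 7777), log file (default log.txt).
receive_kernel_log() {
    port=${1:-7777}
    log=${2:-log.txt}
    while :; do
        # Each nc -l session ends when the sender disconnects; append (>>)
        # so earlier sessions, including the one from the crash, are kept.
        nc -l -p "$port" >> "$log"
    done
}

# Start it with, for example:
# receive_kernel_log 7777 log.txt
```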

If you have the time to set it up, it might be interesting to use enbd
just to see which machine falls over first. OTOH, the restricted bandwidth
might slow the hammering of the VM down to a point where it's more stable.

Just my 2p.

Matthew

-- 

Matthew Sackman Nottingham England

BOFH Excuse Board: CPU-angle has to be adjusted because of vibrations coming from the nearby road

On Thu, Jan 03, 2002 at 01:02:44AM +0000, Jon Masters wrote:
> On Wed, 2 Jan 2002, Robert Davies wrote:
>
> > serial console would help
>
> <quote type="enemyOfTheState">It's already done</quote>
>
> > especially if you can log the panics into a
> > logfile on another machine linked by serial cable.
>
> That's generally the idea. What's more important though is that these
> things should happen at night when they won't affect normal activities.
> I'm in the office now as it happens, running a few tests, etc.
>
> > I've seen similar sounding problems, when a developer played with modperl on
> > a production web server (I was away on holiday)
>
> This is also running apache with a few modules like modperl however the
> point is, it's a vm bug isn't it. I'm (excuse language) fucking pissed off
> that they haven't sorted this out by now, sure I'm hardly one to talk
> because you don't see me doing it but someone should have done it :-)
>
> > basically the Linux VM folds under really heavy abuse
>
> This I knew. I've turned off swap for tonight's tests to see if that
> helps. Problem is getting more than 1.5GB of RAM into a machine which
> can't take any more than it's already got :)
>
> > which is why they consider a VM similar to FreeBSD's in 2.5/2.6.
>
> Or just like 2.2. I can run said Java program and let it allocate numerous
> GB of memory (said box has 1.5GB RAM and 14.5GB swap - yes I know that's
> a little excessive) on its new 2.2.20 based production server just fine.
> It doesn't need to run fast as it's only handling dishing out processing
> blocks and piecing them back together but when we do a run of the Notting
> Hill area then it's nice to actually process all the map data :)
>
> > On those 3Com cards, I've seen problems in past
>
> As have I, discounted as you say, because it was still responsive
> initially and also said card was not likely to be under too much strain.
>
> > where an interface drops out, and needs an ifconfig down & up
>
> ...like my old firewall in Hall last year, though that was eventually
> partially attributed to Nottingham having given me a fucked port...
>
> > But it doesn't sound like that one, from your ssh response.
>
> It's not, IMO.
>
> > Perhaps you'd consider using a remote logging server
>
> That's "in progress".
>
> > and have syslog over net, to get access to more info?
>
> Yes however this often happens very quickly so I seriously doubt many
> useful syslogs from it :(
>
> > You could perhaps put in some cron jobs, doing things like ps auxwww to
> > a log file every minute, and or use logger(1) in your places scripts to
> > record what's going on when the problems are triggered.
>
> I had process accounting running and a few others, and there is a watchdog
> too but these all don't really solve the problem :)
>
> > Have fun, but you know what you say to your lusers when they report
> > problems, without copy/pasting error messages or showing you log files :)
>
> Yeah yeah. No useful logs and as yet no successfully captured panic()s.
>
> --jcm
>
> --------------------------------------------------------------------
> http://www.lug.org.uk http://www.linuxportal.co.uk
> http://www.linuxjob.co.uk http://www.linuxshop.co.uk
> --------------------------------------------------------------------



This archive was generated by hypermail 2.1.3 : Fri 04 Jan 2002 - 18:56:43 GMT