Re: [nottingham] Clusters...

From: Ben Blackburne (pcxbpb@nottingham.ac.uk)
Date: Mon 08 Jul 2002 - 14:33:04 BST


On Mon, 2002-07-08 at 14:26, Tom Allender wrote:
> On Mon, 2002-07-08 at 14:15, Ben Blackburne wrote:
> > I'm at the Computational Chemistry unit at the University of Nottingham.
> > We have just installed a 48-node dual Athlon cluster. It's running Red Hat
> > with the SCore software (http://pccluster.org/) and Sun Grid Engine. 16
> > nodes have Myrinet; the rest have only Ethernet.
>
> Very cool. I nearly took Computational Chem. at Nottingham...
>

:-)

> > It is, however, remarkably unstable at the moment :-(. The main node
> > crashes every so often with no messages in the log. The suspects include
> > dodgy hardware and not enough cooling. The hardware of the main node has
> > just been replaced so we should know soon if that was the problem.
>
> Naive shot in the dark. Is it Red Hat 7.3 with 2.4.18-3? An errata
> kernel was released because 2.4.18-3 had a repeatable panic with ext3
> on SMP systems.
>

It's 7.2, but with the SCore people's kernel:

2.4.18-2SCORE_athlon

I wonder if this is derived from Red Hat's 7.3 kernel. We have been using
ext3, but we only enabled it to speed up reboots after the crashes we were
already getting!

Might be worth checking out anyway though - cheers.

Ben
