[mdlug] Server stability *UPDATE*

Michael ORourke mrorourke at earthlink.net
Wed Jul 8 18:48:40 EDT 2009


Okay... here's what we did.

We rebuilt the server using a 64-bit version of CentOS, but still had problems.
We also double-checked the temperature inside the server cabinet and it was fine.
Then we replaced all the RAM in the box and the problems went away.
We now have a stable server!
As always... thanks to all for your suggestions.
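
For anyone who wants to watch temperatures while load is applied (Rob's
question below), here is a minimal Python sketch. It assumes the kernel
exposes sensors through the hwmon sysfs tree, where each temp*_input file
holds millidegrees Celsius; on a box with no sensor driver loaded it just
returns nothing. The function name read_hwmon_temps is only illustrative.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for each temp*_input file under root.

    Assumption: sensors are exposed via hwmon sysfs and report millidegrees
    Celsius. Depending on kernel version the files sit either directly in
    hwmonN/ or in hwmonN/device/, so both locations are checked.
    """
    temps = {}
    patterns = [
        os.path.join(root, "hwmon*", "temp*_input"),
        os.path.join(root, "hwmon*", "device", "temp*_input"),
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    temps[path] = int(f.read().strip()) / 1000.0
            except (OSError, ValueError):
                # Sensor unreadable or not reporting a number; skip it.
                continue
    return temps
```

Calling read_hwmon_temps() every few seconds and logging the result while
the load test runs would show whether anything climbs before a reboot.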

-Mike

----- Original Message ----- 
From: "Robert Adkins" <radkins at impelind.com>
To: "'MDLUG's Main discussion list'" <mdlug at mdlug.org>
Sent: Wednesday, June 10, 2009 2:53 PM
Subject: Re: [mdlug] Server stability


> Mike,
>
> Rebooting under load can mean a whole slew of different things.
>
> It's possible that the Northbridge chip on the mainboard is overheating and
> causing a reboot. It's possible that the CPU is overheating and causing a
> reboot. It's possible that the RAM is overheating...
>
> There could be some software conflicts relating to an incompatibility
> between the software, the kernel and the CPU. (Recompiling the kernel
> specifically for the hardware in your server could be a fix for this.)
>
> Do you have a way of monitoring the temperatures of the CPU and other
> elements of the PC when a heavy load is being applied to the server?
>
> -Rob
>
>> -----Original Message-----
>> From: mdlug-bounces at mdlug.org
>> [mailto:mdlug-bounces at mdlug.org] On Behalf Of Michael ORourke
>> Sent: Wednesday, June 10, 2009 2:41 PM
>> To: MDLUG's main mailing list
>> Subject: [mdlug] Server stability
>>
>> Lugnuts,
>>
>> I've got a server that is experiencing some stability issues.
>> The server was built with CentOS 5.2 i386 and has been
>> patched recently.  It's running Oracle XE, tomcat, httpd, and
>> some java.  But when I put any load on the system, it just
>> spontaneously reboots.  Any thoughts and/or suggestions?
>> Would we be better off reloading this box with an x86_64
>> version of CentOS?
>> Here is some info on the server...
>>
>> --kernel
>> [root@patchserv proc]# uname -a
>> Linux patchserv.xxxxxxxxxxxxxxxxxx.xxx 2.6.18-128.1.6.el5PAE
>> #1 SMP Wed Apr 1 10:02:22 EDT 2009 i686 i686 i386 GNU/Linux
>>
>> --memory info (8GB RAM)
>> [root@patchserv log]# cat /proc/meminfo
>> MemTotal:      8178228 kB
>> MemFree:        300404 kB
>> Buffers:         43884 kB
>> Cached:        6252688 kB
>> SwapCached:      44140 kB
>> Active:        3276088 kB
>> Inactive:      4392996 kB
>> HighTotal:     7470840 kB
>> HighFree:        17988 kB
>> LowTotal:       707388 kB
>> LowFree:        282416 kB
>> SwapTotal:     2031608 kB
>> SwapFree:      1987468 kB
>> Dirty:             136 kB
>> Writeback:           0 kB
>> AnonPages:     1267384 kB
>> Mapped:         684764 kB
>> Slab:           144172 kB
>> PageTables:      50672 kB
>> NFS_Unstable:        0 kB
>> Bounce:              0 kB
>> CommitLimit:   6120720 kB
>> Committed_AS:  3973640 kB
>> VmallocTotal:   116728 kB
>> VmallocUsed:      3900 kB
>> VmallocChunk:   112712 kB
>> HugePages_Total:     0
>> HugePages_Free:      0
>> HugePages_Rsvd:      0
>> Hugepagesize:     2048 kB
>>
>> --processor info (8 CPUs)
>> processor       : 7
>> vendor_id       : GenuineIntel
>> cpu family      : 15
>> model           : 4
>> model name      : Intel(R) Xeon(TM) CPU 2.80GHz
>> stepping        : 8
>> cpu MHz         : 2793.483
>> cache size      : 2048 KB
>> physical id     : 1
>> siblings        : 4
>> core id         : 1
>> cpu cores       : 2
>> apicid          : 7
>> fdiv_bug        : no
>> hlt_bug         : no
>> f00f_bug        : no
>> coma_bug        : no
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 5
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep
>> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
>> sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est
>> cid cx16 xtpr lahf_lm
>> bogomips        : 5586.43
>>
>> --recent reboots (not user initiated)
>> [root@patchserv log]# last | grep reboot
>> reboot   system boot  2.6.18-128.1.6.e Wed Jun 10 07:42          (06:50)
>> reboot   system boot  2.6.18-128.1.6.e Wed Jun 10 07:33          (00:04)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 01:12         (1+06:24)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 01:03          (00:04)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 00:26          (00:41)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 00:18          (00:03)
>>
>> --/var/log/messages around the time of the last reboot...
>> Jun 10 07:33:26 patchserv kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> Jun 10 07:33:26 patchserv kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
>> Jun 10 07:37:30 patchserv kdump: saved a vmcore to /var/crash/2009-06-10-07:33
>> Jun 10 07:37:31 patchserv shutdown[3048]: shutting down for system reboot
>> Jun 10 07:37:31 patchserv init: Switching to runlevel: 6
>> Jun 10 07:37:32 patchserv rpc.statd[2983]: Caught signal 15, un-registering and exiting.
>> Jun 10 07:37:32 patchserv portmap[3087]: connect from 127.0.0.1 to unset(status): request from unprivileged port
>> Jun 10 07:37:33 patchserv auditd[2888]: The audit daemon is exiting.
>> Jun 10 07:37:33 patchserv kernel: audit(1244633853.205:5): audit_pid=0 old=2888 by auid=4294967295
>> Jun 10 07:37:33 patchserv kernel: Kernel logging (proc) stopped.
>> Jun 10 07:37:33 patchserv kernel: Kernel log daemon terminating.
>> Jun 10 07:37:34 patchserv exiting on signal 15
>> Jun 10 07:42:29 patchserv syslogd 1.4.1: restart.
>> Jun 10 07:42:29 patchserv kernel: klogd 1.4.1, log source = /proc/kmsg started.
>> Jun 10 07:42:29 patchserv kernel: Linux version 2.6.18-128.1.6.el5PAE (mockbuild at builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Apr 1 10:02:22 EDT 2009
>> Jun 10 07:42:29 patchserv kernel: BIOS-provided physical RAM map:
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000000100000 - 00000000bffc0000 (usable)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000bffc0000 - 00000000bffcfc00 (ACPI data)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000bffcfc00 - 00000000bffff000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000e0000000 - 00000000fec90000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000100000000 - 00000001ffffe000 (usable)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000001ffffe000 - 0000000200000000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000200000000 - 0000000240000000 (usable)
>> Jun 10 07:42:29 patchserv kernel: 8320MB HIGHMEM available.
>> Jun 10 07:42:29 patchserv kernel: 896MB LOWMEM available.
>> Jun 10 07:42:29 patchserv kernel: found SMP MP-table at 000fe710
>> Jun 10 07:42:29 patchserv kernel: NX (Execute Disable) protection: active
>> Jun 10 07:42:29 patchserv kernel: DMI 2.3 present.
>> Jun 10 07:42:29 patchserv kernel: Using APIC driver default
>> Jun 10 07:42:29 patchserv kernel: ACPI: PM-Timer IO Port: 0x808
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #0 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #6 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #2 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #4 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #1 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x07] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #7 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x03] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #3 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #5 15:4 APIC version 20
>>
>> Thanks,
>> Mike
>> _______________________________________________
>> mdlug mailing list
>> mdlug at mdlug.org
>> http://mdlug.org/mailman/listinfo/mdlug
>>
>
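
As a sanity check on the "8GB RAM" figure, the (usable) ranges in the
BIOS-e820 map quoted above can be added up. The sketch below is only an
illustration: the helper name usable_bytes and the SAMPLE string are mine,
and the regex expects one unwrapped e820 entry per line, without the mail
quote prefixes.

```python
import re

# Captures start address, end address and region type of one e820 entry.
E820_RE = re.compile(
    r"BIOS-e820:\s+([0-9a-fA-F]+)\s+-\s+([0-9a-fA-F]+)\s+\((.+?)\)"
)


def usable_bytes(log_text):
    """Sum the sizes of all e820 ranges marked (usable) in log_text."""
    total = 0
    for start, end, kind in E820_RE.findall(log_text):
        if kind == "usable":
            total += int(end, 16) - int(start, 16)
    return total


# Entries copied from the quoted boot log (the four usable ranges plus a
# few non-usable ones to show that they are ignored).
SAMPLE = """
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 0000000000100000 - 00000000bffc0000 (usable)
BIOS-e820: 00000000bffc0000 - 00000000bffcfc00 (ACPI data)
BIOS-e820: 00000000bffcfc00 - 00000000bffff000 (reserved)
BIOS-e820: 0000000100000000 - 00000001ffffe000 (usable)
BIOS-e820: 00000001ffffe000 - 0000000200000000 (reserved)
BIOS-e820: 0000000200000000 - 0000000240000000 (usable)
"""

print(usable_bytes(SAMPLE) // 1024, "kB usable")
```

This comes to 8387960 kB, just under 8192 MB, so the BIOS was handing
the full 8GB to the kernel. MemTotal in /proc/meminfo (8178228 kB) is a
bit lower because the kernel keeps some memory for itself.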

