[mdlug] Server stability

Michael ORourke mrorourke at earthlink.net
Thu Jun 11 06:54:56 EDT 2009


Rob,

Yeah... the heat issue had crossed my mind.  However, the server is located 
in a datacenter in another state, and I'm not sure what (if any) temperature 
monitoring utilities/resources are available.  Might have to dig into this a 
bit more.  Thanks for the suggestion.
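
For what it's worth, if I can get a shell on the box, a quick check might
look something like the sketch below.  It is only a sketch: it assumes the
kernel exposes sensors through the hwmon sysfs tree (/sys/class/hwmon) with
values in millidegrees Celsius, and that a sensor driver for this board is
actually loaded, neither of which I've confirmed on this server.  The
function names are made up for illustration.

```python
import glob
import os


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {relative_path: degrees_C} for every temp*_input file found.

    Assumption: hwmon-style layout, values in millidegrees Celsius.
    Some kernels put the files under hwmonN/, others under hwmonN/device/,
    so both locations are checked.
    """
    temps = {}
    patterns = [
        os.path.join(base, "hwmon*", "temp*_input"),
        os.path.join(base, "hwmon*", "device", "temp*_input"),
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                # Unreadable or non-numeric sensor file: skip it.
                continue
            temps[os.path.relpath(path, base)] = millideg / 1000.0
    return temps


def hottest(temps):
    """Return (label, degrees_C) for the hottest sensor, or None if empty."""
    if not temps:
        return None
    return max(temps.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    readings = read_hwmon_temps()
    if not readings:
        print("no hwmon temperature sensors found")
    for label, deg in sorted(readings.items()):
        print("%-40s %.1f C" % (label, deg))
```

Running that in a loop while the load is applied, and logging the output
somewhere that survives the reboot, should show whether anything is climbing
right before the box goes down.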

-Mike

----- Original Message ----- 
From: "Robert Adkins" <radkins at impelind.com>
To: "'MDLUG's Main discussion list'" <mdlug at mdlug.org>
Sent: Wednesday, June 10, 2009 2:53 PM
Subject: Re: [mdlug] Server stability


> Mike,
>
> Rebooting under load can mean a whole slew of different things.
>
> It's possible that the Northbridge chip on the mainboard is overheating and
> causing a reboot. It's possible that the CPU is overheating and causing a
> reboot. It's possible that the RAM is overheating...
>
> There could be some software conflicts relating to an incompatibility
> between the software, the kernel and the CPU. (Recompiling the kernel
> specifically for the hardware in your server could be a fix for this.)
>
> Do you have a way of monitoring the temperatures of the CPU and other
> elements of the PC when a heavy load is being applied to the server?
>
> -Rob
>
>> -----Original Message-----
>> From: mdlug-bounces at mdlug.org
>> [mailto:mdlug-bounces at mdlug.org] On Behalf Of Michael ORourke
>> Sent: Wednesday, June 10, 2009 2:41 PM
>> To: MDLUG's main mailing list
>> Subject: [mdlug] Server stability
>>
>> Lugnuts,
>>
>> I've got a server that is experiencing some stability issues.
>> The server was built with CentOS 5.2 i386 and has been
>> patched recently.  It's running Oracle XE, tomcat, httpd, and
>> some java.  But when I put any load on the system, it just
>> spontaneously reboots.  Any thoughts and/or suggestions?
>> Would we be better off reloading this box with an x86_64 (64-bit)
>> version of CentOS?
>> Here is some info on the server...
>>
>> --kernel
>> [root at patchserv proc]# uname -a
>> Linux patchserv.xxxxxxxxxxxxxxxxxx.xxx 2.6.18-128.1.6.el5PAE #1 SMP Wed Apr 1 10:02:22 EDT 2009 i686 i686 i386 GNU/Linux
>>
>> --memory info (8GB RAM)
>> [root at patchserv log]# cat /proc/meminfo
>> MemTotal:      8178228 kB
>> MemFree:        300404 kB
>> Buffers:         43884 kB
>> Cached:        6252688 kB
>> SwapCached:      44140 kB
>> Active:        3276088 kB
>> Inactive:      4392996 kB
>> HighTotal:     7470840 kB
>> HighFree:        17988 kB
>> LowTotal:       707388 kB
>> LowFree:        282416 kB
>> SwapTotal:     2031608 kB
>> SwapFree:      1987468 kB
>> Dirty:             136 kB
>> Writeback:           0 kB
>> AnonPages:     1267384 kB
>> Mapped:         684764 kB
>> Slab:           144172 kB
>> PageTables:      50672 kB
>> NFS_Unstable:        0 kB
>> Bounce:              0 kB
>> CommitLimit:   6120720 kB
>> Committed_AS:  3973640 kB
>> VmallocTotal:   116728 kB
>> VmallocUsed:      3900 kB
>> VmallocChunk:   112712 kB
>> HugePages_Total:     0
>> HugePages_Free:      0
>> HugePages_Rsvd:      0
>> Hugepagesize:     2048 kB
>>
>> --processor info (8 CPUs)
>> processor       : 7
>> vendor_id       : GenuineIntel
>> cpu family      : 15
>> model           : 4
>> model name      : Intel(R) Xeon(TM) CPU 2.80GHz
>> stepping        : 8
>> cpu MHz         : 2793.483
>> cache size      : 2048 KB
>> physical id     : 1
>> siblings        : 4
>> core id         : 1
>> cpu cores       : 2
>> apicid          : 7
>> fdiv_bug        : no
>> hlt_bug         : no
>> f00f_bug        : no
>> coma_bug        : no
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 5
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
>> bogomips        : 5586.43
>>
>> --recent reboots (not user initiated)
>> [root at patchserv log]# last | grep reboot
>> reboot   system boot  2.6.18-128.1.6.e Wed Jun 10 07:42          (06:50)
>> reboot   system boot  2.6.18-128.1.6.e Wed Jun 10 07:33          (00:04)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 01:12         (1+06:24)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 01:03          (00:04)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 00:26          (00:41)
>> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 00:18          (00:03)
>>
>> --/var/log/messages around the time of the last reboot...
>> Jun 10 07:33:26 patchserv kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> Jun 10 07:33:26 patchserv kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
>> Jun 10 07:37:30 patchserv kdump: saved a vmcore to /var/crash/2009-06-10-07:33
>> Jun 10 07:37:31 patchserv shutdown[3048]: shutting down for system reboot
>> Jun 10 07:37:31 patchserv init: Switching to runlevel: 6
>> Jun 10 07:37:32 patchserv rpc.statd[2983]: Caught signal 15, un-registering and exiting.
>> Jun 10 07:37:32 patchserv portmap[3087]: connect from 127.0.0.1 to unset(status): request from unprivileged port
>> Jun 10 07:37:33 patchserv auditd[2888]: The audit daemon is exiting.
>> Jun 10 07:37:33 patchserv kernel: audit(1244633853.205:5): audit_pid=0 old=2888 by auid=4294967295
>> Jun 10 07:37:33 patchserv kernel: Kernel logging (proc) stopped.
>> Jun 10 07:37:33 patchserv kernel: Kernel log daemon terminating.
>> Jun 10 07:37:34 patchserv exiting on signal 15
>> Jun 10 07:42:29 patchserv syslogd 1.4.1: restart.
>> Jun 10 07:42:29 patchserv kernel: klogd 1.4.1, log source = /proc/kmsg started.
>> Jun 10 07:42:29 patchserv kernel: Linux version 2.6.18-128.1.6.el5PAE (mockbuild at builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Apr 1 10:02:22 EDT 2009
>> Jun 10 07:42:29 patchserv kernel: BIOS-provided physical RAM map:
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000000100000 - 00000000bffc0000 (usable)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000bffc0000 - 00000000bffcfc00 (ACPI data)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000bffcfc00 - 00000000bffff000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000e0000000 - 00000000fec90000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000100000000 - 00000001ffffe000 (usable)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000001ffffe000 - 0000000200000000 (reserved)
>> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000200000000 - 0000000240000000 (usable)
>> Jun 10 07:42:29 patchserv kernel: 8320MB HIGHMEM available.
>> Jun 10 07:42:29 patchserv kernel: 896MB LOWMEM available.
>> Jun 10 07:42:29 patchserv kernel: found SMP MP-table at 000fe710
>> Jun 10 07:42:29 patchserv kernel: NX (Execute Disable) protection: active
>> Jun 10 07:42:29 patchserv kernel: DMI 2.3 present.
>> Jun 10 07:42:29 patchserv kernel: Using APIC driver default
>> Jun 10 07:42:29 patchserv kernel: ACPI: PM-Timer IO Port: 0x808
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #0 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #6 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #2 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #4 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #1 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x07] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #7 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x03] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #3 15:4 APIC version 20
>> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05] enabled)
>> Jun 10 07:42:29 patchserv kernel: Processor #5 15:4 APIC version 20
>>
>> Thanks,
>> Mike
>> _______________________________________________
>> mdlug mailing list
>> mdlug at mdlug.org
>> http://mdlug.org/mailman/listinfo/mdlug
>>

