[mdlug] Server stability

Brian brian at dangerbacon.com
Wed Jun 10 15:08:11 EDT 2009


Is it possible that the machine is overheating?

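Also, it looks like it saved a core file (the kdump line in your log puts a
vmcore under /var/crash/2009-06-10-07:33).  LKCD produces a human-readable
analysis.* file; since this dump came from kdump, the equivalent step is to
open the vmcore with the crash utility, which might give you some clues.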
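
A rough sketch of how you might check what has been captured so far. It
assumes the dumps land under /var/crash as <timestamp>/vmcore, as the kdump
line in your log suggests; the function name list_vmcores is just made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list kdump vmcore files under a crash directory.
# Defaults to /var/crash, the location named in the quoted log.
# Prints one "size-in-bytes path" line per dump found.
list_vmcores() {
    crash_dir="${1:-/var/crash}"
    if [ ! -d "$crash_dir" ]; then
        echo "no such directory: $crash_dir" >&2
        return 1
    fi
    # Assumed layout: <crash_dir>/<timestamp>/vmcore
    find "$crash_dir" -type f -name vmcore | while read -r core; do
        # Arithmetic expansion strips any padding wc may add.
        size=$(( $(wc -c < "$core") ))
        echo "$size $core"
    done
}
```

Running "list_vmcores /var/crash" should show one line per crash. To look
inside a dump, the crash utility takes a debug vmlinux and the vmcore, along
the lines of "crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
/var/crash/2009-06-10-07:33/vmcore". That assumes the kernel debuginfo
package matching the running kernel is installed; I have not checked that on
your box.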

On 6/10/09, Michael ORourke <mrorourke at earthlink.net> wrote:
> Lugnuts,
>
> I've got a server that is experiencing some stability issues.
> The server was built with CentOS 5.2 i386 and has been patched recently.
> It's running Oracle XE, tomcat, httpd, and some java.  But when I put any
> load on the system, it just spontaneously reboots.  Any thoughts and/or
> suggestions?
> Would we be better off reloading this box with an x86_64 bit version of
> CentOS?
> Here is some info on the server...
>
> --kernel
> [root at patchserv proc]# uname -a
> Linux patchserv.xxxxxxxxxxxxxxxxxx.xxx 2.6.18-128.1.6.el5PAE #1 SMP Wed Apr
> 1 10:02:22 EDT 2009 i686 i686 i386 GNU/Linux
>
> --memory info (8GB RAM)
> [root at patchserv log]# cat /proc/meminfo
> MemTotal:      8178228 kB
> MemFree:        300404 kB
> Buffers:         43884 kB
> Cached:        6252688 kB
> SwapCached:      44140 kB
> Active:        3276088 kB
> Inactive:      4392996 kB
> HighTotal:     7470840 kB
> HighFree:        17988 kB
> LowTotal:       707388 kB
> LowFree:        282416 kB
> SwapTotal:     2031608 kB
> SwapFree:      1987468 kB
> Dirty:             136 kB
> Writeback:           0 kB
> AnonPages:     1267384 kB
> Mapped:         684764 kB
> Slab:           144172 kB
> PageTables:      50672 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   6120720 kB
> Committed_AS:  3973640 kB
> VmallocTotal:   116728 kB
> VmallocUsed:      3900 kB
> VmallocChunk:   112712 kB
> HugePages_Total:     0
> HugePages_Free:      0
> HugePages_Rsvd:      0
> Hugepagesize:     2048 kB
>
> --processor info (8 CPUs)
> processor       : 7
> vendor_id       : GenuineIntel
> cpu family      : 15
> model           : 4
> model name      : Intel(R) Xeon(TM) CPU 2.80GHz
> stepping        : 8
> cpu MHz         : 2793.483
> cache size      : 2048 KB
> physical id     : 1
> siblings        : 4
> core id         : 1
> cpu cores       : 2
> apicid          : 7
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 5
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
> bogomips        : 5586.43
>
> --recent reboots (not user initiated)
> [root at patchserv log]# last | grep reboot
> reboot   system boot  2.6.18-128.1.6.e Wed Jun 10 07:42          (06:50)
> reboot   system boot  2.6.18-128.1.6.e Wed Jun 10 07:33          (00:04)
> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 01:12         (1+06:24)
> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 01:03          (00:04)
> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 00:26          (00:41)
> reboot   system boot  2.6.18-128.1.6.e Tue Jun  9 00:18          (00:03)
>
> --/var/log/messages around the time of the last reboot...
> Jun 10 07:33:26 patchserv kernel: ADDRCONF(NETDEV_CHANGE): eth0: link
> becomes ready
> Jun 10 07:33:26 patchserv kernel: ADDRCONF(NETDEV_UP): eth1: link is not
> ready
> Jun 10 07:37:30 patchserv kdump: saved a vmcore to
> /var/crash/2009-06-10-07:33
> Jun 10 07:37:31 patchserv shutdown[3048]: shutting down for system reboot
> Jun 10 07:37:31 patchserv init: Switching to runlevel: 6
> Jun 10 07:37:32 patchserv rpc.statd[2983]: Caught signal 15, un-registering
> and exiting.
> Jun 10 07:37:32 patchserv portmap[3087]: connect from 127.0.0.1 to
> unset(status): request from unprivileged port
> Jun 10 07:37:33 patchserv auditd[2888]: The audit daemon is exiting.
> Jun 10 07:37:33 patchserv kernel: audit(1244633853.205:5): audit_pid=0
> old=2888 by auid=4294967295
> Jun 10 07:37:33 patchserv kernel: Kernel logging (proc) stopped.
> Jun 10 07:37:33 patchserv kernel: Kernel log daemon terminating.
> Jun 10 07:37:34 patchserv exiting on signal 15
> Jun 10 07:42:29 patchserv syslogd 1.4.1: restart.
> Jun 10 07:42:29 patchserv kernel: klogd 1.4.1, log source = /proc/kmsg
> started.
> Jun 10 07:42:29 patchserv kernel: Linux version 2.6.18-128.1.6.el5PAE
> (mockbuild at builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat
> 4.1.2-44)) #1 SMP Wed Apr 1 10:02:22 EDT 2009
> Jun 10 07:42:29 patchserv kernel: BIOS-provided physical RAM map:
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000000000000 -
> 00000000000a0000 (usable)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000000100000 -
> 00000000bffc0000 (usable)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000bffc0000 -
> 00000000bffcfc00 (ACPI data)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000bffcfc00 -
> 00000000bffff000 (reserved)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000e0000000 -
> 00000000fec90000 (reserved)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000fed00000 -
> 00000000fed00400 (reserved)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000fee00000 -
> 00000000fee10000 (reserved)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000000ffb00000 -
> 0000000100000000 (reserved)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000100000000 -
> 00000001ffffe000 (usable)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 00000001ffffe000 -
> 0000000200000000 (reserved)
> Jun 10 07:42:29 patchserv kernel:  BIOS-e820: 0000000200000000 -
> 0000000240000000 (usable)
> Jun 10 07:42:29 patchserv kernel: 8320MB HIGHMEM available.
> Jun 10 07:42:29 patchserv kernel: 896MB LOWMEM available.
> Jun 10 07:42:29 patchserv kernel: found SMP MP-table at 000fe710
> Jun 10 07:42:29 patchserv kernel: NX (Execute Disable) protection: active
> Jun 10 07:42:29 patchserv kernel: DMI 2.3 present.
> Jun 10 07:42:29 patchserv kernel: Using APIC driver default
> Jun 10 07:42:29 patchserv kernel: ACPI: PM-Timer IO Port: 0x808
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #0 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #6 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #2 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #4 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #1 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x07]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #7 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x03]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #3 15:4 APIC version 20
> Jun 10 07:42:29 patchserv kernel: ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05]
> enabled)
> Jun 10 07:42:29 patchserv kernel: Processor #5 15:4 APIC version 20
>
> Thanks,
> Mike
> _______________________________________________
> mdlug mailing list
> mdlug at mdlug.org
> http://mdlug.org/mailman/listinfo/mdlug
>

-- 
Sent from my mobile device

Brian
"It's not stupid, it's advaaanced!" - Tallest Purple


