Troubleshooting kernel freeze (virtual memory problem) on Pentium 4 computers
(Last updated May 9, 2002)
Linux kernels running on Pentium 4 computers freeze under the following conditions:
- If a 'mount /dev/cdrom' command is issued without a CD in the
drive, or if an audio CD is present
- At random times when Netscape or another memory-intensive
program is running (see below).
At unpredictable times the entire system freezes, typically starting with
an HTTP request timing out in Netscape or Galeon. Next, characters in xterms
and telnet sessions will suddenly stop. The rest of the system seems fine
for a few seconds; then the keyboard freezes. The mouse continues to
move for a few seconds, but clicking on windows has no effect. At this
point the system is frozen and does not respond to ping or telnet from
other computers. The problem occurs with a variety of 2.4 and 2.5 kernels
and occurred on two separate Pentium 4 computers.
On one occasion, the problem was associated with swapping.
During this time, the system load was rapidly and continuously increasing,
and the disk light was on continuously, as Linux went into page swap
death:
  PID USER     PRI NI SIZE RSS SHARE STAT  CPU MEM TIME COMMAND
 3669 tjnelson  25  0    0   0     0 RW   71.2 0.0 5:01 bash
    4 root      23  0    0   0     0 RW   20.6 0.0 2:18 kswapd
  493 root      15  0  656 600   560 S     6.3 0.1 0:00 cron
The system recovers after about 5 minutes if you exit X and log out.
Other times, the freeze was not accompanied by swapping,
and it was not possible to exit X.
Symptoms
- During the page swap crisis,
/proc/meminfo shows that the problem was related to low memory:
                 NORMAL  DURING FREEZE
            -----------  -------------
MemTotal:     516280 kB      516280 kB
MemFree:      439636 kB        2996 kB
MemShared:         0 kB           0 kB
Buffers:       11684 kB        7372 kB
Cached:        30536 kB       17536 kB
SwapCached:     1688 kB       56700 kB
Active:        26908 kB       34716 kB
Inactive:      32992 kB      461008 kB
HighTotal:         0 kB           0 kB
HighFree:          0 kB           0 kB
LowTotal:     516280 kB      516280 kB
LowFree:      439636 kB        2996 kB
SwapTotal:   2106776 kB     3723248 kB
SwapFree:    2097136 kB     2097136 kB
The swap partition occupies about 2GB of disk space.
Note also that the kernel gets confused about the total swap space:
SwapTotal rises from 2106776 kB to 3723248 kB during the freeze.
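The comparison above can also be done mechanically. The following is a minimal Python sketch, not part of the original diagnosis: it parses two /proc/meminfo snapshots and reports any "Total" field that changed between them. The function names parse_meminfo and changed_totals are hypothetical, and the sample text holds only a few of the fields from the listing above.

```python
def parse_meminfo(text):
    """Parse 'Name: value kB' lines into a dict mapping name -> value in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only lines of the form "Name: <integer> ..."
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def changed_totals(before, after):
    """Return {name: (before, after)} for '*Total' fields that differ."""
    return {
        name: (before[name], after[name])
        for name in before
        if name.endswith("Total") and name in after and before[name] != after[name]
    }


# Values copied from the NORMAL and DURING FREEZE columns above.
NORMAL = """\
MemTotal: 516280 kB
MemFree: 439636 kB
SwapTotal: 2106776 kB
SwapFree: 2097136 kB
"""

DURING_FREEZE = """\
MemTotal: 516280 kB
MemFree: 2996 kB
SwapTotal: 3723248 kB
SwapFree: 2097136 kB
"""

normal = parse_meminfo(NORMAL)
freeze = parse_meminfo(DURING_FREEZE)
for name, (old, new) in changed_totals(normal, freeze).items():
    print(f"{name} changed from {old} kB to {new} kB")
```

On a live system the snapshots would come from reading /proc/meminfo at two different times rather than from the embedded strings.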
Normally, vmstat 1 gives:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 414192  19024  41508   0   0     0     0  101   306   1   0  99
During a hang, vmstat 1 gives:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  4  3  15020   4716 378332  40712   0   0     0     0  101   326   0   0 100
This shows that nothing obviously unusual, such as high system load, is
occurring. The only unusual thing is that free memory is almost depleted,
which would signify that virtual memory is about to be used.
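As an illustration of how the two vmstat samples can be compared, here is a small Python sketch. It assumes the 16-column vmstat layout shown above; the helper names parse_vmstat and low_memory and the 5% threshold are hypothetical choices for this example, not values taken from the original investigation.

```python
HEADER = "r b w swpd free buff cache si so bi bo in cs us sy id"
NORMAL_ROW = "0 0 0 0 414192 19024 41508 0 0 0 0 101 306 1 0 99"
HANG_ROW = "2 4 3 15020 4716 378332 40712 0 0 0 0 101 326 0 0 100"

MEM_TOTAL_KB = 516280  # MemTotal from /proc/meminfo on this machine


def parse_vmstat(header, row):
    """Pair each column name in the header with the integer in the data row."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


def low_memory(sample, total_kb, threshold=0.05):
    """True if free memory is below `threshold` (an assumed 5%) of the total."""
    return sample["free"] / total_kb < threshold


normal = parse_vmstat(HEADER, NORMAL_ROW)
hang = parse_vmstat(HEADER, HANG_ROW)

for label, sample in (("normal", normal), ("hang", hang)):
    pct = 100.0 * sample["free"] / MEM_TOTAL_KB
    print(f"{label}: free={sample['free']} kB ({pct:.1f}% of RAM), "
          f"idle={sample['id']}%, low_memory={low_memory(sample, MEM_TOTAL_KB)}")
```

The sketch flags only the hang sample as low on memory, while the CPU idle column stays near 100% in both, which matches the observation that load looks normal.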
The following all had no effect:
- Changing IRQs of the sound card and serial port
- Disabling USB
- Swapping out the keyboard
- Disabling APIC and APM by adding
append = "disableapic apm=off "
to /etc/lilo.conf, rerunning /sbin/lilo, and rebooting
- Setting hard limits for all user processes by modifying
/etc/security/limits.conf, in case the problem was due to resource
depletion. (Note that only the 2.5 kernels appear to respect the
limits.conf file; in earlier versions, the resource limits must be
set by the shell in /etc/profile. This means users can change their
own limits, so anyone with an account can crash the system. This is
a good reason to upgrade to 2.5.)
- Switching to other browsers instead of Netscape
- Running fdisk to lower the size of the swap partition from 2GB to 1GB.
This stopped the inconsistencies in SwapTotal reported by the kernel,
but not the freezes. It appears that, contrary to the documentation,
Linux has a maximum swap partition size of 1GB on x86 Pentium 4 machines.
- Installing more recent kernels. The 2.5.7 and 2.5.10 kernels both gave
the error message "null pointer dereference" at boot-up, followed by
a kernel panic. A 2.5.12 kernel not only failed to boot, but obliterated
the root partition (ReiserFS), forcing a reinstallation. Apparently, all
kernels in the neighborhood of 2.5.12 and 2.5.13 have this problem and
should be avoided like the plague.
Of the 2.5 kernels, 2.5.8 seems to be the most stable. This kernel
required editing of main.c before it would compile. It seems to have
much better memory management than previous versions, but it still
had the same problem of hanging at random times on Pentium 4 computers.
To compile 2.5.8:
- In init/main.c, at line 263, add:
static inline void setup_per_cpu_areas(void)
{
}
(Note: This function should be empty unless you are using SMP).
- In drivers/ide/Config.in, line 52: Put double quotes (") around
$CONFIG_BLK_DEV_IDE_TLQ_DEFAULT.
Unfortunately, this new kernel couldn't control the Creative Labs SB
Live Ensoniq EMU10K1 sound card.
Solution
The following changes to the kernel configuration solved the problem:
- Changed processor type from Pentium 4 to K6 (CONFIG_MK6, the AMD K6
setting). It was a bit odd that this helped, considering the prominent
"Intel Inside" stickers on the outside of the computer and the fact that
/proc/cpuinfo correctly identifies the CPU as an
"Intel(R) Pentium(R) 4 CPU 1.80GHz". The L1 cache shift setting was
automatically changed from 7 to 5.
- Activated multi-mode disk access
- Deactivated loopback device (/dev/loop)
- Deactivated PPP compression
Here are the relevant lines from /usr/src/linux/.config:
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_ALIGNMENT_16=y
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_IDEDISK_STROKE=y
# CONFIG_PPP_DEFLATE is not set
# CONFIG_PPP_BSDCOMP is not set
# CONFIG_ZLIB_INFLATE is not set
# CONFIG_ZLIB_DEFLATE is not set
With these changes, the kernel has gone for about 10 months so far without freezing.
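To check whether another .config carries the same settings, something like the following Python sketch could be used. It is only an illustration: parse_kconfig and mismatches are hypothetical helper names, the EXPECTED table simply restates some of the lines listed above, and an option that is absent from the file is treated the same as one marked "is not set".

```python
import re

NOT_SET = re.compile(r"^# (CONFIG_\w+) is not set$")

# Settings restated from the working configuration above; None means "not set".
EXPECTED = {
    "CONFIG_MPENTIUM4": None,
    "CONFIG_MK6": "y",
    "CONFIG_X86_L1_CACHE_SHIFT": "5",
    "CONFIG_BLK_DEV_LOOP": None,
    "CONFIG_IDEDISK_MULTI_MODE": "y",
    "CONFIG_PPP_DEFLATE": None,
    "CONFIG_PPP_BSDCOMP": None,
}


def parse_kconfig(text):
    """Map option name -> value string, or None for '# ... is not set' lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = NOT_SET.match(line)
        if match:
            options[match.group(1)] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def mismatches(options, expected):
    """Return {name: (wanted, found)} for every option that differs."""
    return {
        name: (want, options.get(name))
        for name, want in expected.items()
        if options.get(name) != want  # a missing option counts as not set
    }


SAMPLE = """\
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
CONFIG_X86_L1_CACHE_SHIFT=5
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_PPP_DEFLATE is not set
# CONFIG_PPP_BSDCOMP is not set
"""

diff = mismatches(parse_kconfig(SAMPLE), EXPECTED)
print("matches working configuration" if not diff else diff)
```

In practice the text would be read from /usr/src/linux/.config instead of the embedded SAMPLE string.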
Working 2.5.5 kernel configuration for a Sony VAIO
Working 2.5.8 kernel configuration for a Dell Precision 330