Troubleshooting kernel freeze (virtual memory problem) on Pentium 4 computers

(Last updated May 9, 2002)


Linux kernels running on Pentium 4 computers can freeze at unpredictable times. The freeze typically starts with an HTTP request timing out in Netscape or Galeon. Next, output in xterms and telnet sessions suddenly stops. The rest of the system seems fine for a few seconds; then the keyboard freezes. The mouse continues to move for a few more seconds, but clicking on windows has no effect. At this point the system is frozen and does not respond to ping or telnet from other computers. The problem occurs with a variety of 2.4 and 2.5 kernels and has occurred on two separate Pentium 4 computers.

On one occasion, the problem was associated with swapping. During this time, the system load was rapidly and continuously increasing, and the disk light was on continuously, as Linux went into page swap death:
    PID USER     PRI  NI  SIZE  RSS SHARE STAT  CPU   MEM   TIME COMMAND
   3669 tjnelson  25   0     0    0     0 RW   71.2   0.0   5:01 bash
      4 root      23   0     0    0     0 RW   20.6   0.0   2:18 kswapd
    493 root      15   0   656  600   560 S     6.3   0.1   0:00 cron
The system recovers after about 5 minutes if you exit X and log out. At other times, the freeze was not accompanied by swapping, and it was not possible to exit X.

Symptoms

  1. During the page swap crisis, /proc/meminfo showed that the problem was related to low memory:
    NORMAL                      DURING FREEZE
    -------------------------   -------------------------
    MemTotal:       516280 kB   MemTotal:       516280 kB
    MemFree:        439636 kB   MemFree:          2996 kB
    MemShared:           0 kB   MemShared:           0 kB
    Buffers:         11684 kB   Buffers:          7372 kB
    Cached:          30536 kB   Cached:          17536 kB
    SwapCached:       1688 kB   SwapCached:      56700 kB
    Active:          26908 kB   Active:          34716 kB
    Inactive:        32992 kB   Inactive:       461008 kB
    HighTotal:           0 kB   HighTotal:           0 kB
    HighFree:            0 kB   HighFree:            0 kB
    LowTotal:       516280 kB   LowTotal:       516280 kB
    LowFree:        439636 kB   LowFree:          2996 kB
    SwapTotal:     2106776 kB   SwapTotal:     3723248 kB
    SwapFree:      2097136 kB   SwapFree:      2097136 kB 
    The swap partition is about 2GB. Note also that the kernel gets confused about the total swap space: SwapTotal jumps from 2106776 kB to 3723248 kB during the freeze, while SwapFree stays the same.

    Normally, vmstat 1 gives:
       procs                           memory    swap          io     system         cpu
     r  b  w      swpd    free    buff  cache  si  so    bi    bo   in    cs  us  sy  id
     0  0  0         0  414192   19024  41508   0   0     0     0  101   306   1   0  99 
    During a hang, vmstat 1 gives:
       procs                           memory    swap          io     system         cpu
     r  b  w      swpd    free    buff  cache  si  so    bi    bo   in    cs  us  sy  id
     2  4  3     15020    4716  378332  40712   0   0     0     0  101   326   0   0  100 
    This shows nothing unusual such as high system load: the CPU is 100% idle and no swap traffic (si, so) is reported. The only anomaly is that free memory is almost depleted, which suggests that the system is about to start swapping.
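
The /proc/meminfo comparison above can be automated. The following is a minimal sketch, not a tested diagnostic tool: it parses two snapshots, flags capacity fields that change between them, and warns when free memory falls below a threshold. The function names and the 5% threshold are assumptions made for illustration; the sample values are copied from the two columns above.

```python
# Sketch: compare two /proc/meminfo snapshots (values in kB).
# Function names and the 5% threshold are illustrative assumptions.

def parse_meminfo(text):
    """Parse 'Key:   12345 kB' lines into a {key: kilobytes} dict."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

# Fields that describe installed capacity and should not change at run time.
FIXED_FIELDS = ("MemTotal", "LowTotal", "HighTotal", "SwapTotal")

def compare(normal, frozen, low_free_fraction=0.05):
    """Return a list of warnings about the second snapshot."""
    warnings = []
    for key in FIXED_FIELDS:
        if key in normal and key in frozen and normal[key] != frozen[key]:
            warnings.append(f"{key} changed: {normal[key]} kB -> {frozen[key]} kB")
    total = frozen.get("MemTotal", 0)
    free = frozen.get("MemFree", 0)
    if total and free / total < low_free_fraction:
        warnings.append(f"MemFree is low: {free} kB of {total} kB")
    return warnings

# Values taken from the NORMAL and DURING FREEZE columns above.
NORMAL = """\
MemTotal:       516280 kB
MemFree:        439636 kB
SwapTotal:     2106776 kB
SwapFree:      2097136 kB
"""
FROZEN = """\
MemTotal:       516280 kB
MemFree:          2996 kB
SwapTotal:     3723248 kB
SwapFree:      2097136 kB
"""

if __name__ == "__main__":
    for warning in compare(parse_meminfo(NORMAL), parse_meminfo(FROZEN)):
        print(warning)
```

On a live system, the two snapshots would come from reading /proc/meminfo at different times instead of from the embedded strings.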

The following all had no effect:

  1. Changing IRQs of the sound card and serial port
  2. Disabling USB
  3. Swapping out the keyboard
  4. Disabling APIC and APM by adding
          append = "disableapic apm=off "     
    to /etc/lilo.conf, rerunning /sbin/lilo, and rebooting
  5. Setting hard limits for all user processes by modifying /etc/security/limits.conf, in case the problem was due to resource depletion. (Note that only the 2.5 kernels appear to respect the limits.conf file; in earlier versions, the resource limits must be set by the shell in /etc/profile. This means users can change their own limits, so anyone with an account can crash the system. This is a good reason to upgrade to 2.5.)
  6. Switching to other browsers instead of Netscape
  7. Running fdisk to reduce the size of the swap partition from 2GB to 1GB. This stopped the inconsistencies in SwapTotal reported by the kernel, but not the freezes. It appears that, contrary to the documentation, Linux has a maximum swap partition size of 1GB on x86 Pentium 4 machines.
  8. Installing more recent kernels. The 2.5.7 and 2.5.10 kernels both gave the error message "null pointer dereference" at boot-up, followed by a kernel panic. A 2.5.12 kernel not only failed to boot, but also obliterated the root partition (Reiserfs), forcing a reinstallation. Apparently, all kernels in the neighborhood of 2.5.12 and 2.5.13 have this problem and should be avoided like the plague.

    Of the 2.5 kernels, 2.5.8 seems to be the most stable. This kernel required editing of main.c before it would compile. It seems to have much better memory management than previous versions, but it still had the same problem of hanging at random times on Pentium 4 computers.

    To compile 2.5.8:

    1. In init/main.c, at line 263, add:
           static inline void setup_per_cpu_areas(void)
           {
           }     
      (Note: This function should be empty unless you are using SMP).
    2. In drivers/ide/Config.in, line 52: Put double quotes (") around $CONFIG_BLK_DEV_IDE_TLQ_DEFAULT.
    Unfortunately, this new kernel couldn't control the Creative Labs SB Live Ensoniq EMU10000 sound card.
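
The SwapTotal inconsistency described in item 7 above can be checked by comparing the swap areas listed in /proc/swaps with the SwapTotal line in /proc/meminfo, since the two should normally agree. The following is a minimal sketch under stated assumptions: the device name /dev/hda2 is hypothetical, the sizes are the SwapTotal and SwapFree figures from the meminfo listing, and the 1GB limit is the author's observation rather than a documented kernel limit.

```python
# Sketch: cross-check /proc/swaps against SwapTotal from /proc/meminfo.
# /dev/hda2 is a hypothetical device name; the sizes reuse the SwapTotal
# and SwapFree values quoted in the meminfo listing.

ONE_GB_KB = 1024 * 1024  # 1GB in kB; the limit is an observation, not documented

def parse_swaps(text):
    """Return {filename: size_kb} from /proc/swaps-style text."""
    sizes = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 3 and fields[2].isdigit():
            sizes[fields[0]] = int(fields[2])
    return sizes

def check_swap(swaps_text, swap_total_kb, limit_kb=ONE_GB_KB):
    """Return warnings about oversized swap areas and SwapTotal mismatches."""
    sizes = parse_swaps(swaps_text)
    warnings = [
        f"{name}: {kb} kB exceeds {limit_kb} kB"
        for name, kb in sizes.items()
        if kb > limit_kb
    ]
    listed = sum(sizes.values())
    if listed != swap_total_kb:
        warnings.append(
            f"SwapTotal is {swap_total_kb} kB but swap areas add up to {listed} kB"
        )
    return warnings

SAMPLE_SWAPS = """\
Filename        Type        Size      Used  Priority
/dev/hda2       partition   2106776   9640  -1
"""

if __name__ == "__main__":
    # 3723248 kB is the SwapTotal reported during the freeze.
    for warning in check_swap(SAMPLE_SWAPS, 3723248):
        print(warning)
```

On a live system, the first argument would be the contents of /proc/swaps and the second the SwapTotal value parsed from /proc/meminfo.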

Solution

The following changes to the kernel configuration solved the problem:
  1. Changed processor type from Pentium 4 to MK6. It was a bit odd that this helped, considering the prominent "Intel Inside" stickers on the outside of the computer and the fact that /proc/cpuinfo correctly identifies the CPU as an "Intel(R) Pentium(R) 4 CPU 1.80GHz". The L1 cache shift setting was automatically changed from 7 to 5.
  2. Activated multi-mode disk access
  3. Deactivated loopback device (/dev/loop)
  4. Deactivated PPP compression
Here are the relevant lines from /usr/src/linux/.config:
     # CONFIG_MPENTIUM4 is not set
     CONFIG_MK6=y
     CONFIG_X86_L1_CACHE_SHIFT=5
     CONFIG_X86_ALIGNMENT_16=y
     # CONFIG_BLK_DEV_LOOP is not set
     CONFIG_IDEDISK_MULTI_MODE=y
     CONFIG_IDEDISK_STROKE=y
     # CONFIG_PPP_DEFLATE is not set
     # CONFIG_PPP_BSDCOMP is not set
     # CONFIG_ZLIB_INFLATE is not set
     # CONFIG_ZLIB_DEFLATE is not set    
With these changes, the kernel has gone for about 10 months so far without freezing.
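
Before rebuilding, it can help to confirm that a .config file actually contains the settings listed above. The following is a minimal sketch, not part of the kernel build system: the function names are assumptions, and the expected values are taken from the lines quoted above. An option that is missing from the file entirely is treated the same as "is not set".

```python
# Sketch: verify that a kernel .config contains the settings listed above.
# Function names are illustrative; expected values come from the quoted lines.

NOT_SET_SUFFIX = " is not set"

def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_X is not set' maps to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            options[line[2:-len(NOT_SET_SUFFIX)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options

EXPECTED = {
    "CONFIG_MPENTIUM4": None,
    "CONFIG_MK6": "y",
    "CONFIG_X86_L1_CACHE_SHIFT": "5",
    "CONFIG_BLK_DEV_LOOP": None,
    "CONFIG_IDEDISK_MULTI_MODE": "y",
    "CONFIG_PPP_DEFLATE": None,
    "CONFIG_PPP_BSDCOMP": None,
}

def mismatches(options, expected=EXPECTED):
    """Return {name: actual_value} for every option that differs from expected."""
    return {
        name: options.get(name)
        for name, want in expected.items()
        if options.get(name) != want
    }

SAMPLE = """\
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
CONFIG_X86_L1_CACHE_SHIFT=5
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_PPP_DEFLATE is not set
# CONFIG_PPP_BSDCOMP is not set
"""

if __name__ == "__main__":
    print(mismatches(parse_kconfig(SAMPLE)) or "all settings match")
```

To check a real tree, pass the contents of /usr/src/linux/.config to parse_kconfig instead of the embedded sample.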

Working 2.5.5 kernel configuration for a Sony VAIO
Working 2.5.8 kernel configuration for a Dell Precision 330

