New system locking up frequently - Maddening!

    I just put together a new system to run Kubuntu Karmic, but I'm getting frequent lockups (kernel panics, with Caps Lock & Scroll Lock LEDs flashing, and sometimes just freezing without the LEDs flashing) and I cannot figure out what's causing it. Right now the computer is unusable because of this.
    (some diagnostics below)

    HARDWARE:

    Asus P5KPL-AM SE Motherboard
    Pentium Dual Core E5300 CPU 2.6GHz
    2GB Kingston DDR2 RAM (one stick)
    HP PS/2 Keyboard
    HP PS/2 Optical Mouse
    Belkin F5D7000 v5 PCI wireless network card

    TESTING:
    The RAM has passed several runs using different memory testers, so I think the RAM is OK.

    The same thing happens whether the PCI network card is present or not, so I think that's eliminated, too.

    I also don't think it's related to any software, since I've had almost the same results with OpenSUSE 11.1, Kubuntu Jaunty & Win2k.

    The last time I was able to get any data around the time of lockup (running Kubuntu Karmic), this is what there was on the monitor (if anyone can see something here that will provide a clue as to what the problem is, I would really appreciate hearing it):
    _____________________________________________________

    [ 707.861049] [<c013c2d4>] try_to_wake_up+0x84/0x350
    [ 707.861049] [<c013c5ab>] default_wake_function+0xb/0x10
    [ 707.861049] [<c015c29b>] autoremove_wake_function+0x1b/0x40
    [ 707.861049] [<c012e890>] __wake_up_common+0x40/0x70
    [ 707.861049] [<c0133077>] __wake_up+0x37/0x50
    [ 707.861049] [<c0157fdb>] insert_work+0x5b/0xa0
    [ 707.861049] [<c0127c38>] ? default_spin_lock_flags+0x8/0x10
    [ 707.861049] [<c0570e1a>] ? _spin_lock_irqsave+0x2a/0x40
    [ 707.861049] [<c0158368>] __queue_work+0x28/0x40
    [ 707.861049] [<c01583fb>] queue_work_on+0x3b/0x60
    [ 707.861049] [<c0158455>] queue_work+0x15/0x20
    [ 707.861049] [<f8c306a2>] ieee80211_sta_rx_mgmt+0xa2/0xd0 [mac80211]
    [ 707.861049] [<f8c39c19>] ieee80211_invoke_rx_handlers+0x359/0x380 [mac80211]
    [ 707.861049] [<c04bb7b6>] ? nf_hook_slow+0x96/0xd0
    [ 707.861049] [<f8c39331>] __ieee80211_rx_handle_packet+0x161/0x2c0 [mac80211]
    [ 707.861049] [<f8c39cd6>] __ieee80211_rx+0x96/0x120 [mac80211]
    [ 707.861049] [<f8c8e7a5>] ath5k_tasklet_rx+0xd5/0x580 [ath5k]
    [ 707.861049] [<c014a217>] tasklet_action+0xa7/0xc0
    [ 707.861049] [<c014b3b0>] __do_softirq+0x90/0x1a0
    [ 707.861049] [<c018f8fc>] ? handle_IRQ_event+0x4c/0x140
    [ 707.861049] [<c0121270>] ? ack_apic_level+0x60/0x230
    [ 707.861049] [<c014b4fd>] do_softirq+0x3d/0x40
    [ 707.861049] [<c014b63d>] irq_exit+0x5d/0x70
    [ 707.861049] [<c0104f10>] do_IRQ+0x50/0xc0
    [ 707.861049] [<c01039b0>] common_interrupt+0x30/0x40
    [ 707.861049] [<c010a993>] ? mwait_idle+0x63/0xf0
    [ 707.861049] [<c010202c>] cpu_idle+0x8c/0xd0
    [ 707.861049] [<c056c6d6>] start_secondary+0xc6/0xc8
    [ 707.861049] [drm:intelfb_panic] *ERROR* panic occurred, switching back to text console
    [ 707.861049] BUG: scheduling while atomic: swapper/0/0x10000100
    [ 707.861049] Modules linked in: ppdev snd_hda_codec_realtek arc4 ecb snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss ath5k mac80211 led_class ath snd_pcm iptable_filter snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event lp parport snd_seq ip_tables x_tables snd_timer snd_seq_device cfg80211 psmouse serio_raw asus_atk0110 snd soundcore snd_page_alloc fbcon tileblit font bitblit softcursor i915 drm i2c_algo_bit r8169 mii video output intel_agp agpgart
    [ 707.861049]
    [ 707.861049] Pid: 0, comm: swapper Tainted: G D (2.6.31-16-generic #52-Ubuntu) System Product Name
    [ 707.861049] EIP: 0060:[<c010a993>] EFLAGS: 00000246 CPU: 1
    [ 707.861049] EIP is at mwait_idle+0x63/0xf0
    [ 707.861049] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00000000
    [ 707.861049] ESI: c078ac1c EDI: 31777537 EBP: f7081f84 ESP: f7081f60
    [ 707.861049] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    [ 707.861049] CR0: 8005003b CR2: b1e60010 CR3: 31d14000 CR4: 000406d0
    [ 707.861049] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 707.861049] DR6: ffff0ff0 DR7: 00000400
    [ 707.861049] Call Trace:
    [ 707.861049] [<c010202c>] cpu_idle+0x8c/0xd0
    [ 707.861049] [<c056c6d6>] start_secondary+0xc6/0xc8
    _____________________________________________________

    I think there must be a problem with the motherboard or the processor. I've successfully built other systems similar to this one and I've never seen this happen before. I don't have a clue what is shutting it down all the time, and I'd really appreciate any insight.

    Thanks very much!

    #2
    Re: New system locking up frequently - Maddening!

    Kernel panics suggest that it's not the peripherals; it's more likely the CPU, motherboard or memory.

    One stick of RAM? Normally DDR2 needs 2 sticks, no?



      #3
      Re: New system locking up frequently - Maddening!

      Thanks for the suggestion. From what I've been able to find, DDR needs 2 sticks for optimum performance, but if you only have one, the mobo will just run in single-channel mode. I haven't found anything to indicate that running only 1 stick of DDR memory will cause catastrophic failure. BTW, this was a barebone kit, and it only came with one module.
      I tend to agree that it's the processor or the motherboard, but how to tell . . . ? I'll have to keep digging.



        #4
        Re: New system locking up frequently - Maddening!

        Believe it or not, Asus may have a troubleshooting guide that, although somewhat general, could point you toward the mobo, CPU, memory or whatever. Intel has such guides. Try the Asus site.
        An intellectual says a simple thing in a hard way. An artist says a hard thing in a simple way. Charles Bukowski



          #5
          Re: New system locking up frequently - Maddening!

          http://support.asus.com/troubleshoot...Language=en-us

          Haven't studied it, may be too general, but worth a look-see.


            #6
            Re: New system locking up frequently - Maddening!

            If you enter the following in a Konsole:
            Code:
            dmesg | more
            it will give you the kernel messages about your hardware, one screen at a time. Use the spacebar or Enter to page through them. There might be some clue there.
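Since the machine may not stay up long enough to page through everything, a filter for the crash-related lines can save time. A minimal sketch follows; the keyword list is an assumption rather than anything official, and /var/log/kern.log is where Ubuntu's default syslog setup usually writes kernel messages (check your own configuration). A hard lockup may never make it to disk at all.

```shell
# Keep only kernel messages that usually matter after a crash.
# The keyword list is an assumption and not exhaustive. Input on stdin.
crash_lines() {
  grep -E -i 'BUG:|Oops|panic|end trace|Call Trace|EIP is at|Tainted'
}

# Possible uses (not run here):
#   dmesg | crash_lines | more
#   crash_lines < /var/log/kern.log | tail -n 60   # earlier boots, if it was written
```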



              #7
              Re: New system locking up frequently - Maddening!

              I did look at the Asus site, but I've already done the troubleshooting steps listed there, thanks.

              If I could get the system to stay running for more than 5 minutes I would try "dmesg | more" but that doesn't seem to be an option at this point.

              Below is some more information from this morning; this time I was trying to load the Kubuntu live CD and the PC locked up after less than 2 minutes, with the following on the screen:

              [ 123.533795] [<c01a1536>] ? do_wp_page+0x206/0x650
              [ 123.533797] [<c0502cf8>] ? _spin_lock+0x8/0x10
              [ 123.533801] [<c01a2000>] ? handle_mm_fault+0x300/0x380
              [ 123.533803] [<c050521d>] ? do_page_fault+0x1fd/0x790
              [ 123.533805] [<c02873f5>] ? security_path_permission+0x25/0x30
              [ 123.533808] [<c01b4e7b>] ? shmem_get_acl+0x3b/0x80
              [ 123.533811] [<c0501a4b>] ? mutex_lock+0xb/0x20
              [ 123.533813] [<c01e8414>] ? inotify_inode_is_dead+0x74/0x80
              [ 123.533816] [<c01b1b08>] ? shmem_destroy_inode+0x18/0x20
              [ 123.533817] [<c01b1b08>] ? shmem_destroy_inode+0x18/0x20
              [ 123.533819] [<c01b3aa0>] ? shmem_delete_inode+0x0/0x100
              [ 123.533821] [<c01c5d65>] ? putname+0x25/0x40
              [ 123.533824] [<c01c5d65>] ? putname+0x25/0x40
              [ 123.533825] [<c01c8390>] ? do_unlinkat+0x40/0x150
              [ 123.533827] [<c01d387d>] ? mntput_no_expire+0x1d/0x120
              [ 123.533830] [<c0505020>] ? do_page_fault+0x0/0x790
              [ 123.533832] [<c0503082>] ? error_code+0x72/0x80
              [ 123.533833] [<c0500000>] ? relay_hotcpu_callback+0x6d/0xbd
              [ 123.533837] Code: eb f1 8d b6 00 00 00 00 f6 03 04 90 8d 74 26 00 0f 85 0f 01 00 00 89 73 04 8b 55 08 8b 04 95 50 4e 51 c0 89 03 8b 43 08 8b 76 30 <8b> 38 c1 ef 1e 89 f8 c1 e0 06 8d 04 06 89 45 e4 e8 10 81 34 00
              [ 123.533849] EIP: [<c01baabb>] mem_cgroup_charge_common+0xcb/0x220 SS:ESP 0068:f6389d90
              [ 123.533853] ---[ end trace 0f224103dfcf8c1b ]---

              The "relay_hotcpu_callback" may be a clue, but I got into the BIOS and have been watching the hardware monitor. The CPU is staying right around 30-31°C, which is not hot. Just to make sure the sensor isn't dead I unplugged the CPU fan briefly, long enough to see that it will register a temperature change, and it does.
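A BIOS hardware monitor only shows temperatures while the machine sits in setup. A rough sketch for logging readings from inside Linux right up to a lockup is below. It assumes the kernel's hwmon sysfs interface, where temp*_input files hold millidegrees Celsius; asus_atk0110 appears in the module list of the first trace and is the sort of driver that provides such files, but the exact paths on this board are an assumption, and the log file name is hypothetical. The last few readings can still be lost if the freeze beats the write to disk.

```shell
# Print each hwmon temperature input as "<path> <degrees C>".
# Assumes the hwmon sysfs convention: temp*_input holds millidegrees Celsius.
# The root directory is a parameter so the function can be tried on a copy.
read_temps() {
  local root f raw
  root=${1:-/sys/class/hwmon}
  # Older kernels often expose the files under hwmonN/device/ instead.
  for f in "$root"/hwmon*/temp*_input "$root"/hwmon*/device/temp*_input; do
    [ -r "$f" ] || continue          # skip unmatched globs / unreadable files
    raw=$(cat "$f")
    printf '%s %d.%d\n' "$f" $(( raw / 1000 )) $(( (raw % 1000) / 100 ))
  done
}

# Append a timestamped reading every 5 seconds and flush it to disk.
# The default log path is a hypothetical choice; stop with Ctrl+C.
log_temps() {
  local out
  out=${1:-"$HOME/temps.log"}
  while :; do
    { date '+%F %T'; read_temps; } >> "$out"
    sync
    sleep 5
  done
}
```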

              I keep suspecting the CPU itself, but that may be because the last few pages of the processor installation guide are printed upside down, and that makes me think "counterfeit."

              Thanks again.



                #8
                Re: New system locking up frequently - Maddening!

                It says in the spec that the motherboard supports 667 or 800 MHz memory. What is the model of your Kingston memory module?
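Besides reading the label, the module's part number and the speed the board reports can be pulled from a running system with dmidecode (DMI type 17 is the "Memory Device" record). The awk filter below is a sketch that assumes dmidecode's usual indented "Key: value" layout; field names and units vary between dmidecode versions.

```shell
# Summarize DMI "Memory Device" records: size, speed and part number per slot.
# Expects `dmidecode --type 17` output on stdin (producing it needs root).
mem_summary() {
  awk -F': ' '
    /^Memory Device/ { n++ }                  # start of a new module record
    n && /^[ \t]+(Size|Speed|Part Number):/ {
      key = $1
      sub(/^[ \t]+/, "", key)                 # drop leading indentation
      printf "slot %d %s=%s\n", n, key, $2
    }'
}

# Usage (not run here):
#   sudo dmidecode --type 17 | mem_summary
```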



                  #9
                  Re: New system locking up frequently - Maddening!

                  Among the error messages I saw:

                  0e1a>] ? _spin_lock_irqsave+0x2a/0x40
                  [ 707.861049] [<c0158368>] __queue_work+0x28/0x40
                  ........
                  [ 707.861049] [<c056c6d6>] start_secondary+0xc6/0xc8
                  [ 707.861049] [drm:intelfb_panic] *ERROR* panic occurred, switching back to text console
                  [ 707.861049] BUG: scheduling while atomic: swapper/0/0x10000100
                  I believe you have run into a BIOS-dependent kernel bug: http://linux.derkeiler.com/Mailing-L.../msg14258.html

                  The following bug entry is on the current list of known regressions from 2.6.30. Please verify if it still should be listed and let me know (either way).
                  Alan Cox was at a loss as to whether this is a network bug, a scheduler bug or something else.
                  Kernel developer Sergey Senozhatsky replied:


                  Hello Alan, Rafael,

                  BUG is still here...

                  kernel: [ 7331.719518] BUG: scheduling while atomic: pptpgw/4161/0x10000500
                  kernel: [ 7331.719526] Modules linked in: ppp_deflate zlib_deflate ppp_async crc_ccitt ppp_generic slhc ipv6 fuse loop snd_hda_codec_si3054 snd_hda_codec_realtek snd_hda_intel
                  snd_hda_codec snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd pcspkr soundcore psmouse serio_raw snd_page_alloc i2c_i801 rng_core evdev
                  asus_laptop usbhid hid sg sd_mod sr_mod cdrom ata_generic pata_acpi uhci_hcd ata_piix sdhci_pci sdhci ricoh_mmc mmc_core led_class ide_pci_generic ehci_hcd r8169 mii usbcore
                  kernel: [ 7331.719627]
                  kernel: [ 7331.719636] Pid: 4161, comm: pptpgw Not tainted (2.6.31-rc1-dbgnv-git4 #32) F3JC
                  kernel: [ 7331.719644] EIP: 0060:[<c13ffcd6>] EFLAGS: 00200246 CPU: 0
                  kernel: [ 7331.719657] EIP is at _spin_unlock_irqrestore+0x16/0x30

                  Those remarks were posted 6 months ago while 30 was being developed. The bug was marked closed five months ago but, apparently, either later kernel releases regressed or your BIOS is an unusual case. The bug was assigned to Ingo Molnar, who maintains (and wrote) the kernel scheduler, IIRC.
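In these messages the part after "scheduling while atomic:" is comm/pid/preempt_count, with the count printed in hex. The sketch below splits it up. The bit layout is an assumption based on 2.6.31-era x86 kernels and may differ elsewhere. Read that way, swapper/0/0x10000100 would suggest the idle task was in softirq context when it tried to schedule, which fits the tasklet_action and do_softirq frames in the first trace; 0x10000500 in the quoted report would be a softirq count of 5.

```shell
# Split a "BUG: scheduling while atomic: comm/pid/0xCOUNT" line into fields.
# Bit layout is an ASSUMPTION for 2.6.31-era x86 kernels:
#   bits 0-7    preemption-disable depth
#   bits 8-15   softirq count
#   bits 16-23  hardirq count (the real field may be wider; 8 bits assumed)
#   0x10000000  PREEMPT_ACTIVE flag (x86-specific value)
decode_atomic() {
  local rest hex tmp pid comm count
  rest=${1##*: }            # e.g. "swapper/0/0x10000100"
  hex=${rest##*/}           # last field: preempt count in hex
  tmp=${rest%/*}            # "swapper/0"; comm itself may contain "/"
  pid=${tmp##*/}
  comm=${tmp%/*}
  count=$(( $hex ))         # shell arithmetic accepts 0x-prefixed values
  printf 'comm=%s pid=%s preempt=%d softirq=%d hardirq=%d preempt_active=%d\n' \
    "$comm" "$pid" \
    $(( count & 0xff )) \
    $(( (count >> 8) & 0xff )) \
    $(( (count >> 16) & 0xff )) \
    $(( (count & 0x10000000) != 0 ))
}
```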


                  Perhaps a solution for your Asus is to downgrade to a kernel prior to 30.
                  "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                  – John F. Kennedy, February 26, 1962.



                    #10
                    Re: New system locking up frequently - Maddening!

                    Originally posted by dibl
                    It says in the spec that motherboard supports 667 or 800 MHz memory. What is the model of your Kingston memory module?
                    KVR667D2-2GR (it's discontinued). It hasn't failed any memory tests that I've run on it.

                    Originally posted by GreyGeek
                    I believe you have run into a bios dependent kernel bug:

                    . . . apparently, either later kernel releases regressed, or your bios is an unusual case.

                    . . . Perhaps a solution for your Asus is to downgrade to a kernel prior to 30.
                    This issue has proven to be OS- and software-independent. But that raises a question: could the BIOS be corrupt?

                    Getting back to the data I posted this morning:

                    At http://docs.huihoo.com/linux/kernel/...api/re407.html I found:

                    "relay_hotcpu_callback — CPU hotplug callback . . . Description
                    Returns the success/failure of the operation. (NOTIFY_OK, NOTIFY_BAD)"

                    So, in the most recent case (loading the Kubuntu live CD), was it trying to return the status of a "CPU hotplug callback" when an error occurred? That was followed by "mem_cgroup_charge_common", but I haven't found anything about that which I can understand. Is the processor failing to process instructions, or is the memory failing somehow?
                    It doesn't matter what OS is installed on it; the computer won't run for more than 10 minutes (more like 2 to 5) without locking up or spontaneously rebooting, with or without keyboard/mouse input. It's been that way since I unboxed it and assembled everything on 12/7. However, I did let it stay in BIOS setup for about an hour while I watched the CPU temp, and it kept running then with no problem.
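One caution when reading traces like these: on x86 kernels of this era, an entry marked with "?" is an address the stack dumper found on the stack but could not confirm as part of the current call chain, so it may be a stale leftover. A "?" frame such as relay_hotcpu_callback is therefore weaker evidence than the unmarked frames and the EIP line. A small sketch that sorts a pasted trace into the two groups (the file name in the usage line is hypothetical):

```shell
# Sort x86 call-trace lines (on stdin) into confirmed and "?" frames.
# Only lines shaped like "[ time] [<address>] symbol+off/len" are used.
split_trace() {
  awk '
    /] \[<[0-9a-f]+>] / {
      line = $0
      sub(/.*\[<[0-9a-f]+>] /, "", line)    # drop timestamp and address
      if (line ~ /^\? /) {
        sub(/^\? /, "", line)
        print "unreliable " line
      } else {
        print "reliable   " line
      }
    }'
}

# Usage (not run here; trace.txt is a hypothetical saved copy of a trace):
#   split_trace < trace.txt
```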



                      #11
                      Re: New system locking up frequently - Maddening!

                      Update:
                      I got into the BIOS and forced the RAM clock to 667 MHz. That helped a lot, and the system stayed running longer, but it still had kernel panics and locked up.
                      After some more research, I downloaded the latest BIOS version (making sure it was for the P5KPL-AM SE) from the Asus website. I flashed the BIOS, and as I did so, the system told me it was reading the existing BIOS, then flashing the BIOS, then verifying the BIOS. Finally it said flashing was successful and it would reboot. It rebooted, and now the system is nothing more than a paperweight, waiting to have a new, different motherboard installed.
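Checking that the image matches the board, as was done here, is worth making routine. The DMI system fields are not much help on this board (the first trace shows the placeholder "System Product Name"), but dmidecode can report the baseboard model separately. Below is a sketch of a tolerant name comparison, assuming the strings differ only in spacing or letter case. It only checks that the names agree; it says nothing about whether an image or a flash is good.

```shell
# Return success if two board names match, ignoring case and extra spaces.
board_matches() {
  local want have
  want=$(printf '%s' "$1" | tr -s ' ' | tr '[:lower:]' '[:upper:]')
  have=$(printf '%s' "$2" | tr -s ' ' | tr '[:lower:]' '[:upper:]')
  # Trim the single leading/trailing space that tr -s may leave behind.
  want=${want# }; want=${want% }
  have=${have# }; have=${have% }
  [ "$want" = "$have" ]
}

# Usage (needs root; the expected name is the board the image was made for):
#   board_matches "P5KPL-AM SE" "$(sudo dmidecode -s baseboard-product-name)" \
#     || echo "Board name differs; do not flash this image."
```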

                      Thanks, everyone, for the suggestions. I really appreciate your time, experience and willingness to help.
