Kubuntu Locking Up Randomly


    Okay, I've got a rather weird issue and I'm hoping all of you can help me with this. For about the past 2 months I've had a rather odd problem on my machine that I can't seem to diagnose and fix. Namely, the machine is locking up at random. And I mean hard locking. And there doesn't seem to be any rhyme or reason to it. Sometimes I can go days without issues, and then I'll get several locks in a row. But the machine is never doing the same thing twice when it locks up. Sometimes it's active, sometimes it's idle, sometimes it's just going along like nothing's wrong and then boom, everything freezes with no warning whatsoever. Now here are a few things I've noticed when it locks up, plus some troubleshooting steps I've already taken.

    1. My desktop icons get reset. Every icon I've repositioned on the desktop, whose position normally gets saved and restored at bootup, gets reset to what looks like a default alphabetical icon sort order.
    2. Sometimes when I shut down it goes down fast and shuts off quickly like it should and other times it takes forever. That may be unrelated, but I thought I'd mention it anyways.
    3. I've recently FSCK'ed the main disk and I don't know if it reported any errors that it may have found, but the system did run a bit better afterwards, even though it didn't stop the freezing problem.
    4. There are no blown caps on my motherboard or video card. Both are pristine still.
    5. The machine has been cleaned recently so it's not a heat or dust issue that I can tell.
    6. Moving aside the .kde folder doesn't fix anything. It was suggested by someone as a possible fix, but it didn't change anything.
    7. Ark crashes when creating zip files. The archive is successfully created and is a valid archive with no issues, but Ark nonetheless crashes whenever a zip is made, though not with any other format.
    8. Yes, all of the software is updated to the latest. I did that just to be sure it wasn't a software issue.

    So at this point I'm stumped as to what it might be. Any ideas? I mean, it's not like it locks up every 5 minutes. I might go several days without a lockup, and then get 2-3 in a row, and it really doesn't matter what I'm doing at the time, or how long the machine has been up. So if you guys could throw some ideas at me as to what might be causing this issue, I'm all ears.

    #2
    My general rule-of-thumb about lock-ups: If it's software it's in a log somewhere, if it's not, it's hardware.

    Start with noting the exact time it locks. Then go through every log file and take note of anything occurring at or just prior to the lock up. If you find nothing, start looking at hardware.
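
As a rough sketch of that approach, the shell function below pulls the lines out of a traditional syslog-format file that fall inside a time window. The name log_window and the example date are made up for illustration, and it assumes the classic "Mon DD HH:MM:SS" timestamp layout that /var/log/syslog normally uses on Ubuntu-family systems.

```shell
#!/bin/sh
# log_window FILE MONTH DAY START END   (hypothetical helper, not a standard tool)
# Print the lines of a syslog-format FILE ("Nov  8 09:57:01 host ...")
# stamped on MONTH DAY between START and END (HH:MM:SS, inclusive).
log_window() {
    awk -v mon="$2" -v day="$3" -v start="$4" -v end="$5" '
        # Fields: $1 = month name, $2 = day of month, $3 = HH:MM:SS.
        # Zero-padded 24-hour times compare correctly as plain strings.
        $1 == mon && $2 + 0 == day + 0 && $3 >= start && $3 <= end
    ' "$1"
}
```

For example, `log_window /var/log/syslog Nov 8 09:40:00 09:57:59` would show everything logged in the minutes leading up to a freeze at about 9:58 (the date here is only a placeholder). On systemd machines that keep a persistent journal, `journalctl --since "09:40" --until "09:58"` should give a similar view.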

    I had similar behavior last year related to the video driver - no rhyme or reason to the lock-ups but every time it happened, there was an entry in Xorg.0.log.

    Hardware is much harder to diagnose. Could be RAM, could be video card, etc. Almost no way to tell unless you can pull parts and swap them as a test. If you have two or more RAM sticks, try running with just one. I've also had a case in the past where one of the four RAM slots went bad (old mobo) and I solved it by moving the RAM to a different slot. Sometimes, just pulling and re-seating hardware will solve it. Micro-corrosion and the expansion and contraction of heating and cooling can cause momentary connection loss.


      #3
      The first possible culprit I'd look at is RAM. On the GRUB menu there is Memtest86+. Let it run overnight. That should be enough time for 6 or more complete passes. If it finds bad RAM it will tell you which bank of RAM is bad.

      If the RAM is bad, a possible easy fix is to power down the machine, put on a wrist strap to ground yourself, open up the machine and pull out the RAM chip/card. Take a clean eraser (pencil) and polish the leads on them. For RAM cards, take a fingernail emery board and polish the prongs in the card slot as well. Then put the RAM chip/cards back in, power up and retry the Memtest86+. If you still get bad RAM it's time to replace them AS A SET. Don't mix and match RAM of different sizes, makes or models.

      Another possible culprit that exhibits those symptoms is the video chip/card. If it is a card, try the polishing leads trick. If your machine offers two GPUs then switch to the other one and see if your machine still crashes. If you don't have a 2nd GPU to switch to then, like oshunluvr says, take note of the time your machine hangs and then check KSystemLog to see if there are any video events at or just preceding that time. If your problem is a faulty video chip/card then check the modules (sudo lsmod) to see what your video driver is. Then use "modinfo yourvideodriver" to see if there are any "parm"s that can be adjusted.

      I have an Intel 3000HD graphic chip as my primary, and an Nvidia GT 650M as my secondary GPU, which cannot be made the primary in the BIOS. But, I installed nvidia-378 and it made my nvidia chip behave like the primary GPU. When I check the modules I notice:
      Code:
      ~$ lsmod | grep nvidia
      nvidia_uvm            634880  0
      nvidia_drm             49152  2
      nvidia_modeset        806912  7 nvidia_drm
      nvidia              12271616  134 nvidia_modeset,nvidia_uvm
      drm_kms_helper        167936  2 i915,nvidia_drm
      drm                   368640  6 i915,nvidia_drm,drm_kms_helper
      The only two which supply modinfo information are i915 and drm_kms_helper.
      Code:
      ~$ modinfo nvidia_drm
      modinfo: ERROR: Module nvidia_drm not found.
      ~$ modinfo nvidia_modeset
      modinfo: ERROR: Module nvidia_modeset not found.
      ~$ modinfo drm_kms_helper
      filename:       /lib/modules/4.8.0-42-generic/kernel/drivers/gpu/drm/drm_kms_helper.ko
      license:        GPL and additional rights
      description:    DRM KMS helper
      author:         David Airlie, Jesse Barnes
      license:        GPL
      srcversion:     35131DBB17F8279DDE3C0EA
      depends:        drm,fb_sys_fops,syscopyarea,sysfillrect,sysimgblt
      intree:         Y
      vermagic:       4.8.0-42-generic SMP mod_unload modversions 
      parm:           fbdev_emulation:Enable legacy fbdev emulation [default=true] (bool)
      parm:           edid_firmware:Do not probe monitor, use specified EDID blob from built-in data or /lib/firmware instead.  (string)
      parm:           poll:bool
      parm:           dp_aux_i2c_speed_khz:Assumed speed of the i2c bus in kHz, (1-400, default 10) (int)
      parm:           dp_aux_i2c_transfer_size:Number of bytes to transfer in a single I2C over DP AUX CH message, (1-16, default 16) (int)
      ~$ modinfo i915
      filename:       /lib/modules/4.8.0-42-generic/kernel/drivers/gpu/drm/i915/i915.ko
      license:        GPL and additional rights
      description:    Intel Graphics
      author:         Intel Corporation
      author:         Tungsten Graphics, Inc.
      firmware:       i915/bxt_dmc_ver1_07.bin
      firmware:       i915/skl_dmc_ver1_26.bin
      firmware:       i915/kbl_dmc_ver1_01.bin
      firmware:       i915/skl_guc_ver6_1.bin
      srcversion:     0919D751C244BAC45254B51
      alias:          pci:v00008086d0000593Bsv*sd*bc03sc*i*
      alias:          pci:v00008086d00005927sv*sd*bc03sc*i*
      ***  lots of aliases deleted.
      alias:          pci:v00008086d00002562sv*sd*bc03sc*i*
      alias:          pci:v00008086d00003577sv*sd*bc03sc*i*
      depends:        drm_kms_helper,drm,video,i2c-algo-bit
      intree:         Y
      vermagic:       4.8.0-42-generic SMP mod_unload modversions 
      parm:           modeset:Use kernel modesetting [KMS] (0=disable, 1=on, -1=force vga console preference [default]) (int)
      parm:           panel_ignore_lid:Override lid status (0=autodetect, 1=autodetect disabled [default], -1=force lid closed, -2=force lid open) (int)
      parm:           semaphores:Use semaphores for inter-ring sync (default: -1 (use per-chip defaults)) (int)
      parm:           enable_rc6:Enable power-saving render C-state 6. Different stages can be selected via bitmask values (0 = disable; 1 = enable rc6; 2 = enable deep rc6; 4 = enable deepest rc6). For example, 3 would enable rc6 and deep rc6, and 7 would enable everything. default: -1 (use per-chip default) (int)
      parm:           enable_dc:Enable power-saving display C-states. (-1=auto [default]; 0=disable; 1=up to DC5; 2=up to DC6) (int)
      parm:           enable_fbc:Enable frame buffer compression for power savings (default: -1 (use per-chip default)) (int)
      parm:           lvds_channel_mode:Specify LVDS channel mode (0=probe BIOS [default], 1=single-channel, 2=dual-channel) (int)
      parm:           lvds_use_ssc:Use Spread Spectrum Clock with panels [LVDS/eDP] (default: auto from VBT) (int)
      parm:           vbt_sdvo_panel_type:Override/Ignore selection of SDVO panel mode in the VBT (-2=ignore, -1=auto [default], index in VBT BIOS table) (int)
      parm:           reset:Attempt GPU resets (default: true) (bool)
      parm:           enable_hangcheck:Periodically check GPU activity for detecting hangs. WARNING: Disabling this can cause system wide hangs. (default: true) (bool)
      parm:           enable_ppgtt:Override PPGTT usage. (-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full with extended address space) (int)
      parm:           enable_execlists:Override execlists usage. (-1=auto [default], 0=disabled, 1=enabled) (int)
      parm:           enable_psr:Enable PSR (0=disabled, 1=enabled - link mode chosen per-platform, 2=force link-standby mode, 3=force link-off mode) Default: -1 (use per-chip default) (int)
      parm:           preliminary_hw_support:Enable preliminary hardware support. (int)
      parm:           disable_power_well:Disable display power wells when possible (-1=auto [default], 0=power wells always on, 1=power wells disabled when possible) (int)
      parm:           enable_ips:Enable IPS (default: true) (int)
      parm:           fastboot:Try to skip unnecessary mode sets at boot time (default: false) (bool)
      parm:           prefault_disable:Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only. (bool)
      parm:           load_detect_test:Force-enable the VGA load detect code for testing (default:false). For developers only. (bool)
      parm:           invert_brightness:Invert backlight brightness (-1 force normal, 0 machine defaults, 1 force inversion), please report PCI device ID, subsystem vendor and subsystem device ID to dri-devel@lists.freedesktop.org, if your machine needs it. It will then be included in an upcoming module version. (int)
      parm:           disable_display:Disable display (default: false) (bool)
      parm:           enable_cmd_parser:Enable command parsing (1=enabled [default], 0=disabled) (int)
      parm:           use_mmio_flip:use MMIO flips (-1=never, 0=driver discretion [default], 1=always) (int)
      parm:           mmio_debug:Enable the MMIO debug code for the first N failures (default: off). This may negatively affect performance. (int)
      parm:           verbose_state_checks:Enable verbose logs (ie. WARN_ON()) in case of unexpected hw state conditions. (bool)
      parm:           nuclear_pageflip:Force atomic modeset functionality; asynchronous mode is not yet supported. (default: false). (bool)
      parm:           edp_vswing:Ignore/Override vswing pre-emph table selection from VBT (0=use value from vbt [default], 1=low power swing(200mV),2=default swing(400mV)) (int)
      parm:           enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
      parm:           enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
      parm:           guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
      parm:           enable_dp_mst:Enable multi-stream transport (MST) for new DisplayPort sinks. (default: true) (bool)
      parm:           inject_load_failure:Force an error after a number of failure check points (0:disabled (default), N:force failure at the Nth failure check point) (uint)
      parm:           enable_dpcd_backlight:Enable support for DPCD backlight control (default:false) (bool)
      parm:           enable_gvt:Enable support for Intel GVT-g graphics virtualization host support(default:false) (bool)
      Checking for "video" in the lsmod listing:
      Code:
      ~$ lsmod | grep video
      uvcvideo               90112  0
      videobuf2_vmalloc      16384  1 uvcvideo
      videobuf2_memops       16384  1 videobuf2_vmalloc
      videobuf2_v4l2         24576  1 uvcvideo
      videobuf2_core         40960  2 uvcvideo,videobuf2_v4l2
      videodev              180224  3 uvcvideo,videobuf2_core,videobuf2_v4l2
      media                  40960  2 uvcvideo,videodev
      video                  40960  2 acer_wmi,i915
      And showing the modinfo for "video", but not the others:
      Code:
      ~$ modinfo video
      filename:       /lib/modules/4.8.0-42-generic/kernel/drivers/acpi/video.ko
      license:        GPL
      description:    ACPI Video Driver
      author:         Bruno Ducrot
      srcversion:     5305985BDE36EBD0913C8F9
      alias:          acpi*:LNXVIDEO:*
      depends:        
      intree:         Y
      vermagic:       4.8.0-42-generic SMP mod_unload modversions 
      parm:           brightness_switch_enabled:bool
      parm:           allow_duplicates:bool
      parm:           disable_backlight_sysfs_if:int
      parm:           report_key_events:0: none, 1: output changes, 2: brightness changes, 3: all (int)
      parm:           device_id_scheme:bool
      parm:           only_lcd:bool
      So, I can create a somevideo.conf file in /etc/modprobe.d/, as root, and put option settings in it, for example:
      "options i915 enable_hangcheck=1"
      etc...
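
To check what a loaded module is actually using before and after such a change, the values can usually be read back from sysfs. The function below is only a sketch: show_params is a made-up name, and it assumes the module exposes its parameters under /sys/module/<module>/parameters, which not every module or parameter does.

```shell
#!/bin/sh
# show_params MODULE [ROOT]   (hypothetical helper, not a standard tool)
# Print "name=value" for every parameter the loaded MODULE exposes under
# ROOT/MODULE/parameters. ROOT defaults to /sys/module.
show_params() {
    dir="${2:-/sys/module}/$1/parameters"
    [ -d "$dir" ] || { echo "no parameters directory for $1" >&2; return 1; }
    for f in "$dir"/*; do
        [ -r "$f" ] || continue   # skip parameters this user cannot read
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Running `show_params i915` should then list entries such as enable_hangcheck with their current values. The option line itself goes into the .conf file as `options i915 enable_hangcheck=1`, and it normally only takes effect the next time the module is loaded, which in practice tends to mean a reboot.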
      Last edited by GreyGeek; Nov 08, 2017, 11:33 PM.
      "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
      – John F. Kennedy, February 26, 1962.


        #4
        Thanks guys. I'll keep you guys informed on what I find. The resetting icons thing happened again this morning too, but I can't find an answer for that yet as it too is random, but I'll keep digging.


          #5
          And, speak of the devil, it just happened. 9:57am to be exact. Checked Xorg.0.log as suggested by oshunluvr and this is the only thing at the end of that log, and the .old log as well:

          [ 46.185] (II) RADEON(0): EDID vendor "HWP", prod id 10300
          [ 46.185] (II) RADEON(0): Using EDID range info for horizontal sync
          [ 46.185] (II) RADEON(0): Using EDID range info for vertical refresh
          [ 46.185] (II) RADEON(0): Printing DDC gathered Modelines:
          [ 46.185] (II) RADEON(0): Modeline "1920x1080"x0.0 148.50 1920 2008 2052 2200 1080 1084 1089 1125 +hsync +vsync (67.5 kHz eP)
          [ 46.185] (II) RADEON(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "720x400"x0.0 28.32 720 738 846 900 400 412 414 449 -hsync +vsync (31.5 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1280x720"x60.0 74.48 1280 1336 1472 1664 720 721 724 746 -hsync +vsync (44.8 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1280x960"x0.0 108.00 1280 1376 1488 1800 960 961 964 1000 +hsync +vsync (60.0 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1280x1024"x0.0 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1440x900"x0.0 106.50 1440 1520 1672 1904 900 903 909 934 -hsync +vsync (55.9 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1600x1200"x0.0 162.00 1600 1664 1856 2160 1200 1201 1204 1250 +hsync +vsync (75.0 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1680x1050"x0.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync (65.3 kHz e)
          [ 46.185] (II) RADEON(0): Modeline "1920x1080"x60.0 172.80 1920 2040 2248 2576 1080 1081 1084 1118 -hsync +vsync (67.1 kHz e)
          [ 63.018] (II) RADEON(0): EDID vendor "HWP", prod id 10300
          [ 63.018] (II) RADEON(0): Using hsync ranges from config file
          [ 63.018] (II) RADEON(0): Using vrefresh ranges from config file
          [ 63.018] (II) RADEON(0): Printing DDC gathered Modelines:
          [ 63.018] (II) RADEON(0): Modeline "1920x1080"x0.0 148.50 1920 2008 2052 2200 1080 1084 1089 1125 +hsync +vsync (67.5 kHz eP)
          [ 63.018] (II) RADEON(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "720x400"x0.0 28.32 720 738 846 900 400 412 414 449 -hsync +vsync (31.5 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1280x720"x60.0 74.48 1280 1336 1472 1664 720 721 724 746 -hsync +vsync (44.8 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1280x960"x0.0 108.00 1280 1376 1488 1800 960 961 964 1000 +hsync +vsync (60.0 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1280x1024"x0.0 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1440x900"x0.0 106.50 1440 1520 1672 1904 900 903 909 934 -hsync +vsync (55.9 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1600x1200"x0.0 162.00 1600 1664 1856 2160 1200 1201 1204 1250 +hsync +vsync (75.0 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1680x1050"x0.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync (65.3 kHz e)
          [ 63.018] (II) RADEON(0): Modeline "1920x1080"x60.0 172.80 1920 2040 2248 2576 1080 1081 1084 1118 -hsync +vsync (67.1 kHz e)

          Oddly, there's nothing in dmesg, and there's literally no messages log (why am I missing that!?). Syslog has nothing. It goes from 40 minutes or so before the lockup straight over to the bootup sequence. I guess since the logs are coming up with nothing, it's time to dive in and see if RAM is an issue. Oddly, I did a single pass not long back and had no issues. But a full scan several times over may not hurt. I'll see what that gives me.


            #6
            Look at the previous versions: syslog.1, etc. The reboot might have cycled logs.
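
One way to sweep the current log and its rotated copies in a single go is sketched below. The name search_rotated is made up, and it assumes the usual logrotate naming (syslog, syslog.1, syslog.2.gz, ...) with gzip-compressed older copies.

```shell
#!/bin/sh
# search_rotated BASE PATTERN   (hypothetical helper, not a standard tool)
# Search BASE and its rotated copies (BASE.1, BASE.2.gz, ...) for PATTERN,
# prefixing each matching line with the file it came from.
search_rotated() {
    for f in "$1" "$1".*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) gzip -dc "$f" ;;   # older rotations are usually compressed
            *)    cat "$f" ;;
        esac | grep -e "$2" | while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
}
```

Something like `search_rotated /var/log/syslog radeon` would then show any matching lines from before the reboot, along with which file each one came from. Reading those files may need sudo or membership in the adm group, depending on the setup.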

            I would start by simply re-seating all the removable hardware - including cables.

            Another possibility is your power supply is under-powered, weak with age, or failing.


              #7
              Well, I went in and dusted out the machine, removed the RAM and video card, dusted them off really well (there was some black, mold-looking micro dust on the cards from the CPU) and got all the junk off them, then blew the slots out REALLY good, then I used the eraser trick that GreyGeek mentioned and finally put everything back. However, I took the RAM stick closest to the CPU that looked like a mold monster barfed all over it (it doesn't now, but it did then) and put it in the 3rd slot and moved the #3 stick up to the #1 slot to see if that helps. Oddly enough, it hasn't locked up since (which doesn't say much as sometimes it can take a day or two to happen, after which there's several in a row, even in the same day) and the odd stuttering problem I was noticing in the machine went away too. Also, my old Microsoft mouse bought the farm today, so now I need to find a good replacement. I've got an old laptop mouse doing the job for now, but that one is too small and I need to find a bigger one that works better. But so far, so good.


                #8
                A thorough cleaning can have significant consequences for system performance, most notably lower heat, and heat, as we know, is a killer of electronics.

                Swapping out the #1 RAM chip for any of the others 'may' be significant, IF the problem was associated with RAM. A bad (or going bad) #1 RAM chip can cause all sorts of issues. You'll have to monitor the system's behavior for a while to determine if what you've done solved your problem. Of course, you won't really know which of the actions you took was the solution, but....
                Windows no longer obstructs my view.
                Using Kubuntu Linux since March 23, 2007.
                "It is a capital mistake to theorize before one has data." - Sherlock Holmes


                  #9
                  Sometimes a good ole' fashioned house-cleanin' is all that's needed


                    #10
                    Re. the mouse: If your tastes aren't exotic in mouses, I can recommend Tecknet models like this one. We have three. They're easy on batteries and cheap.


                      #11
                      Originally posted by oshunluvr:
                      Sometimes a good ole' fashioned house-cleanin' is all that's needed
                      Back in the day, when most components were sitting in sockets and on boards which plugged into slots, the first step in any repair was to clean and reseat all components and cards. I fixed more PCs with a pencil eraser than I can remember.

                      Now, with laptops and soldered in components, the only things that need cleaning are cooling fans & radiators, memory sticks and mouse feet. And, the occasional silver paste to restore heat conductivity between the CPU and the thermal radiator.
                      "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                      – John F. Kennedy, February 26, 1962.


                        #12
                        Originally posted by oshunluvr:
                        Re. the mouse: If your tastes aren't exotic in mouses, I can recommend Tecknet models like this one. We have three. They're easy on batteries and cheap.
                        Interesting. How does that compare size wise with this one: https://www.walmart.com/ip/Logitech-...Mouse/16207314

                        I purchased two of these, one for my laptop and one as a sort of improvised remote control for my desktop that doubles as my television (yes, my monitor is THAT big) that I sit down and watch from my easy chair. Right now it's filling the role of emergency replacement for my other mouse. But if the one you suggested is bigger, I may just go with that one instead and put this one in the supply bin for later, possibly emergency use.


                          #13
                          Logitech M185       2.36 x 3.89 x 1.54 inches @ 2.65 ounces
                          Logitech M705       2.6 x 4.3 x 1.6 inches @ 4.8 ounces
                          Tecknet "Classic"   2.61 x 4.23 x 1.63 inches @ 2.65 ounces

                          As you can see, the Tecknet is larger than the M185 you linked to and virtually identical in size (and design) to the Logitech M705 at less than half the price. The battery life is very long because the mouse goes to sleep after a short period of inactivity. You wake it up with a click. I am now in the habit of just clicking once when I return to my desk, so I don't really even think about it being asleep. I bought my first Tecknet after my last Logitech MX died (that I had paid $50 for) after less than a year of use. For $50, you could buy 5 Tecknet mice and have enough change for Taco Bell.


                            #14
                            Hmm, interesting. I probably wouldn't want the sleep feature like that on all the time, but that's just me. As for the system itself, we're now 8 days in from the cleaning and everything is nice and stable, and the system is having no issues, save for one. The stupid desktop icons keep resetting every time I reboot. So I can organize them all day long, and when I start up again, they're back to a single disorganized row of icons no matter what I do. I've had this happen in the past where KDE keeps forgetting where the icons are positioned and I can't remember what the fix was for it anymore. Usually I just dealt with it until my next reinstall where the issue was magically fixed. Usually. Sometimes it took a few versions of KDE before everything was resolved. So I'm kinda scratching my head on why that's happening.


                              #15
                              Well, I finally found my answer in regard to the icon positioning problems. It's a known regression in the KDE 5.x desktop. However, the only answer the developers will give about it is, "F-you, deal with it. We're not fixing it." So I guess I'm stuck with this issue regardless of what I, or thousands of other KDE users, would like to see. Sheesh, with Firefox now and countless other app developers all going the same route, I'm starting to wonder what the future holds for Linux. I mean, if they keep this up everyone will leave them, and then what will they do? It's honestly sad.
