Faulty RAM and Intel NUC boot issues

    I have two Intel NUCs that I use as servers. One of them (thankfully a backup server, not really mission critical) was working fine one day and refused to boot the next. When I say "refused to boot" I mean it wouldn't even boot into the EFI, no output at all via HDMI.

    I tried various things including upgrading the firmware using the recovery update function, but nothing seemed to make a difference.

    I'm not sure exactly what the problem was, but I now think it may have been faulty RAM. The NUC has a single 8GB Corsair stick inside. If I removed it completely, the NUC would blink the power light on boot (apparently a sign of missing or incompatible RAM). There's no flashing light with the stick installed, though.

    Since the original NUC, Intel have sent me two replacement NUCs. The first had the same problem as the original, but the second (which arrived today) will boot into the EFI (woop!). However, if I try to load a live USB, it stops at the "loading ramdisk" or "loading initrd" stage. This is what makes me think it's a RAM problem (I've checked other live USB sticks I know work, and they have the same problem, so it's not that).

    So, a plausible theory is that the RAM died one day, and is only partially working now. Has anyone seen something like this before? Any ideas what I could try?

    I don't want to order new RAM and find it wasn't a RAM problem after all. I have some RAM from old laptops, but I'm not sure it's compatible. I know the original Corsair unit is compatible, because it worked for months before I had these problems, and I bought it specifically for the NUC. I'll try the other RAM in the meantime, but I thought I'd throw this out there first because it's mildly interesting, and some of you guys are bound to have had RAM problems before!
    samhobbs.co.uk

    #2
    You say you have two of these NUC things... swap the RAM and see what happens.

    Or can you not take the other NUC down?

    VINNY
    i7 4core HT 8MB L3 2.9GHz
    16GB RAM
    Nvidia GTX 860M 4GB RAM 1152 cuda cores


      #3
      Thanks Vinny,

      The other NUC is running my website/email server/XMPP server etc., so I'm reluctant to take it down. They're also different models (the backup has an older Celeron processor, whereas the main server has an i5 Haswell), so I'm not sure the RAM is necessarily compatible (it probably is, but still...).

      It might come to that though, the RAM cost me £60 at the time (right at the start of 2015) so I'm definitely going to ask for my money back if it turns out to be the culprit. I'll need to know for sure before I ask for a refund though!

        #4
        Seems odd to me that old (not brand new) RAM just died. IME, RAM either dies within a day or two or never, except from a physical cause (water, overheating, overvoltage). Most of the RAM from my old machines was still good the day I dumped the box into the waste bin. I usually ended up donating it to some hobbyist who wanted to upgrade an older machine. I have had RAM that stopped working, but reseating it solved the problem. Once I thought I had RAM going bad, but it turned out to be the mobo. It's tough to troubleshoot with a single stick, though.

        Corsair has good customer service. You might want to call them and ask for a swap. Since you now have three NUCs rejecting the RAM, it's plausible that it failed.

        Please Read Me


          #5
          Originally posted by oshunluvr:
          Seems odd to me that old (not brand new) RAM just died. IME, I've only ever seen RAM die in a day or two or never - except for a physical cause (water, over-heating, over voltage).
          I thought it was strange too, I've never even heard of someone with faulty RAM before but it must happen to someone!

          Corsair has good customer service. You might want to call them and ask for a swap. Since you now have three NUCs rejecting the RAM, it's plausible that it failed.
          Thing is, would you expect the NUC to boot at all with faulty RAM? Is there such a thing as partly-faulty RAM, e.g. enough of it works to boot the EFI but when it comes to unpacking the initrd it fails? The EFI won't load at all with no RAM.

          Thanks for your help, you were actually one of the people I was thinking may have seen something like this before.

            #6
            Thing is, would you expect the NUC to boot at all with faulty RAM? Is there such a thing as partly-faulty RAM, e.g. enough of it works to boot the EFI but when it comes to unpacking the initrd it fails? The EFI won't load at all with no RAM.
            With faulty RAM, some things may work and some may not, and things may work until they don't. After a while (maybe heat build-up causes a crack in the circuit board to open wider?), your system may go from OK to crazy. I had it happen only once, on an XP PC running a simulation program. It would run OK for about 10 minutes, then WHAM! I got screen tearing, then freezing. A PC-build guy I know diagnosed it; it took him hours before he thought to inspect the RAM cards with a magnifier and spotted the very, very small cracked sub-chip area on one of the cards.
            An intellectual says a simple thing in a hard way. An artist says a hard thing in a simple way. Charles Bukowski


              #7
              It is possible that a portion of the RAM is bad (especially with such a large stick) and that you're not always addressing the bad portion. Also, "failure" doesn't have to mean complete failure. It's theoretically possible that the problem is an intermittent fault in the on-board controller rather than in the storage chips themselves.

              If it were me: I'd first take down the other, working NUC and swap the RAM over (assuming of course the RAM is compatible with both units given their differences). This should give a solid result one way or the other: If the working NUC works with the suspect RAM, then it's likely not the RAM. If the suspect NUC won't boot with the working RAM, then it's the NUC.

              I'd research Intel's RAM compatibility list and verify the suspect RAM is on the list. I'd also clear the BIOS settings on the suspect NUC and try and boot again - with as few peripherals attached as possible. If possible, boot to memtest and let it run.

              I don't know enough about the construction of those boxes and EFI, but it seems likely that having NO RAM halts the boot process regardless of other conditions. That doesn't necessarily mean EFI is using the system RAM for itself; it may have enough room in its own onboard flash. I could be totally wrong about that, though.

              It could be as simple as a timing issue with that particular RAM and the Intel CPU/mobo. If your BIOS allows it, you could try changing the timings (I'd try slower first) to see if it stabilizes. If the RAM isn't getting enough voltage, it may not be able to run at full speed.

              If you post the exact models of the RAM and the NUC I might be able to offer more specifics...


                #8
                I have also assumed you've done due diligence with regards to physical inspection, blowing out of dust, etc...


                  #9
                  Originally posted by Qqmike:
                  With faulty RAM, some things may work and some may not, and things may work until they don't. After a while (maybe heat build-up causes a crack in the circuit board to open wider?), your system may go from OK to crazy. I had it happen only once, on an XP PC running a simulation program. It would run OK for about 10 minutes, then WHAM! I got screen tearing, then freezing. A PC-build guy I know diagnosed it; it took him hours before he thought to inspect the RAM cards with a magnifier and spotted the very, very small cracked sub-chip area on one of the cards.
                  I once had a similar problem, which I detected by pressing down lightly on various parts of the mobo. I ordered a replacement the next day.

                  The other thing I've seen that causes hair-pulling and tears is an overworked (underpowered) power supply. If the room warms up, the PSU gets less efficient and things start to go haywire as the voltages drop. Then a shutdown and the associated cooling cause it to boot up normally again. It can drive you mad trying to figure it out...


                    #10
                    OK, so I took the plunge and swapped in the RAM from the main NUC. That definitely seems to have been the problem: the Ubuntu server ISO loaded fine (and memtest was also fine with the RAM from the working NUC, as you'd expect).

                    I'm now in the process of creating an openSUSE USB stick. I think running an openSUSE server as my backup will be a good way to get my head around some of the packaging tools, since they're also used on Sailfish. The real test will be whether I can install it without issues.

                    Hold my ankles, I'm going in!

                    Thanks for the good advice, both of you!

                    If the openSUSE installation is successful, I may order some new RAM of the same type that's in my main NUC so I don't have to wait for the warranty replacement. When the replacement arrives I'll take the new stuff out of the backup NUC and put it in the second slot of the main NUC. Buying the same again will ensure it's compatible.

                      #11
                      Slightly OT, but one of the downsides of running your own DNS server is that if you take the machine offline, you have to remember to specify alternative DNS servers manually in plasma-nm. Maybe I should hand out the IPs of the OpenDNS servers as well as my LAN box's, but I only want them to be used if my box is unavailable.


                        #12
                        Confirmed: everything works as expected with the other RAM, installation was successful.

                        Unfortunately, the faulty RAM doesn't have a sticker on it. I can't remember if it came with one, but my working RAM has a "warranty void if removed" sticker on it, so fingers crossed I can get a replacement.

                          #13
                          I've had no problems with Corsair. If you can come up with proof of purchase, I doubt they'll hesitate to replace it.


                            #14
                            I've had RAM sticks fail. Two that I remember were identical and came in the same box; they failed a few years apart, the first after a few years of daily use. Two computers refused to start with a failed stick in any slot.
                            Originally posted by oshunluvr:
                            The other thing I've seen that causes hair-pulling and tears is an overworked (underpowered) power supply. If the room warms up, the PSU gets less efficient and things start to go haywire as the voltages drop. Then a shutdown and the associated cooling cause it to boot up normally again.
                            I've had two failing power supplies do the opposite; ok while they're kept warm, but leave them off for a few days and getting them to start and stay up again was a mission. Eventually, they'd warm up enough to keep going. The second time I got this behaviour, of course, I didn't muck around.
                            Regards, John Little


                              #15
                              I have an 8 GB Toshiba stick on which I had put Kubuntu 9.04 in January of that year. A couple of days ago I needed to create a LiveUSB of Kubuntu 14.04.2. I expected to find the stick still occupied, but was surprised to see it empty. It held 14.04.2 OK, but I don't know for how long.
                              "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                              – John F. Kennedy, February 26, 1962.
