"A reboot would cost us MILLIONS for each instance!"

This topic is closed.

    Why Google uses "Goobuntu" (Ubuntu's 12.04.1 release).

    http://www.zdnet.com/the-truth-about-goobuntu-googles-in-house-desktop-ubuntu-linux-7000003462/


    Googlers must ask to use Windows because “Windows is harder because it has 'special' security problems so it requires high-level permission before someone can use it.” In addition, “Windows tools tend to be heavy and inflexible.”


    That said, Bushnell was asked why Ubuntu instead of, say, Fedora or openSUSE. He replied, “We chose Debian because packages and apt [Debian's basic software package programs] are light-years ahead of RPM ....” And why Ubuntu over the other Debian-based Linux distributions? “Because its release cadence is awesome and Canonical [Ubuntu's parent company] offers good support.”
    ...
    To manage all these Goobuntu desktops, Google uses apt and Puppet desktop administration tools. This gives the Google desktop management team the power to quickly control and manage their PCs. That's important because, “A single reboot can cost us a million dollars per instance.”

    That said, desktop problems, even on Linux, will happen. As Bushnell said, “Hope is not a strategy. Most people hope that things won't fail. Hoping computers won't fail is bad. You will die someday. Your PC will crash someday. You have to design for failure.”


    This is where Goobuntu's 'special sauce' appears. On Google's desktops, “Active monitoring is absolutely critical. At Google we have challenging demands, we're always pushing workstations to their limits, and we work with rapidly moving development cycles.”


    On top of this, Google has very strict security requirements. As Bushnell observes, “Google is a target. Everyone wants to hack us.” So some programs that are part of the Ubuntu distribution are banned as potential security risks. These include any program “that calls home” to an outside server. On top of that, Google uses its own proprietary in-house user PC network authentication that Bushnell says is “pushing the state of the art in network authentication, because we're such a high profile security target.”


    Put it all together: the need for top-of-the-line security, high-end PC performance, and the flexibility to meet the desktop needs of both genius developers and newly-hired sales representatives, and it's no wonder that Google uses Ubuntu for its desktop operating system of choice. To quote Bushnell, “You'd be a fool to use anything but Linux.”
    “A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
    – John F. Kennedy, February 26, 1962.

    #2
    Interesting story. But without some data to support the million-dollar reboot, I'm inclined to view that as pure hyperbole. My laptop reboots in 15 seconds. If a 15-second delay truly causes a million-dollar loss, that implies each powered-on hour generates 240 million dollars of revenue (an hour holds 3,600 / 15 = 240 such delays). Sorry, I call BS.
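
The arithmetic behind the 240 million figure can be checked with a minimal sketch. The function name is hypothetical, and the model (every 15-second slice of an hour being worth the same as the slice lost to a reboot) is the poster's simplifying assumption, not anything stated in the article.

```python
def implied_hourly_revenue(loss_per_reboot: float, reboot_seconds: float) -> float:
    """Revenue per hour implied if one reboot-length delay costs loss_per_reboot."""
    # Number of reboot-length delays that fit in one hour.
    delays_per_hour = 3600 / reboot_seconds
    return loss_per_reboot * delays_per_hour


# Assumed inputs from the post: a $1,000,000 loss for a 15-second reboot.
print(implied_hourly_revenue(1_000_000, 15))  # 240000000.0
```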

      #3
      IIRC, I think he explains the reboot cost in this video and puts it at some number of minutes times X thousand employees: http://www.youtube.com/watch?v=Ym5UeQHxPiU

        #4
        That article reads as if it was dashed off pretty quickly, without any review from its subject. It may not be factually inaccurate, but if it were on Wikipedia it would get a lot of edits or questions. [citation required]

        Steven J Vaughan-Nichols is capable of better-researched and more carefully written articles, but I suppose he's got to make a living.
        I'd rather be locked out than locked in.

          #5
          Of course, since employees don't work 24/7, even at Google, it is possible to schedule the reboot in off hours so that when they return to work, regardless of their shift, they have been upgraded to the latest version. That would cost far less than $1M, I would imagine. That said, I really don't have a problem with a business choosing to use the LTS and only upgrading every so many years.

            #6
            Originally posted by ronw
            IIRC, I think he explains the reboot cost in this video and puts it at some number of minutes times X thousand employees: http://www.youtube.com/watch?v=Ym5UeQHxPiU
            DYRC when in the 56m58s video he says this?

              #7
              No. It's been a few weeks since I watched it.

                #8
                What matters is the number of servers and the ratio of income per server. In 2010 Google said they had about 680,000 servers, and one source using satellite photos estimates that as of January 2012 they had 1.8 million and that by early 2013 they'll have 2.3 million servers.

                Each of the data centers in those photos has its own backup/alternate power supplies/generation and probably could operate independently of the public power grid indefinitely. So, barring some unforeseen disaster that could take out an entire server farm at one of those locations (airplane crash, terrorist attack, meteor strike?), the only thing going down would be a particular blade with mechanical failure or a server on a blade on which the OS has malfunctioned, become infected, or has slowed down or stopped under load.

                The last thing Google would want to have to do is to reboot one or more stacks of blades (1,600 servers) every 24 hours, or even once a week, simply to restore its capacity to function properly, or to remove infections. But if Google had to reboot 1% of its server farm every night, it would lose 1% of its revenue-generating opportunities.

                When one has that many servers to run the next thing to consider is the cost of licensing the OS for those servers, AND for the workstations the employees run. Paying a license fee on that many servers, even with a "volume" contract, would run into the millions per year, not including AV and support costs.

                From my POV, the biggest cost on such massive farms isn't mechanical or licensing fees, it is the costs involved in maintaining security against malware attacks, and limiting the damage from successful attacks, and recovering ASAP from such attacks. Where I retired from we had 6 MCSEs and 50 Netware servers and 500 WinXX workstations (except for a few Linux servers and the Linux workstations I and several other developers were using). Malware became such a problem that they switched their Internet portal from a Windows box to a Linux box on which they ran $28K of AV software. Every byte from the Internet went through that box. After it was installed our Internet speed (bandwidth) actually increased, and the number of infections from the Internet dropped to zero. Then, when some infections re-appeared, they found out that employees were bringing infected floppies, cdroms and USB sticks to work to bring their favorite music and games! When they disabled the floppies and cdroms on workstations for all but the developers and IT staff the malware infections ceased. In my last year of work there was only one infection, IIRC.

                I remember about a decade ago when Microsoft was secretly using about 15,000 Linux DNS servers via Akamai ("Akamai" means smart, clever, wise) to help deploy apps and updates, and they were discovered doing so as early as 2001. I did that Netcraft search myself and repeated it frequently over the next few years. Sometime around 2004, IIRC, Microsoft quit using Linux.

                The generally accepted reasons were:
                1) Linux can carry a greater load than WinXX,
                2) it is less susceptible to malware, and
                3) it wasn't as prone to crashing or slowing down as WinXX was.

                I heard that at work, since they replaced the XP and Win2008 servers with Win7, they now reboot the servers only on the weekends.

                  #9
                  Thanks GG - interesting analysis

                    #10
                    Originally posted by GreyGeek
                    What matters is the number of servers and the ratio of income per server.
                    The article was about Google's desktop of choice for employees. But as far as servers go, yes, rebooting those can indeed be costly.

                    Originally posted by GreyGeek
                    infected floppies, cdroms and USB sticks
                    The one time in my life when I had a computer infected (unintentionally, that is) was when I loaned a laptop to someone. Bad Steve. Bad, bad Steve.

                    Originally posted by GreyGeek
                    When they disabled the floppies and cdroms on workstations for all but the developers and IT staff
                    LOL. IT staff always gets to create exceptions for itself. I remember when I was doing firewall admin stuff once. Rule #1: if user=me, allow all

                    Originally posted by GreyGeek
                    I remember about a decade ago when Microsoft was secretly using about 15,000 Linux DNS servers via Akamai to help deploy apps and updates...and they were discovered doing so as early as 2001... Sometime around 2004, IIRC, Microsoft quit using Linux.
                    There are many legitimate reasons to use a content distribution network; hiding typically isn't one of them (but it does make for salacious reportage). CDNs allow software and multimedia distributors to push content out closer to users, which relieves pressure on Internet backhaul and backbone links and thus improves performance for everyone. Of the four or so decent CDNs on the planet, Akamai is far and away the largest and most distributed, with almost 2000 points of presence around the world. Microsoft uses Akamai for Windows Update, which is a completely reasonable business decision. The fact that most of Akamai's servers run Linux is, IMHO, orthogonal to the whole Windows vs. Linux debate.

                      #11
                      Originally posted by SteveRiley
                      ....
                      The one time in my life when I had a computer infected (unintentionally, that is) was when I loaned a laptop to someone. Bad Steve. Bad, bad Steve.
                      Bad boy! Bad boy! (swatting with newspaper!). Go to your room without supper!

                      LOL. IT staff always gets to create exceptions for itself. I remember when I was doing firewall admin stuff once. Rule #1: if user=me, allow all
                      Ah, a perk of the trade! Let the bosses see if any development work gets done if "we" don't have access to CDROMs, USB sticks and the Internet. :cool:

                      ....The fact that most of Akamai's servers run Linux is, IMHO, orthogonal to the whole Windows vs. Linux debate.
                      That could be, but it could also be that using Linux servers was economically sound for both of them, especially when they don't have to be as concerned for malware or overloads or failures.

                      At the time there was an uptimes war raging. Windows fans were claiming that Win95 uptimes were just as long as Linux uptimes ... until Microsoft announced the clock bug, which automatically rebooted any Windows box that had been up for 49.7 days. (At the time I had a SuSE server in my office which had been up for 400+ days.) Uptimes of 400+ days are good for servers, which must serve 24/7/365, but not so important for workstations and such, which are probably turned off every night anyway. So, perhaps, the reason why Microsoft was using Linux at the time was because it didn't automatically reboot itself every 50 days or at other inconvenient times.
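
The post does not say where the 49.7-day figure comes from. It matches the point at which a 32-bit counter of elapsed milliseconds wraps around, which is the commonly cited explanation; the sketch below only checks that arithmetic and assumes nothing else about the bug.

```python
# Milliseconds in one day.
MS_PER_DAY = 24 * 60 * 60 * 1000

# A 32-bit unsigned counter can hold 2**32 distinct values before wrapping to zero.
wrap_days = 2**32 / MS_PER_DAY

print(round(wrap_days, 1))  # 49.7
```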

                        #12
                        Originally posted by ronw
                        IIRC, I think he explains the reboot cost in this video and puts it at some number of minutes times X thousand employees: http://www.youtube.com/watch?v=Ym5UeQHxPiU
                        At 3m18s:
                        A reboot costs a million dollars. Because that's tens of thousands of workstations which pull an engineer off for 15 minutes. So that's a million dollars.

                        Logging out is only half that.[laugh]
                        Sounds like they've implemented Goobuntu as a mainframe ...? Why 15 minutes lost productivity for a reboot and why are >10,000 users affected by one reboot? He's trying to emphasise that their considerations are very different from the consumer desktop, but I don't think the figures he's actually stated hang together.
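
One way to test whether the quoted figures hang together is to work out the cost per engineer-hour they imply. This is a rough sketch: the function name is hypothetical, and the workstation counts are assumptions chosen to span "tens of thousands", since the video quote gives no exact number.

```python
def implied_cost_per_engineer_hour(total_cost: float, workstations: int,
                                   minutes_lost: float) -> float:
    """Hourly cost per engineer implied by a fleet-wide reboot costing total_cost."""
    # Total engineer-hours lost if every workstation reboots once.
    hours_lost = workstations * minutes_lost / 60
    return total_cost / hours_lost


# Assumed fleet sizes; only the $1M total and the 15 minutes come from the quote.
for workstations in (10_000, 20_000, 50_000):
    rate = implied_cost_per_engineer_hour(1_000_000, workstations, 15)
    print(workstations, rate)
```

Under these assumptions the quote reads as a fleet-wide total (every workstation rebooting once) rather than the cost of a single machine restarting, and the implied hourly rate falls as the assumed fleet grows.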

                          #13
                          Originally posted by GreyGeek
                          So, perhaps, the reason why Microsoft was using Linux at the time was because it didn't automatically reboot itself every 50 days or at other inconvenient times.
                          Microsoft wasn't "using Linux." It's more accurate to say Microsoft was (and still is) relying on a third party to reduce the costs associated with distributing large amounts of content.

                          I've known instances of Windows servers running reliably for months, even years, without reboots. Not all patches are required. What prevents Windows from gaining lots of share in massive networks like Akamai is the sheer cost. Akamai has something like 95,000 servers. Can you imagine how much it would cost to buy Windows licenses for all those? It's insane!

                          Originally posted by SecretCode
                          Sounds like they've implemented Goobuntu as a mainframe ...? Why 15 minutes lost productivity for a reboot and why are >10,000 users affected by one reboot? He's trying to emphasise that their considerations are very different from the consumer desktop, but I don't think the figures he's actually stated hang together.
                          And therefore he has damaged the credibility of anything else he might say.
