NIS HELP Please....

This topic is closed.

    #16
    Re: NIS HELP Please....

    ah...
    so it seems your network, your nis and your nfs are working after all...

    alright.
    a couple of things:

    a) kde4 is not for production.
    you should not use kde4 in production.
    or you can't use it and hope you won't have troubles.
    you will.
    i would downgrade all workstations to kde3.5.9.
    or i would at least downgrade one for testing (see how it behaves when all the others freeze).
    and i personally would remove kde altogether from the server (but that's me).

    b) network-manager and knetworkmanager are useless in your environment.
    especially on your server.
    i would uninstall them and configure the nic in /etc/network/interfaces.
    i'd do this on the server for sure.
    but then i'd do it on the clients too, 'cause you have a static wired net.
    so there's no point in using the net manager.
    plus...
    on my test 8.04 system, the network manager causes the system load to skyrocket
    (apparently randomly) and renders my desktop environment barely useable.

    guess you have some work to do...

    cheers

    ps:
    on a side note, i see your client runs an apache web server on its own.
    i don't know if this is wanted.
    gnu/linux is not windoze



      #17
      Re: NIS HELP Please....

      Thanks again...

      a) I only use kdm-kde4 at the moment. I installed kde4 to test it out. The default is to load kde3.5.9 as the desktop env.

      b) I do use DHCP in my classroom, but I guess I can configure the interfaces in /etc... However, if the interface gets wonky and loses its IP, it is simple to use knm to repair it. I have never experienced problems with knm before (of course, I realize that means nothing). IF it was knm, then wouldn't the workstation not be able to surf? I was able to via CLI even though the GUI was dead.

      And, yes, I do need apache running on the clients. I do not allow the students to have "public"-html directories. The server won't hand out anything not in /var/www-ssl, so to teach the students html, php, and css I need them to be able to have apache running. So I configured the local install of apache to use userdir.conf/load and, since their home dirs are mounted, all is good. This way they get to learn without me having to monitor what they put out there (CGI proxies etc...). It will only be accessible from within the classroom.

      Still, I see the loads on the server were .99, .76, .36 average. Aren't these high?

      I will remove KNM from one client and see if that changes anything for the next time this happens. Too bad it is not reproducible or consistent, it takes too long for the "next" time.

      Thanks again.



        #18
        Re: NIS HELP Please....

        Originally posted by knichel
        I only use kdm-kde4 at the moment. I installed kde4 to test it out. The default is to load kde3.5.9 as the desktop env.
        i see.
        though it doesn't make much sense to mix a bit of kde4 with kde3.
        especially if that bit is from a beta version...

        Originally posted by knichel
        However, if the interface gets wonky and loses its IP...

        no, no, no...
        an interface can't simply lose its ip...
        an interface must not lose its ip.
        and, i assure you, an interface doesn't lose its ip.
        if there's anything that can cause this behaviour, i fear it's exactly knm.

        Originally posted by knichel
        IF it was knm, then wouldn't the workstation not be able to surf?
        I was able to via CLI even though the GUI was dead?
        knm has nothing to do with networking itself.
        nm and knm are pieces of software that simply manage configurations...

        Originally posted by knichel
        Still, I see the loads on the server were .99, .76, .36 average. Aren't these high?
        yes.
        .36 is the 15min avg load.
        .76 is the 5min avg load.
        .99 is the 1min avg load.
        this seems to say that the load of your server has been going up for a good while
        before you see the actual problem.
        obviously something funny happens.
        however, we still don't know if what you see on the server is the cause or a consequence.
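
a rough python sketch of how the three numbers are being read here (1min above 5min above 15min means the load has been climbing). the sample uptime line is made up around the values quoted in this thread, the function names are only illustrative, and the parsing assumes the usual linux "load average:" wording with a dot as decimal separator.

```python
def parse_load_averages(uptime_output):
    """Return (1min, 5min, 15min) load averages from a line of `uptime` output."""
    # uptime ends its line with "load average: 0.99, 0.76, 0.36"
    _, _, tail = uptime_output.rpartition("load average:")
    one, five, fifteen = (float(field) for field in tail.replace(",", " ").split())
    return one, five, fifteen


def load_is_rising(one, five, fifteen):
    """True when each shorter window is higher than the longer one."""
    return one > five > fifteen


# made-up uptime line using the load values posted in the thread
sample = " 10:50:12 up  2:51,  3 users,  load average: 0.99, 0.76, 0.36"
print(load_is_rising(*parse_load_averages(sample)))
```

this only says the load has been trending up. it says nothing about what is causing it.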

        questions (we should have checked this a long time ago, but...):
        a) how much physical memory have you got on the server?
        b) could you please post the result of:
        Code:
        swapon -s
        on the server (it's not important when you run this command).
        c) how much memory has one of the clients that you've seen "freezing"?
        d) could you post the output of the swapon command for that client, too, please?

        cheers
        gnu/linux is not windoze



          #19
          Re: NIS HELP Please....

          i assure you, an interface doesn't lose its ip.
          if there's anything that can cause this behaviour, i fear it's exactly knm.
          The only time the workstation has lost its IP is when the dhcp server is down for whatever reason. Sorry to suggest it has been losing it. Was just a what if...

          Code:
          swapon -s
          SERVER:
          Filename    Type        Size      Used  Priority
          /dev/sda3   partition   3943948   0     -1
          CLIENT:
          Filename    Type        Size      Used  Priority
          /dev/sda2   partition   996020    0     -1

          how much physical memory have you got on the server?
          SERVER = 2GB
          CLIENT = 1GB

          Thanks again...



            #20
            Re: NIS HELP Please....

            paging seems to be set up alright.
            just wanted to make sure.
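
for reference, a small python sketch of reading that swapon -s output. it assumes the five whitespace-separated columns shown above, with sizes in KiB, and the function name is only illustrative.

```python
def parse_swapon_summary(text):
    """Parse `swapon -s` output into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        filename, kind, size_kib, used_kib, priority = line.split()
        entries.append({
            "filename": filename,
            "type": kind,
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
        })
    return entries


# the server output as posted in the thread
server = """Filename Type Size Used Priority
/dev/sda3 partition 3943948 0 -1"""

for entry in parse_swapon_summary(server):
    print(entry["filename"], round(entry["size_kib"] / 1024), "MiB total,",
          entry["used_kib"], "KiB used")
```

a "used" value of 0 on both machines is what suggests neither was swapping when the command was run.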

            Originally posted by knichel
            The only time the workstation has lost its IP is when the dhcp server is down for whatever reason...
            still a funny thing to be happening...

            next step i would:
            a) remove nm and knm from the server
            b) replace kdm-kde4 on one of the clients with kdm-kde3
            c) remove nm and knm from another client

            if you don't want to uninstall nm and knm,
            at least make absolutely sure they don't run.

            cheers

            EDIT:
            oh...
            and when the problem shows up the next time,
            recheck server and clients with uptime, ctrl-alt-fx, etc...
            to (try to) make sure it's always the same thing we're seeing
            gnu/linux is not windoze



              #21
              Re: NIS HELP Please....

              OK, I changed 2 computers from using DHCP to STATIC and restarted networking, but did not remove nm,knm as of yet (during class time). These 2 computers experienced a problem (seemed to freeze) around 10:50. SO, I logged into them via remote ssh and uptime revealed nothing. They restarted X and could not login. So... I tried to test nis "ypcat passwd" and got errors. I restarted nis and all was fine again.

              I inspected syslog and found this...
              Code:
              Sep 25 10:49:20 ws-12 kernel: [10252.102871] javaldx[7723]: segfault at 00000004 eip b7ddf237 esp bf80435c error 4
              Sep 25 10:49:20 ws-12 kernel: [10252.227780] oosplash.bin[7729]: segfault at 00000004 eip b7e01237 esp bfc2268c error 4
              Sep 25 10:49:51 ws-12 kernel: [10282.828679] javaldx[7740]: segfault at 00000004 eip b7dc3237 esp bfd3b08c error 4
              Sep 25 10:49:51 ws-12 kernel: [10282.929903] oosplash.bin[7746]: segfault at 00000004 eip b7daa237 esp bfeb091c error 4
              Sep 25 10:50:06 ws-12 kernel: [10298.350053] javaldx[7757]: segfault at 00000004 eip b7dfe237 esp bfa905dc error 4
              Sep 25 10:50:06 ws-12 kernel: [10298.450561] oosplash.bin[7763]: segfault at 00000004 eip b7d98237 esp bf9c442c error 4
              Sep 25 10:50:32 ws-12 kernel: [10324.295852] javaldx[7828]: segfault at 00000004 eip b7ded237 esp bfbb370c error 4
              Sep 25 10:50:32 ws-12 kernel: [10324.403314] oosplash.bin[7834]: segfault at 00000004 eip b7e43237 esp bf8f335c error 4
              Sep 25 10:51:46 ws-12 kernel: [10398.006313] javaldx[7884]: segfault at 00000004 eip b7e08237 esp bf8a13ec error 4
              Sep 25 10:51:46 ws-12 kernel: [10398.120946] oosplash.bin[7890]: segfault at 00000004 eip b7d59237 esp bfc98f0c error 4
              Sep 25 10:53:22 ws-12 console-kit-daemon[5412]: WARNING: Unable to activate console: No such device or address
              Sep 25 10:53:22 ws-12 console-kit-daemon[5412]: WARNING: Could not map unix user 1120 to user name
              Sep 25 10:57:14 ws-12 init: tty2 main process ended, respawning
              Sep 25 10:57:49 ws-12 kdm: :0[7939]: Abnormal termination of greeter for display :0, code 1, signal 0
              Sep 25 10:57:49 ws-12 kdm: :0[7939]: Fatal X server IO error: Broken pipe
              Sep 25 11:01:24 ws-12 kernel: [10975.318871] console-kit-dae[5412]: segfault at 00000000 eip b7de1677 esp bfe890f4 error 4
              Did not see same issues on other workstation. Might be coincidence...

              I will look for ypbind errors next time this happens, but I do not think this is what was going on. I am removing nm, knm on 2 machines and on 1 of them I am removing kdm-kde4. Am set to static IP on both.

              Maybe this will reveal something more helpful. Cross your fingers...

              Thanks.



                #22
                Re: NIS HELP Please....

                Originally posted by knichel
                OK, I changed 2 computers from using DHCP to STATIC and restarted networking, but did not remove nm,knm
                this is not what i meant.
                sorry...my fault.
                should have stated this more clearly.
                my suggestion was simply to remove (or stop running/using) nm and knm,
                not to change from dhcp to static ip setup.
                you can leave the dhcp config.
                just put it directly in the /etc/network/interfaces file, instead of using nm and knm.
                like so:
                Code:
                auto lo
                iface lo inet loopback
                address 127.0.0.1
                netmask 255.0.0.0
                
                auto eth0
                iface eth0 inet dhcp
                of course, testing static ip config, instead of dhcp, could be yet another test.
                like so:
                Code:
                auto lo
                iface lo inet loopback
                address 127.0.0.1
                netmask 255.0.0.0
                
                auto eth0
                iface eth0 inet static
                address nnn.nnn.nnn.nnn
                netmask nnn.nnn.nnn.nnn

                Originally posted by knichel
                I inspected syslog and found this...

                mate, you have a lot of stuff (too much, if you ask me) seg faulting.
                that's not normal.
                i see open office splash seg faulting.
                also kdm.
                that's not good either.
                and (in)famous console-kit-daemon (see here).
                you have some big issue there.
                possibly issues...
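
to see at a glance which programs are seg faulting and how often, something like this python sketch could be run over syslog. the regular expression is an assumption based only on the kernel lines pasted above, and it counts only lines that literally say "segfault at".

```python
import re
from collections import Counter

# matches kernel lines such as:
# "... kernel: [10252.102871] javaldx[7723]: segfault at 00000004 eip ..."
SEGFAULT_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] (?P<name>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at"
)


def count_segfaults(lines):
    """Count segfault messages per process name."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# a few lines copied from the syslog excerpt in this thread
sample_lines = [
    "Sep 25 10:49:20 ws-12 kernel: [10252.102871] javaldx[7723]: segfault at 00000004 eip b7ddf237 esp bf80435c error 4",
    "Sep 25 10:49:20 ws-12 kernel: [10252.227780] oosplash.bin[7729]: segfault at 00000004 eip b7e01237 esp bfc2268c error 4",
    "Sep 25 10:49:51 ws-12 kernel: [10282.828679] javaldx[7740]: segfault at 00000004 eip b7dc3237 esp bfd3b08c error 4",
    "Sep 25 10:57:49 ws-12 kdm: :0[7939]: Fatal X server IO error: Broken pipe",
    "Sep 25 11:01:24 ws-12 kernel: [10975.318871] console-kit-dae[5412]: segfault at 00000000 eip b7de1677 esp bfe890f4 error 4",
]

for name, hits in count_segfaults(sample_lines).most_common():
    print(name, hits)
```

to run it on a real log, pass an open file instead of the sample list, e.g. count_segfaults(open("/var/log/syslog")).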

                btw,
                what is your hw?
                and what version(s) of the os did you install?

                gnu/linux is not windoze



                  #23
                  Re: NIS HELP Please....

                  Originally posted by knichel
                  I only use kdm-kde4 at the moment.
                  I installed kde4 to test it out.
                  The default is to load kde3.5.9 as the desktop env.
                  does this mean you have kde4 installed alongside kde3?

                  'cause if this is the case, that's another possible source of troubles.
                  even if you run kde3 and kde4 exclusively.
                  the simple fact of having the 2 installed alongside each other could result in total mess.
                  i expect some kde4 files may easily overwrite older kde3 files.

                  let me know.
                  gnu/linux is not windoze



                    #24
                    Re: NIS HELP Please....

                    Switching from DHCP to STATIC was just a timing thing... I was planning it anyway.

                    The syslog info I pasted was from a single machine and I expect it was an isolated incident. I did not observe that on any other computer. I read that bug report (or at least tried to follow the drivel...); it seems that there are conflicting views on the spawning of 60+ console-kit processes and whether or not it is a bug. Far be it from me to comment on such things beyond my skill.

                    So, around 11:50 the computers hung again. All efforts to correct yielded the same results as in the past. The only solution was to restart the server. The load on the server reached 1.1 1.0 .86 when I restarted.

                    Is there a way to figure out what/when is causing the load to skyrocket? I assume that that script you suggested earlier will take care of the when, but the what concerns me.

                    Originally I suspected NIS or NFS to be the cause. Since (when the workstation appears to freeze) users can ctrl+alt+F1 and log in and view / edit files in their mounted home dirs, neither of them can be problematic. That also seems to suggest to me that kde might be having issues. However, I have one computer that I did not upgrade to 8.04.1 or kde4. It is running 7.10 and kde 3.5.? and has the same issues, so that would seem to rule out kde/kdm issues. Which (as you originally suggested) points to the server being the problem.

                    Now, with all that said, while I was writing this, it happened again. Loads hit the ceiling and GUI on workstations was dead.

                    Server syslog showed this at the time of death...
                    Code:
                    Sep 25 14:21:15 ws-11 kernel: [22945.753185] lockd: server 192.168.6.200 not responding, still trying
                    After restart the workstations resumed business as usual.

                    I Really need to find a way to determine *what* is causing the load on the server to go out of control.



                      #25
                      Re: NIS HELP Please....

                      Originally posted by knichel
                      Is there a way to figure out what/when is causing the load to skyrocket? I assume that that script you suggested earlier will take care of the when, but the what concerns me.
                      the only thing you can do is to log the avg loads in a file in /var/log every 5 secs or so.
                      then cross check your loads log with the other logs and see if you find anything.
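
a minimal python sketch of such a logger, in case it helps. the function names, the line format and the log file name are made up for illustration. the timestamp is roughly syslog style so the two logs can be lined up by eye, and writing under /var/log normally needs root.

```python
import os
import time


def format_load_line(loads, timestamp):
    """Build one log line: local time followed by the 1, 5 and 15 minute loads."""
    stamp = time.strftime("%b %d %H:%M:%S", time.localtime(timestamp))
    return "%s load %.2f %.2f %.2f" % ((stamp,) + tuple(loads))


def log_loads(path, interval=5.0, samples=None):
    """Append one load line to `path` every `interval` seconds.

    samples=None keeps going until interrupted; an integer stops after
    that many lines.
    """
    written = 0
    while samples is None or written < samples:
        with open(path, "a") as log:
            log.write(format_load_line(os.getloadavg(), time.time()) + "\n")
        written += 1
        if samples is None or written < samples:
            time.sleep(interval)


# one line for the current moment, printed instead of logged
print(format_load_line(os.getloadavg(), time.time()))
```

calling log_loads("/var/log/loadavg.log") on the server (hypothetical file name) would then append a line every 5 secs until it is stopped.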

                      Originally posted by knichel
                      Server syslog showed this at the time of death...
                      Code:
                      Sep 25 14:21:15 ws-11 kernel: [22945.753185] lockd: server 192.168.6.200 not responding, still trying
                      ttbomk, lockd is nfs.
                      but this seems to suggest ws-11 can't reach 192.168.6.200 via the net.

                      and i also see you use NAT, right?
                      i didn't know this.
                      is your NAT config alright?
                      is your net config really alright?
                      are the routing tables ok?
                      is anyone stealing your ips?

                      I Really need to find a way to determine *what* is causing the load on the server to go out of control.
                      start removing all the icing in your setup.
                      remove all the pieces of software that aren't strictly necessary.

                      it's a server issue you have.
                      and it's (some sort of) a network issue.
                      the workstations "freeze" because the home dir is nfs mounted and .kde is in the user's home dir.
                      this is now pretty obvious to me.

                      your problem is on the server.
                      in the net config of your server most probably.
                      make absolutely sure that the server is configured properly.
                      don't take things for granted.
                      don't assume things work.
                      check that they are working.
                      don't rule out the obvious.

                      use a monitoring utility (sar, maybe?) to log everything that goes on on your server
                      if you still can't find anything.
                      gnu/linux is not windoze



                        #26
                        Re: NIS HELP Please....

                        the only thing you can do is to log the avg loads in a file in /var/log every 5 secs or so.
                        then cross check your loads log with the other logs and see if you find anything.
                        I have done this and there is nothing in syslog that looks out of place. I have also been told that loads of 1-2 are not unusual and are not high loads. The .99 does not mean 99% CPU utilization.

                        and i also see you use NAT, right?
                        i didn't know this.
                        is your NAT config alright?
                        Yes, I am using NAT and ttbomk it is OK. I don't know how to check / test it, but everything is working as expected in regards to networking.

                        is your net config really alright?
                        I believe so. Remember, when the GUI freezes, I can still log into tty2-6 and do everything from CLI like nano a document or links google.com etc... So NIS and NFS must be operational, Right?

                        are the routing tables ok?
                        Yes, the routing tables are fine. When the workstations freeze, my laptop (on same network, but no NIS or NFS in play) is unaffected.

                        is anyone stealing your ips?
                        No, I am only using DHCP on my inside interface (my LAN). Besides, via tty2, the user can surf the web.

                        the workstations "freeze" because the home dir is nfs mounted and .kde is in the user's home dir.
                        Not true, when the workstations freeze, I log in remotely and check the /home (nfs mounted) and everything is still there. I tested by creating a folder and a document in that folder and checked on the server and both were there. So, nfs appears to be working.

                        On all of the workstations, when they freeze, I can restart X and log in using the local accounts created at install. However, I realize that this doesn't test everything. I will (on Monday) copy one of my students' home dirs from server to a workstation and remove it (rename it or something) from the server. The user will still log in via NIS, but nfs will not come into play because I will disable it in fstab. This should tell me if the problem is nfs or not.

                        I am using nfs-kernel-server (in case it matters).

                        Thanks for your support and patience. Today was not good (2 restarts of the server required).



                          #27
                          Re: NIS HELP Please....

                          Originally posted by knichel
                          I have also been told that loads of 1-2 are not unusual and are not high loads.
                          The .99 does not mean 99% CPU utilization.
                          no, indeed it doesn't.
                          to learn what the avg load means, it is sufficient to read the man pages...
                          ...
                          System load averages is the average number of processes that are either in a runnable
                          or uninterruptable state. A process in a runnable state is either using the CPU or waiting
                          to use the CPU.

                          A process in uninterruptable state is waiting for some I/O access, eg waiting for disk.
                          The averages are taken over the three time intervals.

                          Load averages are not normalized for the number of CPUs in a system, so...
                          a load average of 1 means a single CPU system is loaded all the time
                          while on a 4 CPU system it means it was idle 75% of the time.
                          so, whether .99 is high or not depends on how many cpus you have.
                          and on how much work your server is carrying out.
                          you have a 1 cpu system.
                          based on the avg loads you posted your 15 min avg was 0.3 something.
                          your 1 min avg load was 0.9 something.
                          it therefore seems that your normal load was around 0.3.
                          which then went up to 0.9 for a little while.
                          it seems reasonable to conclude that your avg load of 0.9 is...unusual.
                          of course this is based on the info i have.
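
the per-cpu arithmetic from the man page excerpt, as a tiny python sketch (function names are only illustrative). keep in mind that on linux the load also counts processes stuck waiting for i/o, so the "idle" figure is a rough reading, not a measurement of cpu usage.

```python
def load_per_cpu(load, cpus):
    """Load average divided by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def idle_fraction(load, cpus):
    """Rough share of CPU capacity left unused, never below zero."""
    return max(0.0, 1.0 - load_per_cpu(load, cpus))


print(idle_fraction(1.0, 1))  # 0.0  -> a single cpu loaded all the time
print(idle_fraction(1.0, 4))  # 0.75 -> a 4 cpu system idle 75% of the time
```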

                          Originally posted by knichel
                          Not true, when the workstations freeze, I log in remotely and check the /home (nfs mounted) and everything is still there. I tested by creating a folder and a document in that folder and checked on the server and both were there. So, nfs appears to be working.
                          yes.
                          i only said you have "some sort of" network issue on the server.
                          the same issue shows up on all the clients every time at the same time.
                          it must be something server side.
                          lockd complaining your server does not respond can only mean 2 things:
                          a) nfs gone bust server side or flaky
                          b) network not working or flaky
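
one crude way to tell a) from b) while a freeze is happening is to try a plain tcp connection from a client to the server. this python sketch is only that: the function name is made up, and the ports to try (2049 for nfs, 111 for the portmapper) are an assumption about a standard setup that also serves nfs over tcp.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, timed out or unresolvable all count as "not open"
        return False
```

for example tcp_port_open("192.168.6.200", 2049) from a frozen workstation. if it connects while lockd is still complaining, the network path is probably fine and the nfs side of the server deserves more suspicion. if it times out, look at the network first.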

                          it could be that a long enough connection interruption makes kde freeze.
                          by logging in from console, kde (and its .kde directory) is excluded.

                          try configuring one client with gnome.
                          see how it behaves...

                          Originally posted by knichel
                          I am using nfs-kernel-server (in case it matters).
                          well, yeah, it does matter, actually.
                          you don't notice anything using much cpu, but your avg load suddenly goes up.
                          may well mean something's stuck in the kernel.
                          i'd rush to test traditional user space nfs.

                          Originally posted by knichel
                          Thanks for your support and patience.
                          Today was not good (2 restarts of the server required).
                          i want to know what the problem is...

                          take care
                          gnu/linux is not windoze



                            #28
                            Re: NIS HELP Please....

                            Thanks.  I will try this.

                            1)  Should I mv a user's dir to the local HD for one or two accounts and NOT mount /home on that workstation at all? Or just mv the user's dir and still mount /home?

                            I am guessing that in order to rule out nfs or blame it, I should not mount /home anymore on these test workstations.

                            I am thinking that at the end of class,  the user can rsync their local home docs with the server so they can access from home if need be.

                            2)  How do I switch from nfs-kernel-server to traditional?  What is it called?



                              #29
                              Re: NIS HELP Please....

                              remember: 1 change at a time.

                              Originally posted by knichel
                              Should I mv a user's dir to the local HD for one or two accounts and NOT mount /home on that workstation at all?
                              or just mv the user's dir and still mount /home?
                              mv a user's home dir to a workstation's local hd and do not nfs mount /home.
                              this obviously means that that user will have to work at that given workstation.

                              Originally posted by knichel
                              I am guessing that in order to rule out nfs or blame it,
                              I should not mount /home anymore on these test workstations.
                              correct.
                              1 test workstation should be enough.

                              you could test this config first.
                              see what happens.
                              if you still have troubles, then it is reasonable to rule out nfs.
                              and skip the second test.
                              if this workstation is stable, then go and test user space nfs.

                              Originally posted by knichel
                              I am thinking that at the end of class, the user can rsync their local home docs
                              with the server so they can access from home if need be.
                              or you can do it for her/him
                              but yeah, it's a good idea.

                              Originally posted by knichel
                              How do I switch from nfs-kernel-server to traditional? What is it called?
                              you have 2 options:
                              a) nfs-user-server
                              b) unfs3
                              i'd use unfs3

                              to carry out the test it should be sufficient to:
                              a) stop clients
                              b) backup server configuration files (nfs exports and stuff)
                              c) uninstall nfs-kernel-server from server
                              d) install unfs3 on server
                              e) double check server configuration files (in case of troubles, restore from backup)
                              f) reboot server and start clients

                              kernel space servers are supposedly faster than user space servers.
                              but nfs is so slow anyway, that i doubt your clients will see any difference.

                              btw, what do your users do on your lab's workstations?

                              cheers
                              gnu/linux is not windoze



                                #30
                                Re: NIS HELP Please....

                                Well, I haven't written in a while as everything was fine last week. All I did was stop nfs on 2 computers, copy user home dirs to those machines, and teach students to use rsync to copy files back to the server. On one computer I added gnome and have the students who use that computer use gnome vs. kde. Figured this could test both nfs and kde possibilities. Since nothing has happened (yea, I know, don't count my chickens...) in over a week, I am inclined to think it was an nfs problem. Of course, it suggests that the problem was one of the 2 computers rather than the server.

                                Just didn't want you to think I was ignoring you. I appreciate your help.
