JENKINS-23980

Xvfb doesn't remove /tmp/.X-* locks after a build has finished

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Defect
    • Component/s: xvfb-plugin
    • Labels:
    • Environment:
      Jenkins 1.561, Xvfb plugin 1.0.10
    • Similar Issues:

      Description

      Once in a while a Jenkins build fails because the Xvfb display it wants to create already exists:

      Xvfb starting$ Xvfb :1 -screen 0 1024x768x8 -fbdir /u10/app/gcadmin/jenkins/stable/jenkins_data/2014-07-24_11-54-567539530762307179860xvfb
      _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
      _XSERVTransMakeAllCOTSServerListeners: server already running

      Fatal server error:
      Cannot establish any listening sockets - Make sure an X server isn't already running

      Settings are pretty much default: I use the executor number with an offset of 1 (so it should not be possible for another server to be running with this number). The workaround at the moment is to delete the lock files manually from time to time.

        Attachments

          Activity

          zregvart zregvart added a comment -

          Hi,
          no, actually the error you're getting:

          _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
          _XSERVTransMakeAllCOTSServerListeners: server already running

          is due to another X server already using the same display number, as per source and FAQ mentioned in comment-206473.

          An existing lock file is a pretty good signal that another X server is using the same display number. The only way I can see a lock file remaining is if the X server process crashed or was killed with SIGKILL. And, as can be seen in the X server source code, in that case the X server tries really hard to remove the stale lock file and to launch at the specified display number.
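          To tell a stale lock from a live one by hand, something like the following may help. It is only a sketch, not the X server's or the plugin's code: it assumes the lock file is /tmp/.X<display>-lock and contains the owning PID as decimal text, and the function name lock_status is made up for illustration.

```python
import os


def lock_status(display, tmp_dir="/tmp"):
    """Classify the lock file of an X display as 'absent', 'live' or 'stale'.

    Assumption: the lock file is <tmp_dir>/.X<display>-lock and holds the
    PID of the owning X server as decimal text.
    """
    path = os.path.join(tmp_dir, ".X%d-lock" % display)
    try:
        with open(path) as lock:
            pid = int(lock.read().strip())
    except FileNotFoundError:
        return "absent"
    except ValueError:
        # No parseable PID, so no process can be identified as the owner.
        return "stale"
    if pid <= 0:
        return "stale"
    try:
        os.kill(pid, 0)  # signal 0 checks for existence, sends nothing
    except ProcessLookupError:
        return "stale"   # no such process: the lock was left behind
    except PermissionError:
        return "live"    # process exists but belongs to another user
    return "live"


if __name__ == "__main__":
    print(lock_status(1))
```

          A result of 'live' means a process with that PID exists; it does not prove that the process is an X server, so checking it with ps before killing anything is still advisable.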

          I think that the handling of the X server lock file is best left to the X server. I could add functionality to the Xvfb plugin to kill an already running X server detected by the lock file. But, as you can imagine, this would result in erratic build behavior, as concurrent builds would terminate each other, or users' X sessions would unexpectedly terminate.

          The zombie process handling is the automatic termination of Xvfb processes left behind by a slave disconnect or a Jenkins master crash. If your slave disconnects, for example due to network issues, the build fails and the Xvfb process cannot be terminated, because the slave cannot be contacted to execute the termination. For these cases the Xvfb plugin keeps a list of the Xvfb processes that it started; on Jenkins master startup and on slave reconnect it automatically goes through the list, terminates the leftover Xvfb processes and removes the temporary frame buffer directories they used.

          I think that in your case you need to make sure that there is no overlap between display numbers. An overlap could be caused by using the same offset for more than one job running on the same slave, by using multiple slaves per physical machine, or by running other X servers on the display numbers used by Jenkins. The easiest way to get non-overlapping display numbers is to use the 'Let Xvfb choose display name' option and have that done automatically by the X server.
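          As a rough illustration of how such overlaps arise, the sketch below assumes the display number is simply the executor number plus the configured offset, as described in this issue. The function name, the tuple layout and the agent labels are hypothetical and not part of the plugin.

```python
from collections import defaultdict


def find_display_collisions(assignments):
    """Return {display_number: [labels]} for displays claimed more than once.

    assignments: iterable of (label, executor_number, offset) tuples, all
    assumed to run on the same physical machine.
    Assumption: display number = executor_number + offset.
    """
    claimed = defaultdict(list)
    for label, executor_number, offset in assignments:
        claimed[executor_number + offset].append(label)
    return {d: labels for d, labels in claimed.items() if len(labels) > 1}


if __name__ == "__main__":
    # Two agents on one machine, both using offset 1 (hypothetical labels).
    print(find_display_collisions([
        ("agent-a/executor-0", 0, 1),
        ("agent-b/executor-0", 0, 1),
        ("agent-a/executor-1", 1, 1),
    ]))
```

          With these inputs both agents claim display :1 for their first executor, which is the 'multiple slaves per physical machine' case. An X server started outside Jenkins on one of these numbers would collide in the same way, but would not show up in such a list.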

          pedro_cucaracha Stefan Schultz added a comment -

          Hi,

          My server admin told me there was indeed an Xvfb process blocking port 6003. After killing it, there have been no failing builds so far. So, sorry for the confusion...

          Now I have another problem:

          I'm calling a Gradle build (Xvfb is started successfully), and the Gradle build calls three executables one after another, each of which needs a display to connect to. The first two calls are successful; the third fails with:

          java.lang.InternalError: Can't connect to X11 window server using ':103' as the value of the DISPLAY variable.

          I checked the processes and there is no longer an Xvfb instance running for this display. (I use netstat -tupln | grep tcp to check for a process listening on port 6103.) It looks like the instance is shut down after the second connection. Do you happen to know why Xvfb is doing this, and is there an option for the plugin to keep Xvfb alive no matter how many processes connect?

          Thanks,
          Stefan
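          The port numbers mentioned above (6003 for :3, 6103 for :103) follow the X11 convention of TCP port 6000 plus the display number. The sketch below makes that mapping explicit and probes the port. The function names are made up for illustration, and the Unix socket path /tmp/.X11-unix/X<display> is the conventional location, assumed here rather than confirmed for this setup. Depending on version and flags, an X server may not listen on TCP at all, so a closed port alone does not prove the server is gone.

```python
import re
import socket


def x11_endpoints(display):
    """Map a DISPLAY value such as ':103' or 'host:103.0' to
    (display_number, tcp_port, unix_socket_path)."""
    match = re.fullmatch(r"[^:]*:(\d+)(?:\.\d+)?", display)
    if match is None:
        raise ValueError("not a DISPLAY value: %r" % display)
    number = int(match.group(1))
    # Conventional locations: TCP 6000 + N and /tmp/.X11-unix/XN.
    return number, 6000 + number, "/tmp/.X11-unix/X%d" % number


def is_tcp_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    number, port, unix_path = x11_endpoints(":103")
    print(number, port, unix_path, is_tcp_listening(port))
```

          Checking for the Unix socket file and for a running Xvfb process in addition to the TCP port gives a more reliable picture than netstat on TCP alone.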

          zregvart zregvart added a comment -

          Hi Stefan,
          the Xvfb plugin keeps Xvfb running for the duration of the build steps; there is an option to keep it running for the post-build actions as well (Shutdown Xvfb with whole job, not just with the main build action). The started Xvfb process makes the display available until it terminates; it should not matter how many processes connect to it.

          I'm not sure what your build is doing, but if you can reproduce this I suggest you open another issue and attach the job's console output with the Log Xvfb option turned on. A job configuration or detailed steps to reproduce this would be very helpful.

          zregvart zregvart added a comment -

          Treating this as not a defect

          zregvart zregvart added a comment -

          Closing


            People

            Assignee:
            zregvart zregvart
            Reporter:
            pedro_cucaracha Stefan Schultz
            Votes:
            0
            Watchers:
            2

              Dates

              Created:
              Updated:
              Resolved: