[JENKINS-11962] Symlinking lastSuccessful build shouldn't fail with concurrent jobs

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Component: core
    • Environment: Jenkins 1.438, concurrent job, 3 builds running

      I had three builds running at the same time, and two of them finished during the same second. This led to the following message on one of them:

      ln -s builds/2011-12-01_21-35-22 /var/lib/jenkins/jobs/my_job/builds/../lastSuccessful failed: 17 File exists

      The job still succeeded, so it's not a big deal.

      This seems like a race condition between rm-ing the old symlink and creating the new one. Maybe ln -sf would work better? I assume it does its operations atomically.
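
      On the ln -sf idea: whether it closes the window depends on the implementation, because a force option that unlinks the old name and then calls symlink(2) leaves the same gap between the two steps. A common way to repoint a symlink atomically on POSIX systems is to create the new link under a temporary name in the same directory and rename(2) it over the old one. rename replaces the destination in one step, so a concurrent build never sees the name missing and never gets EEXIST (errno 17, "File exists"). Below is a minimal Java sketch of that pattern, assuming a Linux/POSIX file system and Java 8 or later (for the lambda in the demo). AtomicSymlink and replace are hypothetical names; this is not taken from Jenkins.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not Jenkins code). Assumes a POSIX file system where
// rename(2) atomically replaces an existing directory entry.
class AtomicSymlink {

    /** Points link at target without ever removing link first. */
    static void replace(Path link, Path target) throws IOException {
        // 1. Create the new link under a unique temporary name in the same
        //    directory, so the rename below stays on one file system.
        Path tmp = link.resolveSibling(link.getFileName() + ".tmp-" + UUID.randomUUID());
        Files.createSymbolicLink(tmp, target);
        try {
            // 2. Rename it over the old link. With ATOMIC_MOVE the JDK performs a
            //    single rename on Unix, so there is no delete-then-create gap in
            //    which a concurrent caller could hit EEXIST ("File exists").
            Files.move(tmp, link, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave the temporary link behind
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("jobs");
        Path link = dir.resolve("lastSuccessful");
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Object>> results = new ArrayList<>();
        // Simulate several builds finishing at once, each repointing the link.
        for (int i = 0; i < 8; i++) {
            Path target = Paths.get("builds", "build-" + i);
            results.add(pool.submit(() -> {
                replace(link, target);
                return null;
            }));
        }
        for (Future<Object> f : results) {
            f.get(); // rethrows any failure from a worker thread
        }
        pool.shutdown();
        System.out.println("lastSuccessful -> " + Files.readSymbolicLink(link));
    }
}
```

      Running main repoints the link from eight threads at once. Each replace is a single rename, so none of them fails with "File exists", and the link ends up pointing at whichever build renamed last.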

        Attachments:
        1. bugchugger_cleanup_1098516.txt (0.6 kB)
        2. bugchugger_cleanup_1098517.txt (0.5 kB)
        3. bugchugger_cleanup_1098518.txt (0.5 kB)
        4. error (15 kB)
        5. ScreenShot.png (213 kB)

          Radek Chromy added a comment -

          Found the same problem on Jenkins 1.496.
          4 jobs running in parallel: the first one finished successfully (with console output "ln -s ... failed: 17 File exists"), but the others were stuck forever.

          Rebecca Drabenstott added a comment -

          I am also seeing this issue (version 1.451). I have up to 7 jobs running concurrently and I'm getting the "ln -s ... failed: 17 File exists" error message in about 5% of the jobs. That means that there is almost always a group of jobs stuck because of a job that is unable to successfully create the symlink. The stuck jobs eventually finish, sometimes in a few seconds, sometimes up to 20 minutes later.

          Daniel Beck added a comment -

          Can this be reproduced in more recent versions of Jenkins (no older than 8-10 weeks or so)? If so, what OS and what version and vendor of Java are you using? Please include log excerpts and the content of the /systemInfo URL in your comment. Also relevant:
          https://wiki.jenkins-ci.org/display/JENKINS/How+to+report+an+issue

          Rebecca Drabenstott added a comment -

          I have reproduced this issue in Jenkins 1.588 running on Red Hat Enterprise Linux Server release 5.5 (Tikanga). It is using Java 1.6.0_21-b06 from Sun (Oracle). I’m not sure my company would be happy if I publicly posted the entire contents of /systemInfo, but if there are specific sections of interest, I could possibly remove the sensitive data and post them.

          It seems that one job gets stuck trying (and ultimately failing) to create a symlink. While it is stuck, other jobs get stuck too. The instant the first job finishes, the other jobs finish too. The job in this example runs very quickly and also gets stuck for a fairly short period of time, but we have seen the same behavior in other jobs that normally run on the order of many seconds to many minutes and they can get stuck for up to 20 minutes.

          In the screen shot that I have attached, you can see that jobs 1098516, 1098517, 1098518 took longer than the others. Job 1098516 failed to create the symlink. I’m attaching the console output of the three jobs. I’m also attaching part of the Jenkins error log. There is an error in the error log, but it is difficult to tell if it is related or not. It is a frequent error and in other examples of this issue, the error does not always occur immediately before the stuck jobs finish.

          Daniel Beck added a comment -

          I think the Java 1.6 implementation needs JNA (or something similar) to work. Maybe try running Jenkins on Java 7.

          Also, it doesn't look like you need the builds to run in parallel, as they are all done within milliseconds. Disabling that would likely prevent issues like that.
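
          Regarding the Java 7 suggestion: Java 7 added java.nio.file.Files.createSymbolicLink, which creates the link through the JDK's own native code rather than through JNA or an external ln -s process. On Unix, an existing entry at the link path is reported as FileAlreadyExistsException, the Java counterpart of the "17 File exists" (EEXIST) message in this issue. A minimal sketch, assuming Java 7 or later and a file system that supports symlinks; NioSymlink and createIfAbsent are hypothetical names, not Jenkins API. It only tolerates the benign case where a concurrent build already created the identical link; it does not make repointing a link atomic.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch (not Jenkins code): create a symlink with the java.nio.file API
// added in Java 7, instead of calling libc via JNA or spawning "ln -s".
class NioSymlink {

    /**
     * Creates link -> target. Returns true if this call created the link,
     * false if an identical link already existed (e.g. a concurrent build
     * got there first). Any other conflict is rethrown.
     */
    static boolean createIfAbsent(Path link, Path target) throws IOException {
        try {
            Files.createSymbolicLink(link, target);
            return true;
        } catch (FileAlreadyExistsException e) {
            // On Unix, symlink(2) failing with errno 17 (EEXIST) surfaces here.
            if (Files.isSymbolicLink(link) && Files.readSymbolicLink(link).equals(target)) {
                return false; // the same link is already in place
            }
            throw e; // something else occupies the name
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jobs");
        Path link = dir.resolve("lastSuccessful");
        Path target = Paths.get("builds", "2011-12-01_21-35-22");
        System.out.println(createIfAbsent(link, target)); // prints "true": link created
        System.out.println(createIfAbsent(link, target)); // prints "false": already present
    }
}
```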

          Rebecca Drabenstott added a comment -

          Thanks for the suggestions, Daniel. We plan to move to Java 7 in the near future; possibly that will fix the issue. Good point about the parallel runs being unnecessary. However, the other job we have that runs for much longer (and stalls for much longer) does require parallel runs.

          Ken Poole added a comment -

          This just happened to us on Jenkins 1.644 running from the "official" Docker image.

          Mark Waite added a comment -

          Jenkins no longer creates symbolic links by default. Closing.

            Assignee: Unassigned
            Reporter: Jørgen Tjernø (jorgenpt)
            Votes: 4
            Watchers: 6
