JENKINS-43038: Intermittent error "Cannot contact node123: java.lang.InterruptedException" in Jenkins

Details

    Description

      We intermittently see the connection error below while running jobs on node123.

      The error in the build log is: Cannot contact node123: java.lang.InterruptedException

      I don't see any errors in the thread dump or in any other logs related to this node.

      I also see that there was no connection drop between the master and the node.

      The slave has been running for more than 24 hours now.

       

       

          Activity

            oleg_nenashev Oleg Nenashev added a comment -

            Unfortunately, I have no capacity to work on Remoting in the medium term, so I will unassign this issue and let others take it. If somebody is interested in submitting a pull request, I will be happy to help get it reviewed and released.

            svanoort Sam Van Oort added a comment - edited

            msavlani1 shahmishal tsvi If you update to the latest Pipeline plugins, and especially the support-core plugin, and use the suggested GC settings (https://jenkins.io/blog/2016/11/21/gc-tuning/), you should find that the InterruptedExceptions are pretty much gone; they are generally the result of timeouts in remoting-related operations. The only cases where they should still happen, I believe, are actual hardware, system, or network issues.
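
            As a rough illustration of how a timeout can surface as an InterruptedException, here is a minimal Java sketch. It is not Jenkins or Remoting code: the class name InterruptOnTimeout, the method runWithTimeout, and the 200 ms timeout are hypothetical, and a latch that is never released stands in for a reply that never arrives from an unresponsive peer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not taken from Jenkins or Remoting.
class InterruptOnTimeout {

    /** Runs a blocking call on a worker thread and interrupts it once timeoutMillis has elapsed. */
    static String runWithTimeout(long timeoutMillis) {
        AtomicReference<String> outcome = new AtomicReference<>("not started");
        // Never counted down: stands in for a remote reply that never arrives.
        CountDownLatch neverReleased = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                neverReleased.await(); // blocks indefinitely
                outcome.set("completed");
            } catch (InterruptedException e) {
                // The timeout shows up inside the blocked call as an InterruptedException.
                outcome.set(e.getClass().getName());
            }
        });
        worker.start();

        try {
            worker.join(timeoutMillis); // wait up to the timeout
            if (worker.isAlive()) {
                worker.interrupt();     // timeout expired: interrupt the blocked call
            }
            worker.join();              // let the worker record its outcome
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return "caller interrupted";
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(200)); // prints "java.lang.InterruptedException"
    }
}
```

            The point of the sketch is only that the exception names the interruption, not its cause: anything that stalls the blocked side past the timeout (a long GC pause, a network stall) produces the same message.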

            In the last quarter of 2017 we made a big change to the way Pipeline's durable tasks interact with remoting that should avoid many of these issues.

            Edit: An additional issue in support-core that caused problems was recently fixed. Specifically, version 2.42 of the support-core plugin added heap histogram analysis for diagnostics, but this had the unexpected side effect of introducing periodic, catastrophically long GC pauses that made the Jenkins master unresponsive for long periods and triggered timeouts (and thus the InterruptedException seen here when the timeouts kick in).

            Please see https://issues.jenkins-ci.org/browse/JENKINS-49931 for more details of that.
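
            To get a first impression of how much time a JVM spends in garbage collection, the standard java.lang.management API can be queried. The sketch below is an assumption-laden illustration, not part of support-core or Jenkins: the class name GcPauseReport and its helper are hypothetical, it reports only on the JVM it runs in, and getCollectionTime() returns approximate accumulated collection time, which is not necessarily the same as stop-the-world pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; reports on the JVM it runs in, not on a remote Jenkins master.
class GcPauseReport {

    /** Sums the approximate accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```

            Sampling these counters at intervals and comparing the deltas gives a coarse view of whether collection time spikes around the moments the timeouts occur; GC logging as described in the linked tuning post remains the more precise tool.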

            For now I'm going to transition this to "closed" because when working with several users showing this among other symptoms, the suggestions above successfully resolved the issues – but I'm happy to re-open this if you all still experience problems after applying the above (please reply to note the same).

            joebarber Joe Barber added a comment -

            Hi, I have recently been seeing the same "Cannot contact node123: java.lang.InterruptedException" error, but only during parallel stages in a Pipeline job.

            I have created a brand new Jenkins environment (Jenkins version 2.121.1) with all plugins updated, and I have set the GC options according to the gc-tuning page from the comment above.
            The issue is intermittent (about 1 in every 8 builds or so).

            Support-Core version 2.48
            Pipeline version 2.5

            Any other advice?

             

            Thanks,

             

            svanoort Sam Van Oort added a comment -

            joebarber What you describe sounds a lot like https://issues.jenkins-ci.org/browse/JENKINS-46507, but we have not had a consistent way to reproduce the issue, so it's very hard to debug. If you can provide a simple, self-contained sample Pipeline in the comments of that ticket that reproduces the issue, that would be very helpful. Thanks!


            oxygenxo Andrey Babushkin added a comment -

            We're experiencing the same issue when our Java agent gets killed by OOM or when the machine on which the agent is running is rebooted. Is there any way to reduce the amount of time Jenkins waits before the build is marked as failed?

            People

              svanoort Sam Van Oort
              msavlani1 Manish Sawlani
              Votes: 25
              Watchers: 44
