
Maven job stuck when slave channel gets disconnected

      I found a way to trigger a remoting problem using TCP fault injection with netem. With it, I can trigger this wait call at hudson.remoting.Request.call(Request.java:146):

      while(response==null && !channel.isInClosed())
        // I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
        // but in production I've observed that in rare occasion it can block forever, even after a channel
        // is gone. So be defensive against that.
        wait(30*1000);
      

      When this wait is triggered, the running build gets stuck and keeps occupying an executor, looping over and over on the wait.
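To illustrate why such a loop never exits, here is a minimal sketch, not the remoting code itself. `PendingCall`, `complete`, `channelClosed` and `await` are hypothetical names chosen for this example. The loop condition is the same as in the snippet above: if no response arrives and nobody flags the channel as closed, every 30-second wait simply times out and the loop goes around again. The sketch adds an overall deadline so the caller eventually gets an exception instead of blocking forever.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a pending remote call; NOT hudson.remoting.Request. */
final class PendingCall {
    private Object response;        // set when a reply arrives
    private boolean channelClosed;  // set when the channel is known to be closed

    synchronized void complete(Object reply) {
        response = reply;
        notifyAll();
    }

    synchronized void channelClosed() {
        channelClosed = true;
        notifyAll();
    }

    /** Same loop condition as in the report, but bounded by an overall deadline. */
    synchronized Object await(long maxWaitMillis) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (response == null && !channelClosed) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                // Neither condition changed: without this check the loop would spin forever.
                throw new TimeoutException("no response and no close notification");
            }
            wait(remainingMillis);
        }
        if (response == null) {
            throw new IllegalStateException("channel closed before a response arrived");
        }
        return response;
    }

    public static void main(String[] args) throws Exception {
        PendingCall call = new PendingCall();
        try {
            call.await(200); // nobody completes or closes, so this times out
        } catch (TimeoutException e) {
            System.out.println("gave up waiting: " + e.getMessage());
        }
    }
}
```

Whether a hard deadline is acceptable depends on how long legitimate remote calls may take, so the timeout value here is purely illustrative.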

      To reproduce, set up an SSH slave using the attached Dockerfile, then set up netem on the docker0 bridge like this:

      tc qdisc add dev docker0 root netem
      tc qdisc change dev docker0 root netem corrupt 1
      

      The job must be run once before netem is configured: netem settings apply to all network streams, so the build could otherwise fail while downloading Maven dependencies. I just launched a Maven build of an example project to trigger the problem. It might be a Maven-specific problem...

      To remove netem settings, just run tc qdisc del dev docker0 root.

      I've attached the Dockerfile, the command I used to launch it, and a thread dump of a stuck Jenkins master.

        1. Dockerfile
          0.4 kB
        2. launch.sh
          0.0 kB
        3. stacktrace.txt
          44 kB

          [JENKINS-26947] Maven job stuck when slave channel gets disconnected

          James Nord added a comment -

          FWIW the original report has nothing to do with packet corruption - just the channel dying.

          You can get the same results with a "kill -9" on the slave.


          Yoann Dubreuil added a comment -

          Yes, that's right. I found the problem when playing with netem, hence the bug report.

          It's a bug in the Maven plugin. When the upstream channel is closed, the Maven channel stays around. Will post a PR shortly.

          Yoann Dubreuil added a comment - Just created a PR: https://github.com/jenkinsci/maven-plugin/pull/39

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Yoann Dubreuil
          Path:
          src/main/java/hudson/maven/AbstractMavenProcessFactory.java
          http://jenkins-ci.org/commit/maven-plugin/47b28737803ef90bc7e6518e159df28471988bae
          Log:
          [FIX JENKINS-26947] forcibly terminate Maven remoting channel when upstream channel get closed

          Currently, when the main remoting channel is abruptly closed, Maven channel can be stuck for a while because it doesn't get notified of the disconnection
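The idea in the commit message above, closing the nested channel as soon as the upstream one goes away, can be sketched roughly as follows. This is not the code from the pull request: `SketchChannel`, `onClose` and `nestedUnder` are hypothetical names, and the real hudson.remoting.Channel API differs. The sketch only shows the propagation pattern, with a close listener registered on the upstream channel.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical minimal channel used only to illustrate close propagation. */
final class SketchChannel {
    private final List<Runnable> closeListeners = new CopyOnWriteArrayList<>();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() {
        return closed.get();
    }

    /** Registers a listener; it runs immediately if the channel is already closed. */
    void onClose(Runnable listener) {
        closeListeners.add(listener);
        if (closed.get()) {
            listener.run(); // listeners should be idempotent, they may run more than once
        }
    }

    /** Closes the channel once and notifies every registered listener. */
    void close() {
        if (closed.compareAndSet(false, true)) {
            for (Runnable listener : closeListeners) {
                listener.run();
            }
        }
    }

    /** Creates a nested channel that is forcibly closed when the upstream closes. */
    static SketchChannel nestedUnder(SketchChannel upstream) {
        SketchChannel nested = new SketchChannel();
        upstream.onClose(nested::close);
        return nested;
    }

    public static void main(String[] args) {
        SketchChannel agent = new SketchChannel();
        SketchChannel maven = SketchChannel.nestedUnder(agent);
        agent.close(); // e.g. the agent process was killed
        System.out.println("maven channel closed: " + maven.isClosed());
    }
}
```

With this wiring, anything waiting on the nested channel sees it closed right after the upstream dies, instead of waiting for a notification that never comes.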

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Nicolas De loof
          Path:
          src/main/java/hudson/maven/AbstractMavenProcessFactory.java
          http://jenkins-ci.org/commit/maven-plugin/7171b20fd0d6b783c1f9e7b0ad09e031db47b4ac
          Log:
          Merge pull request #39 from ydubreuil/fix-JENKINS-26947

          JENKINS-26947 forcibly terminate Maven remoting channel when upstream channel get closed

          Compare: https://github.com/jenkinsci/maven-plugin/compare/ea1a77405c6e...7171b20fd0d6

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/hudson/maven/AbstractMavenProcessFactory.java
          http://jenkins-ci.org/commit/maven-plugin/51b204897c19c9be3b03a0a80482d096ae1db3e3
          Log:
          Revert "[FIX JENKINS-26947] forcibly terminate Maven remoting channel when upstream channel get closed"
          [FIXED JENKINS-22252] Caused a regression.
          This reverts commit 47b28737803ef90bc7e6518e159df28471988bae.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/hudson/maven/AbstractMavenProcessFactory.java
          http://jenkins-ci.org/commit/maven-plugin/e85502ac9b4e05f8ff8a476c795057a770cdc97f
          Log:
          Merge pull request #52 from jglick/IllegalAccessError-JENKINS-22252

          JENKINS-22252 Revert JENKINS-26947 fix

          Compare: https://github.com/jenkinsci/maven-plugin/compare/22dcb14846e2...e85502ac9b4e

          Jesse Glick added a comment -

          Reverting fix.


          Oleg Nenashev added a comment -

          It is still applicable AFAICT, though some related fixes have been applied on the Remoting side.


          Oleg Nenashev added a comment -

          No plan to work on it anytime soon


            Assignee: Unassigned
            Reporter: Yoann Dubreuil (ydubreuil)
            Votes: 0
            Watchers: 6