
Remoting blocks when the slave disconnects during copying files

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Component/s: core
    • Labels: remoting
    • Environment: Jenkins 1.599 and 1.609.3, Copy artifact 1.35

      We copy a large file from another job into the test job, which takes 4 to 5 minutes. If the slave disconnects during those 4 to 5 minutes, Copy Artifact does not notice and does not stop, so the job runs forever. Cancelling the job has no effect at that point, and even disconnecting the slave does not stop it. The only way out is to restart the master, and I mean rebooting the master machine, because a soft restart of the Jenkins process also hangs. This is really ugly when it happens, so the priority is Blocker.

          [JENKINS-30655] Remoting blocks when the slave disconnects during copying files

          JY Hsu created issue -
          JY Hsu made changes -
          Assignee New: ikedam [ ikedam ]

          ikedam added a comment -

          This looks more like an issue in Jenkins core, because copyartifact relies on Jenkins core to copy files from remote nodes.

          • Would you get thread dumps (of both the master and the slave) when the problem occurs? Please see the following page for instructions: https://wiki.jenkins-ci.org/display/JENKINS/Obtaining+a+thread+dump
          • Would you give me more details about "soft restarting the Jenkins process will also hang during the restart"? What exactly happens?
          • I highly recommend using Jenkins LTS versions, as non-LTS versions are often unstable. Could you check whether the problem reproduces with the latest LTS version? https://jenkins-ci.org/#stable
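          For readers who want to know what such a thread dump contains, here is a minimal sketch using only the standard `java.lang.management` API. It is not how Jenkins produces its own dump; the class name `BlockedThreadReport` and the method `blockedThreads` are hypothetical names chosen for illustration. It lists every thread of the current JVM that is BLOCKED on a monitor, together with the thread that owns that monitor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins or the remoting library.
public class BlockedThreadReport {

    /** Returns one line per thread that is BLOCKED waiting for a monitor. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // (true, true): also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add("\"" + info.getThreadName() + "\" BLOCKED on "
                        + info.getLockName() + " owned by \""
                        + info.getLockOwnerName() + "\"");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Prints nothing if no thread is currently blocked on a monitor.
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

          In a dump taken while the problem is occurring, the lock name and lock owner are the two fields that show which thread is holding up the build.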
          ikedam made changes -
          Component/s New: core [ 15593 ]

          JY Hsu added a comment -

          Soft restarting means going to https://jenkins-url/restart. When the problem occurs, this restart never finishes. The whole Jenkins process appears to be in a bad state, and only rebooting the host machine resolves it. But this is a Jenkins server used by several product teams, so having to reboot every couple of days is a big problem.


          JY Hsu added a comment -

          This happened again even after I upgraded to LTS build 1.609.3. Attached is the thread dump. The slave having the problem is IC_Mac_01, and the job name is LCMI_UnitTest.

          JY Hsu made changes -
          Attachment New: Jenkins-ThreadDump.txt [ 30766 ]

          ikedam added a comment -

          This looks like the place where the block occurs:

          Executor #0 for IC_Mac_01 : executing LCMI_UnitTest #630
          "Executor #0 for IC_Mac_01 : executing LCMI_UnitTest #630" Id=14887 Group=main BLOCKED on hudson.remoting.Channel@2bdb2bdb owned by "Ping thread for channel hudson.remoting.Channel@2bdb2bdb:IC_Mac_01" Id=12907
          	at hudson.remoting.ProxyOutputStream.flush(ProxyOutputStream.java:153)
          	-  blocked on hudson.remoting.Channel@2bdb2bdb
          	-  locked hudson.remoting.ProxyOutputStream@2570257
          	at hudson.remoting.RemoteOutputStream.flush(RemoteOutputStream.java:114)
          	at java.io.FilterOutputStream.flush(FilterOutputStream.java:134)
          	at java.io.FilterOutputStream.close(FilterOutputStream.java:151)
          	at hudson.remoting.RemoteOutputStream.close(RemoteOutputStream.java:118)
          	at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:303)
          	at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:274)
          	at hudson.FilePath$41.invoke(FilePath.java:2020)
          	at hudson.FilePath$41.invoke(FilePath.java:2010)
          	at hudson.FilePath.act(FilePath.java:989)
          	at hudson.FilePath.act(FilePath.java:967)
          	at hudson.FilePath.copyTo(FilePath.java:2010)
          	at hudson.plugins.copyartifact.FingerprintingCopyMethod.copyOne(FingerprintingCopyMethod.java:80)
          	at hudson.plugins.copyartifact.FingerprintingCopyMethod.copyAll(FingerprintingCopyMethod.java:64)
          

          This looks to be caused by the remoting module in Jenkins core.
          I'll step down as assignee, expecting that a maintainer of Jenkins core will take over this issue.
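          The trace shows the executor thread in state BLOCKED on a monitor owned by another thread. A general Java property may be relevant here: a thread waiting to enter a `synchronized` block does not react to `Thread.interrupt()`. Assuming the build is cancelled by interrupting the executor thread, that would be consistent with the report that cancelling has no effect. The sketch below only demonstrates that JVM behavior; `MonitorBlockDemo` and `CHANNEL` are hypothetical stand-ins and this is not the remoting code.

```java
// Hypothetical demo, not part of Jenkins or the remoting library.
public class MonitorBlockDemo {
    // Stand-in for the monitor the executor thread is waiting on.
    private static final Object CHANNEL = new Object();

    /** Returns the executor's state while the lock is held, then after release. */
    static Thread.State[] run() throws InterruptedException {
        Thread executor = new Thread(() -> {
            synchronized (CHANNEL) {
                // The real code would flush the stream here.
            }
        }, "executor");

        Thread.State whileHeld;
        synchronized (CHANNEL) {            // the caller plays the lock owner
            executor.start();
            while (executor.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);           // wait until the executor is stuck
            }
            executor.interrupt();           // an attempt to cancel
            Thread.sleep(200);
            whileHeld = executor.getState(); // the interrupt did not unblock it
        }
        executor.join(5000);                // lock released, executor can finish
        return new Thread.State[] { whileHeld, executor.getState() };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = run();
        System.out.println("while lock held: " + states[0]); // BLOCKED
        System.out.println("after release:   " + states[1]); // TERMINATED
    }
}
```

          The executor only makes progress once the owner releases the monitor, so if the owning thread never returns, the blocked thread never does either.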

          ikedam made changes -
          Component/s Original: copyartifact-plugin [ 15692 ]
          Assignee Original: ikedam [ ikedam ]
          Environment Original: Jenkins 1.599, Copy artifact 1.35 New: Jenkins 1.599 and 1.609.3, Copy artifact 1.35
          Labels New: remoting
          Summary Original: Copy artifact hangs the job when the slave disconnects during copy New: Remoting blocks when the slave disconnects during copying files

          ikedam added a comment -

          k76154, would you attach the following logs, covering the time when the slave was disconnected? That might help the investigation.

          • The console log of the build (LCMI_UnitTest #630)
          • Jenkins system logs, e.g. /var/log/jenkins/jenkins.log (the location depends on how you launch Jenkins).


            Assignee: Unassigned
            Reporter: JY Hsu (k76154)
            Votes: 4
            Watchers: 7