    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Component: pipeline
    • Environment: Jenkins master & slaves inside Docker containers, all on the same Linux server
      Workflow 1.10.1
      Jenkins ver. 1.625.1

      sh steps that succeed are sometimes hanging

      on http://agility-test.local/job/team_romeo_dev/4/console I see

      ...
      Running: Allocate node : Body : End                                             
      Running: Allocate node : End                                                    
      Running: Allocate node : Start                                                  
      Running on qibuild-linux64 in /home/jenkins/jenkins/workspace/team_romeo_dev    
      Running: Allocate node : Body : Start                                           
      Running: Determine Current Directory                                            
      Running: Shell Script                                                           
      [team_romeo_dev] Running shell script                                           
      + qibuild add-config linux64_debug -t linux64                                   
      

      and it hangs there forever. Here is the corresponding part of the workflow

        ...
        node("qibuild-linux64") {
          configs = []
          if (params.run_linux64_debug) {
            configs << 'linux64_debug'
          }
          if (params.run_linux64_release) {
            configs << 'linux64_release'
          }
          workspace = pwd()
          for (int i = 0; i < configs.size(); ++i) {
            String config = configs[i]
            sh "qibuild add-config ${config} -t linux64"
            def p = projects.all()
            configure(c, config, p)
            make(c, config, p)
            String test_dir = "test-${config}"
            String test_dir_full = "${workspace}/${test_dir}"
            test(c, config, p, test_dir_full)
            step([$class: 'JUnitResultArchiver', testResults: "${test_dir}/*/test-results/*.xml"])
            String test_archive = "${test_dir}.tgz"
            sh "tar -czf ${test_archive} ${test_dir}"
            step([$class: 'ArtifactArchiver', artifacts: test_archive])
          }
          ...
      

      It does seem similar to https://issues.jenkins-ci.org/browse/JENKINS-28759

      I'm not sure how to help nail this issue down.

      I got thread dumps in case they help:

      I then added a logger for org.jenkinsci.plugins.workflow.steps.durable_task and will hopefully have more information the next time it occurs.

          [JENKINS-31769] sh steps on slaves randomly hang when complete

          Sébastien Barthélémy created issue -

          Jesse Glick added a comment -

          Rather sounds like JENKINS-28821: perhaps your slave workspace is not writable, and the wrapper script launch is just failing. Look in your workspace for a directory like .abcdef123456 and see if it contains a PID file, a log file, and a wrapper script file. I suspect something is wrong with how your Docker containers are set up. The durable-task plugin needs to do a better job of detecting and diagnosing this kind of environmental problem.
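
          For context, the wrapper mechanism behind a sh step works roughly like this
          (a simplified sketch, not the plugin's actual code; the control-directory
          name is the placeholder used above):

          # Simplified sketch of what the durable-task plugin arranges on the agent.
          ctrl="$WORKSPACE/.abcdef123456"                  # hidden control directory
          mkdir -p "$ctrl"
          printf '#!/bin/sh -xe\nqibuild add-config linux64_debug -t linux64\n' > "$ctrl/script.sh"
          chmod +x "$ctrl/script.sh"
          # Run the user script detached, capturing its output and, once it ends, its exit code.
          sh -c "'$ctrl/script.sh' > '$ctrl/jenkins-log.txt' 2>&1; echo \$? > '$ctrl/jenkins-result.txt'" &
          echo $! > "$ctrl/pid"                            # PID of the wrapper shell
          # Jenkins then streams jenkins-log.txt to the build console and waits for
          # jenkins-result.txt to appear before letting the sh step complete.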

          Jesse Glick made changes -
          Link: This issue is related to JENKINS-28400

          Sébastien Barthélémy added a comment -

          Hi Jesse, thank you for answering.
          I'd be surprised by a Docker problem: I guess it would not work at all in that case, no?

          Anyway, I got a new hang, on
          http://agility-test.local/job/team_romeo_dev/12/console
          I get

          Started by upstream project "nightly" build number 8
          originally caused by:
           Started by timer
          Running: Print Message
          runme
          Running: run_stage
          Entering stage run_stage
          Proceeding
          Running: Allocate node : Start
          Running on qisrc in /home/jenkins/jenkins/workspace/team_romeo_dev
          Running: Allocate node : Body : Start
          Running: Shell Script
          [team_romeo_dev] Running shell script
          + file /home/jenkins/w/team_romeo_dev/.qi
          /home/jenkins/w/team_romeo_dev/.qi: directory
          ...hanging forever...
          

          On the slave itself, the files you mention are present:

          $ ssh jenkins@`sudo docker inspect --format '{{.NetworkSettings.IPAddress}}' qisrc`
          
          $ cd jenkins/workspace/team_romeo_dev/.d10a2fc7/
          
          $ ls
          jenkins-log.txt  jenkins-result.txt  pid  script.sh
          
          $ cat jenkins-log.txt 
          + file /home/jenkins/w/team_romeo_dev/.qi
          /home/jenkins/w/team_romeo_dev/.qi: directory 
          
          $ cat jenkins-result.txt
          0
          
          $ cat pid               
          31285
          
          $ cat script.sh 
          #!/bin/sh -xe
          file /home/jenkins/w/team_romeo_dev/.qijenkins@339a8fe7a101:~/jenkins/workspace/team_romeo_dev/.d10a2fc7$
          
          $ mount
          none on / type aufs (rw,relatime,si=1abb4f0b5dbc66b3,dio)
          proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
          tmpfs on /dev type tmpfs (rw,nosuid,mode=755)
          devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
          sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
          tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
          cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
          cgroup on /sys/fs/cgroup/cpu type cgroup (ro,nosuid,nodev,noexec,relatime,cpu)
          cgroup on /sys/fs/cgroup/cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpuacct)
          cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
          cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
          cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
          cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
          cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
          cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
          systemd on /sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,name=systemd)
          /dev/mapper/agility--test--vg-root on /home/jenkins/w type ext4 (rw,relatime,errors=remount-ro,data=ordered)
          /dev/mapper/agility--test--vg-root on /etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)
          /dev/mapper/agility--test--vg-root on /etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
          /dev/mapper/agility--test--vg-root on /etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
          shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
          mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
          proc on /proc/asound type proc (ro,nosuid,nodev,noexec,relatime)
          proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
          proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
          proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
          proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
          proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
          tmpfs on /proc/kcore type tmpfs (rw,nosuid,mode=755)
          tmpfs on /proc/latency_stats type tmpfs (rw,nosuid,mode=755)
          tmpfs on /proc/timer_stats type tmpfs (rw,nosuid,mode=755)
          

          I found nothing useful in http://agility-test.local/log/org.jenkinsci.plugins.workflow.steps.durable_task/; its ring buffer seems too small.

          Here is the jenkins thread dump
          http://pastebin.com/V5bJrJxn


          Jesse Glick added a comment -

          Jenkins thread dumps are generally useless. 1.12-beta-2 adds a Workflow-specific virtual thread dump, though in this case it would not tell you anything you did not already know: that the build is inside a sh step.

          If jenkins-result.txt exists, and the step still does not exit, it usually means that the Jenkins slave agent is failing to read this file—check file permissions. This is why I suspect Docker: there might be a UID mapping issue.
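
          If the UID-mapping theory needs ruling in or out, a quick check along these
          lines would expose a mismatch (container name qisrc as used elsewhere in this
          thread; it assumes a jenkins user also exists on the host):

          # Compare the jenkins user's numeric IDs inside the container and on the host,
          # and the numeric owner of the shared volume.
          sudo docker exec qisrc id jenkins
          id jenkins
          ls -ln /home/jenkins/w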

          The ring buffer for log messages shown in the UI only holds 100 items IIRC. Install the Support Core plugin and your existing custom logs will be streamed to disk, rotated every 10000 items to a series of up to 10 backup files.


          Sébastien Barthélémy added a comment -

          Hi Jesse, thank you again for the help and tips.

          > If jenkins-result.txt exists, and the step still does not exit,
          > it usually means that the Jenkins slave agent is failing to read this file—
          > check file permissions.

          Here they are:

          $ ls -l jenkins/workspace/team_romeo_dev/.d10a2fc7/
          total 16
          -rw-rw-r-- 1 jenkins jenkins 89 Nov 29 02:26 jenkins-log.txt
          -rw-rw-r-- 1 jenkins jenkins  2 Nov 29 02:26 jenkins-result.txt
          -rw-rw-r-- 1 jenkins jenkins  6 Nov 29 02:25 pid
          -rwxr-xr-x 1 jenkins jenkins 53 Nov 29 02:25 script.sh
          

          > This is why I suspect Docker: there might be a UID mapping issue.

          I doubt it, for several reasons:
          1. my Docker setup did not change and is quite straightforward, I think
          2. that would not explain why it sometimes hangs but usually works
          3. in this case, you can see that the permissions are OK. In particular, the master did succeed in reading the log file (since it displayed it in the web interface, see my previous comment), and the result file has the exact same permissions.


          Jesse Glick added a comment -

          Well, enable a fine logger. If you see

          still running in … on …
          

          you can use /script to check something like

          Jenkins.instance.getNode('slavename').createPath('/that/path/.abcd1234/jenkins-result.txt').exists()
          

          One odd thing I notice in your logs is the mismatch between /home/jenkins/w/team_romeo_dev and /home/jenkins/workspace/team_romeo_dev. Are you using symbolic links? Custom workspace locations?
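
          If exists() comes back true and the step still never finishes, a slightly
          longer check along the same lines can show whether the master can actually
          read the file over the agent channel ('slavename' and the path are
          placeholders, as above):

          // Run from the /script console.
          def f = Jenkins.instance.getNode('slavename')
                      .createPath('/that/path/.abcd1234/jenkins-result.txt')  // null if the node is offline
          println "exists:  ${f.exists()}"
          println "length:  ${f.length()}"                 // a few bytes once the script has finished
          println "content: '${f.readToString().trim()}'"  // the exit code the wrapper wrote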


          Sébastien Barthélémy added a comment -

          Hello Jesse,

          > Well enable a fine logger

          I enabled the fine logger and installed the support-core plugin (no hang since).
          I can see the logs on the master. There are 10 of them, each one holding 2 minutes of history; that will not be enough, I fear.

          $ cd jenkins_home
          $ wc -l logs/custom/*
              4727 logs/custom/durable_task.log
             10027 logs/custom/durable_task.log.1
              9989 logs/custom/durable_task.log.2
             10044 logs/custom/durable_task.log.4
             10008 logs/custom/durable_task.log.5
              9993 logs/custom/durable_task.log.6
             10004 logs/custom/durable_task.log.7
             10025 logs/custom/durable_task.log.8
              9976 logs/custom/durable_task.log.9
             84793 total
          $ head -n 1 logs/custom/durable_task.log.1
          2015-12-03 09:38:55.137+0000 [id=95]       FINE    hudson.remoting.Channel$1#handle: Received hudson.remoting.UnexportCommand@571a0764
          $ tail -n 1 logs/custom/durable_task.log.1
          2015-12-03 09:40:03.616+0000 [id=95]    FINE    hudson.remoting.Channel$1#handle: Received 
          

          > One odd thing I notice in your logs is the mismatch between
          > /home/jenkins/w/team_romeo_dev and
          > /home/jenkins/workspace/team_romeo_dev.

          /home/jenkins/workspace is Jenkins' workspace directory.

          team_romeo_dev is the name of the job, so /home/jenkins/workspace/team_romeo_dev is this job's workspace. I think there is nothing custom there.

          The purpose of the job is to build C++ software using qibuild.

          The /home/jenkins/w directory is where I store qibuild worktrees (checkouts of C++ sources) and build directories.
          This directory is a docker volume shared among slaves (which are docker containers, all running on the same host), so that all slaves have access to the sources to build.
          I have one worktree directory per job (here /home/jenkins/w/team_romeo_dev), named like the job, which is itself named like the git branch it follows.

          So, there is no symlink involved, and Jenkins should not be aware of this /home/jenkins/w/team_romeo_dev directory (but the scripts it runs are).
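
          For illustration, each slave container is started with the shared /home/jenkins/w
          volume mounted, roughly along these lines (the image name is hypothetical):

          # Illustrative only: every slave container mounts the same host directory,
          # so all agents see the same qibuild worktrees.
          sudo docker run -d --name qisrc \
            -v /home/jenkins/w:/home/jenkins/w \
            my-jenkins-slave-image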

          > Are you using symbolic links?

          No.

          > Custom workspace locations?

          No.


          Jesse Glick added a comment (edited) -

          > each one holding 2 minutes of history

          The log is supposed to be rotated after it has accumulated 10k lines. But it looks like most of yours is junk. Perhaps I needed to clarify that the FINE logger should be logging only org.jenkinsci.plugins.workflow.steps.durable_task and org.jenkinsci.plugins.durabletask.
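
          One way to salvage the logs already collected is to filter them down to the
          two packages of interest, for example (paths as listed in the previous
          comment; this assumes the relevant records mention the package names):

          # Keep only durable-task related lines, in timestamp order.
          cd jenkins_home
          grep -h 'durabletask\|durable_task' logs/custom/durable_task.log* | sort > durable_task_filtered.log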


          Sébastien Barthélémy added a comment (edited) -

          This weekend I had another hang, on a toy workflow I created for the occasion.

          Here is the code:

          class Config implements Serializable {
            String branch
            String source
            String build
          }
          
          def runme(Config c)
          {
            echo "runme"
            stage name: 'run_stage', concurrency: 1
            node("qisrc") {
              for (int i = 0; i < 1000; ++i) {
                echo("i: ${i}");
                try {
                  sh "file ${c.source}/.qi" // will throw if .qi is missing
                } catch (all) {
                  sh "ls ${c.source}"
                }
              }
            }
          }
          
          def c = new Config(
            branch: "team/romeo/dev",
            source: "/home/jenkins/w/team_romeo_dev",
            build: "/home/jenkins/w/team_romeo_dev-build")
          runme(c)

          I found it hung at iteration 530
          http://agility-test.local/job/test_workflow/19/console

          Started by upstream project "nightly" build number 6
          originally caused by:
           Started by timer
          Running: Print Message
          runme
          Running: run_stage
          Entering stage run_stage
          Proceeding
          Running: Allocate node : Start
          Running on qisrc in /home/jenkins/jenkins/workspace/test_workflow
          Running: Allocate node : Body : Start
          Running: Print Message
          i: 0
          Running: Shell Script
          [test_workflow] Running shell script
          + file /home/jenkins/w/team_romeo_dev/.qi
          /home/jenkins/w/team_romeo_dev/.qi: directory 
          Running: Print Message
          i: 1
          Running: Shell Script
          ...
          Running: Shell Script
          [test_workflow] Running shell script
          + file /home/jenkins/w/team_romeo_dev/.qi
          /home/jenkins/w/team_romeo_dev/.qi: directory 
          Running: Print Message
          i: 530
          Running: Shell Script
          [test_workflow] Running shell script
          + file /home/jenkins/w/team_romeo_dev/.qi
          /home/jenkins/w/team_romeo_dev/.qi: directory 
          

          Nothing unexpected in the control directory

          $ ssh jenkins@`sudo docker inspect --format '{{.NetworkSettings.IPAddress}}' qisrc`
          $ ls -la jenkins/workspace/test_workflow/
          total 24
          drwxrwxr-x 3 jenkins jenkins  4096 Dec  6 02:39 .
          drwxrwxr-x 4 jenkins jenkins  4096 Dec  2 13:20 ..
          drwxrwxr-x 2 jenkins jenkins  4096 Dec  6 02:39 .ac553e58
          -rw-rw-r-- 1 jenkins jenkins 10130 Dec  4 02:14 snapshot.json
          $ cat jenkins/workspace/test_workflow/.ac553e58/jenkins-log.txt 
          + file /home/jenkins/w/team_romeo_dev/.qi
          /home/jenkins/w/team_romeo_dev/.qi: directory 
          $ cat jenkins/workspace/test_workflow/.ac553e58/                
          jenkins-log.txt     jenkins-result.txt  pid                 script.sh           
          $ cat jenkins/workspace/test_workflow/.ac553e58/jenkins-result.txt 
          0
          $ cat jenkins/workspace/test_workflow/.ac553e58/                   
          jenkins-log.txt     jenkins-result.txt  pid                 script.sh           
          $ cat jenkins/workspace/test_workflow/.ac553e58/pid 
          17882
          $ cat jenkins/workspace/test_workflow/.ac553e58/script.sh 
          #!/bin/sh -xe
          file /home/jenkins/w/team_romeo_dev/.qi
          

          The job usually runs for 5 minutes; this one started Dec 6, 2015 2:35 AM.
          Sadly, my oldest log entry is from 2015-12-06 21:31:54.875+0000.

          I tried this in http://agility-test.local/script

          Jenkins.instance.getNode('qisrc').createPath('/home/jenkins/jenkins/workspace/test_workflow/.ac553e58/jenkins-result.txt').exists()
          

          I got

          Result: true
          

          But nothing happened. The job is still hung.

