  Jenkins / JENKINS-10944

MatrixProject can run on a heavyweight executor, leading to deadlocks

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • Environment:
      Jenkins running on Linux host
      builds tied to remote Mac host
      have several matrix builds

      Sometimes jobs get blocked by each other or "deadlock". We must manually cancel and restart the builds.

      IRC Transcript
      --------------
      amrox: Hello. I had a brief exchange with @jenkins on twitter yesterday http://dl.dropbox.com/u/45634/Screen%20Shot%202011-09-08%20at%205.18.05%20PM.png
      amrox: my jenkins fell into a "deadlock" state again
      amrox: it's still in that state now. what information can I provide?
      rtyler: *nudges kohsuke *
      rtyler: amrox: I'm @jenkinsci FWIW
      amrox: rtyler: hi, and thanks for the responses yesterday
      amrox: here's a screen recording... best way I could think to show the issue http://dl.dropbox.com/u/45634/Screen%20Recording%204.mov
      amrox: I think it can be solved by just adding another executor
      amrox: but I'd like to avoid that, and it seems like a bug?
      kohsuke: amrox: we need thread dump. see https://wiki.jenkins-ci.org/display/JENKINS/Build+is+hanging
      amrox: jenkins master dump: http://dl.dropbox.com/u/45634/threaddump.html
      amrox: slave: http://dl.dropbox.com/u/45634/slave-threaddump.html
      amrox: is that format acceptable? helpful at all?
      amrox: should I just file a bug?
      kohsuke: amrox: yes, that'd be great
      kohsuke: jenkins-admin: create ant-plugin on github for kohsuke
      kohsuke: jenkins-admin: create javadoc-plugin on github for kohsuke
      amrox: kohsuke: will do thanks
      kohsuke: Is "content_viewer_ios_develop_build" the hanging job?
      kohsuke: OK, so the issue is that the matrix parent is blocking the execution of its child builds
      kohsuke: but the parent is also waiting for the completion of the child builds, hence the deadlock
      kohsuke: amrox: ^^ did I get that right?
      kohsuke: the question is why content_asset_verify_auto build is occupying an executor
      amrox: yes that seems accurate
      kohsuke: content_viewer_ios_develop_build is correctly using a temporary flyweight executor
      kohsuke: ... as seen by the lack of number in the executor table
      kohsuke: amrox: I assume all those builds are tied to remote-macslave-1
      amrox: kohsuke: yes
      kohsuke: OK. We'll capture this in the ticket you'll create
      kohsuke: Thanks for bringing this to our attention, and sorry for the bug
      amrox: thanks for building and maintaining jenkins 
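
      The deadlock described in the transcript can be reproduced in miniature outside Jenkins. The following is only a sketch under stated assumptions, not Jenkins code: a one-thread pool stands in for the single heavyweight executor on the node, a separate cached pool stands in for a temporary flyweight executor, and the class name, method name, and 500 ms timeout are all hypothetical. When the parent occupies the only heavyweight slot, the child it waits for can never start; when the parent runs on the flyweight stand-in, the child gets the slot and finishes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names come from Jenkins.
public class MatrixDeadlockSketch {

    /**
     * Runs a stand-in "matrix parent" that waits for one child build.
     * Returns true if the child finished, false if the parent gave up waiting.
     */
    static boolean runParent(boolean parentOnHeavyweight) throws Exception {
        // One slot, standing in for the single heavyweight executor on the node.
        ExecutorService heavyweight = Executors.newFixedThreadPool(1);
        // Separate threads, standing in for a temporary flyweight executor.
        ExecutorService flyweight = Executors.newCachedThreadPool();
        try {
            ExecutorService parentSlot = parentOnHeavyweight ? heavyweight : flyweight;
            Future<Boolean> parent = parentSlot.submit(() -> {
                // The child build always needs a heavyweight slot.
                Future<String> child = heavyweight.submit(() -> "child done");
                try {
                    // A real deadlock would wait forever; the timeout keeps the sketch finite.
                    child.get(500, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    child.cancel(true);
                    return false;
                }
            });
            return parent.get();
        } finally {
            heavyweight.shutdownNow();
            flyweight.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parent on heavyweight slot, child finished: " + runParent(true));
        System.out.println("parent on flyweight slot, child finished: " + runParent(false));
    }
}
```

      In this sketch, giving the heavyweight pool a second thread would also let the child run, which mirrors the "adding another executor" workaround mentioned in the transcript. That only hides the problem, since enough concurrent parents could fill every slot again.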

        1. slave-threaddump.html
          21 kB
          Andy M
        2. threaddump.html
          42 kB
          Andy M

            Assignee: Jesse Glick (jglick)
            Reporter: Andy M (amrox)
            Votes: 1
            Watchers: 4
