JENKINS-76181

Build node communication lost; no retries?


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: core
    • Labels: None

      We periodically experience build failures that appear to be caused by communication problems with the build nodes. However, as far as we can tell, the build nodes remain up and accessible. We run a Jenkins controller in Docker and have a number of build nodes (a mix of VMs and bare metal) registered. The agent runs as a systemd service. A build can take upwards of 3 hours because of the complexity and number of components being built, so failures at the end (e.g. while archiving artifacts) are costly: there is no way to resume (that we know of), and the whole build has to be restarted.

      Attached are two stack traces showing two different failures that we recently experienced. The problem is intermittent, so we are not sure what to look at next to track it down. Sometimes it happens frequently; other times it may be days before we see it again, so we have no consistent trigger either.

      jenkins-stack1.txt – unclear what to look at next

      jenkins-stack2.txt + jenkins-build3.log – looks like a timeout, but it is unclear what would trigger it. There should be no firewalls in the path between these nodes.

        1. jenkins-build3.log (2 kB, Ben)
        2. jenkins-stack1.txt (10 kB, Ben)
        3. jenkins-stack2.txt (8 kB, Ben)

            Assignee: Unassigned
            Reporter: Ben (bmagistro)