Jenkins / JENKINS-68244

Possible bug or issue with the Google Compute Engine plugin


      Hello Team,

      I am using two Jenkins instances with google-compute-engine:4.2.0 on GCP. One of the instances is working properly and the other one is still in progress. On both of them I have set up the test helpers/nodes for provisioning from the System engine -> Google Compute section. The plugin works and Google Compute Engine provisions the nodes properly. Both Jenkins instances intentionally use the same configuration.

      The issue is that when I turn on the second instance, so that both Jenkins instances are up and running, jobs on the first (fully functioning) Jenkins start hanging and then time out because they lose the connection to their nodes. The nodes seem to be provisioned properly but are then somehow lost during the build.

      In addition, these nodes get stuck: while the jobs are timing out, the nodes remain listed on the first instance and are not removed.

      Example error from a build on the first Jenkins:

      FATAL: command execution failed
      java.io.IOException: Backing channel 'test-helper-cbc13j' is disconnected.

      The screenshot below is from the Manage Nodes section.

      A possible explanation: while nodes are provisioned by the first Jenkins and used in its builds, the second Jenkins instance sees those nodes running but not used by itself, so it shuts them down. This results in errors and exceptions on the first Jenkins, and its builds get stuck along with the nodes:

      Both Jenkins instances up and running -> 1st Jenkins provisions the nodes and uses them -> 2nd Jenkins detects the nodes, considers them idle and shuts them down -> builds on the 1st Jenkins that use those nodes fail.
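      The suspected sequence above can be modelled as a small simulation. This is a minimal sketch, not the plugin's actual code: it assumes, hypothetically, that each controller tags the VMs it creates with a cloud identifier, and that a periodic cleanup on each controller deletes any VM carrying its identifier that is not registered as one of its own nodes. The names Controller, cloud_id, provision, clean_lost_nodes and simulate are all illustrative. Under that assumption, two controllers with identical configuration would share the identifier, so the second one would treat the first one's busy agent as lost.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of one Jenkins controller (not the plugin's real classes)."""
    name: str
    cloud_id: str  # hypothetical tag the controller stamps on VMs it creates
    nodes: set = field(default_factory=set)  # VM names registered as agents here

    def provision(self, vms: dict, vm_name: str) -> None:
        # Create a VM tagged with this controller's cloud_id and register it as an agent.
        vms[vm_name] = self.cloud_id
        self.nodes.add(vm_name)

    def clean_lost_nodes(self, vms: dict) -> list:
        # Assumed cleanup rule: delete every VM carrying our cloud_id
        # that is not registered as an agent on this controller.
        lost = [vm for vm, tag in vms.items()
                if tag == self.cloud_id and vm not in self.nodes]
        for vm in lost:
            del vms[vm]
        return lost


def simulate(first_id: str, second_id: str) -> dict:
    vms = {}  # shared GCP project: VM name -> cloud_id tag
    first = Controller("jenkins-1", first_id)
    second = Controller("jenkins-2", second_id)
    first.provision(vms, "test-helper-cbc13j")  # agent busy running a build on jenkins-1
    deleted = second.clean_lost_nodes(vms)      # cleanup pass on jenkins-2
    return {"deleted_by_second": deleted, "still_running": sorted(vms)}
```

      With identical identifiers, simulate("same", "same") reports test-helper-cbc13j as deleted by the second controller, which matches the "Backing channel ... is disconnected" symptom; with distinct identifiers, simulate("a", "b") leaves it running. Whether the plugin really behaves this way is exactly what this issue asks.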

      Please note that this is a migration, and I need both Jenkins instances running with the same node configuration, because the builds depend on the specific labels. Please advise whether this is what causes the exception below and how to tackle it. (If I use nodes with different labels, my builds would not work, since they require the specific labels set on Jenkins 1.)

      Log for any of the stuck nodes, as shown at the localhost/computer/ URL (see screenshot):
      java.io.EOFException
      at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
      at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
      at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
      at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
      at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
      at hudson.remoting.Command.readFrom(Command.java:140)
      at hudson.remoting.Command.readFrom(Command.java:126)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      Caused: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

            evanbrown Evan Brown
            mlrd_ Milorad Radkov