[JENKINS-71350] Provisioning an instance succeeds before SSH is really usable

      When an instance is provisioned (or started), awaitInstanceSshAvailable() returns successfully before SSH actually allows normal users to log in. An SSH login attempted at that point is rejected with, for example: "System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
      As a result, the subsequent agent launch fails and the node never becomes usable (unless the agent is later relaunched by manually clicking on it). The plugin then goes on to provision the next node, and so on.

      I worked around the problem by patching awaitInstanceSshAvailable() to also log in as the user the agent will be launched as, which solves the problem (patch attached).
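      The workaround amounts to polling until the agent user can actually log in, rather than stopping as soon as SSH answers. Below is a minimal Java sketch of such a retry loop, under stated assumptions: SshReadiness, LoginProbe and awaitLogin are hypothetical names, not part of the OCI plugin or the attached patch, and LoginProbe stands in for whatever SSH client call opens a session with the agent's credentials.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: wait until the *agent user* can log in over SSH.
// LoginProbe is a hypothetical abstraction; a real implementation would
// open an SSH session using the same credentials the agent launcher uses.
class SshReadiness {

    /** One login attempt as the agent user; true only if the login succeeded. */
    interface LoginProbe {
        boolean tryLogin() throws Exception;
    }

    /**
     * Polls the probe until it succeeds or the timeout elapses.
     * Exceptions (e.g. a pam_nologin "System is booting up" rejection)
     * are treated as "not ready yet" and retried after the interval.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    static boolean awaitLogin(LoginProbe probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (probe.tryLogin()) {
                    return true;
                }
            } catch (Exception e) {
                // Login refused or connection failed: retry below.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: login never succeeded within the timeout
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated instance: the first 3 attempts are refused as during
        // boot (while pam_nologin still blocks users), then logins succeed.
        AtomicInteger attempts = new AtomicInteger();
        LoginProbe fakeProbe = () -> {
            if (attempts.incrementAndGet() <= 3) {
                throw new IllegalStateException("System is booting up.");
            }
            return true;
        };
        boolean ready = awaitLogin(fakeProbe, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("ready=" + ready + " after " + attempts.get() + " attempts");
    }
}
```

      Running main prints "ready=true after 4 attempts". Treating every exception as "not ready yet" matches the boot-time rejection described above, but it also means a genuine misconfiguration (e.g. a wrong SSH key) only surfaces after the full timeout; a real implementation might want to distinguish the two.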

          Tobias Wildgruber added a comment:

          Cloud configuration (values not listed use their defaults):

          Name: OCI - PHX - GBUCDSINT
          Credentials: oci_api_key_gbucdsint
          Instance Cap: 3

          Instance Template:
          Description: CPM OL8 Buildnode
          Labels: ol8
          Compartment: CEGBU-Aconex
          Availability Domain: iRbP:PHX-AD-1
          Image Compartment: CEGBU-Aconex
          Image: jenkins-buildnode-cpm-ol8
          Shape: VM.Standard2.4
          Virtual Cloud Network Compartment: Networks
          Virtual Cloud Network: CorpDev1-phx.vcn
          Subnet Compartment: Networks
          Subnet: snPhxPrShared1
          SSH credentials: jenkins (SSH key for the jenkins user on the build nodes)
          Remote FS Root: /data/jenkins
          Stop on Idle Timeout: tried both on and off; it makes no difference. The result is the same whether the instance is freshly provisioned or an existing instance is being started.
          Tags
            Namespace: gbuitops
            Key: InstanceContact
            Value: tobias.wildgruber@oracle.com
          Instance Name Prefix: cpm-ol8

            Yarlagadda Sindhu Sri (sindhusri16)
            Tobias Wildgruber (tow)
            Votes: 0
            Watchers: 1