Bug
Resolution: Unresolved
Major
None
When an instance is provisioned (or started), awaitInstanceSshAvailable() succeeds before SSH actually allows normal users to log in. Attempting to log in via SSH at that point reports, for example, "System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8).".
As a result, the subsequent agent launch fails and the node never becomes usable (unless the agent is later relaunched manually by clicking on it). Provisioning then moves on to the next node, and so on.
I worked around the problem by patching awaitInstanceSshAvailable() to also log in as the user that the agent will be launched with, which solves the above problem (patch is attached).
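The idea behind the workaround can be sketched roughly as follows. This is not the attached patch, only a minimal illustration of the approach: instead of treating an open SSH port as "ready", keep retrying a real login as the agent user until it succeeds or a timeout expires. The names SshLoginWait, LoginProbe and awaitLogin are hypothetical and do not exist in the plugin; the actual SSH login attempt is left to the caller, who would implement it with whatever SSH client the plugin already uses.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not part of the plugin's API. */
final class SshLoginWait {

    /** One real login attempt as the agent user; returns true on success. */
    @FunctionalInterface
    interface LoginProbe {
        boolean tryLogin() throws Exception;
    }

    /**
     * Retries the probe until it succeeds or the timeout expires.
     * A probe that returns false or throws (for example because pam_nologin
     * still rejects unprivileged users) is treated as "not ready yet".
     */
    static boolean awaitLogin(LoginProbe probe, Duration timeout, Duration interval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            try {
                if (probe.tryLogin()) {
                    return true;
                }
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption of the waiting thread
            } catch (Exception e) {
                // Login rejected or connection failed: retry after the interval.
            }
            // Give up if the next attempt would start after the deadline.
            if (Instant.now().plus(interval).isAfter(deadline)) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

A caller would pass a probe that opens an SSH session with the configured agent credentials (here, the jenkins user) and only launch the agent once awaitLogin returns true.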
The cloud configuration looks like this (values not listed use their defaults):
Name: OCI - PHX - GBUCDSINT
Credentials: oci_api_key_gbucdsint
Instance Cap: 3
Instance Template:
Description: CPM OL8 Buildnode
Labels: ol8
Compartment: CEGBU-Aconex
Availability Domain: iRbP:PHX-AD-1
Image Compartment: CEGBU-Aconex
Image: jenkins-buildnode-cpm-ol8
Shape: VM.Standard2.4
Virtual Cloud Network Compartment: Networks
Virtual Cloud Network: CorpDev1-phx.vcn
Subnet Compartment: Networks
Subnet: snPhxPrShared1
SSH credentials: jenkins (SSH key for the jenkins user on the build nodes)
Remote FS Root: /data/jenkins
Stop on Idle Timeout: tried both on and off; the result is the same whether the instance is freshly provisioned or an existing instance is being started
Tags:
Namespace: gbuitops
Key: InstanceContact
Value: tobias.wildgruber@oracle.com
Instance Name Prefix: cpm-ol8