Bug
Resolution: Fixed
Major
None
The ssh-agent configuration isn't applied when a pipeline is run in a Docker container using the Kubernetes plugin.
As an example, this pipeline works fine:
node {
  stage('Pre-Build') {
    sshagent (credentials: ['jenkins-master-ssh']) {
      sh 'ssh -vT -o "StrictHostKeyChecking=no" git@github.com'
    }
  }
}
The job will fail, but the console output clearly shows the expected message from GitHub:
You've successfully authenticated, but GitHub does not provide shell access.
Whereas this pipeline:
podTemplate(label: 'jenkpod', containers: [
    containerTemplate(name: 'golang', image: 'golang:1.8', ttyEnabled: true, command: 'cat')]) {
  node ('jenkpod') {
    container('golang') {
      stage('Pre-Build') {
        sshagent (credentials: ['jenkins-master-ssh']) {
          sh 'ssh -vT -o "StrictHostKeyChecking=no" git@github.com'
        }
      }
    }
  }
}
fails with a public key error:
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/id_rsa
debug1: Trying private key: /root/.ssh/id_dsa
debug1: Trying private key: /root/.ssh/id_ecdsa
debug1: Trying private key: /root/.ssh/id_ed25519
debug1: No more authentication methods to try.
Permission denied (publickey).
This seems closely related to JENKINS-32624: sshagent{} ignored when executed in docker.image().inside.
duplicates:
- JENKINS-49370 Environment variables cannot be set within the container step (Resolved)
is blocked by:
- JENKINS-47225 Kubernetes plugin not using cmd proc variables (Resolved)
is related to:
- JENKINS-32624 sshagent{} ignored when executed in docker.image().inside{...} (Resolved)
- JENKINS-40647 With Env not working after .10 k8 plugin update (Resolved)
- JENKINS-42851 secretVolume not created read only (Resolved)
- JENKINS-50437 Environment variables using PATH+SOMETHING syntax clear the previous env var (Resolved)
links to
[JENKINS-42582] ssh-agent not applied in kubernetes container
if env vars are not getting saved, it's possibly the same reason that the ssh-agent doesn't persist
csanchez, Any progress or info on this? Perhaps I can help with this one. git ssh also seems to fail the same way.
We hit this last week. When we do "sh 'sudo ps aux'" in the container, we notice that there is an ssh-agent process running. Our theory, which we have not verified, is that if we set the environment variable SSH_AUTH_SOCK to the agent's socket file, it will work.
If you check the available env variables with "printenv", you will see that the SSH_AUTH_SOCK variable is not set, nor are the other Jenkins env variables. I believe this is the issue, but I don't know if this is a bug or a feature of the PodTemplate design.
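For example, a quick check from inside the target container (a minimal sketch; the container name 'golang' is taken from the example above) could look like:
container('golang') {
    // If SSH_AUTH_SOCK (set by the enclosing sshagent step) is missing here,
    // ssh and ssh-add inside this container cannot reach the agent
    sh 'printenv | grep -E "SSH_AUTH_SOCK|SSH_AGENT_PID" || echo "SSH_AUTH_SOCK is not set in this container"'
}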
>Our theory, which we have not verified, is that if we set the environment variable SSH_AUTH_SOCK to the agent's socket file, it will work.
I can confirm it works.
I guess the container step needs to access the enclosing node environment and populate from there
some possibly related comments
https://github.com/jenkinsci/kubernetes-plugin/pull/204#issuecomment-324023861
>I guess the container step needs to access the enclosing node environment and populate from there
I could be demonstrating my ignorance, but this feels off. The ssh-agent is being launched from within the container scope itself. One would expect a command issued within the container block to affect only the container itself (not the node scope), and arguably vice versa (the container env is isolated from the node - which itself is the jnlp container). Basically, a solution that involves interacting with the node-scope env would violate the isolation we expect from containers.
I'd like to post a variation on this issue that we're hitting. Please shout if I should punt this to a new ticket.
Like others here, we are trying to get sshagent working within the container step, and like others, the SSH_AUTH_SOCK env var is not populated (ssh-agent is launched and running).
I was looking at the workaround mentioned by aroq and clabu609, namely scripting the export of SSH_AUTH_SOCK. However, it appears that while the ssh-agent is running in the desired container, the key has not been successfully added to the agent ("ssh-add -l" returns no identities).
Looking more closely, I see that the expected key temp file exists in the filesystem of the container in the temp workspace, but the key file content has 0 bytes. This key is used successfully in other jobs (without use of container).
The job log file has the following signature:
[Pipeline] sshagent
[ssh-agent] Using credentials my-github (key for the github account (my-github))
[ssh-agent] Looking for ssh-agent implementation...
[ssh-agent] Exec ssh-agent (binary ssh-agent on a remote machine)
Executing shell script inside container [my-container] of pod [my-pod]
Executing command: "ssh-agent"
printf "EXITCODE %3d" $?; exit
with no other clear indicator of failure. The one difference from the logs of other successful jobs being the line
printf "EXITCODE %3d" $?; exit
Thanks in advance for any guidance.
I ended up adding
sh(script: "export SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}; ${shellCommand}")
for every sh command I execute (you can create a global library wrapper action for it).
You should do it inside the "sshagent" step, of course.
Hope it will help.
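For reference, a minimal sketch of such a global library wrapper (the step name sshSh and the file vars/sshSh.groovy are hypothetical; it just re-exports the socket path before running the command):
// vars/sshSh.groovy (hypothetical name) in a Jenkins shared library
def call(String shellCommand) {
    // env.SSH_AUTH_SOCK is set by the enclosing sshagent step but is not
    // propagated into the container's shell by the kubernetes plugin,
    // so re-export it explicitly before running the actual command
    sh(script: "export SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}; ${shellCommand}")
}
Then, inside the sshagent block, you call sshSh 'ssh-add -l' instead of sh 'ssh-add -l'.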
aroq thank you for clarification on the workaround. I confirm it sets the env var as desired.
I'm still hitting my problem with the key not being added - investigating and will report back.
Perhaps I spoke too soon -
sh(script: "export SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}; ${shellCommand}")
doesn't quite have the effect I hoped for.
When running
podTemplate(...) {
  node ('the-pod') {
    container('the-container') {
      stage('Pre-Build') {
        sshagent(...) {
          sh(script: "export SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}; ${shellCommand}")
        }
      }
    }
  }
}
env.SSH_AUTH_SOCK resolves to the socket file from the jnlp container of the-pod. This, of course, is by default not available to the container the-container, unless the socket is in a directory mounted within both containers (by default /tmp). I suppose I could share /tmp between the containers - are you doing something like this, aroq?
If you don't see the files being created, it's because they are created in the jnlp filesystem. You would have to either share that dir with all containers or force the files to be created in the workspace, because that's already shared.
>If you don't see the files being created, it's because they are created in the jnlp filesystem.
If this is regarding the key file being empty on disk, I don't think this is the case. The key file is created in the workspace, which is mounted by both the jnlp container and the non-jnlp container.
I took the approach of sharing "/tmp" between the jnlp container and the non-jnlp container. I defined "/tmp" as an emptyDir mount in the pod template.
Doing this, in combination with injecting "SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}" into the script, creates what appears to be a working solution.
Please let me know if someone needs this approach more thoroughly documented.
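For anyone who wants it, a sketch of that setup (label, image, and credential id are placeholders; the key part is the emptyDir volume mounted at /tmp, which is mounted into every container of the pod, including jnlp):
podTemplate(
    label: 'shared-tmp',                                         // hypothetical label
    volumes: [emptyDirVolume(mountPath: '/tmp', memory: false)], // share /tmp across containers
    containers: [
        containerTemplate(name: 'golang', image: 'golang:1.8', ttyEnabled: true, command: 'cat')
    ]) {
  node('shared-tmp') {
    container('golang') {
      sshagent(credentials: ['jenkins-master-ssh']) {
        // The agent socket under /tmp is now visible here; the env var still
        // has to be injected manually (see the SSH_AUTH_SOCK workaround above)
        sh "export SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}; ssh-add -l"
      }
    }
  }
}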
This is related to JENKINS-47225 - the issue is that the plugin expects to see the response back from the executed commands to be able to load keys into the socket file that is created.
The pull request #236 that was merged in version 1.1.1 doesn't seem to fix this issue.
I'm using version 1.1.2 with a declarative pipeline:
pipeline {
  agent {
    kubernetes {
      cloud 'kubernetes'
      label 'mypod'
      containerTemplate {
        imagePullSecrets 'docker-secret'
        name 'dev'
        image 'myImage'
        ttyEnabled true
      }
    }
  }
  stages {
    stage("setup") {
      steps {
        container('dev') {
          sshagent (credentials: ['git-repos']) {
            sh "my_setup_script"
          }
        }
      }
    }
    ....
my_setup_script is a bash script that clones git repos.
The pipeline log is:
Cloning into 'my_repo'...
OpenSSH_7.2p2 Ubuntu-4ubuntu2.2, OpenSSL 1.0.2g 1 Mar 2016
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to my_server [my_ip] port my_port.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: SELinux support disabled
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.2
debug1: Remote protocol version 2.0, remote software version SSHD-UNKNOWN
debug1: no match: SSHD-UNKNOWN
debug1: Authenticating to my_server:my_port as 'git'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: ecdh-sha2-nistp256
debug1: kex: host key algorithm: ssh-rsa
debug1: kex: server->client cipher: aes128-ctr MAC: hmac-sha2-256 compression: none
debug1: kex: client->server cipher: aes128-ctr MAC: hmac-sha2-256 compression: none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ssh-rsa SHA256:blahblahbalh
debug1: read_passphrase: can't open /dev/tty: No such device or address
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
The same step works ok when using a docker agent instead of using the kubernetes plugin.
Just want to confirm fgsalomon's observation, that this bug still persists. I tested with Kubernetes plugin 1.1.3, and observed similar behavior and logs.
However, aroq's workaround (i.e. manually setting SSH_AUTH_SOCK) works fine for me. Also, I didn't have to tinker with any mounts, as suggested in some of the other comments here.
aroq's workaround doesn't work for us. This stage returns the same output as my previous one:
... stage("Setup") { environment { GIT_SSH_COMMAND = "ssh -v" } steps { container('dev') { sshagent (credentials: ['git-repos']) { sh(script: "export SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK}; bash my_setup_script") } } } } ...
You probably need to check that the SSH_AUTH_SOCK env var is really set at every step (before the bash script call, inside the bash script, etc.) to figure the issue out.
I can say that there was an issue with race conditions in newer versions of the Kubernetes plugin (> 1.1) causing wrong behaviour of the SSH agent (I believe you can find that issue via the links in the comments above). I had to downgrade to version 1.1 to keep it working, but I haven't tried 1.1.3 yet.
I'm sorry, aroq's workaround did work. It turns out we were having host key verification issues while running the container on Kubernetes.
csanchez What's the status of this bug? It was supposedly fixed, now it's reopened, and I can reproduce the issue with version 1.2 of the plugin.
The workaround does work for me too, but it's not a proper solution.
I believe this problem is a consequence of JENKINS-49370 by the way.
seakip18 was working on it, not sure what the state is
The test here is passing, so at least a PR that makes it break would help:
https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/test/resources/org/csanchez/jenkins/plugins/kubernetes/pipeline/sshagent.groovy
csanchez I think it's only broken for the declarative pipeline syntax, that's why the test probably works.
berni_ no, I use the scripted pipeline syntax, and it's broken for me regardless.
I have the same issue.
kubernetes-plugin 1.2
jenkins latest
Pipeline:
podTemplate(
    label: 'myscript-builder',
    serviceAccount: 'jenkins',
    containers: [
        containerTemplate(name: 'jnlp', image: 'jenkinsci/jnlp-slave'),
        containerTemplate(name: 'go', image: 'golang:1.9', command: 'cat', ttyEnabled: true),
    ]) {
  node('myscript-builder') {
    stage('Checkout the repository') {
      dir('src/myscript') {
        checkout scm
      }
    }
    withEnv(["GOPATH=${pwd()}", "CGO_ENABLED=0"]) {
      stage('Install dependencies') {
        sshagent(['deploy-key-one', 'deploy-key-two']) {
          container('go') {
            sh "ssh-add -l"
            sh "ssh -T git@github.com"
          }
        }
      }
    }
  }
}
Result:
[Pipeline] stage
[Pipeline] { (Install dependencies)
[Pipeline] sshagent
[ssh-agent] Using credentials git (deploy-key-one)
[ssh-agent] Using credentials git (deploy-key-two)
[ssh-agent] Looking for ssh-agent implementation...
[ssh-agent] Exec ssh-agent (binary ssh-agent on a remote machine)
$ ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-fpyTHMMWhEHH/agent.153
SSH_AGENT_PID=155
$ ssh-add /home/jenkins/workspace/myscript@tmp/private_key_7693712685218725708.key
Identity added: /home/jenkins/workspace/myscript@tmp/private_key_7693712685218725708.key (/home/jenkins/workspace/myscript@tmp/private_key_7693712685218725708.key)
$ ssh-add /home/jenkins/workspace/myscript@tmp/private_key_6381387346580660507.key
Identity added: /home/jenkins/workspace/myscript@tmp/private_key_6381387346580660507.key (/home/jenkins/workspace/myscript@tmp/private_key_6381387346580660507.key)
[ssh-agent] Started.
[Pipeline] {
[Pipeline] container
[Pipeline] {
[Pipeline] sh
[myscript] Running shell script
+ ssh-add -l
Error connecting to agent: No such file or directory
[Pipeline] }
[Pipeline] // container
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 155 killed;
[ssh-agent] Stopped.
[Pipeline] }
[Pipeline] // sshagent
[Pipeline] }
[Pipeline] // stage
As you can see, ssh-add -l doesn't seem to find the agent at all. This breaks anything that relies on the agent.
Also, I'm not sure if it is related to JENKINS-49370, because as you can see, in my case, the `sshagent(){}` bit is outside the container. Mind you, if I do put it inside the container, it blows up further.
Finally, the workaround mentioned previously doesn't work either:
[Pipeline] sh
[myscript] Running shell script
+ export SSH_AUTH_SOCK=/tmp/ssh-IXNWI9bUU1ow/agent.147
+ ssh-add -l
Error connecting to agent: No such file or directory
[Pipeline] }
OK, so to clarify, there are multiple possible problems with ssh-agent, and not all of them are caused by the plugin.
1. sshagent step outside the container step
Example:
sshagent(['credential-id']) {
  container('container') {
    sh 'ssh-add -l'
  }
}
In this scenario the ssh agent is running in a different container than the ssh-add program. This setup will typically not work, and it's not the fault of the plugin. The containers are not expected to be able to communicate with each other.
When you do this, you get an error message like this:
+ ssh-add -l
Error connecting to agent: No such file or directory
It means two things:
- The SSH_AUTH_SOCK environment variable is set, and therefore ssh-add tries to open the socket.
- The socket does not exist in the container, because it was created in a different container.
Note: In theory this setup may work if the /tmp volume (where the socket is placed) is shared between the jnlp container and the container where the ssh-add command is executed, but I haven't tested it.
2. sshagent step inside the container step
Example:
container('container') {
  sshagent(['credential-id']) {
    sh 'ssh-add -l'
  }
}
In this scenario both the ssh agent and the ssh-add program are running in the same container. This should work, but it does not for me. There is even a test case for this scenario, which fails when I run it on my Jenkins.
The problem seems to be that the sshagent step defines a new environment variable, SSH_AUTH_SOCK, that the plugin doesn't make available for the enclosed sh command. This is the error message you get instead:
+ ssh-add -l
Could not open a connection to your authentication agent.
There's a workaround for this problem: set the environment variable yourself! Like this:
container('container') {
  sshagent(['credential-id']) {
    sh "SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK} ssh-add -l"
  }
}
This is the problem the plugin should fix.
3. You're using the wrong credentials
Let's mention this third possibility for errors, because it has happened before to someone in this thread. Even if you set up the ssh agent properly and the ssh program can talk to it via the socket, you may still be using the wrong credential and fail to log in.
Obviously, this is not the plugin's fault.
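Putting points 1 and 2 together, the pattern that has worked for several people in this thread looks roughly like this (scripted syntax; label, image, and credential id are placeholders):
podTemplate(label: 'ssh-demo', containers: [
    containerTemplate(name: 'golang', image: 'golang:1.8', ttyEnabled: true, command: 'cat')]) {
  node('ssh-demo') {
    container('golang') {
      // Keep sshagent inside the container step so the agent and ssh-add run in the same container
      sshagent(credentials: ['jenkins-master-ssh']) {
        // Re-export the socket path yourself, since the plugin does not propagate it (point 2)
        sh "SSH_AUTH_SOCK=${env.SSH_AUTH_SOCK} ssh-add -l"
      }
    }
  }
}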
I am also experiencing this behavior. I have tried the different options but I am still not able to clone one of my repos. I was able to ssh into my git server by doing this
sh 'ssh -vT -o "StrictHostKeyChecking=no" git@github.com'
Of course it told me it couldn't go further because of the lack of a tty. I have confirmed that the key fingerprint I get from ssh-add -l matches the key of the user I am using, via the GitHub UI.
Right now I have /tmp mounted and shared between both containers, and that is how I get the ssh command above to work. I also exported the SSH_AUTH_SOCK variable. After all of that I still can't clone my repo. I think I saw a mention about going back to version 1.1 of the plugin - should I try that?
As a workaround you can use SSH_AUTH_SOCK=$(find /tmp/ssh-* -type s) ssh user@server
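In a pipeline that might look something like this (a sketch; the container name and target host are placeholders, and it assumes the agent socket under /tmp is reachable from that container):
container('golang') {
  sshagent(credentials: ['jenkins-master-ssh']) {
    // Locate the agent socket on disk instead of relying on SSH_AUTH_SOCK being propagated
    sh 'SSH_AUTH_SOCK=$(find /tmp/ssh-* -type s) ssh -T git@github.com'
  }
}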
csanchez I went ahead and tried that but still no luck. When I do a git clone I get the message that says I might not have permission. Just to confirm, I used that key to check out the source locally on my box and it works just fine. So the user has permission to clone the repo.
Alright, I was able to get it working. The last piece I was fighting with was host key verification.
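For the host key verification part, one common approach is to pre-populate known_hosts in the container before the first ssh/git call (a sketch; github.com is a placeholder host, and you may prefer pinning a verified key rather than blindly trusting ssh-keyscan output):
container('golang') {
  // Accept the server's host key up front so non-interactive ssh/git calls
  // don't fail with "Host key verification failed"
  sh '''
    mkdir -p ~/.ssh
    ssh-keyscan github.com >> ~/.ssh/known_hosts
  '''
}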
I think I have found a solution for this: we can override the computer's getEnvironment to drop the computer-level properties (JAVA_HOME, etc.) and use the built-in variables without issue.
This would avoid the need for workarounds.
Thank you seakip18, this is awesome! Everything is working fine now with version 1.3.2.
Using version 1.3.2, most of the time it seems to work, but sometimes I get the error
ERROR: Failed to run ssh-add
(Almost) complete log:
First time build. Skipping changelog.
[Pipeline] }
[Pipeline] // stage
[Pipeline] container
[Pipeline] {
[Pipeline] withCredentials
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] timeout
Timeout set to expire in 1 hr 0 min
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Test)
[Pipeline] container
[Pipeline] {
[Pipeline] sshagent
[ssh-agent] Using credentials git-repos (SSH credential for Git repos)
[ssh-agent] Looking for ssh-agent implementation...
[ssh-agent] Exec ssh-agent (binary ssh-agent on a remote machine)
Executing shell script inside container [ubuntu] of pod [bowiepod-47nkj-nld78]
Executing command: "ssh-agent"
printf "EXITCODE %3d" $?; exit
SSH_AUTH_SOCK=/tmp/ssh-oYnHo47QlMEc/agent.22; export SSH_AUTH_SOCK;
SSH_AGENT_PID=23; export SSH_AGENT_PID;
echo Agent pid 23;
EXITCODE 0SSH_AUTH_SOCK=/tmp/ssh-oYnHo47QlMEc/agent.22 SSH_AGENT_PID=23
Executing shell script inside container [ubuntu] of pod [bowiepod-47nkj-nld78]
Executing command: "ssh-add" "/home/jenkins/workspace/bowie_feature_doc_upload-2NDL5PPAHMMAWBMRX6CFCZI7VDOLJYNMA6HK2HG3NCDKNGOA6YVA@tmp/private_key_8883554255349387919.key"
printf "EXITCODE %3d" $?; exit
EXITCODE 0EXITCODE 0Identity added: /home/jenkins/workspace/bowie_feature_doc_upload-2NDL5PPAHMMAWBMRX6CFCZI7VDOLJYNMA6HK2HG3NCDKNGOA6YVA@tmp/private_key_8883554255349387919.key (/home/jenkins/workspace/bowie_feature_doc_upload-2NDL5PPAHMMAWBMRX6CFCZI7VDOLJYNMA6HK2HG3NCDKNGOA6YVA@tmp/private_key_8883554255349387919.key)
Identity added: /home/jenkins/workspace/bowie_feature_doc_upload-2NDL5PPAHMMAWBMRX6CFCZI7VDOLJYNMA6HK2HG3NCDKNGOA6YVA@tmp/private_key_8883554255349387919.key (/home/jenkins/workspace/bowie_feature_doc_upload-2NDL5PPAHMMAWBMRX6CFCZI7VDOLJYNMA6HK2HG3NCDKNGOA6YVA@tmp/private_key_8883554255349387919.key)
[Pipeline] // sshagent
[Pipeline] }
[Pipeline] // container
Post stage
[Pipeline] junit
Recording test results
No test report files were found. Configuration error?
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Doc)
Stage 'Doc' skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Lint)
Stage 'Lint' skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (SonarQube)
Stage 'SonarQube' skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Deploy)
Stage 'Deploy' skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
ERROR: Failed to run ssh-add
Finished: FAILURE
Can it be a race condition?
We also sometimes get exactly the same error. It often works, but randomly throws this error.
And it's not clear why it fails then. There is no error message except that ssh-add failed to run.
With Kubernetes plugin 1.12.4 and kubernetes-credentials 0.3.1, I am seeing this ssh-agent issue. Can you please check?
[ssh-agent] Looking for ssh-agent implementation...
Could not find ssh-agent: IOException: container [dind] does not exist in pod [jenkins-slave-lcmzb-0g359]
Check if ssh-agent is installed and in PATH
[ssh-agent] Java/JNR ssh-agent
And then finally, it shows this error -
[Pipeline] End of Pipeline
java.io.IOException: container [dind] does not exist in pod [jenkins-slave-lcmzb-0g359]
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.waitUntilContainerIsReady(ContainerExecDecorator.java:435)
akmjenkins whatever you are seeing, it sounds like an unrelated issue, and perhaps just a user error: an incorrect pod definition.
Hi all!
I'm getting the error "Host key verification failed" when trying to authenticate with ssh for a git push operation from a worker container in Kubernetes pod.
I run podTemplate with a worker image based on alpine with git version 2.32.0
Jenkins version is 2.303.3
Kubernetes plugin version 1.30.7
The first operation of git clone succeeds but then when trying to push tags to a remote (gitlab) repository I get the error.
Attached is the Jenkins console output.
Am I missing something?
Any help is appreciated.
Thanks,
AC
ac77 asking for help through a closed Jenkins issue report is much less likely to receive help than asking for help through a message on the Jenkins discourse site, the Jenkins gitter chat channel, or the Jenkins user mailing list. There are 23 people watching this issue and likely 10x more than that reading those other lists. Please use the Jenkins discourse site, the Jenkins gitter chat channel, or the Jenkins user mailing list to ask for help.
One thing I saw in the logs is that if `sshagent () {}` is the first command in the container (like the example above), the workspace isn't created yet.
After fixing that, the ssh-agent plugin still doesn't provide the key as expected.