Type: Bug
Resolution: Fixed
Priority: Major
Labels: None
Environment: Jenkins 2.32.2
I tried to narrow this bug down, but there isn't much information. We just upgraded to all the newest plugins, but unfortunately we upgraded a lot at once, so we have no idea which one caused it.
This is spamming out logs every few seconds:
00:19:42.695 Cannot contact kubernetes-ef39fe82c8a541be84bd780e4d7c1ddb-ce4d47fc96bfc: java.io.IOException: corrupted content in /home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-96fa79b7/pid: java.lang.NumberFormatException: For input string: ""
00:19:57.758 Cannot contact kubernetes-ef39fe82c8a541be84bd780e4d7c1ddb-ce4d47fc96bfc: java.io.IOException: corrupted content in /home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-96fa79b7/pid: java.lang.NumberFormatException: For input string: ""
00:20:12.769 Cannot contact kubernetes-ef39fe82c8a541be84bd780e4d7c1ddb-ce4d47fc96bfc: java.io.IOException: corrupted content in /home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-96fa79b7/pid: java.lang.NumberFormatException: For input string: ""
Information added from comments:
The same kind of problem has occurred in my Jenkins running on GKE. It appeared after upgrading the Pipeline Nodes and Processes Plugin from 2.8 to 2.9, and I can confirm that downgrading that plugin from 2.9 to 2.8 temporarily resolves it.
BTW workflow-durable-task-step 2.9 does add this log message, but it is just exposing a problem that was already there and was simply suppressed unless you were running a sufficiently fine logger. The problem is that this code sees a file which is supposed to contain a number once created, whereas it is being created empty for some reason.
Again, the bug probably exists in all versions; it is only printed to the build log as of 2.9. You can add a FINE logger to org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep to verify.
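For reference, one way to attach such a logger from the Jenkins script console (a minimal sketch using plain java.util.logging; creating a Log Recorder under Manage Jenkins > System Log works just as well):

import java.util.logging.ConsoleHandler
import java.util.logging.Level
import java.util.logging.Logger

// Turn on FINE logging for the durable-task step so the otherwise
// suppressed "corrupted content" checks become visible in the Jenkins log.
def logger = Logger.getLogger('org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep')
logger.level = Level.FINE

def handler = new ConsoleHandler()
handler.level = Level.FINE
logger.addHandler(handler)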
Links:

blocks:
JENKINS-40825 "Pipe not connected" errors when running multiple builds simultaneously (Resolved)
JENKINS-39550 "Pipe not connected" with parallel steps (Resolved)
JENKINS-39664 Docker builds do not work with Kubernetes Pipeline plugin (Closed)
JENKINS-44150 Pipeline job not waiting for all triggered parallel jobs (Closed)

is duplicated by:
JENKINS-42316 Log report corrupted content in durable-*/pid (Resolved)
JENKINS-44152 kubernetes-plugin java.io.IOException (Closed)

is related to:
JENKINS-46087 stash fails in kubernetes container (Resolved)

relates to:
JENKINS-61950 Environment variables with '$' have '$$' when used in sh step inside a container step (Resolved)

links to
[JENKINS-42048] Cannot Connect, PID NumberFormatException
I'm experiencing the same recurring error message
The slave is using the golang docker image, and the pipeline is set up like this:
podTemplate(label: 'jenkpod', containers: [
    containerTemplate(name: 'golang', image: 'golang:1.8', ttyEnabled: true, command: 'cat')
]) {
    node('jenkpod') {
        container('golang') {
            stage('Pre-Build') {
                checkout scm
                sh 'make get'
            }
        }
    }
}
The events for the slave pod:
Events:
  FirstSeen  LastSeen  Count  From                   SubObjectPath            Type    Reason     Message
  13m        13m       1      {default-scheduler }                            Normal  Scheduled  Successfully assigned kubernetes-f1e4a27973a941c2af08bebbc74cc080-10bf4e8527c4 to gke-jenkins
  13m        13m       1      {kubelet gke-jenkins}  spec.containers{golang}  Normal  Pulled     Container image "golang:1.8" already present on machine
  13m        13m       1      {kubelet gke-jenkins}  spec.containers{golang}  Normal  Created    Created container with docker id 97e4b71e323e; Security:[seccomp=unconfined]
  13m        13m       1      {kubelet gke-jenkins}  spec.containers{golang}  Normal  Started    Started container with docker id 97e4b71e323e
  13m        13m       1      {kubelet gke-jenkins}  spec.containers{jnlp}    Normal  Pulled     Container image "jenkinsci/jnlp-slave:alpine" already present on machine
  13m        13m       1      {kubelet gke-jenkins}  spec.containers{jnlp}    Normal  Created    Created container with docker id 628623d03379; Security:[seccomp=unconfined]
  13m        13m       1      {kubelet gke-jenkins}  spec.containers{jnlp}    Normal  Started    Started container with docker id 628623d03379
For more context, we are running on Google Container Engine (hosted Kubernetes). The weird thing is that it seems to be working, i.e. the pipeline builds succeed, even with the constant exception.
The exception starts once the Gradle shell step is run:
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Build & run unit tests)
[Pipeline] withEnv
[Pipeline] {
[Pipeline] sh
00:01:05.567 [Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A] Running shell script
00:01:05.572 Executing shell script inside container [gcloud-jdk7] of pod [kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9]
00:01:05.653 Executing command: sh -c echo $$ > '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/pid'; jsc=durable-ca85172bfb8670e4c44f30557e14af18; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/script.sh' > '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/jenkins-result.txt'
00:01:05.694 # cd /home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A
00:01:05.694 sh -c echo $$ > '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/pid'; jsc=durable-ca85172bfb8670e4c44f30557e14af18; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/script.sh' > '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/jenkins-result.txt'
00:01:05.694 exit
00:01:05.909 + ./gradlew --stacktrace --parallel buildUnitTest
00:01:05.909 Downloading https://services.gradle.org/distributions/gradle-3.3-all.zip
00:01:05.995 Cannot contact kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9: java.io.IOException: corrupted content in /home/jenkins/workspace/Robusta_robusta_develop-6EPNQBJK5BYEXOJV6L45MMZZGUIP7WO4Y6EGRUYNFFMRC7B2GL3A@tmp/durable-aa4bf913/pid: java.lang.NumberFormatException: For input string: ""
Our Jenkinsfile is pretty massive, but here is the core (it has been edited):
podTemplate(label: 'JavaPod', containers: [
    containerTemplate(
        name: 'gcloud-jdk7',
        image: 'gcr.io/pc-infrastructure/robusta-jenkins-gcloud-jdk7',
        ttyEnabled: true,
        args: 'cat',
        command: '/bin/sh -c',
        alwaysPullImage: true,
        workingDir: '/home/jenkins',
        resourceRequestCpu: '2',
        resourceRequestMemory: '8Gi',
        resourceLimitCpu: '5',
        resourceLimitMemory: '9Gi',
    ),
    containerTemplate(
        name: 'jnlp',
        image: 'jenkinsci/jnlp-slave:alpine',
        args: '${computer.jnlpmac} ${computer.name}',
        resourceRequestCpu: '100m',
        resourceRequestMemory: '500Mi',
        resourceLimitCpu: '500m',
        resourceLimitMemory: '1Gi',
    )
]) {
    node('JavaPod') {
        container('gcloud-jdk7') {
            timeout(30) { // assume something is wrong if it takes half an hour
                stage('checkout source') {
                    checkout scm
                }
                switch (env.BRANCH_NAME) {
                    case 'develop':
                        buildUnitTest()
                        runIntegrationTests('local')
                }
            }
        }
    }
}

void buildUnitTest() {
    stage('Build & run unit tests') {
        withEnv(runEnv) {
            try {
                def command = './gradlew --stacktrace --parallel buildUnitTest'
                if (env.BRANCH_NAME == 'master') {
                    command = 'export ROBUSTA_PROD_ANALYTICS=true && ' + command
                }
                sh command
            } catch (Exception e) {
                junit allowEmptyResults: true, testResults: '**/build/test-results/**/*.xml'
                step([$class: 'CheckStylePublisher', canComputeNew: false, defaultEncoding: '', healthy: '', pattern: '**/main.xml,**/test.xml', unHealthy: ''])
                throw e
            }
            junit allowEmptyResults: true, testResults: '**/build/test-results/**/*.xml'
            step([$class: 'CheckStylePublisher', canComputeNew: false, defaultEncoding: '', healthy: '', pattern: '**/main.xml,**/test.xml', unHealthy: ''])
        }
    }
}

void runIntegrationTests(String targetEnv) {
    stage('Run integration tests') {
        withEnv(runEnv) {
            try {
                sh "./gradlew --stacktrace :robusta-integration-tests:integrationTest -PtestEnv=${targetEnv}"
            } catch (Exception e) {
                junit allowEmptyResults: true, testResults: '**/build/test-results/**/*.xml'
                throw e
            }
            junit allowEmptyResults: true, testResults: '**/build/test-results/**/*.xml'
        }
    }
}
kubectl describe pod kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9

Name:         kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9
Namespace:    default
Node:         gke-pci-default-pool-44b03267-0ztl/10.140.0.2
Start Time:   Thu, 16 Feb 2017 09:38:06 +1100
Labels:       jenkins=slave
              jenkins/JavaPod=true
Status:       Running
IP:           10.40.20.94
Controllers:  <none>
Containers:
  gcloud-jdk7:
    Container ID:   docker://28c3b60ae04d952cce366d5a03c2d950a171594828fd19446fe2aa9ed379dd33
    Image:          gcr.io/pc-infrastructure/robusta-jenkins-gcloud-jdk7
    Image ID:       docker://sha256:16e905ffe4f3393f6ee4b5125971a3029d6162ca0d3db5b2973f1f13b6201c3f
    Port:
    Command:        /bin/sh -c
    Args:           cat
    Limits:         cpu: 5, memory: 9Gi
    Requests:       cpu: 2, memory: 8Gi
    State:          Running (Started: Thu, 16 Feb 2017 09:38:07 +1100)
    Ready:          True
    Restart Count:  0
    Volume Mounts:
      /home/jenkins from workspace-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-y9hsd (ro)
    Environment Variables:
      JENKINS_SECRET:       <secret>
      JENKINS_NAME:         kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9
      JENKINS_LOCATION_URL: <jenkins url> (I replaced)
      JENKINS_URL:          <jenkins url> (I replaced)
      JENKINS_JNLP_URL:     <jenkins url> (I replaced)/computer/kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9/slave-agent.jnlp
      HOME:                 /home/jenkins
  jnlp:
    Container ID:   docker://b27bb762a03525763aee7d2a60a85b4c3331aa91c6ac7d40b40693f570c1b564
    Image:          jenkinsci/jnlp-slave:alpine
    Image ID:       docker://sha256:254fd665eaf0229f38295a9eac6c7f9bf32a2f450ecbcc8212f3e53b96dd339d
    Port:
    Args:           16b02724728739b72e0b559940adc7b5da29e9e190e8a35e858cece4bbc92346
                    kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9
    Limits:         cpu: 500m, memory: 1Gi
    Requests:       cpu: 100m, memory: 500Mi
    State:          Running (Started: Thu, 16 Feb 2017 09:38:06 +1100)
    Ready:          True
    Restart Count:  0
    Volume Mounts:
      /home/jenkins from workspace-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-y9hsd (ro)
    Environment Variables:
      JENKINS_SECRET:       <secret>
      JENKINS_NAME:         kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9
      JENKINS_LOCATION_URL: <jenkins url> (I replaced)
      JENKINS_URL:          <jenkins url> (I replaced)
      JENKINS_JNLP_URL:     <jenkins url> (I replaced)/computer/kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9/slave-agent.jnlp
      HOME:                 /home/jenkins
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  workspace-volume:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-y9hsd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-y9hsd
QoS Class:    Burstable
Tolerations:  <none>
Events:
  FirstSeen  LastSeen  Count  From                                          SubObjectPath                 Type    Reason     Message
  1m         1m        1      {default-scheduler }                                                        Normal  Scheduled  Successfully assigned kubernetes-9f544f8d984342c8bfa152fd3134608b-d1fdf7ba230b9 to gke-pci-default-pool-44b03267-0ztl
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{jnlp}         Normal  Pulled     Container image "jenkinsci/jnlp-slave:alpine" already present on machine
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{jnlp}         Normal  Created    Created container with docker id b27bb762a035; Security:[seccomp=unconfined]
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{jnlp}         Normal  Started    Started container with docker id b27bb762a035
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{gcloud-jdk7}  Normal  Pulling    pulling image "gcr.io/pc-infrastructure/robusta-jenkins-gcloud-jdk7"
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{gcloud-jdk7}  Normal  Pulled     Successfully pulled image "gcr.io/pc-infrastructure/robusta-jenkins-gcloud-jdk7"
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{gcloud-jdk7}  Normal  Created    Created container with docker id 28c3b60ae04d; Security:[seccomp=unconfined]
  1m         1m        1      {kubelet gke-pci-default-pool-44b03267-0ztl}  spec.containers{gcloud-jdk7}  Normal  Started    Started container with docker id 28c3b60ae04d
The same kind of problem has occurred in my Jenkins running on GKE. It appeared after upgrading the Pipeline Nodes and Processes Plugin from 2.8 to 2.9, and I can confirm that downgrading that plugin from 2.9 to 2.8 temporarily resolves it.
daichirata Thanks, just tried this and can confirm that the Pipeline Nodes and Processes Plugin 2.9 is the issue.
iocanel He narrowed it down to this plugin / version
Awesome, that's really helpful!
Unfortunately, I can only help regarding the kubernetes-plugin; can you reassign the issue to someone involved with `Pipeline Nodes and Processes Plugin`?
iocanel Will do, I believe jglick is part of that plugin team?
Also, if you are on the kubernetes team and have time, could you have a quick look at JENKINS-40647? I can confirm that it is still happening with the plugins updated to latest.
Unless there is some way to reproduce without Kubernetes, or someone using Kubernetes is willing to debug why an empty PID file is being written, I have nothing to go on.
I could probably find some time for debugging this week. Just let me know what to do.
Can you set the `workingDir` on the jnlp container and tell me if that makes any difference?
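Concretely, that suggestion amounts to something like the following (a sketch reconstructed from the log in the next comment; the label and the workingDir value are illustrative):

podTemplate(label: 'test', containers: [
    containerTemplate(
        name: 'jnlp',
        image: 'jenkinsci/jnlp-slave:alpine',
        args: '${computer.jnlpmac} ${computer.name}',
        workingDir: '/var/tmp'  // explicit workingDir on the jnlp container
    )
]) {
    node('test') {
        sh 'echo Foo; sleep 10'      // sh directly on the node: works
        container('jnlp') {
            sh 'echo Bar; sleep 10'  // same sh through the container step: pid file comes up empty
        }
    }
}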
I tried that; the problem seems to be the same:
[Pipeline] node
Running on test-55f1b2f811564473b115b8af4962a8ad-1b98a80d910eec in /var/tmp/workspace/Playground/test-JENKINS-42048
[Pipeline] {
[Pipeline] sh
[test-JENKINS-42048] Running shell script
+ echo Foo
Foo
+ sleep 10
[Pipeline] container
[Pipeline] {
[Pipeline] sh
[test-JENKINS-42048] Running shell script
Executing shell script inside container [jnlp] of pod [test-55f1b2f811564473b115b8af4962a8ad-1b98a80d910eec]
Executing command: sh -c echo $$ > '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/pid'; jsc=durable-51e972f92a0e472b7953c41703e464ca; JENKINS_SERVER_COOKIE=$jsc '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/script.sh' > '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/jenkins-log.txt' 2>&1; echo $? > '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/jenkins-result.txt'
$ cd "/var/tmp/workspace/Playground/test-JENKINS-42048"
sh -c echo $$ > '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/pid'; jsc=durable-51e972f92a0e472b7953c41703e464ca; JENKINS_SERVER_COOKIE=$jsc '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/script.sh' > '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/jenkins-log.txt' 2>&1; echo $? > '/var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/jenkins-result.txt'
exit
$
+ echo Bar
Bar
+ sleep 10
Cannot contact test-55f1b2f811564473b115b8af4962a8ad-1b98a80d910eec: java.io.IOException: corrupted content in /var/tmp/workspace/Playground/test-JENKINS-42048@tmp/durable-b2162d32/pid: java.lang.NumberFormatException: For input string: ""
...
The container and the directory are the same; the only difference is that first I call it directly from within a node block, then I call it from within a container block.
We just upgraded the Kubernetes Plugin to 0.11 and are now seeing this as well. Seems to happen on our shell steps (but not all the shell steps).
Here is the pipeline code:
query = '\'Reservations[].Instances[].ImageId\''
imagesInUse = sh(
    returnStdout: true,
    script: """\
        aws ec2 describe-instances \
            --region ${region} \
            --query ${query} \
            --output text
    """.stripIndent()
).trim().split().toUnique()
echo "Images in-use:\n${imagesInUse}"
...and here is the logging:
// Some comments here
+ aws ec2 describe-instances --region us-east-1 --query Reservations[].Instances[].ImageId --output text
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
Cannot contact kubernetes-d132fbe56cdf44b589aa03203db4ae55-f2bde129ba: java.io.IOException: corrupted content in /home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis@tmp/durable-fe439038/pid: java.lang.NumberFormatException: For input string: ""
/home/jenkins/workspace/ops-maintenance/ops-maintenance-scheduled/clean-up-amis/cleanup-baseline-amis # exit
[Pipeline] echo
Images in-use:
[ami-9b0df48d, ami-e8d11dfe, ami-3d3eff2b, ami-c4e81ad2, ami-33576959, ami-cd8b5fdb, ami-ebd205fd, ami-0e367a19, ami-56ed0740, ami-2e3c4c39, ami-4b71095c]
The scripts seem to be ok - and the job does not fail.
BTW workflow-durable-task-step 2.9 does add this log message but it is just exposing a problem that was already there, and simply being suppressed unless you were running a sufficiently fine logger. The problem is that this code is seeing a file which is supposed to contain a number once created, whereas it is being created as empty for some reason.
I tried to do an 'echo $$' in different ways within a container running in Kubernetes:
$ sh -c "echo $$" 4118 $ sh -c echo $$ $ sh -c 'echo $$' 4187
So it looks like the shell command which is responsible for creating the PID file doesn't work correctly. Please correct me if I'm wrong, but it looks like this happens in https://github.com/jenkinsci/durable-task-plugin/blob/master/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L119 ?
Great - can it be fixed so that the `cmd` is enclosed in quotes? That should make this problem disappear as then the `echo $$` command should work just fine.
It should not be enclosed in quotes. It is a single argument to sh -c.
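To illustrate the argument-splitting point, a minimal Groovy sketch (runnable anywhere a POSIX sh is on the PATH; no Jenkins involved):

println(['sh', '-c', 'echo $$'].execute().text.trim())
// prints a PID: "echo $$" is the whole -c script, so the child shell expands $$

println(['sh', '-c', 'echo', '$$'].execute().text.trim())
// prints "": here the -c script is just "echo"; the extra "$$" argument only
// becomes $0 of that script, so echo prints an empty line, i.e. exactly the
// empty pid file seen above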
Maybe the Kubernetes plugin has a buggy Launcher?
private static String[] getCommands(Launcher.ProcStarter starter) {
    List<String> allCommands = new ArrayList<String>();
    boolean first = true;
    String previous = "";
    String previousPrevious = "";
    for (String cmd : starter.cmds()) {
        if (first && "nohup".equals(cmd)) {
            first = false;
            continue;
        }
        if ("sh".equals(previousPrevious) && "-c".equals(previous)) {
            cmd = String.format("\"%s\"", cmd);
        }
        previousPrevious = previous;
        previous = cmd;
        // I shouldn't be doing this, but clearly the script that is passed to us is wrong?
        allCommands.add(cmd.replaceAll("\\$\\$", "\\$"));
    }
    return allCommands.toArray(new String[allCommands.size()]);
}
That should take care of adding the quotes around the shell command argument, which kind of works. I start to get this error instead:
Executing shell script inside container [debian] of pod [test-f669ba016c06421092b43fbd8b23e3d1-f2d539661013]
Executing command: ps -o pid= 9999
# cd "/home/jenkins/workspace/Playground/test-JENKINS-42048"
ps -o pid= 9999
exit
#
# command terminated with non-zero exit code: Error executing in Docker Container: 1
Executing shell script inside container [debian] of pod [test-f669ba016c06421092b43fbd8b23e3d1-f2d539661013]
Executing command: ps -o pid= 6
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
ERROR: script returned exit code -1
Finished: FAILURE
It looks like the Kubernetes plugin launcher can't handle any errors in the shell scripts and instead of returning true/false on the aliveness probe in https://github.com/jenkinsci/durable-task-plugin/blob/e174ce7f11e31da0a29c6d3af8023de48c269654/src/main/java/org/jenkinsci/plugins/durabletask/ProcessLiveness.java#L87 it drops dead. Any good ideas, anyone?
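For context, that liveness check boils down to running ps -o pid= for the recorded PID and treating empty output as "process gone", roughly like this sketch (illustrative only, not the plugin's actual code; it assumes a POSIX ps inside the container):

// Approximation of a ps-based liveness probe. The failure mode described
// above is a launcher that turns ps's non-zero exit status (PID not found)
// into a hard error instead of letting the probe simply return false.
boolean isAlive(int pid) {
    def proc = ['ps', '-o', 'pid=', pid.toString()].execute()
    def out = proc.text.trim()
    proc.waitFor()
    return !out.isEmpty()  // ps prints nothing and exits 1 when the PID does not exist
}

assert isAlive(1)        // PID 1 always exists inside a container
assert !isAlive(999999)  // almost certainly gone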
It looks like the Kubernetes plugin launcher can't handle any errors in the shell scripts
Maybe my warning was right?
It looks like the comments have already covered this bug, but when I upgraded to 2.9 I saw these same exceptions in our logs. I have downgraded the plugin to version 2.8 and everything is fine.
[pylint_test] Cannot contact kubernetes-2035b9ceb44d46db9a42cd8dbc1fa0b7-27af70ee4f39c: java.io.IOException: corrupted content in /home/jenkins/workspace/ogies_AA_sre_datadog-events-YUQOU4YVOIDND57J35UZCATDN3QRXKHXSLMS4JYIUCSNEI6IDZGQ@tmp/durable-05858b77/pid: java.lang.NumberFormatException: For input string: ""
Again, the bug probably exists in all versions; it is only printed to the build log as of 2.9. You can add a FINE logger to org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep to verify.
Thanks for the clarification, and yes, you're correct that the errors are present in 2.8 as well, as shown by adding the above logger:
Mar 04, 2017 4:52:27 AM FINE org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep
could not check /home/jenkins/workspace/ogies_AA_sre_datadog-events-YUQOU4YVOIDND57J35UZCATDN3QRXKHXSLMS4JYIUCSNEI6IDZGQ
java.lang.NumberFormatException: For input string: ""
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:592)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.pid(BourneShellScript.java:183)
Caused: java.io.IOException: corrupted content in /home/jenkins/workspace/ogies_AA_sre_datadog-events-YUQOU4YVOIDND57J35UZCATDN3QRXKHXSLMS4JYIUCSNEI6IDZGQ@tmp/durable-35571e92/pid
	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.pid(BourneShellScript.java:185)
	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:197)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$3.call(DurableTaskStep.java:314)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$3.call(DurableTaskStep.java:306)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$4.call(DurableTaskStep.java:359)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Could this log output be avoided by setting a log level in Jenkins?
The log does not look pretty, and it's not easy to filter out the relevant output.
Thank you for your support.
Unfortunately there is no way to do that. I pinged csanchez about this a few weeks ago and he added the component label, but I have not heard anything else. It would be nice to get this fixed; it's really annoying when trying to debug issues.
I think I found the issue: the durable-task plugin is expecting a pid file to be written out, but the kubernetes-plugin is streaming the command into the container over an OutputStream, and the echo $$ > just puts in an empty line.
Since everything is running inside of a container on kubernetes, the sh -c echo $$ doesn't actually return the pid.
Example
If I run, say, the docker image via docker so that it gives me an sh shell, I can get the pid:
docker run --rm -it --entrypoint=sh docker
/ # echo $$
1
/ # exit
However, if I try the same command but pass the -c echo $$ as the container command, it doesn't return anything:
docker run --rm -it --entrypoint=sh docker -c echo $$
I also verified this by trying to execute the same command on my jenkins container in minikube, and nothing was returned:
kubectl -n jenkins exec jenkins-3824495712-m94j6 -c jenkins -- sh -c echo $$
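(For what it's worth, quoting the script does print a PID, e.g. kubectl -n jenkins exec jenkins-3824495712-m94j6 -c jenkins -- sh -c 'echo $$', which matches the single-argument behaviour described in the earlier comments.)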
I'm unsure of the best way to fix this, but I hope I've provided some details that will help get it fixed, because it's really annoying.
For now, as a workaround, I've recompiled workflow-durable-task-plugin with this line commented out. IMO the logger a few lines up should log an error instead of FINE, and not taint the job console like it does.
I'd be willing to issue a PR to make this change, if jglick agrees.
Got the same problem. Logs are extremely noisy because of that. Hard to debug a real problem.
I'm also affected by this; it's seriously annoying and makes the plugin extremely painful to use when looking at the output.
I don't know why, but when this was initially implemented, the ContainerExecDecorator was receiving double the number of `$` symbols. For example, instead of `echo $$` it was getting `echo $$$$`, and so on.
So as a workaround, the decorator itself removed the excess $ symbols.
Could the problem be related to that?
I just tested your theory with this pipeline job, and this is what it output in the log:
podTemplate(
    label: 'test',
    containers: [
        containerTemplate(
            name: 'nodejs',
            image: 'node:alpine',
            ttyEnabled: true,
            command: 'cat',
            args: ''
        )
    ]
) {
    stage('test') {
        node('test') {
            env.MYTOOL_VERSION = '1.33'
            container('nodejs') {
                sh 'printenv'
            }
            echo env.MYTOOL_VERSION
        }
    }
}
Results
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] stage
[Pipeline] { (test)
[Pipeline] node
Running on jenkins-slave-4qzr4-jqq83 in /home/jenkins/workspace/test
[Pipeline] {
[Pipeline] container
[Pipeline] {
[Pipeline] sh
[test] Running shell script
Executing shell script inside container [nodejs] of pod [jenkins-slave-4qzr4-jqq83]
Executing command:
/home/jenkins # [6ncd "/home/jenkins/workspace/test"
/home/jenkins/workspace/test #
/home/jenkins/workspace/test # exit
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
ERROR: script returned exit code -2
Any update on this? Super annoying message. Floods the console output.
This may not be a solution for everyone; however, this log noise seems to only happen when you have a pod template with more than one container running. I was working around another issue (https://issues.jenkins-ci.org/browse/JENKINS-40825) and by only having one container running (named jnlp) the log noise was also gone.
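For reference, the single-container workaround looks roughly like this (a sketch; the label is illustrative):

podTemplate(label: 'single', containers: [
    containerTemplate(
        name: 'jnlp',
        image: 'jenkinsci/jnlp-slave:alpine',
        args: '${computer.jnlpmac} ${computer.name}'
    )
]) {
    node('single') {
        // Every sh step runs in the jnlp container itself, so no container()
        // step and none of the exec-based pid plumbing is involved.
        sh 'echo runs directly in the jnlp container'
    }
}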
workflow-durable-task-step PR 37 will reduce noise without addressing the underlying bug, probably somewhere in the Kubernetes plugin.
jredl Definitely not a solution for us, but good to know. Hopefully this will help csanchez or iocanel implement a fix. In the meantime, we did pull the workflow-durable-task-step PR 37 and merged it into a local build. It did lessen the number of times the message is logged. Hoping this issue is addressed soon.
we did pull the workflow-durable-task-step PR 37 and merged it into a local build
Just update to 2.11.
It seems that your pod is created but for some reason the pid can't be read, which is something I've never seen before.
Can you please tell us what your pipeline looks like, and maybe get us the output of `kubectl describe` for the kubernetes-ef39fe82c8a541be84bd780e4d7c1ddb-ce4d47fc96bfc pod?