[JENKINS-67167] In a Kubernetes pod, sh steps inside container() are failing sporadically

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: kubernetes-plugin
    • Environment:
      Jenkins 2.303.3
      Kubernetes plugin 1.30.6
      Durable Task Plugin 1.39
      jnlp via jenkins/inbound-agent:4.11-1-alpine-jdk8

      The issue is reproducible using the attached pipeline jnlpcontainer_tests.groovy.

      Description of the test (a rough sketch of the pipeline follows the list):

      • running inside a k8s pod, with multiple containers
        • a jnlp container
        • a build container
      • the pipeline starts 3 parallel branches
        • jnlp branch - runs sh inside container('jnlp'){}
        • build branch - runs sh inside container('build'){} ('build' is the name of the second container in the pod)
        • noContainer() branch  – runs sh outside any container(){} closure
      • in each of the parallel branches a simple sh call is executed
      • in the jnlp and build branches sh is called inside a container() closure
        • in these 2 branches sh is failing sporadically
      • in the noContainer branch sh is called outside any container() closure
        • not a single failure was observed in this branch across all the runs I started
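
      A minimal scripted-pipeline sketch of what the attached jnlpcontainer_tests.groovy does. The jnlp image and the 100 iterations per branch are taken from this report; the 'build' image, the echo command and the loop structure are assumptions, so the attached file may differ in detail:

      // hypothetical reconstruction of jnlpcontainer_tests.groovy
      podTemplate(containers: [
          containerTemplate(name: 'jnlp', image: 'jenkins/inbound-agent:4.11-1-alpine-jdk8'),
          containerTemplate(name: 'build', image: 'alpine', ttyEnabled: true, command: 'cat')  // assumed image
      ]) {
          node(POD_LABEL) {
              parallel(
                  jnlp: {
                      for (int i = 0; i < 100; i++) {
                          container('jnlp') { sh 'echo test' }    // fails sporadically
                      }
                  },
                  build: {
                      for (int i = 0; i < 100; i++) {
                          container('build') { sh 'echo test' }   // fails sporadically
                      }
                  },
                  noContainer: {
                      for (int i = 0; i < 100; i++) {
                          sh 'echo test'                           // no failures observed
                      }
                  }
              )
          }
      }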

      Mainly two exceptions were thrown:

      [2021-11-18T10:49:57.920Z] java.io.EOFException
      [2021-11-18T10:49:57.921Z] 	at okio.RealBufferedSource.require(RealBufferedSource.java:61)
      [2021-11-18T10:49:57.921Z] 	at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
      [2021-11-18T10:49:57.921Z] 	at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
      [2021-11-18T10:49:57.921Z] 	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
      [2021-11-18T10:49:57.921Z] 	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
      [2021-11-18T10:49:57.921Z] 	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
      [2021-11-18T10:49:57.921Z] 	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
      [2021-11-18T10:49:57.921Z] 	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
      [2021-11-18T10:49:57.921Z] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      [2021-11-18T10:49:57.921Z] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      [2021-11-18T10:49:57.921Z] 	at java.lang.Thread.run(Thread.java:748)
      [2021-11-18T10:49:57.921Z] ERROR: Process exited immediately after creation. See output below
      [2021-11-18T10:49:57.921Z] Executing sh script inside container jnlp of pod test-multiplecontainers-in-node-5d914e4e-3023-4bf0-845d-2-pcxs5
      [2021-11-18T10:49:57.921Z] 
      Process exited immediately after creation. Check logs above for more details.
      

      and

      [2021-11-18T10:49:58.203Z] java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
      [2021-11-18T10:49:58.205Z] 	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
      [2021-11-18T10:49:58.205Z] 	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
      [2021-11-18T10:49:58.205Z] 	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
      [2021-11-18T10:49:58.205Z] 	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
      [2021-11-18T10:49:58.205Z] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      [2021-11-18T10:49:58.205Z] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      [2021-11-18T10:49:58.205Z] 	at java.lang.Thread.run(Thread.java:748)
      io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: dial tcp 192.168.3.11:10250: connect: connection refused
      
      • NOTE: the test consists of 100 iterations per branch, all executed in the same agent pod. So if we get a KubernetesClientException with a "connection refused" error and retry on the same container, it eventually works again; see the retry sketch below and the comments that follow.
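
          A possible mitigation sketch based on that observation (not a fix; the retry count and the echo command are arbitrary placeholders):

          // re-run the sh step when the exec connection to the container fails
          container('build') {
              retry(3) {
                  sh 'echo test'
              }
          }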


          Jesse Glick added a comment -

          Possible duplicate of JENKINS-59652. The implementation of the container step is known to be poor and due for a rewrite.


          Yacine added a comment - edited

          Hi jglick, thanks for your reply.

          I don't think that in this particular case the pod was evicted or stopped, or even that the connection to it was lost, and this is why:

          • as you can see in jnlpcontainer_tests.groovy, the 3rd parallel branch executes the same sh call but not wrapped in a container() closure. This branch did not have a single sh-step failure
            • without the container() closure, the commands are executed in the jnlp container, the same jnlp container where we get issues if we explicitly select it with container('jnlp')
          • the pod was alive and reachable without issues for whatever was executed outside container()
          • this affects only sh calls

          so this is problematic:

          container('jnlp') {
              sh("echo test")
          }
          

          and this is not:

          // no container(){} closure
          sh("echo test")
          

          while running at the same time on the same pod (and in the same container as well).

          This forces us to use one (custom) jnlp container to run the pipeline, which is kind of against the recommendation in the docs (https://github.com/jenkinsci/kubernetes-plugin#configuration): "We do not recommend overriding the jnlp container except under unusual circumstances."

          I am not sure how I can debug further to identify a possible workaround. Any hints?


          Jesse Glick added a comment -

          I do not know of any workaround beyond avoiding container.


          Yacine added a comment - edited

          Hi jglick,

          In a scripted pipeline: until the container() step is fixed/refactored, would it be possible to somehow select a container other than jnlp as the default for the execution of sh steps, so that we don't have to use container()?

          so, if in a pod we have:

          • a 'jnlp' container
          • a 'build' container

          we select the 'build' container as the default (maybe in the podTemplate definition), so that we don't have to do

          container('build') {
              sh("..")
          }
          

          I think that if we want to keep the default jnlp container, most of the commands need to be executed somewhere else.

          Or is this then the same effort as refactoring the container() step?


          Jesse Glick added a comment -

          would it be possible to somehow select a different container than the jnlp as default for the execution of sh steps?

          No, it is not possible.

          The workaround is to use a pod with a single container (jnlp) whose image contains both a Jenkins agent (and JRE), and whatever other tools you might need.
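
          A minimal sketch of that workaround, assuming a hypothetical custom image (my-registry/agent-with-tools) built on top of jenkins/inbound-agent that also bundles the required build tools:

          // single-container pod: every sh step runs in 'jnlp', so no container() closure is needed
          podTemplate(containers: [
              containerTemplate(name: 'jnlp', image: 'my-registry/agent-with-tools:latest')  // placeholder image
          ]) {
              node(POD_LABEL) {
                  sh 'echo test'
              }
          }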


          Adam Placzek added a comment -

          Hi,
          We have the same problem, and it has even become more frequent recently after Kubernetes client and plugin upgrades.

          Is there a permanent solution planned, or should we be using ONLY the JNLP container? If so, please update the documentation, which still does not recommend overriding JNLP.


          Francisco Aguiar added a comment -

          I'm facing the same issue; I think it is related to the cluster being private. I have two GKE clusters (private and public, same version and same Jenkins chart). In the public one everything runs smoothly, but in the private cluster the job (nothing special, just an sh echo... command) fails randomly. I figured out that it always fails when the pod runs on the same node as Jenkins's pod.

          I thought the issue could be that I need to add a rule to my cluster network, like other services that need access to the control plane through a certain port, but I couldn't find any reference to this...

          Any idea how to solve this?


            Assignee: Unassigned
            Reporter: Yacine (ysmaoui)
            Votes: 3
            Watchers: 19