JENKINS-66209

Jenkins Controller crashing with exitCode: 1


    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: core
    • Labels:
      None
    • Environment:
      Jenkins 2.277.4 (running on a Kubernetes cluster, managed using the Helm chart)
      Agents run as pods defined using Kubernetes templates (latest image from jenkins/jnlp-slave)

      Description

      For the last 4-5 months, we have been experiencing Jenkins controller outages (across multiple versions). The Kubernetes pod fails with exit code 1 and then self-heals:

      Output from kubectl describe:

      Containers:
        jenkins:
          Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
          Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
          Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
          Ports: 8080/TCP, 50000/TCP
          Host Ports: 0/TCP, 0/TCP
          Args:
            --httpPort=8080
          State: Running
            Started: Fri, 23 Jul 2021 10:24:42 +1000
          Last State: Terminated
            Reason: Error
            Exit Code: 1
            Started: Fri, 23 Jul 2021 09:38:37 +1000
            Finished: Fri, 23 Jul 2021 10:24:40 +1000
          Ready: True
          Restart Count: 2
          Limits:
            cpu: 3
            memory: 22Gi
          Requests:
            cpu: 2
            memory: 20Gi
          Liveness: http-get http://:http/login delay=0s timeout=5s period=20s #success=1 #failure=100
          Readiness: http-get http://:http/login delay=0s timeout=5s period=10s #success=1 #failure=100
          Startup: http-get http://:http/login delay=0s timeout=5s period=100s #success=1 #failure=5
          Environment:
            POD_NAME: jenkins-blue-0 (v1:metadata.name)
            JAVA_OPTS:
            JENKINS_OPTS:
            JENKINS_SLAVE_AGENT_PORT: 50000
            JAVA_OPTS: -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
            CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
          Mounts:
            /var/jenkins_config from jenkins-config (ro)
            /var/jenkins_home from jenkins-home (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
      Conditions:
        Type Status
        Initialized True
        Ready True
        ContainersReady True
        PodScheduled True
      Volumes:
        plugins:
          Type: EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit: <unset>
        jenkins-config:
          Type: ConfigMap (a volume populated by a ConfigMap)
          Name: jenkins-blue
          Optional: false
        jenkins-home:
          Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
          ClaimName: jenkins-blue
          ReadOnly: false
        sc-config-volume:
          Type: EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit: <unset>
        jenkins-controller-sa-token-67wfs:
          Type: Secret (a volume populated by a Secret)
          SecretName: jenkins-controller-sa-token-67wfs
          Optional: false
      QoS Class: Burstable
      Node-Selectors: <none>
      Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type Reason Age From Message
        ---- ------ ---- ---- -------
        Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
        Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
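
      The figures in the describe output can be cross-checked with a little arithmetic. The following is only a minimal sketch: the helper names are hypothetical (they are not part of Jenkins or Kubernetes), the probe window is approximated as period times failure threshold, and the heap estimate assumes the JVM sizes its heap from the container memory limit via MaxRAMPercentage.

```python
from datetime import datetime

# Values copied from the describe output above.
PROBES = {
    "liveness": {"period_s": 20, "failure_threshold": 100},
    "readiness": {"period_s": 10, "failure_threshold": 100},
    "startup": {"period_s": 100, "failure_threshold": 5},
}
MEMORY_LIMIT_GIB = 22
MAX_RAM_PERCENTAGE = 50.0
LAST_STARTED = "Fri, 23 Jul 2021 09:38:37 +1000"
LAST_FINISHED = "Fri, 23 Jul 2021 10:24:40 +1000"


def failure_window_s(period_s, failure_threshold):
    """Approximate seconds of consecutive failures before a probe trips."""
    return period_s * failure_threshold


def max_heap_gib(limit_gib, max_ram_percentage):
    """Estimated max heap, assuming the JVM honours the container limit."""
    return limit_gib * max_ram_percentage / 100.0


def uptime_s(started, finished):
    """Seconds between two timestamps in the format kubectl describe prints."""
    fmt = "%a, %d %b %Y %H:%M:%S %z"
    delta = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    for name, probe in PROBES.items():
        print(name, failure_window_s(**probe), "s of failures tolerated")
    print("estimated max heap:", max_heap_gib(MEMORY_LIMIT_GIB, MAX_RAM_PERCENTAGE), "GiB")
    print("last container uptime:", uptime_s(LAST_STARTED, LAST_FINISHED), "s")
```

      Under these assumptions the liveness probe would need roughly 2000 seconds of consecutive failures before the kubelet restarts the container, while the events above show only 9 liveness failures over 91 minutes. That would be consistent with the process exiting on its own rather than being killed by the probe, but it is an inference, not something the logs confirm.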

       

      There isn't much in the Jenkins core logs, except the error:

      webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
      2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
      Running from: /usr/share/jenkins/jenkins.war
      2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
      exitCode: 0
      Scanning success.
      exitCode: 1
      exitCode: 1
      exitCode: 1
      exitCode: 1
      Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
      Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818--hmjsm

       

       For the past week, we have been restarting the Jenkins controller nightly, but it hasn't helped.
       
       


          Activity

          nikhilp Nikhil created issue -
          nikhilp Nikhil made changes -
          Field Original Value New Value
          Description Since last 4 -5 months, we have been experiencing Jenkins Controller outages(across multiple versions). The kubernetes pods fails with exitcode 1 and self heals:

          +O/p from kube describe:+

          Containers:
           jenkins:
           Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
           Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
           Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
           Ports: 8080/TCP, 50000/TCP
           Host Ports: 0/TCP, 0/TCP
           Args:
           --httpPort=8080
           State: Running
           Started: Fri, 23 Jul 2021 10:24:42 +1000
           Last State: Terminated
           Reason: Error
           Exit Code: 1
           Started: Fri, 23 Jul 2021 09:38:37 +1000
           Finished: Fri, 23 Jul 2021 10:24:40 +1000
           Ready: True
           Restart Count: 2
           Limits:
           cpu: 3
           memory: 22Gi
           Requests:
           cpu: 2
           memory: 20Gi
           Liveness: http-get http://:http/login delay=0s timeout=5s period=20s #success=1 #failure=100
           Readiness: http-get http://:http/login delay=0s timeout=5s period=10s #success=1 #failure=100
           Startup: http-get http://:http/login delay=0s timeout=5s period=100s #success=1 #failure=5
           Environment:
           POD_NAME: jenkins-blue-0 (v1:metadata.name)
           JAVA_OPTS:
           JENKINS_OPTS:
           JENKINS_SLAVE_AGENT_PORT: 50000
           JAVA_OPTS: -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
           CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
           Mounts:
           /var/jenkins_config from jenkins-config (ro)
           /var/jenkins_home from jenkins-home (rw)
           /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
          Conditions:
           Type Status
           Initialized True
           Ready True
           ContainersReady True
           PodScheduled True
          Volumes:
           plugins:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-config:
           Type: ConfigMap (a volume populated by a ConfigMap)
           Name: jenkins-blue
           Optional: false
           jenkins-home:
           Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
           ClaimName: jenkins-blue
           ReadOnly: false
           sc-config-volume:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-controller-sa-token-67wfs:
           Type: Secret (a volume populated by a Secret)
           SecretName: jenkins-controller-sa-token-67wfs
           Optional: false
          QoS Class: Burstable
          Node-Selectors: <none>
          Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
           node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
          Events:
           Type Reason Age From Message
           ---- ------ ---- ---- -------
           Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
           Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused

           

          There isnt much in Jenkins core logs, excpet the error:
          webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
          2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
          Running from: /usr/share/jenkins/jenkins.war
          2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
          exitCode: 0
          Scanning success.
          exitCode: 1
          exitCode: 1
          exitCode: 1
          exitCode: 1
          Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
          Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818–hmjsm
           
          In the last week, we are nightly restating jenkins controller but it hasnt helped.
           
           
          Since last 4 -5 months, we have been experiencing Jenkins Controller outages(across multiple versions). The kubernetes pods fails with exitcode 1 and self heals:

          +O/p from kube describe:+

          Containers:
           jenkins:
           Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
           Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
           Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
           Ports: 8080/TCP, 50000/TCP
           Host Ports: 0/TCP, 0/TCP
           Args:
           --httpPort=8080
           State: Running
           Started: Fri, 23 Jul 2021 10:24:42 +1000
           Last State: Terminated
           Reason: Error
           Exit Code: 1
           Started: Fri, 23 Jul 2021 09:38:37 +1000
           Finished: Fri, 23 Jul 2021 10:24:40 +1000
           Ready: True
           Restart Count: 2
           Limits:
           cpu: 3
           memory: 22Gi
           Requests:
           cpu: 2
           memory: 20Gi
           Liveness: http-get [http://:http/login] delay=0s timeout=5s period=20s #success=1 #failure=100
           Readiness: http-get [http://:http/login] delay=0s timeout=5s period=10s #success=1 #failure=100
           Startup: http-get [http://:http/login] delay=0s timeout=5s period=100s #success=1 #failure=5
           Environment:
           POD_NAME: jenkins-blue-0 (v1:metadata.name)
           JAVA_OPTS:
           JENKINS_OPTS:
           JENKINS_SLAVE_AGENT_PORT: 50000
           JAVA_OPTS: XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
           CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
           Mounts:
           /var/jenkins_config from jenkins-config (ro)
           /var/jenkins_home from jenkins-home (rw)
           /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
           Conditions:
           Type Status
           Initialized True
           Ready True
           ContainersReady True
           PodScheduled True
           Volumes:
           plugins:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-config:
           Type: ConfigMap (a volume populated by a ConfigMap)
           Name: jenkins-blue
           Optional: false
           jenkins-home:
           Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
           ClaimName: jenkins-blue
           ReadOnly: false
           sc-config-volume:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-controller-sa-token-67wfs:
           Type: Secret (a volume populated by a Secret)
           SecretName: jenkins-controller-sa-token-67wfs
           Optional: false
           QoS Class: Burstable
           Node-Selectors: <none>
           Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
           node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
           Events:
           Type Reason Age From Message
           ---- ------ ---- ---- -------
           Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
           Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused

           

          There isnt much in *Jenkins core logs*, except the error:


           webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
           2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
           Running from: /usr/share/jenkins/jenkins.war
           2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
           exitCode: 0
           Scanning success.
           exitCode: 1
           exitCode: 1
           exitCode: 1
           exitCode: 1
           Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
           Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818–hmjsm
            
           In the last week, we are nightly restating jenkins controller but it hasnt helped.
            
            
          nikhilp Nikhil made changes -
          nikhilp Nikhil made changes -
          Description Since last 4 -5 months, we have been experiencing Jenkins Controller outages(across multiple versions). The kubernetes pods fails with exitcode 1 and self heals:

          +O/p from kube describe:+

          Containers:
           jenkins:
           Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
           Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
           Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
           Ports: 8080/TCP, 50000/TCP
           Host Ports: 0/TCP, 0/TCP
           Args:
           --httpPort=8080
           State: Running
           Started: Fri, 23 Jul 2021 10:24:42 +1000
           Last State: Terminated
           Reason: Error
           Exit Code: 1
           Started: Fri, 23 Jul 2021 09:38:37 +1000
           Finished: Fri, 23 Jul 2021 10:24:40 +1000
           Ready: True
           Restart Count: 2
           Limits:
           cpu: 3
           memory: 22Gi
           Requests:
           cpu: 2
           memory: 20Gi
           Liveness: http-get [http://:http/login] delay=0s timeout=5s period=20s #success=1 #failure=100
           Readiness: http-get [http://:http/login] delay=0s timeout=5s period=10s #success=1 #failure=100
           Startup: http-get [http://:http/login] delay=0s timeout=5s period=100s #success=1 #failure=5
           Environment:
           POD_NAME: jenkins-blue-0 (v1:metadata.name)
           JAVA_OPTS:
           JENKINS_OPTS:
           JENKINS_SLAVE_AGENT_PORT: 50000
           JAVA_OPTS: XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
           CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
           Mounts:
           /var/jenkins_config from jenkins-config (ro)
           /var/jenkins_home from jenkins-home (rw)
           /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
           Conditions:
           Type Status
           Initialized True
           Ready True
           ContainersReady True
           PodScheduled True
           Volumes:
           plugins:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-config:
           Type: ConfigMap (a volume populated by a ConfigMap)
           Name: jenkins-blue
           Optional: false
           jenkins-home:
           Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
           ClaimName: jenkins-blue
           ReadOnly: false
           sc-config-volume:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-controller-sa-token-67wfs:
           Type: Secret (a volume populated by a Secret)
           SecretName: jenkins-controller-sa-token-67wfs
           Optional: false
           QoS Class: Burstable
           Node-Selectors: <none>
           Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
           node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
           Events:
           Type Reason Age From Message
           ---- ------ ---- ---- -------
           Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
           Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused

           

          There isnt much in *Jenkins core logs*, except the error:


           webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
           2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
           Running from: /usr/share/jenkins/jenkins.war
           2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
           exitCode: 0
           Scanning success.
           exitCode: 1
           exitCode: 1
           exitCode: 1
           exitCode: 1
           Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
           Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818–hmjsm
            
           In the last week, we are nightly restating jenkins controller but it hasnt helped.
            
            
          Since last 4 -5 months, we have been experiencing Jenkins Controller outages(across multiple versions). The kubernetes pods fails with exitcode 1 and self heals:

          +O/p from kube describe:+

          Containers:
           jenkins:
           Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
           Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
           Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
           Ports: 8080/TCP, 50000/TCP
           Host Ports: 0/TCP, 0/TCP
           Args:
           --httpPort=8080
           State: Running
           Started: Fri, 23 Jul 2021 10:24:42 +1000
           Last State: Terminated
           Reason: Error
           Exit Code: 1
           Started: Fri, 23 Jul 2021 09:38:37 +1000
           Finished: Fri, 23 Jul 2021 10:24:40 +1000
           Ready: True
           Restart Count: 2
           Limits:
           cpu: 3
           memory: 22Gi
           Requests:
           cpu: 2
           memory: 20Gi
           Liveness: http-get [http://:http/login] delay=0s timeout=5s period=20s #success=1 #failure=100
           Readiness: http-get [http://:http/login] delay=0s timeout=5s period=10s #success=1 #failure=100
           Startup: http-get [http://:http/login] delay=0s timeout=5s period=100s #success=1 #failure=5
           Environment:
           POD_NAME: jenkins-blue-0 (v1:metadata.name)
           JAVA_OPTS:
           JENKINS_OPTS:
           JENKINS_SLAVE_AGENT_PORT: 50000
           JAVA_OPTS: XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
           CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
           Mounts:
           /var/jenkins_config from jenkins-config (ro)
           /var/jenkins_home from jenkins-home (rw)
           /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
           Conditions:
           Type Status
           Initialized True
           Ready True
           ContainersReady True
           PodScheduled True
           Volumes:
           plugins:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-config:
           Type: ConfigMap (a volume populated by a ConfigMap)
           Name: jenkins-blue
           Optional: false
           jenkins-home:
           Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
           ClaimName: jenkins-blue
           ReadOnly: false
           sc-config-volume:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-controller-sa-token-67wfs:
           Type: Secret (a volume populated by a Secret)
           SecretName: jenkins-controller-sa-token-67wfs
           Optional: false
           QoS Class: Burstable
           Node-Selectors: <none>
           Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
           node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
           Events:
           Type Reason Age From Message
           ---- ------ ---- ---- -------
           Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
           Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused

           

          There isnt much in *Jenkins core logs*, except the error:

          webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
           2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
           Running from: /usr/share/jenkins/jenkins.war
           2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
           exitCode: 0
           Scanning success.
           exitCode: 1
           exitCode: 1
           exitCode: 1
           exitCode: 1
           Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
           Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818–hmjsm
            
           From last one week, we are nightly restating jenkins controller but it hasnt helped.
            
            
          nikhilp Nikhil made changes -
          Description Since last 4 -5 months, we have been experiencing Jenkins Controller outages(across multiple versions). The kubernetes pods fails with exitcode 1 and self heals:

          +O/p from kube describe:+

          Containers:
           jenkins:
           Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
           Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
           Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
           Ports: 8080/TCP, 50000/TCP
           Host Ports: 0/TCP, 0/TCP
           Args:
           --httpPort=8080
           State: Running
           Started: Fri, 23 Jul 2021 10:24:42 +1000
           Last State: Terminated
           Reason: Error
           Exit Code: 1
           Started: Fri, 23 Jul 2021 09:38:37 +1000
           Finished: Fri, 23 Jul 2021 10:24:40 +1000
           Ready: True
           Restart Count: 2
           Limits:
           cpu: 3
           memory: 22Gi
           Requests:
           cpu: 2
           memory: 20Gi
           Liveness: http-get [http://:http/login] delay=0s timeout=5s period=20s #success=1 #failure=100
           Readiness: http-get [http://:http/login] delay=0s timeout=5s period=10s #success=1 #failure=100
           Startup: http-get [http://:http/login] delay=0s timeout=5s period=100s #success=1 #failure=5
           Environment:
           POD_NAME: jenkins-blue-0 (v1:metadata.name)
           JAVA_OPTS:
           JENKINS_OPTS:
           JENKINS_SLAVE_AGENT_PORT: 50000
           JAVA_OPTS: XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
           CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
           Mounts:
           /var/jenkins_config from jenkins-config (ro)
           /var/jenkins_home from jenkins-home (rw)
           /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
           Conditions:
           Type Status
           Initialized True
           Ready True
           ContainersReady True
           PodScheduled True
           Volumes:
           plugins:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-config:
           Type: ConfigMap (a volume populated by a ConfigMap)
           Name: jenkins-blue
           Optional: false
           jenkins-home:
           Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
           ClaimName: jenkins-blue
           ReadOnly: false
           sc-config-volume:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-controller-sa-token-67wfs:
           Type: Secret (a volume populated by a Secret)
           SecretName: jenkins-controller-sa-token-67wfs
           Optional: false
           QoS Class: Burstable
           Node-Selectors: <none>
           Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
           node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
           Events:
           Type Reason Age From Message
           ---- ------ ---- ---- -------
           Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
           Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused

           

          There isnt much in *Jenkins core logs*, except the error:

          webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
           2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
           Running from: /usr/share/jenkins/jenkins.war
           2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
           exitCode: 0
           Scanning success.
           exitCode: 1
           exitCode: 1
           exitCode: 1
           exitCode: 1
           Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
           Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818–hmjsm
            
           For the past week, we have been restarting the Jenkins controller nightly, but it hasn't helped.
            
            
          nikhilp Nikhil made changes -
          Description For the last 4-5 months, we have been experiencing Jenkins controller outages (across multiple versions). The Kubernetes pod fails with exit code 1 and self-heals:

          +Output from kubectl describe:+
          {quote}Containers:
           jenkins:
           Container ID: docker://f494b5939813d0582d7901894a16740178d6c905ff7f87db0bdbbacb63a64367
           Image: xxxxxxxxxx.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller:2.277.4-lts-alpine
           Image ID: docker-pullable://xxxxxxxx5.dkr.ecr.ap-southeast-2.amazonaws.com/jenkins-controller@sha256:56687bb853764312fe1f28d3b8c161738022f4e80da1b9b81ba82ba929426d1d
           Ports: 8080/TCP, 50000/TCP
           Host Ports: 0/TCP, 0/TCP
           Args:
           --httpPort=8080
           State: Running
           Started: Fri, 23 Jul 2021 10:24:42 +1000
           Last State: Terminated
           Reason: Error
           Exit Code: 1
           Started: Fri, 23 Jul 2021 09:38:37 +1000
           Finished: Fri, 23 Jul 2021 10:24:40 +1000
           Ready: True
           Restart Count: 2
           Limits:
           cpu: 3
           memory: 22Gi
           Requests:
           cpu: 2
           memory: 20Gi
           Liveness: http-get [http://:http/login] delay=0s timeout=5s period=20s #success=1 #failure=100
           Readiness: http-get [http://:http/login] delay=0s timeout=5s period=10s #success=1 #failure=100
           Startup: http-get [http://:http/login] delay=0s timeout=5s period=100s #success=1 #failure=5
           Environment:
           POD_NAME: jenkins-blue-0 (v1:metadata.name)
           JAVA_OPTS:
           JENKINS_OPTS:
           JENKINS_SLAVE_AGENT_PORT: 50000
           JAVA_OPTS: XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 -Xloggc:/var/jenkins_home/log/gc%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+AlwaysPreTouch -Duser.timezone=Australia/Sydney
           CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
           Mounts:
           /var/jenkins_config from jenkins-config (ro)
           /var/jenkins_home from jenkins-home (rw)
           /var/run/secrets/kubernetes.io/serviceaccount from jenkins-controller-sa-token-67wfs (ro)
           Conditions:
           Type Status
           Initialized True
           Ready True
           ContainersReady True
           PodScheduled True
           Volumes:
           plugins:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-config:
           Type: ConfigMap (a volume populated by a ConfigMap)
           Name: jenkins-blue
           Optional: false
           jenkins-home:
           Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
           ClaimName: jenkins-blue
           ReadOnly: false
           sc-config-volume:
           Type: EmptyDir (a temporary directory that shares a pod's lifetime)
           Medium:
           SizeLimit: <unset>
           jenkins-controller-sa-token-67wfs:
           Type: Secret (a volume populated by a Secret)
           SecretName: jenkins-controller-sa-token-67wfs
           Optional: false
           QoS Class: Burstable
           Node-Selectors: <none>
           Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
           node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
           Events:
           Type Reason Age From Message
           ---- ------ ---- ---- -------
           Warning Unhealthy 45m (x9 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Liveness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
           Warning Unhealthy 45m (x19 over 91m) kubelet, ip-10-90-111-59.ap-southeast-2.compute.internal Readiness probe failed: Get "http://10.90.110.107:8080/login": dial tcp 10.90.110.107:8080: connect: connection refused
          {quote}
           

          There isn't much in the *Jenkins core logs*, except the error:
          {quote}webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
           2021-07-22 23:38:54.897+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
           Running from: /usr/share/jenkins/jenkins.war
           2021-07-22 23:38:54.709+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @916ms to org.eclipse.jetty.util.log.JavaUtilLog
           exitCode: 0
           Scanning success.
           exitCode: 1
           exitCode: 1
           exitCode: 1
           exitCode: 1
           Terminated Kubernetes instance for agent jenkins/test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
           Disconnected computer test-automation-virtual-device-virtual-device-e2e-390818--hmjsm
            
           For the past week, we have been restarting the Jenkins controller nightly, but it hasn't helped.
            
            
          {quote}
          nikhilp Nikhil made changes -
          Attachment jenkins_plugins.txt [ 55260 ]

            People

            Assignee:
            Unassigned Unassigned
            Reporter:
            nikhilp Nikhil

              Dates

              Created:
              Updated: