Jenkins / JENKINS-57086

Stuck, hanging, unkillable jobs in Jenkins


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core, slack-plugin
    • Labels: None

      I have had a job in the queue for 21 days now. According to the log, the build failed:

      Mar 26, 2019 7:38:04 PM hudson.model.Run execute
      INFO: 0 Update R package docs #11580 main build action completed: FAILURE
      Mar 26, 2019 7:38:04 PM jenkins.plugins.slack.SlackNotifier perform
      INFO: Performing complete notifications
      

      But the job is still shown in the queue as running.

      I've tried everything I could find at https://stackoverflow.com/questions/14456592/how-to-stop-an-unstoppable-zombie-job-on-jenkins-without-restarting-the-server, but it's still there.

      I will restart the server next Monday, but I'm opening this issue in the hope that something can be done.

      I couldn't find any other lines related to this job in the logs. The script itself terminated with a non-zero exit code (so it did exit; it was not left hanging).

      I've attached a gist with the /threadDump output. Searching for the job's name gives:

      Executor #10 for master : executing 0 Update R package docs #11580
              "Executor #10 for master : executing 0 Update R package docs #11580" Id=1961561 Group=main RUNNABLE (in native)
              	at java.net.SocketInputStream.socketRead0(Native Method)
              	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              	at java.net.SocketInputStream.read(SocketInputStream.java:171)
              	at java.net.SocketInputStream.read(SocketInputStream.java:141)
              	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
              	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
              	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
              	-  locked java.lang.Object@2644728a
              	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
              	-  locked java.lang.Object@18cea778
              	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
              	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
              	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:275)
              	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:254)
              	at org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:118)
              	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:314)
              	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:363)
              	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:219)
              	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
              	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)
              	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
              	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
              	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
              	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
              	at jenkins.plugins.slack.StandardSlackService.publish(StandardSlackService.java:163)
              	at jenkins.plugins.slack.StandardSlackService.publish(StandardSlackService.java:104)
              	at jenkins.plugins.slack.ActiveNotifier.completed(ActiveNotifier.java:150)
              	at jenkins.plugins.slack.SlackNotifier.perform(SlackNotifier.java:444)
              	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
              	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
              	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
              	at hudson.model.Build$BuildExecution.cleanUp(Build.java:196)
              	at hudson.model.Run.execute(Run.java:1863)
              	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
              	at hudson.model.ResourceController.execute(ResourceController.java:97)
              	at hudson.model.Executor.run(Executor.java:429)

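      From this trace, the Slack notification plugin is the prime suspect: the executor is running the post-build SlackNotifier.perform step and is stuck in a socket read (socketRead0) during the TLS handshake started by StandardSlackService.publish, apparently with no read timeout, since it has been waiting for weeks.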
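The top frames of the trace are the key detail: the executor thread is blocked in socketRead0 while the TLS handshake waits for the server's reply, and a java.net.Socket whose SO_TIMEOUT is 0 (the default) waits for that reply indefinitely. Below is a minimal, JDK-only sketch of that general mechanism, not the Slack plugin's code or its eventual fix. The class ReadTimeoutDemo and its method are hypothetical, and a local server that completes the TCP connection but never answers stands in for the stalled peer. With setSoTimeout set, the read gives up with a SocketTimeoutException instead of holding the thread forever. The same timeout also bounds the handshake reads of an SSLSocket, and Apache HttpClient 4.x exposes it as the socket timeout in its RequestConfig.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: shows how SO_TIMEOUT bounds a read from a silent peer.
public class ReadTimeoutDemo {

    /**
     * Connects to a local server that never sends anything and reads with the
     * given SO_TIMEOUT. Returns roughly how long the read blocked, in ms.
     */
    public static long timeBlockedReadMillis(int soTimeoutMillis) throws IOException {
        if (soTimeoutMillis <= 0) {
            // 0 means "wait forever", which is exactly the stuck-executor case.
            throw new IllegalArgumentException("a positive timeout is required");
        }
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The kernel completes the TCP handshake from the listen backlog even
        // though accept() is never called, and nothing is ever written back.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), soTimeoutMillis);
            client.setSoTimeout(soTimeoutMillis); // bounds every blocking read() on this socket
            InputStream in = client.getInputStream();
            long start = System.nanoTime();
            try {
                in.read(); // blocks until data, EOF, or the timeout fires
                throw new IllegalStateException("expected the silent peer to time out");
            } catch (SocketTimeoutException expected) {
                return (System.nanoTime() - start) / 1_000_000;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read gave up after ~" + timeBlockedReadMillis(500) + " ms");
    }
}
```

Passing 0 is rejected on purpose, because 0 is the "wait forever" setting that leaves a thread like Executor #10 stuck in socketRead0.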

      Running netstat on the machine shows a lingering connection to 99.84.75.163:443. After killing it with the following command:

      ss -K dst 99.84.75.163 dport = 443

      the job (and the associated thread in the thread dump) immediately disappeared.
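This fits how blocking socket reads behave in Java: a platform thread blocked in a plain socket read does not react to Thread.interrupt(), but it does return with an exception once the socket underneath it goes away. The sketch below, again JDK-only and using a hypothetical UnblockDemo class, reproduces that on a local socket. Closing the socket from another thread is used as the closest in-JVM analogue of ss -K, which destroys the socket from outside the process; it is an assumption that the blocked reader sees a comparable error in both cases.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: interrupt vs. close on a thread blocked in a socket read.
public class UnblockDemo {

    /** What happened to the blocked reader after each attempt to release it. */
    public static final class Outcome {
        public final boolean aliveAfterInterrupt;
        public final boolean aliveAfterClose;
        public final Throwable readerError;

        Outcome(boolean aliveAfterInterrupt, boolean aliveAfterClose, Throwable readerError) {
            this.aliveAfterInterrupt = aliveAfterInterrupt;
            this.aliveAfterClose = aliveAfterClose;
            this.readerError = readerError;
        }
    }

    public static Outcome run() throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A peer that completes the TCP handshake (via the listen backlog) but never sends data.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback)) {
            Socket client = new Socket();
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()));
            // No setSoTimeout call: SO_TIMEOUT stays 0, so the read can wait forever.

            AtomicReference<Throwable> readerError = new AtomicReference<>();
            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read();
                } catch (IOException e) {
                    readerError.set(e); // recorded once the read is finally released
                }
            }, "reader blocked in socket read");
            reader.setDaemon(true); // never keep the JVM alive if something goes wrong
            reader.start();
            Thread.sleep(300); // let the reader reach the blocking read

            // Attempt 1: interrupt, the standard way to ask a Java thread to stop.
            reader.interrupt();
            reader.join(500);
            boolean aliveAfterInterrupt = reader.isAlive();

            // Attempt 2: take the socket away from under the read.
            client.close();
            reader.join(2000);
            return new Outcome(aliveAfterInterrupt, reader.isAlive(), readerError.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Outcome o = run();
        System.out.println("still blocked after interrupt: " + o.aliveAfterInterrupt);
        System.out.println("still blocked after close: " + o.aliveAfterClose);
        System.out.println("reader saw: " + o.readerError);
    }
}
```

The behaviour shown applies to ordinary platform threads (Jenkins executors are Thread subclasses); recent JDKs do make socket I/O on virtual threads interruptible, which this sketch does not exercise.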

            Assignee: Unassigned
            Reporter: Attila Nagy
            Votes: 0
            Watchers: 2
