Jenkins / JENKINS-57086

Stuck, hanging, unkillable jobs in Jenkins

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core, slack-plugin
    • Labels:
      None

      Description

      I have had a job stuck in the queue for 21 days now. According to the log, the build has failed:

      Mar 26, 2019 7:38:04 PM hudson.model.Run execute
      INFO: 0 Update R package docs #11580 main build action completed: FAILURE
      Mar 26, 2019 7:38:04 PM jenkins.plugins.slack.SlackNotifier perform
      INFO: Performing complete notifications
      

      But the job is still on the queue as running.

      I've tried everything I could find on https://stackoverflow.com/questions/14456592/how-to-stop-an-unstoppable-zombie-job-on-jenkins-without-restarting-the-server, but it's still there.

      I will restart the server next Monday, but I am opening this issue in the hope that something can be done.

      I couldn't find any other lines related to this job in the logs. The script itself terminated with a non-zero exit code, so it exited rather than hanging.

      I've attached a gist with the /threadDump output. Searching for the job's name gives:

      Executor #10 for master : executing 0 Update R package docs #11580
              "Executor #10 for master : executing 0 Update R package docs #11580" Id=1961561 Group=main RUNNABLE (in native)
              	at java.net.SocketInputStream.socketRead0(Native Method)
              	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              	at java.net.SocketInputStream.read(SocketInputStream.java:171)
              	at java.net.SocketInputStream.read(SocketInputStream.java:141)
              	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
              	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
              	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
              	-  locked java.lang.Object@2644728a
              	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
              	-  locked java.lang.Object@18cea778
              	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
              	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
              	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:275)
              	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:254)
              	at org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:118)
              	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:314)
              	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:363)
              	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:219)
              	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
              	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)
              	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
              	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
              	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
              	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
              	at jenkins.plugins.slack.StandardSlackService.publish(StandardSlackService.java:163)
              	at jenkins.plugins.slack.StandardSlackService.publish(StandardSlackService.java:104)
              	at jenkins.plugins.slack.ActiveNotifier.completed(ActiveNotifier.java:150)
              	at jenkins.plugins.slack.SlackNotifier.perform(SlackNotifier.java:444)
              	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
              	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
              	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
              	at hudson.model.Build$BuildExecution.cleanUp(Build.java:196)
              	at hudson.model.Run.execute(Run.java:1863)
              	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
              	at hudson.model.ResourceController.execute(ResourceController.java:97)
              	at hudson.model.Executor.run(Executor.java:429)

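      Judging by this stack trace, the Slack notification plugin is the suspect: the executor thread is blocked in a socket read during the TLS handshake started from StandardSlackService.publish.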
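
      A blocking socket read with no timeout never returns if the peer goes silent, which matches the frame at the top of the dump. Below is a minimal sketch of how a read timeout bounds such a wait. It is an illustration under assumptions, not the plugin's code: it uses plain java.net sockets instead of Apache HttpClient, a local server that never answers stands in for the remote endpoint, and the class and method names are hypothetical. Whether the plugin configured any timeout is not known from this report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of Jenkins or the Slack plugin.
class ReadTimeoutSketch {

    /**
     * Connects to a local server that never sends a byte and attempts one read
     * with SO_TIMEOUT set, so the read gives up instead of blocking forever.
     */
    static String readFromSilentPeer(int soTimeoutMillis) throws IOException {
        // The listen backlog completes the TCP connect even though accept() is never called.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {
            // A value of 0 would mean "wait indefinitely", which is the stuck case.
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                int b = in.read();
                return "read returned " + b;
            } catch (SocketTimeoutException e) {
                return "timed out";
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFromSilentPeer(200));
    }
}
```

      With a 200 ms timeout the call returns after the timeout elapses rather than holding the thread indefinitely.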

      Running netstat on the machine shows a lingering connection to 99.84.75.163:443. After killing it with the following command:

      ss -K dst 99.84.75.163 dport = 443

      the job (and the associated thread in the thread dump) immediately disappeared.
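
      The observation above is consistent with how blocked reads behave: once the connection is torn down, the read fails and the thread can continue. The following sketch shows the same effect inside a single JVM, as an analogy only. It closes the socket from another thread rather than killing it at kernel level as ss -K does, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Jenkins or the Slack plugin.
class CloseUnblocksReadSketch {

    /** Blocks a reader on a silent connection, then closes the socket from the calling thread. */
    static String blockThenClose(long delayMillis) throws Exception {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
            // Reader: no SO_TIMEOUT is set, so this read would otherwise wait indefinitely.
            CompletableFuture<String> reader = CompletableFuture.supplyAsync(() -> {
                try {
                    int b = client.getInputStream().read();
                    return "read returned " + b;
                } catch (SocketException e) {
                    return "unblocked by close";
                } catch (IOException e) {
                    return "other I/O error";
                }
            });
            Thread.sleep(delayMillis);  // give the reader time to block
            client.close();             // tearing down the connection fails the pending read
            return reader.get(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(blockThenClose(200));
    }
}
```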

            People

            Assignee:
            Unassigned
            Reporter:
            bra Attila Nagy