[JENKINS-42945] Unable to abort a job while post-build notifier is running

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • logstash-plugin
    • None

      If a wrong port number is entered in the global tool configuration, builds get stuck for a long time in the post-build step (if enabled).

      This happened to us a few times both with the syslog and logstash backends.

      The only way we were able to kill the job stuck in the logstash post-build step was by restarting Jenkins.

      Even worse, it seems that other builds (in different projects, but also with logstash configured) wouldn't start because of the one build that was stuck.

      I think the "unable to abort" problem is caused by the underlying IO operations not supporting interrupts.

      • Apache HC supports aborting the connection in 4.x, although I don't know whether it's enabled for default clients created with HttpClientBuilder.create()
      • I'm not sure how java.net.DatagramSocket (used via UdpSyslogMessageSender) behaves with respect to interrupts
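
      As an illustration of the abort question above, here is a minimal sketch using only the JDK. It does not touch the plugin code: the class and method names are hypothetical, and the loopback socket merely stands in for a sender that is blocked on IO. It relies on the documented behaviour of DatagramSocket.close(), which makes a thread blocked in receive() fail with a SocketException, so closing the socket from another thread is one way to abort a call that may not react to Thread.interrupt().

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of logstash-plugin.
class AbortBlockedSocket {

    /**
     * Blocks a worker thread in DatagramSocket.receive(), then aborts it by
     * closing the socket from the calling thread. Returns the exception the
     * worker saw, or null if it did not unblock within waitMillis.
     */
    static Exception abortByClose(long waitMillis) throws Exception {
        // Bound to an ephemeral loopback port; nothing ever sends to it.
        DatagramSocket socket = new DatagramSocket(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        AtomicReference<Exception> seen = new AtomicReference<>();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                started.countDown();
                socket.receive(new DatagramPacket(new byte[16], 16)); // blocks
            } catch (Exception e) {
                seen.set(e); // expected path once the socket is closed
            } finally {
                finished.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();

        started.await();
        Thread.sleep(200);  // give the worker time to enter receive()
        socket.close();     // the "abort": unblocks the worker
        finished.await(waitMillis, TimeUnit.MILLISECONDS);
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        Exception e = abortByClose(5000);
        System.out.println("worker unblocked with: " + e);
    }
}
```

      An abort handler built this way would need a reference to the socket (or HTTP request) in use, which is an assumption about how the plugin could be structured rather than a description of how it works today.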

      I'm confused about the global contention point though. I can't find anything obvious in the plugin code that would explain this, so I expect it's some interaction between the plugin code and how Jenkins handles post-build steps and global tools?

      EDIT: I realized the jobs were just hanging on socket IO independently of each other; there is no global contention point.


          Jakub Bochenski added a comment:

          I know this project is marked as waiting for adoption, but some pointers on where to start looking would be great.

          Jakub Bochenski added a comment:

          I wanted to take a stab at fixing this, but now I can't reproduce the error. I tried:

          • wrong port
          • HTTPS instead of HTTP
          • wrong IP
          • ES and SYSLOG backends
          • wrapper and post-build action

          It seems there might have been some additional factors involved (e.g. load, network config) that are not present on the test instance I was using to reproduce it.
          Leaving this here in case somebody else hits this problem.

          Andrey Smirnov added a comment (edited):

          I have run into the same issue; in my case the slaves are provided using the Mesos plugin.

          UPD: in my case it was a network issue (wrong host), so the plugin opened an HTTP connection that did not terminate.
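
          For anyone hitting the "connection opened and did not terminate" case, a minimal JDK-only sketch of bounding socket IO with timeouts follows. The class name is hypothetical, and the local ServerSocket that never answers is only a stand-in for a misconfigured endpoint; this is not the plugin's code. Apache HttpClient 4.x offers comparable connect and socket timeout settings through RequestConfig, but whether the plugin sets them has not been checked here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of logstash-plugin.
class BoundedSocketIo {

    /**
     * Connects to a local endpoint that never responds and reads with a
     * timeout. Returns true if the read gave up with SocketTimeoutException
     * instead of blocking indefinitely.
     */
    static boolean readTimesOut(int connectTimeoutMillis, int readTimeoutMillis)
            throws IOException {
        // The listening socket completes the TCP handshake via its backlog
        // but never accepts or writes, so a plain read() would hang.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            client.connect(
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort()),
                    connectTimeoutMillis);          // bounds the connect phase
            client.setSoTimeout(readTimeoutMillis); // bounds each blocking read
            try {
                client.getInputStream().read();
                return false; // data or EOF arrived; not expected here
            } catch (SocketTimeoutException e) {
                return true;  // read was abandoned after the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

          With timeouts like these, a build step would fail after a bounded wait rather than requiring a Jenkins restart, assuming the hang is in connect or read as described above.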

            rgerard Rusty Gerard
            jbochenski Jakub Bochenski