Bounds exceeds available space exception after successful build-step

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component/s: core, remoting
    • Environment: Master on RHEL7.6
      Worker on MacOS Big Sur
      Jenkins 2.249.3
    • Released As: 2.297, 2.287.2

      As previously reported in JENKINS-6423, I have an exception after my build successfully completes. It only started happening this weekend, only on the Mac worker, right after I updated to Big Sur... It doesn't happen on every build; it seems to hit jobs at random. It also doesn't always happen after the whole job has finished: it can happen after X out of Y steps.

      20:53:05.897 FATAL: Bounds exceeds available space : size=1048576, offset=1048577
      20:53:05.898 Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to worker-mobile-01
      20:53:05.898 		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1800)
      20:53:05.898 		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
      20:53:05.898 		at hudson.remoting.Channel.call(Channel.java:1001)
      20:53:05.898 		at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1147)
      20:53:05.898 		at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:510)
      20:53:05.898 		at hudson.model.Run.execute(Run.java:1894)
      20:53:05.898 		at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      20:53:05.898 		at hudson.model.ResourceController.execute(ResourceController.java:97)
      20:53:05.898 		at hudson.model.Executor.run(Executor.java:428)
      20:53:05.898 java.lang.IndexOutOfBoundsException: Bounds exceeds available space : size=1048576, offset=1048577
      20:53:05.898 	at com.sun.jna.Memory.boundsCheck(Memory.java:221)
      20:53:05.898 	at com.sun.jna.Memory.getByte(Memory.java:443)
      20:53:05.898 	at hudson.util.ProcessTree$Darwin$DarwinProcess$1StringArrayMemory.readString(ProcessTree.java:1739)
      20:53:05.898 	at hudson.util.ProcessTree$Darwin$DarwinProcess.parse(ProcessTree.java:1811)
      20:53:05.898 	at hudson.util.ProcessTree$Darwin$DarwinProcess.getEnvironmentVariables(ProcessTree.java:1688)
      20:53:05.898 	at hudson.util.ProcessTree$OSProcess.hasMatchingEnvVars(ProcessTree.java:339)
      20:53:05.898 	at hudson.util.ProcessTree$Unix.killAll(ProcessTree.java:733)
      20:53:05.898 	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1164)
      20:53:05.898 	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1155)
      20:53:05.898 	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
      20:53:05.898 	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
      20:53:05.898 	at hudson.remoting.Request$2.run(Request.java:375)
      20:53:05.898 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:73)
      20:53:05.898 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      20:53:05.898 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      20:53:05.898 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      20:53:05.898 	at java.lang.Thread.run(Thread.java:748)
      

          [JENKINS-64347] Bounds exceeds available space exception after successful build-step

          Tim Jacomb added a comment -

          cc jthompson oleg_nenashev not sure what info you need here.

          Jeff Thompson added a comment -

          I really don't have much to suggest. I'm not terribly familiar with the ProcessTree class. This particular part of the code was last changed 12 years ago.

          My first guess with things like this is always that something changed with the environment, the system, network or something. It could be something changed with Big Sur, but I don't know that we can really be confident with that yet.

          It looks like it could be a memory allocation or consumption issue, but I'm not familiar enough with the ProcessTree code to know. Maybe memory is used differently on Big Sur than before – there are certainly instances of such behavior on significant OS upgrades.

          With the inconsistent nature of this problem, it's going to require good troubleshooting to isolate the problem. I don't have any specific troubleshooting ideas, though. If there's any way to make it reproducible that would help.

          Roland Asmann added a comment -

          I will keep an eye on it and see if I can find a pattern – maybe it's concurrent builds or some other thing I haven't noticed yet...

          Anyway, if anybody has some ideas for troubleshooting, let me know and I will try!

          Raihaan Shouhell added a comment -

          My understanding of the code, from a not-too-deep read, is that a combination of process arguments and environment variables exceed a megabyte in size.

          I'm not sure if this behaviour is normal or if my understanding is correct, but perhaps this will narrow our search a little.
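
To illustrate the kind of failure the stack trace points at: readString walks a fixed-size buffer byte by byte until a bounds check fails. The Java sketch below shows that pattern in isolation. It is not the Jenkins ProcessTree or JNA Memory code; the class BoundedStringReader and its methods are hypothetical, the message format is copied from the log, and reporting the read index plus one as the offset is an assumption made to match the size=1048576, offset=1048577 pair.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Jenkins ProcessTree or JNA Memory implementation.
final class BoundedStringReader {
    private final byte[] buffer; // stands in for the fixed-size native buffer
    private int offset;          // index of the next byte to read

    BoundedStringReader(byte[] buffer) {
        this.buffer = buffer;
    }

    // Bounds-checked single-byte read. The message mirrors the log line;
    // reporting index + 1 as the offset is an assumption of this sketch.
    private byte getByte(int index) {
        if (index < 0 || index >= buffer.length) {
            throw new IndexOutOfBoundsException(
                "Bounds exceeds available space : size=" + buffer.length
                    + ", offset=" + (index + 1));
        }
        return buffer[index];
    }

    // Reads one NUL-terminated string. If no NUL appears before the end of
    // the buffer, the loop runs off the end and getByte throws.
    String readString() {
        int start = offset;
        while (getByte(offset) != 0) {
            offset++;
        }
        String value = new String(buffer, start, offset - start, StandardCharsets.UTF_8);
        offset++; // step over the NUL terminator
        return value;
    }

    public static void main(String[] args) {
        // Two entries, but the second one lacks its terminating NUL.
        byte[] data = "A=1\0B=2".getBytes(StandardCharsets.UTF_8);
        BoundedStringReader reader = new BoundedStringReader(data);
        System.out.println(reader.readString());
        try {
            reader.readString();
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second read fails with an offset one past the buffer size, which has the same shape as the reported message. The sketch does not explain why the data would fill or overrun the buffer only intermittently.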

          Roland Asmann added a comment -

          a combination of process arguments and environment variables exceed a megabyte in size

          Wouldn't that mean that it would be easy to reproduce? I mean, when a build fails with this error, I can just do a rebuild and most of the time it then finishes correctly – with the same arguments and environment. Or am I missing something here?

          Update on the issue itself: I rebooted the machine shortly after my last comment and all was quiet for about 4 days. Now it has started again, unfortunately with the same random behavior... Last night even my smallest job on that machine (runs every night, no parameters, just a simple 'cocoapods update') failed with this error. This is an old-school matrix-build, but I don't think that makes a real difference.

          Any suggestions as to how I might figure out the size of the arguments and environment on the jobs/runs?
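
One rough way to approach that question is to measure the environment from inside a build step. The Java sketch below adds up the bytes needed to store each variable as KEY=value plus a NUL terminator, and does the same for the program's own arguments. The class EnvSizeEstimate and its method names are hypothetical, the KEY=value NUL layout is an assumption about how the data is stored, and the program only sees the JVM it runs in, not the other processes that ProcessTree inspects. The 1048576 constant is simply the size value from the log.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for a rough size estimate; not part of Jenkins.
final class EnvSizeEstimate {
    // Buffer size taken from the "size=1048576" value in the log.
    static final long REPORTED_BUFFER_SIZE = 1_048_576L;

    // Bytes needed to store every variable as "KEY=value" plus a NUL terminator.
    static long environmentBytes(Map<String, String> env) {
        long total = 0;
        for (Map.Entry<String, String> entry : env.entrySet()) {
            total += utf8Length(entry.getKey()) + 1    // '=' separator
                   + utf8Length(entry.getValue()) + 1; // NUL terminator
        }
        return total;
    }

    // Bytes needed to store every argument plus one NUL terminator each.
    static long argumentBytes(List<String> args) {
        long total = 0;
        for (String arg : args) {
            total += utf8Length(arg) + 1;
        }
        return total;
    }

    private static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        long env = environmentBytes(System.getenv());
        long argv = argumentBytes(Arrays.asList(args));
        System.out.println("environment bytes: " + env);
        System.out.println("argument bytes:    " + argv);
        System.out.println("total / reported buffer size: "
            + (env + argv) + " / " + REPORTED_BUFFER_SIZE);
    }
}
```

Run as a build step on the Mac worker, this would show whether the environment a build inherits comes anywhere near the reported buffer size. A total far below it would suggest the cause lies elsewhere.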

          Raihaan Shouhell added a comment -

          AFAICT you can't change KERN_ARGMAX.

          I agree it should be easy to reproduce if that were in fact the case. It could be a more subtle bug.

          Roland Asmann added a comment -

          This has been annoying us more and more, so I went Googling and found another old issue: JENKINS-9634

          I also found this: https://stackoverflow.com/questions/6147815/jenkins-throwing-indexoutofboundsexception-at-end-of-build-on-mac, which suggests disabling the ProcessTreeKiller (which apparently has been renamed since – read that somewhere else, but don't have a link atm). Does anybody know if it would be possible to disable this on only the node running Mac or do I need to do it for the whole instance of Jenkins?

          Since I am not sure if disabling the PTK has any (negative) effects, I don't really want to do it on the whole instance...

          Tim Jacomb added a comment -

          It might be worth trying the latest weekly release; we've recently updated JNA to the latest version.

          Roland Asmann added a comment -

          I'll give that a try next week – Do I just need the core or also some plugins?

          Tim Jacomb added a comment -

          A number of plugins need updating too

          David D added a comment - - edited

          I added this when launching Jenkins; now I only see this issue for tasks running on nodes.
             -Dhudson.util.ProcessTreeKiller.disable=true

          • I could reproduce it with a dummy task, say, a build script that just echoes a string, even when the node is on the same machine as the server.
          • This happens with both Jenkins 1 (lte) and Jenkins 2.

          -David

          Richard Bergoin added a comment -

          The option

          -Dhudson.util.ProcessTreeKiller.disable=true

          should be set on the agent launch command on the node side.

          No errors for a week since adding the option to the macOS "LaunchAgent" command in a shell script:

          java -Dhudson.util.ProcessTreeKiller.disable=true -jar agent.jar -jnlpUrl https://REDACTED/slave-agent.jnlp -secret REDACTED -noCertificateCheck
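
A likely reason the option has to be on the agent's command line: a -D option sets a system property only in the JVM it is passed to, and the lower half of the stack trace (KillTask.call under hudson.remoting.UserRequest.perform) runs on the agent side of the remote call. The Java sketch below only demonstrates that per-JVM behaviour using the standard Boolean.getBoolean call. The class KillerFlagCheck is hypothetical, and how Jenkins itself reads the flag is not shown in this thread.

```java
// Hypothetical check; how Jenkins itself reads the flag is not shown in this thread.
final class KillerFlagCheck {
    // Property name taken from the -D option quoted in the comments.
    static final String DISABLE_PROPERTY = "hudson.util.ProcessTreeKiller.disable";

    // Boolean.getBoolean returns true only if the named system property of
    // the current JVM is set to "true" (case-insensitive).
    static boolean disabledInThisJvm() {
        return Boolean.getBoolean(DISABLE_PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(DISABLE_PROPERTY + " in this JVM: " + disabledInThisJvm());
    }
}
```

Started with java -Dhudson.util.ProcessTreeKiller.disable=true KillerFlagCheck, the check reports true; started without the option, it reports false, regardless of how any other JVM on the same machine was launched.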

          David D added a comment - - edited

          Thanks kenji, it works when launching the agent with that option
          (actually, I need to have it for both: launching the server and each node...)

          Zhao added a comment -

          kenji, how do I add it?

          Richard Bergoin added a comment -

          Simply download agent.jar from https://your-jenkins.url/jnlpJars/agent.jar

          Sagar added a comment - - edited

          kenji
          I tried downloading the agent.jar file from https://localhost:8080/jnlpJars/agent.jar
          1. My mac tells me that this file can't be trusted and doesn't let me run this file. Is there any other way to run this file? I tried running it by double clicking it.
          2. Where exactly do you add this command on the launch agent?
          java -Dhudson.util.ProcessTreeKiller.disable=true -jar agent.jar -jnlpUrl https://REDACTED/slave-agent.jnlp -secret REDACTED -noCertificateCheck
          Is there a certain file I should add this to? What is this REDACTED part of the command? Running it in the terminal throws unknown exception REDACTED.

          We only have the one master, with no other nodes or slave agents.

          Richard Bergoin added a comment -

          You can follow this blog post, which explains how to configure a new node in Jenkins and how to run a macOS launch agent: https://mgrebenets.github.io/mobile%20ci/2015/02/01/jenkins-remote-node

          I prefer running a shell script so that the launchd plist file stays cleaner.

          The REDACTED following -secret is redacted to keep the secret token... secret.

          Sagar added a comment -

          kenji
          Is there a way to solve the memory-bound error on the main server (that has the master) without using/setting up nodes? We would prefer to use this machine for all of our builds and not use any Jenkins slaves.

          Tim Jacomb added a comment -

          This should be fixed in recent weekly versions, so these workarounds should no longer be needed. It will be in LTS next Wednesday.

            Assignee: kenji Richard Bergoin
            Reporter: malice00 Roland Asmann