Jenkins / JENKINS-64178

Agent disconnects during a build due to JVM crash


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting, ws-cleanup-plugin
    • Labels:
      None
    • Environment:

      Description

      Jenkins brings all 10 nodes online. A build is started on one of the nodes. The JVM crashes, disconnecting the node, and failing the build.

      Steps to reproduce:

      1. Jenkins brings each node online (see file nwb-sol11-test1_connection_log.txt)
      2. Directory /x1/jenkins/agent_directory/remoting/jarCache is populated with multiple directories containing jar files (see file jarCache_directory_before_build.txt)
      3. Start a build that runs on a node (nwb-sol11-test1 in this case) (see files job_config.xml and nwb-sol11-test1_config.xml)
      4. Very early in the build execution, the JVM running the remoting.jar file crashes, causing the build to fail (see file build_log.txt)

      Once the build starts on the node, directory /x1/jenkins/agent_directory/remoting/jarCache is populated with even more directories containing jar files - all jar files have their modification timestamps updated (see file jarCache_directory_after_build.txt). The assumption is that existing jar files in the cache are rewritten during the build.
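
The jar cache layout seen here (DD/891A07….jar) suggests remoting names cache entries after a checksum, with the first two hex digits used as a subdirectory. A minimal Python sketch of that mapping, inferred from the observed paths rather than taken from the actual remoting implementation (the helper name `cache_path` and the choice of MD5 are illustrative assumptions):

```python
import hashlib
from pathlib import Path

def cache_path(root: Path, jar_bytes: bytes) -> Path:
    """Hypothetical checksum-to-path mapping for a jar cache.

    The first two hex digits of the digest become a subdirectory and
    the remainder the file name, matching the DD/891A07....jar layout
    seen in the truss output.
    """
    digest = hashlib.md5(jar_bytes).hexdigest().upper()
    return root / digest[:2] / f"{digest[2:]}.jar"

p = cache_path(Path("/x1/jenkins/agent_directory/remoting/jarCache"), b"example")
print(p.parent.name, p.name)
```

Under this layout, two jars whose digests share a two-digit prefix end up in the same subdirectory, which keeps individual directories small.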

      The JVM crash generates a core file. The core file indicates the JVM crashes due to signal 10 (SIGBUS) (see file jvm_core_file_where.txt).

      The following truss command was run against the PID of the JVM before starting the build: truss -a -d -D -E -f -o /x1/truss.out -p <jvm PID>
      The truss output shows the JVM incurs a fault when trying to read a file in /x1/jenkins/agent_directory/remoting/jarCache (see file truss.txt).
      Note these lines in the truss output:

      1818/220: 98.175569 0.000047 0.000017 stat("/x1/jenkins/agent_directory/remoting/jarCache/DD/891A07A8C64C7162B75516CD586859.jar", 0xFFFF80FF9F9CCAB0) = 0
      1818/220: 98.175657 0.000088 0.000016 lseek(27, 0x0016D598, SEEK_SET) = 1496472
      1818/220: 98.175703 0.000046 0.000021 read(27, " P K01021403\n\0\0\b\b\0".., 160) = 160
      1818/220: 98.175766 0.000063 0.000015 lseek(27, 0x0001918B, SEEK_SET) = 102795
      1818/220: 98.175808 0.000042 0.000018 read(27, " P K0304\n\0\0\b\b\0 o a".., 30) = 30
      1818/220: 98.175857 0.000049 0.000015 lseek(27, 0x000191D4, SEEK_SET) = 102868
      1818/220: 98.175914 0.000057 0.000017 read(27, "8D92 M OC2 @1086DF85 BA1".., 353) = 353
      1818/220: 98.176729 0.000815 0.000832 Incurred fault #5, FLTACCESS %pc = 0xFFFF80FFBD63F8C0
      1818/220: siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD63F8C0 errno=151(ESTALE)
      1818/220: 98.177078 0.000349 0.001182 Received signal #10, SIGBUS [caught]
      1818/220: siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD63F8C0 errno=151(ESTALE)
      

      It appears that rewriting the jar files in the jarCache directory while the JVM is running triggers a stale file handle error (ESTALE) in the JVM, ultimately causing the JVM to crash.
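
If this hypothesis is correct, the dangerous pattern is rewriting a cached jar in place while a reader still holds it open (or memory-mapped). The following generic POSIX sketch, not remoting code, illustrates the safe alternative: a reader that opened the original file keeps the old inode alive across an atomic `os.replace`, so the bytes never change underneath it, whereas truncating and rewriting the same path in place would.

```python
import os, tempfile

# Writer creates a "jar", a reader opens it, then the writer replaces it.
d = tempfile.mkdtemp()
jar = os.path.join(d, "cached.jar")
with open(jar, "wb") as f:
    f.write(b"OLD-CONTENT")

fd = os.open(jar, os.O_RDONLY)        # reader now holds the original inode

tmp = os.path.join(d, "cached.jar.tmp")
with open(tmp, "wb") as f:
    f.write(b"NEW-CONTENT")
os.replace(tmp, jar)                  # atomic rename: old inode survives for fd

old = os.read(fd, 32)                 # reader still sees the original bytes
os.close(fd)
with open(jar, "rb") as f:
    new = f.read()
print(old, new)
```

This only demonstrates local POSIX semantics; whether the Solaris crash involves in-place rewrites, memory mapping, or something else entirely is exactly what this issue is trying to establish.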

      The JVM is run with the system property "-Dsun.zip.disableMemoryMapping=true" in an attempt to avoid the crash. It does not help.

      This example illustrates how the JVM crashes on one of the 10 Solaris Intel nodes. It happens on all 10 nodes (both Solaris 10 Intel and Solaris 11 Intel).

      Note that while this crash happens often, occasionally a build does not crash and runs to completion. This might suggest a timing issue.

        Attachments

        1. build_log.txt
          3 kB
        2. jarCache_directory_after_build.txt
          3 kB
        3. jarCache_directory_before_build.txt
          2 kB
        4. job_config.xml
          10 kB
        5. jvm_core_file_where.txt
          5 kB
        6. nwb-sol11-test1_config.xml
          0.9 kB
        7. nwb-sol11-test1_connection_log.txt
          7 kB
        8. truss.txt
          4.38 MB

          Activity

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          This seems not related to the SSH Build Agents or SSH Agents plugins; it is a JDK or remoting issue. The trace points to some kind of error accessing the jar cache. Is your "/x1/jenkins/agent_directory/" folder on a network drive (NFS/Samba/...)? Also, has this configuration been working before, or is it a new configuration?

          1818/220:	98.174970     0.042947     0.042811    lwp_cond_wait(0x011C2448, 0x011C2430, 0xFFFF80FF9F9CCC40, 0) = 0
          1818/220:	98.175151     0.000181     0.000015    lwp_setname(220, "pool-1-thread-187 for channel i") = 0
          1818/220:	98.175266     0.000115     0.000022    stat("/x1/jenkins/agent_directory/remoting/jarCache/DD/891A07A8C64C7162B75516CD586859.jar", 0xFFFF80FF9F9CCBD0) = 0
          1818/220:	98.175341     0.000075     0.000021    open("/x1/jenkins/agent_directory/remoting/jarCache/DD/891A07A8C64C7162B75516CD586859.jar", O_RDONLY) = 43
          1818/220:	98.175395     0.000054     0.000016    fstat(43, 0xFFFF80FF9F9CCB50)			= 0
          1818/220:	98.175437     0.000042     0.000014    uucopy(0xFFFF80FF9F9CCBB0, 0xFFFF80FF9F9CCB50, 32) = 0
          1818/220:	98.175481     0.000044     0.000021    futimens(43, 0xFFFF80FF9F9CCB70)		= 0
          1818/220:	98.175522     0.000041     0.000015    close(43)					= 0
          1818/220:	98.175569     0.000047     0.000017    stat("/x1/jenkins/agent_directory/remoting/jarCache/DD/891A07A8C64C7162B75516CD586859.jar", 0xFFFF80FF9F9CCAB0) = 0
          1818/220:	98.175657     0.000088     0.000016    lseek(27, 0x0016D598, SEEK_SET)			= 1496472
          1818/220:	98.175703     0.000046     0.000021    read(27, " P K01021403\n\0\0\b\b\0".., 160)	= 160
          1818/220:	98.175766     0.000063     0.000015    lseek(27, 0x0001918B, SEEK_SET)			= 102795
          1818/220:	98.175808     0.000042     0.000018    read(27, " P K0304\n\0\0\b\b\0 o a".., 30)	= 30
          1818/220:	98.175857     0.000049     0.000015    lseek(27, 0x000191D4, SEEK_SET)			= 102868
          1818/220:	98.175914     0.000057     0.000017    read(27, "8D92 M OC2 @1086DF85 BA1".., 353)	= 353
          1818/220:	98.176729     0.000815     0.000832        Incurred fault #5, FLTACCESS  %pc = 0xFFFF80FFBD63F8C0
          1818/220:	      siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD63F8C0 errno=151(ESTALE)
          1818/220:	98.177078     0.000349     0.001182        Received signal #10, SIGBUS [caught]
          1818/220:	      siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD63F8C0 errno=151(ESTALE)
          1818/220:	98.177126     0.000048     0.000014    lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0xFFFFFFF7, 0x000000FF, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
          1818/220:	98.177170     0.000044     0.000015    sigaction(SIGBUS, 0x00000000, 0xFFFF80FF9F9DD180) = 0
          1818/220:	98.177210     0.000040     0.000014    lwp_sigmask(SIG_SETMASK, 0xFFBFFCFF, 0xFFFFFFF7, 0x000000FF, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
          1818/220:	98.177319     0.000109     0.000029    getcwd("/x1/jenkins", 2000)			= 0
          1818/220:	98.177372     0.000053     0.000016    getrlimit(RLIMIT_CORE, 0xFFFF80FF9F9DCE60)	= 0
          1818/220:	98.177414     0.000042     0.000014    sigaction(SIGSEGV, 0x00000000, 0xFFFF80FF9F9DCE40) = 0
          1818/220:	98.177453     0.000039     0.000014    sigaction(SIGBUS, 0x00000000, 0xFFFF80FF9F9DCE60) = 0
          1818/220:	98.177492     0.000039     0.000014    sigaction(SIGSEGV, 0xFFFF80FF9F9DCD70, 0xFFFF80FF9F9DCE00) = 0
          1818/220:	98.177530     0.000038     0.000014    sigaction(SIGBUS, 0xFFFF80FF9F9DCD70, 0xFFFF80FF9F9DCE00) = 0
          1818/220:	98.177660     0.000130     0.000032    write(1, " #\n", 2)				= 2
          1818/220:	98.177743     0.000083     0.000051    write(1, " #   A   f a t a l   e r".., 67)	= 67
          1818/220:	98.177811     0.000068     0.000034    write(1, " #\n", 2)				= 2
          1818/220:	98.177875     0.000064     0.000032    write(1, " #    ", 3)				= 3
          1818/220:	98.177949     0.000074     0.000030    sysconfig(_CONFIG_SIGRT_MIN)			= 41
          1818/220:	98.177996     0.000047     0.000017    write(1, " S I G B U S", 6)			= 6
          1818/220:	98.178066     0.000070     0.000019    write(1, "   ( 0 x a )", 6)			= 6
          1818/220:	98.178147     0.000081     0.000053    write(1, "   a t   p c = 0 x f f f".., 25)	= 25
          1818/220:	98.178211     0.000064     0.000032    write(1, " ,   p i d = 1 8 1 8", 10)		= 10
          1818/220:	98.178274     0.000063     0.000031    write(1, " ,   t i d = 0 x 0 0 0 0".., 24)	= 24
          1818/220:	98.178423     0.000149     0.000119    write(1, "\n", 1)				= 1
          1818/220:	98.178584     0.000161     0.000022    write(1, " #\n", 2)				= 2
          1818/220:	98.178639     0.000055     0.000018    write(1, " #   J R E   v e r s i o".., 43)	= 43
          1818/220:	98.178688     0.000049     0.000018    write(1, " ( Z u l u   8 . 4 8 . 0".., 28)	= 28
          1818/220:	98.178735     0.000047     0.000016    write(1, " ( 8 . 0 _ 2 6 5 - b 1 1".., 36)	= 36
          1818/220:	98.178788     0.000053     0.000018    write(1, " #   J a v a   V M :   O".., 90)	= 90
          1818/220:	98.178838     0.000050     0.000018    write(1, " #   P r o b l e m a t i".., 21)	= 21
          1818/220:	98.178903     0.000065     0.000018    write(1, " #  ", 2)				= 2
          1818/220:	98.179274     0.000371     0.000389        Incurred fault #5, FLTACCESS  %pc = 0xFFFF80FFBF5BA192
          1818/220:	      siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD635644 errno=151(ESTALE)
          1818/220:	98.179308     0.000034     0.000423        Received signal #10, SIGBUS [caught]
          1818/220:	      siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD635644 errno=151(ESTALE)
          1818/220:	98.179668     0.000360     0.000329        Incurred fault #5, FLTACCESS  %pc = 0xFFFF80FFBF5BA192
          1818/220:	      siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD635644 errno=151(ESTALE)
          1818/220:	98.179698     0.000030     0.000359        Received signal #10, SIGBUS [default]
          1818/220:	      siginfo: SIGBUS BUS_OBJERR addr=0xFFFF80FFBD635644 errno=151(ESTALE)
          1818/220:	98.179698     0.000000     0.000359    setcontext(0xFFFF80FF9F9DB170)
          
          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          Did you try disabling the cleanWs step at the end of the job?

          [WS-CLEANUP] Deleting project workspace...
          [WS-CLEANUP] Deferred wipeout is used...
          FATAL: java.io.IOException: Unexpected termination of the channel
          java.io.EOFException
          	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2759)
          	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3234)
          	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:913)
          	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:375)
          	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
          	at hudson.remoting.Command.readFrom(Command.java:142)
          	at hudson.remoting.Command.readFrom(Command.java:128)
          	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
          	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
          Caused: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
          Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to nwb-sol11-test1
          		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1800)
          		at hudson.remoting.Request.call(Request.java:198)
          		at hudson.remoting.Channel.call(Channel.java:1000)
          		at hudson.FilePath.act(FilePath.java:1070)
          		at hudson.FilePath.act(FilePath.java:1059)
          		at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:927)
          		at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:857)
          		at hudson.scm.SCM.checkout(SCM.java:505)
          		at hudson.model.AbstractProject.checkout(AbstractProject.java:1206)
          		at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
          		at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
          		at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
          		at hudson.model.Run.execute(Run.java:1894)
          		at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          		at hudson.model.ResourceController.execute(ResourceController.java:97)
          		at hudson.model.Executor.run(Executor.java:428)
          Caused: hudson.remoting.RequestAbortedException
          	at hudson.remoting.Request.abort(Request.java:344)
          	at hudson.remoting.Channel.terminate(Channel.java:1085)
          	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:94)
          [Checks API] No suitable checks publisher found.
          ERROR: Step 'Delete workspace when build is done' failed: no workspace for VAPP_INTERNAL_ORBIX3_TEST #1
          Finished: FAILURE
          
          jdavey John Davey added a comment -

          Directory /x1/jenkins/agent_directory is a local directory on the Solaris intel VM.

          We have a similar setup using a very old version of Jenkins (1.609.1) where we do not see this issue.  We are trying to migrate to Jenkins 2.249.2, and we are now seeing this issue.

          A test where the build does not delete the workspace at the end of the job still results in the same JVM crash.

          Also, removed the "Delete workspace before build starts" from the job.  A test build still results in the same JVM crash.

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          Do you have those agents connected to the old system too? If so, use a different user, because sharing will cause issues; the remoting process is not designed to share the same home directory and files with several processes.
          This is a big update, spanning more than 5 years of Jenkins core and plugins. I suggest you open a thread on the Jenkins users Google group and ask for help there (https://groups.google.com/g/jenkinsci-users).

          • I recommend you start simple: create a new job with a basic shell command, and check whether the agents crash with this job.
          • Check that you have updated all plugins.
          • In 1.609 there was a feature named pinned plugins; check that you do not have *.pinned files in JENKINS_HOME/plugins.
          jdavey John Davey added a comment -

          The old Jenkins system and agents are separate from the new Jenkins system and agents. They do not share /x1/jenkins/agent_directory.

          I will try your suggestions and see how it goes.

          Question:  Is there any way to use remoting without a jarCache?

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          I do not think so; you can change the directory, but that's it: https://github.com/jenkinsci/remoting/blob/master/docs/workDir.md
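
Per the linked workDir docs, the jar cache lives under the agent work directory, so relocating the work directory also relocates the cache. A hedged sketch of such a launch follows; the URL and paths are placeholders, and the availability of a separate -jar-cache option is an assumption, so check `java -jar agent.jar -help` on your remoting version and adapt this to how your agents are actually started (SSH launches pass these arguments differently):

```shell
# Relocate the remoting work directory; remoting/jarCache moves with it.
java -jar agent.jar \
  -workDir /x1/jenkins/agent_workdir \
  -jnlpUrl https://jenkins.example.com/computer/nwb-sol11-test1/slave-agent.jnlp

# Some remoting versions also accept an explicit cache directory:
# java -jar agent.jar -jar-cache /x1/jenkins/jar_cache ...
```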


            People

            Assignee:
            Unassigned Unassigned
            Reporter:
            jdavey John Davey
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated: