Type: Bug
Resolution: Unresolved
Priority: Major
Master:
Jenkins Version : 2.32.2
Running on Windows Server 2012 R2
Pipeline: Nodes and Processes 2.10 (works fine in 2.8)
Client:
Swarm Client 3.3 on AIX 7.1 / JDK 8
I'm trying to run a pipeline job on an agent that uses the Swarm client. The job runs fine, but I'm getting a lot of error messages in the log like the one below:
Cannot contact tst_db2: java.io.IOException: Remote call on Channel to /XX.XX.XX.XXX failed
(actual IP address replaced with XX)
From what I can observe, the master throws these errors while waiting for the script that is running on the client. Again, the pipeline job runs perfectly, except that I'm getting this error in the pipeline logs.
Below is my pipeline script:
pipeline {
    agent none
    stages {
        stage('Recreate DB') {
            agent { label 'tst_db2' }
            steps {
                checkout([$class: 'SubversionSCM', additionalCredentials: [], excludedCommitMessages: '',
                    excludedRegions: '', excludedRevprop: '', excludedUsers: '', filterChangelog: false,
                    ignoreDirPropChanges: false, includedRegions: '',
                    locations: [[credentialsId: 'a84f7197-929a-437e-9aac-ca09fcd4c63a', depthOption: 'infinity',
                        ignoreExternalsOption: true, local: '', remote: 'svn://XXXXX/XXX/tags/CR/Rebuild_VCRDWD01']],
                    workspaceUpdater: [$class: 'CheckoutUpdater']])
                sh 'Rebuild_VCRDWD01/recreate_db.sh'
            }
        }
    }
}
Is there any way we can get rid of these errors?
Is duplicated by: JENKINS-43543 Receiving multiple "Cannot contact <slave>: java.io.IOException: Remote call on <slave> failed" in build output (Resolved)
[JENKINS-42428] Jenkins master throwing java.io.IOException when running pipeline in swarm client
We started seeing the same issue a week or so ago. It doesn't cause the builds to fail, but it certainly makes the console output noisy. I'd also add that it only happens on our Linux slave; Windows does not throw the same error.
This got me checking the plugins, and I noticed this change was made to the SSH Slaves plugin: https://github.com/jenkinsci/ssh-slaves-plugin/pull/41
We tried reverting to the previous version of the plugin but it didn't resolve the issue.
This is probably the same as https://issues.jenkins-ci.org/browse/JENKINS-42405
This is actually quite ugly when trying to sort through the logs. Is there a way to filter and ignore these messages?
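One way to approach the filtering question outside Jenkins itself is to post-process a saved console log. The following is a minimal sketch in Java, assuming the noisy lines always have the shape quoted in this issue (optionally prefixed with a parallel branch name in square brackets). The class name ConsoleNoiseFilter and the regular expression are illustrative only and are not part of Jenkins or any plugin.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ConsoleNoiseFilter {
    // Assumed shape of the noisy line, based only on the samples in this issue:
    // an optional "[branch] " prefix, then "Cannot contact <node>: java.io.IOException:
    // Remote call on <something> failed", with an optional trailing period.
    static final Pattern NOISE = Pattern.compile(
        "^(\\[[^\\]]+\\] )?Cannot contact .+: java\\.io\\.IOException: Remote call on .+ failed\\.?$");

    /** Returns true if the whole line matches the assumed noise pattern. */
    static boolean isNoise(String line) {
        return NOISE.matcher(line).matches();
    }

    /** Returns the input lines with the noisy ones removed, preserving order. */
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(line -> !isNoise(line))
                .collect(Collectors.toList());
    }

    /** Reads a console log from standard input and prints it without the noisy lines. */
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            filter(lines).forEach(System.out::println);
        }
    }
}
```

This only hides the messages after the fact; it does not address whatever the Remoting layer is actually reporting.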
I'm seeing the same thing after recent plugin upgrades. Even seems to happen in the very simplest test case where you have a Linux slave on the same system as a master (no swarm involved).
Downgrading the "Pipeline: Nodes and Processes" plugin from v2.9 to v2.8 fixed this problem (at least in my environment).
Thanks ivanpro, I was able to get rid of the problem as well by backleveling my 2.10 version of Pipeline: Nodes and Processes to 2.8. Unfortunately, a number of other plugins I have installed depend on 2.10 and I'm not keen on backleveling them, so I guess I'm going to have to live with this as an annoyance until there's an official fix. Hopefully that will be soon.
A recent version of workflow-durable-task-step began reporting connectivity errors that had previously been suppressed unless you happened to have a FINE logger on DurableTaskStep. The error is somewhere in the Remoting layer, generally specific to the agent connection method.
"Even seems to happen in the very simplest test case where you have a Linux slave on the same system as a master (no swarm involved)."
Then you are probably seeing some unrelated issue. Use the logger to diagnose more precisely.
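As a rough illustration of what "use the logger" means at the java.util.logging level: FINE records only become visible once both the logger and a handler attached to it accept that level. The sketch below is plain Java, not Jenkins configuration. The logger name is an assumption derived from the DurableTaskStep class mentioned above and should be checked against the installed plugin version; CollectingHandler and enableFine are hypothetical helpers. In Jenkins itself, the equivalent is usually done by adding a log recorder for that logger name under Manage Jenkins > System Log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class FineLoggerSketch {
    // Assumption: the logger is named after the plugin class; verify before relying on it.
    static final String LOGGER_NAME =
        "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep";

    /** Hypothetical handler that keeps accepted records in memory for inspection. */
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                records.add(record);
            }
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Lowers the logger threshold to FINE and attaches a handler that also accepts FINE. */
    static CollectingHandler enableFine(Logger logger) {
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.FINE);
        logger.setLevel(Level.FINE);
        logger.addHandler(handler);
        return handler;
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        CollectingHandler handler = enableFine(logger);

        // Simulated record; a real one would come from the plugin, with the full stack trace attached.
        logger.log(Level.FINE, "could not check agent", new IOException("Remote call failed"));

        for (LogRecord record : handler.records) {
            System.out.println(record.getLevel() + ": " + record.getMessage()
                    + " (" + record.getThrown() + ")");
        }
    }
}
```

The point of capturing the record rather than just the one-line console message is that the attached exception carries the stack trace, which is what identifies where in the Remoting layer the call failed.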
jglick: So I am getting the same error message repeatedly, in the same circumstances of activity on an agent (on every agent, actually). And this same error message is fixed by doing the same revert of the same plugin.
That certainly doesn't seem like an unrelated issue. How do I use the logger to diagnose more precisely?
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
Running shell script
[worker2] + /usr/share/gradle/bin/gradle -D test.single=TestExample3 test
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
[worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
I am having a similar problem, but seemingly with additional ramifications. My Jenkins master is running on Linux with Jenkins version 2.32.3.
I have a pipeline job based upon the parallel multiple nodes example found at:
https://jenkins.io/doc/pipeline/examples/
// Parallel JNI Build
if ("${run_jni}" == "true") {
    stage ('Run jni builds on each platform') {
        def labels = ['winky', 'harry', 'hagrid', 'lnxec333']
        //def labels = ['winky', 'hannah', 'moss', 'lnxec651']
        def ws_list = ['rm_lnx_86dv', 'rm_win_86dv', 'rm_aix_86dv', 'rm_zlnx_86dv']
        Integer i=0
        def builders = [:]
        for ( x in labels ) {
            def label = x
            def ws = ws_list[i]
            builders[label] = {
                node(label) {
                    stage ('Checkout the code on ' + label) {
                        if (isUnix()) {
                            checkout([$class: 'RTCScm', avoidUsingToolkit: false, buildTool: '4.0.2 Toolkit',
                                buildType: [buildWorkspace: ws, clearLoadDirectory: true, loadDirectory: SB_ROOT_unix, value: 'buildWorkspace'],
                                credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', overrideGlobal: false,
                                serverURI: 'https://xxxx', timeout: 480])
                        } else {
                            env.SB_ROOT_win = "${SB_ROOT_win}"
                            bat '''rd /s/q %SB_ROOT_win%
                                exit 0'''
                            checkout([$class: 'RTCScm', avoidUsingToolkit: false, buildTool: '4.0.2 Toolkit',
                                buildType: [buildWorkspace: ws, clearLoadDirectory: false, loadDirectory: SB_ROOT_win, value: 'buildWorkspace'],
                                credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', overrideGlobal: true,
                                serverURI: 'https://xxxx', timeout: 480])
                        }
                    }
                    stage ('Run the build on ' + label) {
                        switch(label) {
                            case 'hagrid':
                                load "${SB_ROOT_unix}/env_aix.properties"
                                withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                    withEnv(["JAVA_HOME=${env.JAVA_AIX_HOME}",
                                             "TSM_HOME=${env.TSM_HOME_aix}",
                                             "ANT_HOME=${env.ANT_HOME_aix}",
                                             "COMPILER_HOME=${env.COMPILER_HOME_aix}",
                                             "PATH=${env.JAVA_AIX_HOME}/bin:${env.ANT_HOME_aix}/bin:${env.TSM_HOME_aix}/api/bin:${env.COMPILER_HOME_aix}/bin:${env.PATH}"]) {
                                        sh '''cd ${SB_ROOT_unix}
                                            './gradlew' -PJAVA_HOME=${JAVA_HOME} -PnexusUsername=${nexusUsername} -PnexusUrl=${nexusUrl} -PnexusPassword=$rtcPassword -PPATH=$PATH showEnv rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                    }
                                }
                                break;
                            case ['winky', 'lnxec651'] :
                                load "${SB_ROOT_unix}/env_lnx.properties"
                                withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                    withEnv(["JAVA_HOME=${env.JAVA_LNX_HOME}",
                                             "TSM_HOME=${env.TSM_HOME_lnx}",
                                             "ANT_HOME=${env.ANT_HOME_lnx}",
                                             "COMPILER_HOME=${env.COMPILER_HOME_lnx}",
                                             "PATH=${env.JAVA_LNX_HOME}/bin:${env.ANT_HOME_lnx}/bin:${env.TSM_HOME_lnx}/api/bin:${env.COMPILER_HOME_lnx}/bin:${env.PATH}"]) {
                                        sh '''cd ${SB_ROOT_unix}
                                            './gradlew' -PJAVA_HOME=${JAVA_HOME} -PnexusUsername=${nexusUsername} -PnexusUrl=${nexusUrl} -PnexusPassword=$rtcPassword rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                    }
                                }
                                break;
                            case 'lnxec333' :
                                load "${SB_ROOT_unix}/env_zlnx.properties"
                                withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                    withEnv(["JAVA_HOME=${env.JAVA_LNX_HOME}",
                                             "TSM_HOME=${env.TSM_HOME_zlnx}",
                                             "ANT_HOME=${env.ANT_HOME_zlnx}",
                                             "COMPILER_HOME=${env.COMPILER_HOME_zlnx}",
                                             "PATH=${env.JAVA_LNX_HOME}/bin:${env.ANT_HOME_zlnx}/bin:${env.TSM_HOME_zlnx}/api/bin:${env.COMPILER_HOME_zlnx}/bin:${env.PATH}"]) {
                                        sh '''export BUILD_NUMBER=$BUILD_NUMBER
                                            cd ${SB_ROOT_unix}
                                            './gradlew' -PJAVA_HOME=${JAVA_HOME} -PnexusUsername=${nexusUsername} -PnexusUrl=${nexusUrl} -PnexusPassword=$rtcPassword rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                    }
                                }
                                break;
                            case ['hannah', 'harry'] :
                                env.SB_ROOT_win = "${SB_ROOT_win}"
                                env.nexusUsername = "${nexusUsername}"
                                env.nexusUrl = "${nexusUrl}"
                                load "${SB_ROOT_win}/env_win.properties"
                                withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                    withEnv(["JAVA_HOME=${env.JAVA_WIN_HOME}",
                                             "TSM_HOME=${env.TSM_HOME_win}",
                                             "ANT_HOME=${env.ANT_HOME_win}",
                                             "COMPILER_HOME=${env.COMPILER_HOME_win}",
                                             "PATH=${env.JAVA_WIN_HOME}\\bin;${env.ANT_HOME_win}\\bin;${env.TSM_HOME_win};${env.COMPILER_HOME_win}\\bin;C:\\UNXTOOLS\\usr\\local\\wbin;c:\\WinZip;c:\\grep;C:\\WINDOWS\\SYSTEM32;C:\\WINDOWS;c:\\WINDOWS\\SYSTEM32\\WBEM;c:\\msvs2008\\Common7\\Tools;c:\\msvs2008\\Common7\\Tools\\Bin;c:\\msvs2008\\vc\\bin;c:\\msvs2008\\common7\\ide;c:\\msvs2008\\common7\\tools;C:\\PROGRA~1\\MIA713~1\\Windows\\v6.0A\\bin"]) {
                                        bat '''cd %SB_ROOT_win%
                                            gradlew.bat -PnexusUsername=%nexusUsername% -PnexusUrl=%nexusUrl% -PnexusPassword=%rtcPassword% -PCOMPILER_HOME=%COMPILER_HOME% -PPATH=%PATH% rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                    }
                                }
                                break;
                            default:
                                echo "do nothing"
                                break;
                        }
                    }
                }
            }
            i++
        }
        parallel builders
    }
}
The nodes are all different platforms (Windows, Linux, AIX and s390 Linux), and things were working fine when I was using a set of nodes that were set up for building a prior release of our product. So far, I have set up new machines/nodes for Windows, AIX and s390 Linux.
When I use those new nodes, AIX and s390 Linux have the problem described in this issue, which in itself is not that bad, but these same nodes are also somehow forgetting/losing the Build Number during the course of their build. This causes an issue because I use the Build Number as part of the artifact name that gets uploaded to our Nexus repository, and the upload fails because Nexus thinks I am trying to update a previous artifact...
If I reboot these systems, then the build will pass (it won't 'lose' the Build Number), but once I run a subsequent one, it breaks again with the same issue.
I am currently using Pipeline: Nodes and Processes 2.10; I can try downgrading to 2.8... It is just strange that this is only an issue for my new nodes. I do see that the older and new nodes are using the same version of the slave.jar (3.4.1).
I suspect it must be some configuration issue that I am missing.
Interesting, I am getting a very similar error on my pipeline jobs:
Cannot contact SERVER_ABC: java.io.IOException: Remote call on SERVER_ABC failed.
I have two different Jenkins masters failing to contact two different agents.
As noted, the builds execute fine. It is as though some polling is occurring