Unable to Connect to Agents after upgrade to 1822.v87175d209b_b_5

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: ec2-plugin
    • None

      We are unable to connect to agents after upgrading to 1822.v87175d209b_b_5.
      The plugin logs are given below:

      java.io.EOFException: unexpected stream termination
          at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:478)
          at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:422)
          at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:440)
          at PluginClassLoader for ec2//hudson.plugins.ec2.ssh.EC2UnixLauncher.launchRemotingAgent(EC2UnixLauncher.java:453)
          at PluginClassLoader for ec2//hudson.plugins.ec2.ssh.EC2UnixLauncher.launchScript(EC2UnixLauncher.java:402)
          at PluginClassLoader for ec2//hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:55)
          at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
          at java.base/java.lang.Thread.run(Thread.java:1583)

          [JENKINS-75187] Unable to Connect to Agents after upgrade to 1822.v87175d209b_b_5

          Jean-Marc Desprez added a comment -

          Update: I found out that there is also a problem with the host key verification. I'm working on it.

          Damien Duportal added a comment -

          Update on this topic:

          The version `1822.v87175d209b_b_5` has been excluded from the Update Center so that it is not widely proposed to users, which limits the impact: https://github.com/jenkins-infra/update-center2/pull/837

          If you are impacted, you should roll back to the previous version, 1801.v526399543dca.

          The Jenkins Infra team also had the problem on the "test" controller we have in AWS. We worked with jmdesprez to ensure that his current fixes (not published yet) are working: https://github.com/jenkins-infra/helpdesk/issues/4316#issuecomment-2621225351

          This means we have to wait for all fixes to be validated, tested, and published in a new version.

          Simon added a comment - edited

          Installed the released fix (1823.v828850f7f155) but Windows agents (Win server 2022) still won't connect. 


          Damien Duportal added a comment -

          Installed the newly released version on aws.ci.jenkins.io and it works for us for Linux, Windows 2019 and Windows 2022:

          • https://aws.ci.jenkins.io/job/ath/job/master/8/ (Linux)
          • https://aws.ci.jenkins.io/job/docker/job/master/16/ (Linux and Windows 2019)
          • https://aws.ci.jenkins.io/job/docker/job/master/17/ (Windows 2019 and Windows 2022)

          Important notes:

          • Our Windows agents use the "Unix Launcher" method (as we have Windows OpenSSH installed on our custom AMIs)
          • We have Cloud Init set up (to start datadog and set things up as admin during VM startup) but we have NO init script set up for Windows

          Have not tried with WinRM though.

          Simon added a comment -

          Our agents are also connected with the "Unix Launcher" method and have Windows OpenSSH installed. However, we have an init script, and that fails. If I remove it, the agent connects, although I haven't tested whether a job can actually run on it.
          This is the init script:

          powershell C:\\Users\\User\\computer_rename.ps1
          powershell -Command "Start-Service -Name 'Alloy'"          


          Damien Duportal added a comment -

          simondivi can you share the version of the plugin with which the described behavior (Windows 2022 + Unix launcher + init script) worked?
          We (Jenkins infra) never succeeded in getting the init script to work with the Unix launcher because the plugin always assumed it had a Bourne shell available.

          Do you happen to have `bash.exe` in your Windows 2022 `PATH` (through GitBash, or Cygwin)?

          Simon added a comment - edited

          dduportal the working plugin version is: 1801.v526399543dca_ 

          You are right: we added C:\Program Files\Git\usr\bin to the PATH. It comes from GitBash and has bash in there.

          The full PATH looks like this. 

           

          C:\Windows\system32;
          C:\Windows;
          C:\Windows\System32\Wbem;
          C:\Windows\System32\WindowsPowerShell\v1.0\;
          C:\Windows\System32\OpenSSH\;
          C:\Program Files\Amazon\cfn-bootstrap\;
          C:\ProgramData\chocolatey\bin;
          C:\Program Files\Git\cmd;\bin;\bin;
          C:\Program Files\Git\usr\bin;
          C:\Installed\Java\11\bin;
          C:\Program Files\Microsoft VS Code\bin;
          C:\installed\unity\2022.3.42f1\Editor;
          C:\Program Files\dotnet\;
          C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;
          C:\Users\Administrator\.dotnet\tools 
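To make the `bash.exe` question above easy to check, here is a minimal sketch that splits a Windows-style PATH string and looks for an executable in each entry. The function names `split_windows_path` and `find_on_path` are hypothetical helpers written for this illustration; they are not part of the ec2 plugin. The `exists` parameter is an assumption added so the lookup can be exercised without a Windows filesystem.

```python
import ntpath
from typing import Callable, List, Optional


def split_windows_path(raw: str) -> List[str]:
    """Split a ';'-separated PATH value, dropping blanks and stray whitespace."""
    return [entry.strip() for entry in raw.split(";") if entry.strip()]


def find_on_path(
    entries: List[str],
    exe_name: str,
    exists: Callable[[str], bool] = ntpath.exists,
) -> Optional[str]:
    """Return the first '<entry>\\<exe_name>' for which exists() is true, else None."""
    for entry in entries:
        candidate = ntpath.join(entry, exe_name)
        if exists(candidate):
            return candidate
    return None
```

Run on the agent itself, `find_on_path(split_windows_path(os.environ["PATH"]), "bash.exe")` would report which directory provides bash, if any (this additionally needs `import os`).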

           


          Matthew Webber added a comment -

          Cannot connect to existing Agents with this fix applied.

          jmdesprez We upgraded from 1801.v526399543dca_ to 1823.v828850f7f155, and when it attempted to start an existing Agent EC2 instance, the connection log showed:

          WARNING: The SSH key (xx:xx:xx:xx:a9:6b:5c:7b:a9:xx:xx:xx:xx:xx:xx:xx) presented by the instance has changed since first saved (yy:yy:yy:yy:yy:d5:84:46:ee:21:ce:6b:0a:yy:yy:yy). The connection to EC2 (CI-PROD_v2) - Standard(x86_64) (i-1234567890) is closed to prevent a possible man-in-the-middle attack

          It works if I delete my Agents and allow some new ones to be created.
          (My "Host Key Verification Strategy" is accept-new).

          Is that expected?
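The behaviour described in this comment matches the usual meaning of an "accept-new" policy: a key is trusted the first time an instance is seen, and any later change is rejected. The sketch below is a simplified model of that policy, not the plugin's actual code; `HostKey`, `verify_accept_new` and the in-memory `saved` dictionary are all hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HostKey:
    algorithm: str    # e.g. "ssh-ed25519"
    fingerprint: str  # e.g. "aa:bb:..."


def verify_accept_new(saved: Dict[str, HostKey], instance_id: str, presented: HostKey) -> bool:
    """Trust on first use: remember an unknown instance's key, reject a changed one."""
    known = saved.get(instance_id)
    if known is None:
        saved[instance_id] = presented  # first contact: accept and remember
        return True
    return known == presented  # later contacts must present the same key
```

Under this model, an existing agent whose saved key differs from the presented one is refused, while a freshly created agent has no saved key and is accepted, which is consistent with what was observed after deleting the agents.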

          Jean-Marc Desprez added a comment - edited

          mwebber yes, that is possible. In the instance console, you should see that several keys can be used, something like:

          -BEGIN SSH HOST KEY KEYS-
          ecdsa-sha2-nistp256 AAAAE.... xxx@yyyy
          ssh-ed25519 AAAAC... xxx@yyy
          -END SSH HOST KEY KEYS-

          Trilead was using ssh-ed25519, but Mina uses ecdsa-sha2-nistp256, so the key presented now differs from the one saved earlier.
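To compare the keys an instance advertises, the console block above can be parsed and a fingerprint computed per algorithm. This is a minimal sketch under stated assumptions: the marker lines are matched loosely (any number of dashes), the colon-separated values in the warning are assumed to be MD5 fingerprints of the decoded key blob, and `md5_fingerprint` and `host_key_fingerprints` are hypothetical helper names, not plugin APIs.

```python
import base64
import hashlib
import re
from typing import Dict

# Marker lines, tolerant of how many dashes surround the text (assumption).
BEGIN = re.compile(r"^-+\s*BEGIN SSH HOST KEY KEYS\s*-+$")
END = re.compile(r"^-+\s*END SSH HOST KEY KEYS\s*-+$")


def md5_fingerprint(blob_b64: str) -> str:
    """Colon-separated MD5 hex digest of a base64-encoded public key blob."""
    digest = hashlib.md5(base64.b64decode(blob_b64)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def host_key_fingerprints(console: str) -> Dict[str, str]:
    """Map each key algorithm found in the console block to its fingerprint."""
    result: Dict[str, str] = {}
    inside = False
    for raw_line in console.splitlines():
        line = raw_line.strip()
        if BEGIN.match(line):
            inside = True
        elif END.match(line):
            inside = False
        elif inside and line:
            parts = line.split()
            if len(parts) >= 2:  # "<algorithm> <base64 blob> [comment]"
                result[parts[0]] = md5_fingerprint(parts[1])
    return result
```

Because each algorithm has its own key blob, the `ssh-ed25519` and `ecdsa-sha2-nistp256` entries yield different fingerprints even though both belong to the same instance.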


          Jean-Marc Desprez added a comment -

          simondivi I created https://issues.jenkins.io/browse/JENKINS-75230 to keep track of this issue.


            jmdesprez Jean-Marc Desprez
            rammanokar Rammanokar