JENKINS-63517

Very slow archiveArtifacts on agents

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • plugin-proposals
    • None

    Description

      We're using ssh-slaves-plugin for our Linux build nodes and are experiencing abysmal agent-to-master throughput (~13*M*bps), despite a (verified) 10*G*bps link. Our full set of build artifacts is huge (~10 GB) right now, and yes, we are compressing (with the zip step) the ones that aren't already compressed.
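For scale, the figures above can be turned into rough transfer times. This is a back-of-the-envelope sketch only: it assumes decimal units (1 GB = 10^9 bytes), a perfectly sustained rate, and no protocol overhead, and the helper name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Time to move size_bytes at a sustained bit rate, ignoring overhead."""
    return size_bytes * 8 / rate_bits_per_s


GB = 10**9    # decimal gigabyte (assumption: the report does not say GB vs GiB)
MBPS = 10**6  # megabits per second
GBPS = 10**9  # gigabits per second

artifact_bytes = 10 * GB  # "~10GB" of artifacts from the description

observed = transfer_seconds(artifact_bytes, 13 * MBPS)    # reported throughput
link_limit = transfer_seconds(artifact_bytes, 10 * GBPS)  # nominal link speed

print(f"at 13 Mbps: about {observed / 3600:.1f} hours")
print(f"at 10 Gbps: about {link_limit:.0f} seconds")
```

Under these assumptions the gap is roughly 1.7 hours versus 8 seconds for the same set of artifacts.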

      There are several Jenkins core issues tracking the sluggishness of archiving artifacts from agent to master over the last decade, but none of them appear to have ever been resolved properly, so I thought I'd create this as an "Improvement" in this component which seems to be the right place for it. It would seem that since the agent session is an ssh session that we should be able to achieve parity with scp (and in fact, use scp?), when archiving artifacts.

      As suggested here, the problem may be "that Jenkins archives via its control channel (e.g. ssh slave - using java SSH implementation JSCH). The java ssh just can't get anywhere near 1Gb/s network speed that native SSH can manage easily."

      If however, this becomes a "can't fix" or "won't fix", has anyone achieved a reasonable workaround for archiving artifacts on jenkins agents?

      We could replace our usage of `archiveArtifacts` steps with a custom groovy call to use the system scp to the correct "artifacts" location on the master, but this of course feels hacky, and I'd really rather not have to reinvent this wheel. I found [the publish-over-ssh plugin|https://github.com/jenkinsci/publish-over-ssh-plugin], but it doesn't seem to be maintained anymore. Anyone using that, or some other alternative to native archiveArtifacts step?
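A minimal sketch of what such a system-scp workaround might look like, written here as a standalone Python helper rather than a pipeline step. Everything specific in it is an assumption for illustration: the function name `build_scp_command`, the user, the host name, and the destination directory are placeholders, not Jenkins' real artifact layout. It relies only on the standard OpenSSH `scp` flags `-p` (preserve times and modes) and `-B` (batch mode, never prompt for a password).

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def build_scp_command(
    files: Sequence[Path], user: str, host: str, dest_dir: str
) -> List[str]:
    """Build an argv list that copies files to user@host:dest_dir with scp."""
    if not files:
        raise ValueError("no files to copy")
    target = f"{user}@{host}:{dest_dir}"
    # -p keeps modification times and modes; -B fails instead of prompting,
    # which is what an unattended build needs.
    return ["scp", "-p", "-B", *[str(f) for f in files], target]


# Hypothetical values; replace with the real agent paths and destination.
cmd = build_scp_command(
    [Path("build/artifacts.zip")],
    user="jenkins",
    host="controller.example.com",
    dest_dir="/srv/artifacts/my-job/42",
)
print(shlex.join(cmd))
# To actually copy, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list (rather than a shell string) avoids local quoting problems with unusual file names; key-based SSH authentication from the agent to the destination host is assumed to be set up already.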

      FWIW, we're using Debian Buster, Jenkins 2.222.3, and this java version on all master/agents:

      jenkins@jenkins-testing-agent-2:~$ java --version
      openjdk 11.0.8 2020-07-14
      OpenJDK Runtime Environment (build 11.0.8+10-post-Debian-1deb10u1)
      OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Debian-1deb10u1, mixed mode, sharing)

      Thanks for your time.

        Activity

          erusso Emanuele Russo added a comment -

          Hi,

          We're using Jenkins 2.235.1 on Windows 10 with JRE 1.8.0.144, and archiving artifacts over the local network is slow: it took 25 minutes to archive a 6.5 GB zip file.

          ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

          First of all, the SSH Build Agents plugin manages the connection from the Jenkins instance to the agent by making an SSH connection, copying the agent.jar file, and starting a channel; that's all. The request in this Jira is for a new plugin to copy artifacts over SSH from the agents to the Jenkins instance. It is not related at all to the SSH Build Agents plugin, so I'll change the component to plugin-proposals so it can be triaged as a new plugin request.

          As for artifacts, the default implementation does not perform well with big files, as you said, and downloads do not perform well either. Both processes also have a direct impact on the performance of the Jenkins instance. I think this is because it is designed for small things; for everything else you should use another solution designed for big files and fast downloads. My recommendation is to use Artifact Manager on S3: it uses the regular archive steps but stores the files in an S3 bucket, which is fast for uploads and really fast for downloads. There are also plugins to store artifacts in GCP or Azure, or in artifact repositories like Sonatype Nexus and JFrog Artifactory.

          erusso Emanuele Russo added a comment -

          Thanks a lot, that's very useful. I will stop archiving artifacts that way and use other tools, maybe a simple SCP.

          Since I usually keep only a few builds with their artifacts, I need some logic to delete them when the build is discarded. What would you suggest as the most elegant way to "override" the function that cleans up old artifacts, while hard-coding as few file names and paths as possible?

          The most trivial approach is to add a stage to my pipeline that deletes a folder based on some build number, which would be the oldest one, but how would I get that number?

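One possible way to approach the cleanup question, sketched under a loudly stated assumption: externally stored artifacts live in a directory that has one subdirectory per build, named by build number (for example `42/`). That layout and the function names `builds_to_delete` and `prune` are hypothetical; nothing here is a Jenkins API. Rather than tracking the oldest build number, the sketch keeps the newest `keep` numbered directories and removes the rest.

```python
import shutil
from pathlib import Path
from typing import List


def builds_to_delete(root: Path, keep: int) -> List[Path]:
    """Return numbered build directories under root, except the newest `keep`."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    numbered = sorted(
        (p for p in root.iterdir() if p.is_dir() and p.name.isdecimal()),
        key=lambda p: int(p.name),  # numeric order, so 10 sorts after 2
    )
    return numbered[:-keep] if keep else numbered


def prune(root: Path, keep: int) -> List[Path]:
    """Delete all but the newest `keep` build directories; return what was removed."""
    doomed = builds_to_delete(root, keep)
    for path in doomed:
        shutil.rmtree(path)
    return doomed
```

Directories whose names are not plain numbers (such as a `latest` folder) are left untouched, so only the build-number naming convention has to be hard-coded.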
          ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

          This is probably a question to ask on the Jenkins users Google group.

          timblaktu Tim Black added a comment -

          Thanks for the clarifying information, ifernandezcalvo. I understand you're saying that artifact copying is a completely independent and separate process from the agent's remoting channel/process.

          For some time, because of their huge space and time overhead, we have been in the process of moving away from using Jenkins artifacts, with the exception of cases where we have jobs that are "chained" together and the downstream job copies artifacts from the upstream job (using copy artifacts plugin). The information and experienced opinion you provide affirms our decision to go this direction.

          One of my upcoming projects is to stand up an internal [Pulp instance|https://pulpproject.org/] to provide an artifact management service to our build, test, and release processes. Pulp is similar to Artifactory, but is FOSS and uses some significantly different design patterns, which I am drawn to. I'm mentioning this in case others who end up here for the same reasons can share their experiences in the journey from Jenkins artifacts to a proper artifact management system.

          People

            Unassigned
            timblaktu Tim Black
            Votes: 1
            Watchers: 3