INFRA-2972

Update trusted.ci.jenkins.io Jenkins Controller's Configuration to allow the release 2.277.4

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: trusted.ci
    • Labels:
      None

      Description

      While performing the release 2.277.4, we are blocked by build errors on the Jenkins controller trusted.ci.jenkins.io (reported by Mark Waite):

      • The job in charge of deploying the production website www.jenkins.io is failing due to "Unauthorized" errors on `/var/run/docker.sock`
      • The job in charge of publishing the Docker images for Jenkins is failing on both branches:

          Activity

          dduportal Damien Duportal added a comment -

          Work done with Mark Waite and Gareth Evans:

          • The Jenkins logs were showing that the controller was unable to schedule a new VM because the image template specified in the node configuration did not exist. A single VM had been in use nonstop for days, weeks, or months, and on that VM the user "jenkins" was not allowed to run any docker command, which caused the pipeline failures.
          • Fixed:
            • We set the image ID to the latest one built yesterday from https://github.com/jenkins-infra/packer-images, in the correct combination of region ("East US") and resource group associated with trusted.ci.jenkins.io (and NO other instances)
            • We also updated the init script configured for Ubuntu: as the agents no longer use a stock Ubuntu 18 image but a prebuilt image, there is no need to install anything when the agent is allocated. There is a possible impact if we were using tools that are not installed on the packer images
            • Successful result: the build of the principal branch for www.jenkins.io worked again and deployed the website as expected
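
One way to confirm this kind of failure on an agent is to check whether the current user can read and write the Docker socket. The snippet below is only a sketch of such a check, not the command that was actually run during the investigation. The helper name `can_use_socket` is hypothetical, and the default path is the socket mentioned in the description.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the current user can both read and
# write the given path (defaults to the Docker socket from the description).
can_use_socket() {
    sock="${1:-/var/run/docker.sock}"
    [ -r "$sock" ] && [ -w "$sock" ]
}

if can_use_socket; then
    echo "docker socket is accessible to $(id -un)"
else
    # Listing the groups helps spot a missing "docker" group membership.
    echo "docker socket is NOT accessible to $(id -un) (groups: $(id -nG))"
fi
```

If the check fails for the "jenkins" user on a long-lived VM but passes on a freshly provisioned one, that points at the agent image rather than at the pipeline.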
          dduportal Damien Duportal added a comment -

          Work done with Mark Waite and Tim Jacomb:

          • Fixed the Windows image with the same method as the Ubuntu image.
            • It took us some time as I failed to get the correct image ID initially: we first tried a 2020 image flavor, which failed to initialize, but ended up with the correct one
            • Success: the "windows" Docker images were published successfully
          • Fixed the Linux Docker Image publication
            • We replayed the pipeline with bash verbose output enabled to diagnose the error, by adding the step

              sh 'sed -i "s#./.ci/publish.sh #bash -x ./.ci/publish.sh #g" Makefile'

              just before the step

              sh 'make publish'

              so that we did not have to switch on verbose mode in the repository itself

            • The debug output showed us that the `TOKEN` variable, expected to hold a bearer token for requests to the DockerHub API, was empty. We dug further and ended up at the PR https://github.com/jenkinsci/docker/pull/1106, where `jq` is installed and used to retrieve this token in a more robust way.
            • Success: the build for linux successfully published all the linux images :party:
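
The `sed` rewrite used during the replay can be tried in isolation. The sketch below assumes GNU sed (for `-i` without a backup suffix) and uses a made-up one-rule Makefile. The real Makefile in the repository may look different, and the `all` argument is purely illustrative.

```shell
#!/bin/sh
# Hypothetical Makefile fragment in a scratch directory; it stands in for
# the real Makefile of the repository.
workdir=$(mktemp -d)
printf 'publish:\n\t./.ci/publish.sh all\n' > "$workdir/Makefile"

# Same substitution as in the comment: every call to the publish script is
# wrapped in `bash -x` so each command is traced as it runs.
sed -i "s#./.ci/publish.sh #bash -x ./.ci/publish.sh #g" "$workdir/Makefile"

# Show the rewritten recipe line.
rewritten=$(grep 'publish.sh' "$workdir/Makefile")
echo "$rewritten"

rm -rf "$workdir"
```

Because the substitution only touches the checked-out workspace of the replayed build, nothing has to be committed to the repository to get the trace.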

            People

            Assignee:
            dduportal Damien Duportal
            Reporter:
            dduportal Damien Duportal