
Jenkins job gets stuck midway and doesn't show any error

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • ec2-plugin
    • None
    • Dev

      My Jenkins job used to work fine. Jenkins runs on an AWS EC2 instance, and as part of our regular AMI refresh we replace that instance's AMI after 2 months.
      The problem started after that refresh.
      We then tried to roll back to an older AMI (although the recent AMI was not available), but the issue was the same.
      So it doesn't look like an AMI issue.
      Our main concern is that we cannot see any errors in the Jenkins pipeline; it just gets stuck at some of the Docker test cases.

      We thought it was a memory or CPU issue, so we checked the slave node using htop and free -m; everything looked good.
      Then we thought it was a JDK mismatch between master and slave, but even after matching the JDK versions the issue is the same.
      We are using the EC2 plugin to manage AWS EC2 nodes on the fly in Jenkins.
      Since we have no error to go on, we are working by trial and error, so mainly we want to find the error.
      Our Jenkins runs in Docker inside EC2. Running docker logs with the container ID shows the logs, but they were no help; I will attach the log file.

      I checked my slave agent logs and they were all fine. I have no idea where else to check.
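
      When a log is long and nothing stands out, counting the WARNING and SEVERE entries by source can show which message keeps repeating. The following is a minimal Python sketch, not part of the original report. It assumes log lines in the format quoted later in this issue (timestamp, [id=N], level, source, message); the function name summarize_problems is hypothetical.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpts quoted in this issue:
# 2022-07-08 15:37:07.939+0000 [id=247] WARNING some.Class#method: message text
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\s+"
    r"\[id=(?P<thread>\d+)\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<source>\S+?):\s*(?P<message>.*)$"
)


def summarize_problems(lines, levels=("WARNING", "SEVERE")):
    """Count log entries at the given levels, grouped by (level, source).

    Lines that do not match the assumed format (stack traces, blank
    lines) are skipped. Returns a list sorted by descending count.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("source"))] += 1
    return counts.most_common()
```

      For example, after saving the docker logs output to a file (the name jenkins.log is only an illustration), summarize_problems(open("jenkins.log")) returns the noisiest sources first.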

      Jenkins version: 2.330

          [JENKINS-68988] Jenkins job gets stuck midway and doesn't show any error

          To add my investigation:
          Our Jenkins runs in a Docker container on EC2, and the EC2 instance is based on an AMI. That AMI expires after 2 months and Jenkins stops working. We then get a new AMI and add it to the ASG, so our Jenkins EC2 instance gets the new AMI, the Docker container starts, and Jenkins comes up.
          This issue started after the latest AMI refresh.
          ****************
          As far as we know, there is no change on the AMI side; it should be the same.
          We tried the old AMI, which was the base AMI for Jenkins, and the issue was the same.
          We have since switched back, and the issue persists.
          We upgraded the Java version on the slave nodes to 11, but the issue is the same.
          We upgraded the Amazon EC2 plugin from .66 to .68, the version available under updates in Manage Plugins, but the issue is the same.
          We tried matching the slave and master JDK at 11, and the job got stuck at the same place; with both slave and master at JDK 8, the job also got stuck.

          rahulkamboj21 Kamboj added a comment

          All of these job issues started after this AMI refresh, and all Jenkins jobs started getting stuck, so we think it is an infrastructure issue (a Jenkins issue).

          rahulkamboj21 Kamboj added a comment

          Looking at your log files, I see this message repeated:

          2022-07-08 15:37:07.939+0000 [id=247] WARNING c.c.j.GitHubRepositoryName$1#applyNullSafe: Failed to obtain repository com.cloudbees.jenkins.GitHubRepositoryName$1@70b7390e org.kohsuke.github.HttpException: {"message":"Resource protected by organization SAML enforcement. You must grant your Personal Access token access to an organization within this business.","documentation_url":"https://docs.github.com/articles/authenticating-to-a-github-organization-with-saml-single-sign-on/"}

          I'm not an expert in all things ec2-plugin, but if I was to guess, this is an authentication issue with GitHub associated with the permissions granted to the personal access token used when attempting to get the Jenkinsfile from the repository of the job.

          The message which appears immediately before:

          2022-07-08 15:36:47.691+0000 [id=144] INFO hudson.plugins.ec2.EC2Cloud#log: Launching remoting agent (via Trilead SSH2 Connection): java -jar /tmp/remoting.jar -workDir /tmp

          is what I expect to see for a node that has started successfully and is attempting to execute a new job.

          You could try starting a node using the drop-down in cloud-config and see if it comes up successfully without a job being allocated. Alternatively, you could attempt to build something from a public repository that would not require authentication or a locally defined pipeline that would perform SCM operations. I think the alternative processes would expose the source of your problem. It might be something surprising like network firewall rules.
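
          If firewall rules are a suspect, one quick check is whether the agent can open a TCP connection to the host it needs. The following is a minimal Python sketch, not something from the ec2-plugin itself; the function name can_connect is hypothetical. It tests TCP reachability only and says nothing about authentication or token permissions.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any socket-level failure (refused, unreachable, DNS error, timeout)
    is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

          Run on the agent, a call such as can_connect("github.com", 443) returning False would point toward a network or firewall problem, while True suggests looking further at credentials.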

          Scott Sutherland added a comment

            Assignee: thoulen FABRIZIO MANFREDI
            Reporter: rahulkamboj21 Kamboj
            Votes: 0
            Watchers: 2
