Jenkins / JENKINS-36029

multibranch-job deleted when bitbucket communication fails


      Description

      I'm trying out the "Bitbucket team/project" feature. Some of my jobs keep being deleted and recreated at random, and this seems to be related to failures when communicating with Bitbucket. I would not expect Jenkins to delete my multibranch job when, e.g., Bitbucket or a network link is down.

      Relevant log:

      Proposing ansible-docker
      Connecting to https://bitbucket.org using hidden-org-name/****** (hidden-org-name bitbucket credentials)
      Looking up hidden-org-name/ansible-docker for branches
      Checking branch jenkins-test from hidden-org-name/ansible-docker
      Met criteria
      ERROR: Failed to create or update a subproject ansible-docker
      com.cloudbees.jenkins.plugins.bitbucket.api.BitbucketRequestException: Communication error: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
      	at com.cloudbees.jenkins.plugins.bitbucket.client.BitbucketCloudApiClient.getRequest(BitbucketCloudApiClient.java:421)
      	at com.cloudbees.jenkins.plugins.bitbucket.client.BitbucketCloudApiClient.getRepository(BitbucketCloudApiClient.java:173)
      	at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMSource.getRepositoryType(BitbucketSCMSource.java:254)
      	at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMSource.observe(BitbucketSCMSource.java:372)
      	at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMSource.retrieveBranches(BitbucketSCMSource.java:326)
      	at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMSource.retrieve(BitbucketSCMSource.java:279)
      	at jenkins.scm.api.SCMSource.fetch(SCMSource.java:146)
      	at jenkins.scm.api.SCMSource.retrieve(SCMSource.java:230)
      	at jenkins.scm.api.SCMSource.fetch(SCMSource.java:175)
      	at jenkins.branch.MultiBranchProjectFactory$BySCMSourceCriteria$1.call(MultiBranchProjectFactory.java:157)
      	at jenkins.branch.MultiBranchProjectFactory$BySCMSourceCriteria$1.call(MultiBranchProjectFactory.java:154)
      	at jenkins.branch.OrganizationFolder.withSCMSourceCriteria(OrganizationFolder.java:255)
      	at jenkins.branch.MultiBranchProjectFactory$BySCMSourceCriteria.recognizes(MultiBranchProjectFactory.java:154)
      	at jenkins.branch.OrganizationFolder$1$1.complete(OrganizationFolder.java:165)
      	at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMNavigator.add(BitbucketSCMNavigator.java:198)
      	at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMNavigator.visitSources(BitbucketSCMNavigator.java:175)
      	at jenkins.branch.OrganizationFolder.computeChildren(OrganizationFolder.java:125)
      	at com.cloudbees.hudson.plugins.folder.computed.ComputedFolder.updateChildren(ComputedFolder.java:157)
      	at com.cloudbees.hudson.plugins.folder.computed.FolderComputation.run(FolderComputation.java:122)
      	at hudson.model.ResourceController.execute(ResourceController.java:98)
      	at hudson.model.Executor.run(Executor.java:410)
      Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
      	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:992)
      	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
      	at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:747)
      	at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
      	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
      	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
      	at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:828)
      	at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2116)
      	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
      	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
      	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
      	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
      	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
      	at com.cloudbees.jenkins.plugins.bitbucket.client.BitbucketCloudApiClient.getRequest(BitbucketCloudApiClient.java:412)
      	... 20 more
      Caused by: java.io.EOFException: SSL peer shut down incorrectly
      	at sun.security.ssl.InputRecord.read(InputRecord.java:505)
      	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
      	... 33 more
      
        Evaluating orphaned items in hidden-org-name on Bitbucket
        Will remove ansible-docker as it is #1 in the list
        Finished: SUCCESS
      
      


            Activity

            asgeirf Asgeir Frimannsson created issue -
            amuniz Antonio Muñiz added a comment -

            Counterpart of JENKINS-34776

            rozhok Vladimir Rozhkov added a comment -

            I have a lot of repositories in our team account (about 30), and it's very annoying that I have to re-run folder computation 10 times to get all my repos configured.
            Would some retry logic help here?

            mdkf Michael Fowler made changes -
            Field Original Value New Value
            Priority Major [ 3 ] Critical [ 2 ]
            javi_m Javier Martín added a comment - - edited

            We are experiencing this issue as well. Because we delegate our user directory to JIRA, whenever that instance is unavailable Bitbucket returns 401 Unauthorized, raising the following error:

            com.cloudbees.jenkins.plugins.bitbucket.api.BitbucketRequestException: HTTP request error. Status: 401: No Autorizado (Unauthorized).
            {"errors":[{"context":null,"message":"Authentication failed. Please check your credentials and try again.","exceptionName":"com.atlassian.bitbucket.auth.AuthenticationSystemException"}]}
            at com.cloudbees.jenkins.plugins.bitbucket.server.client.BitbucketServerAPIClient.getRequest(BitbucketServerAPIClient.java:382)
            at com.cloudbees.jenkins.plugins.bitbucket.server.client.BitbucketServerAPIClient.getBranches(BitbucketServerAPIClient.java:258)
            at com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMSource.retrieve(BitbucketSCMSource.java:384)
            at jenkins.scm.api.SCMSource.fetch(SCMSource.java:245)
            at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:75)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:206)
            at hudson.model.ResourceController.execute(ResourceController.java:98)
            at hudson.model.Executor.run(Executor.java:410)
            Finished: FAILURE

            We then have to run the folder computation again, and all build numbers are reset. We use the build number as part of our version number, so is there a way to retain build numbers when this happens? This is not the case of a branch without a Jenkinsfile (where I assume deleting the multibranch job is intended); this is a temporary problem with the scan credentials, and I don't think the job should be deleted.

            rozhok Vladimir Rozhkov added a comment -

            Javier Martín, this is not the same issue: this one is about an SSL handshake failure, and yours is about an authentication failure. Make sure you've set up your credentials correctly.

            Meanwhile, I'm working on a PR to fix the SSL handshake issue.

            javi_m Javier Martín added a comment -

            The title of the issue is "multibranch-job deleted when bitbucket communication fails", and I think this is the same issue. The log says unauthorized request, but that is caused by a communication problem between Bitbucket and JIRA, so bitbucket-branch-source-plugin behaves the same way as in this open issue. The credentials are configured correctly and work almost all the time.

            Maybe I need to create another issue, since this one is only about the SSL handshake, but it seems very close to me: the problem is that when Bitbucket communication fails, Jenkins deletes all multibranch jobs, and build numbers restart when the folder computation runs again.

            amuniz Antonio Muñiz added a comment -

            Right, the fix is the same for any "connection issue".

            rozhok Vladimir Rozhkov added a comment -

            Javier Martín: sorry for the misunderstanding, then.
            Do you think a simple retry would fix the issue?

            amuniz Antonio Muñiz added a comment - - edited

            Already fixed in github-branch-source https://github.com/jenkinsci/github-branch-source-plugin/pull/57

            The fix needed here is similar.

            rozhok Vladimir Rozhkov added a comment -

            Antonio Muñiz: that fix is about GitHub availability; there is only one check: "github == unavailable ? abort job : proceed job". Our issue is different: individual requests fail sporadically, which is why I'm talking about retries.
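            A minimal sketch of the kind of per-request retry described above, assuming the network call can be wrapped in a lambda that throws IOException. RetryingCall, IoCall and withRetry are hypothetical names, not part of bitbucket-branch-source, and the attempt count and backoff are illustrative. The important design point is that once the retries are exhausted, the last exception is rethrown rather than turned into an empty result, so a persistent outage still aborts the scan instead of making every job look orphaned.

```java
import java.io.IOException;

public final class RetryingCall {

    /** A network call that may fail with an IOException (hypothetical interface). */
    @FunctionalInterface
    public interface IoCall<T> {
        T call() throws IOException, InterruptedException;
    }

    /**
     * Runs the call, retrying on IOException up to maxAttempts times with
     * exponential backoff. If every attempt fails, the last IOException is
     * rethrown so the caller sees the failure instead of a "dummy" value.
     */
    public static <T> T withRetry(IoCall<T> call, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulates a request that fails twice with a transient SSL error, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Remote host closed connection during handshake");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```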

            javi_m Javier Martín added a comment -

            That's right: if you rerun folder computation after a failed connection it works again, but with the build numbers reset. As Antonio Muñiz said, I think the problem is similar to the one resolved for github-branch-source. Regards!

            akurdyukov Alik Kurdyukov added a comment -

            https://github.com/jenkinsci/bitbucket-branch-source-plugin/pull/14/files seems to be a fix for the problem described.
            amuniz Antonio Muñiz added a comment -

            No, that PR is not fixing this.

            akurdyukov Alik Kurdyukov added a comment -

            Well, it works for me. I've built and installed the plugin with this patch and haven't had any SSL errors in a week or so.

            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 172671 ] JNJira + In-Review [ 184721 ]
            stradenko C added a comment -

            Any ETA on fixing the delete issue? Deleting all the build history when there's a transient bitbucket error is impacting me as well.

            sanjeevnandam Sanjeev Nithyanandam added a comment -

            Experiencing the same issue. We currently have indexing turned off in the multibranch plugin so that we don't keep ending up with deleted jobs.

            Waiting for the fix for this.

            davidkarlsen davidkarlsen added a comment -

            Any progress on this? We're a paying CloudBees support customer and have an issue open there as well.

            rkok Rene Kok added a comment - - edited

            We are using Bitbucket Cloud. Our scan over 100+ repositories randomly fails because the API rate limit is reached (https://confluence.atlassian.com/bitbucket/rate-limits-668173227.html). With the current implementation of the plugin, all we can do is wait one hour (the rate limit is measured per hour) and then try again to recover the lost build jobs.

            IMO, the request rate could be reduced a lot (avoiding the API rate limit) if the scan logic were changed to only add jobs or branches that do not exist yet. A separate scan could do nightly cleanups of removed branches/repositories.
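            The split proposed above can be sketched as two set differences over job names. This is a hypothetical illustration (IncrementalScan, toAdd and cleanupCandidates are not plugin APIs) that only shows how the work divides; any saving in API calls would come from skipping per-repository requests for names already in existing. Note that the cleanup half has exactly the failure mode of this issue: if a failed listing were passed in as an empty remote set, every existing job would become a deletion candidate, so the nightly cleanup would still need to abort on errors.

```java
import java.util.Set;
import java.util.TreeSet;

public final class IncrementalScan {

    /** Names present remotely but not yet known locally: the only jobs a light scan would create. */
    public static Set<String> toAdd(Set<String> remote, Set<String> existing) {
        Set<String> result = new TreeSet<>(remote);
        result.removeAll(existing);
        return result;
    }

    /** Names known locally but absent remotely: candidates for a separate, e.g. nightly, cleanup. */
    public static Set<String> cleanupCandidates(Set<String> remote, Set<String> existing) {
        Set<String> result = new TreeSet<>(existing);
        result.removeAll(remote);
        return result;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("ansible-docker", "api", "web");
        Set<String> remote = Set.of("api", "web", "worker");
        System.out.println("add now: " + toAdd(remote, existing));
        System.out.println("clean up later: " + cleanupCandidates(remote, existing));
    }
}
```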

            amuniz Antonio Muñiz added a comment -

            A workaround for this issue is to set "Orphaned Item Strategy -> Days to keep old items" to something other than 0, for example 1 day.

            fortuna Ben Fortuna added a comment -

            Antonio Muñiz: yes, this is a partial workaround that I use, currently set to a 7-day expiry. However, for some projects the develop branch isn't built that often (more than 7 days between builds), so the build history is lost (i.e. numbering resets to build #1, which causes other problems such as versioning conflicts).

            anders_batstrand Anders Båtstrand added a comment -

            Same issue for my team: after Bitbucket Server was down, all jobs reset their build numbers. We used the build number for versioning, but because of this bug we have to use the timestamp instead, which is less human-friendly.

            It would be nice if the plugin could tell the difference between a branch not being present and the Bitbucket server not answering.
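            One way to make that distinction explicit is to give the branch listing a result type with separate cases for "the server answered" and "the server could not be reached", and to compute deletions only from the former. A minimal sketch, assuming Java 17 (sealed interfaces and records); BranchListing, Found, Unreachable and deletable are hypothetical names, not part of the plugin or branch-api.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public final class BranchListing {

    /** Outcome of asking the server for branches (hypothetical type). */
    public sealed interface Result permits Found, Unreachable {}

    /** The server answered; branches is the authoritative list. */
    public record Found(Set<String> branches) implements Result {}

    /** The server did not answer, e.g. an SSL handshake failure or a transient 401. */
    public record Unreachable(String reason) implements Result {}

    /**
     * Jobs that may be treated as orphaned. Only a Found result can justify
     * deletion; Unreachable keeps every existing job.
     */
    public static Set<String> deletable(Set<String> existingJobs, Result result) {
        if (result instanceof Found found) {
            return existingJobs.stream()
                    .filter(job -> !found.branches().contains(job))
                    .collect(Collectors.toCollection(TreeSet::new));
        }
        return Set.of(); // no answer from the server: delete nothing
    }

    public static void main(String[] args) {
        Set<String> jobs = Set.of("master", "develop", "jenkins-test");
        System.out.println(deletable(jobs, new Found(Set.of("master", "develop"))));
        System.out.println(deletable(jobs, new Unreachable("SSLHandshakeException")));
    }
}
```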

            stephenconnolly Stephen Connolly made changes -
            Link This issue relates to JENKINS-40767 [ JENKINS-40767 ]
            stephenconnolly Stephen Connolly added a comment -

            OK, this seems to be a case of the BitBucketBranchSource not actually propagating exceptions.

            Basically, the BitbucketCloudApiClient and BitbucketServerAPIClient methods that may perform network operations do not declare IOException or InterruptedException, and instead opt for "friendly" returns of "dummy" values or null.

            This defeats the intended behaviour of branch-api, which was verified in JENKINS-40767: see this test https://github.com/jenkinsci/branch-api-plugin/blob/d55f2b4369e0fe9b3b0441654e632eaf8bb5920f/src/test/java/integration/EventsTest.java#L287-L321 as evidence that a propagated exception prevents the orphaned item strategy from kicking in.
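            The difference between swallowing and propagating can be shown in a small, self-contained sketch. The class and method names are hypothetical stand-ins, not the real BitbucketCloudApiClient/BitbucketServerAPIClient code, and orphans is a simplified model of orphan detection rather than branch-api's actual orphaned item strategy. With the swallowing variant the scan "succeeds" with an empty list and every existing job looks orphaned; with the propagating variant the IOException aborts the scan before any orphan is computed.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public final class PropagateVsSwallow {

    /** Hypothetical network call that fails, e.g. with an SSL handshake error. */
    public static List<String> fetchBranches() throws IOException {
        throw new IOException("Remote host closed connection during handshake");
    }

    /** Anti-pattern: the error is swallowed and an empty "dummy" list is returned. */
    public static List<String> getBranchesSwallowing() {
        try {
            return fetchBranches();
        } catch (IOException e) {
            return Collections.emptyList();
        }
    }

    /** Preferred: the method declares IOException and lets it reach the caller. */
    public static List<String> getBranchesPropagating() throws IOException {
        return fetchBranches();
    }

    /** Simplified orphan detection: every existing job not seen in the listing. */
    public static Set<String> orphans(Set<String> existing, List<String> seen) {
        Set<String> result = new TreeSet<>(existing);
        result.removeAll(seen);
        return result;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("master", "develop");
        // Swallowed error: the scan "succeeds" and every job looks orphaned.
        System.out.println("swallowing -> orphans " + orphans(existing, getBranchesSwallowing()));
        // Propagated error: the scan aborts before any orphan is computed.
        try {
            System.out.println("propagating -> orphans " + orphans(existing, getBranchesPropagating()));
        } catch (IOException e) {
            System.out.println("propagating -> scan aborted: " + e.getMessage());
        }
    }
}
```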

            kburnett Kevin Burnett made changes -
            Link This issue is duplicated by JENKINS-41863 [ JENKINS-41863 ]
            stephenconnolly Stephen Connolly added a comment -

            I have a fix for this under test. If anyone wants to help with early testing, the code is at https://github.com/jenkinsci/bitbucket-branch-source-plugin/pull/35 (though at this point it doesn't build).

            stephenconnolly Stephen Connolly made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            fortuna Ben Fortuna added a comment -

            Stephen Connolly, will this fix also apply to Git/GitHub repositories? I suspect most of these plugins just swallow exceptions when they can't connect to a repository, so perhaps the design needs to be rethought (possibly requiring an exception or an explicit return value before deleting the job?).

            stephenconnolly Stephen Connolly added a comment -

            Ben Fortuna, to the best of my knowledge the Git, GitHub and Subversion implementations do the correct thing and propagate IO errors as exceptions. Certainly when I run out of rate limit, the scan is aborted and no repositories or branches are deleted. If you have a reproducible test case or an example log where there was an API error and the scan / index still completed as success (which would then cause the missing branches to be removed), please create an issue for it in JIRA, as that kind of data loss is the highest priority IMHO. That is why I am working on this issue ahead of all others at present: it is a data loss issue.

            fortuna Ben Fortuna made changes -
            Link This issue relates to JENKINS-42000 [ JENKINS-42000 ]
            fortuna Ben Fortuna added a comment -

            Stephen Connolly, new issue created for GitHub: JENKINS-42000

            stephenconnolly Stephen Connolly made changes -
            Assignee Antonio Muñiz [ amuniz ] Stephen Connolly [ stephenconnolly ]
            stephenconnolly Stephen Connolly added a comment -

            I claim this is fixed in the 2.1.0 release.

            stephenconnolly Stephen Connolly made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            stephenconnolly Stephen Connolly made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            cloudbees CloudBees Inc. made changes -
            Remote Link This issue links to "CloudBees Internal OSS-1424 (Web Link)" [ 18700 ]

              People

              Assignee:
              stephenconnolly Stephen Connolly
              Reporter:
              asgeirf Asgeir Frimannsson
              Votes:
              19
              Watchers:
              24
