Jenkins / JENKINS-72776

Jenkins cannot handle the failure caused by a timeout waiting for SMB credits when running on Linux kernel v5.12 or later


    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Critical
    • Component/s: cifs-plugin, core
    • Labels: None
    • Environment: Jenkins LTS versions higher than 2.426.2, deployed in AKS 1.26, 1.27, or 1.28

      Hey teams,

       

      We deployed the Jenkins LTS Docker image on a legacy Microsoft AKS version, AKS 1.24 (Linux kernel LTS version v5.4), and everything worked fine until July 2023.

      After Microsoft moved AKS 1.26/1.27/1.28 to a new Linux kernel LTS version higher than v5.15, we found that the latest Jenkins always halts on these clusters. The only way to recover is to manually restart the deployed pod in AKS.

       

      We raised a ticket with Microsoft to debug this SMB issue further and received the following comment:

      The SMB protocol uses credits as a mechanism to limit and control the number of outstanding requests on a connection at any time. The server gives out credits to the client, which the client can consume to send more requests.  In this scenario, the application workload is generating many parallel file operations on the mount point causing SMB credits utilization to hit the limit, making the SMB client wait for outstanding IO's to complete, and after a 60-second request timeout, return an error to the application.

      From kernel version v5.12 there was a change in Linux SMB client error code returned by the SMB client in this scenario, from ENOTSUPP to EBUSY in case of request timeout due to credits exhaustion. Change in the error code was to return a more logical and correct error number during this situation. i.e., from a non-standard "Operation not supported" (ENOTSUPP) (ENOTSUPP is not something that most libraries/applications recognize) to a more standard "Device or resource busy" (EBUSY).
      We suspect and it is possible here that the customer application is behaving differently for these two errors. i.e., retrying the file I/O when it gets ENOTSUPP vs stopping abruptly when it gets EBUSY. We recommend that the customer investigate the application on how it handles the two errors for file operations differently.
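
      To make the credit mechanism described in Microsoft's comment concrete, here is a toy model of it. This is only a sketch: `CreditPool`, the credit count of 2, and the 0.05-second timeout are hypothetical stand-ins chosen for illustration (the real client waits 60 seconds), and the code does not talk to any SMB server.

```python
import threading


class CreditPool:
    """Toy model of SMB-style credits: each in-flight request holds one credit."""

    def __init__(self, credits: int) -> None:
        self._sem = threading.BoundedSemaphore(credits)

    def send(self, timeout: float) -> bool:
        """Try to take a credit; False means the wait timed out (exhaustion)."""
        return self._sem.acquire(timeout=timeout)

    def complete(self) -> None:
        """The server answered, so the credit goes back to the pool."""
        self._sem.release()


pool = CreditPool(credits=2)
first = pool.send(timeout=0.05)    # takes credit 1
second = pool.send(timeout=0.05)   # takes credit 2
third = pool.send(timeout=0.05)    # no credits left: the wait times out
pool.complete()                    # one outstanding request finishes
fourth = pool.send(timeout=0.05)   # a credit is available again
print(first, second, third, fourth)  # True True False True
```

      The third request corresponds to the situation in this ticket: the client runs out of credits, waits, and finally reports an error to the application.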

       

      The kernel change can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7de0394801da4f759684c4a33cf62f12da6e447d
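
      For illustration, this is roughly what treating both error codes the same way could look like in an application. It is a minimal sketch and is not taken from Jenkins or the cifs-plugin: `retry_io`, `flaky_write`, the attempt count, and the backoff delay are all hypothetical. `ENOTSUPP` (524) is a kernel-internal code that Python's `errno` module does not define, so it is declared by hand here.

```python
import errno
import time

# ENOTSUPP (524) is a Linux kernel-internal error code that Python's errno
# module does not define, so it is declared here for illustration.
ENOTSUPP = 524

# Treat the pre-v5.12 and post-v5.12 error codes as equally retryable.
RETRYABLE = {errno.EBUSY, ENOTSUPP}


def retry_io(operation, attempts=3, delay=0.01):
    """Call operation(); retry on either error code, re-raise anything else."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff


# Hypothetical flaky operation: fails twice with EBUSY, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError(errno.EBUSY, "Device or resource busy")
    return "written"


result = retry_io(flaky_write)
print(result, calls["count"])  # written 3
```

      Whether retrying is appropriate depends on the workload; reducing the number of parallel file operations on the mount point addresses the credit exhaustion itself.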

       

      Are you aware of the impact of this kernel change on applications that use SMB, such as Jenkins, now that the error returned to the application is EBUSY instead of ENOTSUPP?

       

      Please let me know if you need more information about this SMB issue, which may be affecting Jenkins now.

      If this is indeed a Jenkins issue, please share a timeline for when a fix will be available in the Jenkins LTS Docker image. Thank you very much.

            Assignee: Unassigned
            Reporter: Jay (jayclover2024)