• Bundle anonymization

      For sites with stringent security policies, there should be an option when generating a support bundle (or perhaps just a global setting applicable also to auto-generated bundles) that would search for mentions in all files of labels created by the customer which might reflect proprietary processes: job, folder, view, slave, and template names, slave labels, etc.

      The plugin would gather a list of all such labels, create randomized tokens, and produce a mapping so that a job AppBuild becomes Job_ayrzw. For labels with spaces or other special characters, which could have triggered bugs, the mapping should preserve those characters, so App → Build should become Job_ayrzw → X, and the mapping should also include encoded variants such as App%20%E2%86%92%20Build to Job_ayrzw%20%E2%86%92%20X and App%20%e2%86%92%20Build to Job_ayrzw%20%e2%86%92%20X.

      Then these substitutions would be applied to all files included in the support bundle, particularly log files and thread dumps.
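
The mapping and substitution steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the plugin's implementation: the class `LabelAnonymizer`, its methods, and the `Job` prefix are hypothetical, tokens are plain random letters, and `URLEncoder` (form encoding with `+` rewritten to `%20`) is used as an approximation of how labels appear in URLs. It assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Support Core plugin.
final class LabelAnonymizer {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");
    private static final Pattern ESCAPE = Pattern.compile("%[0-9A-F]{2}");

    private final Map<String, String> mapping = new LinkedHashMap<>();
    private final Random random;

    LabelAnonymizer(Random random) {
        this.random = random;
    }

    /** Registers a label together with its percent-encoded variants. */
    void register(String prefix, String label) {
        if (mapping.containsKey(label)) {
            return; // keep the existing token so the mapping stays stable
        }
        // Replace each run of letters/digits with random letters; separators are kept.
        Matcher m = WORD.matcher(label);
        StringBuffer buf = new StringBuffer();
        boolean first = true;
        while (m.find()) {
            String piece = (first ? prefix + "_" : "") + randomWord();
            m.appendReplacement(buf, Matcher.quoteReplacement(piece));
            first = false;
        }
        m.appendTail(buf);
        String token = buf.toString();
        String encodedLabel = encode(label);
        String encodedToken = encode(token);
        mapping.put(label, token);
        mapping.put(encodedLabel, encodedToken);                     // upper-case hex
        mapping.put(lowerHex(encodedLabel), lowerHex(encodedToken)); // lower-case hex
    }

    /** Applies all substitutions in a single pass, longest key first. */
    String apply(String text) {
        if (mapping.isEmpty()) {
            return text;
        }
        List<String> keys = new ArrayList<>(mapping.keySet());
        keys.sort(Comparator.<String>comparingInt(String::length).reversed());
        List<String> quoted = new ArrayList<>();
        for (String key : keys) {
            quoted.add(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(String.join("|", quoted)).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(mapping.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String encode(String s) {
        // URLEncoder emits '+' for spaces; URLs in logs usually show %20 instead.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    private static String lowerHex(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, m.group().toLowerCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Random collisions between tokens are unlikely but not ruled out in this sketch.
    private String randomWord() {
        StringBuilder sb = new StringBuilder(5);
        for (int i = 0; i < 5; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

Replacing in a single pass with the longest keys first is a deliberate choice in this sketch: it keeps a short label such as App from clobbering part of a longer one such as AppBuild, and it prevents a freshly inserted token from being rewritten again by a later substitution.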

      It is impossible to guarantee that customer text does not appear in some unusual context, e.g. an exception quoting a syntactically incorrect Groovy script, but these substitutions would sanitize the great majority of what the support bundle produces, and make it feasible for the customer to do a final inspection without needing to do much or any manual editing.

          [JENKINS-21670] Option to anonymize customer labels

          Arnaud Héritier added a comment -

          IPs and network settings also have to be masked.

          Minudika Malshan added a comment -

          Please help me clarify the following:

          1) How can the plugin find and keep track of the labels created by the customer?
          2) What is the purpose of creating randomized tokens, producing a mapping, and applying the substitution?

          Arnaud Héritier added a comment -

          Hi minudika

          From my point of view (though I hope many others will comment), the bundle generation form should offer a set of new options for choosing which kinds of information to anonymize, with everything checked by default. These kinds of information could include URLs, IPs, ...
          Based on these settings, we should find every matching entry in the bundle and replace each distinct entry with a unique value. This is what jglick is explaining.
          Within a bundle it is critical to always replace the same entry with the same value, so that the relationships across files remain understandable.
          Currently we don't allow exporting job configuration files or build information ( JENKINS-30468 ), but I hope that one day we will, and in that case we will have to use the same mechanism.

          Jesse Glick added a comment -

          By the way I would suggest using something like this library instead of unreadable tokens. Easier for humans to remember and match.
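
A minimal sketch of the idea of readable aliases, combined with the earlier requirement that the same entry always maps to the same value. The class `ReadableAliases` and its two tiny word lists are hypothetical stand-ins; this does not use the API of the library mentioned above, which would supply a much larger vocabulary.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for illustration; the word lists are placeholders.
final class ReadableAliases {
    private static final String[] ADJECTIVES = {"brave", "calm", "eager", "gentle", "quick"};
    private static final String[] NOUNS = {"falcon", "maple", "otter", "river", "stone"};

    private final Map<String, String> aliases = new HashMap<>();
    private final Set<String> used = new HashSet<>();
    private final Random random;

    ReadableAliases(Random random) {
        this.random = random;
    }

    /** Returns the alias for a name, creating one on first use. */
    String aliasFor(String original) {
        String existing = aliases.get(original);
        if (existing != null) {
            return existing; // same entry, same value, in every file of the bundle
        }
        String base = ADJECTIVES[random.nextInt(ADJECTIVES.length)]
                + "_" + NOUNS[random.nextInt(NOUNS.length)];
        String candidate = base;
        int suffix = 2;
        // Append a counter when the word pair is already taken by another name.
        while (!used.add(candidate)) {
            candidate = base + "_" + suffix++;
        }
        aliases.put(original, candidate);
        return candidate;
    }
}
```

Keeping the map for the lifetime of one bundle generation is what lets a reader correlate a job mentioned in a thread dump with the same job in a log file, without ever seeing its real name.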

          Arnaud Héritier added a comment -

          +1 with jglick

          Arnaud Héritier added a comment -

          Here is more feedback on what could be considered “Sensitive/Non-Public”:

          1/ System and network information: processes, accounts, IPs, hostnames, and everything related to the hosting system and its network configuration.
          -> Current status: This information is provided by several bundle options that can already be deactivated (Environment variables, File descriptors (Unix only), Networking Interface, Root CAs, System configuration (Linux only), System properties), but there are probably other places where it should be anonymized.

          2/ All kinds of credentials
          -> Current status: Never stored in bundles. Should never appear in Jenkins logs (but ...). Should never appear in job logs (but ... they can; there are some known issues), though for now no job information or logs are provided in bundles.

          3/ All kinds of audit logs showing who did what
          -> Current status: The audit plugin logs should not be bundled if that plugin is installed (I don't know it very well, so I need to investigate).

          4/ User information (name, email, login, ...)
          -> Provided by the option "About user (basic authentication details only)", but we will probably have to anonymize it in more locations.
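
For item 1/, masking IP addresses outside the dedicated bundle components could look roughly like the sketch below. It is an assumption-laden illustration only: the class `IpMasker` and the `ip_N` alias format are hypothetical, only IPv4 is handled, and any dotted quad of valid octets (including a four-part version number) would be treated as an address.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; IPv4 only.
final class IpMasker {
    private static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";
    // The lookarounds avoid matching inside longer digit or dot sequences.
    private static final Pattern IPV4 = Pattern.compile(
            "(?<![0-9])(?<![0-9]\\.)" + OCTET + "(?:\\." + OCTET + "){3}(?![0-9]|\\.[0-9])");

    private final Map<String, String> aliases = new LinkedHashMap<>();

    /** Replaces each distinct address with a stable alias such as ip_1. */
    String mask(String text) {
        Matcher m = IPV4.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ip = m.group();
            String alias = aliases.get(ip);
            if (alias == null) {
                alias = "ip_" + (aliases.size() + 1);
                aliases.put(ip, alias);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(alias));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, `mask("agent 10.0.0.5 -> 192.168.1.20, retry 10.0.0.5.")` yields `agent ip_1 -> ip_2, retry ip_1.`, so repeated occurrences of one address remain recognizable as the same host.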

          Minudika Malshan added a comment -

          aheritier jglick Since entities like job names do not have a specific format the way network addresses do, how do we identify a string as a name that needs to be anonymized?

          Arnaud Héritier added a comment -

          Really good question cc schristou
          Since we are running inside Jenkins, I think the best approach is to use its APIs to find all kinds of jobs (Jenkins.instance.getAllItems...) and then create a map to replace each name.
          We probably need to do the same for views, users/accounts, ....
          After that, we need a filtering mechanism that replaces all original names with the "protected" value when we collect logs, config files, ...

          Minudika Malshan added a comment - - edited

          aheritier What did you mean by "views"? Jenkins.getInstance().getViewActions() ?
          Also, could you please tell me whether there is a way to get user/account information through the API?

          jglick Is the random word generator at https://github.com/kohsuke/wordnet-random-name available as a Maven dependency, or do we have to add that jar or its classes to the Support Core plugin project manually?
          Thanks a lot!

          Sam Gleske added a comment -

          User-defined list of expressions to search and replace can help with job names that have no convention. An enterprise may have keywords which they don't want leaked.
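
One way such a user-defined list might work is sketched below. Everything here is an assumption made for illustration: the class `KeywordRedactor`, the `keyword_N` alias format, and the choice to treat each entry as a literal, ASCII-case-insensitive keyword instead of a regular expression.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; entries are literal keywords, not regexes.
final class KeywordRedactor {
    private final Map<String, String> aliases = new HashMap<>();
    private final Pattern pattern;

    KeywordRedactor(List<String> keywords) {
        List<String> unique = new ArrayList<>();
        for (String keyword : keywords) {
            String key = keyword.toLowerCase(Locale.ROOT);
            if (key.isEmpty() || aliases.containsKey(key)) {
                continue; // skip blanks and case-insensitive duplicates
            }
            aliases.put(key, "keyword_" + (aliases.size() + 1));
            unique.add(keyword);
        }
        // Longest keyword first, so a phrase wins over a word it contains.
        unique.sort(Comparator.<String>comparingInt(String::length).reversed());
        List<String> quoted = new ArrayList<>();
        for (String keyword : unique) {
            quoted.add(Pattern.quote(keyword));
        }
        pattern = quoted.isEmpty()
                ? null
                : Pattern.compile(String.join("|", quoted), Pattern.CASE_INSENSITIVE);
    }

    /** Replaces every occurrence of a configured keyword with its alias. */
    String redact(String text) {
        if (pattern == null) {
            return text;
        }
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group().toLowerCase(Locale.ROOT);
            String alias = aliases.getOrDefault(key, "keyword_redacted");
            m.appendReplacement(out, Matcher.quoteReplacement(alias));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, a redactor built from the keywords "Project Phoenix" and "Acme" turns `Deploying ACME build for project phoenix on acme-ci` into `Deploying keyword_2 build for keyword_1 on keyword_2-ci`.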

          Devin Nusbaum added a comment -

          To make sure that users understand that this feature is not guaranteed to anonymize all uses of confidential information, we should add a warning immediately before the "Generate Bundle" button that explains that the anonymization and encrypted secret masking are best-effort and that users should double-check and redact any confidential information before sending the bundle to a third party. We should also change the description for configuration file components from

          ... (Encrypted secrets are redacted)

          to

          ... (Encrypted secrets are redacted. See the <a href=#warning>warning</a> for details)

          Matt Sicker added a comment -

          Here's a proposed admin console for this feature so far:

          Jesse Glick added a comment -

          PR 144 seems to be the current link.

          Devin Nusbaum added a comment -

          Released in Support Core 2.48. Thanks jvz!

          Matt Sicker added a comment -

          Updated documentation in wiki to reflect released feature.

          Matt Sicker added a comment -

          Feature is released now, marking this as closed.

            jvz Matt Sicker
            jglick Jesse Glick