JENKINS-64490

ThinBackup include/exclude regex doesn't work

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: thinbackup-plugin
    • Labels: None
    • Environment: Jenkins 2.263.1, ThinBackup 1.10

      Excluding .*xml still backs up some, but not all, XML files.
      Including .*logs.* backs up neither the logs directory nor any file with "logs" in its name.

      robot_acct@jenkins-controller:~/backup/FULL-2020-12-21_05-04$ tree .
      .
      ├── installedPlugins.xml
      └── jobs
          ├── adder
          │   └── config.xml
          └── subtractor
              └── config.xml 

       


          Calvin Park added a comment (edited)

          I've read through the code and understand the problem now.

          The plugin uses org.apache.commons.io.FileUtils.copyDirectory, which works on java.io.File, and passes it a filter built from the regex the user supplies. The problem is that copyDirectory does not apply that filter the way one would expect.

           

          Suppose this is our source directory:

          /var/jenkins_home/
              log/
                  hudson.plugins.thinbackup.xml
                  org.jvnet.hudson.plugins.thinbackup.xml
          
              logs/
                  slaves/
                      slave0/slave.log
                      slave1/slave.log
                  tasks/
                      'Fingerprint cleanup.log'
                      'telemetry collection.log'

          When I provide the regex (.*\/)?logs\/.*, I expect it to be compared against the absolute path of each file (e.g., /var/jenkins_home/logs/slaves/slave0/slave.log, which would match the regex). That is not at all what copyDirectory does.
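          A quick way to see the mismatch is to test the same regex once against whole paths and once against the bare name logs. This is a minimal sketch using only java.util.regex; the class and method names are hypothetical, and it assumes whole-input matching (Matcher.matches()), which is an assumption about how the plugin applies the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo: one regex tested against whole paths and against a bare name.
final class PathVsNameDemo {
    // Java's regex engine accepts "\/" as an escaped slash, so the regex compiles as typed.
    static final String USER_REGEX = "(.*\\/)?logs\\/.*";

    // Requires the whole input to match (Matcher.matches(), not find()).
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Absolute path: what the user has in mind.
        System.out.println(fullMatch(USER_REGEX, "/var/jenkins_home/logs/slaves/slave0/slave.log")); // true
        // Path relative to JENKINS_HOME: also matches, because "(.*\/)?" is optional.
        System.out.println(fullMatch(USER_REGEX, "logs/slaves/slave0/slave.log")); // true
        // Bare directory name: no trailing "/", so no match.
        System.out.println(fullMatch(USER_REGEX, "logs")); // false
    }
}
```

          The regex itself is fine for paths; it only fails once it is applied to a single name such as logs.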

           

          copyDirectory first uses java.io.File to list the top-level items and saves them to a list.

          evalList = ['log', 'logs']

          Then it compares each item in the list against the filter. Since neither name matches the regex (.*\/)?logs\/.*, nothing is copied.
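          The first step can be reproduced with plain java.io.File.listFiles(FileFilter). This is a sketch, not ThinBackup's code: it assumes the plugin's filter tests the regex against File.getName() only, and nameRegexFilter, acceptedTopLevelNames and makeSampleHome are hypothetical helpers.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class TopLevelFilterDemo {
    // Hypothetical filter: tests the regex against the last path component only.
    static FileFilter nameRegexFilter(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return file -> pattern.matcher(file.getName()).matches();
    }

    // Lists the top-level entries of dir that the filter accepts, sorted by name.
    static List<String> acceptedTopLevelNames(File dir, String regex) {
        File[] accepted = dir.listFiles(nameRegexFilter(regex));
        List<String> names = new ArrayList<>();
        if (accepted != null) {
            for (File f : accepted) {
                names.add(f.getName());
            }
        }
        Collections.sort(names); // listFiles() guarantees no particular order
        return names;
    }

    // Creates a throwaway directory holding empty log/ and logs/ subdirectories.
    static File makeSampleHome() {
        try {
            Path home = Files.createTempDirectory("jenkins_home");
            Files.createDirectories(home.resolve("log"));
            Files.createDirectories(home.resolve("logs"));
            return home.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File home = makeSampleHome();
        System.out.println(acceptedTopLevelNames(home, "(.*\\/)?logs\\/.*")); // []
        System.out.println(acceptedTopLevelNames(home, "logs.*"));           // [logs]
    }
}
```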

           

          Now suppose we set the regex to logs.* in the hope of capturing everything inside the logs directory.

          copyDirectory first lists the top-level items and saves them to a list.

          evalList = ['log', 'logs']

          logs matches the regex, so it is saved to a "matched" list. log does not match, so it is not.

          matchedList = ['logs'] 

          Since logs is a directory, the names of its children are saved to evalList.

          evalList = ['slaves', 'tasks'] 

          Neither name matches the regex logs.*, so neither is added to matchedList. Since logs is a directory but none of its children matched, it is treated as an empty directory and is not copied.

          As a result, nothing is copied.
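          The walk-through above can be modelled in memory to check which regexes copy what. This is a hypothetical sketch of the behaviour as described in this comment (every entry tested by name, matched directories entered, a directory with nothing copied beneath it dropped); it is not Commons IO's implementation, and Node, copiedFiles and copy are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class NameFilteredCopyDemo {
    static final class Node {
        final String name;
        final List<Node> children; // null means "regular file"

        private Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }

        static Node file(String name) { return new Node(name, null); }

        static Node dir(String name, Node... kids) { return new Node(name, Arrays.asList(kids)); }
    }

    // Returns the relative paths of the files that would be copied from dir.
    static List<String> copiedFiles(Node dir, Pattern pattern, String prefix) {
        List<String> copied = new ArrayList<>();
        for (Node child : dir.children) {
            if (!pattern.matcher(child.name).matches()) {
                continue; // only the name is tested; the parent path never is
            }
            String path = prefix + child.name;
            if (child.children == null) {
                copied.add(path);
            } else {
                // A matched directory contributes only what matches inside it;
                // if nothing does, it counts as empty and is left out.
                copied.addAll(copiedFiles(child, pattern, path + "/"));
            }
        }
        return copied;
    }

    // The sample /var/jenkins_home tree from this comment.
    static Node sampleHome() {
        return Node.dir("jenkins_home",
            Node.dir("log",
                Node.file("hudson.plugins.thinbackup.xml"),
                Node.file("org.jvnet.hudson.plugins.thinbackup.xml")),
            Node.dir("logs",
                Node.dir("slaves",
                    Node.dir("slave0", Node.file("slave.log")),
                    Node.dir("slave1", Node.file("slave.log"))),
                Node.dir("tasks",
                    Node.file("Fingerprint cleanup.log"),
                    Node.file("telemetry collection.log"))));
    }

    static List<String> copy(String regex) {
        return copiedFiles(sampleHome(), Pattern.compile(regex), "");
    }

    public static void main(String[] args) {
        System.out.println(copy("(.*\\/)?logs\\/.*")); // []
        System.out.println(copy("logs.*"));            // []
        System.out.println(copy("logs|slave.*|tasks|.*\\.log"));
    }
}
```

          In this model, the last call returns the four .log files under logs/, because every directory on the way (logs, slaves, slave0, slave1, tasks) is named in the pattern.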

           

          To match all files in the logs directory, the regex has to be logs|slave.*|tasks|.*\.log. Besides being unintuitive, this means a single pattern must list every directory name along the way as well as the file patterns. At that point the regex becomes too broad and starts including unintended files.

          I've read through the method signatures in FileUtils, but every variant I found applies the filter to each file name. I think the only way to make it work the way I, at least, find intuitive would be to move away from the Apache Commons library, which would be a huge task. In fact, it is java.io.File underneath that applies the filter this way, so I don't think file filters can be used to match the regex against the absolute (or even relative) path of a file.
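          For comparison, the matching rule the reporter finds intuitive (regex against the path relative to JENKINS_HOME) can be sketched with java.nio.file.Files.walk instead of FileUtils.copyDirectory. This only selects files and copies nothing; it is not ThinBackup code, and RelativePathSelector, select and sampleHome are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RelativePathSelector {
    // Returns regular files whose path relative to root fully matches regex, sorted.
    static List<String> select(Path root, String regex) {
        Pattern pattern = Pattern.compile(regex);
        String separator = root.getFileSystem().getSeparator();
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                // Normalise separators to "/" so one regex works on every OS.
                .map(p -> root.relativize(p).toString().replace(separator, "/"))
                .filter(rel -> pattern.matcher(rel).matches())
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small throwaway tree shaped like the example above.
    static Path sampleHome() {
        try {
            Path home = Files.createTempDirectory("jenkins_home");
            for (String rel : new String[] {
                    "log/hudson.plugins.thinbackup.xml",
                    "logs/slaves/slave0/slave.log",
                    "logs/tasks/Fingerprint cleanup.log"}) {
                Path file = home.resolve(rel);
                Files.createDirectories(file.getParent());
                Files.createFile(file);
            }
            return home;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The original regex now selects exactly the files under logs/.
        System.out.println(select(sampleHome(), "(.*\\/)?logs\\/.*"));
        // [logs/slaves/slave0/slave.log, logs/tasks/Fingerprint cleanup.log]
    }
}
```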


          Pasquale added a comment (edited)

          I have the same problem, and I fixed it by using org.apache.tools.ant.DirectoryScanner instead of org.apache.commons.io.FileUtils.

          With this library, users have to write Ant-style patterns (e.g. **/*.xml) instead of regular expressions.
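          To get a feel for path-style patterns without pulling in Ant, the JDK's own glob matcher can serve as a rough stand-in. This is a swapped-in technique, not org.apache.tools.ant.DirectoryScanner, and GlobDemo and globMatches are hypothetical names; patterns are matched against paths relative to the backup root.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustration only: java.nio "glob:" syntax as a stand-in for Ant patterns.
final class GlobDemo {
    static boolean globMatches(String glob, String relativePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        // "**" crosses directory boundaries, so a pattern can describe a whole path.
        System.out.println(globMatches("**/*.xml", "jobs/adder/config.xml"));       // true
        System.out.println(globMatches("logs/**", "logs/slaves/slave0/slave.log")); // true
        // NIO's "**/" needs at least one directory level, so a top-level file is missed.
        System.out.println(globMatches("**/*.xml", "installedPlugins.xml"));        // false
    }
}
```

          One difference worth knowing: in Ant, a ** directory segment also matches zero directories, so **/*.xml would pick up a top-level installedPlugins.xml as well, whereas the NIO glob above does not.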

          If you think that this could be a good solution, I can open a pull request soon.


          Calvin Park added a comment

          I think it'll be a good solution. It changes the usage model and may break existing configurations, so we'll have to see whether the maintainers approve it, but I for one will use your PR.


            Assignee: Thomas Fürer (tofuatjava)
            Reporter: Calvin Park (calvinpark)
            Votes: 1
            Watchers: 4