Infrastructure / INFRA-1436

Maven indexing is no longer finished every 15 minutes


      Description

      Maven indexing started taking much longer again this week. This is a problem for delivering security fixes, as our update site creation process relies on Artifactory's Maven index.

          Activity

          danielbeck Daniel Beck created issue -
          kohsuke Kohsuke Kawaguchi made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]
          kohsuke Kohsuke Kawaguchi added a comment -

          Email sent to JFrog support team.

          danielbeck Daniel Beck added a comment -

          According to JFrog support,

          In your request log we saw direst the api was called without specifying any repos. Note that indexing is resource intensive. Calculating and indexing for a repository may be a resource-intensive operation, especially for a large local repository or if the repository is a virtual one containing other underlying repositories.

          Therefore, we recommend that you do not include repositories that do not require indexing for a periodic index calculation. Since you poke the indexing API every 15 minutes against all your repo. If your repo has increased in size to a point where each indexing requests took more than 15 mins, you may see the 500 error. Can you refine your choice of repo in your scheduled api call?


          I don't see the unrestricted indexing requests they're referring to. There are requests like this one:

          20171216011801|285|REQUEST|(IP address)|(user name)|POST|/api/maven|HTTP/1.1|200|0

          The request is done by this command, explicitly specifying a single repository:

          $ curl --fail -X POST -H 'Content-Length: 0' -u $USERNAME:$PASSWORD "https://repo.jenkins-ci.org/api/maven?repos=releases&force=1"
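To make the restriction to explicit repositories harder to lose by accident, the request URL can be built by a small helper that refuses an empty repository list. This is only a sketch: the function name `maven_index_url` is hypothetical, and it assumes the endpoint accepts a comma-separated `repos` value in the same way it accepts the single `releases` value used above.

```shell
# Build the indexing URL for an explicit, comma-separated repository list.
# Hypothetical helper; refuses an empty list so that an unrestricted
# indexing request cannot be sent by mistake.
maven_index_url() {
  base=$1
  repos=$2
  if [ -z "$repos" ]; then
    echo "refusing to request indexing without an explicit repo list" >&2
    return 1
  fi
  # ${base%/} drops one trailing slash, if present.
  printf '%s/api/maven?repos=%s&force=1\n' "${base%/}" "$repos"
}

# Usage, with the same credentials as the command above:
# curl --fail -X POST -H 'Content-Length: 0' -u "$USERNAME:$PASSWORD" \
#   "$(maven_index_url https://repo.jenkins-ci.org releases)"
```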

          The Artifactory log confirms that it only indexes one actual local repo:

          2017-12-16 01:18:01,321 [http-nio-8083-exec-248] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:115) - Activating indexer for repo '[releases]' manually
          2017-12-16 01:18:01,326 [art-exec-108213] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:136) - Starting Maven indexing
          2017-12-16 01:18:01,326 [art-exec-108213] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:157) - Starting non virtual repositories indexing...
          2017-12-16 01:18:01,327 [art-exec-108213] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:159) - Non virtual repositories to index: [releases]

          Notably, this restriction to a single local repo is a recent optimization attempt (from Wednesday/Thursday) that I implemented in the hope of speeding up indexing. Before Wednesday, we requested indexing of multiple explicitly specified repositories, and had done so for several months while indexing finished much more quickly.

          So, to recap: for several months, until Monday or Tuesday this week, indexing of multiple repos finished in ~15 minutes. Since Tuesday or Wednesday this week, indexing of a single local repo takes 1.5 hrs. Nothing changed on our side; in fact, we tried to speed it up.

          danielbeck Daniel Beck added a comment -

          Age of index is consistently ~1.5 hrs:

          $ while true ; do curl https://repo.jenkins-ci.org/api/storage/releases/.index/nexus-maven-repository-index.gz 2>/dev/null | jq '.lastModified' ; sleep 60 ; done
          "2017-12-16T01:16:59.072Z"
          (repeated)
          "2017-12-16T01:16:59.072Z"
          "2017-12-16T02:43:20.784Z"
          (repeated)
          "2017-12-16T02:43:20.784Z"
          "2017-12-16T04:09:30.886Z"
          (repeated)
          "2017-12-16T04:09:30.886Z"
          "2017-12-16T05:38:39.587Z"
          (repeated)
          "2017-12-16T05:38:39.587Z"
          "2017-12-16T07:03:57.746Z"
          (repeated)
          "2017-12-16T07:03:57.746Z"
          "2017-12-16T08:33:10.841Z"
          (repeated)
          "2017-12-16T08:33:10.841Z"
          "2017-12-16T09:59:01.961Z"
          (repeated)
          "2017-12-16T09:59:01.961Z"
          "2017-12-16T11:10:56.455Z"
          (repeated)
          "2017-12-16T11:10:56.455Z"
          "2017-12-16T12:29:33.815Z"
          (repeated)
          "2017-12-16T12:29:33.815Z"
          

          On Monday afternoon UTC, as also seen by Wadeck Follonier when I performed the script-security release, the interval was 15 minutes. On Wednesday late evening UTC, it was 1.5 hrs. We didn't change anything in between. The Thursday morning change to only index the releases repo was ineffective.
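The gaps between index updates can be computed from the polling output instead of being read off by eye. A minimal sketch, assuming GNU `date` (for `-d`) and input in the format `jq` prints above; the function name `index_intervals` is hypothetical.

```shell
# Read lastModified values from stdin (one per line, optionally quoted as
# jq prints them) and print the gap in whole minutes between consecutive
# distinct timestamps. Assumes GNU date for the -d option.
index_intervals() {
  prev_epoch=
  prev_raw=
  while IFS= read -r raw; do
    raw=$(printf '%s' "$raw" | tr -d '"')
    # Skip anything that does not look like a timestamp, e.g. blank lines.
    case $raw in
      [0-9][0-9][0-9][0-9]-*) ;;
      *) continue ;;
    esac
    # Skip repeats of the value already seen.
    [ "$raw" = "$prev_raw" ] && continue
    # "2017-12-16T01:16:59.072Z" -> "2017-12-16 01:16:59" (fraction dropped)
    ts=$(printf '%s' "$raw" | sed -e 's/T/ /' -e 's/\..*$//' -e 's/Z$//')
    epoch=$(date -u -d "$ts" +%s) || return 1
    if [ -n "$prev_epoch" ]; then
      echo "$(( (epoch - prev_epoch) / 60 ))"
    fi
    prev_epoch=$epoch
    prev_raw=$raw
  done
}
```

Piping the output of the polling loop into `index_intervals` prints one number per index update; for the first two distinct timestamps listed above it prints 86.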

          danielbeck Daniel Beck added a comment -

          I responded to JFrog with the above, let's see whether I get a response.

          danielbeck Daniel Beck added a comment -

          It's getting worse:

          2017-12-19 19:48:01,311 [art-exec-226769] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:136) - Starting Maven indexing
          2017-12-19 19:48:01,312 [art-exec-226769] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:157) - Starting non virtual repositories indexing...
          2017-12-19 19:48:01,312 [art-exec-226769] [INFO ] (o.a.m.i.MavenIndexerServiceImpl:159) - Non virtual repositories to index: [releases]
          2017-12-19 21:57:02,409 [art-exec-226769] [INFO ] (o.a.m.i.MavenIndexManager:276) - Successfully saved index file 'releases:.index/nexus-maven-repository-index.gz' and index info 'releases:.index/nexus-maven-repository-index.properties'.
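The duration of this run follows from the first and last log timestamps. A small sketch for doing that arithmetic, assuming GNU `date` and that both timestamps are in the same time zone with no clock change in between; the function names are hypothetical.

```shell
# Seconds elapsed between two Artifactory log timestamps of the form
# "YYYY-MM-DD HH:MM:SS,mmm". Milliseconds are dropped. Assumes GNU date.
log_elapsed_seconds() {
  start=$(date -u -d "${1%,*}" +%s) || return 1
  end=$(date -u -d "${2%,*}" +%s) || return 1
  echo "$(( end - start ))"
}

# Format a number of seconds as hours and minutes, e.g. 2h09m.
format_hm() {
  printf '%dh%02dm\n' "$(( $1 / 3600 ))" "$(( $1 % 3600 / 60 ))"
}

# Usage with the "Starting Maven indexing" and "Successfully saved index
# file" timestamps from the log above:
# format_hm "$(log_elapsed_seconds '2017-12-19 19:48:01,311' '2017-12-19 21:57:02,409')"
```

For the run above this gives 7741 seconds, or 2h09m, up from the ~1.5 hrs seen on Dec 16.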
          
          danielbeck Daniel Beck added a comment -

          Response from JFrog:

          Hi Daniel and Kohsuke,

          Thank you for the update, we have escalated to our R&D team to investigate your issue further. Thank you for your patience.

          danielbeck Daniel Beck made changes -
          Assignee Kohsuke Kawaguchi [ kohsuke ] Daniel Beck [ danielbeck ]
          danielbeck Daniel Beck added a comment -

          We're down to about 30 minutes again, but that's still not great.

          rtyler R. Tyler Croy added a comment -

          I thought I saw some emails about this being fixed?

          danielbeck Daniel Beck added a comment -

          Sort of. JFrog informed us they have no idea what happened in December: http://lists.jenkins-ci.org/pipermail/jenkins-infra/2018-February/001386.html

          That said, I informed them that since Jan 24, the indexing is fairly stable at 10-15 minutes (better than in November): http://lists.jenkins-ci.org/pipermail/jenkins-infra/2018-January/001379.html

          I didn't monitor the times while I was at FOSDEM (misconfigured script), but other than Feb 8 with times up at 2 hours, there were no notable outliers since Jan 26.

          This seems good enough for now.

          danielbeck Daniel Beck made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Resolved [ 5 ]
          danielbeck Daniel Beck made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            Assignee:
            danielbeck Daniel Beck
            Reporter:
            danielbeck Daniel Beck
            Votes:
            0
            Watchers:
            2
