Type: Bug
Resolution: Fixed
Priority: Blocker
Environment: Windows Server 2008 R2 (a VM); 32GB RAM; X5765 @ 3.07 GHz (6 processors)
Jenkins versions: 1.554.3; 1.565.1
The symptom is:
When I download a file, e.g. one whose size should be 472MB, for the first few seconds the progress shows 46.8/472MB and keeps changing with the size. Then suddenly the size becomes 169/169MB and the download finishes without any error.
We see this issue in IE, Chrome, and Firefox. Clearing the download history resolves it for a while, but the same issue may then show up for other downloads in other jobs (we have several jobs with the same artifact name).
We run Jenkins with:
<executable>C:\Program Files\Java\jre7\bin\java.exe</executable>
<arguments>-d64 -Xrs -XX:MaxPermSize=2g -Xmx16g -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=8080</arguments>
[JENKINS-23868] Artifacts downloaded with IE/Chrome/Firefox have the wrong size but no error is reported
What happens when you download the artifact using different tools, e.g. another browser, or command-line tools like wget or curl? (They may need authentication if Jenkins is secured, but that's possible.)
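For example, something along these lines would do (the host name, job path, and credentials below are placeholders):
wget --user=USER --password=APITOKEN -O your-artifact.exe http://jenkins.example.com/job/myjob/lastSuccessfulBuild/artifact/your-artifact.exe
curl -u USER:APITOKEN -o your-artifact.exe http://jenkins.example.com/job/myjob/lastSuccessfulBuild/artifact/your-artifact.exe
Comparing the size of the resulting file against the expected size would show whether the truncation also happens outside the browser.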
It looks the same in IE/Chrome/Firefox: the download is incomplete but there is no error message.
When I downloaded with wget, I once saw the download just hang with no transfer speed. But if I kill it and download again, it works.
We are seeing it again. The issue is seen often when the server is in the US and we download from the Shanghai office; we do not see it often when downloading from within the US.
You could copy the file from Jenkins to another http server in the U.S. (preferably on the same machine), then attempt the download from the http server (instead of Jenkins). That would let you confirm that the problem is more related to network transfer than to Jenkins being involved in the transfer.
Alternatively, if the receiving machine is a Linux machine, you could copy the file to a nearby Linux server in the U.S., then use rsync to download it. The rsync protocol will restart and only transfer the missing portions of the file.
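A minimal sketch of the rsync approach, assuming the artifact has already been copied to a Linux host (the host name and path are placeholders):
rsync --partial --progress user@us-mirror.example.com:/srv/artifacts/your-artifact.exe .
With --partial, an interrupted transfer is kept on disk, so re-running the same command only transfers the missing portions.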
I copied the Jenkins artifacts to another HTTP server on the same machine as our Jenkins server. I tried several downloads; none of them failed or hung.
However, at the same time, about 50% of the downloads from the Jenkins job started at ~2MB/s (the expected speed) but after a few seconds slowed down to zero, and one download completed with a file smaller than the actual size. So I believe there is a high chance that this is an issue with Jenkins. I saw this issue on Jenkins 1.565.1 and 1.554.3.
This is easy to reproduce when downloading an artifact for the first time.
I'm not aware of any change to Jenkins which would alter that behavior. Since you were able to place your large binary on another server and download it from that server, you could consider making that a permanent workaround.
Since you're running on a Windows computer as your Jenkins server, you might also check to see if the same behavior appears when running Jenkins from a Linux server.
We have set up another Jenkins on another server in the US (Windows Server 2008 R2), Jenkins version 1.565.1, and the same thing happens when I download from Shanghai: the downloaded file is incomplete but no error is reported.
I asked other people in the US to download it and they do not see this issue. I will do another experiment to check whether it also occurs with a Jenkins server on Linux.
It would not be a good idea to place the files on another server, because we have a large collection of customers; changing the location would have a big impact.
another jenkins ... the same thing happen if I download from Shanghai. ... other people in US to download it and they do not see this issue.
This looks more like a network problem to me.
I did three experiments using ab (http://httpd.apache.org/docs/2.2/programs/ab.html) to stress-test downloading the artifacts 300 times:
1. Download artifacts from Jenkins (18 failures out of 300 requests)
2. Download artifacts from Jenkins proxied by Apache (9 failures out of 300 requests)
3. Download the same files from an IIS server, without going through Jenkins (0 failures out of 300 requests)
ab -n300 -c2 http://xxxx.EXE
So based on the above results, we can say that there is an issue with Jenkins downloads. The artifacts are 150-300MB; the server is in the US and we download from China. I don't believe this is caused by a network issue, because a network issue would give us a failed or corrupted download rather than a download that reports complete but has the wrong file size. Or, put differently, a poor network may be what triggers the Jenkins bug.
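For reference, the full invocation against a (hypothetical) artifact URL looks like this; in ab's summary, truncated responses show up on the "Failed requests" line with a "Length: N" breakdown, because ab flags any response whose body length differs from the first response it received:
ab -n 300 -c 2 http://jenkins.example.com/job/myjob/lastSuccessfulBuild/artifact/your-artifact.exe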
To clarify, are both IIS and Apache on the same host, or at least the same network segment, as Jenkins?
The test was done on 1.565.2. I will do another test on a previous version.
I did another test on 1.532.3 and did not see this download issue with ab -n300 -c2 http://xxxx.EXE.
However, we saw a blue-screen-of-death issue with Jenkins on 1.532.3, see https://issues.jenkins-ci.org/browse/JENKINS-24453?focusedCommentId=211164#comment-211164
Is there anything we can do to help debug this issue? It is very easy for us to reproduce the download issue.
Log hudson.model.DirectoryBrowserSupport on FINE and report the output: https://wiki.jenkins-ci.org/display/JENKINS/Logging
Capture the HTTP request and response headers (there may be multiple). It may be related to partial downloads.
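One way to capture them from the client side (URL and credentials are placeholders) is to let curl dump the headers while downloading, then compare the Content-Length in the response with the size of the file that actually arrived:
curl -v -u USER:APITOKEN -D headers.txt -o your-artifact.exe http://jenkins.example.com/job/myjob/lastSuccessfulBuild/artifact/your-artifact.exe
-v prints the request and response headers to stderr, and -D headers.txt additionally saves the response headers to a file; adding -r 0-1048575 would also exercise a partial (Range) request.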
Stapler had only minimal changes between 1.532.3...1.565.2:
https://github.com/stapler/stapler/compare/stapler-parent-1.223...stapler-parent-1.224
In particular, the last released change to serving static files (for JENKINS-13125) was already in Stapler 1.222, i.e. since Jenkins 1.539/1.532.2.
How to "Capture the HTTP request and response headers"?
We did get the summary below from our IT department when we asked them to investigate this issue (we thought it was a network problem before):
Our networking team have run a few captures from the host you provided (10.62.34.216) using Wireshark while downloading the platypus.exe file. There seem to be duplicate requests where a packet is being re-sent over and over (sometimes up to 120 times). This happens both when the file is loaded with a lower MB size and even when the file loads the entire 472MB.
We also ran Wireshark captures from two separate desktops on our end and received no duplicate packet requests and a complete download. This is leading us to believe that there isn't a problem on the web server and that there may be a problem on the hosts in your network blocking traffic. Ronald took a look at the captures and does see a huge number of duplicate ACK packets. His report was that these are requests from the receiver side towards the server asking the server to resend a specific packet/frame that is missing. Normally, the server sees around 3 of these and immediately resends the missing packet. This could point to a packet-loss problem, but further captures and data would help us verify this. Below is an image of the capture (see attachment: wireshark log).
As our network team saw nothing abnormal in the network, we did the experiments and found that this issue does not exist on 1.532.3 but does exist on 1.565.2. So we can say that it should be a Jenkins issue, though it might be triggered by a network with high latency.
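To quantify the duplicate ACKs and retransmissions mentioned in the capture above, one option (assuming the capture was saved to a file; the filename is a placeholder, and a reasonably recent tshark is needed for the -Y display-filter option) would be:
tshark -r download-capture.pcap -Y "tcp.analysis.duplicate_ack" | wc -l
tshark -r download-capture.pcap -Y "tcp.analysis.retransmission" | wc -l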
Which part of your comment is a quote? Because all the information in your comment points to OS/environment (e.g. JRE) or hardware level problem, which the last paragraph seems to contradict.
Hi Daniel:
Previously we thought it was a network problem, as our network team had used Wireshark to monitor the download of the artifact file. We found there were duplicate requests where a packet was being re-sent over and over again (sometimes up to 120 times). This happens both when the file is loaded with a lower MB size and even when the file loads the entire 472MB.
We also did experiments on another two desktops in the United States, where the server is located, and saw no duplicate packets and a complete download. This led us to believe there was a network issue.
However, we then did another experiment: with the server in the US, we tried the download with different versions of Jenkins. We found that with Jenkins 1.532.2 we see no issue either from the US or from China, while with Jenkins 1.565.2 we see the issue from China very easily. So we believe this is a Jenkins issue that can be triggered when the artifacts are downloaded from another country. Since a cross-country connection has higher latency, it might trigger Jenkins to resend the packets again and again, which leads to the download issue.
I did another experiment on 1.565.2 and saw 3 failed requests out of 300.
Concurrency Level: 2
Time taken for tests: 10396.019 seconds
Complete requests: 300
Failed requests: 3
Here is the log (I kept only a small part, as I checked that the other parts are just the same):
Serving file:/C:/Program%20Files%20(x86)/Jenkins/jobs/pdk/builds/2014-09-11_20-03-08/archive/platypus.EXE with lastModified=1410491002000, length=162376398
Sep 21, 2014 9:33:09 AM FINE hudson.model.DirectoryBrowserSupport
Serving file:/C:/Program%20Files%20(x86)/Jenkins/jobs/pdk/builds/2014-09-11_20-03-08/archive/platypus.EXE with lastModified=1410491002000, length=162376398
We cannot reproduce this issue on 1.532.3, but we can reproduce it on 1.565.2. Were there any related changes between these two versions?
We are stuck at this issue now.
Hi, could you please revisit this issue? I tried the latest nightly build of Jenkins yesterday and still saw this issue. I believe downloading across countries makes it very easy to reproduce, and a command like "ab -n300 -c2 http://xxxx.EXE" can easily help you verify whether a given version works properly or not.
This issue exists on the latest nightly build and on the LTS version. It has a global impact, as users get artifact downloads that report no error but have the wrong size. Using "ab -n300 -c2 http://xxxx.your_artifact_link" makes it very easy to reproduce.
Similar problem with version 1.565.1 here.
We have a Linux server located in China.
Our colleagues from the US and Europe report that they can hardly download an artifact successfully since we upgraded from 1.515 to 1.565.
And this even happens in our local office (on the LAN, I mean), so I think it's definitely NOT a network issue.
We also have this issue.
We've seen several instances of this with 1.565.1, from India to a Jenkins master in the Netherlands. We never noticed anything like this with 1.532.2, which we were using before, or with any older version.
Funnily enough, we also yesterday had an issue where a build (running on the NL Jenkins master) downloaded (in some Ant build step) a large binary from another Jenkins master running 1.456 (Server: Apache-Coyote/1.1) located in the Philippines, which also resulted in a partial download with no error.
No idea what to make of this, but perhaps this helps. Cannot yet reproduce the problem using ab, but haven't yet let it run for a longer period of time...
@danielbeck: I don't know about /userContent, and can't easily test.
However, just now a colleague encountered a partial download again, for a <300MB file on our NL LAN (so no inter-site communication), downloading from the Jenkins UI (1.565.1). At the exact time his download failed, I noticed the following exception in the JavaMelody log, corresponding to the GET request for that download:
org.eclipse.jetty.io.EofException: timeout
    at org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:520)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:147)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
    at net.bull.javamelody.FilterServletOutputStream.write(FilterServletOutputStream.java:69)
    at net.bull.javamelody.CounterResponseStream.write(CounterResponseStream.java:82)
    at org.kohsuke.stapler.Stapler.serveStaticResource(Stapler.java:567)
    at org.kohsuke.stapler.ResponseImpl.serveFile(ResponseImpl.java:215)
    at hudson.model.DirectoryBrowserSupport.serveFile(DirectoryBrowserSupport.java:305)
    at hudson.model.DirectoryBrowserSupport.generateResponse(DirectoryBrowserSupport.java:123)
    at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:117)
    at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:66)
    at org.kohsuke.stapler.Function.renderResponse(Function.java:113)
    at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:98)
    at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:120)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:728)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:858)
    at org.kohsuke.stapler.MetaClass$12.dispatch(MetaClass.java:390)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:728)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:858)
    at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:248)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:728)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:858)
    at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:248)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:728)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:858)
    at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:248)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:728)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:858)
    at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:248)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:728)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:858)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:631)
    at org.kohsuke.stapler.Stapler.service(Stapler.java:225)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:686)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1494)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
    at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:202)
    at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:180)
    at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:85)
    at org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:89)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:99)
    at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:58)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:99)
    at hudson.plugins.audit_trail.AuditTrailFilter.doFilter(AuditTrailFilter.java:89)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:99)
    at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:88)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1482)
    at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:48)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1482)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
    at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.ui.basicauth.BasicProcessingFilter.doFilter(BasicProcessingFilter.java:174)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at jenkins.security.ApiTokenFilter.doFilter(ApiTokenFilter.java:74)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
    at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:76)
    at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:164)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1482)
    at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:46)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1482)
    at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:81)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1474)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
That's the timeout you should be able to control via the --httpKeepAliveTimeout or --httpsKeepAliveTimeout argument to java -jar jenkins.war. See java -jar jenkins.war --help.
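On a Windows service installation like the one quoted at the top of this issue, that would mean adding the option to the <arguments> line of jenkins.xml and restarting the service, roughly like this (a sketch based on the configuration above; the value is in milliseconds, here 10 minutes):
<arguments>-d64 -Xrs -XX:MaxPermSize=2g -Xmx16g -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=8080 --httpKeepAliveTimeout=600000</arguments>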
@Daniel, we didn't see this exception when we hit this download issue. I'm not sure how that helps us prevent the issue? I guess the only way is to downgrade back to 1.532.2?
I've been experiencing the truncated-download issue, too. No exception was raised.
Setting --httpsKeepAliveTimeout=600000 (10 minutes) so far seems to have solved the problem for me. (Thanks!)
I have changed our Jenkins to use --httpsKeepAliveTimeout=600000; so far I cannot reproduce this issue even with ab. Thanks Daniel!
It concerns me a bit that only marnix_klooster got the timeout exception. Otherwise this would be a straightforward config issue.
@danielbeck: Today I'm adding the --httpKeepAliveTimeout=600000 flag, and we'll see what happens.
Still, this timeout cannot be the whole story: the reports sound like the problem does not occur with 1.532.3 and before, while it does occur with 1.554.3 and later. (The fact that for me a remote 1.456 also triggers the problem might be an outlier...?) If the default for httpKeepAliveTimeout has not changed (and I think it has been 5000 for quite some time), and multiple users' networks didn't simultaneously and suddenly deteriorate, then there has to be some other cause.
Right?
Finally, we have also seen this problem occur with 1.565.1 on our local LAN, which is when we got the timeout exception. I'm pretty sure our local LAN does not have any real networking problems, but I could be wrong of course, and the timeout occurring on 1.565.1 could be a coincidence that could equally have happened on 1.532.3.
the whole story
Jetty replaced Winstone as the embedded container in 1.535. (The CLI and some messages were kept, so it still appears to be Winstone, but it's not.) So while they're supposed to behave similarly, it should be no surprise that details are handled differently.
Sorry, I assumed it was desirable to close issues when we believe work is complete on them. Should I resolve issues and never close them as a general practice, or is that specific to certain types of bug reports?
It's done inconsistently, so IMO the value of closing is doubtful (at least on core issues). Unfortunately closing currently prevents any subsequent editing, including labels. This can lead to annoying reopen-edit-close cycles when fixing labels, hence my recommendation above.
Maybe I'll be able to convince rtyler to allow editing labels on closed issues. In that case, closing isn't nearly as bad.
Does this happen for all (large) artifacts or only specific ones? Are there any warnings or errors in the log?