- Bug
- Resolution: Unresolved
- Major
- None
Jenkins 1.609
android-emulator-plugin 2.12
A bit more evidence in the long-running saga of the Android emulator failing to start properly. I have started a new issue, but this is closely related to JENKINS-11952.
Notes:
- The information below was gathered using plugin 2.12. That version failed often enough for me that I could do some slow debugging.
- I'm pretty sure that what I've found will apply to 2.13 and onwards.
- I'm personally not convinced that switching back to the localhost:nnnn method of connection is 100% safe, given my previous investigations for JENKINS-12821, but that is a different matter.
For those who do not want to try to understand the logs below, the short story is that I suspect the android-emulator-plugin is causing the emulator<->ADB connection to lock up by connecting to the emulator ADB port to check whether it has started up. It may be that the lockup would happen anyway, but I think we can remove the waitForSocket (and the subsequent single ADB connect, if using the emulator-nnnn connection) without any adverse effects on the startup procedure.
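To make the suspected pattern concrete, here is a minimal sketch of a connect-then-close port probe. It is written in Python for brevity and is not the plugin's actual waitForSocket code; the function name `probe_port` and its parameters are assumptions made for illustration.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical stand-in for a waitForSocket-style check: the socket is
    closed straight away and no data is ever written, so the peer sees a
    completed handshake followed immediately by a FIN.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False
```

A probe like this never sends the CNXN handshake, so whatever is listening on the port has to notice and discard an empty connection on its own.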
I do not have a sample pull request yet but I have a modified version of 2.12 running locally and will see if my once/twice daily lockups continue to happen.
Only the brave need read below...
Below is the TCP traffic to the emulator ADB port (5561 in this instance) on a startup that fails. I have a number of examples of this with the same symptoms.
richm@bishop:~$ tcpdump -r /tmp/150423_android1.pcap port 5561
reading from file /tmp/150423_android1.pcap, link-type LINUX_SLL (Linux cooked)
=> ADB server connects here (port 60975)
10:44:18.487165 IP localhost.60975 > localhost.5561: Flags [S], seq 3774551783, win 43690, options [mss 65495,sackOK,TS val 791658429 ecr 0,nop,wscale 7], length 0
10:44:18.487183 IP localhost.5561 > localhost.60975: Flags [R.], seq 0, ack 3774551784, win 0, length 0
10:44:25.247907 IP localhost.32808 > localhost.5561: Flags [S], seq 1102610480, win 43690, options [mss 65495,sackOK,TS val 791660119 ecr 0,nop,wscale 7], length 0
10:44:25.247950 IP localhost.5561 > localhost.32808: Flags [S.], seq 869467149, ack 1102610481, win 43690, options [mss 65495,sackOK,TS val 791660119 ecr 791660119,nop,wscale 7], length 0
10:44:25.247987 IP localhost.32808 > localhost.5561: Flags [.], ack 1, win 342, options [nop,nop,TS val 791660119 ecr 791660119], length 0
=> ADB server sends start of CNXN handshake
10:44:25.252799 IP localhost.32808 > localhost.5561: Flags [P.], seq 1:32, ack 1, win 342, options [nop,nop,TS val 791660121 ecr 791660119], length 31
10:44:25.252886 IP localhost.5561 > localhost.32808: Flags [.], ack 32, win 342, options [nop,nop,TS val 791660121 ecr 791660121], length 0
=> Jenkins android plugin tests for emulator adb port alive
=> When things go wrong the emulator seems to ignore this for a while
10:44:26.651163 IP localhost.32810 > localhost.5561: Flags [S], seq 571765708, win 43690, options [mss 65495,sackOK,TS val 791660470 ecr 0,nop,wscale 7], length 0
10:44:26.651200 IP localhost.5561 > localhost.32810: Flags [S.], seq 2069788705, ack 571765709, win 43690, options [mss 65495,sackOK,TS val 791660470 ecr 791660470,nop,wscale 7], length 0
10:44:26.651237 IP localhost.32810 > localhost.5561: Flags [.], ack 1, win 342, options [nop,nop,TS val 791660470 ecr 791660470], length 0
=> Jenkins android plugin immediately closes connection
=> Emulator looks to be sitting waiting for CNXN handshake
10:44:26.651407 IP localhost.32810 > localhost.5561: Flags [F.], seq 1, ack 1, win 342, options [nop,nop,TS val 791660470 ecr 791660470], length 0
10:44:26.652528 IP localhost.5561 > localhost.32810: Flags [.], ack 2, win 342, options [nop,nop,TS val 791660471 ecr 791660470], length 0
10:45:41.012188 IP localhost.5561 > localhost.32808: Flags [F.], seq 1, ack 32, win 342, options [nop,nop,TS val 791679060 ecr 791660121], length 0
=> ADB server gives up waiting for CNXN handshake
10:45:41.012442 IP localhost.32808 > localhost.5561: Flags [.], ack 2, win 342, options [nop,nop,TS val 791679061 ecr 791679060], length 0
10:45:41.013081 IP localhost.32808 > localhost.5561: Flags [F.], seq 32, ack 2, win 342, options [nop,nop,TS val 791679061 ecr 791679060], length 0
10:45:41.013127 IP localhost.5561 > localhost.32808: Flags [.], ack 33, win 342, options [nop,nop,TS val 791679061 ecr 791679061], length 0
10:45:42.010478 IP localhost.5561 > localhost.32810: Flags [F.], seq 1, ack 2, win 342, options [nop,nop,TS val 791679310 ecr 791660470], length 0
=> Emulator finally closes the Jenkins android plugin connection
10:45:42.010602 IP localhost.32810 > localhost.5561: Flags [R], seq 571765710, win 0, length 0
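The key number in this capture is how long the emulator takes to answer a client's FIN with its own. A rough sketch for pulling that out of `tcpdump -r` text output is shown here; the helper names `fin_times` and `close_lag_seconds` are made up for illustration, and the regular expression only assumes the line layout seen in this capture.

```python
import re
from datetime import datetime

# Matches lines such as:
# 10:44:26.651407 IP localhost.32810 > localhost.5561: Flags [F.], seq 1, ...
LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)


def fin_times(lines, emulator_port):
    """Map client port -> {'client': t, 'emulator': t} for the first FIN from each side."""
    result = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m or "F" not in m["flags"]:
            continue  # annotation lines and non-FIN packets are skipped
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        sport, dport = int(m["sport"]), int(m["dport"])
        if dport == emulator_port:
            side, client = "client", sport
        elif sport == emulator_port:
            side, client = "emulator", dport
        else:
            continue
        result.setdefault(client, {}).setdefault(side, ts)
    return result


def close_lag_seconds(lines, emulator_port, client_port):
    """Seconds between the client's FIN and the emulator's FIN for one connection."""
    times = fin_times(lines, emulator_port)[client_port]
    return (times["emulator"] - times["client"]).total_seconds()
```

Applied to the capture above with `emulator_port=5561` and `client_port=32810`, this gives a lag of about 75 seconds between the plugin closing its probe connection (10:44:26.651407) and the emulator closing its end (10:45:42.010478).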
At 10:44:25, netstat -nap shows the following connections to the emulator ADB port:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:5561          0.0.0.0:*               LISTEN      1693/emulator64-arm
tcp       31      0 127.0.0.1:5561          127.0.0.1:32808         ESTABLISHED 1693/emulator64-arm
tcp        0      0 127.0.0.1:32808         127.0.0.1:5561          ESTABLISHED 1666/adb
The second connection to port 5561 has not been made yet, but the 31 bytes of the connection handshake are clearly visible in Recv-Q, waiting to be read from the TCP stream.
Two seconds later, at 10:44:27, Jenkins has executed waitForSocket and opened the short-lived connection to the emulator ADB port.
The emulator side of this connection is in the CLOSE_WAIT state because the emulator has not yet handled this connection to the ADB port, on which no data was ever sent. The Jenkins side of the connection is sitting in FIN_WAIT2 because Jenkins has closed the file descriptor associated with the socket and the kernel is waiting for the emulator to do the same at its end.
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:5561          0.0.0.0:*               LISTEN      1693/emulator64-arm
tcp       31      0 127.0.0.1:5561          127.0.0.1:32808         ESTABLISHED 1693/emulator64-arm
tcp        1      0 127.0.0.1:5561          127.0.0.1:32810         CLOSE_WAIT  1693/emulator64-arm
tcp        0      0 127.0.0.1:32808         127.0.0.1:5561          ESTABLISHED 1666/adb
tcp6       0      0 127.0.0.1:32810         127.0.0.1:5561          FIN_WAIT2   -
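For anyone collecting the same evidence, the two symptoms in this snapshot (unread bytes in Recv-Q on the emulator side, and a lingering CLOSE_WAIT) can be picked out of `netstat -nap` text mechanically. This is only a sketch: the function name `stuck_connections` is hypothetical and it assumes the whitespace-separated column order shown above.

```python
def stuck_connections(netstat_text: str, port: int):
    """Return (foreign_address, state, recv_q) for suspicious sockets on `port`.

    A socket is reported when its local address ends in :port and it either
    has unread bytes queued (Recv-Q > 0) or is sitting in CLOSE_WAIT.
    """
    suffix = f":{port}"
    stuck = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        recv_q, local, foreign, state = fields[1], fields[3], fields[4], fields[5]
        if not recv_q.isdigit() or not local.endswith(suffix) or state == "LISTEN":
            continue
        if int(recv_q) > 0 or state == "CLOSE_WAIT":
            stuck.append((foreign, state, int(recv_q)))
    return stuck
```

Run against the 10:44:27 snapshot with `port=5561`, it reports the ADB server connection from 127.0.0.1:32808 (ESTABLISHED, 31 bytes unread) and the plugin's probe connection from 127.0.0.1:32810 (CLOSE_WAIT).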
The FIN_WAIT2 state hangs around until 10:45:23, as per normal TCP stack behaviour.
At 10:45:40 we are left with the proper ADB server to ADB emulator connection, with 31 bytes of data unread, and the rogue Jenkins connection in CLOSE_WAIT:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:5561          0.0.0.0:*               LISTEN      1693/emulator64-arm
tcp       31      0 127.0.0.1:5561          127.0.0.1:32808         ESTABLISHED 1693/emulator64-arm
tcp        1      0 127.0.0.1:5561          127.0.0.1:32810         CLOSE_WAIT  1693/emulator64-arm
tcp        0      0 127.0.0.1:32808         127.0.0.1:5561          ESTABLISHED 1666/adb
At 10:45:41 the ADB server connection is now closed and in TIME_WAIT state:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:5561          0.0.0.0:*               LISTEN      1693/emulator64-arm
tcp        0      0 127.0.0.1:5561          127.0.0.1:32808         TIME_WAIT   -
tcp        1      0 127.0.0.1:5561          127.0.0.1:32810         CLOSE_WAIT  1693/emulator64-arm
Then, one second later at 10:45:42, the rogue connection is now properly closed in TIME_WAIT, but we are left with no connection to the ADB server.
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:5561          0.0.0.0:*               LISTEN      1693/emulator64-arm
tcp        0      0 127.0.0.1:5561          127.0.0.1:32808         TIME_WAIT   -
In the working case similar connections happen, but the Jenkins connection gets closed much earlier, within about 10 seconds. So it looks to me like the emulator ADB infrastructure gets clogged up. I have no evidence yet that it is the Jenkins connection that is doing that, but it seems to be a very likely candidate.
I did a quick test and removed the waitForSocket and also the single adb connect command. That did not break the android-emulator-plugin startup process, but that was based on only a couple of jobs. It did show that the device was offline at the start of the process (i.e. before the emulator had opened the ADB port), but that progressed to the standard waiting for boot complete after a few seconds.
[android] Using Android SDK: /opt/android/android-sdk-linux_x86
$ /opt/android/android-sdk-linux_x86/platform-tools/adb start-server
* daemon not running. starting it now on port 6563 *
* daemon started successfully *
$ /opt/android/android-sdk-linux_x86/platform-tools/adb start-server
[android] Starting Android emulator
$ /opt/android/android-sdk-linux_x86/tools/emulator -no-boot-anim -ports 5562,5563 -prop persist.sys.language=en -prop persist.sys.country=GB -avd hudson_en-GB_160_WVGA_android-7 -no-snapshot-load -no-snapshot-save -no-window
Failed to Initialize backend EGL display
Failed to create secure directory (/run/user/1000/pulse): Permission denied
emulator: WARNING: Could not initialize OpenglES emulation, using software renderer.
emulator: warning: opening audio output failed
[android] Waiting for emulator to finish booting...
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell getprop dev.bootcomplete
error: device offline
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell getprop dev.bootcomplete
error: device offline
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell getprop dev.bootcomplete
error: device offline
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell getprop dev.bootcomplete
error: device offline
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell getprop dev.bootcomplete
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell getprop dev.bootcomplete
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 logcat -v time
[android] Attempting to unlock emulator screen
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell input keyevent 82
$ /opt/android/android-sdk-linux_x86/platform-tools/adb -s emulator-5562 shell input keyevent 4
[android] Emulator is ready for use (took 77 seconds)
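The startup sequence in this log relies only on polling `getprop dev.bootcomplete` through the ADB server, treating "device offline" as "not ready yet" rather than probing the emulator port directly. A minimal sketch of that loop follows. It is not the plugin's code: `adb_getprop` and `wait_for_boot` are illustrative names, the timeout and interval defaults are arbitrary, and the `sleep` and `clock` parameters exist only so the loop can be exercised without a real emulator.

```python
import subprocess
import time


def adb_getprop(adb_path: str, serial: str, prop: str) -> str:
    """Run `adb -s <serial> shell getprop <prop>` and return its trimmed output."""
    proc = subprocess.run(
        [adb_path, "-s", serial, "shell", "getprop", prop],
        capture_output=True,
        text=True,
    )
    # Errors such as "error: device offline" may arrive on stderr.
    return (proc.stdout + proc.stderr).strip()


def wait_for_boot(getprop, timeout=180.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll `getprop()` until it returns "1" or `timeout` seconds have passed.

    Any other reply (empty output, "error: device offline", ...) is treated
    as "not booted yet" and retried after `interval` seconds.
    """
    deadline = clock() + timeout
    while True:
        if getprop() == "1":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With the paths from the log, a call might look like `wait_for_boot(lambda: adb_getprop("/opt/android/android-sdk-linux_x86/platform-tools/adb", "emulator-5562", "dev.bootcomplete"))`. The only connection to the emulator ADB port is then the one made by the ADB server itself.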