-
Bug
-
Resolution: Fixed
-
Blocker
-
None
-
Jenkins server 2.319.1
-
-
2.333
After upgrading to 2.319.1 (from 2.277.2), the command "service jenkins start" hangs on the Linux console (we waited for ~30 minutes). It doesn't return to the shell. This currently prevents the Ansible playbook from progressing, since it gets stuck at this task waiting for a return to the shell.
Previously, the 'service jenkins start' command returned to the shell within 5 seconds.
Eg:
service jenkins start
[ABC@ip-ABC]# service jenkins start
Starting Jenkins
Session terminated, killing shell... ...killed.
[ABC@ ~]#
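Until the hang itself is fixed, automation that calls the start command can at least be kept from blocking forever. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; the helper name `run_with_deadline` is hypothetical and not part of Jenkins or Ansible. Whether the Jenkins process survives when the wrapper gives up depends on how the init script launches it, which is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, but give up after a number of seconds.
# Relies on `timeout` from GNU coreutils, which exits with status 124
# when the time limit is reached.
run_with_deadline() {
    local seconds="$1"
    shift
    timeout "$seconds" "$@"
}

# Example call (not executed here):
#   run_with_deadline 60 service jenkins start
```

A status of 124 only says that the command did not return in time; it says nothing about whether Jenkins itself came up.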
While debugging the init.d script for Jenkins, we found that the daemon command, which used to return to the shell, no longer does so.
echo -n "Starting Jenkins "
daemon --user "$JENKINS_USER" --pidfile "$JENKINS_PID_FILE" "$JAVA_CMD" $PARAMS > /dev/null
RETVAL=$?
Values of the variables:
JENKINS_USER = jenkins
JENKINS_PID_FILE = /var/run/jenkins.pid
JAVA_CMD = /etc/alternatives/java
PARAMS = -Xmx3883m -Xms3883m -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dhudson.model.DirectoryBrowserSupport.CSP="default-src 'self'; connect-src 'self' 'unsafe-inline' storybook.js.org; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; font-src 'self' 'unsafe-inline';" -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war and --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20
We see that while the service start command is stuck, it is able to start Jenkins successfully.
We confirmed:
- Jenkins process (ps output) is up.
- /var/log/jenkins/jenkins.log has "Jenkins is fully up and running"
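The log check above can be scripted. This is a minimal sketch, assuming the startup message appears verbatim in the log file; the function name `log_reports_up` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Jenkins log contains the
# startup-complete message quoted in the report.
log_reports_up() {
    local log_file="$1"
    grep -q "Jenkins is fully up and running" "$log_file" 2>/dev/null
}

# Example calls (not executed here):
#   log_reports_up /var/log/jenkins/jenkins.log && echo "log says Jenkins is up"
#   pgrep -u jenkins java > /dev/null && echo "a java process owned by jenkins exists"
```

Note that an old log file may still contain the message from a previous start, so the check is most meaningful right after a fresh start attempt.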
[JENKINS-67487] Start Jenkins command starts but does not return back to shell on Amazon Linux 2
Hi Mark,
Thanks for the quick response. We currently use Amazon Linux 1.
We installed Jenkins using yum (version 2.277.3) a while ago and then upgraded to 2.319.1 via the UI upgrade option in Dec '21.
We use JDK 1.8.
[ABC@ABC ansible]# /etc/alternatives/java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
[root@ip-10-224-53-73 ansible]#
We have other jenkins instances that run on the same underlying OS + installed/upgraded Jenkins the same way. We are seeing this issue only with instances on version 2.319.1 Jenkins.
I don't test with Amazon Linux 1. I don't know anyone else that tests with Amazon Linux 1. Amazon states on their web site that Amazon Linux AMI (Amazon Linux 1) entered maintenance mode in Dec 2020.
I don't plan to investigate further on Amazon Linux 1. If you see a similar failure on Amazon Linux 2, that would be more likely to be investigated.
I understand. Is "service jenkins start" still the recommended way of starting up Jenkins on AMZN Linux 2?
I believe that Amazon Linux 2 is based on systemd, so it can start with either a service command or with systemctl
Thanks. I am able to replicate the issue on AMZN linux 2 (ami-0ed9277fb7eb570c9) as well. Let me know if you need more information.
With Jenkins 2.277, the service command completed successfully within a few seconds:
-rw-r--r-- 1 root root 70887043 Apr 20 2021 jenkins.war
[root@ip-XXXX ansible]# service jenkins start
Starting jenkins (via systemctl): [ OK ]
[root@ip-XXXXX ansible]#
Then I stopped Jenkins 2.277 and copied in the war file for 2.319.1:
[root@ip-XXXXX jenkins]# service jenkins stop
Stopping jenkins (via systemctl): [ OK ]
[root@ip-XXXXX jenkins]#
[root@ip-10-223-68-17 jenkins]# ls -trl
total 69228
-rw-r--r-- 1 root root 70887043 Apr 20 2021 jenkins.war
[root@ip-XXXXX jenkins]# mv jenkins.war jenkins.war.backup
[root@ip-XXXXX jenkins]# cp -p /home/ec2-user/jenkins-2.319.1.war jenkins.war
[root@ip-XXXXX jenkins]# ls -trl
total 139784
-rw-r--r-- 1 root root 70887043 Apr 20 2021 jenkins.war.backup
-rw-rw-r-- 1 ec2-user ec2-user 72247484 Jan 2 17:09 jenkins.war
Then I tried to start Jenkins 2.319.1. The start command was stuck for a while and then timed out 5 minutes later:
[root@ip-XXXXX jenkins]# service jenkins start
Starting jenkins (via systemctl): Job for jenkins.service failed because a timeout was exceeded. See "systemctl status jenkins.service" and "journalctl -xe" for details.
[FAILED]
[root@ip-XXXXX jenkins]#
[root@ip-XXXXX jenkins]# systemctl status jenkins.service
● jenkins.service - LSB: Jenkins Automation Server
   Loaded: loaded (/etc/rc.d/init.d/jenkins; bad; vendor preset: disabled)
   Active: failed (Result: timeout) since Sun 2022-01-02 17:15:38 EST; 31s ago
   Docs: man:systemd-sysv-generator(8)
   Process: 19038 ExecStop=/etc/rc.d/init.d/jenkins stop (code=exited, status=0/SUCCESS)
   Process: 19538 ExecStart=/etc/rc.d/init.d/jenkins start (code=killed, signal=TERM)
   Tasks: 58
   Memory: 1.0G
   CGroup: /system.slice/jenkins.service
           ├─19543 runuser -s /bin/bash jenkins -c ulimit -S -c 0 >/dev/null 2>&1 ; /etc/alternatives/java -Djav...
           ├─19544 bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /etc/alternatives/java -Djava.awt.headless=true -Dco...
           └─19545 /etc/alternatives/java -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.mana...

Jan 02 17:10:38 ip-10-223-68-17.ec2.internal systemd[1]: Starting LSB: Jenkins Automation Server...
Jan 02 17:10:38 ip-10-223-68-17.ec2.internal runuser[19543]: pam_unix(runuser:session): session opened for us...=0)
Jan 02 17:15:38 ip-10-223-68-17.ec2.internal systemd[1]: jenkins.service start operation timed out. Terminating.
Jan 02 17:15:38 ip-10-223-68-17.ec2.internal systemd[1]: Failed to start LSB: Jenkins Automation Server.
Jan 02 17:15:38 ip-10-223-68-17.ec2.internal systemd[1]: Unit jenkins.service entered failed state.
Jan 02 17:15:38 ip-10-223-68-17.ec2.internal systemd[1]: jenkins.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
One thing to note, the Jenkins server is actually up.
[root@ip-XXXXX jenkins]# tail /var/log/jenkins/jenkins.log
2022-01-02 22:10:59.635+0000 [id=32] INFO jenkins.InitReactorRunner$1#onAttained: Completed initialization
2022-01-02 22:10:59.813+0000 [id=23] INFO o.j.p.skipcert.ItemListenerImpl#onLoaded: Bypassing certificate check
2022-01-02 22:10:59.838+0000 [id=23] INFO hudson.WebAppMain$3#run: Jenkins is fully up and running
Output of /etc/alternatives/java -version:
[root@ip-XXXXX jenkins]# /etc/alternatives/java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
[root@ip-XXXXX jenkins]#
Repeated on CentOS Linux release 7.9.2009 (Core) using openjdk version "1.8.0_312" OpenJDK Runtime Environment (build 1.8.0_312-b07) OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
I encountered the same issue when upgrading from 2.303.3. Further investigation revealed that in the /etc/rc.d/init.d/jenkins script the command:
daemon --user "$JENKINS_USER" --pidfile "$JENKINS_PID_FILE" $JAVA_CMD $PARAMS > /dev/null
returns a value of 1. Even though Jenkins is active and started according to the logs, the return code of the command above is not being recognized as 0.
RETVAL=$? <- This variable is not being set properly
The next lines (condensed):
if [ $RETVAL = 0 ]; then
success
else
failure
Every time Jenkins starts, the script is unable to report that it is up, so the service is treated as failed and stopped, even though Jenkins is in fact running correctly.
Adding a command such as echo "test" before the RETVAL assignment yields a return code of 0, and the Jenkins service then completes the start successfully.
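The effect of the extra echo follows from how `$?` works in the shell: it holds the exit status of the most recently executed command only. A minimal sketch of that behavior, where `failing_step` is a hypothetical stand-in for the daemon call (it is not the real command):

```shell
#!/usr/bin/env bash
# Stand-in for a command that exits non-zero (the daemon call in the report).
failing_step() { return 1; }

# Case 1: $? is read immediately, so it holds the failing command's status.
failing_step
direct_status=$?

# Case 2: an echo runs in between, so $? holds the status of echo instead.
failing_step
echo "test" > /dev/null
masked_status=$?

echo "direct=$direct_status masked=$masked_status"
# prints: direct=1 masked=0
```

In other words, the inserted echo does not make the daemon call succeed; it only hides its exit status from the check that follows.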
Thanks lg17031 for doing that research! Much appreciated.
Would you like to submit a pull request to fix the issue?
To follow up on this for CentOS 7: this is the journalctl output from the startup. I'm not sure what the return code is for the WARNING message; that could be what prevents a return code of 0.
Jan 11 14:24:43 instance jenkins[970]: + RETVAL=0
Jan 11 14:24:43 instance jenkins[970]: + case "$1" in
Jan 11 14:24:43 instance jenkins[970]: + echo -n 'Starting Jenkins '
Jan 11 14:24:43 instance jenkins[970]: Starting Jenkins + daemon --user jenkins --pidfile /var/run/jenkins.pid /bin/java -Djava.awt.headless=true -Djenkins.install.runSetupWizard=false -Djava.net.preferIPv4Stack=true '-Dhudson.model.
Jan 11 14:24:43 instance jenkins[970]: + local gotbase= force= nicelevel corelimit
Jan 11 14:24:43 instance jenkins[970]: + local pid base= user= nice= bg= pid_file=
Jan 11 14:24:43 instance jenkins[970]: + local cgroup=
Jan 11 14:24:43 instance jenkins[970]: + nicelevel=0
Jan 11 14:24:43 instance jenkins[970]: + '[' --user '!=' -user ']'
Jan 11 14:24:43 instance jenkins[970]: + case $1 in
Jan 11 14:24:43 instance jenkins[970]: + user=jenkins
Jan 11 14:24:43 instance jenkins[970]: + shift 2
Jan 11 14:24:43 instance jenkins[970]: + '[' --pidfile '!=' -pidfile ']'
Jan 11 14:24:43 instance jenkins[970]: + case $1 in
Jan 11 14:24:43 instance jenkins[970]: + pid_file=/var/run/jenkins.pid
Jan 11 14:24:43 instance jenkins[970]: + shift 2
Jan 11 14:24:43 instance jenkins[970]: + '[' /bin/java '!=' /bin/java ']'
Jan 11 14:24:43 instance jenkins[970]: + '[' -z '' ']'
Jan 11 14:24:43 instance jenkins[970]: + base=java
Jan 11 14:24:43 instance jenkins[970]: + __pids_var_run java /var/run/jenkins.pid
Jan 11 14:24:43 instance jenkins[970]: + local base=java
Jan 11 14:24:43 instance jenkins[970]: + local pid_file=/var/run/jenkins.pid
Jan 11 14:24:43 instance jenkins[970]: ++ /usr/bin/dirname /var/run/jenkins.pid
Jan 11 14:24:43 instance jenkins[970]: + local pid_dir=
Jan 11 14:24:43 instance jenkins[970]: + local binary=
Jan 11 14:24:43 instance jenkins[970]: + '[' -d '' -a '!' -r '' ']'
Jan 11 14:24:43 instance jenkins[970]: + pid=
Jan 11 14:24:43 instance jenkins[970]: + '[' -f /var/run/jenkins.pid ']'
Jan 11 14:24:43 instance jenkins[970]: + return 3
Jan 11 14:24:43 instance jenkins[970]: + '[' -n '' -a -z '' ']'
Jan 11 14:24:43 instance jenkins[970]: + corelimit='ulimit -S -c 0'
Jan 11 14:24:43 instance jenkins[970]: + '[' -n '' ']'
Jan 11 14:24:43 instance jenkins[970]: + '[' -n '' ']'
Jan 11 14:24:43 instance jenkins[970]: + '[' serial = verbose -a -z '' ']'
Jan 11 14:24:43 instance jenkins[970]: + '[' -z jenkins ']'
Jan 11 14:24:43 instance jenkins[970]: + runuser -s /bin/bash jenkins -c 'ulimit -S -c 0 >/dev/null 2>&1 ; /bin/java -Djava.awt.headless=true -Djenkins.install.runSetupWizard=false -Djava.net.preferIPv4Stack=true -Dhudson.model.Direc
Jan 11 14:24:49 instance jenkins[970]: WARNING: An illegal reflective access operation has occurred
Jan 11 14:24:49 instance jenkins[970]: WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$2 (file:/var/cache/jenkins/war/WEB-INF/lib/guice-4.0.jar) to method java.lang.ClassLoader.defineClass(ja
Jan 11 14:24:49 instance jenkins[970]: WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$2
Jan 11 14:24:49 instance jenkins[970]: WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
Jan 11 14:24:49 instance jenkins[970]: WARNING: All illegal access operations will be denied in a future release
Jan 11 14:29:43 instance systemd[1]: jenkins.service start operation timed out. Terminating.
Jan 11 14:29:43 instance systemd[1]: Failed to start LSB: Jenkins Automation Server.
Jenkins 2.333 removes the dependency on the daemon program. It should resolve this issue.
I checked that pull request 266 also resolves the issue as part of the switch from System V init to systemd.
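To check whether an installed version is at or beyond 2.333, a version-aware comparison can be used. This is a rough sketch, assuming GNU `sort -V` is available; the function name `version_at_least` is hypothetical. LTS releases may carry backports, so a purely numeric comparison against 2.333 is only an approximation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the installed version is >= the required one.
# `sort -V` orders version strings component by component; if the required
# version sorts first (or both are equal), the installed version is new enough.
version_at_least() {
    local installed="$1"
    local required="$2"
    local lowest
    lowest="$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)"
    [ "$lowest" = "$required" ]
}

# Example call (not executed here):
#   version_at_least 2.319.1 2.333 || echo "older than 2.333"
```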
Hello, I'm facing the same issue on CentOS 7. I modified /etc/rc.d/init.d/jenkins
and added some echo commands before RETVAL=$? but that doesn't work for me. Do you have any other ideas?
666999 as a short-term workaround, you can use RETVAL=0 to ensure that the script reports success.
I've changed it to 0, but nothing happened. I killed the process and then started it via systemctl, and it got stuck again, although Jenkins is up and running. So I made another change and commented out the RETVAL=0; the result is the same.
RETVAL=0
case "$1" in
start)
echo -n "Starting Jenkins "
daemon --user "$JENKINS_USER" --pidfile "$JENKINS_PID_FILE" $JAVA_CMD $PARAMS > /dev/null
RETVAL=0
echo -n "RETVAL"
if [ $RETVAL = 0 ]; then
success
echo > "$JENKINS_PID_FILE" # just in case we fail to find it
MY_SESSION_ID=`/bin/ps h -o sess -p $$`
echo -n "my sesion ID"
# get PID
/bin/ps hww -u "$JENKINS_USER" -o sess,ppid,pid,cmd | \
while read sess ppid pid cmd; do
echo -n "while start"
[ "$ppid" = 1 ] || continue
# this test doesn't work because Jenkins sets a new Session ID
# [ "$sess" = "$MY_SESSION_ID" ] || continue
echo "$cmd" | grep $JENKINS_WAR > /dev/null
[ $? = 0 ] || continue
echo -n "before found PID"
# found a PID
echo $pid > "$JENKINS_PID_FILE"
echo -n $pid
done
That error seems to indicate that you have other customization included in your Jenkins service unit. Fixing those messages seems like a good place to start.
Actually, I got those configurations from a former DevOps engineer and have not made any changes, so here they are.
Do you think I should make any changes to them?
666999 I'm not a systemd expert. You could compare those settings to a fresh installation to decide if you need to consider changes.
I've checked with Debian 10 (buster) and Debian 11 (bullseye) with Jenkins 2.319.1 and cannot duplicate the problem.
What operating system are you using?
Is it running the most recent patches?
How did you install Jenkins?
What is the output of /etc/alternatives/java -version?