action #34750
closed [functional][y][easy] rsync process triggered by rsync.pl is stuck for many hours -> add timeout parameter to rsync call
Status:
Resolved
Priority:
Normal
Assignee:
Category:
Infrastructure
Target version:
Start date:
2018-04-12
Due date:
2018-05-22
% Done:
0%
Estimated time:
Difficulty:
easy
Description
Observation
A new SLE15 build was expected today but did not appear.
Process table on osd:
root 1710 0.0 0.0 18880 1788 ? Ss Apr04 0:08 /usr/sbin/cron -n
root 32679 0.0 0.0 76004 3928 ? S 01:30 0:00 \_ /usr/sbin/CRON -n
geekote+ 32710 0.0 0.0 11872 2540 ? Ss 01:30 0:00 | \_ /bin/sh -c /opt/openqa-scripts/openqa-iso-sync-sles sle15_sp0 >> /var/log/openqa_rsync.log 2>&1
geekote+ 32730 0.0 0.0 11876 2712 ? S 01:30 0:00 | \_ /bin/bash /opt/openqa-scripts/openqa-iso-sync-sles sle15_sp0
geekote+ 4604 0.0 0.2 117804 44448 ? S 01:32 0:01 | \_ /usr/bin/perl -w /opt/openqa-scripts/rsync.pl --host openqa.suse.de --verbose --deprioritize-or-cancel sle15_sp0
geekote+ 14297 0.0 0.0 24636 3680 ? S 01:39 0:00 | \_ rsync --checksum --verbose dist.suse.de::repos/SUSE:/SLE-15:/GA:/TEST/images/iso/SLE-15-Installer-DVD-s390x-Build562.6-Media1.iso /var/lib/o
It is 06:22 now, so the rsync process (started at 01:39) has been stuck for nearly 5 hours.
The logfile /var/log/openqa_rsync.log shows that build 562.6 was picked up for syncing but the sync never finished. See the attachment for details.
$ sudo strace -f -p 14297
Process 14297 attached
select(4, [3], [], [3], {1, 697003}) = 0 (Timeout)
select(4, [3], [], [3], {60, 0}) = 0 (Timeout)
select(4, [3], [], [3], {60, 0}) = 0 (Timeout)
select(4, [3], [], [3], {60, 0}) = 0 (Timeout)
select(4, [3], [], [3], {60, 0}
$ sudo cat /proc/14297/stack
[<ffffffff8122a3a0>] poll_schedule_timeout+0x50/0x80
[<ffffffff8122adb6>] do_select+0x5b6/0x770
[<ffffffff8122b141>] core_sys_select+0x1d1/0x2f0
[<ffffffff8122b31b>] SyS_select+0xbb/0x100
[<ffffffff81640809>] entry_SYSCALL_64_fastpath+0x22/0xba
[<ffffffffffffffff>] 0xffffffffffffffff
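A stuck process like the one above could be spotted earlier by checking how long each rsync has been running. The following is only a sketch: the helper name list_older_than and the one-hour threshold are made up for illustration, and the usage line assumes a procps version of ps that supports the etimes (elapsed seconds) column.

```shell
#!/bin/bash
# list_older_than (hypothetical helper): reads lines of the form
#   PID ELAPSED_SECONDS COMMAND...
# on stdin and prints only those whose elapsed time exceeds $1 seconds.
list_older_than() {
    awk -v max="$1" '($2 + 0) > (max + 0) { print }'
}

# Usage (assumes procps ps with the "etimes" column; separate -o options
# with empty headers suppress the header line):
#   ps -C rsync -o pid= -o etimes= -o args= | list_older_than 3600
```

Anything this prints has been running for more than an hour and is worth a look with strace as shown above.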
Problem
I could not find more information. A web search only showed that others have hit this problem as well.
Suggestion
rsync has a --timeout option (I/O timeout in seconds). The default is 0, which means no timeout. Setting a timeout sounds like a good idea.
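A minimal sketch of what the call could look like with timeouts set. The function names and the timeout values (600 and 60 seconds) are assumptions for illustration, not what rsync.pl actually does; --timeout, --contimeout and the exit codes 30 and 35 are taken from the rsync man page.

```shell
#!/bin/bash
# Hypothetical values; tune them to the slowest transfer we still accept.
RSYNC_IO_TIMEOUT=600    # seconds without any I/O before rsync gives up
RSYNC_CON_TIMEOUT=60    # seconds to wait for the rsync daemon connection

# rsync_with_timeout SRC DEST: same options as the stuck call, plus timeouts.
rsync_with_timeout() {
    rsync --checksum --verbose \
        --timeout="$RSYNC_IO_TIMEOUT" \
        --contimeout="$RSYNC_CON_TIMEOUT" \
        "$1" "$2"
}

# describe_rsync_exit CODE: map the timeout-related exit codes to a log message.
describe_rsync_exit() {
    case "$1" in
        0)  echo "ok" ;;
        30) echo "timeout in data send/receive" ;;
        35) echo "timeout waiting for daemon connection" ;;
        *)  echo "rsync failed with exit code $1" ;;
    esac
}

# Usage (source and destination are placeholders):
#   rsync_with_timeout "$src" "$dest"; describe_rsync_exit "$?"
```

With this, a hanging transfer would end with exit code 30 after ten idle minutes instead of blocking the cron job for hours, and the next scheduled run could retry.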
Mitigation
I killed the rsync.pl and rsync processes and triggered the sync manually with:
/usr/bin/perl -w /opt/openqa-scripts/rsync.pl --host openqa.suse.de --verbose --add-existing --deprioritize-or-cancel sle15_sp0 | tee -a /var/log/openqa_rsync.log 2>&1
Files