Project

General

action #34750

closed

[functional][y][easy]rsync process triggered by rsync.pl is stuck for many hours -> add timeout parameter to rsync call

Added by okurz about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Start date: 2018-04-12
Due date: 2018-05-22
% Done: 0%
Estimated time:
Difficulty: easy

Description

Observation

A new SLE15 build was expected today but had not appeared.

Process table on osd:

root      1710  0.0  0.0  18880  1788 ?        Ss   Apr04   0:08 /usr/sbin/cron -n
root     32679  0.0  0.0  76004  3928 ?        S    01:30   0:00  \_ /usr/sbin/CRON -n
geekote+ 32710  0.0  0.0  11872  2540 ?        Ss   01:30   0:00  |   \_ /bin/sh -c     /opt/openqa-scripts/openqa-iso-sync-sles sle15_sp0 >> /var/log/openqa_rsync.log 2>&1
geekote+ 32730  0.0  0.0  11876  2712 ?        S    01:30   0:00  |       \_ /bin/bash /opt/openqa-scripts/openqa-iso-sync-sles sle15_sp0
geekote+  4604  0.0  0.2 117804 44448 ?        S    01:32   0:01  |           \_ /usr/bin/perl -w /opt/openqa-scripts/rsync.pl --host openqa.suse.de --verbose --deprioritize-or-cancel sle15_sp0
geekote+ 14297  0.0  0.0  24636  3680 ?        S    01:39   0:00  |               \_ rsync --checksum --verbose dist.suse.de::repos/SUSE:/SLE-15:/GA:/TEST/images/iso/SLE-15-Installer-DVD-s390x-Build562.6-Media1.iso /var/lib/o

It is 06:22 now, so the rsync process has been stuck for nearly 5 hours.

The log file /var/log/openqa_rsync.log shows that build 562.6 was picked up for syncing but the sync never finished. See the attached log for details.

$ sudo strace -f -p 14297
Process 14297 attached
select(4, [3], [], [3], {1, 697003})    = 0 (Timeout)
select(4, [3], [], [3], {60, 0})        = 0 (Timeout)
select(4, [3], [], [3], {60, 0})        = 0 (Timeout)
select(4, [3], [], [3], {60, 0})        = 0 (Timeout)
select(4, [3], [], [3], {60, 0}
$ sudo cat /proc/14297/stack
[<ffffffff8122a3a0>] poll_schedule_timeout+0x50/0x80
[<ffffffff8122adb6>] do_select+0x5b6/0x770
[<ffffffff8122b141>] core_sys_select+0x1d1/0x2f0
[<ffffffff8122b31b>] SyS_select+0xbb/0x100
[<ffffffff81640809>] entry_SYSCALL_64_fastpath+0x22/0xba
[<ffffffffffffffff>] 0xffffffffffffffff

Problem

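The strace output shows rsync repeatedly calling select() on file descriptor 3 (presumably the connection to the rsync daemon) with a 60-second timeout, and every call returns 0, i.e. no data arrives. The kernel stack confirms that the process is sleeping in select(). I could not find more information. A web search only showed that others have run into the same problem.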

Suggestion

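rsync has a --timeout option that sets a maximum I/O timeout in seconds. It defaults to 0, which disables the timeout, so a transfer that stops receiving data can hang indefinitely. Setting a timeout sounds like a good idea.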
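A minimal sketch of what the suggested change could look like, assuming the rsync call is wrapped in a bash helper. The function name `sync_iso` and the variables `RSYNC_IO_TIMEOUT`, `RSYNC_CONN_TIMEOUT` and `DRY_RUN` are hypothetical and not taken from rsync.pl; the 600 s and 60 s defaults are arbitrary placeholders. `--timeout` makes rsync abort after that many seconds without any I/O, `--contimeout` bounds the wait for the connection to an rsync daemon (relevant for `host::module` sources such as dist.suse.de::repos), and rsync reports these two cases with exit codes 30 and 35.

```shell
#!/bin/bash
# Sketch only: sync_iso, RSYNC_IO_TIMEOUT, RSYNC_CONN_TIMEOUT and DRY_RUN are
# hypothetical names, not part of rsync.pl.

# Seconds without any data transfer before rsync gives up
# (rsync's own default, 0, means "wait forever").
: "${RSYNC_IO_TIMEOUT:=600}"
# Seconds to wait for the connection to an rsync daemon (host::module sources).
: "${RSYNC_CONN_TIMEOUT:=60}"

# sync_iso SRC DEST: copy one ISO with I/O and connection timeouts.
# With DRY_RUN=1 the command is printed instead of executed.
sync_iso() {
    local src="$1" dest="$2"
    local cmd=(rsync --checksum --verbose
               "--timeout=${RSYNC_IO_TIMEOUT}"
               "--contimeout=${RSYNC_CONN_TIMEOUT}"
               "$src" "$dest")

    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
        return 0
    fi

    "${cmd[@]}"
    local rc=$?
    # Map rsync's timeout exit codes to readable log messages.
    case "$rc" in
        0)  ;;
        30) echo "rsync: no data for ${RSYNC_IO_TIMEOUT}s, aborted: $src" >&2 ;;
        35) echo "rsync: daemon connection timed out after ${RSYNC_CONN_TIMEOUT}s: $src" >&2 ;;
        *)  echo "rsync failed with exit code $rc: $src" >&2 ;;
    esac
    return "$rc"
}
```

For example, `DRY_RUN=1 sync_iso dist.suse.de::repos/<path> <dest>` prints the full command without running it. With such a timeout in place, a transfer stuck like the one above should fail with exit code 30 after the configured idle time instead of blocking the sync for hours, so the caller can retry or report the failure. In rsync.pl itself the equivalent change would presumably be adding the `--timeout=` argument to the rsync command line it builds.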

Mitigation

I killed the rsync.pl and rsync processes and triggered the sync manually with:

/usr/bin/perl -w /opt/openqa-scripts/rsync.pl --host openqa.suse.de --verbose --add-existing --deprioritize-or-cancel sle15_sp0 | tee -a /var/log/openqa_rsync.log 2>&1

Files

openqa_rsync.log.xz (168 KB), added by okurz, 2018-04-12 04:26