action #31591

[tools]openqaworker-arm-2 unreliable in downloading assets - sync of tests fails with Failed to rsync tests: exit 139

Added by nicksinger over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Infrastructure
Target version:
-
Start date:
2018-02-09
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

https://openqa.suse.de/tests/1462370/file/worker-log.txt
https://openqa.suse.de/tests/1462386/file/worker-log.txt
https://openqa.suse.de/tests/1462209/file/worker-log.txt
https://openqa.suse.de/tests/1462211/file/worker-log.txt
https://openqa.suse.de/tests/1462212/file/worker-log.txt

All of them incomplete after a sequence like this:

[2018-02-09T13:07:09.0264 CET] [info] got job 1462212: 01462212-sle-15-Installer-DVD-aarch64-Build451.2-yast_no_self_update@aarch64
[2018-02-09T13:07:09.0302 CET] [info] OpenQA::Worker::Cache: Initialized with http://openqa.suse.de at /var/lib/openqa/cache, current size is 52942487552
[2018-02-09T13:07:09.0304 CET] [info] Downloading SLE-15-Installer-DVD-aarch64-Build451.2-Media1.iso from http://openqa.suse.de/tests/1462212/asset/iso/SLE-15-Installer-DVD-aarch64-Build451.2-Media1.iso
[2018-02-09T13:07:09.0353 CET] [info] Waiting for subprocess
[…]
[2018-02-09T13:07:20.0923 CET] [info] Waiting for subprocess
[2018-02-09T13:07:21.0423 CET] [warn] job is missing files, releasing job
[2018-02-09T13:07:21.0461 CET] [info] uploading autoinst-log.txt
[2018-02-09T13:07:21.0501 CET] [info] uploading worker-log.txt
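
To check other worker logs for the same sequence, a small script along these lines might help. It is only a sketch: the function name `find_failed_downloads` is made up for illustration, and the line format is assumed to match the excerpt above (`[timestamp] [level] message`).

```python
import re

# Assumed line format, taken from the excerpt above: "[timestamp] [level] message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")


def find_failed_downloads(log_text):
    """Return asset names whose download was followed by a
    'job is missing files' warning before any other download started."""
    failed = []
    pending = None  # asset currently being downloaded, if any
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips elision markers and blank lines
        msg = match.group("msg")
        if msg.startswith("Downloading "):
            pending = msg.split()[1]
        elif (
            match.group("level") == "warn"
            and "job is missing files" in msg
            and pending is not None
        ):
            failed.append(pending)
            pending = None
    return failed
```

Calling `find_failed_downloads(open("worker-log.txt").read())` on a downloaded log would list the assets affected in that job.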

Possible solution:

  1. Disable worker
  2. Investigate further why this happens. Maybe low disk space? (I couldn't check because the machine is not salted.)

Related issues

Related to openQA Project - action #31576: [tools][aarch64] "setup failure: Failed to rsync tests: exit 134" - so what's going on? (Resolved, 2018-02-09)

History

#1 Updated by coolo over 4 years ago

  • Project changed from openQA Project to openQA Tests
  • Category set to Infrastructure

#2 Updated by szarate over 4 years ago

  • Subject changed from openqaworker-arm-2 unreliable in downloading assets - many jobs incomplete therefore to openqaworker-arm-2 unreliable in downloading assets - sync of tests fails with Failed to rsync tests: exit 139

All of these jobs were most likely caused by result: setup failure: Failed to rsync tests: exit 139. I looked at the jobs of the most recent build.
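
A note on the exit code: shells report a process killed by signal N as exit status 128 + N, so 139 would correspond to signal 11 (SIGSEGV), and the 134 from the related ticket to signal 6 (SIGABRT). Whether the worker reports the rsync status using exactly this convention is an assumption here. A minimal sketch to decode such a status (the helper name is made up for illustration):

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name implied by a shell-style exit status
    (128 + signal number), or None if the status does not fit."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Not a signal number known on this platform
        return None


print(signal_from_exit_status(139))  # SIGSEGV on Linux
```

If that reading is right, the failure is a crash of the process rather than an error reported by rsync itself.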

#3 Updated by szarate over 4 years ago

Feb 13 04:05:26 openqaworker-arm-2 kernel: worker[13195]: unhandled level 2 translation fault (11) at 0x00000184, esr 0x92000006
Feb 13 04:05:26 openqaworker-arm-2 kernel: pgd = ffff8106fa24b000
Feb 13 04:05:26 openqaworker-arm-2 kernel: [00000184] *pgd=00000004d420b003, *pud=000000040cc4d003, *pmd=0000000000000000
Feb 13 04:05:26 openqaworker-arm-2 kernel:
Feb 13 04:05:26 openqaworker-arm-2 kernel: CPU: 89 PID: 13195 Comm: worker Tainted: G        W          4.4.103-6.38-default #1
Feb 13 04:05:26 openqaworker-arm-2 kernel: Hardware name: GIGABYTE R270-T64-00/MT60-SC4-00, BIOS T32 03/03/2017
Feb 13 04:05:26 openqaworker-arm-2 kernel: task: ffff810210264100 ti: ffff8004d4cd0000 task.ti: ffff8004d4cd0000
Feb 13 04:05:26 openqaworker-arm-2 kernel: PC is at 0x4ac378
Feb 13 04:05:26 openqaworker-arm-2 kernel: LR is at 0x4a4184
Feb 13 04:05:26 openqaworker-arm-2 kernel: pc : [<00000000004ac378>] lr : [<00000000004a4184>] pstate: 60000000
Feb 13 04:05:26 openqaworker-arm-2 kernel: sp : 0000ffffdb930d80
Feb 13 04:05:26 openqaworker-arm-2 kernel: x29: 0000ffffdb930d80 x28: 0000000000000000
Feb 13 04:05:26 openqaworker-arm-2 kernel: x27: 00000000004ac364 x26: 000000001013a010
Feb 13 04:05:26 openqaworker-arm-2 kernel: x25: 0000000000000018 x24: 0000000000018019
Feb 13 04:05:26 openqaworker-arm-2 kernel: x23: 00000000136fef60 x22: 0000000000000800
Feb 13 04:05:26 openqaworker-arm-2 kernel: x21: 0000000000000800 x20: 00000000136fff50
Feb 13 04:05:26 openqaworker-arm-2 kernel: x19: 00000000136ffef0 x18: 00000000000026b3
Feb 13 04:05:26 openqaworker-arm-2 kernel: x17: 0000ffff95c6a0a8 x16: 00000000005804a8
Feb 13 04:05:26 openqaworker-arm-2 kernel: x15: 0000ffff95d56508 x14: 0000000000404fa8
Feb 13 04:05:26 openqaworker-arm-2 kernel: x13: ffffffff00000000 x12: 0000000000000000
Feb 13 04:05:26 openqaworker-arm-2 kernel: x11: 0000000000000020 x10: 4f5e424aff524446
Feb 13 04:05:26 openqaworker-arm-2 kernel: x9 : 0000000000584000 x8 : 00000000137041d0
Feb 13 04:05:26 openqaworker-arm-2 kernel: x7 : 0000ffffdb930c98 x6 : 0000000000000000
Feb 13 04:05:26 openqaworker-arm-2 kernel: x5 : 0000000000000002 x4 : 0000000000000800
Feb 13 04:05:26 openqaworker-arm-2 kernel: x3 : 0000000015ec8b40 x2 : 0000000000000001
Feb 13 04:05:26 openqaworker-arm-2 kernel: x1 : 0000000000000178 x0 : 000000001013a010
Feb 13 04:05:26 openqaworker-arm-2 kernel:
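
For reading the trace above: the `esr` value can be split into its fields by hand. The sketch below follows the AArch64 ESR_ELx layout as I understand it (exception class in bits 31:26, fault status code in bits 5:0) and only covers data aborts with a translation fault; the function name is made up for illustration.

```python
def decode_esr(esr):
    """Split an AArch64 ESR value into the fields relevant for data aborts."""
    ec = (esr >> 26) & 0x3F   # exception class
    il = (esr >> 25) & 0x1    # instruction length bit
    dfsc = esr & 0x3F         # data fault status code (low bits of the ISS)

    description = "not decoded by this sketch"
    # 0x24: data abort from a lower exception level (user space)
    # 0x25: data abort taken without a change in exception level
    if ec in (0x24, 0x25) and 0x04 <= dfsc <= 0x07:
        description = "translation fault, level %d" % (dfsc & 0x3)
    return {"ec": ec, "il": il, "dfsc": dfsc, "description": description}


info = decode_esr(0x92000006)
print(hex(info["ec"]), info["description"])
```

For `0x92000006` this gives exception class 0x24 and a level 2 translation fault, which matches the kernel message. Together with the fault address 0x00000184 it looks like an access to an unmapped address close to NULL from the worker process.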

#4 Updated by thehejik over 4 years ago

Try to set up the worker to use hugetlbfs: https://github.com/Linaro/odp-thunderx#hugetlbfs

#5 Updated by szarate over 4 years ago

  • Related to action #31576: [tools][aarch64] "setup failure: Failed to rsync tests: exit 134" - so what's going on? added

#6 Updated by metan over 4 years ago

For what it's worth, I'm seeing several different problems with the latest build (Beta7 candidate 456.1) in openQA:

Machine failed to shutdown?
https://openqa.suse.de/tests/1470982#step/shutdown_ltp/6
https://openqa.suse.de/tests/1470982#step/shutdown_ltp/5

Machine frozen during a test run:
https://openqa.suse.de/tests/1470971#step/ADSP079/6
https://openqa.suse.de/tests/1470994#step/ext4-nsec-timestamps/6

Machine didn't make it to the bootloader:
https://openqa.suse.de/tests/1471007#step/boot_ltp/24
https://openqa.suse.de/tests/1470976#step/boot_ltp/45
https://openqa.suse.de/tests/1470979#step/boot_ltp/24
https://openqa.suse.de/tests/1470980#step/boot_ltp/24
https://openqa.suse.de/tests/1470973#step/boot_ltp/45
etc...

And here we have obvious memory corruption:
https://openqa.suse.de/tests/1470907#step/execute_test_run/8

If you look at what happens there, it's obvious that a line from the grub menu somehow ended up in the script_output(). I don't know how that happened, but I suppose that something went very, very wrong.

#7 Updated by okurz about 4 years ago

  • Subject changed from openqaworker-arm-2 unreliable in downloading assets - sync of tests fails with Failed to rsync tests: exit 139 to [functional][u]openqaworker-arm-2 unreliable in downloading assets - sync of tests fails with Failed to rsync tests: exit 139
  • Target version set to Milestone 16

#8 Updated by okurz about 4 years ago

  • Subject changed from [functional][u]openqaworker-arm-2 unreliable in downloading assets - sync of tests fails with Failed to rsync tests: exit 139 to [tools]openqaworker-arm-2 unreliable in downloading assets - sync of tests fails with Failed to rsync tests: exit 139
  • Target version deleted (Milestone 16)

#9 Updated by coolo about 4 years ago

  • Status changed from New to Resolved

From what I can tell, arm-2 has been stable since it was reinstalled. Whatever it was :(
