action #160628
coordination #112862 (closed): [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
periodic multi-machine OSD test in https://gitlab.suse.de/openqa/scripts-ci/ does not trigger any jobs size:S
0%
Description
Observation
https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2635998 passed, but apparently no openQA jobs were ever triggered. The log ends right after the hdd assignment and never reaches openqa-cli schedule:
++ echo '$ bash -e < <(curl -s ${GITHUB_REPO_URL//github.com/raw.githubusercontent.com}/master/$OS_AUTOINST_SCRIPT)'
++ bash -e
+++ curl -s https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-schedule-mm-ping-test
+ openqa_url=http://openqa.suse.de
+ distri=sle
+ flavor=Server-DVD-Updates
+ arch=x86_64
+ version=15-SP5
+ test_name=ovs-client
++ mktemp
+ tmpfile=/tmp/tmp.QFrAbZzq4U
+ trap 'rm -f "$tmpfile"' EXIT
+ cat
++ openqa-cli api --host http://openqa.suse.de jobs version=15-SP5 scope=relevant arch=x86_64 flavor=Server-DVD-Updates test=ovs-client latest=1
++ jq -r '.jobs | map(select(.result == "passed")) | max_by(.settings.BUILD) .settings.HDD_1'
+ hdd=SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240519-1-Server-DVD-Updates-64bit.qcow2
+ rm -f /tmp/tmp.QFrAbZzq4U
Cleaning up project directory and file based variables 00:00
Job succeeded
Executing the script locally works as expected and schedules two jobs:
$ test_name=ovs-client flavor=Server-DVD-Updates version=15-SP5 distri=sle openqa_url=http://openqa.suse.de ./openqa-schedule-mm-ping-test
+(./openqa-schedule-mm-ping-test:4): main(): openqa_url=http://openqa.suse.de
+(./openqa-schedule-mm-ping-test:5): main(): distri=sle
+(./openqa-schedule-mm-ping-test:6): main(): flavor=Server-DVD-Updates
+(./openqa-schedule-mm-ping-test:7): main(): arch=x86_64
+(./openqa-schedule-mm-ping-test:8): main(): version=15-SP5
+(./openqa-schedule-mm-ping-test:9): main(): test_name=ovs-client
++(./openqa-schedule-mm-ping-test:11): main(): mktemp
+(./openqa-schedule-mm-ping-test:11): main(): tmpfile=/tmp/tmp.TfQJAGLQR9
+(./openqa-schedule-mm-ping-test:12): main(): trap 'rm -f "$tmpfile"' EXIT
+(./openqa-schedule-mm-ping-test:14): main(): cat
++(./openqa-schedule-mm-ping-test:53): main(): openqa-cli api --host http://openqa.suse.de jobs version=15-SP5 scope=relevant arch=x86_64 flavor=Server-DVD-Updates test=ovs-client latest=1
++(./openqa-schedule-mm-ping-test:53): main(): jq -r '.jobs | map(select(.result == "passed")) | max_by(.settings.BUILD) .settings.HDD_1'
+(./openqa-schedule-mm-ping-test:53): main(): hdd=SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240519-1-Server-DVD-Updates-64bit.qcow2
++(./openqa-schedule-mm-ping-test:54): main(): date -Im
+(./openqa-schedule-mm-ping-test:54): main(): openqa-cli schedule --monitor --host http://openqa.suse.de --param-file SCENARIO_DEFINITIONS_YAML=/tmp/tmp.TfQJAGLQR9 DISTRI=sle VERSION=15-SP5 FLAVOR=Server-DVD-Updates ARCH=x86_64 BUILD=2024-05-21T08:37+02:00 _GROUP_ID=0 HDD_1=SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240519-1-Server-DVD-Updates-64bit.qcow2
{"count":2,"failed":[],"ids":[14384617,14384618],"scheduled_product_id":2126387}
2 jobs have been created:
- http://openqa.suse.de/tests/14384617
- http://openqa.suse.de/tests/14384618
{"blocked_by_id":null,"id":14384617,"result":"none","state":"scheduled"}
Job state of job ID 14384617: scheduled, waiting … (delay: 10; waited 0s)
{"blocked_by_id":null,"id":14384617,"result":"none","state":"running"}
…
Job state of job ID 14384617: running, waiting … (delay: 10; waited 100s)
{"blocked_by_id":null,"id":14384617,"result":"passed","state":"done"}
{"blocked_by_id":null,"id":14384618,"result":"passed","state":"done"}
The problem reproduces consistently within GitLab CI.
Steps to reproduce
- On https://gitlab.suse.de/openqa/scripts-ci/-/pipeline_schedules trigger the pipeline "openqa-schedule-mm-ping-test -- osd (11 * * * *)"
- Observe the behavior described in the observation: the job passes without scheduling any openQA jobs
Suggestions
- Try to reproduce the problem within the container environment registry.opensuse.org/devel/openqa/ca/containers/os-autoinst-scripts. Maybe the "time" prefix in the call "time openqa-cli schedule" is the problem?
Updated by okurz 7 months ago
- Copied from action #154624: Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M added
Updated by livdywan 7 months ago
- Subject changed from periodic multi-machine OSD test in https://gitlab.suse.de/openqa/scripts-ci/ does not trigger any jobs to periodic multi-machine OSD test in https://gitlab.suse.de/openqa/scripts-ci/ does not trigger any jobs size:S
- Status changed from New to Workable
Updated by mkittler 7 months ago · Edited
The o3-related jobs are equally affected, e.g. https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2636833.
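The problem is also not reproducible in the container (started e.g. with podman container run -v$PWD:/opt --rm -it registry.opensuse.org/devel/openqa/ca/containers/os-autoinst-scripts bash). There the script reaches openqa-cli schedule and only fails because no API key is configured: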
…
+ cat
++ openqa-cli api --host https://openqa.opensuse.org jobs version=Tumbleweed scope=relevant arch=x86_64 flavor=DVD test=ping_client latest=1
++ jq -r '.jobs | map(select(.result == "passed")) | max_by(.settings.BUILD) .settings.HDD_1'
+ hdd=opensuse-Tumbleweed-x86_64-20240226-textmode@64bit.qcow2
++ date -Im
+ openqa-cli schedule --monitor --host https://openqa.opensuse.org --param-file SCENARIO_DEFINITIONS_YAML=/tmp/tmp.jShRofY0CL DISTRI=opensuse VERSION=Tumbleweed FLAVOR=DVD ARCH=x86_64 BUILD=2024-05-21T15:45+00:00 _GROUP_ID=0 HDD_1=opensuse-Tumbleweed-x86_64-20240226-textmode@64bit.qcow2
403 Forbidden
{"error":"no api key","error_status":403}
real 0m0.487s
user 0m0.321s
sys 0m0.043s
+ rm -f /tmp/tmp.jShRofY0CL
Updated by openqa_review 7 months ago
- Due date set to 2024-06-05
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler 7 months ago
I still have no idea. Even this version stops early (after the assignment of the json variable; the assigned JSON looks good). So it basically already fails after the first openqa-cli invocation, and removing pipefail doesn't change anything.
Updated by tinita 7 months ago · Edited
I was curious and had a short look.
I was able to create a small reproducer, but I am not sure yet what the exact problem is. It only happens with bash < <(curl script) as opposed to ./script, and only if openqa-cli is called. Doing cat some.json | jq ... is fine.
# cat script.sh
#!/bin/bash
set -eux -o pipefail
hdd=$(openqa-cli api --o3 jobs/4214329)
echo "HDD: $hdd"
echo "-------------- END -----------------"
# bash < <(cat script.sh)
++ openqa-cli api --o3 jobs/4214329
+ hdd='{"job":{"assets":...}}'
# ./script.sh
++ openqa-cli api --o3 jobs/4214329
+ hdd='{"job":{...}}'
+ echo 'HDD: {"job":{"assets":{...}}'
HDD: {"job":{"assets":{...}}
+ echo '-------------- END -----------------'
-------------- END -----------------
Edit: the failure is the same for:
# bash < script.sh
Updated by tinita 7 months ago · Edited
Aha! perl is reading STDIN, and with bash < ... the shell and every command it starts share the same STDIN. So openqa-cli consumes the rest of the script from STDIN, and after the call there are no more lines left for bash to execute.
Reproduced with this:
% cat script.sh
#!/bin/bash
perl openqa-cli
echo "-------------- END -----------------"
% cat openqa-cli
#!/usr/bin/env perl
use v5.10;
say '{"some":"json"}';
sub data_from_stdin { # from OpenQA/Command.pm
vec(my $r = '', fileno(STDIN), 1) = 1;
return !-t STDIN && select($r, undef, undef, 0) ? join '', <STDIN> : '';
}
my $test = data_from_stdin();
say "data_from_stdin: '$test'";
% ./script.sh
{"some":"json"}
data_from_stdin: ''
-------------- END -----------------
% bash < script.sh
{"some":"json"}
data_from_stdin: 'echo "-------------- END -----------------"
'
So the fix is to just download the script to a temp file and then execute it :)
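A minimal sketch of that approach, not the actual change made in scripts-ci. The helper name run_fetched_script is hypothetical, and the demo script is a stand-in that uses cat as the STDIN-reading command in place of openqa-cli, so it runs without network access or an openQA instance:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: run a fetch command, store its output in a temporary
# file and execute that file, instead of feeding the script to bash on STDIN.
run_fetched_script() {
    local tmp rc=0
    tmp=$(mktemp)
    "$@" > "$tmp" || rc=$?
    if [ "$rc" -eq 0 ]; then
        bash -e "$tmp" || rc=$?
    fi
    rm -f "$tmp"
    return "$rc"
}

# Stand-in for the downloaded script (assumption): the first command reads
# all of STDIN, the second command should still run afterwards.
demo=$(mktemp)
printf '%s\n' 'cat > /dev/null' 'echo "END reached"' > "$demo"

# Problematic variant: bash and its child share the script text as STDIN.
via_stdin=$(bash -e < <(cat "$demo"))

# Fixed variant: the script is executed from a file.
via_file=$(run_fetched_script cat "$demo" < /dev/null)

rm -f "$demo"
echo "via STDIN: '$via_stdin'"
echo "via file:  '$via_file'"
```

This prints an empty result for the STDIN variant and 'END reached' for the file variant. In CI the fetch command would be something like curl -sf with the raw script URL instead of cat.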
Updated by mkittler 7 months ago
- Status changed from In Progress to Feedback
I created and merged https://gitlab.suse.de/openqa/scripts-ci/-/merge_requests/6. It works, e.g. https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2647057 now enters the scheduling/monitoring step. I also re-triggered the pipeline for OSD.
Updated by tinita 7 months ago
Just for some context: The bug started to appear when I made this change:
https://github.com/os-autoinst/scripts/commit/5e1d67a27a68600f8bdfca92bd151d32fc657226
Before that change both openqa-cli commands were in one line (the last), so emptying STDIN wasn't doing any harm.
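To illustrate why the position of the STDIN-reading command matters, here is a small sketch under the same kind of assumption: cat stands in for openqa-cli, and the two generated scripts are hypothetical, not the real openqa-schedule-mm-ping-test before and after the commit.

```shell
#!/bin/bash
set -euo pipefail

old_layout=$(mktemp)  # STDIN reader is the last line of the script
new_layout=$(mktemp)  # STDIN reader is followed by another command
trap 'rm -f "$old_layout" "$new_layout"' EXIT

printf '%s\n' 'echo "step 1"' 'cat > /dev/null' > "$old_layout"
printf '%s\n' 'cat > /dev/null' 'echo "step 2"' > "$new_layout"

# Both scripts are fed to bash on STDIN, as the CI job did.
old_out=$(bash -e < "$old_layout")
new_out=$(bash -e < "$new_layout")

echo "old layout: '$old_out'"
echo "new layout: '$new_out'"
```

With the old layout nothing is left on STDIN when the reader runs, so no command is lost. With the new layout the reader swallows the following line, and bash exits successfully without running it.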
Updated by tinita 7 months ago
- Copied to action #160820: openqa-cli: Do not read from STDIN unless explicitly requested size:S added