action #174601

Updated by tinita 18 days ago

## Observation 
 While looking into https://progress.opensuse.org/issues/174580 and why openqa-gru failed today, I found many repeated stack traces in gru's journal, all looking similar to: 

 ``` 
 Dec 19 13:26:23 openqa openqa-gru[5066]: openqa-clone-job (81 /opt/os-autoinst-scripts/openqa-investigate): (openqa-clone-job --json-output --skip-chained-deps --max-depth 0 --parental-inheritance --within-instance https://openqa.suse.de/tests/16252243 TEST+=:investigate:last_good_tests:7c3e460816d9f4305b288674abdc15d295158b49 _TRIGGER_JOB_DONE_HOOK=1 _GROUP_ID=0 BUILD= CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-opensuse.git#7c3e460816d9f4305b288674abdc15d295158b49 OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/t16252243) stderr: >>>Current job 16252243 will fail, because the repositories for the below updates are unavailable<<< 

 Dec 19 13:26:23 openqa openqa-gru[5066]: openqa-clone-job (81 /opt/os-autoinst-scripts/openqa-investigate): (openqa-clone-job --json-output --skip-chained-deps --max-depth 0 --parental-inheritance --within-instance https://openqa.suse.de/tests/16252243 TEST+=:investigate:last_good_tests:7c3e460816d9f4305b288674abdc15d295158b49 _TRIGGER_JOB_DONE_HOOK=1 _GROUP_ID=0 BUILD= CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-opensuse.git#7c3e460816d9f4305b288674abdc15d295158b49 OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/t16252243) rc: 255 >>><<< 

 Dec 19 13:26:24 openqa openqa-gru[5129]: Current job 16252241 will fail, because the repositories for the below updates are unavailable 
 Dec 19 13:26:24 openqa openqa-gru[5129]: [ 
 Dec 19 13:26:24 openqa openqa-gru[5129]:     "http://download.suse.de/ibs/SUSE:/Maintenance:/36747/SUSE_Updates_SLE-Module-Basesystem_15-SP5_x86_64/", 
 Dec 19 13:26:24 openqa openqa-gru[5129]: ] at /usr/share/openqa/script/../lib/OpenQA/Script/CloneJobSUSE.pm line 39. 
 Dec 19 13:26:24 openqa openqa-gru[5072]: Traceback (most recent call last): 
 Dec 19 13:26:24 openqa openqa-gru[5072]:     File "/opt/os-autoinst-scripts/openqa-trigger-bisect-jobs", line 322, in <module> 
 Dec 19 13:26:24 openqa openqa-gru[5072]:       main(parse_args()) 
 Dec 19 13:26:24 openqa openqa-gru[5072]:     File "/opt/os-autoinst-scripts/openqa-trigger-bisect-jobs", line 304, in main 
 Dec 19 13:26:24 openqa openqa-gru[5072]:       args.dry_run, 
 Dec 19 13:26:24 openqa openqa-gru[5072]:     File "/opt/os-autoinst-scripts/openqa-trigger-bisect-jobs", line 149, in openqa_clone 
 Dec 19 13:26:24 openqa openqa-gru[5072]:       return call(["openqa-clone-job"] + default_opts + cmds + default_cmds, dry_run) 
 Dec 19 13:26:24 openqa openqa-gru[5072]:     File "/opt/os-autoinst-scripts/openqa-trigger-bisect-jobs", line 114, in call 
 Dec 19 13:26:24 openqa openqa-gru[5072]:       (["echo", "Simulating: "] if dry_run else []) + cmds 
 Dec 19 13:26:24 openqa openqa-gru[5072]:     File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output 
 Dec 19 13:26:24 openqa openqa-gru[5072]:       **kwargs).stdout 
 Dec 19 13:26:24 openqa openqa-gru[5072]:     File "/usr/lib64/python3.6/subprocess.py", line 438, in run 
 Dec 19 13:26:24 openqa openqa-gru[5072]:       output=stdout, stderr=stderr) 
 Dec 19 13:26:24 openqa openqa-gru[5072]: subprocess.CalledProcessError: Command '['openqa-clone-job', '--skip-chained-deps', '--json-output', '--within-instance', 'https://openqa.suse.de/tests/16252241', 'SDK_TEST_REPOS=http://download.suse.de/ibs/SUSE:/Maintenance:/36728/SUSE_Updates_SLE-Module-Development-Tools_15-SP5_x86_64/,http://download.suse.de/ibs/SUSE:/Maintenance:/36797/SUSE_Updates_SLE-Module-Development-Tools_15-SP5_x86_64/,http://download.suse.de/ibs/SUSE:/Maintenance:/36821/SUSE_Updates_SLE-Module-Development-Tools_15-SP5_x86_64/', 'TEST=jeos-containers-podman:investigate:bisect_without_36475', 'OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/tests/16252241', 'MAINT_TEST_REPO=', '_GROUP=0']' returned non-zero exit status 255. 

 Dec 19 13:26:24 openqa openqa-gru[5158]: openqa-clone-job (81 /opt/os-autoinst-scripts/openqa-investigate): (openqa-clone-job --json-output --skip-chained-deps --max-depth 0 --parental-inheritance --within-instance https://openqa.suse.de/tests/16205104 TEST+=:investigate:last_good_build:20241216-1 _TRIGGER_JOB_DONE_HOOK=1 _GROUP_ID=0 BUILD= OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/t16252243) stderr: >>>Current job 16205104 will fail, because the repositories for the below updates are unavailable<<< 

 Dec 19 13:26:24 openqa openqa-gru[5158]: openqa-clone-job (81 /opt/os-autoinst-scripts/openqa-investigate): (openqa-clone-job --json-output --skip-chained-deps --max-depth 0 --parental-inheritance --within-instance https://openqa.suse.de/tests/16205104 TEST+=:investigate:last_good_build:20241216-1 _TRIGGER_JOB_DONE_HOOK=1 _GROUP_ID=0 BUILD= OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/t16252243) rc: 255 >>><<< 

 Dec 19 13:26:26 openqa openqa-gru[5248]: openqa-clone-job (81 /opt/os-autoinst-scripts/openqa-investigate): (openqa-clone-job --json-output --skip-chained-deps --max-depth 0 --parental-inheritance --within-instance https://openqa.suse.de/tests/16205104 TEST+=:investigate:last_good_tests_and_build:7c3e460816d9f4305b288674abdc15d295158b49+20241216-1 _TRIGGER_JOB_DONE_HOOK=1 _GROUP_ID=0 BUILD= CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-opensuse.git#7c3e460816d9f4305b288674abdc15d295158b49 WORKER_CLASS=svirt-xen,openqaw5-xen,zone-cc,region-prg,datacenter-dc7,location-prg2,worker35,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3 OPENQA_INVESTIGATE_ORIGIN=https://openqa.suse.de/t16252243) stderr: >>>Current job 16205104 will fail, because the repositories for the below updates are unavailable<<< 
 ``` 

 The first occurrence I can find in the logs is at `Dec 19 09:55:21` (I assume this is UTC, as it comes directly from journalctl on OSD). I'm not sure whether this is causing the gru service to fail, but it is certainly something to look into. 

 ## Acceptance Criteria 
 * **AC1**: No Python stacktraces in openqa-gru service logs for known errors 
 * **AC2**: openQA users have access to the details from openqa-clone-job, e.g. in the openQA job comment written by openqa-trigger-bisect-jobs 

 ## Suggestions 
 * Confirm what is causing Python stacktraces to show up in log files 
 * Redirect logs to /dev/null 
 * Catch the error and propagate it to openqa investigate and write a comment on the job 
 * Be aware that we have openqa-investigate (bash) and openqa-trigger-bisect-jobs (python), both of which call openqa-clone-job, and both fail the same way 
 * We need to find a way to propagate the information in both implementations, with the focus on preventing the Python stack trace, which comes only from openqa-trigger-bisect-jobs. In the worst case, leave the bash script unfixed, as the stack trace originates in the Python script
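
 A minimal sketch of the "catch the error and propagate it" suggestion for the Python side. It assumes that `call` in openqa-trigger-bisect-jobs roughly has the shape visible in the traceback above (`call(cmds, dry_run)` wrapping `subprocess`). The names `CloneJobError`, `format_failure_comment` and `run_clone` are hypothetical and not taken from the actual script, and how the comment text is then posted to the openQA job is left open here.

 ```python
import subprocess
import sys


class CloneJobError(Exception):
    """Hypothetical exception carrying the details of a failed command."""

    def __init__(self, cmd, returncode, stderr):
        super().__init__("%s returned non-zero exit status %d" % (cmd[0], returncode))
        self.cmd = cmd
        self.returncode = returncode
        self.stderr = stderr


def call(cmds, dry_run=False):
    """Run a command and return its stdout; raise CloneJobError on failure."""
    full_cmd = (["echo", "Simulating: "] if dry_run else []) + cmds
    try:
        # stdout/stderr=PIPE and universal_newlines keep this usable on the
        # Python 3.6 interpreter seen in the traceback.
        result = subprocess.run(
            full_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except subprocess.CalledProcessError as e:
        # "from None" drops the chained subprocess traceback.
        raise CloneJobError(full_cmd, e.returncode, (e.stderr or "").strip()) from None
    return result.stdout


def format_failure_comment(error):
    """Build a text that could be written as an openQA job comment (AC2)."""
    return "Could not clone job (exit status %d):\n%s" % (error.returncode, error.stderr)


def run_clone(cmds, dry_run=False):
    """Return the command output, or None after logging one line (AC1)."""
    try:
        return call(cmds, dry_run)
    except CloneJobError as e:
        # A single log line for a known error instead of a full stack trace.
        print("%s: %s" % (e, e.stderr), file=sys.stderr)
        return None
 ```

 With this shape, a known failure such as the unavailable-repositories error would produce one log line in the journal, while `format_failure_comment` keeps the stderr details from openqa-clone-job available for a job comment.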
