action #166772
opencoordination #102915: [saga][epic] Automated classification of failures
coordination #166655: [epic] openqa-label-known-issues
openqa-label-known-issues overrides size:S
0%
Description
Observation
https://github.com/os-autoinst/scripts/blob/master/openqa-label-known-issues#L55
if ! curl "${curl_args[@]}" -s "$testurl" -o "$out"; then
The problem is that $out is overwritten. If execution then does not reach the block, the script continues with label_on_issues_from_issue_tracker on modified content, when it is expected to work on the content of autoinst-log.txt.
Raised on https://github.com/os-autoinst/scripts/pull/342/files#r1745228802
Related #169699 showing
Nov 10 03:30:05 ariel openqa-gru[12854]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Nov 10 03:30:07 ariel openqa-gru[13280]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Nov 10 03:30:24 ariel openqa-gru[13985]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Nov 10 03:30:29 ariel openqa-gru[14252]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Nov 10 03:30:31 ariel openqa-gru[14391]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Nov 10 03:30:32 ariel openqa-gru[14597]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68
Acceptance criteria
- AC1: The output file is written to the right location in the reference function.
Suggestions
- Research what the use of the output file is and how to test/verify this
- Try to make sense of the code to find out what the wanted behavior is
- Add unit tests
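A unit test for AC1 could assert that the downloaded log is byte-for-byte unchanged after the code under test has run. What follows is a minimal sketch that assumes nothing about the repository's test framework. `leaves_file_untouched`, `fetch_job_page` and `clobber_log` are hypothetical names, and the download is simulated with printf instead of curl:

```shell
# Hypothetical stand-in for the code path that downloads the job page.
# It writes to its own temporary file instead of the caller's log.
fetch_job_page() {
    local page
    page=$(mktemp)
    printf '<html>job page</html>\n' > "$page"   # placeholder for the real curl call
    rm -f "$page"
}

# Hypothetical buggy variant: reuses the log path as the download target.
clobber_log() {
    printf '<html>job page</html>\n' > "$1"
}

# Succeeds when running the command given after the file name
# leaves that file unchanged (compared by checksum and size).
leaves_file_untouched() {
    local file=$1 before after
    shift
    before=$(cksum < "$file")
    "$@"
    after=$(cksum < "$file")
    [ "$before" = "$after" ]
}
```

With a log file in `$log`, `leaves_file_untouched "$log" fetch_job_page` should succeed, while `leaves_file_untouched "$log" clobber_log "$log"` should fail.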
Updated by ybonatakis 2 months ago
- Related to action #165716: [o3] Munin - minion hook failed - /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 68 size:M added
Updated by livdywan 2 months ago
Raised on https://github.com/os-autoinst/scripts/pull/342/files#r1745228802 but I think it is not an issue, as out in the function is a local variable.
What is the goal of this ticket? #166649 covers making the code legible so I wouldn't worry about it here.
Should this be "Complete unit test coverage for openqa-label-known-issues"? Or maybe "Consistent handling of old assets in openqa-label-known-issues"?
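Whether the local declaration is enough depends on which path the local variable holds, and the ticket does not settle that. The two cases can be sketched as follows; all function names are hypothetical and the download is again simulated with printf:

```shell
# `local out` with a fresh path: the caller's file is never touched.
uses_own_path() {
    local out
    out=$(mktemp)
    printf 'page\n' > "$out"
    rm -f "$out"
}

# `local out` initialised with the caller's path: a separate variable,
# but the same file on disk, so the redirect still replaces its content.
uses_same_path() {
    local out=$1
    printf 'page\n' > "$out"
}

demo() {
    local out
    out=$(mktemp)
    printf 'log\n' > "$out"
    uses_own_path
    echo "after uses_own_path: $(cat "$out")"
    uses_same_path "$out"
    echo "after uses_same_path: $(cat "$out")"
    rm -f "$out"
}
```

Calling `demo` prints `log` for the first case and `page` for the second: `local` protects the variable, not the file it points to.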
Updated by ybonatakis 2 months ago
livdywan wrote in #note-2:
Raised on https://github.com/os-autoinst/scripts/pull/342/files#r1745228802 but I think it is not an issue, as out in the function is a local variable.
What is the goal of this ticket? #166649 covers making the code legible so I wouldn't worry about it here.
Should this be "Complete unit test coverage for openqa-label-known-issues"? Or maybe "Consistent handling of old assets in openqa-label-known-issues"?
If the ticket needs more info for the estimation, give me some time to investigate and update. Sounds good?
Updated by ybonatakis 2 months ago
- Description updated (diff)
I verified that the autoinst logs are overwritten and updated the ticket. I also created a draft: https://github.com/os-autoinst/scripts/pull/347
Updated by okurz 9 days ago
- Related to action #169699: [alert] opensuse.org openqa.opensuse.org openqa_minion_jobs_hook_rc_failed minion 'hook failed - see openqa-gru service logs for details' added
Updated by ybonatakis 9 days ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by tinita 9 days ago · Edited
I hope I can help with some research.
I tried to find the minion jobs / openqa jobs for those failures in the logs.
When running openqa-label-known-issues on those jobs, the error cannot be reproduced.
But I found a pattern: all of the failing jobs are on investigate jobs.
(All of them are incompletes, but that is something we knew before already, as the error is in a function that's called when there is no autoinst log)
I couldn't figure out in what cases handle_unreachable would result in that error output.
Here is the list of jobs from yesterday and today:
openqa=> select id, concat('https://openqa.opensuse.org/tests/', args->1), task, started, state from minion_jobs where task = 'hook_script' and created >= '2024-11-10 11:39:00' and created <= '2024-11-12 11:42:00' and notes::varchar like '%hook_rc": 1%' order by started limit 100;
id | concat | task | started | state
---------+-------------------------------------------+-------------+-------------------------------+----------
4541962 | https://openqa.opensuse.org/tests/4635024 | hook_script | 2024-11-11 11:39:08.434197+00 | finished
4545150 | https://openqa.opensuse.org/tests/4637425 | hook_script | 2024-11-12 09:29:45.814464+00 | finished
4545155 | https://openqa.opensuse.org/tests/4637425 | hook_script | 2024-11-12 09:30:05.762508+00 | finished
4545157 | https://openqa.opensuse.org/tests/4637425 | hook_script | 2024-11-12 09:30:05.863404+00 | finished
4545165 | https://openqa.opensuse.org/tests/4637439 | hook_script | 2024-11-12 09:30:26.215627+00 | finished
4545173 | https://openqa.opensuse.org/tests/4637439 | hook_script | 2024-11-12 09:30:45.886272+00 | finished
4545175 | https://openqa.opensuse.org/tests/4637439 | hook_script | 2024-11-12 09:30:50.763358+00 | finished
4545183 | https://openqa.opensuse.org/tests/4637440 | hook_script | 2024-11-12 09:31:12.157622+00 | finished
4545190 | https://openqa.opensuse.org/tests/4637440 | hook_script | 2024-11-12 09:31:25.725176+00 | finished
4545193 | https://openqa.opensuse.org/tests/4637440 | hook_script | 2024-11-12 09:31:25.903787+00 | finished
4545204 | https://openqa.opensuse.org/tests/4637442 | hook_script | 2024-11-12 09:32:16.65011+00 | finished
4545209 | https://openqa.opensuse.org/tests/4637441 | hook_script | 2024-11-12 09:32:32.69126+00 | finished
4545213 | https://openqa.opensuse.org/tests/4637442 | hook_script | 2024-11-12 09:32:43.312287+00 | finished
4545214 | https://openqa.opensuse.org/tests/4637442 | hook_script | 2024-11-12 09:32:43.369843+00 | finished
4545216 | https://openqa.opensuse.org/tests/4637441 | hook_script | 2024-11-12 09:32:52.946281+00 | finished
4545218 | https://openqa.opensuse.org/tests/4637441 | hook_script | 2024-11-12 09:32:53.106781+00 | finished
4545224 | https://openqa.opensuse.org/tests/4637455 | hook_script | 2024-11-12 09:33:18.878535+00 | finished
4545229 | https://openqa.opensuse.org/tests/4637459 | hook_script | 2024-11-12 09:33:33.341963+00 | finished
4545233 | https://openqa.opensuse.org/tests/4637455 | hook_script | 2024-11-12 09:33:48.173654+00 | finished
4545235 | https://openqa.opensuse.org/tests/4637455 | hook_script | 2024-11-12 09:33:48.259264+00 | finished
4545237 | https://openqa.opensuse.org/tests/4637459 | hook_script | 2024-11-12 09:33:57.955449+00 | finished
4545239 | https://openqa.opensuse.org/tests/4637459 | hook_script | 2024-11-12 09:33:58.084309+00 | finished
4545257 | https://openqa.opensuse.org/tests/4637465 | hook_script | 2024-11-12 09:35:37.193929+00 | finished
4545263 | https://openqa.opensuse.org/tests/4637465 | hook_script | 2024-11-12 09:36:11.984583+00 | finished
4545265 | https://openqa.opensuse.org/tests/4637465 | hook_script | 2024-11-12 09:36:12.078513+00 | finished
4545267 | https://openqa.opensuse.org/tests/4637475 | hook_script | 2024-11-12 09:36:17.567124+00 | finished
4545273 | https://openqa.opensuse.org/tests/4637475 | hook_script | 2024-11-12 09:36:51.89106+00 | finished
4545275 | https://openqa.opensuse.org/tests/4637475 | hook_script | 2024-11-12 09:36:51.983071+00 | finished
4545277 | https://openqa.opensuse.org/tests/4637481 | hook_script | 2024-11-12 09:36:57.903039+00 | finished
4545284 | https://openqa.opensuse.org/tests/4637481 | hook_script | 2024-11-12 09:37:31.831233+00 | finished
4545286 | https://openqa.opensuse.org/tests/4637481 | hook_script | 2024-11-12 09:37:31.940482+00 | finished
4545290 | https://openqa.opensuse.org/tests/4637471 | hook_script | 2024-11-12 09:39:14.894448+00 | finished
4545294 | https://openqa.opensuse.org/tests/4637471 | hook_script | 2024-11-12 09:39:48.528856+00 | finished
4545296 | https://openqa.opensuse.org/tests/4637471 | hook_script | 2024-11-12 09:39:48.607363+00 | finished
4545298 | https://openqa.opensuse.org/tests/4637485 | hook_script | 2024-11-12 09:39:55.220461+00 | finished
4545300 | https://openqa.opensuse.org/tests/4637485 | hook_script | 2024-11-12 09:40:29.07404+00 | finished
4545302 | https://openqa.opensuse.org/tests/4637485 | hook_script | 2024-11-12 09:40:29.14412+00 | finished
4545304 | https://openqa.opensuse.org/tests/4637486 | hook_script | 2024-11-12 09:40:35.520763+00 | finished
4545309 | https://openqa.opensuse.org/tests/4637486 | hook_script | 2024-11-12 09:41:09.042989+00 | finished
4545311 | https://openqa.opensuse.org/tests/4637486 | hook_script | 2024-11-12 09:41:09.121461+00 | finished
4545313 | https://openqa.opensuse.org/tests/4637487 | hook_script | 2024-11-12 09:41:15.832463+00 | finished
4545317 | https://openqa.opensuse.org/tests/4637487 | hook_script | 2024-11-12 09:41:43.402739+00 | finished
4545496 | https://openqa.opensuse.org/tests/4637546 | hook_script | 2024-11-12 11:21:59.522885+00 | finished
4545498 | https://openqa.opensuse.org/tests/4637552 | hook_script | 2024-11-12 11:22:00.62644+00 | finished
4545500 | https://openqa.opensuse.org/tests/4637553 | hook_script | 2024-11-12 11:22:02.797964+00 | finished
4545502 | https://openqa.opensuse.org/tests/4637551 | hook_script | 2024-11-12 11:22:04.694155+00 | finished
4545535 | https://openqa.opensuse.org/tests/4637558 | hook_script | 2024-11-12 11:41:06.887807+00 | finished
4545536 | https://openqa.opensuse.org/tests/4637557 | hook_script | 2024-11-12 11:41:06.970634+00 | finished
4545539 | https://openqa.opensuse.org/tests/4637555 | hook_script | 2024-11-12 11:41:12.210091+00 | finished
4545325 | https://openqa.opensuse.org/tests/4637359 | hook_script | 2024-11-12 12:28:58.202091+00 | inactive
(50 rows)
I suggest adding some helpful debugging output. These cases do not happen that often, so IMHO it's OK to have some more information in the log from inside the handle_unreachable function.
edit: Also noteworthy, there is sometimes more than one minion job for the same openqa job.
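One possible form of such debugging, sketched under assumptions: an ERR trap that reports the failing command and the enclosing function in addition to the line number. This needs bash. `report_error`, `handle_unreachable_demo` and `run_demo` are hypothetical names, the failing step is simulated with `false`, and how the real script installs its error handler is not shown in this ticket:

```shell
# Report where a command failed; meant to be called from an ERR trap.
report_error() {
    local rc=$1 line=$2 cmd=$3 func=$4
    echo "ERROR: line $line, function $func: '$cmd' exited with status $rc" >&2
}

# Hypothetical stand-in for handle_unreachable.
handle_unreachable_demo() {
    false                                  # simulated failing step
    echo "continuing after the failure"
}

# Runs in a subshell so the trap and options do not leak into the caller.
run_demo() (
    set +e   # keep going after a failure so the demo reaches its last line
    set -E   # let functions inherit the ERR trap
    trap 'report_error "$?" "$LINENO" "$BASH_COMMAND" "${FUNCNAME[0]:-main}"' ERR
    handle_unreachable_demo
)
```

`run_demo` writes one ERROR line naming `handle_unreachable_demo` and the command `false` to stderr, then continues. A real script may prefer to abort after reporting.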
Updated by ybonatakis 9 days ago
Thanks Tina. I couldn't reproduce it either. I tried both scenarios (with and without fixing the $out override issue) and the result is the same.
Updated by tinita 9 days ago
Probably unrelated, but I noticed that we sometimes have more than one minion job per openqa job:
hook_script:
- https://openqa.opensuse.org/minion/jobs?id=4545183
- https://openqa.opensuse.org/minion/jobs?id=4545190
- https://openqa.opensuse.org/minion/jobs?id=4545193
finalize_job_results:
- https://openqa.opensuse.org/minion/jobs?id=4545182
- https://openqa.opensuse.org/minion/jobs?id=4545187
- https://openqa.opensuse.org/minion/jobs?id=4545191
I'm not sure if that is expected.
Updated by tinita 9 days ago
- Copied to action #169747: Multiple finalize_job_results and hook_script minion jobs per openQA job size:M added
Updated by openqa_review 8 days ago
- Due date set to 2024-11-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 8 days ago
I am so confused
❯ dry_run=1 ./openqa-label-known-issues-and-investigate-hook 4639446
openqa-cli (318 /home/iob/Documents/Work/qatools/repos/scripts/_common): Error making API request (jobs/http://openqa.opensuse.org/t4638817): 404 Not Found
{"error_status":404}
Skipping posting investigation comment on original job http://openqa.opensuse.org/t4638817 as it does not exist anymore
But https://openqa.opensuse.org/tests/4638817 does exist
The openqa-gru service on o3 shows constant errors and produces emails.
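The request path in the error, jobs/http://openqa.opensuse.org/t4638817, looks as if a full URL ended up where a numeric job id is expected. That is only a reading of the message, not something confirmed in this ticket. Under that assumption, a job URL could be reduced to its id with plain parameter expansion; `job_id_from_url` is a hypothetical helper, not part of the scripts:

```shell
# Print the trailing run of digits of a job URL.
job_id_from_url() {
    local url=${1%/}          # tolerate a trailing slash
    echo "${url##*[!0-9]}"    # drop everything up to the last non-digit
}
```

This handles both the short form (`.../t4638817`) and the long form (`.../tests/4638817`). A URL with a fragment such as `#step` would need that stripped first.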
Updated by ybonatakis 8 days ago
I have updated https://github.com/os-autoinst/scripts/pull/347 and added some debug output in different scripts.
Updated by tinita 8 days ago
ybonatakis wrote in #note-18:
I am so confused
❯ dry_run=1 ./openqa-label-known-issues-and-investigate-hook 4639446
Calling openqa-label-known-issues-and-investigate-hook here is not really helpful, I think. The error happens in openqa-label-known-issues. Calling the whole hook script will call openqa-investigate first, and that does different things if it has already processed the job.
Just calling ./openqa-label-known-issues https://openqa.opensuse.org/tests/4637359 should be enough, but the error is not reproducible, so it could be a temporary network issue.
Updated by tinita 8 days ago
https://github.com/os-autoinst/scripts/pull/342 was never deployed on o3, due to https://progress.opensuse.org/issues/166721#note-15
This explains why we still saw the error in line 68.
The repo is now updated on o3, and the actual bugfix regarding the output file can be worked on. I will enable the cronjob and email notification again.
Updated by tinita 8 days ago · Edited
tinita wrote in #note-21:
I temporarily disabled /etc/cron.d/os-autoinst-scripts-update-git on o3 and added set -x to handle_unreachable, because I have no clue where the error actually happens.
Set the notification email to mine in /etc/munin/munin.conf for now.
Waiting until we see this again.
Enabled cronjob, enabled email notification again.
@ybonatakis can continue tomorrow.
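The set -x approach can be limited to the one function so the rest of the log stays quiet. This is a minimal sketch assuming bash 4.4 or newer, where `local -` restores the shell options when the function returns. `handle_unreachable_demo` is a hypothetical stand-in, not the real function body:

```shell
handle_unreachable_demo() {
    local -            # bash >= 4.4: restore shell options on return
    set -x             # trace every command of this function to stderr
    local testurl=$1
    echo "checking $testurl"
}
```

The trace goes to stderr, so it ends up in the same service log as the existing error output, and xtrace is off again once the function returns.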
Updated by tinita 6 days ago
- Related to action #166721: [alert] Waves of emails due to kex_exchange_identification: Connection closed by remote host errors added