action #95788
open[qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network
0%
Description
Observation
openQA test in scenario sle-15-SP2-Server-DVD-HA-Incidents-x86_64-qam_ha_priority_fencing_node01@64bit fails in iscsi_client or other modules being unable to resolve DNS.
Reproducible
Fails sporadically.
Find jobs referencing this ticket with the help of https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , call openqa-query-for-job-label poo#95788
Expected result
Ability to resolve download.suse.com and other hosts within curl or zypper calls
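A minimal Python sketch of the kind of check this expected result implies: try to resolve a host name a few times before declaring the network missing. The function name `can_resolve`, the retry count and the delay are hypothetical and not taken from the openQA test code.

```python
import socket
import time


def can_resolve(host, attempts=3, delay=0.0, resolver=socket.getaddrinfo):
    """Return True if `host` resolves within `attempts` tries.

    `resolver` is injectable so the retry logic can be exercised without
    network access; by default the system resolver is used.
    """
    for attempt in range(attempts):
        try:
            resolver(host, None)
            return True
        except socket.gaierror:
            # Sleep only between attempts, not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Calling `can_resolve("download.suse.com", attempts=5, delay=2.0)` on a system under test would distinguish a short DNS hiccup from a network that stays unreachable, under the assumption that a few seconds of retrying is acceptable for the test.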
Further details
Always latest result in this scenario: latest
Suggestions
- Someone from Tools and from SAP pair up to debug this
- Ask maintainer (Loic)
- Get familiar with iscsi (storage over TCP)
Updated by okurz about 3 years ago
- Related to action #95458: [qe-sap][ha] SUT reboots unexpectedly, leading to tests failing in HA scenarios auto_review:"(?s)tests/ha.*(command.*timed out|Test died).*match=root-console timed out":retry added
Updated by okurz about 3 years ago
- Subject changed from [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*command.*curl":retry to [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*command.*curl.*failed":retry
Updated by okurz about 3 years ago
- Subject changed from [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*command.*curl.*failed":retry to [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*post_fail_hook failed: command[^$]*curlcommand.*curl":retry
Updated by okurz about 3 years ago
- Subject changed from [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*post_fail_hook failed: command[^$]*curlcommand.*curl":retry to [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*post_fail_hook failed: command[^$].*curl":retry
Updated by okurz about 3 years ago
- Subject changed from [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*post_fail_hook failed: command[^$].*curl":retry to [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*post_fail_hook failed: command.*curl":retry
Updated by okurz about 3 years ago
Updated the auto-review regex to still match on https://openqa.suse.de/tests/6491214/ but not on https://openqa.suse.de/tests/6426081 which is about #95458 instead
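To illustrate what the narrowed pattern does, here is a small Python sketch. The two log excerpts are hypothetical and heavily shortened, not copied from the linked jobs, and the real auto-review tooling may use a different regex engine than Python's `re`.

```python
import re

# Pattern from the ticket subject at this point in the thread.
AUTO_REVIEW = re.compile(r"(?s)tests/ha.*post_fail_hook failed: command.*curl")

# Hypothetical, shortened log excerpts (assumptions for illustration only).
network_failure = (
    "Test died at tests/ha/iscsi_client.pm line 1\n"
    "post_fail_hook failed: command 'curl --form upload=@zypper.log' failed\n"
)
unexpected_reboot = (
    "Test died at tests/ha/fencing.pm line 1\n"
    "match=root-console timed out\n"
)


def is_labelled(log):
    """True if the auto_review pattern is found anywhere in the log."""
    return AUTO_REVIEW.search(log) is not None


# Without (?s), '.' does not cross line breaks, so the same pattern
# cannot connect the test path on one line with the curl error on the next.
single_line_only = re.search(
    r"tests/ha.*post_fail_hook failed: command.*curl", network_failure
)
```

In this sketch the network failure is labelled and the reboot-style log is not, which is the separation from #95458 that the comment describes.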
Updated by okurz about 3 years ago
- Related to action #95801: [qe-sap][ha][css][shap] test fails in register_system of multi-machine HA tests, failing to access network added
Updated by acarvajal about 3 years ago
While reviewing SLES+HA and SLES for SAP Applications QR build 188.13 results, I ran into several jobs that failed with this issue.
All jobs have been tagged with this poo#. These are:
Between round brackets: architecture - module that failed - reason.
- https://openqa.suse.de/tests/6590829#step/ha_cluster_join/8 (aarch64 - ha/ha_cluster_join - cannot reach the other node with ha-cluster-join)
- https://openqa.suse.de/tests/6590692#step/ha_cluster_join/8 (ppc64le - ha/ha_cluster_join - cannot reach the other node with ha-cluster-join)
- https://openqa.suse.de/tests/6590752#step/iscsi_client/9 (x86_64 - ha/iscsi_client - cannot resolve updates.suse.com)
- https://openqa.suse.de/tests/6590755#step/ha_cluster_join/6 (x86_64 - ha/ha_cluster_join - cannot reach the other node with ping)
- https://openqa.suse.de/tests/6590738#step/iscsi_client/9 (ppc64le - ha/iscsi_client - cannot resolve updates.suse.com)
Updated by okurz about 3 years ago
- Related to coordination #96185: [epic] Multimachine failure rate increased added
Updated by openqa_review about 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: ha_hawk_haproxy_node02
https://openqa.suse.de/tests/6642090
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The label in the openQA scenario is removed
Updated by acarvajal about 3 years ago
- Subject changed from [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*post_fail_hook failed: command.*curl":retry to [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry
Updated by acarvajal about 3 years ago
Updated auto_review to cover https://openqa.suse.de/tests/6887791#step/ha_cluster_join/7 as well.
Updated by openqa_review about 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: xfstests_btrfs-generic-401-999
https://openqa.suse.de/tests/6982589
Updated by okurz almost 3 years ago
- Subject changed from [ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry to [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry
Using keyword "qe-sap" as verified by jmichel in weekly QE sync 2021-09-15
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: select_modules_and_patterns
https://openqa.suse.de/tests/7261466
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: xfstests_btrfs-btrfs-151-999
https://openqa.suse.de/tests/7362281
Updated by okurz almost 3 years ago
Please see my proposal to remove the failing test modules from the schedule until the issue could be resolved:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13475
Updated by okurz almost 3 years ago
- Subject changed from [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry to [tools][qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry
- Due date set to 2021-10-20
- Priority changed from Normal to Urgent
- Target version set to Ready
As proposed by vpelcak we are looking for a volunteer from both SUSE QE Tools and QE SAP to collaborate and fix this by next Tue EOB; otherwise the corresponding tests should be disabled, e.g. as proposed in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13475
Updated by livdywan almost 3 years ago
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to livdywan
I'm taking a look, and will query who might help as a domain expert
Updated by livdywan almost 3 years ago
okurz wrote:
Observation
openQA test in scenario sle-15-SP2-Server-DVD-HA-Incidents-x86_64-qam_ha_priority_fencing_node01@64bit fails in iscsi_client
testapi::assert_script_run("curl --form upload=\@/var/log/zypper.log --form upname=iscsi_c"..., 90)
Reproducible
Fails sporadically.
Find jobs referencing this ticket with the help of https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , call openqa-query-for-job-label poo#95788
This unfortunately gives me some invalid job IDs. A couple that worked, with the corresponding errors:
https://openqa.suse.de/tests/7451408
sulogin: tcgetattr failed: Input/output error
https://openqa.suse.de/tests/7443999
Bad Request (400)
googleapi: Error 400: Precondition check failed., failedPrecondition
Further details
Always latest result in this scenario: latest
Looking at previous results, I just see softfails there.
Updated by acarvajal almost 3 years ago
okurz wrote:
Please see my proposal to remove the failing test modules from the schedule until the issue could be resolved:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13475
This is a bad idea.
For starters, the PR is changing the contents of schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01.yaml
but the referenced test uses schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01_sle12.yaml
instead.
But even if the wrong schedule issue was addressed in the PR, and while it could have the intended outcome for the job linked in the PR, that very same job is not a standalone job, so commenting out those modules from the appropriate schedule will just move the failure to any of the other jobs in the MM cluster. In short, it would trade a sporadic failure[1] for a permanent one.
If the PR is merged as is (with the wrong schedule), not only will it not impact the linked job, but it would introduce issues where the schedule is currently in use:
- The schedule is present in https://gitlab.suse.de/acarvajal/qam-openqa-yml/-/blob/master/JobGroups/qam_qu/sle15sp2.yml#L1521 (15-SP2 QU).[2]
- It's also configured as a setting in the qam_ha_rolling_upgrade_migration_node01 test suite in osd.
- The qam_ha_rolling_upgrade_migration_node01 is used in the QAM TestRepo job groups for 12-SP3, 12-SP4, 12-SP5, 15-SP1, 15-SP2 and 15-SP3, but in most of those the YAML_SCHEDULE setting is being overwritten via the Job Group configuration. AFAICS, the schedule is however used in 15-SP2 TestRepo job group[3], quite successfully I might add.
On the other hand, if the PR is merged with the right schedule, the linked node 1 job may pass, but I see 2 possible outcomes for the related jobs:
- Either node 2 will fail timing out in a `lockapi::barrier_wait()` call waiting for `barrier_wait()` calls on the same barrier from the other jobs (for example, in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/ha/check_hawk.pm#L69)
- Support server will time out after reaching MAX_JOB_TIME while waiting for all child jobs to finish; node 1 would finish of course thanks to the PR, but node 2 will stay "forever" in a `lockapi::barrier_wait()` call.
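The barrier problem described above can be sketched with Python's `threading.Barrier`. This is only an analogy under stated assumptions: openQA's `lockapi` coordinates separate jobs, not threads, and the function name `run_cluster` and the half-second timeout are hypothetical.

```python
import threading


def run_cluster(node1_reaches_barrier, timeout=0.5):
    """Simulate two nodes that are supposed to meet at one barrier.

    Returns what happened to node 2: "passed" or "timed out".
    """
    barrier = threading.Barrier(2)
    outcome = {}

    def node2():
        try:
            barrier.wait(timeout=timeout)
            outcome["node2"] = "passed"
        except threading.BrokenBarrierError:
            # Nobody else arrived before the timeout expired.
            outcome["node2"] = "timed out"

    def node1():
        # Skipping the module that contains the barrier call means
        # node 1 never arrives, although node 1 itself finishes fine.
        if node1_reaches_barrier:
            try:
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                pass

    threads = [threading.Thread(target=node2), threading.Thread(target=node1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["node2"]
```

With `run_cluster(True)` both parties meet and node 2 passes; with `run_cluster(False)` node 1 finishes without waiting and node 2 runs into its timeout, which mirrors the failure being moved from one job to another.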
Another thing to consider is that this scenario is testing rolling upgrade, i.e.:
1. Configure 2 nodes with the HA stack.
2. Add some resources to the HA cluster.
3. Stop cluster on node 1
4. Upgrade node 1 to the next SP version of SLES+HA
5. Start cluster on node 1
6. Check HA cluster health
7. Stop cluster on node 2
8. Upgrade node 2 to the next SP version of SLES+HA
9. Start cluster on node 2
10. Check HA cluster health
By commenting out the test modules in the PR, no cluster health is checked (step 6 in the list above) after node 1 migration. In essence, the job will remain and may pass (unlike the other jobs in the MM setup), but the job itself will not be testing anything relevant. It will of course test cluster setup in a previous version of SLES+HA, which is already covered elsewhere.
Finally, the correct YAML schedule at schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01_sle12.yaml
is not only used in the 12-SP4 HA Single Incidents job group where the linked test is located. It's also in use in 12-SP3 and 12-SP5:
acarvajal:~/git/qam-openqa-yml [master|✔] > grep -r schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01_sle12.yaml
JobGroups/test_repo/sle12sp3.yml: YAML_SCHEDULE: 'schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01_sle12.yaml'
JobGroups/test_repo/sle12sp4.yml: YAML_SCHEDULE: 'schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01_sle12.yaml'
JobGroups/test_repo/sle12sp5.yml: YAML_SCHEDULE: 'schedule/ha/qam/common/qam_ha_rolling_upgrade_migration_node01_sle12.yaml'
In 12-SP5 the test has a ca. 89% success rate[4] at the time of this writing, while in 12-SP3 it has an 82% success rate[5].
In conclusion, removing or commenting out test modules from the schedule will break working tests in the 12-SP3 and 12-SP5 QAM TestRepo job groups, while in 12-SP4 it will simply turn a sporadic failure in node 1 into a permanent failure in node 2 or the support server.
If decision is to remove the test, better to do it by commenting related node 1, node 2 and supportserver jobs in https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/blob/master/JobGroups/test_repo/sle12sp4.yml
This is what was done in 15-SP1 due to bsc#1183744: https://gitlab.suse.de/acarvajal/qam-openqa-yml/-/blob/master/JobGroups/test_repo/sle15sp1.yml#L316-337
[1] Yes! 70% is still sporadic. It means the test is passing 30% of the time.
[2] https://openqa.suse.de/tests/7288981#next_previous ... 75% success rate, but only 4 jobs.
[3] https://openqa.suse.de/tests/7451366#next_previous ... over 90% success rate.
[4] https://openqa.suse.de/tests/7451334#next_previous
[5] https://openqa.suse.de/tests/7451314#next_previous
Updated by okurz almost 3 years ago
If decision is to remove the test, better to do it by commenting related node 1, node 2 and supportserver jobs in https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/blob/master/JobGroups/test_repo/sle12sp4.yml
Sounds good. So if no fix for the underlying issues could be found until EOB tomorrow then this schedule change should be ready to be merged at that time.
Updated by livdywan almost 3 years ago
Ricardo kindly helped me understand a little better what's happening here, and I took some notes on what questions came up:
- https://openqa.suse.de/tests/7221118#settings
- There's no Dependencies tab here. I would expect to see node02 and support-server which can be seen e.g. on https://openqa.suse.de/tests/7224107#dependencies (which passed).
`Node qam-node02 (...) UNCLEAN (offline)` stands out as the most relevant error output
- I don't know what unclean means or how the test tries to access qam-node02 and how it fails
- This seems to originate in `crm_mon -R -r -n -N -1 | grep -i 'no inactive resources'`
- A successful run seems to include an `Inactive Resources:` section
- Trying `crm_mon -R -r -n -N -1` on a cluster provided by Ricardo seems to have things like `* rsc_ip_PRD_HDB00_start_0 on hana02 'error' (1): call=35, status='Timed Out', exitreason='', last-rc-change='2021-09-19 17:51:05 +02:00', queued=0ms, exec=20001ms`, where `'error' (1): call=40, status='Timed Out'` stands out to me as an error
`TIMEOUT_SCALE 3` in job settings should mean `50 seconds` times 3, meaning 150s for this job. Might make sense to increase the factor?
@acarvajal maybe you or somebody else can comment on the points above? In particular what `UNCLEAN` means, how the test checks it, and why there's no error output there, e.g. timed out or unreachable.
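The timeout arithmetic from the comment above, written out as a tiny Python sketch. It assumes that `TIMEOUT_SCALE` is applied as a plain multiplier, as the comment describes, and takes the 50 second base value from that comment; the helper name `scaled_timeout` is hypothetical.

```python
def scaled_timeout(base_seconds, timeout_scale=1):
    """Multiply a base timeout by the TIMEOUT_SCALE job setting."""
    return base_seconds * timeout_scale


# Values from the comment: 50 seconds with TIMEOUT_SCALE=3.
current = scaled_timeout(50, 3)
# Hypothetical experiment: doubling the factor to 6.
increased = scaled_timeout(50, 6)
print(current, increased)
```

Under that assumption the current setting gives 150 seconds, and doubling the factor would give 300 seconds for the same check.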
Updated by okurz almost 3 years ago
cdywan wrote:
Ricardo kindly helped me understand a little better what's happening here, and I took some notes on what questions came up:
- https://openqa.suse.de/tests/7221118#settings
- There's no Dependencies tab here. I would expect to see node02 and support-server which can be seen e.g. on https://openqa.suse.de/tests/7224107#dependencies (which passed).
Jobs which are cloned cannot consistently resolve dependencies, hence this won't show there. There is a feature request about this, but I couldn't find the ticket right now.
Updated by livdywan almost 3 years ago
acarvajal wrote:
If decision is to remove the test, better to do it by commenting related node 1, node 2 and supportserver jobs in https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/blob/master/JobGroups/test_repo/sle12sp4.yml
https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/193
Here's my attempt, following your suggestion, in case we won't have a fix by EOD.
Updated by livdywan almost 3 years ago
cdywan wrote:
`Node qam-node02 (...) UNCLEAN (offline)` stands out as the most relevant error output
- I don't know what unclean means or how the test tries to access qam-node02 and how it fails
- This seems to originate in `crm_mon -R -r -n -N -1 | grep -i 'no inactive resources'`
- A successful run seems to include an `Inactive Resources:` section
- Trying `crm_mon -R -r -n -N -1` on a cluster provided by Ricardo seems to have things like `* rsc_ip_PRD_HDB00_start_0 on hana02 'error' (1): call=35, status='Timed Out', exitreason='', last-rc-change='2021-09-19 17:51:05 +02:00', queued=0ms, exec=20001ms`, where `'error' (1): call=40, status='Timed Out'` stands out to me as an error
Btw this is in lib/hacluster.pm in check_cluster_state which conditionally greps for `'no inactive resources'`. And I notice the `crm_verify -LV` is also conditionally fatal. Maybe this should not fail the test? I don't understand why it's fatal only in some cases, though, so this may be totally wrong.
`TIMEOUT_SCALE 3` in job settings should mean `50 seconds` times 3, meaning 150s for this job. Might make sense to increase the factor?
I couldn't actually find where this is set. I can only see it in the yaml for other tests.
Updated by acarvajal almost 3 years ago
cdywan wrote:
Ricardo kindly helped me understand a little better what's happening here, and I took some notes on what questions came up:
- https://openqa.suse.de/tests/7221118#settings
- There's no Dependencies tab here. I would expect to see node02 and support-server which can be seen e.g. on https://openqa.suse.de/tests/7224107#dependencies (which passed).
`Node qam-node02 (...) UNCLEAN (offline)` stands out as the most relevant error output
Agree. Taken from the short description above for the test:
1. Configure 2 nodes with the HA stack.
2. Add some resources to the HA cluster.
3. Stop cluster on node 1
4. Upgrade node 1 to the next SP version of SLES+HA
5. Start cluster on node 1
6. Check HA cluster health
7. Stop cluster on node 2
8. Upgrade node 2 to the next SP version of SLES+HA
9. Start cluster on node 2
10. Check HA cluster health
It seems this is happening during step 6, i.e., node 1 has just been migrated to the next SP, cluster has been restarted on that node (it would start automatically after the reboot), but then it's finding the other node unhealthy/unclean.
As to the root cause, I would think either a product bug, a communication issue between both nodes or some race condition.
Not sure increasing the timeout would help as node 2 should always be available during node 1 migration.
- I don't know what unclean means or how the test tries to access qam-node02 and how it fails
- This seems to originate in `crm_mon -R -r -n -N -1 | grep -i 'no inactive resources'`
Node 2 is unclean, there are inactive resources, so the test fails.
- A successful run seems to include an `Inactive Resources:` section
The failing test also includes the section. As you can see, it lists a lot of inactive resources there: https://openqa.suse.de/tests/7221118#step/check_cluster_integrity/6
What a successful test should include is an empty `Inactive Resources:` section.
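A rough Python sketch of the check being discussed, mimicking `grep -i 'no inactive resources'` on captured output. Both sample outputs are invented for illustration and are not real `crm_mon` output; the exact layout depends on the Pacemaker version, and the resource name in the unhealthy sample is hypothetical.

```python
import re

# Hypothetical, shortened crm_mon-like excerpts (assumptions only).
HEALTHY = """\
Node List:
  * Node qam-node01: online
  * Node qam-node02: online

Inactive Resources:
  * No inactive resources
"""

UNHEALTHY = """\
Node List:
  * Node qam-node01: online
  * Node qam-node02: UNCLEAN (offline)

Inactive Resources:
  * example-resource: Stopped
"""


def no_inactive_resources(crm_mon_output):
    """Case-insensitive search, like `grep -i 'no inactive resources'`."""
    pattern = r"no inactive resources"
    return re.search(pattern, crm_mon_output, re.IGNORECASE) is not None
```

In this sketch the section header is present in both outputs, so its presence alone says nothing; only the "no inactive resources" line separates the healthy case from the failing one.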
- Trying `crm_mon -R -r -n -N -1` on a cluster provided by Ricardo seems to have things like `* rsc_ip_PRD_HDB00_start_0 on hana02 'error' (1): call=35, status='Timed Out', exitreason='', last-rc-change='2021-09-19 17:51:05 +02:00', queued=0ms, exec=20001ms`, where `'error' (1): call=40, status='Timed Out'` stands out to me as an error
Different type of cluster/scenario. That error is seen on HANA clusters after a site takeover/takeback. You can see it in successful test for example at: https://openqa.suse.de/tests/7430859#step/check_after_reboot#1/15
Test modules handle that error (registers fenced HANA node for system replication again in the cluster) and test continues.
This scenario (rolling upgrade) is not using HANA.
`TIMEOUT_SCALE 3` in job settings should mean `50 seconds` times 3, meaning 150s for this job. Might make sense to increase the factor?
I think it can be tested with an increased timeout just to confirm whether it helps or not, but my hunch is that it will not help.
Updated by livdywan almost 3 years ago
- Status changed from Workable to Feedback
cdywan wrote:
https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/193
Here's my attempt, following your suggestion, in case we won't have a fix by EOD.
The above MR was reviewed and merged.
There was a suggestion in chat to have the test in a development group. Due to the concerns over breaking other tests I've not tried that.
I'm wondering whether we want to use this ticket or a new one to continue the investigation of the failures.
Updated by okurz almost 3 years ago
- Due date deleted (2021-10-20)
- Status changed from Feedback to Workable
- Assignee deleted (livdywan)
- Priority changed from Urgent to High
- Target version deleted (Ready)
better continue here. But I think at this point it's better for QE SAP to decide how to go on, what to cover manually, what to fix in tests, where to test it, etc. @cdywan thanks for your help. Removing you from assignee and reducing prio after the urgent issue was addressed.
Updated by livdywan almost 3 years ago
- Related to action #69976: Show dependency graph for cloned jobs added
Updated by okurz almost 3 years ago
- Subject changed from [tools][qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry to [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: jeos-extratest
https://openqa.suse.de/tests/7350856
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/7728612
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: jeos-base+phub
https://openqa.suse.de/tests/7802298
Updated by asmorodskyi over 2 years ago
- Subject changed from [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)tests/ha.*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry to [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)(tests/ha_cluster_join|tests/iscsi_client).*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry
Updated by okurz over 2 years ago
- Subject changed from [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)(tests/ha_cluster_join|tests/iscsi_client).*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry to [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)(tests/ha/ha_cluster_join|tests/iscsi/iscsi_client).*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry
paths like "tests/ha_cluster_join|tests/iscsi_client" don't exist. In os-autoinst-distri-opensuse there are paths like "tests/ha/ha_cluster_join.pm" and "tests/iscsi/iscsi_client.pm"
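A short Python sketch of why the path correction matters. The log excerpt is hypothetical, and the real auto-review tooling may use a different regex engine than Python's `re`; the two patterns are the ones from the subject changes above.

```python
import re

TAIL = r".*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)"

# Pattern before the correction: these paths do not exist in the test repo.
OLD = re.compile(r"(?s)(tests/ha_cluster_join|tests/iscsi_client)" + TAIL)
# Pattern after the correction: module paths include their subdirectory.
NEW = re.compile(r"(?s)(tests/ha/ha_cluster_join|tests/iscsi/iscsi_client)" + TAIL)

# Hypothetical, shortened log excerpt (assumption for illustration only).
log = (
    "loading tests/ha/ha_cluster_join.pm\n"
    "Test died: command 'ping -c1 node01' failed\n"
)

old_matches = OLD.search(log) is not None
new_matches = NEW.search(log) is not None
```

With the old alternatives the pattern can never match a log that names the module by its real path, so no job would be labelled; with the corrected paths the same excerpt matches.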
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/7925816
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/7976119
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: ha_gamma_node03
https://openqa.suse.de/tests/8044170
Updated by asmorodskyi over 2 years ago
- Subject changed from [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network auto_review:"(?s)(tests/ha/ha_cluster_join|tests/iscsi/iscsi_client).*(post_fail_hook failed: command.*curl|command.+ping.+node0.+failed)":retry to [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network
removing autoreview due to false labeling https://openqa.suse.de/tests/8109535#comments
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: rsync-client
https://openqa.suse.de/tests/8197348
Updated by rbranco over 2 years ago
- Status changed from Workable to Resolved
This ticket contains totally unrelated tests.
From the HA/SAP side we could fix with:
https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/231
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14457
Updated by okurz over 2 years ago
I am not sure if this will stay true. For example the latest job in https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=64bit&test=rsync-client&version=15-SP4#next_previous , related to the last linked job in comments, was on 2022-03-08, it passed, but this was a sporadic issue. And no test was conducted since then. I wouldn't be so sure this can't happen again but I am crossing fingers as well :)
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: ha_delta_node02
https://openqa.suse.de/tests/8445292#step/ha_cluster_join/1
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam_2nodes_02
https://openqa.suse.de/tests/8630040#step/ha_cluster_join/1
Expect the next reminder at the earliest in 60 days if nothing changes in this ticket.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam_2nodes_02
https://openqa.suse.de/tests/8706249#step/ha_cluster_join/1
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam_2nodes_02
https://openqa.suse.de/tests/8915222#step/ha_cluster_join/1
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review about 2 years ago
- Status changed from Resolved to Feedback
Re-opening tickets with unhandled openqa-review reminder comment, see https://progress.opensuse.org/projects/openqatests/wiki/Wiki#openqa-review-reminder-handling
Updated by szarate about 2 years ago
- Priority changed from High to Normal
They aren't high prio if nobody looks at them, perhaps the soft failure should be changed to: label:wontfix:xxxx
Updated by llzhao 7 months ago
- Status changed from Feedback to Workable
Reopen it as there are some occurrences in OSD:
https://openqa.suse.de/tests/13380296#step/iscsi_client/9
https://openqa.suse.de/tests/13380301#step/iscsi_client/9
Updated by acarvajal 7 months ago
llzhao wrote in #note-53:
Reopen it as there are some occurrences in OSD:
https://openqa.suse.de/tests/13380296#step/iscsi_client/9
https://openqa.suse.de/tests/13380301#step/iscsi_client/9
We're observing this only on ppc64le and only in SLES for SAP jobs. HA jobs on ppc64le do not have the issue, so it could possibly be related to qemu_ppc64le-large-mem workers.
Updated by acarvajal 7 months ago
https://openqa.suse.de/tests/13381522#step/iscsi_client/9
https://openqa.suse.de/tests/13381519#step/iscsi_client/9
It seems the cluster nodes ran on petrol and the support servers ran on mania ... and the error is in resolving openqa.suse.de. This could be a MM connection issue between the cluster nodes and the support server.
Updated by acarvajal 7 months ago
- Related to action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de added