Wiki » History » Version 61

okurz, 2023-09-25 11:37
Statistical investigation: Use "TEST+=" syntax so that the original test suite name is preserved

# Introduction

{{toc}}

Also see https://progress.opensuse.org/projects/openqav3/wiki


# Organisational

## ticket workflow

This project adheres to the ticket workflow as described on the parent project: [ticket workflow](https://progress.opensuse.org/projects/openqav3/wiki/Wiki#ticket-workflow)

Also see the [[Wiki#Definition-of-DONE|Definition-of-DONE]] on the use of ticket status, especially when to set *Resolved*.

The following issue categories are used:

* *New test*: Any extension of the existing test coverage, for example a new test module, a new test scenario, or simply additional test steps within existing test modules
* *Bugs in existing tests*: Test failures that need to be investigated, or obvious test failures that need fixing in the test code or needle updates; **not** product bugs. This covers necessary adaptations to existing tests to make them usable again after acceptable product changes or to increase the stability of unstable tests
* *Enhancement to existing tests*: Enhancements without changing the test scope, for example improving post_fail_hooks to gather more relevant logs, refactoring, code cleanup, reducing duplication, making tests more stable or easier to read, workflow-related changes
* *Infrastructure*: Anything regarding the test infrastructure, including workers used for o3 (openqa.opensuse.org) and osd (openqa.suse.de). Not directly related to the test code or needles but to our infrastructure, e.g. worker issues, our syncing and triggering approach, etc.
* *Spike/Research*: Tickets that represent timeboxed research of some sort, or [a spike](http://agiledictionary.com/209/spike/), with the sole intention of clarifying a topic before spawning new tasks on it.

# test organization on https://openqa.suse.de/

## job group names

### Job group names should be consistent and structured for easy (daily) review of the current status

template:

```
<product_group_short_name> <order_nr>.<product_variant>
```

e.g. "SLE 12 SP1 1.Server". Keep the whitespace for separation consistent, also see https://progress.opensuse.org/issues/9916
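
As a minimal sketch of the template, the following shell function composes a job group name from its three parts. The function name `job_group_name` is hypothetical and only serves as illustration; it is not part of openQA.

```shell
# Hypothetical helper: compose a job group name following the template
# "<product_group_short_name> <order_nr>.<product_variant>".
# $1 = product group short name, $2 = order number, $3 = product variant
job_group_name() {
    printf '%s %s.%s\n' "$1" "$2" "$3"
}

job_group_name "SLE 12 SP1" 1 Server
```

Keeping the single space and the dot in one place like this is one way to keep the separation consistent across groups.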

### Released products should be named with a prefix 'x' to show up late in the overview page

This way we can keep track of whether tests fail even though the product does not produce new builds. This could help us crosscheck tests. E.g. "x-released SLE 12 SP1 1.Server".

Use a lowercase "x" as all our product names start with capital letters. Sorting works regardless (or uppercase first?).
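
The effect of the lowercase prefix can be illustrated with a plain bytewise sort. This is only a sketch: it assumes an ordering by byte value (C locale) and does not claim that the openQA overview page sorts the same way. The group names other than the one from the example above are made up.

```shell
# Sort job group names read from stdin by byte value (C locale).
# Capital letters sort before lowercase ones, so an "x-..." name comes last.
sort_group_names() {
    LC_ALL=C sort
}

printf '%s\n' \
    "x-released SLE 12 SP1 1.Server" \
    "SLE 15 1.Server" \
    "SLE 12 SP2 1.Server" | sort_group_names
```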

For now we do not retrigger tests on old builds automatically but any test developer may retrigger them manually, e.g. if they suspect the tests broke and want to confirm that local changes are not at fault.

# needling best practices
There are also other locations where "needling best practices" can be found but we should also have the possibility to keep something on the wiki. Feel free to contact me (okurz) and tell me where it should be instead if there is a better place. Also look into [openQA Pitfalls](https://github.com/os-autoinst/openQA/blob/master/docs/Pitfalls.asciidoc)

## applying "workaround" needles
If a test reveals a product issue of minor importance, it can make sense to create a needle with the property "workaround" set. This way, if the needle is matched, the test records this as a "soft-fail". To track the product issue and eventually delete the workaround needle once the issue is fixed, the product issue should be recorded in the needle name itself and ideally also in the git commit message adding the needle. If test changes are necessary, the source code should have a corresponding comment referencing the issue as well as marking the start and end of the test procedure that is necessary for applying the workaround. Example for a needle name: "gdm-workaround-bsc962806-20160125" referencing bsc#962806
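
The naming pattern from the example can be sketched as a small shell function. The helper name `workaround_needle_name` is hypothetical; the pattern `<base>-workaround-<bugref>-<date>` is only inferred from the single example name above.

```shell
# Hypothetical helper: build "<base>-workaround-<bugref>-<YYYYMMDD>".
# $1 = base needle name, $2 = bug reference (a '#' is dropped, so
# bsc#962806 becomes bsc962806), $3 = optional date, default: today
workaround_needle_name() {
    printf '%s-workaround-%s-%s\n' \
        "$1" \
        "$(printf '%s' "$2" | tr -d '#')" \
        "${3:-$(date +%Y%m%d)}"
}

workaround_needle_name gdm 'bsc#962806' 20160125
```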

*keep in mind:*
Since [gh-os-autoinst#532](https://github.com/os-autoinst/os-autoinst/pull/532) workaround needles are always preferred; otherwise, if two needles match, the first one in alphabetical order wins. Therefore it is even more important to prevent "greedy" needles, i.e. make sure the workaround needles do not match without checking for the error condition

## do not overwrite old needles because old date confuses people
With the needle editor, a timestamp of the current day is automatically added to new needles. When updating a needle, do not overwrite the needle that carries the old date tag; the outdated date confuses people and looks really weird in the needle editor.

## needle individual column entries in tables
**Problem**: Tables might auto-adjust column size based on content. Therefore it is unsafe to create needles covering multiple columns in a row. Failing example: https://openqa.suse.de/tests/441169#step/yast2_snapper/23

**Solution**: Needles support multiple areas. Use them to needle individual cells in this example.


## don't include version specific content in needles

**Problem**: A needle that covers the version number of an application or product often fails with every update, e.g. see [opensuse-42.2-DVD-x86_64-Build0112-xfce@64bit](https://openqa.opensuse.org/tests/228793#step/firefox/10). The needle does not match because no one has so far created a needle for firefox 47 on Leap 42.2 on xfce.

**Solution**: openQA in general supports exclusion areas and even OCR but they have their [flaws](https://progress.opensuse.org/issues/12858). For now it is better to carefully select matching areas so that versions are not included, as in the following example:

![needling example](openQA_needle_firefox_wo_version_cropped.png)


## ensure prerequisites for next test steps

**Problem**: A needle looks for a part of a screen before continuing with the next steps, e.g. in a wizard. It might happen that the system is seemingly "losing" key presses as the expected actions are not triggered. For example, this happened in https://progress.opensuse.org/issues/61877

**Solution**: Needles need to ensure that the system is ready to accept the next action, e.g. not only check for expected content in text but also that a "Next" button is not greyed out or that all dynamic content in a wizard is already shown. Also see https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9667#discussion_r386828207

## consider a lower than default "match ratio"

**Problem**: By default the openQA needle editor proposes a rather strict "match ratio" to prevent false positive matches. Many people do not know what "match ratio" means and just create new needles on mismatches with default parameters. As soon as there are even slight UI or rendering changes, many tests often fail and need multiple needle updates. This can cause a lot of work and a big overhead of needle changes, which are again often created with the default high "match ratio".

**Solution**: Consider lowering the "match ratio" in all cases where slight UI or rendering changes should be acceptable and not cause test failures. Keep in mind that bigger match areas allow bigger changes for the same match ratio, so e.g. select "80%" for small match areas and higher values for bigger areas.
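
As a rough sketch of what a lowered match ratio can look like in a needle file: the snippet below writes an example needle JSON with one small match area set to 80. It assumes the usual needle layout with `area`, `properties` and `tags` keys and a per-area `match` value; the coordinates, the tag and the helper name `write_example_needle` are made up for illustration.

```shell
# Write an example needle JSON file to the path given as $1.
# Assumption: needle files carry a per-area "match" percentage.
# All coordinates and the tag name are invented for this sketch.
write_example_needle() {
    cat > "$1" <<'EOF'
{
  "area": [
    {
      "xpos": 100,
      "ypos": 200,
      "width": 60,
      "height": 20,
      "type": "match",
      "match": 80
    }
  ],
  "properties": [],
  "tags": ["example-tag"]
}
EOF
}

needle_file="${TMPDIR:-/tmp}/example-needle-20160125.json"
write_example_needle "$needle_file"
grep '"match"' "$needle_file"
```

In practice the value is normally set in the needle editor; editing the JSON by hand is just another way to review which ratio a needle uses.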

# Definition of DONE/READY

Each of the following points has to be fulfilled to regard individual contributions as *DONE*. Not every step has to be done by the same person. The overall completion is the responsibility of the whole team.

## Definition of DONE

Also see http://www.allaboutagile.com/definition-of-done-10-point-checklist/ and https://www.scrumalliance.org/community/articles/2008/september/what-is-definition-of-done-%28dod%29

The following definitions are used to ensure that development on individual tests has been completed, covering all existing workflows, e.g. "hot-fixes" on the productive instance as well as contributions by new contributors with no previous experience and no control over needle generation on productive instances.

* Code changes are made available via a pull request on the GitHub repository
* New tests as individual test modules (i.e. files under `tests/`): They are loaded in main.pm of sle and/or opensuse
* "make test" works (e.g. automatic travis CI check triggered on each GitHub PR)
* [Guidelines for git commits](http://chris.beams.io/posts/git-commit/) have been followed
* Code has been reviewed (e.g. in the GitHub PR)
* Favored, but depending on criticality/complexity/size: A local verification test has been run, e.g. post a link to a local openQA machine, a screenshot or a logfile
* Test modules that have been touched have updated metadata, e.g. "Maintainer" and "Summary" (#13034)
* Potentially impacted product variants have been considered, e.g. openSUSE, SLE, validation tests for new product versions currently in development, maintenance tests on older product versions
* Code has been merged (either by reviewer or reviewee after 'LGTM' from others)
* Code has been deployed to osd and o3 (automatic git sync every few minutes)
* If new variables are necessary (feature toggles): A test_suite is executing the test, e.g. a test_suite is created or the variable is added to an existing test_suite over the web interface configuration on osd and/or o3
* If a new test_suite has been created:
  * The test_suite is added to at least one job_group
  * The test_suite has a description describing the goal of the new test + at least one maintainer. Optional: References to fate#, boo#, bsc#, poo#
* Necessary needles are made available as PR for sle and/or opensuse (depending on where the test is executed, see above for 'main.pm') or are created on the productive instance
* At least one successful test run has been observed on osd or o3 and referenced in the corresponding progress item or bugzilla bug report if one exists. There is one exception: If the test fails due to a valid product bug and a bug fix is expected shortly, the test run may also fail when labeled accordingly.

## Definition of READY for new tests

The following points should be considered before a new test is READY to be implemented:

* Either a product bug has been discovered for which there is no automated test in openQA or a FATE request for new features exists
* A test case description exists depicting the prerequisites of the test, the steps to conduct and the expected result
* The impact and applicability for both SLE and openSUSE products has been considered

A good practice is to also add the following one after another:

* a tag in the subject line of either "[easy]", "[medium]", "[hard]" depending on how you judge the implementation of the ticket in comparison to other known examples from experience. E.g. a simple "needle update" should be "[easy]" as well as changes to only one test module. A change that would involve updating the test API, needles and test code and impacting multiple products can be "[hard]"
* add acceptance criteria (see ticket template)
* add tasks as a hint where to start

# How we work on tickets

## ticket backlog triaging

Also see https://progress.opensuse.org/projects/suseqa/wiki#ticket-refinement-grooming

1. [**Categorize**](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&f%5B%5D=category_id&op%5Bcategory_id%5D=%21*&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=fixed_version&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=updated_on&c%5B%5D=category&c%5B%5D=created_on&group_by=): Goal -> No ticket without category
  1. [**Categorize QSF-U**](http://s.qa.suse.de/qsfu_tickets_without_category)
2. [**Tag**](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&sort=id%3Adesc&f%5B%5D=subject&op%5Bsubject%5D=%21%7E&v%5Bsubject%5D%5B%5D=%5B&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=issue_tags&op%5Bissue_tags%5D=%21*&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=fixed_version&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=updated_on&c%5B%5D=category&group_by=&t%5B%5D=): Goal -> No ticket without component or responsibility tags

## SLOs (service level objectives)

Treat the following as target numbers or guidelines ("should be"), listed in order of priority from top to bottom. Each query should show zero entries if the objectives are met. See [openQA Tests Backlog Status](https://opensuse.github.io/openqa-tests-backlog/) for an overview of all queries.

* for picking up tickets based on priority, first goal is "urgency removal":
 * **immediate**: [<1 day](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f%5B%5D=priority_id&op%5Bpriority_id%5D=%3D&v%5Bpriority_id%5D%5B%5D=7&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=updated_on&op%5Bupdated_on%5D=%3Ct-&v%5Bupdated_on%5D%5B%5D=1&f%5B%5D=subproject_id&op%5Bsubproject_id%5D=*&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=due_date&c%5B%5D=updated_on&c%5B%5D=category&group_by=&t%5B%5D=) or [<1 day for all subprojects of qa](https://progress.opensuse.org/projects/qa/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f[]=priority_id&op[priority_id]=%3D&v[priority_id][]=7&f[]=status_id&op[status_id]=o&f[]=updated_on&op[updated_on]=%3Ct-&v[updated_on][]=1&f[]=subproject_id&op[subproject_id]=*&f[]=&c[]=subject&c[]=project&c[]=status&c[]=assigned_to&c[]=due_date&c[]=updated_on&c[]=category&group_by=&t[]=)
 * **urgent**: [<1 week](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f%5B%5D=priority_id&op%5Bpriority_id%5D=%3D&v%5Bpriority_id%5D%5B%5D=6&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=updated_on&op%5Bupdated_on%5D=%3Ct-&v%5Bupdated_on%5D%5B%5D=7&f%5B%5D=subproject_id&op%5Bsubproject_id%5D=*&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=due_date&c%5B%5D=updated_on&c%5B%5D=category&group_by=&t%5B%5D=) or [<1 week for all subprojects of qa](https://progress.opensuse.org/projects/qa/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f[]=priority_id&op[priority_id]=%3D&v[priority_id][]=6&f[]=status_id&op[status_id]=o&f[]=updated_on&op[updated_on]=%3Ct-&v[updated_on][]=7&f[]=subproject_id&op[subproject_id]=*&f[]=&c[]=subject&c[]=project&c[]=status&c[]=assigned_to&c[]=due_date&c[]=updated_on&c[]=category&group_by=&t[]=)
 * **high**: [<1 month](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f%5B%5D=priority_id&op%5Bpriority_id%5D=%3D&v%5Bpriority_id%5D%5B%5D=5&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=updated_on&op%5Bupdated_on%5D=%3Ct-&v%5Bupdated_on%5D%5B%5D=30&f%5B%5D=subproject_id&op%5Bsubproject_id%5D=*&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=due_date&c%5B%5D=updated_on&c%5B%5D=category&group_by=&t%5B%5D=)
 * **normal**: [<1 year](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f%5B%5D=priority_id&op%5Bpriority_id%5D=%3D&v%5Bpriority_id%5D%5B%5D=4&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=updated_on&op%5Bupdated_on%5D=%3Ct-&v%5Bupdated_on%5D%5B%5D=365&f%5B%5D=subproject_id&op%5Bsubproject_id%5D=*&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=due_date&c%5B%5D=updated_on&c%5B%5D=category&group_by=&t%5B%5D=)
 * **low**: undefined
* **Within due-date**: [0 (10 day threshold)](https://progress.opensuse.org/projects/openqatests/issues?utf8=%E2%9C%93&set_filter=1&sort=updated_on%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=subproject_id&op%5Bsubproject_id%5D=*&f%5B%5D=due_date&op%5Bdue_date%5D=%3Ct-&v%5Bdue_date%5D%5B%5D=10&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=due_date&c%5B%5D=updated_on&c%5B%5D=category&group_by=&t%5B%5D=). Where set, we should take due-dates seriously, finish tickets fast and at the very least update tickets with an explanation why the due-date could not be held and move it to a reasonable time in the future based on usual time expectations.
* **No closed tickets linked to currently failing tests**: [0 (daily)](https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html#closed_box). Closed tickets mean that assignees assume the issue is fixed, but as long as tests still fail either the issue was not fixed or a new issue is wrongly tracked
* **No unassigned tickets linked to currently failing tests**: [0 (daily)](https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html#unassigned_box). Tickets linked to currently failing tests should be prioritized, at least by following the established Maintenance QA process to unschedule false-positive tests

**Process**: If SLO time periods are exceeded, consider putting a reminder on the corresponding ticket at the end of each SLO period. If the ticket pops up again (and the last comment was the reminder comment), de-prioritize it to the next lower level (this could be automated).

Notifications: People may be watching the project in Redmine, and that's recommended for SM and PO roles. As general good practice, if you're relying on someone to respond, you can also add them as a "Watcher" for individual tickets (mentions via @ don't have that effect in Redmine).

*Note:* Adherence is also automatically observed by @slo-gin (SLO generic infra node), which is a bot that adds reminder comments based on queries with an "updated_on" filter defined in [openSUSE/openqa-tests-backlog/queries.yaml](https://github.com/openSUSE/openqa-tests-backlog/blob/main/queries.yaml#L5).

Text template for update comments on outdated tickets depending on current ticket priority:

* **Immediate**: `This ticket was set to "Immediate" priority but was not updated within the SLO period for "Immediate" tickets (1 day) as described on https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives`
  * **first reminder**: `Please consider picking up this ticket within the next day or just set the ticket to the next lower priority "Urgent" (SLO: updated within 7 days).`
  * **second reminder**: `The ticket will be set to the next lower priority "Urgent".`
* **Urgent**: `This ticket was set to "Urgent" priority but was not updated within the SLO period for "Urgent" tickets (7 days) as described on https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives`
  * **first reminder**: `Please consider picking up this ticket within the next 7 days or just set the ticket to the next lower priority "High" (SLO: updated within 30 days).`
  * **second reminder**: `The ticket will be set to the next lower priority "High".`
* **High**: `This ticket was set to "High" priority but was not updated within the SLO period for "High" tickets (30 days) as described on https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives`
  * **first reminder**: `Please consider picking up this ticket within the next 30 days or just set the ticket to the next lower priority "Normal" (SLO: updated within 365 days).`
  * **second reminder**: `The ticket will be set to the next lower priority "Normal".`
* **Normal**: `This ticket was set to "Normal" priority but was not updated within the SLO period for "Normal" tickets (365 days) as described on https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives`
  * **first reminder**: `Please consider picking up this ticket within the next 365 days or just set the ticket to the next lower priority "Low" (no SLO related time period).`
  * **second reminder**: `The ticket will be set to the next lower priority "Low".`
* **Due-date exceeded**: `This ticket had a due date set but exceeded it already by more than 14 days. We would like to take the due date seriously so please update the ticket accordingly (resolve the ticket or update the due-date or remove the due-date). See https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives for details.`
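
The priority ladder behind these templates can be sketched in a few lines of shell. This is not how the @slo-gin bot is implemented; the function names `slo_days` and `next_lower_priority` are hypothetical and only restate the SLO periods and de-prioritization steps listed above.

```shell
# Print the SLO period in days for a ticket priority.
# Returns 1 for priorities without an SLO period (e.g. "Low").
slo_days() {
    case "$1" in
        Immediate) echo 1 ;;
        Urgent)    echo 7 ;;
        High)      echo 30 ;;
        Normal)    echo 365 ;;
        *)         return 1 ;;
    esac
}

# Print the next lower priority used when a ticket is de-prioritized.
next_lower_priority() {
    case "$1" in
        Immediate) echo Urgent ;;
        Urgent)    echo High ;;
        High)      echo Normal ;;
        Normal)    echo Low ;;
        *)         return 1 ;;
    esac
}

echo "Urgent: update within $(slo_days Urgent) days, otherwise lower to $(next_lower_priority Urgent)"
```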

Note: Individual teams can apply different workflows in subprojects. Any differences in what can be expected should be documented accordingly.

## openqa-review reminder handling

* Guideline: **No closed tickets with unhandled openqa-review reminder comments:** [0 (daily)](https://progress.opensuse.org/issues?query_id=737)
 1. Given any resolved ticket, when an openqa-review reminder comment shows up, then reopen the ticket to ensure the test failure is reviewed and handled accordingly
 1. If the reopened ticket is not acted upon, the same process applies as already documented on https://progress.opensuse.org/projects/openqatests/wiki#SLOs-service-level-objectives

It is important to have a process noted down so that we have an objective base and also prevent frustration among users. There is also potential for automation but it's not a necessity to implement the process in the first step.

# code contribution review checklist

Check each pull request on https://github.com/os-autoinst/os-autoinst-distri-opensuse against the following rules

* https://github.com/os-autoinst/os-autoinst-distri-opensuse#coding-style
* DoD is adhered to
* SLE staging impact has been considered (be careful accepting changes during working days when a stable SLE staging project is expected by release managers)


# Test development instances (staging openQA instances)

Contributors cannot afford to verify a newly developed test in all scenarios run by o3 or osd, so tests will break sometimes. It would be useful to have a machine that runs a subset of the scenarios of the official instance(s) to make sure new tests can be deployed with some degree of confidence. But: Any "staging openQA instance" would not be able to run everything that is run in production. It just does not scale. So only a subset can be run anyway and something can always be missing. Also, we do not have the hardware capacity to cover everything twice while also considering SLE plus openSUSE. Our [DOD](https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Definition-of-DONEREADY) should cover some important steps so that external contributors are motivated to test something locally first. We have a good test review process and the reviewer has to decide whether to accept the risk of a new test with or without a local verification, and covering which scenarios. Depending on the contributors, it might make sense to set up a staging server with a subset of tests which is used by multiple test developers to share the burden of openQA setup and administration. For example the YaST team has one available: https://wiki.microfocus.net/index.php/YAST/openQA

If you want to follow this model you can watch [this talk by Christopher Hofmann from the OSC16](https://events.opensuse.org/conference/oSC16/program/proposal/986) or ask the YaST team for their experiences.

# Tips for test development and issue investigation

Examples mentioned here use `clone_job` and `client`. Replace these with calls to the scripts of the same name within the openQA installation, with proper arguments to provide your API key as well as the host selection, e.g. `/usr/share/openqa/script/client --host https://openqa.opensuse.org` with your API key configured in `~/.config/openqa/client.conf`

## Uploading image files to the openQA server and running a test on them

You can manually trigger a test job with an explicit name as a one-shot, overriding the variables as necessary, for example:

as geekotest@openqa:

```
cd /var/lib/openqa/factory/hdd
wget http://<my_host>/<path>.qcow2 -O <new_image_name>.qcow2
cd /var/lib/openqa/factory/iso
/usr/share/openqa/script/client isos post --params SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json HDD_1=SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2 TEST=migration_offline_sle12_ppc BUILD=1651_<your_short_name>
```

Why `SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json`? I checked `SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.?.json`: There are `…5…` and `…6…`. `…5…` is for *HA* so I chose 6.

The job can be deleted afterwards to tidy up the build history with:

```
client jobs/463859 delete
```

## Create new HDD image with openQA

```
client jobs post DISTRI=sle VERSION=12 FLAVOR=Server-DVD ARCH=ppc64le BACKEND=qemu \
NOVIDEO=1 OFW=1 QEMUCPU=host SERIALDEV=hvc0 BUILD=okurz_poo9714 \
ISO=SLE-12-Server-DVD-ppc64le-GM-DVD1.iso INSTALLONLY=1 QEMU_COMPRESS_QCOW2=1 \
PUBLISH_HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 TEST=create_gm_ppc_image \
MACHINE=ppc64le WORKER_CLASS=qemu_ppc64le HDDSIZEGB=20 MAX_JOB_TIME=86400 TIMEOUT_SCALE=10
```

The `MAX_JOB_TIME=86400 TIMEOUT_SCALE=10` allows for interactive login during the process in case you want to manually adjust or debug. Beware though that `TIMEOUT_SCALE=10` also scales the waiting time on `check_screen` so that the whole job might take longer to execute.

To run a test based on the new HDD image, search for a good example job and clone it with adjusted parameters:

```
clone_job 462022 HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2
```

## Interactive investigation

While a job is running one can connect to the worker (if network access is possible) using VNC. One challenge is that the test is still running and manual interaction with the system interferes with the test and vice versa.

### Making the test stop for long enough to be able to connect

If you can change the test code, i.e. if running on a development machine, you can for example add a `sleep 3600;` or `wait_serial 'CONTINUE';` at the point in the test where you want to connect to the system and interact with it, e.g. to gather additional logs. In case of `wait_serial 'CONTINUE';` you can echo 'CONTINUE' to the serial port to let the test continue, e.g. call `echo 'CONTINUE' > /dev/ttyS0;`.

If you cannot or do not want to change the test code, or your test run stops anyway at a certain point with a long enough timeout, you can also increase the timeout with `TIMEOUT_SCALE`, e.g. trigger the job with the variable `TIMEOUT_SCALE=10`. For example a `script_run` with a default timeout of 90 seconds will then wait for 900 seconds (=15 minutes), which should give enough time in most cases.
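
The arithmetic behind this can be sketched as a tiny shell function. The name `scaled_timeout` is hypothetical; it only multiplies a default timeout by the scale factor, as described in the paragraph above, and does not call into openQA.

```shell
# Effective timeout in seconds: default timeout ($1) multiplied by the
# TIMEOUT_SCALE factor ($2, defaults to 1 when omitted).
scaled_timeout() {
    echo $(( $1 * ${2:-1} ))
}

scaled_timeout 90 10   # 900 seconds, i.e. 15 minutes
```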

Another possibility is to enter the interactive mode using the "Interactive mode" button on the "Live view" tab of the job and then stop the execution. After that the qemu VM will enter debug mode.

### Making VM active again
In case of interactive mode usage, as mentioned above, the VM will get into debug mode and freeze. To make the VM interactive again, we need to send the 'cont' command over qemu HMP.
To perform this activity within the o3 infrastructure, multiple steps are required:
1) Request adding your ssh public key to access o3
2) Connect to o3 using the following command:

```
ssh o3
```
3) Now you will be able to connect as root to the worker of your choice using ssh
4) Use 'ps' to find the relevant qemu VM instance and get the qemu telnet monitor port. Hint: you can use the vnc port shown when the cursor is on the worker's name on the job page, e.g.:

```
ps aux | grep :91
```
5) Connect to the VM using VNC (see next section)
6) Connect to the VM monitor using telnet:

```
telnet localhost 20072
```
7) Type the `cont` command to continue:

```
cont
```

NOTE: Please use '^]' as the escape character; detaching will stop the VM.
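
Step 4 can be sketched as a small filter that pulls the monitor port out of a qemu command line. This rests on an assumption: that the qemu process was started with an option of the form `-monitor telnet:<host>:<port>,...`. Check the real `ps` output on your worker, as the option format may differ. The function name `qemu_monitor_port` and the sample command line are made up for illustration.

```shell
# Extract the telnet monitor port from a qemu command line read on stdin.
# Assumption: the option looks like "-monitor telnet:<host>:<port>,...".
qemu_monitor_port() {
    sed -n 's/.*-monitor telnet:[^:]*:\([0-9][0-9]*\).*/\1/p'
}

# Made-up sample line standing in for the output of "ps aux | grep :91"
echo 'qemu-system-x86_64 -vnc :91,share=force-shared -monitor telnet:127.0.0.1:20072,server,nowait' \
    | qemu_monitor_port
```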

### VNC port forwarding
After configuring the ssh profile for the connection to o3, it is possible to forward a port over ssh with the following command:

```
ssh -L <local_port_number>:<worker_hostname>:<vnc_port_on_remote_host> -NT4f o3
```

For example:

```
ssh -L 5997:openqa-worker:5997 -NT4f o3
```

After that you can connect to this port using VNC.
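
VNC viewers usually accept either a display number or a TCP port. By the common VNC convention, display N corresponds to TCP port 5900 + N; the sketch below assumes the workers follow that convention. The helper name `vnc_port` is hypothetical.

```shell
# TCP port for a VNC display number, assuming port = 5900 + display.
vnc_port() {
    echo $(( 5900 + $1 ))
}

vnc_port 97   # 5997, the port forwarded in the example above
```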

### Connecting over VNC

The VNC port is shown on the job live view as a hover text on the instance name. Make sure to use a "shared" connection in your vncviewer. `krdc`, the default KDE VNC viewer, as well as `vinagre`, the default GNOME VNC viewer, do this already. For TigerVNC use for example:

```
vncviewer -Shared malbec.arch:91
```


### Forwarding of special shortcuts

The default `vncviewer` in openSUSE/SUSE systems is recommended as it can also be used to forward special keyboard shortcuts. E.g. to change to a text console:
Press *F8* in vncviewer, select *ctrl* and *alt* in the menu, exit the menu, press *F2*.

### Requesting video when by default you do not have video in your environment

Example:

```
openqa-clone-job https://openqa.opensuse.org/464665 NOVIDEO=0
```

## Structured test issue investigation

For non-trivial issues it makes sense to use the "scientific method", especially because openQA tests, being system tests, are under the influence of many moving parts. Also see https://progress.opensuse.org/projects/openqav3/wiki#Further-decision-steps-working-on-test-issues about this.
[Bug Hunting and the Scientific Method](https://accu.org/index.php/journals/1714) is a suggested read as well as [How to Fix the Hardest Bug You've Ever Seen: The Scientific Method](http://yellerapp.com/posts/2014-08-11-scientific-debugging.html). It is suggested to note down in tickets the hypotheses of all potentially relevant problem sources, design experiments - which can be as simple as checking the logfile - collect observations, accept or reject hypotheses and thereby derive a better understanding of what is happening to eventually come to a conclusion. [s390 dasdfmt fails even though command looks complete in screenshot](https://progress.opensuse.org/issues/12410) can serve as a real-world example of how such a ticket can look.

See https://progress.opensuse.org/projects/openqav3/wiki/#Further-decision-steps-working-on-test-issues for a ticket template extension.

## Statistical investigation

In case issues appear sporadically and are therefore hard to reproduce, it can help to trigger many more jobs on a production instance to gather more data first, for example the failure ratio.

Example of triggering 100 jobs in the development group so that the result of passed/failed jobs is counted by openQA itself on the corresponding overview page:

```
# ${USER} needs braces, otherwise the shell looks up a variable named "USER_poo32242_"
for i in {001..100} ; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 123456 TEST+=-${USER}_poo32242_$i BUILD=poo32242_investigation _GROUP="Test Development: openSUSE Tumbleweed" ; done
```

Alternatively, there's another script: https://github.com/foursixnine/stunning-octo-chainsaw/blob/master/openQA/trigger-multiple-jobs

Both alternatives will make the results visible on https://openqa.opensuse.org/tests/overview?build=poo32242_investigation

okurz suggests using https://github.com/okurz/scripts/blob/master/count_fail_ratio to get an overview of the fail ratio and confidence interval of sporadically failing applications.
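
To illustrate what a failure ratio is, here is a minimal sketch that computes it from counts read off the overview page. It is not the `count_fail_ratio` script linked above and does not compute a confidence interval; the function name `fail_ratio` and the sample numbers are made up.

```shell
# Print the failure ratio in percent for $1 failed jobs out of $2 total jobs.
# Exits with status 1 when the total is not a positive number.
fail_ratio() {
    awk -v failed="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { exit 1 }
        printf "%.1f\n", 100 * failed / total
    }'
}

fail_ratio 7 100
```

With only a handful of runs such a number is very uncertain, which is why triggering many jobs as shown above is suggested before drawing conclusions.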