Tools » History » Version 571

livdywan, 2025-04-24 12:55
Use a non-project version of the 5 why's queries

{{toc}}
# QE tools - Team description

"The easiest way to provide complete quality for your software"

We provide the most complete free-software system-level testing solution to ensure high quality of operating systems, complete software stacks and multi-machine services for software distribution builders, system integration engineers and release teams. We continuously develop, maintain and release our software to be readily used by anyone while we offer a friendly community to support you in your needs. We maintain the main public and SUSE internal openQA servers as well as supporting tools in the surrounding ecosystem.
## Team responsibilities
* Develop and maintain upstream [openQA](https://github.com/os-autoinst/openQA) including the backend [os-autoinst](https://github.com/os-autoinst/os-autoinst)
* Administration of SUSE internal [openqa.suse.de (osd)](https://openqa.suse.de) and workers
* Help administer and maintain [openqa.opensuse.org (o3)](https://openqa.opensuse.org), including coordination of efforts aiming at solving problems affecting o3
* Develop and maintain SUSE maintenance QA tools, e.g. [qem-bot](https://github.com/openSUSE/qem-bot/), [osc-plugin-qam](https://github.com/openSUSE/osc-plugin-qam), [MTUI](https://github.com/openSUSE/mtui)
* Help with the investigation of specific issues, especially when they are likely related to generic, hardware or backend problems
* Support colleagues, team members and open source community
## Out of scope
* Maintenance and *recurring* review of individual tests (besides openQA-in-openQA tests)
* Maintenance of special worker addendums needed for tests, e.g. external hypervisor hosts for s390x, powerVM, xen, hyperv, IPMI, VMWare (Clarification: We maintain the code for all backends but we are not experts in specific domains. So we always try to help, but it is a case-by-case decision based on what we can realistically provide given our competence. We can't be expected to be experts in everything and we are also limited in what we can actually test.)
* Maintenance of most openSUSE related triggering solutions, e.g. for Tumbleweed or Leap maintenance that use https://github.com/openSUSE/opensuse-release-tools on https://botmaster.suse.de. Contact "SUSE Security Solutions", e.g. Marcus Meissner, for this.
* Ticket triaging of http://progress.opensuse.org/projects/openqatests/
* Setup of configuration for individual products to test, e.g. new job groups in openQA
* Feature development within the backend for single teams (commonly provided by teams themselves)
## Our common userbase
Known users of our products: Most SUSE QA engineers, SUSE SLE release managers and release engineers, every SLE developer submitting "submit requests" in OBS/IBS where product changes are tested as part of the "staging" process before changes are accepted in either SLE or openSUSE (staging tests must be green before packages are accepted), same for all openSUSE contributors submitting to either openSUSE:Factory (for Tumbleweed, SLE, future Leap versions) or Leap, other GNU/Linux distributions like Fedora https://openqa.fedoraproject.org/ , AlmaLinux http://openqa.almalinux.org/, Debian https://openqa.debian.net/ , https://openqa.qubes-os.org/ , https://openqa.endlessm.com/ , the GNOME project https://openqa.gnome.org, https://www.codethink.co.uk/articles/2021/automated-linux-kernel-testing/, https://en.euro-linux.com/blog/openqa-or-how-we-test-eurolinux/, openSUSE KDE contributors (with their own workflows, https://openqa.opensuse.org/group_overview/23 ), openSUSE GNOME contributors (https://openqa.opensuse.org/group_overview/35 ), OBS developers (https://openqa.opensuse.org/parent_group_overview/7#grouped_by_build) , wicked developers (https://gitlab.suse.de/wicked-maintainers/wicked-ci#openqa), and of course our team itself for "openQA-in-openQA Tests" :) https://openqa.opensuse.org/group_overview/24 . Also see https://en.opensuse.org/openSUSE:OpenQA/Partners .
Keep in mind: "Users of openQA" and "openSUSE release managers and engineers" include SUSE employees but also employees of other companies as well as development partners of SUSE.
In summary our products, for example openQA, are a critical part of many development processes, so outages and regressions are disruptive and costly. We therefore need to ensure high quality in production, which is why we practice DevOps with a slight tendency towards a conservative approach for introducing changes while still ensuring a high development velocity.

This might be reworked via https://github.com/os-autoinst/linux-qa/issues/1 to make it more discoverable
## How we work

The joint QE Tools team follows the DevOps approach, working with a lightweight Agile approach also inspired by [Extreme Programming](https://extremeprogramming.org/) and [Kanban](https://en.wikipedia.org/wiki/Kanban_(development)) and of course the original http://agilemanifesto.org/. We structure our team and roles following [Agile Product Ownership in a Nutshell](https://youtu.be/502ILHjX9EE). We plan and track our work using tickets on https://progress.opensuse.org . We pick tickets based on priority and planning decisions. We use weekly meetings as checkpoints for progress and also track cycle and lead times to crosscheck progress against expectations. The joint QE Tools team is composed of two closely collaborating teams with individual scope:

* *dev:* openQA - upstream os-autoinst+openQA and operating openQA on o3
* *infra:* QE infrastructure - OSD, o3 OS and base, qem-dashboard, hardware, compliance, etc.

Relevant ticket queries:
* [tools team - joint team backlog](https://progress.opensuse.org/issues?query_id=230): The complete backlog of the joint team
* [tools team - backlog, high-level view](https://progress.opensuse.org/issues?query_id=526): A high-level view of the backlog, all epics and higher (an "epic" includes multiple stories)
* [tools team - backlog, top-level view](https://progress.opensuse.org/issues?query_id=524): A top-level view of the backlog, only sagas and higher (a "saga" is bigger than an epic and can include multiple epics, i.e. "epic of epics")
* [dev team - backlog](https://progress.opensuse.org/issues?query_id=754): The backlog of the dev team
* [infra team - backlog](https://progress.opensuse.org/issues?query_id=757): The backlog of the infra team
* [tools team - what members of the team are working on](https://progress.opensuse.org/issues?query_id=400): To check progress and know what the team is currently occupied with
* [tools team - closed within last 60 days](https://progress.opensuse.org/issues?query_id=541): What was recently resolved
* [tools team - next](https://progress.opensuse.org/issues?query_id=794): The staging ground for next tasks considered to be picked into the backlog

*Be aware:* Custom queries in the right-hand sidebar of individual projects, e.g. https://progress.opensuse.org/projects/openqav3/issues , show queries with the same name but are limited to the scope of the specific project, so they may show only a subset of all relevant tickets.
### What we expect from team members
* Actively show visible contributions to our products every workday *(pull requests, code review, ticket updates in descending priority, i.e. if you are very active in pull requests + code review ticket updates are much less important)*
* Be responsive over usual communication platforms and channels *(user questions, team discussions)*
* Stick to our rules *(this wiki, SLOs, alert handling)*
### Common tasks for team members
This is a list of common tasks that we follow, e.g. reviewing daily based on individual steps in the DevOps Process ![DevOps Process](devops-process_25p.png)

* **Plan**:
 * State daily learning and planned tasks in internal chat room
 * Review the backlog for time-critical tickets, triage new tickets, pick tickets from the backlog; see https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog
 * Coordinate on the agile board https://progress.opensuse.org/agile/board?query_id=711
* **Code**:
 * See project-specific contribution instructions
 * Provide peer-review and development support of projects about and around openQA, in particular:
     * https://github.com/os-autoinst/openQA
     * https://github.com/os-autoinst/os-autoinst
     * https://github.com/os-autoinst/scripts
     * https://github.com/os-autoinst/os-autoinst-distri-openQA
     * https://github.com/os-autoinst/openqa-trigger-from-obs
     * https://github.com/os-autoinst/openqa_review
     * https://github.com/os-autoinst/openqa_bugfetcher
     * https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess
     * https://github.com/openSUSE/mtui
     * https://github.com/openSUSE/qem-bot
     * https://github.com/openSUSE/backlogger
     * https://github.com/openSUSE/qem-dashboard
     * https://github.com/openSUSE/openSUSE-release-tools/tree/master/factory-package-news
 * Offer development help for people contributing to those projects when it's not their main job anyway, e.g. review in pull requests, offer to continue developing, help to fix CI tests, dependabot updates, etc.
* **Build**:
 * See project-specific contribution instructions
* **Test**:
 * Monitor failures on https://app.circleci.com/pipelines/github/os-autoinst/openQA?branch=master relying on https://build.opensuse.org/project/show/devel:openQA:ci for openQA (email notifications)
* **Release**:
 * By default we use the rolling-release model for all projects unless specified otherwise
 * Monitor [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) (all packages and all subprojects) for failures, ensure packages are published on http://download.opensuse.org/repositories/devel:/openQA/, ensure to be added as a Maintainer for that project (members need to be added individually, you can ask existing team members, e.g. the SM). To be notified of build errors via email:
     1. Add it to your watchlist:
          * Go to [devel:openQA](https://build.opensuse.org/project/show/devel:openQA), click on "Watchlist" in the top navigation and click on "Watch this project"
          * Go to [subscriptions](https://build.opensuse.org/my/subscriptions), to the section "Package failed to build" and enable notifications for "Watching the project"
     2. Alternatively get notifications for all projects that you maintain:
          * Just enable "Package failed to build" - "Maintainer" notifications in [subscriptions](https://build.opensuse.org/my/subscriptions)
 * Monitor http://jenkins.qe.nue2.suse.org for the openQA-in-openQA Tests and automatic submissions of os-autoinst and openQA to openSUSE:Factory through https://build.opensuse.org/project/show/devel:openQA:tested
* **Deploy**:
 * o3 is automatically deployed (daily), see https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Automatic-update-of-o3
* **Operate**:
 * Apply infrastructure changes to o3 (manually over ssh)
 * Ensure old unused/non-matching needles are cleaned up (osd+o3), see #73387
* **Monitor**:
 * For openqa.opensuse.org react to alerts from [zabbix.suse.de](https://zabbix.suse.de) (emails on [o3-admins@suse.de](http://mailman.suse.de/mailman/listinfo/o3-admins))
 * Look for incomplete jobs or scheduled jobs that are not being worked on, on o3 and osd (API or webUI) - see also #81058 for *power*
 * React to alerts from https://gitlab.suse.de/openqa/auto-review/, https://gitlab.suse.de/openqa/openqa-review/, https://gitlab.suse.de/openqa/monitor-o3 (subscribe to projects for notifications)
 * Be responsive on #opensuse-factory (irc://irc.libera.chat/opensuse-factory, formerly irc://chat.freenode.net/opensuse-factory) for help, support and collaboration (Unless you have a better solution it is suggested to use [Element.io](https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat) for a sustainable presence; you also need a [registered IRC account](https://libera.chat/guides/registration), formerly [freenode](https://freenode.net/kb/answer/registration)) **note** *don't use matrix features on irc!*
 * Be responsive on [#team-qa-tools in chat](https://app.slack.com/client/T02863RC2AC/C02AJ1E568M/thread/C02CANHLANP-1658480276.547769) for internal coordination and alarm handling, fallback to #suse-qe-tools:opensuse.org (matrix) as backup if other channels are temporarily down, alternatively public channels on matrix/IRC if the topics are not confidential
     - for incidents walk through the checklist in #team-qa-tools. When needed, reply to the incident/comment on Slack with a :rotating_light:
![](clipboard-202408131356-u76hy.png)
 * Also join [#team-qa-tools-notifications](https://suse.slack.com/archives/C08L62GFQ8M)
 * Be responsive on [#eng-testing](https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1658480276.547769) for help, support and collaboration
 * Be responsive on mailing lists opensuse-factory@opensuse.org and openqa@suse.de (see https://en.opensuse.org/openSUSE:Mailing_lists_subscription)
 * Be responsive in https://matrix.to/#/#openqa:opensuse.org or the bridged room [#openqa](https://discord.com/channels/366985425371398146/817367056956653621) on https://discord.gg/opensuse if you have a discord account

For SUSE internal information see https://gitlab.suse.de/suse/wiki/
### Best practices for major changes

When proposing non-trivial changes with the potential of breaking existing tests consider the following best practice patterns:

  - Make the problematic change opt-in via a test variable like MY_NEW_FEATURE_ENABLED to enable the new behavior, and otherwise log a warning only
  - Include a reference to a relevant GitHub PR and progress ticket
  - If a **BARK** test is to be conducted to assess the full impact of the change an autoreview regex matching the most relevant error message should be prepared so that affected jobs can be restarted trivially without disrupting daily operation too much - it's called a *BARK* test from how the bark of a tree is scratched to confirm if it's green and alive or brown and not healthy anymore.
  - Inform all stakeholders in relevant Slack channels, Matrix and mailing lists
  - Include an explicit mention in the release notes
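
The opt-in pattern from the first item could look roughly like the following. This is a minimal Python sketch for illustration only, not openQA or os-autoinst code: `MY_NEW_FEATURE_ENABLED` is the placeholder name used above, while `feature_enabled`, `run_step`, the `settings` dictionary and the accepted truthy values are hypothetical.

```python
import logging

log = logging.getLogger("feature_flags")


def feature_enabled(settings: dict, name: str = "MY_NEW_FEATURE_ENABLED") -> bool:
    """Return True only if the test variable is explicitly set to a truthy value."""
    value = str(settings.get(name, "")).strip().lower()
    # Assumption: these spellings count as "enabled"; anything else is "disabled".
    return value in {"1", "true", "yes"}


def run_step(settings: dict) -> str:
    """Use the new behavior only when opted in, otherwise warn and keep the old one."""
    if feature_enabled(settings):
        return "new behavior"
    log.warning("MY_NEW_FEATURE_ENABLED is not set, keeping the old behavior")
    return "old behavior"
```

Existing tests that do not set the variable keep running unchanged and only produce a warning, which makes the impact visible before the new behavior becomes the default.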
### Guideline for communication in tickets
* Clarify action items and steps (to be) taken, for example
  * I will implement ... from the suggestions
  * I will monitor ... and evaluate results
  * Confirm if other experts will provide reproducers
  * Document mitigations with references to MRs, PRs or manual file changes and keep the description updated
  * Confirm if adjustments made by others are still in place
* Explicitly include examples of what won't be done
  * I won't look into the test code itself here
* Make use of the [scientific method template with hypotheses, experiments and observations](https://progress.opensuse.org/projects/openqav3/wiki/#Further-decision-steps-working-on-test-issues)
### How we work on our backlog

* "Due dates" are only used as exceptions or reminders. Commonly the due date is set [automatically](https://github.com/os-autoinst/scripts/blob/master/backlog-set-due-date) to 14 days in the future as soon as a non-low ticket is picked up. That period is roughly the median cycle time, which we want to stay well below. In addition, to prevent Redmine from sending a reminder and the backlog status from flagging issues, the ticket should be resolved at least a day before the due date. As a reminder might be sent even on the last day before the due date, better resolve the ticket by the second to last day. Of course, it is even better to always try to finish as soon as possible, well before the due date.
* every team member can pick up tickets themselves
* everybody can set priority, PO can help to resolve conflicts
* consider the [ready, not assigned/blocked/low](https://progress.opensuse.org/issues?query_id=490) query as preferred. It is suggested to pick up tickets based on priority. "Workable" tickets are often convenient and hence preferred.
* ask questions in tickets, even potentially "stupid" questions, oftentimes descriptions are unclear and should be improved
* Ask assignees and domain experts explicitly for agreement or disagreement, especially for [expert tickets](https://progress.opensuse.org/issues?query_id=1093)
* There are "low-level infrastructure tasks" only conducted by some team members, the "DevOps" aspect does not include that but focusses on the joint development and operation of our main products
* Consider tickets with the subject keyword or tag [beginner](https://progress.opensuse.org/issues?query_id=1096) as good learning opportunities for people new to a certain area. Experts in the specific area should prefer helping others over working on the ticket themselves
* For tickets which are out of the scope of the team remove from backlog, delegate to corresponding teams or persons but be nice and supportive, e.g. see [our IT ticket handling process](https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md#suse-it-ticket-handling) and [SLA](https://confluence.suse.com/display/qasle/Service+Level+Agreements), [test maintainer](https://progress.opensuse.org/projects/openqatests/), QE-LSG PrjMgr/mgmt
* Whenever we apply changes to the infrastructure we should have a ticket
* Refactoring and general improvements are conducted while we work on features or regression fixes
* For every regression or bigger issue that we encounter try to come up with at least two improvements, e.g. the actual issue is fixed and similar cases are prevented in the future with better tests and optionally also monitoring is improved
* For critical issues and very big problems especially when we were informed by users about outages collect "lessons learned", e.g. in notes in the ticket or a meeting with minutes in the ticket, consider [Five whys](https://en.wikipedia.org/wiki/Five_whys) and answer at least the following questions: "User impact, outwards-facing communication and mitigation, upstream improvement ideas, Why did the issue appear, can we reduce our detection time, can we prevent similar issues in the future, what can we improve technically, what can we improve in our processes". See [previous 5 whys](https://progress.opensuse.org/issues?query_id=1115). Also see https://youtu.be/_Dv4M39Arec
* okurz proposes to use "#NoEstimates". Though that topic is controversial and often misunderstood. https://ronjeffries.com/xprog/articles/the-noestimates-movement/ describes it nicely :) Hence tickets should be evenly sized and no estimation numbers should be provided on tickets
* If you really want you can look at the [burndown chart](https://progress.opensuse.org/agile/charts?utf8=%E2%9C%93&set_filter=1&f%5B%5D=chart_period&op%5Bchart_period%5D=%3E%3Ct-&v%5Bchart_period%5D%5B%5D=90&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=&chart=burndown_chart&chart_unit=issues&interval_size=day) (some people wish to have this) but we consider it unnecessary due to the continuous development, not a project with defined end. Also an [agile board](https://progress.opensuse.org/agile/board?utf8=%E2%9C%93&set_filter=1&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=status_id&op%5Bstatus_id%5D=%3D&f_status%5B%5D=1&f_status%5B%5D=12&f_status%5B%5D=2&f_status%5B%5D=15&f_status%5B%5D=4&c%5B%5D=tracker&c%5B%5D=assigned_to&c%5B%5D=cf_16) is available but likely due to problems within the redmine installation ordering cards is not reliable.
* For critical changes write to qa-team@suse.de as well as to chat channels
* Everyone should propose reverts of features if we find problems that can not be immediately fixed or worked around in production
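
The due-date rule from the first item of this list can be illustrated with a little date arithmetic. The following Python sketch is not the linked `backlog-set-due-date` script; the function names are hypothetical, and reading "second to last day" as two days before the due date is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Period stated above: 14 days after a non-low ticket is picked up.
DUE_DATE_PERIOD = timedelta(days=14)


def default_due_date(picked_up: date, priority: str) -> Optional[date]:
    """Return the default due date, or None for low-priority tickets."""
    if priority.lower() == "low":
        return None
    return picked_up + DUE_DATE_PERIOD


def latest_safe_resolution(due: date) -> date:
    """Assumed interpretation: resolve by two days before the due date."""
    return due - timedelta(days=2)
```

For a ticket picked up on 2025-04-24 with normal priority this yields a due date of 2025-05-08, so the ticket would ideally be resolved by 2025-05-06 at the latest.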
#### Definition of DONE
Also see https://web.archive.org/web/20110308065330/ http://www.allaboutagile.com/definition-of-done-10-point-checklist/ and https://web.archive.org/web/20170214020537/ https://www.scrumalliance.org/community/articles/2008/september/what-is-definition-of-done-(dod)

* Code changes are made available via a pull request on a version control repository, e.g. GitHub for openQA
* [Guidelines for git commits](http://chris.beams.io/posts/git-commit/) have been followed
* Code has been reviewed (e.g. in the GitHub PR)
* Depending on criticality/complexity/size/feature: A local verification test has been run, e.g. post link to a local openQA machine or screenshot or logfile (especially also for hardware-related changes, e.g. in os-autoinst backend)
* For regressions: A regression fix is provided, flaws in the design, monitoring, process have been considered
* Potentially impacted package builds have been considered, e.g. openSUSE Tumbleweed and Leap, Fedora, etc.
* Code has been merged (either by reviewer or "mergify" bot or reviewee after 'LGTM' from others)
* Code has been deployed to osd and o3 (monitor automatic deployment, apply necessary config or infrastructure changes)
#### Definition of READY for new features
The following points should be considered before a new feature ticket is READY to be implemented:

* Follow the ticket template from https://progress.opensuse.org/projects/openqav3/wiki/#Feature-requests
* A clear motivation or user expressing a wish is available
* Acceptance criteria are stated (see ticket template) or use `[timeboxed:<nr>h]` with `<nr>` hours for tasks that should be limited in time, e.g. a research task with `[timeboxed:20h] research …`
* Add tasks as a hint on where to start
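
The `[timeboxed:<nr>h]` marker is easy to pick up in tooling. The following is a minimal Python sketch of how a ticket subject could be parsed; the function name `timebox_hours` and the exact regular expression are assumptions for illustration, not part of any existing script.

```python
import re
from typing import Optional

# Matches the marker described above, e.g. "[timeboxed:20h]".
TIMEBOX_PATTERN = re.compile(r"\[timeboxed:(\d+)h\]")


def timebox_hours(subject: str) -> Optional[int]:
    """Return the timebox in hours from a ticket subject, or None if absent."""
    match = TIMEBOX_PATTERN.search(subject)
    return int(match.group(1)) if match else None
```

A subject such as `[timeboxed:20h] research …` gives 20, while a subject without the marker gives `None`, which signals that regular acceptance criteria are expected instead.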
#### WIP-limits (reference "Kanban development")

* global limit of 10 tickets [In Progress](https://progress.opensuse.org/issues?query_id=505), with at most 3 tickets per person
* global limit of 10 tickets in [Feedback, not-low](https://progress.opensuse.org/issues?query_id=520)
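
As a rough illustration, the limits above can be checked against a list of tickets. This Python sketch assumes a hypothetical ticket representation (dictionaries with `status`, `assignee` and `priority` keys) and treats a limit as violated only when the count exceeds it; neither detail comes from an existing tool.

```python
from collections import Counter
from typing import Dict, List

GLOBAL_IN_PROGRESS_LIMIT = 10
PER_PERSON_IN_PROGRESS_LIMIT = 3
GLOBAL_FEEDBACK_LIMIT = 10  # applies to non-low tickets only


def wip_violations(tickets: List[Dict[str, str]]) -> List[str]:
    """Return human-readable messages for every exceeded WIP limit."""
    in_progress = [t for t in tickets if t["status"] == "In Progress"]
    feedback = [
        t for t in tickets
        if t["status"] == "Feedback" and t["priority"] != "low"
    ]
    violations = []
    if len(in_progress) > GLOBAL_IN_PROGRESS_LIMIT:
        violations.append(
            f"{len(in_progress)} tickets in progress (limit {GLOBAL_IN_PROGRESS_LIMIT})"
        )
    per_person = Counter(t["assignee"] for t in in_progress)
    for person, count in sorted(per_person.items()):
        if count > PER_PERSON_IN_PROGRESS_LIMIT:
            violations.append(
                f"{person} has {count} tickets in progress (limit {PER_PERSON_IN_PROGRESS_LIMIT})"
            )
    if len(feedback) > GLOBAL_FEEDBACK_LIMIT:
        violations.append(
            f"{len(feedback)} tickets in feedback (limit {GLOBAL_FEEDBACK_LIMIT})"
        )
    return violations
```

An empty result means all limits are respected; each message names the limit that was exceeded so it can be raised in the daily.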
#### Target numbers or "guideline", "should be", in priorities
1. *New, untriaged QA (openQA, etc.):* [0 (daily)](https://progress.opensuse.org/projects/qa/issues?query_id=576) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams
1. *Untriaged "tools" tagged:* [0 (daily)](https://progress.opensuse.org/issues?query_id=481) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams
1. *Workable (properly defined):* [10-40](https://progress.opensuse.org/issues?query_id=478) . Enough tickets to reflect a proper plan but not too many to limit unfinished data (see "waste")
1. *Overall backlog length:* [ideally less than 100](https://progress.opensuse.org/issues?query_id=230) . Similar as for "Workable". Enough tickets to reflect a proper roadmap as well as give enough flexibility for all unfinished work but limited to a feasible number that the team can still keep in view without losing the overview. One more reason for a maximum of 100 is that pagination in the Redmine UI allows showing only up to 100 issues on one page at a time, same for Redmine API access.
1. *Within due-date:* [0 (daily/weekly)](https://progress.opensuse.org/issues?query_id=514) . We should take due-dates seriously, finish tickets fast and at the very least update tickets with an explanation why the due-date could not be met and update it to a reasonable time in the future based on usual cycle time expectations
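
The pagination limit mentioned for the backlog length is the reason a backlog above 100 tickets needs more than one request. The following Python sketch only builds the request URLs and does not contact the server; it assumes the standard Redmine REST parameters `query_id`, `offset` and `limit` on `issues.json`, and the helper name `backlog_page_urls` is hypothetical.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://progress.opensuse.org/issues.json"
PAGE_SIZE = 100  # maximum number of issues per page, as stated above


def backlog_page_urls(query_id: int, total_count: int) -> List[str]:
    """Build one URL per page needed to fetch total_count issues."""
    urls = []
    for offset in range(0, total_count, PAGE_SIZE):
        params = {"query_id": query_id, "offset": offset, "limit": PAGE_SIZE}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

With 100 tickets or fewer a single page is enough, which keeps both the web UI and any script working on the backlog simple.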
#### SLAs (service level agreements)
* for at least picking up tickets, better providing reasonable updates based on priority, first goal is "urgency removal":
 * **immediate**: [<1 day](https://progress.opensuse.org/issues?query_id=542)
 * **urgent**: [<1 week](https://progress.opensuse.org/issues?query_id=543)
 * **high**: [<1 month](https://progress.opensuse.org/issues?query_id=544)
 * **normal**: [<1 year](https://progress.opensuse.org/issues?query_id=545)
 * **low**: undefined
* "reasonable updates": Provide fixes, workarounds or at least state of progress or when the task is blocked
* to ensure timely updates immediate/urgent tickets must never be in status "Blocked" or "Feedback"
* aim for cycle time of individual tickets (not epics or sagas): 1h-2w
#### SLOs (service level objectives, internal)
* For providing reasonable updates on tickets in our backlog based on priority, first goal is "urgency removal":
 * **immediate**: multiple times within the day
 * **urgent**: [<1 day](https://progress.opensuse.org/issues?query_id=824)
 * **high**: [<1 week](https://progress.opensuse.org/issues?query_id=827)
 * **normal**: [<1 month](https://progress.opensuse.org/issues?query_id=830)
 * **low**: <1 year
* Frequent updates do not necessarily need to happen in tickets but must be visible in written form, e.g. just in internal chat. Especially in ticket updates every comment should give a clear answer: Who plans to do what until when, in particular the ticket assignee.
* Reference for SLOs and related topics: https://sre.google/sre-book/table-of-contents/
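
The SLO update intervals above map naturally to a lookup table. The following Python sketch is an assumption-laden illustration: it treats a month as 30 days and a year as 365 days, leaves out "immediate" because "multiple times within the day" is not a single threshold, and uses hypothetical names throughout.

```python
from datetime import datetime, timedelta

# Maximum time without a reasonable update, per priority (SLO values above).
SLO_UPDATE_INTERVAL = {
    "urgent": timedelta(days=1),
    "high": timedelta(weeks=1),
    "normal": timedelta(days=30),   # assumption: 1 month = 30 days
    "low": timedelta(days=365),     # assumption: 1 year = 365 days
}


def slo_breached(priority: str, last_update: datetime, now: datetime) -> bool:
    """Return True if the ticket has gone longer without an update than its SLO allows."""
    limit = SLO_UPDATE_INTERVAL.get(priority.lower())
    if limit is None:
        raise ValueError(f"no single SLO threshold defined for priority {priority!r}")
    return now - last_update > limit
```

A ticket last updated two days ago would breach the SLO at priority "urgent" but not at "high".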
#### Status overview
Dynamic dashboard showing target numbers and SLOs: https://os-autoinst.github.io/qa-tools-backlog-assistant/
#### Backlog prioritization
When we prioritize tickets we assess:
1. What the main use cases of openQA are among all users, be it SUSE QA engineers, other SUSE employees, openSUSE contributors as well as any other outside user of openQA
2. We try to understand how many persons and products are affected by feature requests as well as regressions and prioritize issues affecting more persons and products and use cases over limited issues. See #120540 for details in particular about the various os-autoinst backends
3. We prioritize regressions higher than work on (new) feature requests
4. If a workaround or alternative exists then this lowers priority. We prioritize tasks that need deep understanding of the architecture and an efficient low-level implementation over convenience additions that other contributors are more likely to be able to implement themselves.
#### Periodic backlog refinement
These queries can be used to help organize our work efficiently:

1. [QE tools team - backlog - sorted by update time](https://progress.opensuse.org/issues?query_id=654) ensure all tickets are reasonably up-to-date and don't keep hanging around
2. [QE tools team - due date forecast](https://progress.opensuse.org/issues?query_id=651) prevent running into due-dates proactively
3. [QE tools team - next - sorted by update time](https://progress.opensuse.org/issues?query_id=797) ensure all *next* tickets are reasonably up-to-date and considered for the backlog
4. [QE tools team - backlog, non-reactive, needs parent](https://progress.opensuse.org/issues?query_id=729) ensure all our (non-reactive) work is linked to higher-level planning as motivation

It's good practice to keep an eye on the queries to anticipate blockers. All team members are encouraged to utilize them.

Note that due dates should provide a hint as to when a ticket will be resolved but they need to be realistic. Availability, reviews and deployment need to be factored in as well since typically a ticket will be in *Feedback* before it can be resolved. If in doubt the due date should be extended with an accompanying message like "Outstanding branches still need to be reviewed" or simply "Bumping the due date because of availability".
### Team meetings

**Note:** We are using [the virtual office on workadventu.re](https://play.workadventu.re/@/suse/suse-office/space-station-office) for regular meetings unless otherwise mentioned. We meet at the glass table in the north-east (walk towards the right) linked to https://meet.opensuse.org/6blugd-meetopensuseorg .

There are other tables for ad-hoc conversations such as the **white sofa space** where it says **senf call** linked to https://lecture.senfcall.de/liv-inx-vgs-jf0 . You just need to be next to each other to chat.

**Regular calls:**
* **Dailies:** *Infra* Every weekday 1020-1035 CET/CEST, *Dev* Every weekday 1040-1055 CET/CEST. Use (internal) chat actively, e.g. formulate your findings or achievements and plans for the day, "think out loud" while working on individual problems. Join **our regular meeting location**. At the latest at 1100 CET/CEST everyone working on that day must have checked in, at least with a text message in chat.
  * *Goal:* Emergency responses, clarify next steps or blockers on current work items, asking and answering questions on tickets that would be ignored otherwise, ticket estimations (after the regular daily) (compare to [Daily Scrum](https://www.scrumguides.org/scrum-guide.html#events-daily))
  * *Conduction:* Answer the following questions concerning [dev tickets](https://progress.opensuse.org/issues?query_id=754) and [infra tickets](https://progress.opensuse.org/issues?query_id=757) respectively:
      1. Is the [backlog status](https://os-autoinst.github.io/qa-tools-backlog-assistant/) green?
      2. Are there any time-critical issues to be handled?
      3. What was achieved since the last time?
      4. Who needs help?
      5. Plans until next time?
      6. Are the ACs still feasible given the due date on the ticket?
* **Ticket Estimations:** *Infra* Every Tuesday 1400 CET/CEST, *Dev* Every Thursday 1100-1150 CET/CEST including a 5 minute break
  * *Goal:* Estimate t-shirt sizes for non-estimated tickets i.e. [non-estimated infra tickets](https://progress.opensuse.org/issues?query_id=1025) and [non-estimated dev tickets](https://progress.opensuse.org/issues?query_id=717).
  * *Goal:* Ensure tickets are workable. Refine and split tickets for larger estimates.
  * *Conduction:*
      1. Consider using [SCRUM Poker](https://www.scrumpoker-online.org/en/room/52534457/) or Jitsi surveys to make explicit decision points for ticket estimation calls to prevent awkward silences
      2. Check who reads out tickets, prepares the [etherpad](https://etherpad.opensuse.org/p/suse_qe_tools) and updates the ticket respectively at the start of the call
268 501 mkittler
      3. Try and aim for S size tickets (e.g. <20h of effort), and split up the ticket if needed. An M size ticket is more complex, e.g. when multiple code repositories need to be touched.
269
      4. If a ticket can't be estimated in 10 minutes, schedule a follow-up conversation or skip the ticket e.g. with a short comment on open questions
270 520 livdywan
      5. Consider adding a tag for [beginner](https://progress.opensuse.org/issues?query_id=1096) and [expert](https://progress.opensuse.org/issues?query_id=1093) tickets.
271 1 okurz
      6. Explicitly consider a good **subject** for each ticket
272 563 livdywan
      7. Consider going by priority and status i.e. start with immediate/urgent/high and otherwise feedback/progress
273
      8. If a ticket is in feedback, check if it needs more discussion or can be resolved
274 426 okurz
* **Midweekly Unblock:** Every Wednesday 1100-1150 CET/CEST including a 5 minute break
275 517 livdywan
  * *Goal:* Discuss tasks in progress in more detail, unblock currently assigned tasks and tasks avoided for longer (see [[Tools#Periodic-backlog-refinement|Periodic backlog refinement]]), apply the **pull principle** based on [tickets in progress](https://progress.opensuse.org/issues?query_id=505) firstly and [tickets updated by priority](https://progress.opensuse.org/issues?query_id=771) secondarily, confirm if a ticket should be split up or re-estimated, and whether the difficulty level was considered before - the *collaborative session* can be used to dedicate more time to tickets in need of attention.
276 385 livdywan
* **Collaborative Session:** Thursdays between 1330-1630 CET/CEST in **our regular meeting location** if a topic was picked at the latest in the **Estimations** and announced accordingly. Pick from [previous suggestions](https://progress.opensuse.org/issues?query_id=833) or bring up your own topic
277 1 okurz
  * *Goal:* Follow-up on tasks too difficult to solve alone, or where someone looks to be stuck using pair programming and other means
278 426 okurz
* **Fortnightly Coordination:** Friday 1100-1150 CET/CEST every even week including a 5 minute break. Community members and guests are particularly welcome to join this meeting.
279 507 livdywan
  * *Goal:* Reflect on how well the team worked in the past two weeks, Team backlog coordination and design decisions of bigger topics (compare to [Sprint Planning](https://www.scrumguides.org/scrum-guide.html#events-planning)).
280
  * *Conduction:* Evaluate [metrics](https://monitor.qa.suse.de/d/ck8uu5f4z/agile?orgId=1&refresh=30m)(#152957), Demo recently finished feature work depending on [last closed](https://progress.opensuse.org/issues?query_id=572), crosscheck status of team, discuss blocked tasks and upcoming work
281
  * *Metrics*: an Average can be defined as the sum of all numbers divided by the total number of values / a mean can be defined as an average of the set of values in a sample of data / the 
282 426 okurz
* **Fortnightly Retrospective:** Friday 1100-1150 CET/CEST every odd week including a 5 minute break - a *link to our retro board* can be found in the Slack bookmarks, or the reminder to join the call.
283 304 livdywan
  * *Goal:* Inspect and adapt, learn and improve (compare to [Sprint Retrospective](https://www.scrumguides.org/scrum-guide.html#events-retro))
284 399 livdywan
  * *Conduction:* The board is made available via Slack. Topics can also be brought up in conversation. Follow-up actions will be [recorded as tickets with the "retro" tag](https://progress.opensuse.org/issues?query_id=999).
285 340 livdywan
* **Virtual coffee:** Weekly every Monday 1330-1345 CET/CEST in **our regular meeting location**.
286 293 okurz
  * *Goal:* Connect and bond as a team, understand each other (compare to [Informal Communication in an all-remote environment](https://about.gitlab.com/company/culture/all-remote/informal-communication))
287 1 okurz
288 5 okurz
#### Weekly moderation duty
289 237 livdywan
290 569 livdywan
* By default calls are moderated by the SM of the respective team (Liv/Tina) or Oliver. Otherwise a fallback moderator can take over, see [[Tools#Team|Team]] for people with a **mod** indicator. See also #177895#note-12
291 552 livdywan
* The moderator should make an effort to give everyone a chance to participate, or if nothing else ask if anyone had no chance to speak up
292
* For the conduction of calls see [[Tools#Team-meetings|Team meetings]] and [[Tools#Best-practices-for-meetings|Best practices for meetings]]
293
* Hand over to the next person after the weekly, going by the order of team members in the wiki
294 1 okurz
* Ask for a stand-in when you are unavailable
295
296 5 okurz
#### Best practices for meetings
297 565 livdywan
* It is recommended to use the Jitsi audio-feedback feature, which shows blue/green circles depending on microphone volume. Everybody should ensure that at least "two green bubbles" show up. Consider the [audio troubleshooting hints from the openSUSE Support DataBase](https://en.opensuse.org/SDB:Audio_troubleshooting#Configuring_the_microphone)
298
* Ask and give feedback regarding the audio quality. Use [plain-language radio checks](https://en.wikipedia.org/wiki/Plain_language_radio_checks) such as "Loud and clear" if everything is good or "weak but readable" if of low volume but one can understand what the person is saying if there is no one else overlaying with higher volume. Or "loud but distorted" if there are interferences, e.g. broken sound due to overloaded system or too low connection bandwidth.
299 443 okurz
* Hand signals over video can be used, e.g. "waving/circling hands": "I am lost, please bring me into discussion again"; "T-Sign": "I need a break"; "Raised hand": "I would like to speak"
300 567 livdywan
* Ask if people need a break, especially between topics and ideally announce this spoken and in chat to avoid confusion about the duration of the break
301 565 livdywan
* Make the end of each meeting explicit. For example, clearly mention when a meeting is done and use visual cues like a chat message "daily is over" or the Jitsi "clapping hands" 👏 reaction. This ensures that people stay only as long as they want to and can be engaged, and that nobody misses it when a new spontaneous meeting starts right afterwards, see #153937
302 1 okurz
* Use https://etherpad.opensuse.org/p/suse_qe_tools for collaborative editing and put the content back into tickets or wikis. For a SUSE internal and hence more protected environment use https://etherpad.prg2.suse.org/
303 565 livdywan
304
#### Daylight savings
305
* We would prefer UTC for meeting times but as many other SUSE meetings are bound to European time we usually observe German daylight savings
306
* Reminders in Slack correct for summer/winter time automatically but if you make changes the time might be shifted by one hour e.g. if you scheduled a reminder on 10:30 am CEST, it will become 9:30 CET
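The one-hour shift can be reproduced with a small sketch. It assumes that a reminder is stored as a fixed instant and merely displayed in the local zone; this is an illustration, not a description of how Slack works internally. The fixed offsets and the function name are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for the real zones so no timezone database is needed
# (assumption for illustration only).
CEST = timezone(timedelta(hours=2), "CEST")
CET = timezone(timedelta(hours=1), "CET")


def wall_clock_after_switch(hour, minute, scheduled_in=CEST, shown_in=CET):
    """Return the local time shown for a reminder pinned to a fixed instant."""
    # The calendar date is arbitrary; only the offset difference matters here.
    scheduled = datetime(2025, 10, 24, hour, minute, tzinfo=scheduled_in)
    return scheduled.astimezone(shown_in).strftime("%H:%M")


print(wall_clock_after_switch(10, 30))
```

A reminder scheduled for 10:30 CEST therefore shows up at 09:30 once CET applies, which is the shift to watch out for when editing reminders around the switch.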
307 528 livdywan
308
#### Workshop
309
  * Time: Friday 0900-0950 CET/CEST every even week in [meet.opensuse.org/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools) especially for community members and users!
310
  * *Goal:* Demonstrate new and important features, explain already existing, but less well-known features, and discuss questions from the user community. All your questions are welcome!
311
  * *Announcements:* Consider to drop a reminder with a teaser in [#eng-testing](https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1658480276.547769). On schedule changes consider updating https://calendar.opensuse.org/teams/qe-tools/events/suse-qe-tools-workshop
312 533 livdywan
  * *Recordings:* Consider recording, e.g. using OBS, and upload it to [our YouTube channel](https://www.youtube.com/@openQAWorkshop). SUSE internal topics can be published on http://streaming.nue.suse.com/i/QE-Tools-Workshops/ by ssh-uploading to ftp@streaming.nue.suse.com:~/i/QE-Tools-Workshops/ (get your SSH key added by existing team members, e.g. okurz)
313 528 livdywan
  * *Content:* See [topics](https://progress.opensuse.org/projects/qa/wiki/Tools#Workshop-Topics)!
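To check whether a given Friday is a workshop day, the "every even week" rule can be sketched as follows. This assumes that "even week" refers to the ISO calendar week, which matches the dates listed in the workshop topics for 2025; the function names are hypothetical.

```python
from datetime import date, timedelta


def is_workshop_friday(day):
    """True if `day` is a Friday in an even ISO calendar week (assumed rule)."""
    _iso_year, iso_week, iso_weekday = day.isocalendar()
    return iso_weekday == 5 and iso_week % 2 == 0


def workshop_dates(start, count):
    """Return `count` fortnightly dates, beginning at `start`."""
    return [start + timedelta(weeks=2 * i) for i in range(count)]


for day in workshop_dates(date(2025, 1, 10), 4):
    print(day.isoformat(), is_workshop_friday(day))
```

Note that stepping by two weeks only keeps the parity within a year; ISO years with 53 weeks flip it, so the schedule needs a manual check at the turn of such a year.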
314 1 okurz
315
#### Workshop Topics
316
317 439 livdywan
* The *SUSE QE Tools roadmap*: Recent achievements, mid-term plan and future outlook. Every first Friday every even month
318 1 okurz
* Find older workshop topics and recordings on our [[ToolsWorkshopArchive|SUSE QE Tools Workshop Archive]]
319 439 livdywan
320
For the call details see [Team Meetings](https://progress.opensuse.org/projects/qa/wiki/Tools#Team-meetings)
321 345 okurz
322 498 okurz
* **2025-01-10:** *DONE* [Onboarding experience in SUSE QE tools](https://youtu.be/GF4rO4ufT98) #168712 (@robert.richardson)
323 524 okurz
* **2025-01-24:** *DONE* [Increasing code coverage in os-autoinst based on example](https://youtu.be/D8N48_RjDug) #167932 (@gpuliti)
324 512 okurz
* **2025-02-07:** *DONE* [SUSE QE Tools roadmap - 2025-02](https://youtu.be/KtHJvHtaBbs) (@okurz)
325 515 okurz
* **2025-02-21:** *DONE* [How to contribute to openQA based on example #134840](https://youtu.be/ZO_wrDVyJzg) (@robert.richardson)
326 529 okurz
* **2025-03-07:** *DONE* [Allow variable expansion incorporating worker settings](https://youtu.be/R1HX5m-PNJ0) #169159 (@mkittler)
327 539 okurz
* **2025-03-21:** *DONE* [How we automatically update perl modules in devel:languages:perl](https://youtu.be/xvQBov7BWI8) (@tinita)
328 549 okurz
* **2025-04-04:** *DONE* [SUSE QE Tools roadmap - 2025-04](https://youtu.be/NVoQjR64srE) (@okurz)
329 513 okurz
* **2025-04-18:** *skipped due to common public holiday*
330
* **2025-05-02:** *skipped due to common holiday bridging day*
331
* **2025-05-16:** *Automatic submission of openQA related packages using CI pipelines and openQA-in-openQA* (@okurz, @jbaier_cz)
332
* **2025-05-30:** *skipped due to common holiday bridging day*
333 496 okurz
* **2025-06-13:** *SUSE QE Tools roadmap - 2025-06*
334 513 okurz
* **2025-06-27:** *meet us at the openSUSE conference*
335 526 okurz
* **2025-07-11:** *Full version control awareness within openQA #58184* (@okurz)
336 496 okurz
* **2025-07-25:** **
337 513 okurz
* **2025-08-08:** *skipped due to summer holiday period*
338 496 okurz
* **2025-08-22:** **
339
* **2025-09-05:** **
340
* **2025-09-19:** **
341
* **2025-10-03:** *skipped due to common public holiday*
342
* **2025-10-17:** *SUSE QE Tools roadmap - 2025-10*
343
* **2025-10-31:** **
344
* **2025-11-14:** **
345
* **2025-11-28:** **
346
* **2025-12-12:** *SUSE QE Tools roadmap - 2025-12*
347
* **2025-12-26:** *skipped due to common public holiday*
348 497 jbaier_cz
349 1 okurz
---
350 349 livdywan
351 1 okurz
* *periodic proposal by okurz: How to report tickets, investigate issues, etc. (#104805)*
352 179 okurz
* *general proposal: if there are no further topics make it an "open conversation", at least from time to time :)*
353 371 okurz
* *proposal by okurz: Generic agile project management trainings and tutorials*
354 25 okurz
* feedback from yearly workshop review: run it every second week but maybe longer, more interactive, more technical sessions, about backends and more openQA internals, from jlausuch: maybe understanding how svirt backend boots VMs in s390x, VMWare, etc? Highlight the differences between how qemu backend spawns VMs and how others do
355
356
**Note:** Everybody should feel welcome to add topic proposals here or approach us with ideas or requests.
357 416 okurz
Remove appointments from https://calendar.opensuse.org/ when events are skipped.
358 1 okurz
359
#### Announcements
360
361
- For every meeting, regular or one-off, desired attendants should be invited to make sure a slot is blocked in their calendar and reminders with the correct local time show up when it's time to join the meeting
362
  - Create a new event, for example in Thunderbird via the *Calendar* tab or `New > Event` via the menu.
363 17 livdywan
  - Select individual attendants via their respective email addresses, e.g. *Invite attendees* in Thunderbird
364 1 okurz
  - Specify the time of the meeting
365
  - Set a schedule to repeat the event if applicable.
366
  - Add a location, e.g. https://meet.opensuse.org/suse_qa_tools
367
  - Don't worry if any of the details might change - you can update the invitation later and participants will be notified.
368 17 livdywan
  - Prefer new events if the time and date change
369 1 okurz
- See the respective meeting for regular actions such as communication via chat
370
371
### Team
372
373 557 okurz
The team is comprised of engineers from different organisational units, some only partially available:
374 245 livdywan
1. Liv Dywan (Scrum Master - Ensure that we build it fast) @livdywan / [@kalikiana](https://github.com/kalikiana)
375 1 okurz
1. [Oliver Kurz](https://progress.opensuse.org/users/17668) (Product Owner - Ensure that we build the right thing) @okurz / [@okurz](https://github.com/okurz)
376 548 okurz
1. Nick Singer (only infra) @nicksinger / [@nicksinger](https://github.com/nicksinger)
377 1 okurz
1. Tina Müller (dev; Scrum Master Dev April 31 - May 16, see #179966) (Part time (35h)) @tinita / [@perlpunk](https://github.com/perlpunk)
378 562 okurz
1. Dominik Heidler (both dev and infra) @dheidler / [@dheidler](https://github.com/asdil12)
379 569 livdywan
1. Marius Kittler (**mod**) @mkittler / [@Martchus](https://github.com/Martchus)
380 1 okurz
1. Yannis Bonatakis @ybonatakis / [@b10n1k](https://github.com/b10n1k)
381 569 livdywan
1. Robert Richardson (**mod**) @robert.richardson / [@r-richardson](https://github.com/r-richardson)
382
1. Gaurav Pathak (**mod**, infra) @gpathak / [@gauravpathak](https://github.com/gauravpathak)
383
1. Gabriele Puliti (**mod**, dev) @gpuliti / [@wabri](https://github.com/wabri)
384
1. Emil Miler (**mod**, March 26 2025 - June 20) @emiler / [@realcharmer](https://github.com/realcharmer)
385 1 okurz
1. ~~Sebastian Riedel (mostly working on other projects currently, only bug fixing and feature development) @kraih / [@kraih](https://github.com/kraih)~~
386 538 okurz
1. ~~Jan Baier (part time, QEM-dedicated work areas) @jbaier_cz / [@baierjan](https://github.com/baierjan)~~ (on a temporary assignment in UV squad)
387 402 okurz
388 1 okurz
### Onboarding for new joiners
389
390 488 okurz
391 52 okurz
* For mentors: https://plan.io/blog/hire-remote-developers/
392 432 robert.richardson
393 1 okurz
#### Communication
394 488 okurz
* Subscribe to [opensuse-factory@opensuse.org](https://lists.opensuse.org/archives/list/factory@lists.opensuse.org)
395 1 okurz
* Connect to `#opensuse-factory` on *libera.chat*, see [[Tools#Common-tasks-for-team-members|Common tasks/ Monitoring]]
396 514 livdywan
* Consider logging in with the [osd-admins mailing list](https://mailman.suse.de/mailman/admin/osd-admins) with [the according password](https://gitlab.suse.de/openqa/password/-/blob/main/password?ref_type=heads#L21) to add yourself to the list of admins
397 432 robert.richardson
398 452 livdywan
#### OBS
399
* Request to join [qe-tools-team on OBS](https://build.opensuse.org/groups/qe-tools-team) and check that you have `Request created`, `New comment for request created`, `New comment for package created` enabled for `Maintainer of the target` in your [OBS notification settings](https://build.opensuse.org/my/subscriptions)
400 478 gpuliti
* Add [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) to your watchlist
401 432 robert.richardson
402
#### Github
403 1 okurz
* Request to get added to the [os-autoinst Tools team on GitHub](https://github.com/orgs/os-autoinst/teams/tools-team) as well as the [openSUSE Tools team](https://github.com/orgs/openSUSE/teams/qa-tools-team) (Link throws 404 if not org member, existing members can filter for owners to ask for new members to be added) and subscribe to notifications for projects within that organization
404 543 livdywan
    * Once you have been added to these two teams, create an account on [CircleCI](https://app.circleci.com) and [join the os-autoinst org on CircleCI (which will prompt for your GitHub login)](https://app.circleci.com/launchpad/invited?inviter=ecfa6f52-0477-49ec-a0aa-9ae3a3bf0b2f&invitePage=org-settings&orgId=3de1a98f-9537-4bfd-8e7f-fb63899936aa&vcsType=github) and change the "User Settings -> Project notifications" to be notified by "All builds in my project" and make sure you have your @suse.com email address selected for "openSUSE" and "os-autoinst" 
405 485 gpuliti
* *Watch* [qa-tools-backlog-assistant](https://github.com/os-autoinst/qa-tools-backlog-assistant) and choose *All Activity*
406 432 robert.richardson
* Subscribe to notifications of the [Mojo-IOLoop-ReadWriteProcess project on GitHub](https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess) as it is also closely related to openQA development
407 483 gpuliti
* Ensure you are subscribed to all projects referenced in [[Tools#Common-tasks-for-team-members|Common tasks for team members]]
408 479 gpuliti
409 437 livdywan
#### Other
410 479 gpuliti
* Request to get added to the [QA project in Progress](https://progress.opensuse.org/projects/qa) and *enable notifications for the "QA" project* in [your account settings](https://progress.opensuse.org/my/account)
411 432 robert.richardson
* Watch this wiki page (click "Watch" button on top of this page)
412 488 okurz
* SUSE internal onboarding: https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md?#suse-qe-tools-team-onboarding
413 35 okurz
414 6 okurz
### Offboarding
415
416
When someone leaves the team the following steps should be taken
417 1 okurz
418
* Conduct a team-internal exit-interview (Learn about what was good, what can be improved, what to learn)
419
* Remove from https://github.com/orgs/os-autoinst/teams/tools-team . Optionally keep the people as contributors with additional privileges to individual projects
420
* Remove from team calendars
421
422
### Alert handling
423
424
#### Best practices
425
426
* "if it hurts, do it more often": https://www.martinfowler.com/bliki/FrequencyReducesDifficulty.html
427
* Reduce [Mean-time-to-Detect (MTTD)](https://searchitoperations.techtarget.com/definition/mean-time-to-detect-MTTD) and [Mean-time-to-Recovery](https://raygun.com/blog/what-is-mttr/)
428
429
#### Process
430 382 okurz
431 142 okurz
* React to any alert or report of an outage
432 464 livdywan
* If users report outages of components of our infrastructure
433
  * Ensure there is a ticket on the backlog tracking the issue
434
* For any user-facing outages
435
  * Consider teaming up and assigning individual tasks to focus on
436
  * Inform affected users about the impact and ETA via chat channels, ticket updates and mailing list posts
437
  * Look into mitigations and short-term workarounds such as a hotpatch in production or a revert to an older release
438
  * Investigate a proper solution with a conservative estimate on the effort involved
439
  * Set a time limit to ensure either a workaround or a solution is available within a reasonable amount of time (for example 4 hours or end of working day of the person communicating the changes)
440
  * Join an ad-hoc video call to discuss further steps
441
  * Keep a record of what was discussed and investigated to allow for a later analysis
442 283 livdywan
  * Look into symptoms such as restarting incomplete jobs
443
* For each failing alert, e.g. Grafana, failing CI pipelines, etc.
444
 * Create a ticket for the issue (with a tag "alert"; create ticket unless the alert is trivial to resolve and needs no improvement; if an alert is unhandled for at least 4h then a ticket must be created; even create a ticket if alerts turn to "ok" to prevent these issues in the future and to improve the alert)
445 1 okurz
 * Link the corresponding ... in the ticket
446
   * **Grafana panel** as reference in the alert email
447 283 livdywan
   * Details of the failing job in case of an **Unreviewed issue** alert
448
   * Pipeline name and link in case of GitLab
449 298 okurz
 * Copy relevant metadata from the email, especially date and time, mentioned hostname(s) and the subject of the email
450 383 livdywan
 * Respond to the notification email with a link to the ticket or forward the email to a corresponding mailing list, e.g. o3-admins@suse.de or osd-admins@suse.de (Caveat: gitlab@suse.de as sender seems to be able to receive emails and swallow them without any useful response or error message)
451 1 okurz
 * Optional: Inform in chat
452 383 livdywan
 * Optional: Add "annotation" in corresponding Grafana panel with a link to the corresponding ticket
453 436 okurz
 * Silence/pause the alert to mitigate urgency and reduce the priority of the ticket
454 298 okurz
   * For grafana just follow the "silence" button in alert emails or use https://monitor.qa.suse.de/alerting/silences, consider a default of 2 months, reference the ticket and mention to remove the silence in the ticket in "Rollback actions". Alternatively if you as ticket assignee want to be notified on alerts but to not distract others on https://monitor.qa.suse.de/alerting/routes click next to the policy for `__contacts__ =~ .*"osd-admins".*` on "New nested policy" and add direct messages to yourself instead of the mailing list. Also mention that in "Rollback actions"
455
   * GitLab pipelines can be paused after taking ownership (think of it as who touched it last, not who maintains it)
456 1 okurz
   * In [Zabbix a problem can be suppressed](https://www.zabbix.com/documentation/current/en/manual/acknowledgment#updating-problems)
457
   * When observing an *Unknown issue*, file a ticket and add it in a comment on the job and consider an autoreview regex in case it affects multiple test modules
458
   * To address [openqa logwarn issues](https://github.com/os-autoinst/openqa-logwarn), add the message to the list of known messages (and potentially look into changing the message or log level later)
459
   * See [[Tools#Munin|Munin]]
460
   * See [[Tools#Gitlab-Pipeline-Notifications|gitlab pipeline notifications]]
461
* If you consider an alert non-actionable then change it accordingly
462 283 livdywan
* If you do not know how to handle an alert ask the team for help
463
* We must always strive for an accepted hypothesis when we want to change alerts or call an issue resolved
464 1 okurz
* After resolving the issue add explanation in ticket, unpause alert and verify it going to "ok" again, resolve ticket
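The rule for when a ticket has to be created can be written down as a small decision function. This is only a sketch of the wording above, not an existing tool; the function and parameter names are hypothetical.

```python
def ticket_required(hours_unhandled, trivial_to_resolve, needs_improvement):
    """Decide whether an alert needs a ticket according to the process above."""
    # An alert that has been unhandled for at least 4h always needs a ticket.
    if hours_unhandled >= 4:
        return True
    # Otherwise a ticket may only be skipped if the alert is trivial to
    # resolve and the alert itself needs no improvement.
    return not (trivial_to_resolve and not needs_improvement)


print(ticket_required(1, trivial_to_resolve=True, needs_improvement=False))
print(ticket_required(5, trivial_to_resolve=True, needs_improvement=False))
```

In case of doubt, creating the ticket is the safer choice, also when the alert already turned "ok" again.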
465 289 livdywan
466
#### References
467
468
* https://nl.devoteam.com/en/blog-post/monitoring-reduce-mean-time-recovery-mttr/
469 499 robert.richardson
* Also see https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md for SUSE internal infrastructure alert handling
470 290 livdywan
471
#### Munin
472
473
* To completely disable alert emails from munin: in `/etc/munin/munin.conf`, comment out the line `contact.o3admins.command`.
474
* For individual plugins it is necessary to read the plugin docs, e.g. in `/etc/munin/plugins/df` you can see how to adjust the values for warning and critical. You then put this in `/etc/munin/plugin-conf.d/munin-node` and then `systemctl restart munin-node`, e.g.
475
476
```
477 257 osukup
[df]
478 1 okurz
env.exclude none unknown rootfs iso9660 squashfs udf romfs ramfs debugfs cgroup_root devtmpfs
479
env.warning 92
480
env.critical 98
481
```
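How the two thresholds from the example are meant to be read can be illustrated with a short sketch. It is not Munin code; the function name is hypothetical, and whether the boundary value itself already triggers is an assumption (here it does not).

```python
def df_state(used_percent, warning=92, critical=98):
    """Classify a filesystem usage percentage against the configured limits."""
    # Assumption: a value has to exceed the limit to trigger the state.
    if used_percent > critical:
        return "critical"
    if used_percent > warning:
        return "warning"
    return "ok"


for usage in (50, 95, 99):
    print(usage, df_state(usage))
```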
482 30 livdywan
483
#### Weekly alert duty
484 1 okurz
485
We all should react to alerts, but additionally we can have one person on "alert duty" for one week each to ensure quicker reaction times when other team members are focused on development work. For this the person on duty should do the following:
486 2 okurz
487
* React quickly (e.g. within two hours) on any unhandled alerts
488
* Hand over to the next person after the weekly, going by the order of team members in the wiki
489
* Ask for a stand-in when you are unavailable
490
491
### Collaboration best practices
492
493
Sometimes there are pull requests that are based on other pull requests. Person X reviews PR 1 and Person Y reviews PR 2, but they share the same commit. As a result we have more work for all. For a best practice it is recommended to
494
495
* Include keywords in the PR subject line, e.g. "Part 2: … - based on #<previous_pr>". Example: https://github.com/os-autoinst/openQA/pull/4473
496
* Include the list of base pull request(s) in the PR description. Keep in mind that pull request links in github only seem to be properly rendered as preview links when included in a Markdown list, e.g.
497
498
```
499
Based on
500
* #1234
501 210 okurz
```
502 211 okurz
503 1 okurz
* Mark the dependent pull request as draft until the base pull request is approved or merged
504 211 okurz
505
See #105244 for the motivation for these best practices
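If such descriptions are written often, the "Based on" section can be generated. The following is a minimal sketch with a hypothetical helper name; it only produces the Markdown list shown above and does not talk to GitHub.

```python
def based_on_section(base_prs):
    """Build the "Based on" section listing the base pull request numbers."""
    lines = ["Based on"]
    # One list item per base pull request so GitHub renders preview links.
    lines += [f"* #{number}" for number in base_prs]
    return "\n".join(lines)


print(based_on_section([1234]))
```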
506
507 531 robert.richardson
### openQA short video creation
508 532 robert.richardson
509 531 robert.richardson
We create short videos informing about openQA and its usage from time to time. ([Example](https://www.youtube.com/shorts/Y5_LeNhGqk0))
510 534 robert.richardson
The recommended tools are Kdenlive, Audacity and OBS.
511 1 okurz
512 532 robert.richardson
#### Create a script of what to say during the video 
513
 
514
* **Try** to make it fit 60 seconds
515
* Think about if the subject is actually suitable for a short
516
* If you have previously held a presentation on the subject which was recorded, you can save time by converting the transcript to a 60 second short script draft using Gemini and then adjusting that
517 1 okurz
518 532 robert.richardson
#### Record a rough version of the vocals in one take (Using e.g. Audacity)
519 1 okurz
520 532 robert.richardson
* Ignore mispronunciations and small audio issues for now
521
* Cut pauses and background noise out, as far as possible  
522
* Adjust/Restructure the script, then restart from step 2 **until the script actually fits one minute**.  
523 531 robert.richardson
524 532 robert.richardson
#### Record the visuals (Using e.g. builtin screen capture or OBS)
525 531 robert.richardson
526 534 robert.richardson
* Video material may exceed 60s, as we can speed up playback in the next step
527 532 robert.richardson
* Vertical screen orientation (9:16)
528
* Try to only create 1080p/30fps content or better
529 531 robert.richardson
530 532 robert.richardson
#### Align the audio and visuals (Using e.g. Kdenlive)
531 531 robert.richardson
532 532 robert.richardson
* When creating a new Kdenlive Project, select `Custom` -> `Vertical HD 60 fps` or `Custom` -> `Vertical HD 30 fps`
533 534 robert.richardson
* Increase the speed (Hold `CTRL` to be able to "squeeze" clips) for anything happening on the screen that is taking too much time, and is not mentioned in the script or otherwise important to point out. (Cut as few visuals as possible, try to only speed them up)
534 532 robert.richardson
* To be able to select multiple clips, hold `SHIFT`, then click and drag.
535 1 okurz
* To be able to have a single effect target multiple elements at a time, select multiple clips and use "Create Sequence from Selection", then apply an effect to the sequence
536 532 robert.richardson
* To move elements use `Effects` -> `Transform, Distort and Perspective` -> `Transform` and create the desired keyframes.
537
* This can be used to pan and zoom the screen by moving and resizing the main sequence.
538 534 robert.richardson
* You can use `Project` -> `Add Title Clip` to quickly create an overlay text or a transparent rectangle with a colored frame which can be used to highlight specific elements on the screen.
539 532 robert.richardson
* Make sure the visuals are "snappy". Every few seconds there should either be camera movement or a cut.
540
* Add (or hide) a Geeko somewhere within the short. 
541
542
#### Re-record and align audio
543
544
* Find a good spot to record (e.g. record in a small room, or use a blanket to further dampen the sound)
545
* Do multiple takes in a row for each section of the script without changing the recording setup.
546
* Align the best audio recordings with the visuals and delete the initial audio track
547
548 1 okurz
### Things to try
549 532 robert.richardson
550 1 okurz
* Everybody can be "Product Owner" or "Scrum Master" or "Admin" or "Developer" for some time to get the different perspective
551 11 okurz
* From time to time ask stakeholders for their list of priorities regarding our tasks
552 1 okurz
* Select mob-programming tasks in unblock meetings to deep-dive in a dedicated meeting
553 11 okurz
554
### Literature references
555
556
* https://xahteiwi.eu/resources/presentations/no-we-wont-have-a-video-call-for-that/
557
558 42 livdywan
### Historical
559
560
Previously the former QA tools team used target versions "Ready" (to be planned into individual milestone periods or sprints), "Current Sprint" and "Done". However the team never really did use proper time-limited sprints so the distinction was rather vague. After having tickets "Resolved" after some time the PO or someone else would also update the target version to "Done" to signal that the result has been reviewed. This was causing a lot of ticket update noise for not much value considering that the [Definition-of-Done](https://progress.opensuse.org/projects/openqav3/wiki/#ticket-workflow) when properly followed already has rather strict requirements on when something can be considered really "Resolved" hence the team eventually decided to not use the "Done" target version anymore. Since about 2019-05 (and since okurz is doing more backlog management) the team uses priorities more as well as the status "Workable" together with an explicit team member list for "What the team is working on" to better visualize what is making team members busy regardless of what was "officially" planned to be part of the team's work. So we closed the target version. On 2020-07-03 okurz subsequently closed "Current Sprint" as also this one was in most cases equivalent to just picking an assignee for a ticket or setting to "In Progress". We can just distinguish between "(no version)" meaning untriaged, "Ready" meaning tools team should consider picking up these issues and "future" meaning that there is no plan for this to be picked up. Everything else is defined by status and priority.
561
On 2020-10-27 we discussed together to find out the history of the team. We clarified that the team started out as a not well defined "Dev+Ops" team. The "team responsibilities" have been mainly unchanged since at least the beginning of 2019. We agreed that learning from users and production about our "Dev" contributions is good, so this part of "Ops" is the responsibility of everyone.
562
563
Also see #73060 for more details about how the responsibilities were set up.
564
565
### Team-internal Hack Week (or Hackweek)
566
567 11 okurz
#### Rules of the game
568
569 1 okurz
- Regular meetings with the exception of the Weekly are cancelled
570
- Look into future tickets or other projects that relate to our usual work
571
- Backlog priorities are not enforced, short of emergency responses
572
- The challenge has to be solved the previous week, weekly to weekly
573 11 okurz
574 1 okurz
#### Extra-ordinary "hack-week" 2020-W51
575
576
SUSE QE Tools plans to have an internal "hack-week". Condition: We close 30 tickets from our backlog within the time frame from 2020-12-03 until the start of the weekly meeting on 2020-12-11. No cheating! :) See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2020-12-03&v%5Bclosed_on%5D%5B%5D=2020-12-11&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=). During week 2020-W51 everyone is allowed to work on any hack-week project; it should just have a reasonable, "explainable" connection to our normal work. okurz volunteers to take over ops-duty for the week.

Result during meeting 2020-12-11: We missed the goal (by a slight amount) but we are motivated to try again next year :) Everybody, put some easy tickets aside for next time!
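
The ticket query linked for each hack-week differs only in the two `closed_on` dates, so it can be generated instead of edited by hand. The following is a minimal sketch, assuming the filter parameters and the target version id 418 seen in the links of this section; `closed_tickets_query` is a hypothetical helper and not part of any of our tools, and the display column parameters (`c[]`) are left out for brevity.

```python
from urllib.parse import urlencode

# Assumption: base URL and filter parameters are copied from the query links
# in this section; 418 is the target version id that appears in those links.
BASE_URL = "https://progress.opensuse.org/issues"


def closed_tickets_query(start: str, end: str, version_id: int = 418) -> str:
    """Return an issue query URL for tickets closed between two ISO dates.

    Hypothetical helper for illustration only.
    """
    params = [
        ("utf8", "\u2713"),
        ("set_filter", "1"),
        ("sort", "priority:desc,id:desc"),
        ("f[]", "status_id"),
        ("op[status_id]", "c"),  # "c" selects closed tickets
        ("f[]", "fixed_version_id"),
        ("op[fixed_version_id]", "="),
        ("v[fixed_version_id][]", str(version_id)),
        ("f[]", "closed_on"),
        ("op[closed_on]", "><"),  # "><" selects a date range
        ("v[closed_on][]", start),
        ("v[closed_on][]", end),
        ("group_by", "status"),
    ]
    # urlencode percent-encodes brackets, operators and the check mark
    return f"{BASE_URL}?{urlencode(params)}"


print(closed_tickets_query("2020-12-03", "2020-12-11"))
```

Calling the helper with the start date and the date of the closing weekly meeting yields a link that can be pasted into the condition text of the next hack-week.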
#### Extra-ordinary "hack-week" 2021-W8
Similar to our attempt for 2020-W51 with the same rules, except for the condition: We close 30 tickets from our backlog within the time frame from 2021-02-05 until the start of the weekly meeting on 2021-02-19. No cheating! See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2021-02-05&v%5Bclosed_on%5D%5B%5D=2021-02-19&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=).

Result during meeting 2021-02-19: We missed the goal (25/30 tickets resolved) but again we are open to trying again, maybe after the next SUSE hack week.
#### Extra-ordinary "hack-week" 2022-W9
Same as before, with a similar condition: We close 30 tickets from our backlog within the time frame from 2022-02-18 until the start of the weekly meeting on 2022-02-25. No cheating! See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2022-02-18&v%5Bclosed_on%5D%5B%5D=2022-02-25&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=).
## Change announcements
For new, cool features or disruptive changes consider notifying our common userbase as well as potential future users accordingly, for example: create a post on opensuse-factory@opensuse.org, link to the post on openqa@suse.de, invite to a workshop, or announce in #opensuse-factory (IRC) (irc://irc.libera.chat/opensuse-factory) and [#testing (Slack)](https://app.slack.com/client/T02863RC2AC/C02CANHLANP).