livdywan, 2023-11-13 09:36
Maris is back from squad rotation

{{toc}}
# QE tools - Team description
"The easiest way to provide complete quality for your software"
We provide the most complete free-software system-level testing solution to ensure high quality of operating systems, complete software stacks and multi-machine services for software distribution builders, system integration engineers and release teams. We continuously develop, maintain and release our software to be readily used by anyone while we offer a friendly community to support you in your needs. We maintain the main public and SUSE internal openQA server as well as supporting tools in the surrounding ecosystem.
## Team responsibilities
* Develop and maintain upstream [openQA](https://github.com/os-autoinst/openQA) including the backend [os-autoinst](https://github.com/os-autoinst/os-autoinst)
* Administration of [openqa.suse.de (osd)](https://openqa.suse.de) and workers
* Help administer and maintain [openqa.opensuse.org (o3)](https://openqa.opensuse.org), including coordinating efforts to solve problems affecting o3
* Develop and maintain SUSE maintenance QA tools, e.g. [qem-bot](https://github.com/openSUSE/qem-bot/), [osc-plugin-qam](https://github.com/openSUSE/osc-plugin-qam), ([template generator aka. "teregen"](https://gitlab.suse.de/qa-maintenance/teregen/), [MTUI](https://gitlab.suse.de/qa-maintenance/mtui), etc., e.g. from https://confluence.suse.com/display/maintenanceqa/Toolchain+for+maintenance+quality+engineering)
* Help with the investigation of specific issues, especially when they are likely related to generic, hardware or backend problems
* Support colleagues, team members and open source community
## Out of scope
* Maintenance and *recurring* review of individual tests (besides openQA-in-openQA tests)
* Maintenance of special worker addendums needed for tests, e.g. external hypervisor hosts for s390x, powerVM, xen, hyperv, IPMI, VMWare (Clarification: We maintain the code for all backends but we are no experts in specific domains. So we always try to help but it's a case by case decision based on what we realistically can provide based on our competence. We can't be expected to be experts in everything and also we are limited in what we can actually test.)
* Maintenance of most openSUSE related triggering solutions, e.g. for Tumbleweed or Leap maintenance that use https://github.com/openSUSE/opensuse-release-tools on https://botmaster.suse.de. Contact "SUSE Security Solutions", e.g. Marcus Meissner, for this.
* Ticket triaging of http://progress.opensuse.org/projects/openqatests/
* Setup of configuration for individual products to test, e.g. new job groups in openQA
* Feature development within the backend for single teams (commonly provided by teams themselves)
## Our common userbase
Known users of our products: Most SUSE QA engineers, SUSE SLE release managers and release engineers, every SLE developer submitting "submit requests" in OBS/IBS where product changes are tested as part of the "staging" process before changes are accepted in either SLE or openSUSE (staging tests must be green before packages are accepted), same for all openSUSE contributors submitting to either openSUSE:Factory (for Tumbleweed, SLE, future Leap versions) or Leap, other GNU/Linux distributions like Fedora https://openqa.fedoraproject.org/ , AlmaLinux http://openqa.almalinux.org/, Debian https://openqa.debian.net/ , https://openqa.qubes-os.org/ , https://openqa.endlessm.com/ , the GNOME project https://openqa.gnome.org, https://www.codethink.co.uk/articles/2021/automated-linux-kernel-testing/, https://en.euro-linux.com/blog/openqa-or-how-we-test-eurolinux/, openSUSE KDE contributors (with their own workflows, https://openqa.opensuse.org/group_overview/23 ), openSUSE GNOME contributors (https://openqa.opensuse.org/group_overview/35 ), OBS developers (https://openqa.opensuse.org/parent_group_overview/7#grouped_by_build) , wicked developers (https://gitlab.suse.de/wicked-maintainers/wicked-ci#openqa), and of course our team itself for "openQA-in-openQA Tests" :) https://openqa.opensuse.org/group_overview/24 . Also see https://en.opensuse.org/openSUSE:OpenQA/Partners .
Keep in mind: "Users of openQA" and talking about "openSUSE release managers and engineers" means SUSE employees but also employees of other companies, also development partners of SUSE.
In summary, our products, for example openQA, are a critical part of many development processes, so outages and regressions are disruptive and costly. We therefore need to ensure high quality in production: we practice DevOps with a slight tendency towards a conservative approach for introducing changes while still ensuring a high development velocity.
## How we work
The QE Tools team follows the DevOps approach and works in a lightweight Agile way inspired by [Extreme Programming](https://extremeprogramming.org/), [Kanban](https://en.wikipedia.org/wiki/Kanban_(development)) and of course the original http://agilemanifesto.org/. We structure our team and roles following [Agile Product Ownership in a Nutshell](https://youtu.be/502ILHjX9EE). We plan and track our work using tickets on https://progress.opensuse.org . We pick tickets based on priority and planning decisions. We use weekly meetings as checkpoints for progress and also track cycle and lead times to crosscheck progress against expectations.
* [tools team - backlog](https://progress.opensuse.org/issues?query_id=230): The complete backlog of the team
* [tools team - backlog, high-level view](https://progress.opensuse.org/issues?query_id=526): A high-level view of the backlog, all epics and higher (an "epic" includes multiple stories)
* [tools team - backlog, top-level view](https://progress.opensuse.org/issues?query_id=524): A top-level view of the backlog, only sagas and higher (a "saga" is bigger than an epic and can include multiple epics, i.e.  "epic of epics")
* [tools team - what members of the team are working on](https://progress.opensuse.org/issues?query_id=400): To check progress and know what the team is currently occupied with
* [tools team - closed within last 60 days](https://progress.opensuse.org/issues?query_id=541): What was recently resolved
* [tools team - next](https://progress.opensuse.org/issues?query_id=794): The staging ground for next tasks considered to be picked into the backlog
*Be aware:* Custom queries in the right-hand sidebar of individual projects, e.g. https://progress.opensuse.org/projects/openqav3/issues , show queries with the same name but are limited to the scope of the specific projects so can show only a subset of all relevant tickets.
*devops/infra split - 2022-12*: As a temporary experiment we decided to try out working with sub-teams for better focus. For this we have split the backlog in two parts:
* [tools team - backlog w/o infra](https://progress.opensuse.org/issues?query_id=754): The backlog of the team without infrastructure tasks
* [tools team - infrastructure backlog](https://progress.opensuse.org/issues?query_id=757): The infrastructure backlog of the team
### What we expect from team members
* Actively show visible contributions to our products every workday *(pull requests, code review, ticket updates in descending priority, i.e. if you are very active in pull requests + code review ticket updates are much less important)*
* Be responsive over usual communication platforms and channels *(user questions, team discussions)*
* Stick to our rules *(this wiki, SLOs, alert handling)*
### Common tasks for team members
This is a list of common tasks that we follow, e.g. reviewing daily based on individual steps in the DevOps Process ![DevOps Process](devops-process_25p.png)
* **Plan**:
 * State daily learning and planned tasks in internal chat room
 * Review backlog for time-critical tickets, triage new tickets, pick tickets from backlog; see https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog
 * Coordinate on the agile board https://progress.opensuse.org/agile/board?query_id=711
* **Code**:
 * See project specific contribution instructions
 * Provide peer-review of projects around openQA, in particular:
     * https://github.com/os-autoinst/openQA
     * https://github.com/os-autoinst/os-autoinst
     * https://github.com/os-autoinst/scripts
     * https://github.com/os-autoinst/os-autoinst-distri-openQA
     * https://github.com/os-autoinst/openqa-trigger-from-obs
     * https://github.com/os-autoinst/openqa_review
     * https://github.com/os-autoinst/openqa_bugfetcher
     * https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess
     * https://github.com/openSUSE/mtui
     * https://github.com/openSUSE/qem-bot
     * https://github.com/openSUSE/backlogger
     * https://github.com/openSUSE/qem-dashboard
     * https://github.com/openSUSE/openSUSE-release-tools/tree/master/factory-package-news
* **Build**:
 * See project specific contribution instructions
* **Test**:
 * Monitor failures on https://travis-ci.org/ relying on https://build.opensuse.org/package/show/devel:openQA/os-autoinst_dev for os-autoinst (email notifications)
 * Monitor failures on https://app.circleci.com/pipelines/github/os-autoinst/openQA?branch=master relying on https://build.opensuse.org/project/show/devel:openQA:ci for openQA (email notifications)
* **Release**:
 * By default we use the rolling-release model for all projects unless specified otherwise
 * Monitor [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) (all packages and all subprojects) for failures, ensure packages are published on http://download.opensuse.org/repositories/devel:/openQA/, ensure to be added as a Maintainer for that project (members need to be added individually, you can ask existing team members, e.g. the SM)
 * React to alerts from https://gitlab.suse.de/openqa/scripts-ci
 * Monitor http://jenkins.qe.nue2.suse.org for the openQA-in-openQA Tests and automatic submissions of os-autoinst and openQA to openSUSE:Factory through https://build.opensuse.org/project/show/devel:openQA:tested
* **Deploy**:
 * o3 is automatically deployed (daily), see https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Automatic-update-of-o3
 * osd is automatically deployed (multiple times per week), monitor https://gitlab.suse.de/openqa/osd-deployment/pipelines and watch for notification email to openqa@suse.de
* **Operate**:
 * Apply infrastructure changes from https://gitlab.suse.de/openqa/salt-states-openqa (osd) or manually over sshd (o3)
 * Maintain the network infrastructure of QE hardware: https://wiki.suse.net/index.php/SUSE-Quality_Assurance/Labs
 * Monitor for backup, see https://gitlab.suse.de/qa-sle/backup-server-salt
config changes in salt (osd), backups, job group configuration changes
 * Ensure old unused/non-matching needles are cleaned up (osd+o3), see #73387
 * Maintain https://gitlab.suse.de/qa-maintenance/qamops and https://confluence.suse.com/display/maintenanceqa/qam.suse.de and https://gitlab.suse.de/qa-sle/qa-jump-configs/
 * Maintain the [QA&QAM&QE LSG networks](https://racktables.nue.suse.com/index.php?andor=and&cfe=%7BQA%7D+or+%7BQAM%7D+or+%7BQE+LSG%7D&page=ipv4space&tab=default&submit.x=11&submit.y=19)
* **Monitor**:
 * React to alerts from [monitor.qa.suse.de](https://monitor.qa.suse.de/alerting/list?state=not_ok) (emails go to [osd-admins@suse.de](http://mailman.suse.de/mailman/listinfo/osd-admins); log in with LDAP credentials; you must be an *editor* to edit panels and hooks via the web UI)
 * For openqa.opensuse.org react to alerts from [zabbix.suse.de](https://zabbix.suse.de) (emails go to [o3-admins@suse.de](http://mailman.suse.de/mailman/listinfo/o3-admins))
 * Look for incomplete jobs or scheduled jobs not being worked on, on both o3 and osd (API or webUI) - see also #81058 for *power*
 * React to alerts from https://gitlab.suse.de/openqa/auto-review/, https://gitlab.suse.de/openqa/openqa-review/, https://gitlab.suse.de/openqa/monitor-o3 (subscribe to projects for notifications)
 * Be responsive on #opensuse-factory (irc://irc.libera.chat/opensuse-factory, formerly irc://chat.freenode.net/opensuse-factory) for help, support and collaboration (Unless you have a better solution it is suggested to use [Element.io](https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat) for a sustainable presence; you also need a [registered IRC account](https://libera.chat/guides/registration), formerly [freenode](https://freenode.net/kb/answer/registration)) **note** *don't use matrix features on irc!*
 * Be responsive on [#team-qa-tools in chat](https://app.slack.com/client/T02863RC2AC/C02AJ1E568M/thread/C02CANHLANP-1658480276.547769) for internal coordination and alarm handling, fallback to #suse-qe-tools:opensuse.org (matrix) as backup if other channels are temporarily down, alternatively public channels on matrix/ IRC if the topics are not confidential
 * Be responsive on [#eng-testing](https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1658480276.547769) for help, support and collaboration
 * Be responsive on mailing lists opensuse-factory@opensuse.org and openqa@suse.de (see https://en.opensuse.org/openSUSE:Mailing_lists_subscription)
 * Be responsive in https://matrix.to/#/#openqa:opensuse.org or the bridged room [#openqa](https://discord.com/channels/366985425371398146/817367056956653621) on https://discord.gg/opensuse if you have a discord account
### Best practices for major changes
When proposing non-trivial changes with the potential of breaking existing tests consider the following best practice patterns:
  - Make the problematic change opt-in via a test variable like MY_NEW_FEATURE_ENABLED to enable the new behavior, and otherwise log a warning only
  - Include a reference to a relevant GitHub PR and progress ticket
  - If a BARK test is to be conducted to assess the full impact of the change an autoreview regex matching the most relevant error message should be prepared so that affected jobs can be restarted trivially without disrupting daily operation too much.
  - Inform all stakeholders in relevant Slack channels, Matrix and mailing lists
  - Include an explicit mention in the release notes
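
The opt-in pattern from the first item above could look roughly like the following. This is only a minimal sketch in Python for illustration (openQA tests themselves are usually written in Perl); `test_vars`, `new_feature_enabled` and `run_step` are hypothetical names, and only the variable name `MY_NEW_FEATURE_ENABLED` is taken from the text.

```python
import logging

log = logging.getLogger("feature-toggle")


def new_feature_enabled(test_vars: dict) -> bool:
    """Return True only if the opt-in variable is set to a truthy value."""
    value = str(test_vars.get("MY_NEW_FEATURE_ENABLED", "0")).strip().lower()
    return value in ("1", "true", "yes")


def run_step(test_vars: dict) -> str:
    """Pick the new behavior only on explicit opt-in, otherwise warn and keep the old one."""
    if new_feature_enabled(test_vars):
        return "new"
    # Hypothetical message; a real one should reference the GitHub PR and progress ticket
    log.warning("MY_NEW_FEATURE_ENABLED is not set, keeping the old behavior")
    return "old"
```

With this shape existing tests keep passing unchanged and only see a warning, while early adopters can set the variable to try the new behavior.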
### Guideline for communication in tickets
* Clarify action items and steps (to be) taken, for example
  * I will implement ... from the suggestions
  * I will monitor ... and evaluate results
  * Confirm if other experts will provide reproducers
  * Document mitigations with references to MRs, PRs or manual file changes and keep the description updated
  * Confirm if adjustments made by others are still in place
* Explicitly include examples of what won't be done
  * I won't look into the test code itself here

* Make use of the [scientific method template with hypotheses, experiments and observations]( https://progress.opensuse.org/projects/openqav3/wiki/#Further-decision-steps-working-on-test-issues)
### How we work on our backlog
* "due dates" are only used as exceptions or reminders. Commonly the due-date is set [automatically](https://github.com/os-autoinst/scripts/blob/master/backlog-set-due-date) to 14 days in the future as soon as a non-low ticket is picked up. That period is roughly the median cycle time, which we want to stay well below. In addition, to prevent redmine from sending a reminder and the backlog status from flagging issues, the ticket should be resolved at least one day before the due-date. As a reminder is possibly sent out even on the last day before the due-date, better resolve the ticket on the second to last day. Of course it is even better to always try to finish as soon as possible, well before the due-date.
* every team member can pick up tickets themselves
* everybody can set priority, PO can help to resolve conflicts
* consider the [ready, not assigned/blocked/low](https://progress.opensuse.org/issues?query_id=490) query as preferred. It is suggested to pick up tickets based on priority. "Workable" tickets are often convenient and hence preferred.
* ask questions in tickets, even potentially "stupid" questions, oftentimes descriptions are unclear and should be improved
* There are "low-level infrastructure tasks" only conducted by some team members, the "DevOps" aspect does not include that but focusses on the joint development and operation of our main products
* Consider tickets with the subject keyword or tag "learning" as good learning opportunities for people new to a certain area. Experts in the specific area should prefer helping others but not work on the ticket
* For tickets which are out of the scope of the team remove from backlog, delegate to corresponding teams or persons but be nice and supportive, e.g. [SUSE-IT](https://sd.suse.com/), [EngInfra](https://sd.suse.com/servicedesk/customer/portal/1) also see [SLA](https://confluence.suse.com/display/qasle/Service+Level+Agreements), [test maintainer](https://progress.opensuse.org/projects/openqatests/), QE-LSG PrjMgr/mgmt
* For [EngInfra tickets](https://sd.suse.com/servicedesk/customer/portal/1) see the section [[Tools#SUSE-IT-ticket-handling]] for the process
* Whenever we apply changes to the infrastructure we should have a ticket
* Refactoring and general improvements are conducted while we work on features or regression fixes
* For every regression or bigger issue that we encounter try to come up with at least two improvements, e.g. the actual issue is fixed and similar cases are prevented in the future with better tests and optionally also monitoring is improved
* For critical issues and very big problems especially when we were informed by users about outages collect "lessons learned", e.g. in notes in the ticket or a meeting with minutes in the ticket, consider https://en.wikipedia.org/wiki/Five_whys and answer at least the following questions: "User impact, outwards-facing communication and mitigation, upstream improvement ideas, Why did the issue appear, can we reduce our detection time, can we prevent similar issues in the future, what can we improve technically, what can we improve in our processes". Also see https://youtu.be/_Dv4M39Arec
* okurz proposes to use "#NoEstimates". Though that topic is controversial and often misunderstood. https://ronjeffries.com/xprog/articles/the-noestimates-movement/ describes it nicely :) Hence tickets should be evenly sized and no estimation numbers should be provided on tickets
* If you really want you can look at the [burndown chart](https://progress.opensuse.org/agile/charts?utf8=%E2%9C%93&set_filter=1&f%5B%5D=chart_period&op%5Bchart_period%5D=%3E%3Ct-&v%5Bchart_period%5D%5B%5D=90&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=&chart=burndown_chart&chart_unit=issues&interval_size=day) (some people wish to have this) but we consider it unnecessary due to the continuous development, not a project with defined end. Also an [agile board](https://progress.opensuse.org/agile/board?utf8=%E2%9C%93&set_filter=1&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=status_id&op%5Bstatus_id%5D=%3D&f_status%5B%5D=1&f_status%5B%5D=12&f_status%5B%5D=2&f_status%5B%5D=15&f_status%5B%5D=4&c%5B%5D=tracker&c%5B%5D=assigned_to&c%5B%5D=cf_16) is available but likely due to problems within the redmine installation ordering cards is not reliable.
* For critical changes write to qa-team@suse.de as well as to the chat channels
* Everyone should propose reverts of features if we find problems that can not be immediately fixed or worked around in production
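
The due-date rule from the first item of this list can be expressed as simple date arithmetic. The following is a sketch under assumptions, not the actual `backlog-set-due-date` script: the function names are hypothetical and the priority is assumed to be available as a plain string.

```python
from datetime import date, timedelta
from typing import Optional

# 14 days, roughly the median cycle time mentioned above
DUE_DATE_OFFSET = timedelta(days=14)


def default_due_date(priority: str, picked_up: date) -> Optional[date]:
    """Due-date for a ticket picked up on the given day; low tickets get none."""
    if priority.lower() == "low":
        return None
    return picked_up + DUE_DATE_OFFSET


def resolve_by(due: date) -> date:
    """Second to last day before the due-date, to stay clear of reminders."""
    return due - timedelta(days=2)
```

For example a normal-priority ticket picked up on 2023-11-13 would get the due-date 2023-11-27 and should be resolved by 2023-11-25 at the latest.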
#### Definition of DONE
Also see https://web.archive.org/web/20110308065330/http://www.allaboutagile.com/definition-of-done-10-point-checklist/ and https://web.archive.org/web/20170214020537/https://www.scrumalliance.org/community/articles/2008/september/what-is-definition-of-done-(dod)
* Code changes are made available via a pull request on a version control repository, e.g. github for openQA
* [Guidelines for git commits](http://chris.beams.io/posts/git-commit/) have been followed
* Code has been reviewed (e.g. in the github PR)
* Depending on criticality/complexity/size/feature: A local verification test has been run, e.g. post link to a local openQA machine or screenshot or logfile (especially also for hardware-related changes, e.g. in os-autoinst backend)
* For regressions: A regression fix is provided, flaws in the design, monitoring, process have been considered
* Potentially impacted package builds have been considered, e.g. openSUSE Tumbleweed and Leap, Fedora, etc.
* Code has been merged (either by reviewer or "mergify" bot or reviewee after 'LGTM' from others)
* Code has been deployed to osd and o3 (monitor automatic deployment, apply necessary config or infrastructure changes)
#### Definition of READY for new features
The following points should be considered before a new feature ticket is READY to be implemented:
* Follow the ticket template from https://progress.opensuse.org/projects/openqav3/wiki/#Feature-requests
* A clear motivation or user expressing a wish is available
* Acceptance criteria are stated (see ticket template) or use `[timeboxed:<nr>h]` with `<nr>` hours for tasks that should be limited in time, e.g. a research task with `[timeboxed:20h] research …`
* add tasks as a hint where to start
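
The `[timeboxed:<nr>h]` marker mentioned above is easy to read back from a ticket subject. A minimal sketch, assuming the marker is written exactly in that form; the helper name `timebox_hours` is hypothetical:

```python
import re
from typing import Optional

# Matches e.g. "[timeboxed:20h]" and captures the number of hours
TIMEBOX_RE = re.compile(r"\[timeboxed:(\d+)h\]")


def timebox_hours(subject: str) -> Optional[int]:
    """Return the timebox in hours, or None if the subject has no marker."""
    match = TIMEBOX_RE.search(subject)
    return int(match.group(1)) if match else None
```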
#### WIP-limits (reference "Kanban development")
* global limit of 10 tickets, and 3 tickets per person respectively [In Progress](https://progress.opensuse.org/issues?query_id=505)
* global limit of 10 tickets in [Feedback, not-low](https://progress.opensuse.org/issues?query_id=520)
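
The "In Progress" limits above could be checked with a few lines of code. This is only an illustrative sketch: it assumes the assignee of every ticket in progress is already available as a list of names (for example exported from the linked query), and `wip_violations` is a hypothetical helper.

```python
from collections import Counter
from typing import List

GLOBAL_LIMIT = 10      # tickets "In Progress" for the whole team
PER_PERSON_LIMIT = 3   # tickets "In Progress" per person


def wip_violations(assignees: List[str]) -> List[str]:
    """Take one assignee name per ticket in progress and list exceeded limits."""
    problems = []
    if len(assignees) > GLOBAL_LIMIT:
        problems.append(f"global: {len(assignees)} > {GLOBAL_LIMIT}")
    for person, count in sorted(Counter(assignees).items()):
        if count > PER_PERSON_LIMIT:
            problems.append(f"{person}: {count} > {PER_PERSON_LIMIT}")
    return problems
```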
#### Target numbers or "guideline", "should be", in priorities
1. *New, untriaged QA (openQA, etc.):* [0 (daily)](https://progress.opensuse.org/projects/qa/issues?query_id=576) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams
1. *Untriaged "tools" tagged:* [0 (daily)](https://progress.opensuse.org/issues?query_id=481) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams
1. *Workable (properly defined):* [10-40](https://progress.opensuse.org/issues?query_id=478) . Enough tickets to reflect a proper plan but not too many to limit unfinished data (see "waste")
1. *Overall backlog length:* [ideally less than 100](https://progress.opensuse.org/issues?query_id=230) . Similar as for "Workable". Enough tickets to reflect a proper roadmap as well as give enough flexibility for all unfinished work but limited to a feasible number that the team can still keep track of without losing the overview. One more reason for a maximum of 100 is that pagination in the redmine UI allows showing only up to 100 issues on one page at a time, same for redmine API access.
1. *Within due-date:* [0 (daily/weekly)](https://progress.opensuse.org/issues?query_id=514) . We should take due-dates seriously, finish tickets fast and at the very least update tickets with an explanation why the due-date could not be held and update it to a reasonable time in the future based on usual cycle time expectations
#### SLAs (service level agreements)
* for at least picking up tickets, better providing reasonable updates based on priority, first goal is "urgency removal":
 * **immediate**: [<1 day](https://progress.opensuse.org/issues?query_id=542)
 * **urgent**: [<1 week](https://progress.opensuse.org/issues?query_id=543)
 * **high**: [<1 month](https://progress.opensuse.org/issues?query_id=544)
 * **normal**: [<1 year](https://progress.opensuse.org/issues?query_id=545)
 * **low**: undefined
* "reasonable updates": Provide fixes, workarounds or at least state of progress or when the task is blocked
* to ensure timely updates immediate/urgent tickets must never be in status "Blocked" or "Feedback"
* aim for cycle time of individual tickets (not epics or sagas): 1h-2w
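
The SLA periods per priority listed above map naturally to a lookup table. A sketch under assumptions: a month is approximated as 30 days and a year as 365 days, the ticket age is measured since the last pickup or update, and `sla_violated` is a hypothetical helper, not part of any existing tool.

```python
from datetime import timedelta

# Maximum time per priority before a ticket violates the SLA; None means undefined
SLA_LIMITS = {
    "immediate": timedelta(days=1),
    "urgent": timedelta(weeks=1),
    "high": timedelta(days=30),     # assumption: one month = 30 days
    "normal": timedelta(days=365),  # assumption: one year = 365 days
    "low": None,
}


def sla_violated(priority: str, age: timedelta) -> bool:
    """True if a ticket of the given priority has been waiting too long."""
    limit = SLA_LIMITS[priority.lower()]
    return limit is not None and age >= limit
```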
#### SLOs (service level objectives, internal)
* For providing reasonable updates on tickets in our backlog based on priority, first goal is "urgency removal":
 * **immediate**: multiple times within the day
 * **urgent**: [<1 day](https://progress.opensuse.org/issues?query_id=824)
 * **high**: [<1 week](https://progress.opensuse.org/issues?query_id=827)
 * **normal**: [<1 month](https://progress.opensuse.org/issues?query_id=830)
 * **low**: <1 year
* Frequent updates do not necessarily need to happen in tickets but must be visible in written form, e.g. just in internal chat. Especially in ticket updates every comment should give a clear answer: Who plans to do what until when, in particular the ticket assignee.
* Reference for SLOs and related topics: https://sre.google/sre-book/table-of-contents/
#### Status overview
Dynamic dashboard showing target numbers and SLOs: https://os-autoinst.github.io/qa-tools-backlog-assistant/
#### Backlog prioritization
When we prioritize tickets we assess:
1. What the main use cases of openQA are among all users, be it SUSE QA engineers, other SUSE employees, openSUSE contributors as well as any other outside user of openQA
2. We try to understand how many persons and products are affected by feature requests as well as regressions (or "concrete bugs" as the ticket category is called within the openQA Project) and prioritize issues affecting more persons and products and use cases over limited issues. See #120540 for details in particular about the various os-autoinst backends
3. We prioritize regressions higher than work on (new) feature requests
4. If a workaround or alternative exists then this lowers priority. We prioritize tasks that need deep understanding of the architecture and an efficient low-level implementation over convenience additions that other contributors are more likely to be able to implement themselves.
#### Periodic backlog refinement
These queries can be used to help organize our work efficiently
1. [QE tools team - backlog - sorted by update time](https://progress.opensuse.org/issues?query_id=654) ensure all tickets are reasonably up-to-date and don't keep hanging around
2. [QE tools team - due date forecast](https://progress.opensuse.org/issues?query_id=651) prevent running into due-dates proactively
3. [QE tools team - next - sorted by update time](https://progress.opensuse.org/issues?query_id=797) ensure all *next* tickets are reasonably up-to-date and considered for the backlog
4. [QE tools team - backlog, non-reactive, needs parent](https://progress.opensuse.org/issues?query_id=729) ensure all our (non-reactive) work is linked to higher-level planning as motivation
It's good practice to keep an eye on the queries to anticipate blockers. All team members are encouraged to utilize them and they are useful as part of the Scrum Master's daily routine as well as [[Tools#Weekly-moderation-duty|moderation duty]].
Note that due dates should provide a hint as to when a ticket will be resolved but they need to be realistic. Availability, reviews and deployment need to be factored in as well since typically a ticket will be in *Feedback* before it can be resolved. If in doubt the Due date should be extended with an accompanying message like "Outstanding branches still need to be reviewed" or simply "Bumping the due date because of availability".
### Team meetings
* **Daily:** Use (internal) chat actively, e.g. formulate your findings or achievements and plans for the day, "think out loud" while working on individual problems. Optionally join [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools) every weekday 1030-1045 CET/CEST. At the latest at 1100 CET/CEST everyone working on that day must have checked in, at least with a text message in chat.
  * *Goal:* Emergency responses, clarify next steps or blockers on current work items, asking and answering questions on tickets that would be ignored otherwise, ticket estimations (after the regular daily) (compare to [Daily Scrum](https://www.scrumguides.org/scrum-guide.html#events-daily))
  * *Conduction:* Answer the following questions: 1. Backlog checks green? 2. Time critical issues needing handling? 3. What was achieved since the last time? 4. Who needs help? 5. Plans until next time?
* **Ticket Estimations:** Every Thursday 1100-1150 CET/CEST in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools) including a 5 minute break
  * *Goal:* Estimate [t-shirt sizes for our non-estimated tickets](https://progress.opensuse.org/issues?query_id=717).
  * *Goal:* Ensure tickets are workable. Refine and split tickets for larger estimates.
* **Infra Daily:** Every weekday 1300-1315 CET/CEST
  * *Goal:* State your on-going tasks and plans for the day. Estimate and unblock **infra** tickets as needed.
* **Midweekly Unblock:** Every Wednesday 1100-1150 CET/CEST in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools) including a 5 minute break
  * *Goal:* Discuss tasks in progress in more detail, unblock currently assigned tasks and tasks avoided for longer (see [[Tools#Periodic-backlog-refinement|Periodic backlog refinement]]), apply the **pull principle** based on [tickets in progress](https://progress.opensuse.org/issues?query_id=505) firstly and [tickets updated by priority](https://progress.opensuse.org/issues?query_id=771) secondarily - the *mob session* can be used to dedicate more time to tickets in need of attention.
* **Collaborative Session:** Every Thursday 1330-1630 CET/CEST in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools). Optional, topic to be picked at the latest in the **Unblock**. Pick from [previous suggestions](https://progress.opensuse.org/issues?query_id=833) or bring up your own topic
  * *Goal:* Follow-up on tasks too difficult to solve alone, or where someone looks to be stuck using pair programming and other means
* **Fortnightly Coordination:** Friday 1100-1150 CET/CEST every even week in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools) including a 5 minute break. Community members and guests are particularly welcome to join this meeting.
  * *Goal:* Demo of features, Team backlog coordination and design decisions of bigger topics (compare to [Sprint Planning](https://www.scrumguides.org/scrum-guide.html#events-planning)).
256
  * *Conduction:* Demo recently finished feature work depending on [last closed](https://progress.opensuse.org/issues?query_id=572), crosscheck status of team, discuss blocked tasks and upcoming work
257 305 livdywan
* **Fortnightly Retrospective:** Friday 1100-1150 CET/CEST every odd week in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools) including a 5 minute break.
258 293 okurz
  * *Goal:* Inspect and adapt, learn and improve (compare to [Sprint Retrospective](https://www.scrumguides.org/scrum-guide.html#events-retro))
259 307 tinita
  * *Announcements:* Create a new *discussion* with all team members in Slack and a new [retrospected game](http://retrospected.core.qa.suse.de:8080) - the new board is made available after the previous retro. Specific actions will be recorded as tickets.
260 304 livdywan
* **Virtual coffee:** Weekly every Monday 1330-1345 CET/CEST in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools).
261 293 okurz
  * *Goal:* Connect and bond as a team, understand each other (compare to [Informal Communication in an all-remote environment](https://about.gitlab.com/company/culture/all-remote/informal-communication))
262 309 okurz
* **Workshop:** Friday 0900-0950 CET/CEST every week in [meet.jit.si/suse_qa_tools](https://meet.jit.si/suse_qa_tools) especially for community members and users! We will run this every week with the plan to move to a fortnightly cadence every even week.
263 293 okurz
  * *Goal:* Demonstrate new and important features, explain already existing, but less well-known features, and discuss questions from the user community. All your questions are welcome!
264
  * *Announcements:* Drop a reminder with a teaser in [#eng-testing](https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1658480276.547769).
265
  * *Recordings:* Consider recording, e.g. using OBS, uploading to YouTube and linking the recording in the topics list. SUSE internal topics can be published on http://streaming.nue.suse.com/i/QE-Tools-Workshops/ by ssh-uploading to ftp@streaming.nue.suse.com:~/i/QE-Tools-Workshops/ (get your SSH key added by existing team members, e.g. okurz)
266
  * *Content:* See [topics](https://progress.opensuse.org/projects/qa/wiki/Tools#Workshop-Topics)!
267 1 okurz
268 309 okurz
**NOTICE:** We are using meet.jit.si due to problems on meet.opensuse.org, see [poo#116959](https://progress.opensuse.org/issues/116959).
269 1 okurz
270 5 okurz
#### Weekly moderation duty
271
272 239 livdywan
**We do not CURRENTLY assign this task to team members in rotation, see [poo#132446](https://progress.opensuse.org/issues/132446)**
273 237 livdywan
274 90 mkittler
We see mandatory daily video calls as an effective measure but we don't want to force the team to do this unless we have to. To ensure that we have daily updates, next to the [[Tools#Weekly-alert-duty|Alert duty]] we have the rotating role of "moderation duty". The person doing alert duty in the next week has "moderation duty". The duty consists of ensuring that we follow [[Tools#How-we-work|How we work on tickets]], in particular:
275 5 okurz
276
* On a daily basis ensure that we have an update from every team member who is expected to be present that day. If a person actively contributes to the daily meeting in the video call or provided an update related to backlog tasks in chat then this is already ensured.
277 31 livdywan
* Hand over to the next person during the weekly, going by the order of team members in the wiki
278 5 okurz
* Ask for a stand-in when unavailable
279
280
We expect that this is of course an additional task with the corresponding time investment. The expected time invested per day is in the range of 3-15m, not more, so accounting for 15m-1h15m during the duty week. Even in the worst case of a 30h part-time worker investing said 15m every day this accounts for less than 5% of weekly work time (75m out of 1800m, about 4%), so no significant impact on contributions is expected.
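
The estimate above can be checked with a few lines of shell arithmetic. This is only a minimal sketch: the function name `duty_share_percent` is hypothetical and a five-day duty week is assumed. The worst case comes out at roughly 4%, which stays below the 5% bound.

```shell
#!/bin/sh
# Hypothetical helper: share of weekly work time spent on moderation duty, in percent.
# Arguments: minutes per day, duty days per week, weekly working hours.
duty_share_percent() {
    awk -v m="$1" -v d="$2" -v h="$3" 'BEGIN { printf "%.1f\n", (m * d) / (h * 60) * 100 }'
}

# Worst case from the text: 15 minutes on each of 5 days for a 30h part-time worker.
duty_share_percent 15 5 30   # prints 4.2
# Best case: 3 minutes per day for a 40h full-time worker.
duty_share_percent 3 5 40    # prints 0.6
```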
281
282 1 okurz
#### Best practices for meetings
283
* Meetings concerning the whole team are moderated by the scrum master by default, who should join the call early and verify that the meeting itself and any tools used are working or e.g. advise the use of the fallback option.
284
* We would prefer UTC for meeting times to be globally fair but as many other SUSE meetings are bound to European time we need to stick to that as well.
285
* It is recommended to use the Jitsi Audio-feedback feature, blue/green circles depending on microphone volume. Everybody should ensure that at least "two green balls" show up
286
* Hand signals over video can be used, e.g. "waving/circling hands": "I am lost, please bring me into discussion again"; "T-Sign": "I need a break"; "Raised hand": "I would like to speak"
287
* Discuss topics relevant for all within the common meetings, continue discussions pro-actively over asynchronous communication, e.g. tickets, as well as conduct topic centered follow-up meetings with only relevant attendees
288
* Reminders in Slack correct for summer/winter time automatically, but if you make changes to them the time might be shifted by one hour, e.g. if you scheduled a reminder for 10:30 CEST, it will become 9:30 CET after the switch
289
* Use https://etherpad.opensuse.org/p/suse_qe_tools for collaborative editing and put the content back into tickets or wikis
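
The one-hour shift described for Slack reminders follows from both local times mapping to the same UTC instant. A minimal sketch, assuming GNU `date` from coreutils is available; the helper name `to_utc` and the example dates are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: convert a local time with an explicit UTC offset to UTC (HH:MM).
# Relies on the -d option of GNU date.
to_utc() {
    date -u -d "$1" +%H:%M
}

to_utc '2023-07-03 10:30 +0200'   # 10:30 CEST (summer time) -> prints 08:30
to_utc '2023-11-06 09:30 +0100'   # 09:30 CET (winter time)  -> prints 08:30
```

Both calls print the same UTC time, which is why a reminder pinned to a fixed instant appears one hour earlier in local time after the switch to winter time.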
290
291
#### Workshop Topics
292
293
* *SUSE QE Tools roadmap*: Recent achievements, mid-term plan and future outlook. Every first Friday every month (Idea based on discussion between okurz and vpelcak 2021-02-09)
294 34 okurz
* (find older workshop topics and recordings on our [[ToolsWorkshopArchive|SUSE QE Tools Workshop Archive]])
295 161 okurz
* **2023-01-13:** *DONE* [How the development team within SUSE QE Tools works](https://youtu.be/I-_55UYBDPE) (@mkittler)
296 170 okurz
* **2023-01-20:** *DONE* [How the Product Owner within SUSE QE Tools works](https://youtu.be/tacy9Keetc8) (@okurz)
297 172 okurz
* **2023-01-27:** *skipped due to SUSE event at Nuremberg office*
298 174 okurz
* **2023-02-03:** *skipped due to SUSE HackWeek, see https://hackweek.opensuse.org/*
299 180 okurz
* **2023-02-10:** *DONE* proposal by pdostal: gh, glab - the CLI tools for GitLab and GitHub (@pdostal)
300
* **2023-02-17:** *DONE* [work estimation process within the SUSE QE Tools team](https://youtu.be/otjvQni2WPU) (@tinita)
301 182 livdywan
* **2023-02-24:** *DONE* [Overview of backends and general remarks followed by a dive into the general hardware backend to test on Raspberry Pi](https://youtu.be/r2_Ru3FSjaA) (@mkittler)
302 185 okurz
* **2023-03-03:** *DONE* [SUSE QE Tools roadmap - 2023-03](https://www.youtube.com/watch?v=ekbTe7dmfwI) (@okurz)
303
* **2023-03-10:** *skipped due to SUSE internal event*
304 194 okurz
* **2023-03-17:** *DONE* [Introduction to SELinux focused on QE and ALP by jsegitz](https://youtu.be/91pxilczl2I) (@okurz)
305 231 okurz
* **2023-03-24:** *DONE* [Experiences with the generalhw backend with laptops/desktops](https://www.youtube.com/watch?v=IvOG_lm3JII) (@marmarek)
306 201 okurz
* **2023-03-31:** *DONE* [Backend code walkthrough: qemu](https://youtu.be/aIwC50X-rwE) (@mkittler)
307 193 okurz
* **2023-04-07:** *skipped due to public holiday*
308 203 livdywan
* **2023-04-14:** *DONE* [Best practices for os-autoinst self-test development](https://youtu.be/sBXXRByuWZ8) (@mkittler)
309 206 okurz
* **2023-04-21:** *DONE* [Test Automation OBS->openQA with os-autoinst/openqa-trigger-from-obs](https://youtu.be/vgFcA9QNfEA) (@anikitin, @jlausuch)
310 212 okurz
* **2023-04-28:** *DONE* [Backend code walkthrough: ipmi](https://youtu.be/Je7q95F6H8w) (@okurz)
311 209 okurz
* **2023-05-05:** *DONE* [SUSE QE Tools roadmap - 2023-05](https://youtu.be/CzLM4Wd6PLU) (@okurz)
312 217 okurz
* **2023-05-12:** *DONE* [Backend code walkthrough: svirt](https://youtu.be/MYpeoe-3ZME) (@okurz)
313 206 okurz
* **2023-05-19:** *skipped due to public holiday bridge day*
314
* **2023-05-26:** *skipped due to openSUSE conference*
315 231 okurz
* **2023-06-02:** *skipped due to misplanning on side of okurz ;)*
316 1 okurz
* **2023-06-09:** *DONE* [os-autoinst/openqa-trigger-from-obs - real-life examples sharing, make it fail tests to understand what it is doing, fix problems you have](https://youtu.be/603UO-B6PGM) (@okurz, @anikitin)
317
* **2023-06-16:** *DONE* [More efficient video encoder used on o3 - how to work with videos #77842](https://youtu.be/K8Qhgl9MfAQ) (@okurz)
318 231 okurz
* **2023-06-23:** *DONE* [openQA test issue management best practices: How to file new test tickets properly assigned to appropriate teams, triage issues, etc. Very suitable for newcomers!](https://youtu.be/HU-ig3NHrTk) (@okurz)
319
* **2023-06-30:** *skipped due to SUSE internal event*
320 236 okurz
* **2023-07-07:** *DONE* [SUSE QE Tools roadmap - 2023-07](https://youtu.be/AgpzYY8pThY) (@okurz)
321 241 okurz
* **2023-07-14:** *DONE* [How to run openQA in 5 minutes - all-in-one container](https://youtu.be/PZASQ4BFcpo) (@okurz)
322 246 okurz
* **2023-07-21:** *DONE* open conversation
323 247 okurz
* **2023-07-28:** *DONE* [O3 migration state - https://progress.opensuse.org/issues/132143 - sharing what was done, what's still to be done, asking for what it is still not working](https://youtu.be/ZVenqVOrwd8) (@okurz, @nicksinger)
324 248 okurz
* **2023-08-04:** *DONE* [SUSE QE Tools roadmap - 2023-08](https://youtu.be/diirvCdhWjk) (@okurz)
325 255 okurz
* **2023-08-11:** *DONE* [Salt managed OSD infrastructure using github.com/os-autoinst/salt-states-openqa](https://youtu.be/t6cKiY-abzg) (@mkittler, @okurz)
326 254 okurz
* **2023-08-18:** **
327 260 livdywan
* **2023-08-25:** *DONE* Automatically identify product issues with openQA (#109920) (@tinita)
328 262 okurz
* **2023-09-01:** *DONE* [SUSE QE Tools roadmap - 2023-09](https://youtu.be/mk0e-nKSWMo) (@okurz)
329 265 livdywan
* **2023-09-08:** *DONE* [Making new-lines in script_run fatal](https://streaming.nue.suse.com/i/QE-Tools-Workshops/2023-09-08%2009-01-23.mkv) (#134723) (@livdywan)
330 270 okurz
* **2023-09-15:** *DONE* [openQA in containers - how to deploy, how to use, limitations and differences to package-based deployments, relation to ALP (idea from jstehlik)](https://youtu.be/rTRsIsgywrY) (@okurz)
331 277 okurz
* **2023-09-22:** *DONE* [openQA job stability and resiliency](https://youtu.be/HmOT7cywq2k?si=ACK8cUkVIesjgACA) slight relation to #110683 (internal) (@okurz)
332 281 okurz
* **2023-09-29:** *DONE* [General search functionality in openQA](https://youtu.be/ioKEnGteLhQ) (@livdywan)
333 292 okurz
* **2023-10-06:** *DONE* [SUSE QE Tools roadmap - 2023-10](https://youtu.be/Q3NSjSnzB5k) (@okurz)
334
* **2023-10-13:** *DONE* [Statistics - Basics, how to apply for sporadic issues and test instabilities, statistical investigation, mean+std, rule of three](https://youtu.be/lNA6a5oSOl8) (@okurz)
335 302 okurz
* **2023-10-20:** *DONE* [Refactoring session for qem-bot](https://youtu.be/e2Zy8vp91ts) (@okurz)
336 303 okurz
* **2023-10-27:** *DONE* [Work methods, processes, best practices in QE. What changed in 2 years and ideas for the future](https://youtu.be/1dxfVe8hUsM) (mgrifalconi: recap a presentation done a couple of years ago, about best practices, infra as code, ownership, structure etc. to see what changed so far, benefit we got, what we could still improve and of course what new problems we got with these changes.
337
From there, we can start various discussions more in depth in progress tickets, like bot/dashboard architecture, handling stuff in gitlab/github etc!) (@mgrifalconi, @okurz)
338 310 okurz
* **2023-11-03:** *DONE* [SUSE QE Tools roadmap - 2023-11](https://youtu.be/sqkI6QXlJPg) (@okurz)
339 311 okurz
* **2023-11-10:** *skipped due to SUSE HackWeek, see https://hackweek.opensuse.org/*
340
* **2023-11-17:** *Results from SUSE HackWeek lightning talk session - present your own topics, 5 mins top for each! Put your topic proposal on https://etherpad.opensuse.org/p/suse_qa_tools_workshop_lightning_talks*
341 232 okurz
* **2023-11-24:** **
342
* **2023-12-01:** *SUSE QE Tools roadmap - 2023-12* (@okurz)
343
* **2023-12-08:** **
344
* **2023-12-15:** **
345
* **2023-12-22:** **
346 233 okurz
* **2023-12-29:** **
347
348 83 okurz
---
349 1 okurz
350
* *periodic proposal by okurz: How to report tickets, investigate issues, etc. (#104805)*
351
* *general proposal: if there are no further topics make it an "open conversation", at least from time to time :)*
352 179 okurz
* *proposal by okurz: Generic agile project management trainings and tutorials*
353 311 okurz
* *proposal by okurz: container-release-bot from QE container squad, maybe ph0enix would like to present*
354 25 okurz
* feedback from yearly workshop review: run it every second week but maybe longer, more interactive, more technical sessions, about backends and more openQA internals, from jlausuch: maybe understanding how svirt backend boots VMs in s390x, VMWare, etc? Highlight the differences between how qemu backend spawns VMs and how others do
355
356
**Note:** Everybody should feel welcome to add topic proposals here or approach us with ideas or requests.
357 1 okurz
358
#### Announcements
359
360
- For every meeting, regular or one-off, desired attendees should be invited to make sure a slot is blocked in their calendar and reminders with the correct local time show up when it's time to join the meeting
361
  - Create a new event, for example in Thunderbird via the *Calendar* tab or `New > Event` via the menu.
362 17 livdywan
  - Select individual attendees via their respective email addresses, e.g. via *Invite attendees* in Thunderbird
363 1 okurz
  - Specify the time of the meeting
364
  - Set a schedule to repeat the event if applicable.
365
  - Add a location, e.g. https://meet.opensuse.org/suse_qa_tools
366
  - Don't worry if any of the details might change - you can update the invitation later and participants will be notified.
367 17 livdywan
  - Prefer new events if the time and date change
368 1 okurz
- See the respective meeting for regular actions such as communication via chat
369
370
### Team
371
372
The team consists of engineers from different teams, some only partially available:
373 245 livdywan
1. Liv Dywan (Scrum Master - Ensure that we build it fast) @livdywan / [@kalikiana](https://github.com/kalikiana)
374 169 okurz
1. Oliver Kurz (Product Owner - Ensure that we build the right thing) @okurz / [@okurz](https://github.com/okurz)
375 126 okurz
1. Nick Singer (only OPS) @nicksinger / [@nicksinger](https://github.com/nicksinger)
376
1. Tina Müller (Part time (35h)) @tinita / [@perlpunk](https://github.com/perlpunk)
377
1. Jan Baier (part time, QEM-dedicated work areas) @jbaier_cz / [@baierjan](https://github.com/baierjan)
378 216 kraih
1. Sebastian Riedel (mostly working on other projects currently, only bug fixing and feature development) @kraih / [@kraih](https://github.com/kraih)
379 1 okurz
1. Dominik Heidler @dheidler / [@dheidler](https://github.com/asdil12)
380 313 livdywan
1. Marius Kittler @mkittler / [@Martchus](https://github.com/Martchus)
381 268 okurz
1. ~~Ondřej Súkup @osukup / [@mimi1vx](https://github.com/mimi1vx)~~ (on Pack Team Sep 18 - Nov 18)
382 43 okurz
383 1 okurz
### Onboarding for new joiners
384
385 178 okurz
* Make sure you have followed the general SUSE onboarding on https://geekos.io/onboarding and the LSG QE onboarding on https://wiki.suse.net/index.php/SUSE/Quality_Assurance/QA_Employee_Handbook
386 52 okurz
* For mentors: https://plan.io/blog/hire-remote-developers/
387 1 okurz
* Request to get added to the [tools team on GitHub](https://github.com/orgs/os-autoinst/teams/tools-team) and subscribe to notifications for projects within that organization
388
* Subscribe to notifications of the [Mojo-IOLoop-ReadWriteProcess project on GitHub](https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess) as it is also closely related to openQA development
389 251 okurz
* Login at [monitor.qa.suse.de](https://monitor.qa.suse.de/alerting/list) with NIS/LDAP credentials and ask to be given the *admin* role
390 1 okurz
* Watch this wiki page (click "Watch" button on top of this page)
391 9 okurz
* Subscribe to [o3-admins@suse.de](http://mailman.suse.de/mailman/listinfo/o3-admins), [osd-admins@suse.de](http://mailman.suse.de/mailman/listinfo/osd-admins), [openqa@suse.de](http://mailman.suse.de/mailman/listinfo/openqa) and [opensuse-factory@opensuse.org](https://lists.opensuse.org/archives/list/factory@lists.opensuse.org)
392 95 okurz
* Learn about https://gitlab.suse.de/openqa/password/ to have access to administer services, mailing lists, etc.
393 198 okurz
* Join #suse-qe-tools:opensuse.org (matrix) and our private team channel [team-qa-tools on Slack](https://suse.slack.com/archives/C02AJ1E568M)
394 89 livdywan
* Request to join [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) and check that you have `Request created`, `New comment for request created`, `New comment for package created` enabled for `Maintainer of the target` in your [OBS notification settings](https://build.opensuse.org/my/subscriptions) (staging bot writes reminder comments on open reviews)
395 1 okurz
* Add [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) to your watchlist
396 221 livdywan
* Connect to `#opensuse-factory` on *libera.chat*, see [[Tools#Common-tasks-for-team-members|Common tasks/ Monitoring]]
397 1 okurz
* Request admin access on [osd](http://openqa.suse.de/) and [o3](http://openqa.opensuse.org/)
398 131 tinita
* Request to get added to the [QA project in Progress](https://progress.opensuse.org/projects/qa/settings/members) and *enable notifications for the "QA" project* in [your account settings](https://progress.opensuse.org/my/account)
399 1 okurz
* Request to get added to the [openqa team in GitLab](https://gitlab.suse.de/groups/openqa/-/group_members)
400
* Add your ssh key to https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/sshd/users.sls with a merge request
401 204 okurz
* Add your ssh key to https://gitlab.suse.de/qa-maintenance/qamops/-/blob/master/ansible/books/vars/main.yml with a merge request
402
* Add your ssh key to https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/common/groups/qa-tools.yaml with a merge request
403 220 livdywan
* Ask an existing admin, e.g. other members of the team, to add your username and ssh key to o3, see [[openqav3:wiki#SSH-configuration|SSH configuration]]
404 221 livdywan
* Ensure you are subscribed to all projects referenced in [[Tools#Common-tasks-for-team-members|Common tasks for team members]]
405 1 okurz
* ~~Ensure you have access to https://gitlab.suse.de/OPS-Service/monitoring (create EngInfra ticket otherwise) and add yourself in https://gitlab.suse.de/OPS-Service/monitoring/-/tree/master/icinga/shared/contacts to receive monitoring information~~ EngInfra does not grant access to additional people currently. That might change again in the future.
406 187 okurz
* *Watch* https://gitlab.suse.de/qa-sle/qa-jump-configs/
407 1 okurz
* Ask for access to the vacations calendar (on demand, via invitation)
408
* *Watch* [qa-tools-backlog-assistant](https://github.com/os-autoinst/qa-tools-backlog-assistant) and choose *All Activity*
409 184 okurz
* ~~Ensure you can access thruk.suse.de via NIS/LDAP credentials (replaced by zabbix.suse.de)~~
410 219 livdywan
* Create [a service request on sd.suse.com](https://sd.suse.com/servicedesk/customer/portal/1/create/1) to be added to the Jira SD group **OSD Admins**, the Slack group **@qa-tools** as well as the zabbix.suse.de group **Owners/O3**.
411 205 mkittler
* For access to the VM host `openqa-service.qe.suse.de` create a MR *like* https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3411
412 259 livdywan
* If you need access to [Netbox](https://netbox.suse.de): 1) log in using the **opensuse** SSO provider with *IDP credentials*, 2) ask an existing admin (@okurz, @nicksinger, @mgriessmeier, @hreinecke) to unlock the account (padlocks will literally be removed from the UI; for the admins: on https://netbox.suse.de/admin ask the user to log in once so that they appear in the list "Authentication and Authorization" -> "Users". There you need to add them individually to the group "netbox-users")
413 1 okurz
414 7 okurz
### Offboarding
415 6 okurz
416
When someone leaves the team the following steps should be taken:
417
418 35 okurz
* Conduct a team-internal exit-interview (Learn about what was good, what can be improved, what to learn)
419 6 okurz
* Remove from https://github.com/orgs/os-autoinst/teams/tools-team . Optionally keep the person as a contributor with additional privileges on individual projects
420
* Remove from team calendars
421
422 1 okurz
### Alert handling
423
424
#### Best practices
425
426
* "if it hurts, do it more often": https://www.martinfowler.com/bliki/FrequencyReducesDifficulty.html
427
* Reduce [Mean-time-to-Detect (MTTD)](https://searchitoperations.techtarget.com/definition/mean-time-to-detect-MTTD) and [Mean-time-to-Recovery](https://raygun.com/blog/what-is-mttr/)
428
429
#### Process
430
431
* React to any alert or report of an outage
432
* If users report outages of components of our infrastructure
433
  * Consider forming a task force and work together
434
  * Inform the affected users about the impact, mitigation/workarounds and ETA for resolution
435 199 mkittler
* For each failing alert, e.g. Grafana
436 142 okurz
 * Create a ticket for the issue (with a tag "alert"; create ticket unless the alert is trivial to resolve and needs no improvement; if an alert is unhandled for at least 4h then a ticket must be created; even create a ticket if alerts turn to "ok" to prevent these issues in the future and to improve the alert)
437 283 livdywan
 * Link the corresponding ... in the ticket
438
   * **Grafana panel** as reference in the alert email
439
   * Details of the failing job in case of an **Unreviewed issue** alert
440
   * Pipeline name and link in case of GitLab
441
 * Copy relevant metadata from the email, especially date and time, mentioned hostname(s) and the subject of the email
442 1 okurz
 * Respond to the notification email with a link to the ticket or forward the email to a corresponding mailing list, e.g. o3-admins@suse.de or osd-admins@suse.de (Caveat: gitlab@suse.de as sender seems to be able to receive emails and swallow them without any useful response or error message)
443
 * Optional: Inform in chat
444 283 livdywan
 * Optional: Add "annotation" in corresponding Grafana panel with a link to the corresponding ticket
445
 * Silence/pause the alert to mitigate urgency and reduce the priority of the ticket
446 298 okurz
   * For grafana just follow the "silence" button in alert emails or use https://monitor.qa.suse.de/alerting/silences
447 295 livdywan
   * In [Zabbix a problem can be suppressed](https://www.zabbix.com/documentation/current/en/manual/acknowledgment#updating-problems)
448 297 livdywan
   * When observing an *Unknown issue*, file a ticket and add it in a comment on the job
449
   * To address [openqa logwarn issues](https://github.com/os-autoinst/openqa-logwarn), add the message to the list of known messages (and potentially look into changing the message or log level later)
450 298 okurz
   * See [[Tools#Munin|Munin]]
451
   * See [[Tools#Gitlab-Pipeline-Notifications|gitlab pipeline notifications]]
452 1 okurz
* If you consider an alert non-actionable then change it accordingly
453
* If you do not know how to handle an alert ask the team for help
454
* We must always strive for an accepted hypothesis when we want to change alerts or call an issue resolved
455
* After resolving the issue add an explanation to the ticket, unpause the alert, verify that it goes back to "ok" and resolve the ticket
456
457
#### References
458 283 livdywan
459
* https://nl.devoteam.com/en/blog-post/monitoring-reduce-mean-time-recovery-mttr/
460 1 okurz
461 289 livdywan
#### Grafana
462
463
##### Pausing alerts
464
465
* [Silence the alert in Grafana](https://stats.openqa-monitor.qa.suse.de/alerting/silences)
466
* It is most useful to match by the `rule_uid` label or by the `alertname` label, e.g. `alertname=~openqa-piworker:.*` or `rule_uid=~host_up_alert_openqaworker-arm-\d+`. Note that the regex matching requires you to use `.*` at the start or end as `^` and `$` are implied.
467
* Fill in the comment field, e.g. with a ticket URL.
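
The implied anchoring of silence matchers can be illustrated outside of Grafana. The following is only a sketch that emulates full-match semantics with `grep -Ex`; it does not talk to Grafana, the helper name `silence_matches` and the alert names are hypothetical, and because POSIX extended regular expressions do not know `\d` the digit class is written as `[0-9]+`.

```shell
#!/bin/sh
# Hypothetical helper: emulate a fully anchored regex matcher with grep.
# -E selects extended regular expressions, -x requires the whole line to match,
# which mirrors the implied ^ and $ described above. -q suppresses output.
silence_matches() {
    # $1: regex of the matcher, $2: label value of the alert
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Matches because the trailing .* covers the rest of the alert name.
silence_matches 'openqa-piworker:.*' 'openqa-piworker: Example alert' && echo "silenced"
# Does not match because the regex only covers a prefix of the alert name.
silence_matches 'openqa-piworker' 'openqa-piworker: Example alert' || echo "not silenced"
# Matches any numbered arm worker.
silence_matches 'host_up_alert_openqaworker-arm-[0-9]+' 'host_up_alert_openqaworker-arm-1' && echo "silenced"
```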
468
469 1 okurz
#### Gitlab Pipeline Notifications
470
471
Currently, the following projects are configured to write an email to osd-admins@suse.de if a pipeline fails:
472 279 okurz
* [openqa/auto-review](https://gitlab.suse.de/openqa/auto-review/-/settings/integrations/pipelines_email/edit) id: 4877
473 1 okurz
* [openqa/grafana-webhook-actions](https://gitlab.suse.de/openqa/grafana-webhook-actions/-/settings/integrations/pipelines_email/edit) id: 4652
474 279 okurz
* [openqa/monitor-o3](https://gitlab.suse.de/openqa/monitor-o3/-/settings/integrations/pipelines_email/edit) id: 5544
475
* [openqa/openqa-review](https://gitlab.suse.de/openqa/openqa-review/-/settings/integrations/pipelines_email/edit) id: 4884
476
* [openqa/osd-deployment](https://gitlab.suse.de/openqa/osd-deployment/-/settings/integrations/pipelines_email/edit) id: 3731
477
* [openqa/salt-states-openqa](https://gitlab.suse.de/openqa/salt-states-openqa/-/settings/integrations/pipelines_email/edit) id: 743
478
* [openqa/salt-pillars-openqa](https://gitlab.suse.de/openqa/salt-pillars-openqa/-/settings/integrations/pipelines_email/edit) id: 746
479
* [qa-maintenance/bot-ng](https://gitlab.suse.de/qa-maintenance/bot-ng/-/settings/integrations/pipelines_email/edit) id: 6096
480
* [qa-maintenance/openQABot](https://gitlab.suse.de/qa-maintenance/openQABot/-/settings/integrations/pipelines_email/edit) id: 3530
481 1 okurz
482 289 livdywan
###### Note:
483 1 okurz
- The configuration can be found by going to **Settings** > **Integrations** > **Pipeline Status Emails** (for any new projects the plugin will need to be enabled first)
484
- There's no way to subscribe as a user - instead an email address must be added
485 256 osukup
486 289 livdywan
###### API usage for handling email notification
487 256 osukup
488
- To disable all CI failure notifications run:
489
490
~~~
# Token with API read/write access, see the note on the token below
export GITLAB_TOKEN=OAUTH2_USER_TOKEN_FROM_GITLAB
# Project ids as listed above
for i in 6096 4877 5544 3731 743 746 4652 3530 4884; do
    curl -X DELETE --header "Authorization: Bearer ${GITLAB_TOKEN}" "https://gitlab.suse.de/api/v4/projects/${i}/integrations/pipelines-email"
done
~~~
496
497
- To enable all notifications run:
498
499
~~~
# Token with API read/write access, see the note on the token below
export GITLAB_TOKEN=OAUTH2_USER_TOKEN_FROM_GITLAB
# Project ids as listed above; only broken pipelines trigger an email
for i in 6096 4877 5544 3731 743 746 4652 3530 4884; do
    curl -X PUT --data 'recipients=osd-admins@suse.de&notify_only_broken_pipelines=true' --header "Authorization: Bearer ${GITLAB_TOKEN}" "https://gitlab.suse.de/api/v4/projects/${i}/integrations/pipelines-email"
done
~~~
505
506
* **OAUTH2_USER_TOKEN_FROM_GITLAB** must be a valid user-generated token with privileges to read and write the API, and the user must have the corresponding privileges in these repositories
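
To inspect the current state of a project before or after toggling notifications, the same endpoint can be queried with a plain `GET`. This is a hedged sketch, assuming the GitLab REST API of this instance answers `GET` requests on the `integrations/pipelines-email` endpoint; the helper names `integration_url` and `show_integration` and the `GITLAB_API` variable are hypothetical and not part of any existing tooling.

```shell
#!/bin/sh
# Hypothetical helpers for read-only inspection of the pipeline email integration.
GITLAB_API="${GITLAB_API:-https://gitlab.suse.de/api/v4}"

# Build the integration endpoint URL for a numeric project id.
integration_url() {
    printf '%s/projects/%s/integrations/pipelines-email\n' "$GITLAB_API" "$1"
}

# Print the integration settings as returned by the API.
# With --fail curl exits non-zero on any HTTP error response.
show_integration() {
    curl --silent --fail --header "Authorization: Bearer ${GITLAB_TOKEN}" "$(integration_url "$1")"
}

# Example (needs network access and a valid token in GITLAB_TOKEN):
# for i in 6096 4877 5544 3731 743 746 4652 3530 4884; do
#     show_integration "$i" || echo "project $i: no settings returned"
# done
```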
507 289 livdywan
508
#### Munin
509
510 290 livdywan
* To completely disable alert emails from munin: in `/etc/munin/munin.conf`, comment out the line `contact.o3admins.command`.
511
* For individual plugins it is necessary to read the plugin docs, e.g. in `/etc/munin/plugins/df` you can see how to adjust the values for warning and critical. You then put this in `/etc/munin/plugin-conf.d/munin-node` and then `systemctl restart munin-node`, e.g.
512
513
```
[df]
env.exclude none unknown rootfs iso9660 squashfs udf romfs ramfs debugfs cgroup_root devtmpfs
env.warning 92
env.critical 98
```
519 257 osukup
520 1 okurz
#### Weekly alert duty
521
522
We all should react to alerts, but additionally we can have one person on "alert duty" for one week each to ensure quicker reaction times when other team members are focussed on development work. For this the person on duty should do the following:
523
524 30 livdywan
* React quickly (e.g. within two hours) to any unhandled alerts
525
* Hand over to the next person after the weekly, going by the order of team members in the wiki
526 1 okurz
* Ask for a stand-in when unavailable
527
528 2 okurz
### Collaboration best practices
529
530
Sometimes there are pull requests that are based on other pull requests. Person X reviews PR 1 and Person Y reviews PR 2, but they share the same commit. As a result we have more work for all. As a best practice it is recommended to
531
532
* Include keywords in the PR subject line, e.g. "Part 2: … - based on #<previous_pr>". Example: https://github.com/os-autoinst/openQA/pull/4473
533
* Include the list of base pull request(s) in the PR description. Keep in mind that pull request links in github only seem to be properly rendered as preview links when included in a Markdown list, e.g.
534
535
```
Based on
* #1234
```
539
540
* Mark the dependent pull request as draft until the base pull request is approved or merged
541
542
See #105244 for the motivation for these best practices
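
The recommended description format can be generated with a small helper and, if desired, passed to the GitHub CLI. A minimal sketch under stated assumptions: the function name `based_on_section` is hypothetical, the pull request numbers are placeholders, and the commented `gh` call assumes that `gh` is installed and authenticated.

```shell
#!/bin/sh
# Hypothetical helper: print a PR description section that lists the base pull
# requests as a Markdown list so that GitHub can render them as preview links.
based_on_section() {
    printf 'Based on\n'
    for pr in "$@"; do
        printf '* #%s\n' "$pr"
    done
}

based_on_section 1234 1240

# Example use with the GitHub CLI, opening the dependent PR as draft:
# gh pr create --draft --title "Part 2: … - based on #1234" --body "$(based_on_section 1234)"
```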
543 210 okurz
544 211 okurz
#### SUSE-IT ticket handling
545 1 okurz
546 211 okurz
As we are relying on Eng-Infra a lot and need to coordinate our work we should follow a consistent process with best practices.
547
548
1. By default use [Incident](https://sd.suse.com/servicedesk/customer/portal/1/create/18) as it includes fields for "Impact" and "Urgency", avoid "Service Request". In some cases [Service Request with Approval](https://sd.suse.com/servicedesk/customer/portal/1/create/38) should be used, e.g. when trying to give access to some systems for new team members
549
2. Ensure there's a corresponding ticket for it in [openQA Infrastructure](https://progress.opensuse.org/projects/openqa-infrastructure/issues)
550
3. Use `Eng-Infra` under **Select a system**
551
4. Use `[openqa] …` in the subject if applicable
552
5. Use the below template for the **Description**
553 213 tinita
6. Select a sensible **Impact** and **Urgency** and make sure the severity and impact of the issue are explicitly mentioned in the EngInfra ticket, e.g. what business related workflows are impacted
554 211 okurz
7. Share the ticket with "OSD Admins" (or, after the ticket was created, use **Share** with `OSD Admins`; the icon with two figures, not a single gray avatar)
555
8. Use the tracker ticket for internal notes
556
9. React quickly to questions and ticket updates but also keep in mind limited capacity of EngInfra (as of 2022-02)
557
558
We have a ticket template to be used for SUSE SD Eng-Infra to improve our communication, to communicate impact, steps to reproduce, acceptance criteria. Use the following template and replace all instances of `<…>`:
559 210 okurz
560
```
h2. Observation

<To be replaced: Observation of the problem>

h2. Steps to reproduce

<To be replaced: Steps to reproduce>

h2. Expected result

<To be replaced: Expected result details>

h2. Impact

<To be replaced: What and who is impacted>

h2. Further details

Internal tracking issue: <To be replaced: ticket link on progress.opensuse.org>

Feel welcome to comment in the progress ticket which can be shared with more people by default and helps to communicate and we can edit texts and know who is assigned.
```
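
Filling in the template by hand is error-prone, so it can also be generated from a few arguments. The following is only a sketch: the function name `sd_ticket_description` is hypothetical, the argument values in the example call are placeholders, and the output still has to be pasted into the **Description** field manually.

```shell
#!/bin/sh
# Hypothetical helper: fill the Eng-Infra ticket template from five arguments.
# $1 observation, $2 steps to reproduce, $3 expected result,
# $4 impact, $5 link to the tracking ticket on progress.opensuse.org
sd_ticket_description() {
    cat <<EOF
h2. Observation

$1

h2. Steps to reproduce

$2

h2. Expected result

$3

h2. Impact

$4

h2. Further details

Internal tracking issue: $5

Feel welcome to comment in the progress ticket which can be shared with more people by default and helps to communicate and we can edit texts and know who is assigned.
EOF
}

# Example call with placeholder values
sd_ticket_description \
    "<observation>" \
    "<steps to reproduce>" \
    "<expected result>" \
    "<what and who is impacted>" \
    "<ticket link on progress.opensuse.org>"
```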
583 2 okurz
584 1 okurz
### Things to try
585
* Everybody can be "Product Owner" or "Scrum Master" or "Admin" or "Developer" for some time to get the different perspective
586
* From time to time ask stakeholders for their list of priorities regarding our tasks
587 10 okurz
* Select mob-programming tasks in unblock meetings for a deep dive in a dedicated meeting
588 1 okurz
### Literature references

* https://xahteiwi.eu/resources/presentations/no-we-wont-have-a-video-call-for-that/

### Historical

Previously the former QA tools team used the target versions "Ready" (to be planned into individual milestone periods or sprints), "Current Sprint" and "Done". However, the team never really used proper time-limited sprints, so the distinction was rather vague. Some time after tickets were set to "Resolved", the PO or someone else would also update the target version to "Done" to signal that the result had been reviewed. This caused a lot of ticket update noise for not much value, considering that the [Definition-of-Done](https://progress.opensuse.org/projects/openqav3/wiki/#ticket-workflow), when properly followed, already has rather strict requirements on when something can be considered really "Resolved". Hence the team eventually decided to not use the "Done" target version anymore. Since about 2019-05 (and since okurz is doing more backlog management) the team uses priorities more, as well as the status "Workable" together with an explicit team member list for "What the team is working on", to better visualize what is keeping team members busy regardless of what was "officially" planned to be part of the team's work. So we closed the "Done" target version. On 2020-07-03 okurz subsequently closed "Current Sprint" as well, because in most cases it was equivalent to just picking an assignee for a ticket or setting it to "In Progress". We can just distinguish between "(no version)" meaning untriaged, "Ready" meaning the tools team should consider picking up these issues, and "future" meaning that there is no plan for this to be picked up. Everything else is defined by status and priority.

On 2020-10-27 we discussed the history of the team together. We clarified that the team started out as a not well-defined "Dev+Ops" team. The "team responsibilities" have been mainly unchanged since at least the beginning of 2019. We agreed that learning from users and production about our "Dev" contributions is good, so this part of "Ops" is the responsibility of everyone.

Also see #73060 for more details about how the responsibilities were set up.

### Team-internal Hack Week (or Hackweek)

#### Rules of the game

- Regular meetings with the exception of the Weekly are cancelled
- Look into future tickets or other projects that relate to our usual work
- Backlog priorities are not enforced, short of emergency responses
- The challenge has to be solved in the previous week, from weekly to weekly

#### Extra-ordinary "hack-week" 2020-W51

SUSE QE Tools plans to have an internal "hack-week" under one condition: we close 30 tickets from our backlog within the time frame from 2020-12-03 until the start of the weekly meeting on 2020-12-11. No cheating! :) See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2020-12-03&v%5Bclosed_on%5D%5B%5D=2020-12-11&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=). During week 2020-W51 everyone is allowed to work on any hack-week project; it should just have a reasonable, "explainable" connection to our normal work. okurz volunteers to take over ops-duty for the week.

Result during meeting 2020-12-11: We missed the goal (by a slight amount) but we are motivated to try again in the next year :) Everybody, put some easy tickets aside for the next time!

#### Extra-ordinary "hack-week" 2021-W8

Similar to our attempt for 2020-W51, with the same rules except for the condition: we close 30 tickets from our backlog within the time frame from 2021-02-05 until the start of the weekly meeting on 2021-02-19. No cheating! See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2021-02-05&v%5Bclosed_on%5D%5B%5D=2021-02-19&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=).

Result during meeting 2021-02-19: We missed the goal (25/30 tickets resolved) but we are open to trying again, maybe after the next SUSE hack week.

#### Extra-ordinary "hack-week" 2022-W9

Same as before, with a similar condition: we close 30 tickets from our backlog within the time frame from 2022-02-18 until the start of the weekly meeting on 2022-02-25. No cheating! See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2022-02-18&v%5Bclosed_on%5D%5B%5D=2022-02-25&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=).
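
The three query links in the hack-week sections differ only in the two `closed_on` dates, so a link for a future attempt could be generated instead of edited by hand. The following is a minimal sketch: the function name `closed_tickets_query` is made up for illustration, all parameter names and values are copied from the links above, and the comments on what each filter means (status closed, target version equal to ID 418, closed date between two dates) are my reading of the URL rather than a statement from this page.

```python
from urllib.parse import urlencode

# Base URL and all filter values are copied from the query links above.
ISSUES_URL = "https://progress.opensuse.org/issues"


def closed_tickets_query(closed_from, closed_to, version_id="418"):
    """Build the issue query URL for tickets closed within a date range.

    closed_from and closed_to are ISO dates such as "2020-12-03".
    version_id defaults to the target version ID used in the links above.
    """
    params = [
        ("utf8", "✓"),
        ("set_filter", "1"),
        ("sort", "priority:desc,id:desc"),
        # Filter: status is closed
        ("f[]", "status_id"),
        ("op[status_id]", "c"),
        # Filter: target version equals the given ID
        ("f[]", "fixed_version_id"),
        ("op[fixed_version_id]", "="),
        ("v[fixed_version_id][]", version_id),
        # Filter: closed date lies between the two given dates
        ("f[]", "closed_on"),
        ("op[closed_on]", "><"),
        ("v[closed_on][]", closed_from),
        ("v[closed_on][]", closed_to),
        ("f[]", ""),
    ]
    # Columns shown in the result list, in the same order as in the links
    columns = ["subject", "project", "status", "assigned_to",
               "relations", "priority", "category", "cf_16"]
    params += [("c[]", column) for column in columns]
    params += [("group_by", "status"), ("t[]", "")]
    # A list of tuples keeps repeated keys such as "f[]" in order
    return f"{ISSUES_URL}?{urlencode(params)}"
```

Calling `closed_tickets_query("2022-02-18", "2022-02-25")` is intended to reproduce the 2022-W9 link; the number of tickets listed there is what gets compared against the goal of 30.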

## Change announcements

For new, cool features or disruptive changes, consider notifying our common user base as well as potential future users accordingly. For example: create a post on opensuse-factory@opensuse.org, link to the post on openqa@suse.de, invite to a workshop, or announce in #opensuse-factory (IRC) (irc://irc.libera.chat/opensuse-factory) or [#testing (Slack)](https://app.slack.com/client/T02863RC2AC/C02CANHLANP)