Tools » History » Version 42

livdywan, 2022-03-04 11:16
Add proper hackweek section with basic rules of the game

{{toc}}

# QE tools - Team description

"The easiest way to provide complete quality for your software"

We provide the most complete free-software system-level testing solution to ensure high quality of operating systems, complete software stacks and multi-machine services for software distribution builders, system integration engineers and release teams. We continuously develop, maintain and release our software to be readily used by anyone while we offer a friendly community to support you in your needs. We maintain the main public and SUSE internal openQA server as well as supporting tools in the surrounding ecosystem.

## Team responsibilities

* Develop and maintain upstream openQA including the backend os-autoinst
* Administration of openqa.suse.de and workers (But not physical hardware, as these belong to the departments that purchased them and we merely facilitate)
* Help administer and maintain openqa.opensuse.org, including coordination of efforts aimed at solving problems affecting o3
* Develop and maintain SUSE maintenance QA tools (SMELT, template generator, MTUI, openQA QAM bot, etc, e.g. from https://confluence.suse.com/display/maintenanceqa/Toolchain+for+maintenance+quality+engineering)
* Help with the investigation of specific issues, especially when they are likely related to generic or backend problems
* Support colleagues, team members and open source community

## Out of scope

* Maintenance and recurring review of individual tests
* Maintenance of physical hardware
* Maintenance of special worker addendums needed for tests, e.g. external hypervisor hosts for s390x, powerVM, xen, hyperv, IPMI, VMWare (Clarification: We maintain the code for all backends but we are no experts in specific domains. So we always try to help but it's a case by case decision based on what we realistically can provide based on our competence.)
* Ticket triaging of http://progress.opensuse.org/projects/openqatests/
* Setup of configuration for individual products to test, e.g. new job groups in openQA
* Feature development within the backend for single teams (commonly provided by teams themselves)

## Our common userbase
Known users of our products: Most SUSE QA engineers, SUSE SLE release managers and release engineers, every SLE developer submitting "submit requests" in OBS/IBS where product changes are tested as part of the "staging" process before changes are accepted in either SLE or openSUSE (staging tests must be green before packages are accepted), same for all openSUSE contributors submitting to either openSUSE:Factory (for Tumbleweed, SLE, future Leap versions) or Leap, other GNU/Linux distributions like Fedora https://openqa.fedoraproject.org/ , Debian https://openqa.debian.net/ , https://openqa.qubes-os.org/ , https://openqa.endlessm.com/ , the GNOME project https://openqa.gnome.org, https://www.codethink.co.uk/articles/2021/automated-linux-kernel-testing/, openSUSE KDE contributors (with their own workflows, https://openqa.opensuse.org/group_overview/23 ), openSUSE GNOME contributors (https://openqa.opensuse.org/group_overview/35 ), OBS developers (https://openqa.opensuse.org/parent_group_overview/7#grouped_by_build) , wicked developers (https://gitlab.suse.de/wicked-maintainers/wicked-ci#openqa), and of course our team itself for "openQA-in-openQA Tests" :) https://openqa.opensuse.org/group_overview/24
Keep in mind: "Users of openQA" and talking about "openSUSE release managers and engineers" means SUSE employees but also employees of other companies, also development partners of SUSE.
In summary, our products, for example openQA, are a critical part of many development processes, so outages and regressions are disruptive and costly. We therefore need to ensure high quality in production, which is why we practice DevOps with a slight tendency towards a conservative approach for introducing changes while still ensuring a high development velocity.

## How we work

The QE Tools team follows the DevOps approach and works using a lightweight Agile approach, also inspired by [Extreme Programming](https://extremeprogramming.org/) and [Kanban](https://en.wikipedia.org/wiki/Kanban_(development)) and of course the original http://agilemanifesto.org/. We plan and track our work using tickets on https://progress.opensuse.org . We pick tickets based on priority and planning decisions. We use weekly meetings as checkpoints for progress and also track cycle and lead times to crosscheck progress against expectations.

* [tools team - backlog](https://progress.opensuse.org/issues?query_id=230): The complete backlog of the team
* [tools team - backlog, high-level view](https://progress.opensuse.org/issues?query_id=526): A high-level view of the backlog, all epics and higher (an "epic" includes multiple stories)
* [tools team - backlog, top-level view](https://progress.opensuse.org/issues?query_id=524): A top-level view of the backlog, only sagas and higher (a "saga" is bigger than an epic and can include multiple epics, i.e.  "epic of epics")
* [tools team - what members of the team are working on](https://progress.opensuse.org/issues?query_id=400): To check progress and know what the team is currently occupied with
* [tools team - closed within last 60 days](https://progress.opensuse.org/issues?query_id=541): What was recently resolved

*Be aware:* Custom queries in the right-hand sidebar of individual projects, e.g. https://progress.opensuse.org/projects/openqav3/issues , show queries with the same name but are limited to the scope of the specific project, so they can show only a subset of all relevant tickets.

### What we expect from team members

* Actively show visible contributions to our products every workday *(pull requests, code review, ticket updates in descending priority, i.e. if you are very active in pull requests and code review, ticket updates are much less important)*
* Be responsive over usual communication platforms and channels *(user questions, team discussions)*
* Stick to our rules *(this wiki, SLOs, alert handling)*

### Common tasks for team members

This is a list of common tasks that we follow, e.g. reviewing daily based on individual steps in the DevOps Process ![DevOps Process](devops-process_25p.png)

* **Plan**:
 * State daily learning and planned tasks in internal chat room
 * Review the backlog for time-critical tickets, triage new tickets, pick tickets from the backlog; see https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog
 * Coordinate on the agile board https://progress.opensuse.org/agile/board?query_id=711
* **Code**:
 * See project specific contribution instructions
 * Provide peer-review following https://github.com/notifications based on projects within the scope of https://github.com/os-autoinst with the exception of test code repositories, especially https://github.com/os-autoinst/openQA, https://github.com/os-autoinst/os-autoinst, https://github.com/os-autoinst/scripts, https://github.com/os-autoinst/os-autoinst-distri-openQA, https://github.com/os-autoinst/openqa-trigger-from-obs, https://github.com/os-autoinst/openqa_review as well as other projects like https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess and https://gitlab.suse.de/qa-maintenance/openQABot
* **Build**:
 * See project specific contribution instructions
* **Test**:
 * Monitor failures on https://travis-ci.org/ relying on https://build.opensuse.org/package/show/devel:openQA/os-autoinst_dev for os-autoinst (email notifications)
 * Monitor failures on https://app.circleci.com/pipelines/github/os-autoinst/openQA?branch=master relying on https://build.opensuse.org/project/show/devel:openQA:ci for openQA (email notifications)
* **Release**:
 * By default we use the rolling-release model for all projects unless specified otherwise
 * Monitor [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) (all packages and all subprojects) for failures, ensure packages are published on http://download.opensuse.org/repositories/devel:/openQA/, ensure to be added as a Maintainer for that project (members need to be added individually, you can ask existing team members, e.g. the SM)
 * Monitor http://jenkins.qa.suse.de/view/openQA-in-openQA/ for the openQA-in-openQA Tests and automatic submissions of os-autoinst and openQA to openSUSE:Factory through https://build.opensuse.org/project/show/devel:openQA:tested
* **Deploy**:
 * o3 is automatically deployed (daily), see https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Automatic-update-of-o3
 * osd is automatically deployed (multiple times per week), monitor https://gitlab.suse.de/openqa/osd-deployment/pipelines and watch for notification email to openqa@suse.de
* **Operate**:
 * Apply infrastructure changes from https://gitlab.suse.de/openqa/salt-states-openqa (osd) or manually over ssh (o3)
 * Monitor for backup, see https://gitlab.suse.de/qa-sle/backup-server-salt
config changes in salt (osd), backups, job group configuration changes
 * Ensure old unused/non-matching needles are cleaned up (osd+o3), see #73387
 * Maintain https://gitlab.suse.de/qa-maintenance/qamops and https://confluence.suse.com/display/maintenanceqa/qam.suse.de
* **Monitor**:
 * React on alerts from [stats.openqa-monitor.qa.suse.de](https://stats.openqa-monitor.qa.suse.de/alerting/list?state=not_ok) (emails on [osd-admins@suse.de](http://mailman.suse.de/mailman/listinfo/osd-admins) and login via LDAP credentials, you must be an *editor* to edit panels and hooks via the web UI)
 * Look for incomplete jobs or scheduled jobs not being worked on, on o3 and osd (API or webUI) - see also #81058 for *power*
 * React on alerts from https://gitlab.suse.de/openqa/auto-review/, https://gitlab.suse.de/openqa/openqa-review/, https://gitlab.suse.de/openqa/monitor-o3 (subscribe to projects for notifications)
 * Be responsive on #opensuse-factory (irc://irc.libera.chat/opensuse-factory, formerly irc://chat.freenode.net/opensuse-factory) for help, support and collaboration (Unless you have a better solution it is suggested to use [Element.io](https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat) for a sustainable presence; you also need a [registered IRC account](https://libera.chat/guides/registration), formerly [freenode](https://freenode.net/kb/answer/registration)) **note** *don't use matrix features on irc!*
 * Be responsive on [#qa-tools in Rocket.Chat](https://chat.suse.de/channel/qa-tools) for internal coordination and alarm handling, fallback to #suse-qe-tools:opensuse.org (matrix) as backup if other channels are temporarily down, alternatively public channels on matrix/ IRC if the topics are not confidential
 * Be responsive on [#testing](https://chat.suse.de/channel/testing) for help, support and collaboration
 * Be responsive on mailing lists opensuse-factory@opensuse.org and openqa@suse.de (see https://en.opensuse.org/openSUSE:Mailing_lists_subscription)
 * Be responsive in https://matrix.to/#/#openqa:opensuse.org or the bridged room [#openqa](https://discord.com/channels/366985425371398146/817367056956653621) on https://discord.gg/opensuse if you have a discord account

### How we work on our backlog

* "due dates" are only used as exceptions or reminders
* every team member can pick up tickets themselves
* everybody can set priority, PO can help to resolve conflicts
* consider the [ready, not assigned/blocked/low](https://progress.opensuse.org/issues?query_id=490) query as preferred. It is suggested to pick up tickets based on priority. "Workable" tickets are often convenient and hence preferred.
* ask questions in tickets, even potentially "stupid" questions, oftentimes descriptions are unclear and should be improved
* There are "low-level infrastructure tasks" only conducted by some team members, the "DevOps" aspect does not include that but focusses on the joint development and operation of our main products
* Consider tickets with the subject keyword or tag "learning" as good learning opportunities for people new to a certain area. Experts in the specific area should prefer helping others but not work on the ticket
* For tickets which are out of the scope of the team remove from backlog, delegate to corresponding teams or persons but be nice and supportive, e.g. [SUSE-IT](https://sd.suse.com/), [EngInfra](https://sd.suse.com/servicedesk/customer/portal/1) also see [SLA](https://confluence.suse.com/display/qasle/Service+Level+Agreements), [test maintainer](https://progress.opensuse.org/projects/openqatests/), QE-LSG PrjMgr/mgmt
* For [EngInfra tickets](https://sd.suse.com/servicedesk/customer/portal/1)
 * Ensure there's a ticket for it in [openQA Infrastructure](https://progress.opensuse.org/projects/openqa-infrastructure/issues)
 * Use `EngInfra` under **Select a system**
 * Use `[openqa] …` in the subject
 * Reference the progress ticket
 * Share the ticket with "OSD Admins" (or, after the ticket was created, use **Share** with `OSD Admins`: the icon with two figures, not a single gray avatar)
 * Use the tracker ticket for internal notes
 * Make sure the severity and impact of the issue are explicitly mentioned in the EngInfra ticket, e.g. what business related workflows are impacted
 * Keep in mind limited capacity of EngInfra (as of 2022-02)
* Whenever we apply changes to the infrastructure we should have a ticket
* Refactoring and general improvements are conducted while we work on features or regression fixes
* For every regression or bigger issue that we encounter try to come up with at least two improvements, e.g. the actual issue is fixed and similar cases are prevented in the future with better tests and optionally also monitoring is improved
* For critical issues and very big problems especially when we were informed by users about outages collect "lessons learned", e.g. in notes in the ticket or a meeting with minutes in the ticket, consider https://en.wikipedia.org/wiki/Five_whys and answer at least the following questions: "User impact, outwards-facing communication and mitigation, upstream improvement ideas, Why did the issue appear, can we reduce our detection time, can we prevent similar issues in the future, what can we improve technically, what can we improve in our processes". Also see https://youtu.be/_Dv4M39Arec
* okurz proposes to use "#NoEstimates". Though that topic is controversial and often misunderstood. https://ronjeffries.com/xprog/articles/the-noestimates-movement/ describes it nicely :) Hence tickets should be evenly sized and no estimation numbers should be provided on tickets
* If you really want you can look at the [burndown chart](https://progress.opensuse.org/agile/charts?utf8=%E2%9C%93&set_filter=1&f%5B%5D=chart_period&op%5Bchart_period%5D=%3E%3Ct-&v%5Bchart_period%5D%5B%5D=90&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=&chart=burndown_chart&chart_unit=issues&interval_size=day) (some people wish to have this) but we consider it unnecessary due to the continuous development, not a project with defined end. Also an [agile board](https://progress.opensuse.org/agile/board?utf8=%E2%9C%93&set_filter=1&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=status_id&op%5Bstatus_id%5D=%3D&f_status%5B%5D=1&f_status%5B%5D=12&f_status%5B%5D=2&f_status%5B%5D=15&f_status%5B%5D=4&c%5B%5D=tracker&c%5B%5D=assigned_to&c%5B%5D=cf_16) is available but likely due to problems within the redmine installation ordering cards is not reliable.
* For critical changes write to qa-team@suse.de as well as to the chat channels
* Everyone should propose reverts of features if we find problems that cannot be immediately fixed or worked around in production

#### Definition of DONE

Also see https://web.archive.org/web/20110308065330/http://www.allaboutagile.com/definition-of-done-10-point-checklist/ and https://web.archive.org/web/20170214020537/https://www.scrumalliance.org/community/articles/2008/september/what-is-definition-of-done-(dod)

* Code changes are made available via a pull request on a version control repository, e.g. GitHub for openQA
* [Guidelines for git commits](http://chris.beams.io/posts/git-commit/) have been followed
* Code has been reviewed (e.g. in the github PR)
* Depending on criticality/complexity/size/feature: A local verification test has been run, e.g. post link to a local openQA machine or screenshot or logfile (especially also for hardware-related changes, e.g. in os-autoinst backend)
* For regressions: A regression fix is provided, flaws in the design, monitoring, process have been considered
* Potentially impacted package builds have been considered, e.g. openSUSE Tumbleweed and Leap, Fedora, etc.
* Code has been merged (either by reviewer or "mergify" bot or reviewee after 'LGTM' from others)
* Code has been deployed to osd and o3 (monitor automatic deployment, apply necessary config or infrastructure changes)

#### Definition of READY for new features

The following points should be considered before a new feature ticket is READY to be implemented:

* Follow the ticket template from https://progress.opensuse.org/projects/openqav3/wiki/#Feature-requests
* A clear motivation or user expressing a wish is available
* Acceptance criteria are stated (see ticket template) or use `[timeboxed:<nr>h]` with `<nr>` hours for tasks that should be limited in time, e.g. a research task with `[timeboxed:20h] research …`
* add tasks as a hint where to start
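The `[timeboxed:<nr>h]` convention lends itself to simple automated checks. The following is a minimal sketch, assuming the tag appears verbatim in the ticket subject; the helper name `timebox_hours` and the second sample subject are hypothetical and not part of any existing tooling:

```python
import re
from typing import Optional

# Matches the subject tag convention "[timeboxed:<nr>h]", e.g. "[timeboxed:20h]".
TIMEBOX_RE = re.compile(r"\[timeboxed:(\d+)h\]")


def timebox_hours(subject: str) -> Optional[int]:
    """Return the timebox in hours from a ticket subject, or None if untagged."""
    match = TIMEBOX_RE.search(subject)
    return int(match.group(1)) if match else None


print(timebox_hours("[timeboxed:20h] research …"))  # 20
print(timebox_hours("Improve needle cleanup"))  # None (hypothetical subject)
```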

#### WIP-limits (reference "Kanban development")

* global limit of 10 tickets and a limit of 3 tickets per person [In Progress](https://progress.opensuse.org/issues?query_id=505)
* limit of 20 tickets per person in [Feedback](https://progress.opensuse.org/issues?query_id=520)
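The limits above could be checked mechanically. This is only a sketch: it assumes tickets are available as `(assignee, status)` pairs, which is an illustrative shape and not the Redmine data model, and the function name `wip_violations` is hypothetical:

```python
from collections import Counter

GLOBAL_IN_PROGRESS_LIMIT = 10
PER_PERSON_IN_PROGRESS_LIMIT = 3
PER_PERSON_FEEDBACK_LIMIT = 20


def wip_violations(tickets):
    """Return human-readable WIP-limit violations for (assignee, status) pairs."""
    tickets = list(tickets)  # allow generators, we iterate twice
    in_progress = Counter(a for a, s in tickets if s == "In Progress")
    feedback = Counter(a for a, s in tickets if s == "Feedback")
    violations = []
    total = sum(in_progress.values())
    if total > GLOBAL_IN_PROGRESS_LIMIT:
        violations.append(f"global: {total} tickets In Progress")
    for person, count in sorted(in_progress.items()):
        if count > PER_PERSON_IN_PROGRESS_LIMIT:
            violations.append(f"{person}: {count} tickets In Progress")
    for person, count in sorted(feedback.items()):
        if count > PER_PERSON_FEEDBACK_LIMIT:
            violations.append(f"{person}: {count} tickets in Feedback")
    return violations


# Made-up sample data: alice exceeds the per-person limit of 3.
sample = [("alice", "In Progress")] * 4 + [("bob", "Feedback")] * 2
print(wip_violations(sample))  # ['alice: 4 tickets In Progress']
```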

#### Target numbers or "guideline", "should be", in priorities
1. *New, untriaged QA (openQA, etc.):* [0 (daily)](https://progress.opensuse.org/projects/qa/issues?query_id=576) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams
1. *Untriaged "tools" tagged:* [0 (daily)](https://progress.opensuse.org/issues?query_id=481) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams
1. *Workable (properly defined):* [10-40](https://progress.opensuse.org/issues?query_id=478) . Enough tickets to reflect a proper plan but not too many to limit unfinished data (see "waste")
1. *Overall backlog length:* [ideally less than 100](https://progress.opensuse.org/issues?query_id=230) . Similar as for "Workable". Enough tickets to reflect a proper roadmap as well as give enough flexibility for all unfinished work but limited to a feasible number that the team can still oversee without losing overview. One more reason for a maximum of 100 is that pagination in the Redmine UI allows showing only up to 100 issues on one page at a time, same for Redmine API access.
1. *Within due-date:* [0 (daily/weekly)](https://progress.opensuse.org/issues?query_id=514) . We should take due-dates seriously, finish tickets fast and at the very least update tickets with an explanation why the due-date could not be held and update it to a reasonable time in the future based on usual cycle time expectations
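As an illustration, the target numbers can be expressed as ranges and compared against current query counts. This is a sketch under assumptions: the key names are hypothetical labels for the queries listed above, the sample counts are made up, and "ideally less than 100" is encoded as a maximum of 99:

```python
# (minimum, maximum) ticket counts per query; key names are hypothetical.
TARGETS = {
    "untriaged_qa": (0, 0),
    "untriaged_tools": (0, 0),
    "workable": (10, 40),
    "backlog": (0, 99),  # "ideally less than 100"
    "due_date": (0, 0),
}


def off_target(counts):
    """Return the queries whose current count is outside the target range."""
    result = {}
    for name, count in counts.items():
        low, high = TARGETS[name]
        if not low <= count <= high:
            result[name] = count
    return result


# Made-up sample counts for illustration only.
counts = {"untriaged_qa": 2, "untriaged_tools": 0, "workable": 25,
          "backlog": 120, "due_date": 0}
print(off_target(counts))  # {'untriaged_qa': 2, 'backlog': 120}
```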

#### SLOs (service level objectives)

* for picking up tickets based on priority, first goal is "urgency removal":
 * **immediate**: [<1 day](https://progress.opensuse.org/issues?query_id=542)
 * **urgent**: [<1 week](https://progress.opensuse.org/issues?query_id=543)
 * **high**: [<1 month](https://progress.opensuse.org/issues?query_id=544)
 * **normal**: [<1 year](https://progress.opensuse.org/issues?query_id=545)
 * **low**: undefined
* aim for cycle time of individual tickets (not epics or sagas): 1h-2w
* reference for SLOs and related topics: https://sre.google/sre-book/table-of-contents/
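The pick-up SLOs map naturally onto a lookup table. The sketch below assumes a month is approximated as 30 days and a year as 365 days, and that a ticket's age since it received its priority is already known; the function name `exceeds_slo` is hypothetical:

```python
from datetime import timedelta

# Pick-up SLO per priority; "low" has no defined objective.
# 30-day months and 365-day years are simplifying assumptions.
PICKUP_SLO = {
    "immediate": timedelta(days=1),
    "urgent": timedelta(weeks=1),
    "high": timedelta(days=30),
    "normal": timedelta(days=365),
    "low": None,
}


def exceeds_slo(priority, age):
    """True if a ticket of the given priority has waited at least its SLO."""
    limit = PICKUP_SLO[priority]
    return limit is not None and age >= limit


print(exceeds_slo("urgent", timedelta(days=8)))  # True
print(exceeds_slo("immediate", timedelta(hours=3)))  # False
print(exceeds_slo("low", timedelta(days=1000)))  # False
```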

#### Backlog prioritization

When we prioritize tickets we assess:

1. What the main use cases of openQA are among all users, be it SUSE QA engineers, other SUSE employees, openSUSE contributors as well as any other outside user of openQA
2. We try to understand how many persons and products are affected by feature requests as well as regressions (or "concrete bugs" as the ticket category is called within the openQA Project) and prioritize issues affecting more persons and products and use cases over limited issues
3. We prioritize regressions higher than work on (new) feature requests
4. If a workaround or alternative exists then this lowers priority. We prioritize tasks that need deep understanding of the architecture and an efficient low-level implementation over convenience additions that other contributors are more likely to be able to implement themselves.
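One possible way to picture these criteria is as a sort key. This is purely illustrative: the strict ordering (regressions first, then tickets without a workaround, then wider impact) is an assumption of this sketch, real prioritization remains a judgment call, and the `Ticket` fields and sample subjects are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    subject: str
    is_regression: bool
    affected: int  # rough number of affected persons/products (assumed metric)
    has_workaround: bool


def priority_key(ticket):
    """Sort key: regressions first, no workaround next, wider impact first."""
    return (not ticket.is_regression, ticket.has_workaround, -ticket.affected)


# Made-up tickets for illustration only.
tickets = [
    Ticket("new dashboard widget", False, 50, False),
    Ticket("typo in log message", True, 5, True),
    Ticket("jobs incomplete on all workers", True, 200, False),
]
for ticket in sorted(tickets, key=priority_key):
    print(ticket.subject)
```

With the sample data the regression affecting all workers comes first, followed by the minor regression with a workaround, and finally the feature request.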

#### Periodic backlog grooming

These queries can be used as help to organize our work efficiently

* [QE tools team - backlog - sorted by update time](https://progress.opensuse.org/issues?query_id=654) ensure all tickets are reasonably up-to-date and don't keep hanging around
* [QE tools team - due date forecast](https://progress.opensuse.org/issues?query_id=651) prevent running into due-dates proactively

### Team meetings
* **Daily:** Use (internal) chat actively, e.g. formulate your findings or achievements and plans for the day, "think out loud" while working on individual problems. Optionally join [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools) every Monday, Tuesday and Thursday 1030-1045 CET/CEST
  * *Goal*: Emergency responses, clarify next steps or blockers on current work items, asking and answering questions on tickets that would be ignored otherwise, ticket estimations (after the regular daily) (compare to [Daily Scrum](https://www.scrumguides.org/scrum-guide.html#events-daily))
* **Ticket estimations:** Every Thursday 1110-1210 CET/CEST in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools) Estimate [t-shirt sizes for our non-estimated tickets](https://progress.opensuse.org/issues?query_id=717).
  * *Goal*: Ensure tickets are workable. Refine and split tickets for larger estimates.
* **Midweekly unblock:** Every Wednesday 1110-1210 CET/CEST in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools).
  * *Goal*: Discuss tasks in progress in more detail, unblock people.
* **Mob session:** Every Thursday 1330-1630 CET/CEST in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools). Optional, can be skipped if there are no suitable tasks pending
  * *Goal*: Through [mobbing](https://en.wikipedia.org/wiki/Mob_programming) follow-up on tasks which are stuck, too frightening, too difficult to solve alone.
* **Weekly coordination:** Every Friday 1110-1140(-1210) CET/CEST in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools). Community members and guests are particularly welcome to join this meeting.
  * *Goal*: Demo of features, Team backlog coordination and design decisions of bigger topics (compare to [Sprint Planning](https://www.scrumguides.org/scrum-guide.html#events-planning)).
  * *Conduction*: Demo recently finished feature work depending on [last closed](https://progress.opensuse.org/issues?query_id=572), crosscheck status of team, discuss blocked tasks and upcoming work
* **Fortnightly Retrospective:** Friday 1140-1210 CET/CEST every odd week, same room as the weekly meeting. On these days the weekly has a hard time limit of 1110-1140.
  * *Goal*: Inspect and adapt, learn and improve (compare to [Sprint Retrospective](https://www.scrumguides.org/scrum-guide.html#events-retro))
  * *Announcements*: Create a new *discussion* with all team members in Rocket Chat and a new [retrospected game](http://retrospected.core.qa.suse.de:8080) - the new board is made available after the previous retro. Specific actions will be recorded as tickets.
* **Virtual coffee:** Weekly every Monday 1100-1120 CET/CEST, same room as the weekly.
  * *Goal*: Connect and bond as a team, understand each other (compare to [Informal Communication in an all-remote environment](https://about.gitlab.com/company/culture/all-remote/informal-communication))
* **extension on-demand:** Optional meeting on invitation in the suggested time slot Thursday 1000-1200 CET/CEST, in the same room as the weekly, on-demand or replacing the *Virtual coffee talk*.
  * *Goal*: Introduce, research and discuss bigger topics, e.g. backlog overview, processes and workflows
* **Workshop:** Friday 0900-0950 CET/CEST every week in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools) especially for community members and users! We will run this every week with the plan to move to a fortnightly cadence every even week.
  * *Goal*: Demonstrate new and important features, explain already existing, but less well-known features, and discuss questions from the user community. All your questions are welcome!
  * *Announcements*: Drop a reminder with a teaser in [#testing](https://chat.suse.de/channel/testing).
  * *Recordings*: Consider recording, e.g. using OBS, then upload to YouTube and link the recording in the topics list. SUSE internal topics can be published on http://streaming.nue.suse.com/i/QE-Tools-Workshops/ by ssh-uploading to ftp@streaming.nue.suse.com:~/i/QE-Tools-Workshops/ (get your SSH key added by existing team members, e.g. okurz)

**NOTICE**: We are currently using meet.opensuse.org (m.o.o). As a fallback in case of problems use [fallback](https://meet.jit.si/suse_qa_tools)
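The fortnightly retrospective cadence listed above can be checked programmatically. This sketch assumes that "every odd week" refers to odd ISO calendar week numbers, which the schedule itself does not state explicitly; the function name `is_retro_day` is hypothetical:

```python
from datetime import date


def is_retro_day(day: date) -> bool:
    """True on Fridays in odd ISO weeks (assumed meaning of "every odd week")."""
    _iso_year, iso_week, iso_weekday = day.isocalendar()
    return iso_weekday == 5 and iso_week % 2 == 1


print(is_retro_day(date(2022, 3, 4)))  # True: Friday, ISO week 9
print(is_retro_day(date(2022, 3, 11)))  # False: Friday, ISO week 10
```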

#### Weekly moderation duty

We see mandatory daily video calls as an effective measure but we don't want to force the team to do this unless we have to. To ensure that we have daily updates next to the [[Tools#Weekly-alert-duty|Alert duty]] we have the rotating role of "moderation duty". The person doing alert duty in the next week has "moderation duty". The duty consists of ensuring [[Tools#How-we-work-on-tickets|How we work on tickets]], in particular:

* On a daily basis ensure that we have an update from every team member that is expected to be present this day. If a person actively contributes to the daily meeting in video call or provided an update related to backlog tasks in chat then this is already ensured.
* Hand over to the next person during the weekly, going by the order of team members in the wiki
* Ask for a stand-in when unavailable

We expect that this of course is an additional task with the corresponding time investment. The expected time invested per day is in the range of 3-15m, not more, so accounting for 15m-1h15m during duty week. Even in the worst case of a 30h part time worker investing said 15m every day that accounts for less than 5% of weekly work time so no significant impact on contributions is expected.
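The time estimate can be reproduced with a few lines of arithmetic. The sketch assumes five workdays per duty week, which the estimate implies but does not state:

```python
WORKDAYS_PER_WEEK = 5  # assumption: duty applies on five workdays


def weekly_minutes(minutes_per_day):
    """Total minutes spent on moderation duty in one duty week."""
    return minutes_per_day * WORKDAYS_PER_WEEK


low, high = weekly_minutes(3), weekly_minutes(15)
print(f"{low}m-{high // 60}h{high % 60}m per duty week")  # 15m-1h15m per duty week

# Worst case: 15 minutes every day for a 30h part-time worker.
share = high / (30 * 60)
print(f"{share:.1%} of weekly work time")  # 4.2% of weekly work time
```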

#### Best practices for meetings
* Meetings concerning the whole team are moderated by the scrum master by default, who should join the call early and verify that the meeting itself and any tools used are working or e.g. advise the use of the fallback option.
* We would prefer UTC for meeting times to be globally fair but as many other SUSE meetings are bound to European time we need to stick to that as well.
* It is recommended to use the Jitsi Audio-feedback feature, blue/green circles depending on microphone volume. Everybody should ensure that at least "two green balls" show up
* Hand signals over video can be used, e.g. "waving/circling hands": "I am lost, please bring me into discussion again"; "T-Sign": "I need a break"; "Raised hand": "I would like to speak"
* Discuss topics relevant for all within the common meetings, continue discussions pro-actively over asynchronous communication, e.g. tickets, as well as conduct topic centered follow-up meetings with only relevant attendees
* Reminders in Slack correct for summer/winter time automatically but if you make changes on them the time might be shifted by one hour e.g. if you scheduled a reminder on 10:30 am CEST, it will become 9:30 CET after the switch
* Use https://etherpad.opensuse.org/p/suse_qe_tools for collaborative editing and put the content back into tickets or wikis
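The one-hour shift described in the reminder item above can be reproduced with fixed UTC offsets. This is only a sketch: it assumes the reminder ends up stored as a fixed instant, which is a guess at the cause and not documented chat behavior, and the date is arbitrary:

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2), "CEST")  # summer time, UTC+2
CET = timezone(timedelta(hours=1), "CET")  # winter time, UTC+1

# A reminder scheduled for 10:30 during summer time ...
reminder = datetime(2022, 6, 1, 10, 30, tzinfo=CEST)
print(reminder.astimezone(timezone.utc).strftime("%H:%M UTC"))  # 08:30 UTC

# ... shows up one hour earlier on the wall clock once CET applies.
print(reminder.astimezone(CET).strftime("%H:%M %Z"))  # 09:30 CET
```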

#### Workshop Topics

* *SUSE QE Tools roadmap*: Recent achievements, mid-term plan and future outlook. Every first Friday of every month (Idea based on discussion between okurz and vpelcak 2021-02-09)
* (find older workshop topics and recordings on our [[ToolsWorkshopArchive|SUSE QE Tools Workshop Archive]])
* **2022-01-07:** *skipped due to holiday*
* **2022-01-14:** *open conversation* (@cdywan)
* **2022-01-21:** *DONE* [One year of SUSE QE Tools Workshop! Let's celebrate the success and have a good plan for the future](https://youtu.be/bzcEF5VAM7w) (@okurz)
* **2022-01-28:** *DONE* [auto-review with force-result](https://youtu.be/xFXdZiGrnTU) (@tinita)
* **2022-02-04:** *DONE* [SUSE QE Tools roadmap - 2022-02](https://youtu.be/_wi35z9arAs) (@okurz)
* **2022-02-11:** *DONE* [openQA feature: Retry of jobs based on test variables (#104007)](https://youtu.be/dQ9VoMaZFR0) (@okurz)
* **2022-02-18:** *DONE* [Scaling up: openQA result archiving and more (#64746)](https://youtu.be/6CQyKg8fxjE) (@mkittler)
* **2022-02-25:** lightning talk session - present your own topics, 5 mins top for each! (your favorite, least-favorite openQA feature, an unsolved problem you want to present) put your topic proposal on https://etherpad.opensuse.org/p/suse_qa_tools_workshop_lightning_talks
* **2022-03-04:** *DONE* [SUSE QE Tools roadmap - 2022-03](https://youtu.be/iBlQ1qO1Nqo) (@okurz)
* **2022-03-11:** *os-autoinst backends: How to check availability of ssh and other services (@okurz)*
* **2022-03-18:** *How to report tickets, investigate issues, etc. (#104805) (@okurz)*
* *proposal by okurz: The "TODO"-functionality on openQA (#93246) (@okurz)*
* *proposal by okurz: How we review openQA test results, by SUSE QE teams: Who volunteers from each team to present? Propose a speaker and a date! Do small lightning talks!*
* *proposal by okurz: openQA test review best practices and recent related feature development*
* *proposal by okurz: needling best practices (#106969) (@okurz)*
* *periodic proposal by okurz: How to report tickets, investigate issues, etc. (#104805)*
* *general proposal: if there are no further topics make it an "open conversation", at least from time to time :)*
* feedback from yearly workshop review: run it every second week but maybe longer, more interactive, more technical sessions, about backends and more openQA internals, from jlausuch: maybe understanding how svirt backend boots VMs in s390x, VMWare, etc? Highlight the differences between how qemu backend spawns VMs and how others do

**Note:** Everybody should feel welcome to add topic proposals here or approach us with ideas or requests.

#### Announcements

- For every meeting, regular or one-off, desired attendants should be invited to make sure a slot is blocked in their calendar and reminders with the correct local time will show up when it's time to join the meeting
255
  - Create a new event, for example in Thunderbird via the *Calendar* tab or `New > Event` via the menu.
256 17 livdywan
  - Select individual attendants via their respective email addresses .g. *Invite attendees* in Thunderbird
257 1 okurz
  - Specify the time of the meeting
258
  - Set a schedule to repeat the event if applicable.
259
  - Add a location, e.g. https://meet.opensuse.org/suse_qa_tools
260
  - Don't worry if any of the details might change - you can update the invitation later and participants will be notified.
261 17 livdywan
  - Prefer new events if the time and date change
262 1 okurz
- See the respective meeting for regular actions such as communication via chat

### Team

The team consists of engineers from different teams, some only partially available:
* Cris Dywan (Scrum Master) @cdywan / [@kalikiana](https://github.com/kalikiana)
* Oliver Kurz (Product Owner)
* Marius Kittler
* Sebastian Riedel (only bug fixing and feature development) @kraih / [@kraih](https://github.com/kraih)
* Nick Singer (only OPS)
* Tina Müller (part time (35h)) @tinita / [@perlpunk](https://github.com/perlpunk)
* Jan Baier (part time, QEM-dedicated work areas)
* Ondřej Súkup (dedicated work areas) @osukup / [@mimi1vx](https://github.com/mimi1vx)
* Moritz Kodytek @kodymo / [@FruitFly638](https://github.com/FruitFly638)

### Onboarding for new joiners

* Request to get added to the [tools team on GitHub](https://github.com/orgs/os-autoinst/teams/tools-team) and subscribe to notifications for projects within that organization
* Subscribe to notifications of the [Mojo-IOLoop-ReadWriteProcess project on GitHub](https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess) as it is also closely related to openQA development
* Log in at [stats.openqa-monitor.qa.suse.de](https://stats.openqa-monitor.qa.suse.de/alerting/list) with NIS/LDAP credentials and ask to be given the *admin* role
* Watch this wiki page (click the "Watch" button on top of this page)
* Subscribe to [o3-admins@suse.de](http://mailman.suse.de/mailman/listinfo/o3-admins), [osd-admins@suse.de](http://mailman.suse.de/mailman/listinfo/osd-admins), [openqa@suse.de](http://mailman.suse.de/mailman/listinfo/openqa) and [opensuse-factory@opensuse.org](https://lists.opensuse.org/archives/list/factory@lists.opensuse.org)
* Join #suse-qe-tools:opensuse.org (Matrix) and [team-qa-tools on Slack](https://suse.slack.com/archives/C02AJ1E568M)
* Request to join [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) and check that you have `Request created`, `New comment for request created`, `New comment for project created`, `New comment for package created` enabled for `Maintainer of the target` in your [OBS notification settings](https://build.opensuse.org/my/subscriptions) (the staging bot writes reminder comments on open reviews)
* Add [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) to your watchlist
* Connect to `#opensuse-factory` on *libera.chat*, see "Common tasks for team members - Monitor" above
* Request admin access on [osd](http://openqa.suse.de/) and [o3](http://openqa.opensuse.org/)
* Request to get added to the [QA project in Progress](https://progress.opensuse.org/projects/qa/settings/members) and *enable notifications for the openQA project* in [your account settings](https://progress.opensuse.org/my/account)
* Request to get added to the [openqa team in GitLab](https://gitlab.suse.de/groups/openqa/-/group_members)
* Add your ssh key to https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/sshd/users.sls with a merge request
* Add your ssh key to https://gitlab.suse.de/qa-maintenance/qamops/-/blob/master/ansible/books/vars/main.yml with a merge request
* Ask an existing admin, e.g. other members of the team, to add your username and ssh key to o3, see https://progress.opensuse.org/projects/openqav3/wiki/#SSH-configuration
* Ensure you are subscribed to all projects referenced in https://progress.opensuse.org/projects/qa/wiki#Common-tasks-for-team-members
* ~~Ensure you have access to https://gitlab.suse.de/OPS-Service/monitoring (create EngInfra ticket otherwise) and add yourself in https://gitlab.suse.de/OPS-Service/monitoring/-/tree/master/icinga/shared/contacts to receive monitoring information~~ EngInfra does not grant access to additional people currently. That might change again in the future.
* Ask for access to the vacations calendar (on demand, via invitation)
* *Watch* [qa-tools-backlog-assistant](https://github.com/os-autoinst/qa-tools-backlog-assistant) and choose *All Activity*
* Ensure you can access thruk.suse.de via NIS/LDAP credentials
* Create a ticket on sd.suse.com to be added to the Jira SD group "OSD Admins"

### Offboarding

When someone leaves the team the following steps should be taken:

* Conduct a team-internal exit interview (learn about what was good, what can be improved, what to learn)
* Remove from https://github.com/orgs/os-autoinst/teams/tools-team . Optionally keep the person as a contributor with additional privileges on individual projects
* Remove from team calendars

### Alert handling

#### Best practices

* "if it hurts, do it more often": https://www.martinfowler.com/bliki/FrequencyReducesDifficulty.html
* Reduce [Mean-time-to-Detect (MTTD)](https://searchitoperations.techtarget.com/definition/mean-time-to-detect-MTTD) and [Mean-time-to-Recovery (MTTR)](https://raygun.com/blog/what-is-mttr/)

#### Process

* React to any alert or report of an outage
* If users report outages of components of our infrastructure
  * Consider forming a task force and working together
  * Inform the affected users about the impact, mitigation/workarounds and ETA for resolution
* For each failing alert, e.g. from grafana
  * Create a ticket for the issue (with the tag "alert"; create a ticket unless the alert is trivial to resolve and needs no improvement; if an alert is unhandled for at least 4h then a ticket must be created; create a ticket even if the alert turns to "ok" again, to prevent such issues in the future and to improve the alert)
  * Link the corresponding grafana panel in the ticket
  * Respond to the notification email with a link to the ticket or forward the email to a corresponding mailing list, e.g. o3-admins@suse.de or osd-admins@suse.de (Caveat: gitlab@suse.de as sender seems to be able to receive emails and swallow them without any useful response or error message)
  * Optional: Inform in chat
  * Optional: Add an "annotation" in the corresponding grafana panel with a link to the corresponding ticket
  * Pause the alert if you think further alerting the team does not help (e.g. you can work on fixing the problem, or the alert is non-critical but the problem cannot be fixed within minutes)
* If you consider an alert non-actionable then change it accordingly
* If you do not know how to handle an alert ask the team for help
* We must always strive for an accepted hypothesis when we want to change alerts or call an issue resolved
* After resolving the issue add an explanation in the ticket, unpause the alert, verify that it goes to "ok" again and resolve the ticket
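
The ticket rule in the process above can be read as a small decision function. The following is only a sketch of that reading, not an existing tool: the `Alert` record and its field names are hypothetical, and only the 4h threshold and the "trivial and needs no improvement" exception come from the text.

```python
from dataclasses import dataclass
from datetime import timedelta

# Threshold from the process above: unhandled for at least 4h means a ticket is mandatory.
UNHANDLED_LIMIT = timedelta(hours=4)


@dataclass
class Alert:
    """Hypothetical record describing one failing alert."""
    name: str
    unhandled_for: timedelta
    trivial_to_resolve: bool
    needs_improvement: bool


def ticket_required(alert: Alert) -> bool:
    """Return True if the process asks for a ticket tagged "alert".

    Whether the alert has meanwhile turned back to "ok" is deliberately
    not an input: the process asks for a ticket in that case as well.
    """
    if alert.unhandled_for >= UNHANDLED_LIMIT:
        return True
    # The only exception: trivial to resolve AND no improvement needed.
    return not (alert.trivial_to_resolve and not alert.needs_improvement)
```

For example, `ticket_required(Alert("disk usage", timedelta(minutes=10), True, False))` evaluates to `False`, while the same alert left unhandled for five hours requires a ticket.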

#### References

* https://nl.devoteam.com/en/blog-post/monitoring-reduce-mean-time-recovery-mttr/

#### Gitlab Pipeline Notifications

Currently, the following projects are configured to write an email to osd-admins@suse.de if a pipeline fails:
* [openqa/auto-review](https://gitlab.suse.de/openqa/auto-review/-/services/pipelines_email/edit)
* [openqa/grafana-webhook-actions](https://gitlab.suse.de/openqa/grafana-webhook-actions/-/services/pipelines_email/edit)
* [openqa/monitor-o3](https://gitlab.suse.de/openqa/monitor-o3/-/services/pipelines_email/edit)
* [openqa/openqa-review](https://gitlab.suse.de/openqa/openqa-review/-/services/pipelines_email/edit)
* [openqa/osd-deployment](https://gitlab.suse.de/openqa/osd-deployment/-/services/pipelines_email/edit)
* [openqa/salt-states-openqa](https://gitlab.suse.de/openqa/salt-states-openqa/-/services/pipelines_email/edit)
* [openqa/salt-pillars-openqa](https://gitlab.suse.de/openqa/salt-pillars-openqa/-/services/pipelines_email/edit)
* [qa-maintenance/bot-ng](https://gitlab.suse.de/qa-maintenance/bot-ng/-/services/pipelines_email/edit)
* [qa-maintenance/openQABot](https://gitlab.suse.de/qa-maintenance/openQABot/-/services/pipelines_email/edit)

- The configuration can be found by going to **Settings** > **Integrations** > **Pipeline Status Emails** (for any new projects the plugin will need to be enabled first)
- There's no way to subscribe as a user - instead an email address must be added

#### Weekly alert duty

We should all react to alerts, but additionally we can have one person on "alert duty" for one week each to ensure quicker reaction times when other team members are focused on development work. For this, the person on duty should do the following:

* React quickly (e.g. within two hours) to any unhandled alerts
* Hand over to the next person after the weekly, going by the order of team members in the wiki
* Ask for a stand-in when unavailable
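
The hand-over rule amounts to a round-robin over the team list. A minimal sketch, assuming the order is kept as a plain list; the function name and the placeholder member names are made up for illustration:

```python
from typing import Sequence


def next_on_duty(team: Sequence[str], current: str) -> str:
    """Return the member following `current`, wrapping around at the end of the list."""
    index = team.index(current)  # raises ValueError if `current` is not in the team
    return team[(index + 1) % len(team)]


# Placeholder names; the real order is the one of the team list in the wiki.
team_order = ["member-a", "member-b", "member-c"]
```

With this order, `next_on_duty(team_order, "member-c")` wraps around to `"member-a"`. Stand-ins for unavailable members are not modelled here.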

### Collaboration best practices

Sometimes there are pull requests that are based on other pull requests. Person X reviews PR 1 and Person Y reviews PR 2, but they share the same commit. As a result we have more work for all. As a best practice it is recommended to:

* Include keywords in the PR subject line, e.g. "Part 2: … - based on #<previous_pr>". Example: https://github.com/os-autoinst/openQA/pull/4473
* Include the list of base pull request(s) in the PR description. Keep in mind that pull request links in github only seem to be properly rendered as preview links when included in a Markdown list, e.g.

```
Based on
* #1234
```

* Mark the dependent pull request as draft until the base pull request is approved or merged

See #105244 for the motivation for these best practices

### Things to try
* Everybody can be "Product Owner" or "Scrum Master" or "Admin" or "Developer" for some time to get a different perspective
* From time to time ask stakeholders for their list of priorities regarding our tasks
* Select mob-programming tasks in unblock meetings to deep-dive into in a dedicated meeting

### Literature references

* https://xahteiwi.eu/resources/presentations/no-we-wont-have-a-video-call-for-that/

### Historical

Previously the former QA tools team used the target versions "Ready" (to be planned into individual milestone periods or sprints), "Current Sprint" and "Done". However, the team never really used proper time-limited sprints, so the distinction was rather vague. Some time after tickets were "Resolved", the PO or someone else would also update the target version to "Done" to signal that the result had been reviewed. This caused a lot of ticket update noise for not much value, considering that the [Definition-of-Done](https://progress.opensuse.org/projects/openqav3/wiki/#ticket-workflow) when properly followed already has rather strict requirements on when something can be considered really "Resolved". Hence the team eventually decided to not use the "Done" target version anymore. Since about 2019-05 (and since okurz is doing more backlog management) the team uses priorities more as well as the status "Workable" together with an explicit team member list for "What the team is working on" to better visualize what is keeping team members busy regardless of what was "officially" planned to be part of the team's work. So we closed the target version. On 2020-07-03 okurz subsequently closed "Current Sprint" as this one was also in most cases equivalent to just picking an assignee for a ticket or setting it to "In Progress". We can just distinguish between "(no version)" meaning untriaged, "Ready" meaning the tools team should consider picking up these issues and "future" meaning that there is no plan for this to be picked up. Everything else is defined by status and priority.
On 2020-10-27 we discussed the history of the team together. We clarified that the team started out as a not well-defined "Dev+Ops" team. The "team responsibilities" have been mainly unchanged since at least the beginning of 2019. We agreed that learning from users and production about our "Dev" contributions is good, so this part of "Ops" is the responsibility of everyone.

Also see #73060 for more details about how the responsibilities were set up.

### Team-internal Hack Week (or Hackweek)

#### Rules of the game

- Regular meetings with the exception of the Weekly are cancelled
- Look into future tickets or other projects that relate to our usual work
- Backlog priorities are not enforced, short of emergency responses
- The challenge has to be solved during the previous week, from weekly to weekly

#### Extra-ordinary "hack-week" 2020-W51

SUSE QE Tools plans to have an internal "hack-week" under one condition: we close 30 tickets from our backlog within the time frame from 2020-12-03 until the start of the weekly meeting on 2020-12-11. No cheating! :) See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2020-12-03&v%5Bclosed_on%5D%5B%5D=2020-12-11&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=). During week 2020-W51 everyone is allowed to work on any hack-week project; it should just have a reasonable, "explainable" connection to our normal work. okurz volunteers to take over ops-duty for the week.

Result during meeting 2020-12-11: We missed the goal (by a slight amount) but we are motivated to try again in the next year :) Everybody, put some easy tickets aside for the next time!

#### Extra-ordinary "hack-week" 2021-W8

Similar to our attempt for 2020-W51 with the same rules, except for the condition: we close 30 tickets from our backlog within the time frame from 2021-02-05 until the start of the weekly meeting on 2021-02-19. No cheating! See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2021-02-05&v%5Bclosed_on%5D%5B%5D=2021-02-19&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=).

Result during meeting 2021-02-19: We missed the goal (25/30 tickets resolved) but again we are open to trying again, maybe after the next SUSE hack week.

#### Extra-ordinary "hack-week" 2022-W9

Same as before, with a similar condition: we close 30 tickets from our backlog within the time frame from 2022-02-18 until the start of the weekly meeting on 2022-02-25. No cheating! See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2022-02-18&v%5Bclosed_on%5D%5B%5D=2022-02-25&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=).
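
The three hack-week queries above differ only in the two `closed_on` dates. A minimal sketch of how such a link could be generated for a future attempt, assuming the filter parameters visible in the URLs above (closed status, target version id 418, `closed_on` between two dates); the function name is made up, and the sort and column parameters are left out for brevity:

```python
from urllib.parse import urlencode

BASE_URL = "https://progress.opensuse.org/issues"
TARGET_VERSION_ID = "418"  # value of fixed_version_id in the queries above


def closed_tickets_query(start: str, end: str) -> str:
    """Build an issue list URL for tickets closed between `start` and `end` (YYYY-MM-DD)."""
    params = [
        ("set_filter", "1"),
        ("f[]", "status_id"),
        ("op[status_id]", "c"),
        ("f[]", "fixed_version_id"),
        ("op[fixed_version_id]", "="),
        ("v[fixed_version_id][]", TARGET_VERSION_ID),
        ("f[]", "closed_on"),
        ("op[closed_on]", "><"),
        ("v[closed_on][]", start),
        ("v[closed_on][]", end),
        ("group_by", "status"),
    ]
    # urlencode percent-encodes brackets and operators, e.g. "f[]" becomes "f%5B%5D"
    return f"{BASE_URL}?{urlencode(params)}"


url = closed_tickets_query("2022-02-18", "2022-02-25")
```

A list of tuples is used instead of a dict because keys such as `f[]` and `v[closed_on][]` occur more than once.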

## Change announcements

For new, cool features or disruptive changes consider notifying our common user base as well as potential future users accordingly, for example: create a post on opensuse-factory@opensuse.org, link to the post on openqa@suse.de, invite to a workshop, or post in #opensuse-factory (IRC) (irc://irc.libera.chat/opensuse-factory) or [#testing (Slack)](https://app.slack.com/client/T02863RC2AC/C02CANHLANP)