Revision 186 (okurz, 2021-05-06 08:26) → Revision 187/424 (okurz, 2021-05-06 08:27)
{{toc}} # Test results overview * Latest report based on openQA test results http://s.qa.suse.de/test-status , SLE12: http://s.qa.suse.de/test-status-sle12 , SLE15: http://s.qa.suse.de/test-status-sle15 * only "blocker" or "shipstopper" bugs on "interesting products" for SLE: http://s.qa.suse.de/qa_sle_bugs_sle , SLE15: http://s.qa.suse.de/qa_sle_bugs_sle15_all, SLE12: http://s.qa/qa_sle_bugs_sle12_2 # QE tools - Team description "The easiest way to provide complete quality for your software" We provide the most complete free-software system-level testing solution to ensure high quality of operating systems, complete software stacks and multi-machine services for software distribution builders, system integration engineers and release teams. We continuously develop, maintain and release our software to be readily used by anyone while we offer a friendly community to support you in your needs. We maintain the main public and SUSE internal openQA server as well as supporting tools in the surrounding ecosystem. ## Team responsibilities * Develop and maintain upstream openQA * Administration of openqa.suse.de and workers (But not physical hardware, as these belong to the departments that purchased them and we merely facilitate) * Helps administrating and maintaining openqa.opensuse.org, including coordination of efforts aiming at solving problems affecting o3 * Develop and maintain internal maintenance QA tools (SMELT, template generator, MTUI, openQA QAM bot, etc, e.g. from https://confluence.suse.com/display/maintenanceqa/QAM+Toolchain) * Support colleagues, team members and open source community ## Out of scope * Maintenance of individual tests * Maintenance of physical hardware * Maintenance of special worker addendums needed for tests, e.g. 
external hypervisor hosts for s390x, powerVM * Ticket triaging of http://progress.opensuse.org/projects/openqatests/ * Feature development within the backend for single teams (commonly provided by teams themselves) ## Our common userbase Known users of our products: Most SUSE QA engineers, SUSE SLE release managers and release engineers, every SLE developer submitting "submit requests" in OBS/IBS where product changes are tested as part of the "staging" process before changes are accepted in either SLE or openSUSE (staging tests must be green before packages are accepted), same for all openSUSE contributors submitting to either openSUSE:Factory (for Tumbleweed, SLE, future Leap versions) or Leap, other GNU/Linux distributions like Fedora https://openqa.fedoraproject.org/ , Debian https://openqa.debian.net/ , https://openqa.qubes-os.org/ , https://openqa.endlessm.com/ , openSUSE KDE contributors (with their own workflows, https://openqa.opensuse.org/group_overview/23 ), openSUSE GNOME contributors (https://openqa.opensuse.org/group_overview/35 ), OBS developers (https://openqa.opensuse.org/parent_group_overview/7#grouped_by_build) , wicked developers (https://gitlab.suse.de/wicked-maintainers/wicked-ci#openqa), and of course our team itself for "openQA-in-openQA Tests" :) https://openqa.opensuse.org/group_overview/24 Keep in mind: "Users of openQA" and talking about "openSUSE release managers and engineers" means SUSE employees but also employees of other companies, also development partners of SUSE. In summary our products, for example openQA, are a critical part of many development processes hence outages and regressions are disruptive and costly. Hence we need to ensure a high quality in production hence we practice DevOps with a slight tendency to a conservative approach for introducing changes while still ensuring a high development velocity. 
## How we work The QE Tools team follows the DevOps approach, working with a lightweight Agile approach also inspired by [Extreme Programming](https://extremeprogramming.org/) and [Kanban](https://en.wikipedia.org/wiki/Kanban_(development)) and of course the original http://agilemanifesto.org/. We plan and track our work using tickets on https://progress.opensuse.org . We pick tickets based on priority and planning decisions. We use weekly meetings as checkpoints for progress and also track cycle and lead times to crosscheck progress against expectations. * [tools team - backlog](https://progress.opensuse.org/issues?query_id=230): The complete backlog of the team * [tools team - backlog, high-level view](https://progress.opensuse.org/issues?query_id=526): A high-level view of the backlog, all epics and higher (an "epic" includes multiple stories) * [tools team - backlog, top-level view](https://progress.opensuse.org/issues?query_id=524): A top-level view of the backlog, only sagas and higher (a "saga" is bigger than an epic and can include multiple epics, i.e. "epic of epics") * [tools team - what members of the team are working on](https://progress.opensuse.org/issues?query_id=400): To check progress and know what the team is currently occupied with * [tools team - closed within last 60 days](https://progress.opensuse.org/issues?query_id=541): What was recently resolved *Be aware:* Custom queries in the right-hand sidebar of individual projects, e.g. https://progress.opensuse.org/projects/openqav3/issues , show queries with the same name but are limited to the scope of the specific projects so they can show only a subset of all relevant tickets. 
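The saved queries above can also be read programmatically. The following is a minimal sketch, assuming the standard Redmine REST API (`/issues.json` with the `query_id`, `limit` and `offset` parameters) is reachable on progress.opensuse.org and that the query is visible to the caller. The helper names `page_url`, `iter_issues` and `lead_time_days` are hypothetical and not part of any team tooling. It pages through a query in chunks of 100 and derives a lead time per closed ticket from `created_on` and `closed_on`.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://progress.opensuse.org"
PAGE_SIZE = 100  # Redmine returns at most 100 issues per request
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in Redmine JSON


def page_url(query_id, offset, limit=PAGE_SIZE, base_url=BASE_URL):
    """Build the URL for one page of a saved query (hypothetical helper)."""
    params = urlencode({"query_id": query_id, "limit": limit, "offset": offset})
    return f"{base_url}/issues.json?{params}"


def fetch_json(url):
    """Download and decode one JSON document (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_issues(query_id, fetch=fetch_json):
    """Yield all issues of a saved query, following the pagination."""
    offset = 0
    while True:
        page = fetch(page_url(query_id, offset))
        issues = page.get("issues", [])
        yield from issues
        offset += len(issues)
        # Stop on an empty page or once all announced issues were seen.
        if not issues or offset >= page.get("total_count", 0):
            break


def lead_time_days(issue):
    """Days between creation and closing, or None for open tickets."""
    if not issue.get("closed_on"):
        return None
    created = datetime.strptime(issue["created_on"], TIME_FORMAT)
    closed = datetime.strptime(issue["closed_on"], TIME_FORMAT)
    return (closed - created).total_seconds() / 86400
```

For example, `for issue in iter_issues(541): print(issue["id"], lead_time_days(issue))` would list the recently closed tickets with their lead times. Cycle time would additionally need the moment a ticket entered "In Progress", which this sketch does not read.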
### What we expect from team members * Actively show visible contributions to our products every day *(pull requests, code review, ticket updates)* * Be responsive over usual communication platforms and channels *(user questions, team discussions)* * Stick to our rules *(this wiki, SLOs, alert handling)* ### Common tasks for team members This is a list of common tasks that we follow, e.g. reviewing daily based on individual steps in the DevOps Process ![DevOps Process](devops-process_25p.png) * **Plan**: * State daily learning and planned tasks in internal chat room * Review backlog for time-critical, triage new tickets, pick tickets from backlog; see https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog * **Code**: * See project specific contribution instructions * Provide peer-review following https://github.com/notifications based on projects within the scope of https://github.com/os-autoinst/ with the exception of test code repositories, especially https://github.com/os-autoinst/openQA, https://github.com/os-autoinst/os-autoinst, https://github.com/os-autoinst/scripts, https://github.com/os-autoinst/os-autoinst-distri-openQA, https://github.com/os-autoinst/openqa-trigger-from-obs, https://github.com/os-autoinst/openqa_review as well as other projects like https://gitlab.suse.de/qa-maintenance/openQABot/ * **Build**: * See project specific contribution instructions * **Test**: * Monitor failures on https://travis-ci.org/ relying on https://build.opensuse.org/package/show/devel:openQA/os-autoinst_dev for os-autoinst (email notifications) * Monitor failures on https://app.circleci.com/pipelines/github/os-autoinst/openQA?branch=master relying on https://build.opensuse.org/project/show/devel:openQA:ci for openQA (email notifications) * **Release**: * By default we use the rolling-release model for all projects unless specified otherwise * Monitor [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) (all packages and all 
subprojects) for failures, ensure packages are published on http://download.opensuse.org/repositories/devel:/openQA/ (members need to be added individually, you can ask existing team members, e.g. the SM) * Monitor http://jenkins.qa.suse.de/view/openQA-in-openQA/ for the openQA-in-openQA Tests and automatic submissions of os-autoinst and openQA to openSUSE:Factory through https://build.opensuse.org/project/show/devel:openQA:tested * **Deploy**: * o3 is automatically deployed (daily), see https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Automatic-update-of-o3 * osd is automatically deployed (weekly), monitor https://gitlab.suse.de/openqa/osd-deployment/pipelines and watch for notification email to openqa@suse.de * **Operate**: * Apply infrastructure changes from https://gitlab.suse.de/openqa/salt-states-openqa (osd) or manually over sshd (o3) * Monitor for backup, see https://gitlab.suse.de/qa-sle/backup-server-salt config changes in salt (osd), backups, job group configuration changes * Ensure old unused/non-matching needles are cleaned up (osd+o3), see #73387 * **Monitor**: * React on alerts from [stats.openqa-monitor.qa.suse.de](https://stats.openqa-monitor.qa.suse.de/alerting/list?state=not_ok) (emails on [osd-admins@suse.de](http://mailman.suse.de/mailman/listinfo/osd-admins) and login via LDAP credentials, you must be an *editor* to edit panels and hooks via the web UI) * Look for incomplete jobs or scheduled not being worked on o3 and osd (API or webUI) - see also #81058 for *power* * React on alerts from https://gitlab.suse.de/openqa/auto-review/, https://gitlab.suse.de/openqa/openqa-review/, https://gitlab.suse.de/openqa/monitor-o3 (subscribe to projects for notifications) * Be responsive on #opensuse-factory (irc://chat.freenode.net/opensuse-factory) for help, support and collaboration (Unless you have a better solution it is suggested to use [Element.io](https://app.element.io/#/room/%23freenode_%23opensuse-factory:matrix.org) for a sustainable 
presence; you also need a [registered IRC account](https://freenode.net/kb/answer/registration)) * Be responsive on [#qa-tools](https://chat.suse.de/channel/qa-tools) for internal coordination and alarm handling, fallback to #opensuse-factory (irc://chat.freenode.net/opensuse-factory) as backup if [#qa-tools](https://chat.suse.de/channel/qa-tools) is not available, e.g. if chat.suse.de is down * Be responsive on [#testing](https://chat.suse.de/channel/testing) for help, support and collaboration * Be responsive on mailing lists opensuse-factory@opensuse.org and openqa@suse.de (see https://en.opensuse.org/openSUSE:Mailing_lists_subscription) * Be responsive in https://matrix.to/#/#openqa:opensuse.org or the bridged room [#openqa](https://discord.com/channels/366985425371398146/817367056956653621) on https://discord.gg/opensuse if you have a discord account ### How we work on our backlog * "due dates" are only used as exception or reminders * every team member can pick up tickets themselves * everybody can set priority, PO can help to resolve conflicts * consider the [ready, not assigned/blocked/low](https://progress.opensuse.org/issues?query_id=490) query as preferred * ask questions in tickets, even potentially "stupid" questions, oftentimes descriptions are unclear and should be improved * There are "low-level infrastructure tasks" only conducted by some team members, the "DevOps" aspect does not include that but focusses on the joint development and operation of our main products * Consider tickets with the subject keyword or tag "learning" as good learning opportunities for people new to a certain area. Experts in the specific area should prefer helping others but not work on the ticket * For tickets which are out of the scope of the team remove from backlog, delegate to corresponding teams or persons but be nice and supportive, e.g. 
[SUSE-IT](https://sd.suse.com/), [EngInfra](https://infra.nue.suse.com/) also see [SLA](https://confluence.suse.com/display/qasle/Service+Level+Agreements), [test maintainer](https://progress.opensuse.org/projects/openqatests/), QE-LSG PrjMgr/mgmt * For [EngInfra](https://infra.nue.suse.com/) tickets first create tracker ticket in https://progress.opensuse.org/projects/openqa-infrastructure/issues/ , then create EngInfra ticket with "[openqa] …" in subject, optional "[openqa][urgent] …", reference progress ticket, CC osd-admins@suse.de . Use the tracker ticket for internal notes * Whenever we apply changes to the infrastructure we should have a ticket * Refactoring and general improvements are conducted while we work on features or regression fixes * For every regression or bigger issue that we encounter try to come up with at least two improvements, e.g. the actual issue is fixed and similar cases are prevented in the future with better tests and optionally also monitoring is improved * For critical issues and very big problems collect "lessons learned", e.g. in notes in the ticket or a meeting with minutes in the ticket, consider https://en.wikipedia.org/wiki/Five_whys and answer at least the following questions: "User impact, outwards-facing communication and mitigation, upstream improvement ideas, Why did the issue appear, can we reduce our detection time, can we prevent similar issues in the future, what can we improve technically, what can we improve in our processes" * okurz proposes to use "#NoEstimates". Though that topic is controversial and often misunderstood. https://ronjeffries.com/xprog/articles/the-noestimates-movement/ describes it nicely :) #### Definition of DONE Also see http://www.allaboutagile.com/definition-of-done-10-point-checklist/ and https://www.scrumalliance.org/community/articles/2008/september/what-is-definition-of-done-%28dod%29 * Code changes are made available via a pull request on a version control repository, e.g. 
github for openQA * [Guidelines for git commits](http://chris.beams.io/posts/git-commit/) have been followed * Code has been reviewed (e.g. in the github PR) * Depending on criticality/complexity/size/feature: A local verification test has been run, e.g. post link to a local openQA machine or screenshot or logfile * Potentially impacted package builds have been considered, e.g. openSUSE Tumbleweed and Leap, Fedora, etc. * Code has been merged (either by reviewer or "mergify" bot or reviewee after 'LGTM' from others) * Code has been deployed to osd and o3 (monitor automatic deployment, apply necessary config or infrastructure changes) #### Definition of READY for new features The following points should be considered before a new feature ticket is READY to be implemented: * Follow the ticket template from https://progress.opensuse.org/projects/openqav3/wiki/#Feature-requests * A clear motivation or user expressing a wish is available * Acceptance criteria are stated (see ticket template) * add tasks as a hint where to start #### WIP-limits (reference "Kanban development") * global limit of 10 tickets, and 3 tickets per person respectively [In Progress](https://progress.opensuse.org/issues?query_id=505) * limit of 20 tickets per person in [Feedback](https://progress.opensuse.org/issues?query_id=520) #### Target numbers or "guideline", "should be", in priorities 1. *New, untriaged QA (openQA, etc.):* [0 (daily)](https://progress.opensuse.org/projects/qa/issues?query_id=576) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams 1. *Untriaged "tools" tagged:* [0 (daily)](https://progress.opensuse.org/issues?query_id=481) . Every ticket should have a target version, e.g. "Ready" for QE tools team, "future" if unplanned, others for other teams 1. *Workable (properly defined):* [~40 (20-50)](https://progress.opensuse.org/issues?query_id=478) . 
Enough tickets to reflect a proper plan but not too many to limit unfinished data (see "waste") 1. *Overall backlog length:* [ideally less than 100](https://progress.opensuse.org/issues?query_id=230) . Similar to "Workable". Enough tickets to reflect a proper roadmap as well as give enough flexibility for all unfinished work but limited to a feasible number that can still be overseen by the team without losing overview. One more reason for a maximum of 100 is that pagination in the redmine UI allows showing only up to 100 issues on one page at a time, same for redmine API access. 1. *Within due-date:* [0 (daily/weekly)](https://progress.opensuse.org/issues?query_id=514) . We should take due-dates seriously, finish tickets fast and at the very least update tickets with an explanation why the due-date could not be held and update to a reasonable time in the future based on usual cycle time expectations #### SLOs (service level objectives) * for picking up tickets based on priority, first goal is "urgency removal": * **immediate**: [<1 day](https://progress.opensuse.org/issues?query_id=542) * **urgent**: [<1 week](https://progress.opensuse.org/issues?query_id=543) * **high**: [<1 month](https://progress.opensuse.org/issues?query_id=544) * **normal**: [<1 year](https://progress.opensuse.org/issues?query_id=545) * **low**: undefined * aim for cycle time of individual tickets (not epics or sagas): 1h-2w #### Backlog prioritization When we prioritize tickets we assess: 1. What the main use cases of openQA are among all users, be it SUSE QA engineers, other SUSE employees, openSUSE contributors as well as any other outside user of openQA 2. We try to understand how many persons and products are affected by feature requests as well as regressions (or "concrete bugs" as the ticket category is called within the openQA Project) and prioritize issues affecting more persons and products and use cases over limited issues 3. 
We prioritize regressions higher than work on (new) feature requests 4. If a workaround or alternative exists then this lowers priority. We prioritize tasks that need deep understanding of the architecture and an efficient low-level implementation over convenience additions that other contributors are more likely to be able to implement themselves. ### Team meetings * **Daily:** Use (internal) chat actively, e.g. formulate your findings or achievements and plans for the day, "think out loud" while working on individual problems. Optionally join the call every day 1030-1045 CET/CEST with optional extension for selected topics * *Goal*: Quick support on problems, feedback on plans, collaboration and self-reflection (compare to [Daily Scrum](https://www.scrumguides.org/scrum-guide.html#events-daily)) * **Weekly coordination:** Every Friday 1115-1145(-1215) CET/CEST in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools) ([fallback](https://meet.jit.si/suse_qa_tools)). Community members and guests are particularly welcome to join this meeting. * *Goal*: Demo of features, Team backlog coordination and design decisions of bigger topics (compare to [Sprint Planning](https://www.scrumguides.org/scrum-guide.html#events-planning)). * *Conduction*: Demo recently finished feature work depending on [last closed](https://progress.opensuse.org/issues?query_id=572), crosscheck status of team, discuss blocked tasks and upcoming work * **Fortnightly Retrospective:** Friday 1145-1215 CET/CEST every even week, same room as the weekly meeting. On these days the weekly has hard time limit of 1115-1145. * *Goal*: Inspect and adapt, learn and improve (compare to [Sprint Retrospective](https://www.scrumguides.org/scrum-guide.html#events-retro)) * *Announcements*: Create a new *discussion* with all team members in Rocket Chat and a new [retrospected game](retrospected.com) which can be filled in all week. Specific actions will be recorded as tickets. 
* **Virtual coffee talk:** Weekly every Thursday 1100-1120 CET/CEST, same room as the weekly. * *Goal*: Connect and bond as a team, understand each other (compare to [Informal Communication in an all-remote environment](https://about.gitlab.com/company/culture/all-remote/informal-communication)) * **extension on-demand:** Optional meeting on invitation in the suggested time slot Thursday 1000-1200 CET/CEST, in the same room as the weekly, on-demand or replacing the *Virtual coffee talk*. * *Goal*: Introduce, research and discuss bigger topics, e.g. backlog overview, processes and workflows * **Workshop:** Friday 0900-0950 CET/CEST every week in [m.o.o/suse_qa_tools](https://meet.opensuse.org/suse_qa_tools) especially for community members and users! We will run this every week with the plan to move to a fortnightly cadence every even week. * *Goal*: Demonstrate new and important features, explain already existing, but less well-known features, and discuss questions from the user community. All your questions are welcome! * *Announcements*: Drop a reminder with a teaser in [#testing](https://chat.suse.de/channel/testing). #### Best practices for meetings * Meetings concerning the whole team are moderated by the scrum master by default, who should join the call early and verify that the meeting itself and any tools used are working or e.g. advise the use of the fallback option. * We would prefer UTC for meeting times to be globally fair but as many other SUSE meetings are bound to European time we need to stick to that as well. * It is recommended to use the Jitsi Audio-feedback feature, blue/green circles depending on microphone volume. Everybody should ensure that at least "two green balls" show up * Hand signals over video can be used, e.g. 
"waving/circling hands": "I am lost, please bring me into discussion again"; "T-Sign": "I need a break"; "Raised hand": "I would like to speak" #### Workshop Topics * *SUSE QE Tools roadmap*: Recent achievements, mid-term plan and future outlook. Every first Friday every month (Idea based on discussion between okurz and vpelcak 2021-02-09) * **2021-01-15:** *DONE* [openqa-auto-review and openqa-investigate](https://youtu.be/_t3THhdiDag) * **2021-01-29:** *DONE* overview of development repositories on https://github.com/os-autoinst/ * **2021-02-05:** *DONE* [powerpc](https://youtu.be/q1CM2AH5aKY) (@nicksinger) * **2021-02-12:** *DONE* [job templates](https://youtu.be/YPuH0bcr524) (@tinita, @cdywan) * **2021-02-19:** *DONE* [SUSE QEM review workflow discussions](https://youtu.be/nCIAcvD7SA8) (@dzedro, @mgrifalconi) * **2021-02-26:** *DONE* open conversation * **2021-03-05:** *DONE* [SUSE QE Tools roadmap](https://youtu.be/vIqBIEMH0O0) (@okurz, @mkittler) * **2021-03-12:** *DONE* [openqa-mon](https://youtu.be/CNLihgMKt30) @ph03nix * **2021-03-19:** *DONE* [multi-machine tests](https://youtu.be/9j-NgNTzJ0w) (@okurz; topic proposal by zluo, initially brought up as: "high RAM and storage requirements") * **2021-03-26:** *skipped due to SUSE Hack Week* * **2021-04-02:** *public holiday* * **2021-04-09:** *DONE* [SUSE QE Tools roadmap - 2021-04](https://youtu.be/nfMilLcCosQ) (@okurz, @cdywan) * **2021-04-16:** *DONE* [openqa.opensuse.org infrastructure overview](https://youtu.be/G5bQKI2tURk) (see question in #88831#note-19 , @okurz) * **2021-04-23:** *DONE* [openQA tests written in Python](https://youtu.be/GjKZ51lnCh0) (@okurz, @cdywan) * **2021-04-30:** *DONE* [openqa-review: A review helper script for openQA with complete test overview reports](https://youtu.be/GjKZ51lnCh0) (@okurz) * **2021-05-07:** *SUSE QE Tools roadmap - 2021-05* * **2021-05-14:** [Review badges](https://open.qa/docs/#_review_badges) and recent changes related to them (@mkittler) * **:** *-* * intro 
to os-autoinst development (demo how to investigate and test a small fix) * How to get better feedback when we share new openQA features. * Workflow discussions: SUSE QE aggregate tests (Proposed by okurz: I would like to learn from others how these are included in the workflow) * proposal by ybonatakis: Explore integration of other tools, test frameworks, Integration * proposal by ybonatakis: QA best practices * proposal by okurz motivated by https://chat.suse.de/channel/testing?msg=EysbgG5kFrHbmjvcy Tumbleweed workflows focussed on openQA, e.g. impact of failing tests, to-test manager, etc. (okurz, dimstar?) #### Announcements - For every meeting, regular or one-off, desired attendants should be invited to make sure a slot blocked in their calendar and reminders with the correct local time will show up when it's time to join the meeting - Create a new event, for example in Thunderbird via the *Calendar* tab or `New > Event` via the menu. - Pick your audience, for example `qa-team@suse.de` will reach test developers and reviewers, or you can select individual attendants via their respective email addresses. - Add attendees accordingly. - Specify the time of the meeting - Set a schedule to repeat the event if applicable. - Add a location, e.g. https://meet.opensuse.org/suse_qa_tools - Don't worry if any of the details might change - you can update the invitation later and participants will be notified. 
- See the respective meeting for regular actions such as communication via chat ### Team The team is comprised of engineers from different teams, some only partially available: * Xiaojing Liu (Jane, [QA APAC 1](https://geekos.prv.suse.net/team/5b08104d7d795700204993df)) *github: Amrysliu* * Marius Kittler * Nick Singer * Sebastian Riedel (Part time contributions) * Oliver Kurz (Product Owner) * Tina Müller (Part time, [QEM3](https://geekos.prv.suse.net/team/5b7d24a17cf60423d2523485)) *tinita@Freenode, github: perlpunk* * Christian Dywan (Scrum Master, [QEM1](https://geekos.prv.suse.net/team/5b08104b7d795700204993d1)) *kalikiana@Freenode* * Ivan Lausuch ([QEM3](https://geekos.prv.suse.net/team/5b7d24a17cf60423d2523485)) * Ondřej Súkup (dedicated work areas) * Jan Baier (Part time) * Vasileios Anastasiadis (Bill) - Temporarily away to assist the QE-CORE team for April 2021 ### Onboarding for new joiners * Request to get added to the [tools team on GitHub](https://github.com/orgs/os-autoinst/teams/tools-team) * Login at [stats.openqa-monitor.qa.suse.de](https://stats.openqa-monitor.qa.suse.de/alerting/list) with LDAP credentials and ask to be given the *editor* role * [Watch](https://progress.opensuse.org/watchers/watch?object_id=347&object_type=wiki_page) this wiki page * Subscribe to [osd-admins@suse.de](http://mailman.suse.de/mailman/listinfo/osd-admins), [openqa@suse.de](http://mailman.suse.de/mailman/listinfo/openqa) and [opensuse-factory@opensuse.org](https://lists.opensuse.org/archives/list/factory@lists.opensuse.org) * Join [qa-tools on Rocket](https://chat.suse.de/channel/qa-tools) * Request to join [devel:openQA on OBS](https://build.opensuse.org/project/show/devel:openQA) * Ready an IRC bouncer for `#opensuse-factory` on *Freenode*, such as [Element.io](https://app.element.io/#/room/%23freenode_%23opensuse-factory:matrix.org) * Request admin access on [osd](http://openqa.suse.de/) and [o3](http://openqa.opensuse.org/) * Request to get added to the [QA 
project](https://progress.opensuse.org/projects/qa/settings/members) and *enable notifications for the openQA project* in [your account settings](https://progress.opensuse.org/my/account) * Add your ssh key to gitlab.suse.de/openqa/salt-pillars-openqa with a merge request * Ask an existing admin, e.g. other members of the team, to add your username and ssh key to o3 * Ensure you are subscribed to all projects referenced in https://progress.opensuse.org/projects/qa/wiki#Common-tasks-for-team-members ### Alert handling #### Best practices * "if it hurts, do it more often": https://www.martinfowler.com/bliki/FrequencyReducesDifficulty.html * Reduce [Mean-time-to-Detect (MTTD)](https://searchitoperations.techtarget.com/definition/mean-time-to-detect-MTTD) and [Mean-time-to-Recovery](https://raygun.com/blog/what-is-mttr/) #### Process * React to any alert * For each failing grafana alert * Create a ticket for the issue (with a tag "alert"; create a ticket unless the alert is trivial to resolve and needs no improvement; even create a ticket if alerts turn to "ok" to prevent these issues in the future and to improve the alert) * Link the corresponding grafana panel in the ticket * Respond to the notification email with a link to the ticket * Optional: Inform in chat * Optional: Add "annotation" in corresponding grafana panel with a link to the corresponding ticket * Pause the alert if you think further alerting the team does not help (e.g. 
you can work on fixing the problem, alert is non-critical but problem can not be fixed within minutes) * If you consider an alert non-actionable then change it accordingly * If you do not know how to handle an alert ask the team for help * After resolving the issue add explanation in ticket, unpause alert and verify it going to "ok" again, resolve ticket #### References * https://nl.devoteam.com/en/blog-post/monitoring-reduce-mean-time-recovery-mttr/ ### Extra-ordinary "hack-week" 2020-W51 SUSE QE Tools plans to have an internal "hack-week": Condition: We close 30 tickets from our backlog within the time frame 2020-12-03 until 2020-12-11 start of weekly meeting. No cheating! :) See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2020-12-03&v%5Bclosed_on%5D%5B%5D=2020-12-11&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=). During week 2020-W51 everyone is allowed to work on any hack-week project, it should just have a reasonable, "explainable" connection to our normal work. okurz volunteers to take over ops-duty for the week. Result during meeting 2020-12-11: We missed the goal (by a slight amount) but we are motivated to try again in the next year :) Everybody, put some easy tickets aside for the next time! ### Extra-ordinary "hack-week" 2021-W8 Similar as our attempt for 2020-W51 with same rules, except condition: We close 30 tickets from our backlog within the time frame 2021-02-05 until 2021-02-19 start of weekly meeting. No cheating! 
See [this query](https://progress.opensuse.org/issues?utf8=%E2%9C%93&set_filter=1&sort=priority%3Adesc%2Cid%3Adesc&f%5B%5D=status_id&op%5Bstatus_id%5D=c&f%5B%5D=fixed_version_id&op%5Bfixed_version_id%5D=%3D&v%5Bfixed_version_id%5D%5B%5D=418&f%5B%5D=closed_on&op%5Bclosed_on%5D=%3E%3C&v%5Bclosed_on%5D%5B%5D=2021-02-05&v%5Bclosed_on%5D%5B%5D=2021-02-19&f%5B%5D=&c%5B%5D=subject&c%5B%5D=project&c%5B%5D=status&c%5B%5D=assigned_to&c%5B%5D=relations&c%5B%5D=priority&c%5B%5D=category&c%5B%5D=cf_16&group_by=status&t%5B%5D=). Result during meeting 2021-02-19: We missed the goal (25/30 tickets resolved) but again we are open to try again, maybe after next SUSE hack week. ### Historical Previously the former QA tools team used target versions "Ready" (to be planned into individual milestone periods or sprints), "Current Sprint" and "Done". However the team never really did use proper time-limited sprints so the distinction was rather vague. After having tickets "Resolved" after some time the PO or someone else would also update the target version to "Done" to signal that the result has been reviewed. This was causing a lot of ticket update noise for not much value considering that the [Definition-of-Done](https://progress.opensuse.org/projects/openqav3/wiki/#ticket-workflow) when properly followed already has rather strict requirements on when something can be considered really "Resolved" hence the team eventually decided to not use the "Done" target version anymore. Since about 2019-05 (and since okurz is doing more backlog management) the team uses priorities more as well as the status "Workable" together with an explicit team member list for "What the team is working on" to better visualize what is making team members busy regardless of what was "officially" planned to be part of the team's work. So we closed the target version. 
On 2020-07-03 okurz subsequently closed "Current Sprint" as this one too was in most cases equivalent to just picking an assignee for a ticket or setting it to "In Progress". We can just distinguish between "(no version)" meaning untriaged, "Ready" meaning tools team should consider picking up these issues and "future" meaning that there is no plan for this to be picked up. Everything else is defined by status and priority. On 2020-10-27 we discussed together to find out the history of the team. We clarified that the team started out as a not well defined "Dev+Ops" team. "team responsibilities" have been mainly unchanged since at least beginning of 2019. We agreed that learning from users and production about our "Dev" contributions is good, so this part of "Ops" is the responsibility of everyone. Also see #73060 for more details about how the responsibilities were set up. ## Change announcements For new, cool features or disruptive changes consider notifying our common userbase as well as potential future users, for example create a post on opensuse-factory@opensuse.org , link to the post on openqa@suse.de , invite to a workshop, post on one.suse.com, #opensuse-factory (IRC) (irc://chat.freenode.net/opensuse-factory), [#testing (RC)](https://chat.suse.de/testing) # QE Core and QE Yast - Team descriptions (this chapter has seen changes in 2020-11 regarding QSF -> QE Core / QE Yast change) **QE Core** (formerly QSF, QA SLE Functional) and **QE Yast** are squads focusing on Quality Engineering of the core and yast functionality of the SUSE SLE products. The squad is comprised of members of QE Integration - [SUSE QA SLE Nbg](https://wiki.suse.net/index.php/SUSE-Quality_Assurance/Organization/Members_and_Responsibilities#QA_SLE_NBG_Team), including [SUSE QA SLE Prg](https://wiki.suse.net/index.php/SUSE-Quality_Assurance/Organization/Members_and_Responsibilities#QA_SLE_PRG_Team) - and QE Maintenance people (formerly "QAM"). 
The [SLE Department](https://wiki.suse.net/index.php/SUSE-Quality_Assurance/SLE_Department#QSF_.28QA_SLE_Functional.29) page describes our QA responsibilities. We focus on our automatic tests running in [openQA](https://openqa.suse.de) under the job groups "Functional" as well as "Autoyast" for the respective products, for example [SLE 15 / Functional](https://openqa.suse.de/group_overview/110) and [SLE 15 / Autoyast](https://openqa.suse.de/group_overview/129). We back our automatic tests with exploratory manual tests, especially for the product milestone builds. Additionally we care about corresponding openSUSE openQA tests (see as well https://openqa.opensuse.org).

* long-term roadmap: http://s.qa.suse.de/qa-long-term
* overview of current openQA SLE12SP5 tests with progress ticket references: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP5&groupid=139&groupid=142
* fate tickets for SLE12SP5 feature testing: http://s.qa.suse.de/qa_sle_functional_feature_tests_sle12sp5 (new report based on all tickets with milestone before SLE12SP5 GM), http://s.qa.suse.de/qa_sle_functional_feature_tests_sle15sp1 for SLE15SP1
* only "blocker" or "shipstopper" bugs on "interesting products": http://s.qa.suse.de/qa_sle_functional_bug_query_sle15_2 for SLE15, http://s.qa/qa_sle_bugs_sle12_2 for SLE12
* Better organization of planned work can be seen at the [SUSE QA](https://progress.opensuse.org/projects/suseqa) project (which is not public).

## Test plan

When looking for coverage of certain components or use cases keep the [openQA glossary](http://open.qa/docs/#concept) in mind. It is important to understand that "tests in openQA" could be a scenario, for example a "textmode installation run", a combined multi-machine scenario, for example "a remote ssh based installation using X-forwarding", or a test module, for example "vim", which checks if the vim editor is correctly installed, provides correct rendering and basic functionality.
You are welcome to contact any member of the team to ask for more clarification about this.

In detail the following areas are tested as part of "SLE functional":

* different hardware setups (UEFI, ACPI)
* support for localization
* openSUSE: virtualization - some "virtualization" tests are active on o3 with a reduced set compared to SLE coverage (on behalf of QA SLE virtualization due to team capacity constraints, clarified in QA SLE coordination meeting 2018-03-28)
* openSUSE: migration - comparable to "virtualization", a reduced set compared to SLE coverage is active on o3 (on behalf of QA SLE migration due to team capacity constraints, clarified in QA SLE coordination meeting 2018-04)

### QE Yast

The squad focuses on testing YaST components, including installer and snapper.

The detailed test plan for SLES can be found here: [SLES_Integration_Level_Testplan.md](https://gitlab.suse.de/qsf-y/qa-sle-functional-y/blob/master/SLES_Integration_Level_Testplan.md)

* Latest report based on openQA test results SLE12: http://s.qa.suse.de/test-status-sle12-yast , SLE15: http://s.qa.suse.de/test-status-sle15-yast

### QE Core

"Testing is the future, and the future starts with you"

* basic operations (firefox, zypper, logout/reboot/shutdown)
* boot_to_snapshot
* functional application tests (kdump, gpg, ipv6, java, git, openssl, openvswitch, VNC)
* NIS (server, client)
* toolchain (development module)
* systemd
* "transactional-updates" as part of the corresponding SLE server role, not CaaSP
* Latest report based on openQA test results SLE12: http://s.qa.suse.de/test-status-sle12-functional , SLE15: http://s.qa.suse.de/test-status-sle15-functional

## In new organization also covered by QE Core and others

* quarterly updated media: former QA Maintenance (QAM) is now part of the various QE squads. However, QU media does happen together with Maintenance Coordination that is not part of these squads.
## What we do

We collected opinions, personal experiences and preferences starting with the following four topics: what are fun tasks ("new tests", "collaborate", "do it right"), what parts are annoying ("old & sporadic issues"), what do we think is expected from qsf-u ("be quick", "keep stuff running", "assess quality") and what we should definitely keep doing to prevent stakeholders becoming disappointed ("build validation", "communication & support").

### How we work on our backlog

* no "due date"
* we pick up tickets that have not been previously discussed
* more flexible choice
* WIP-limits:
  * global limit of 10 tickets "In Progress"
* target numbers or "guideline", "should be", in priorities:
  1. New, untriaged: 0
  2. Workable: 40
  3. New, assigned to [qe-core] or [qe-yast]: ideally fewer than 200 (should not stop you from triaging)
* SLAs for priority tickets - how to ensure that we work on the more urgent tickets?
  * "taken": <1d: immediate -> looking daily
  * 2-3d: urgent
  * first goal is "urgency removal": <1d: immediate, 1w: urgent
* our current "cycle time" is 1h - 1y (maximum, with interruptions)
* everybody should set priority + milestone in obvious cases, e.g. new reproducible test failures in multiple critical scenarios; in the general case the PO decides

### How we like to choose our battles

We self-assessed our tasks on a scale from "administrative" to "creative" and found the following order, from most administrative to most creative: daily test review (very "administrative"), ticket triaging, milestone validation, code review, create needles, infrastructure issues, fix and cleanup tests, find bugs while fixing failing tests, find bugs while designing new tests, new automated tests (very "creative"). We also found that we appreciate it if our work has a fair share of both sides. A good ratio is probably 60% creative plus 40% administrative tasks. Both types have their advantages and we should try to keep a healthy balance.

### What "product(s)" do we (really) *care* about?
Brainstorming results:

* openSUSE Krypton -> good example of something that we only remotely care about or not at all even though we see the connection point, e.g. test plasma changes early before they reach TW or Leap as operating systems we rely on, or SLE+packagehub which SUSE does not receive direct revenue from but indirect benefit. Should be "community only", that includes members from QSF though
* openQA -> (like OBS), helps to provide ROI for SUSE
* SLE(S) (in development versions)
* Tumbleweed
* Leap, because we use it
* SLES HA
* SLE migration
* os-autoinst-distri-opensuse+backend+needles

From this list strictly no "product" gives us direct revenue. However, SLE(S) (as well as SLES HA and SLE migration) are most likely good examples of a direct connection to revenue (based on SLE subscriptions). A poll in the team revealed that 3 persons see "SLE(S)" as our main product and 3 see "os-autoinst-distri-opensuse+backend+needles" as the main product. However, we mainly agreed that we cannot *own* a product like "SLE" because that product is mainly not under our control.

Visualizing "cost of testing" vs. "risk of business impact" showed that both metrics have an inverse dependency. For example, on a range from "upstream source code" over "package self-tests", "openSUSE Factory staging" and "Tumbleweed" to "SLE", we consider SLE to have the highest business risk attached, which therefore defines our priority. However, testing at upstream source level is considered most effective to prevent higher cost of bugs or issues. Our conclusion is that we must ensure that the high-risk SLE base has its quality assured while supporting a quality assurance process as early as possible in the development process. Package self-tests as well as the openQA staging tests are seen as a useful approach in that direction, as well as "domain specific specialist QA engineers" working closely together with the according in-house development parties.
## Documentation

This documentation should only be interesting for the team QA SLE functional. If you find that some of the following topics are interesting for other people, please extract those topics to another wiki section.

### QA SLE functional Dashboards

In room 3.2.15 of the Nuremberg office there are two dedicated laptops, each with a monitor attached, showing a selected overview of openQA test results with important builds from SLE and openSUSE. These laptops are configured with a root account with the default password for production machines.

First point of contact: [slindomansilla@suse.com](mailto:slindomansilla@suse.com), [okurz@suse.de](mailto:okurz@suse.de)

* *dashboard-osd-3215.suse.de*: Showing current view of openqa.suse.de filtered for some job group results, e.g. "Functional"
* *dashboard-o3-3215.suse.de*: Showing current view of openqa.opensuse.org filtered for some job group results which we took responsibility to review and are mostly interested in

### dashboard-osd-3215

* OS: openSUSE Tumbleweed
* Services: ssh, mosh, vnc, x2x
* Users:
  * root
  * dashboard
* VNC: `vncviewer dashboard-osd-3215`
* X2X: `ssh -XC dashboard@dashboard-osd-3215 x2x -west -to :0.0`
  * (attaches the dashboard monitor as an extra display to the left of your screens. Then move the mouse over and the attached X11 server will capture mouse and keyboard)

#### Content of /home/dashboard/.xinitrc

```
#
# Source common code shared between the
# X session and X init scripts
#
. /etc/X11/xinit/xinitrc.common

xset -dpms
xset s off
xset s noblank
[...]
#
# Add your own lines here...
#
$HOME/bin/osd_dashboard &
```

#### Content of /home/dashboard/bin/osd_dashboard

```
#!/bin/bash

# hide the mouse pointer and keep the screen from blanking
DISPLAY=:0 unclutter &
DISPLAY=:0 xset -dpms
DISPLAY=:0 xset s off
DISPLAY=:0 xset s noblank
# the dashboard URL can be overridden with the environment variable "url"
url="${url:-"https://openqa.suse.de/?group=SLE+15+%2F+%28Functional%7CAutoyast%29&default_expanded=1&limit_builds=3&time_limit_days=14&show_tags=1&fullscreen=1#"}"
DISPLAY=:0 chromium --kiosk "$url"
```

#### Cron job:

```
# Min H DoM Mo DoW Command
* * * * * /home/dashboard/bin/reload_chromium
```

#### Content of /home/dashboard/bin/reload_chromium

```
#!/bin/bash

# keep the screen from blanking
DISPLAY=:0 xset -dpms
DISPLAY=:0 xset s off
DISPLAY=:0 xset s noblank
# focus the browser window and reload the page
DISPLAY=:0 xdotool windowactivate $(DISPLAY=:0 xdotool search --class Chromium)
DISPLAY=:0 xdotool key F5
DISPLAY=:0 xdotool windowactivate $(DISPLAY=:0 xdotool getactivewindow)
```

#### Issues:

* *When the screen shows a different part of the web page*
  * a simple mouse scroll through vnc or x2x may suffice.
* *When the displayed builds are frozen and no new build shows up, it usually means that midori, the browser displaying the info on the screen, has crashed.*
  * you can try to restart midori this way:
    * `ps aux | grep midori`
    * `kill $pid`
    * `/home/dashboard/bin/osd_dashboard`
  * If this also doesn't work, restart the machine.
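
The manual recovery steps above can be sketched as one helper. This is a minimal sketch and not part of the original setup: the function name `restart_kiosk` is hypothetical, and the process name to pass is an assumption (the note above mentions midori while `osd_dashboard` starts chromium, so check `ps aux` first). It relies on `pkill` and `pgrep` from procps.

```bash
#!/bin/bash

# Hypothetical helper, not part of the original dashboard setup.
# Usage: restart_kiosk <browser process name> <launcher command>
restart_kiosk() {
    local browser="$1" launcher="$2" tries=0

    # Ask a running browser to exit; "|| true" ignores "no process found"
    pkill -x "$browser" || true

    # Wait up to about 10 seconds for the old process to disappear
    while pgrep -x "$browser" >/dev/null && (( tries < 10 )); do
        sleep 1
        tries=$(( tries + 1 ))
    done

    # Start the dashboard again in the background
    "$launcher" &
}
```

On dashboard-osd-3215 this could be called as `restart_kiosk chromium /home/dashboard/bin/osd_dashboard`, assuming the process is really named `chromium` on that machine. If the browser still does not come back, restarting the machine remains the fallback.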
### dashboard-o3

* Raspberry Pi 3B+
* IP: `10.160.65.207`

#### Content of /home/tux/.xinitrc

```
#!/bin/bash

unclutter &
openbox &
xset s off
xset -dpms
sleep 5
url="https://openqa.opensuse.org?group=openSUSE Tumbleweed\$|openSUSE Leap [0-9]{2}.?[0-9]*\$|openSUSE Leap.\*JeOS\$|openSUSE Krypton|openQA|GNOME Next&limit_builds=2&time_limit_days=14&show_tags=1&fullscreen=1#build-results"
chromium --kiosk "$url" &

# reload the page every 5 minutes
while sleep 300 ; do
        xdotool windowactivate $(xdotool search --class Chromium)
        xdotool key F5
        xdotool windowactivate $(xdotool getactivewindow)
done
```

#### Content of /usr/share/lightdm/lightdm.conf.d/50-suse-defaults.conf

```
[Seat:*]
pam-service = lightdm
pam-autologin-service = lightdm-autologin
pam-greeter-service = lightdm-greeter
xserver-command=/usr/bin/X
session-wrapper=/etc/X11/xdm/Xsession
greeter-setup-script=/etc/X11/xdm/Xsetup
session-setup-script=/etc/X11/xdm/Xstartup
session-cleanup-script=/etc/X11/xdm/Xreset
autologin-user=tux
autologin-timeout=0
```
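
The `.xinitrc` for dashboard-o3 passes the `group` filter to the browser unencoded, while `osd_dashboard` uses a percent-encoded value. As a minimal sketch of how such a URL could be built from a readable filter, the following helpers might be used. The names `urlencode` and `dashboard_url` are hypothetical, only ASCII input is assumed, and spaces are encoded as `%20` rather than the `+` seen in `osd_dashboard`.

```bash
#!/bin/bash

# Hypothetical helper: percent-encode an ASCII string for use as a query value.
urlencode() {
    local LC_ALL=C              # make the character ranges below match ASCII only
    local s="$1" out="" c hex i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;            # unreserved characters stay as they are
            *) printf -v hex '%%%02X' "'$c"          # everything else becomes %XX
               out+="$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: build a dashboard URL from a base URL and a job group filter.
# The query parameters are the ones used in the .xinitrc of dashboard-o3.
dashboard_url() {
    local base="$1" group="$2"
    printf '%s/?group=%s&limit_builds=2&time_limit_days=14&show_tags=1&fullscreen=1\n' \
        "$base" "$(urlencode "$group")"
}
```

A call such as `chromium --kiosk "$(dashboard_url https://openqa.opensuse.org 'openSUSE Krypton|openQA')"` would then keep the filter readable in the script while handing an encoded URL to the browser.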