action #78127

closed

follow-up to #73633 - lessons learned and suggestions

Added by okurz about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Suggestions from #73633#note-30

  • Add passive performance measurement of throughput on interfaces
  • Whenever we apply changes to the infrastructure, we should have a ticket
  • Whenever creating an external ticket, e.g. EngInfra, also create an internal tracker ticket, because there might be more internal notes
  • As in the OSD deployment, we should look for failed grafana
  • Collect all the information between "last good" and "first bad", then also find the git diff in openqa/salt-states-openqa
  • Apply a proper "scientific method" with written-down hypotheses, experiments and conclusions in tickets; follow https://progress.opensuse.org/projects/openqav3/wiki#Further-decision-steps-working-on-test-issues
  • Keep salt states that describe what should not be there
  • Try out older btrfs snapshots on systems for crosschecking and boot with salt disabled: append systemd.mask=salt-minion.service to the kernel cmdline
  • The team should conduct a work backlog check on a daily basis
  • nsinger does not mind if someone else provides a suggestion or takes over the ticket
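
For the first suggestion (passive throughput measurement on interfaces), a minimal sketch could look like the following. It assumes a Linux host exposing the standard /proc/net/dev counters and is not the monitoring the team actually deployed; the function names parse_net_dev, throughput and sample are hypothetical and chosen only for illustration.

```python
import time
from pathlib import Path


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, seconds):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates


def sample(interval=1.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart, and return the rates."""
    first = parse_net_dev(Path(path).read_text())
    time.sleep(interval)
    second = parse_net_dev(Path(path).read_text())
    return throughput(first, second, interval)
```

Calling sample() on a worker returns bytes per second per interface without generating any traffic itself, which is what makes the measurement passive. Feeding the values into the existing monitoring is left out here.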

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) - Resolved - nicksinger - 2020-10-20 - 2020-11-17
