action #78127

follow-up to #73633 - lessons learned and suggestions

Added by okurz 2 months ago.

Status: Workable
Priority: Normal
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Suggestions from #73633#note-30

  • Set up a passive performance measurement of throughput on interfaces
  • Whenever we apply changes to the infrastructure, we should have a ticket
  • Whenever creating any external ticket, e.g. EngInfra, also create an internal tracker ticket, because there might be more internal notes
  • As in the OSD deployment, we should look for failed grafana
  • Collect all the information between "last good" and "first bad" and then also find the git diff in openqa/salt-states-openqa
  • Apply a proper "scientific method" with written-down hypotheses, experiments and conclusions in tickets; follow https://progress.opensuse.org/projects/openqav3/wiki#Further-decision-steps-working-on-test-issues
  • Keep salt states to describe what should not be there
  • Try out older btrfs snapshots on systems for crosschecking, and boot with salt disabled by appending systemd.mask=salt-minion.service to the kernel cmdline
  • The team should conduct a work backlog check on a daily basis
  • nsinger does not mind if someone else provides a suggestion or takes over the ticket
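
A possible starting point for the passive throughput measurement suggested above, as a minimal sketch only. It assumes a Linux host that exposes per-interface byte counters in /proc/net/dev. The function names (parse_net_dev, throughput, measure) and the 5 second interval are made up for illustration and are not taken from any existing monitoring setup.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, seconds):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    return {
        name: (
            (after[name][0] - before[name][0]) / seconds,
            (after[name][1] - before[name][1]) / seconds,
        )
        for name in before.keys() & after.keys()
    }


def measure(interval=5.0, path="/proc/net/dev"):
    """Sample the counters twice and return the throughput in between.

    Assumption: counters do not wrap or reset during the interval.
    """
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_net_dev(f.read())
    return throughput(first, second, interval)
```

Calling measure() only reads kernel counters and generates no traffic of its own, which is what makes the measurement passive. The result could be fed into whatever monitoring is already in place.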

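The "last good" / "first bad" suggestion above could be sketched roughly as follows, assuming a local checkout of openqa/salt-states-openqa. The branch name "master", the helper names and the timestamp format are assumptions for illustration. Only standard git subcommands (rev-list, diff) are used.

```python
import subprocess


def commit_before_cmd(timestamp, branch="master"):
    """git command printing the newest commit on branch before timestamp."""
    return ["git", "rev-list", "-1", f"--before={timestamp}", branch]


def diff_cmd(good, bad):
    """git command printing the diff between two commits."""
    return ["git", "diff", f"{good}..{bad}"]


def diff_between(repo, last_good, first_bad, branch="master"):
    """Return the diff of repo between the "last good" and "first bad" times."""

    def run(cmd):
        result = subprocess.run(
            cmd, cwd=repo, check=True, capture_output=True, text=True
        )
        return result.stdout.strip()

    good = run(commit_before_cmd(last_good, branch))
    bad = run(commit_before_cmd(first_bad, branch))
    if not good or not bad:
        raise ValueError("no commit found before one of the given timestamps")
    return run(diff_cmd(good, bad))
```

For example, diff_between("salt-states-openqa", "2020-10-19 12:00", "2020-10-20 12:00") would show what changed in the states between two hypothetical observation times. The resulting diff can then be pasted into the ticket alongside the other collected information.
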
Related issues

Copied from openQA Infrastructure - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) Resolved 2020-10-20 2020-11-17

History

#1 Updated by okurz 2 months ago

  • Copied from action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) added
