action #78127

closed

follow-up to #73633 - lessons learned and suggestions

Added by okurz about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Suggestions from #73633#note-30

  • A passive performance measurement regarding throughput on interfaces
  • Whenever we apply changes to the infrastructure, we should have a ticket
  • Whenever creating any external ticket, e.g. EngInfra, also create an internal tracker ticket, because there might be more internal notes
  • Same as in the OSD deployment, we should look for failed Grafana
  • Collect all the information between "last good" and "first bad" and then also find the git diff in openqa/salt-states-openqa
  • Apply a proper "scientific method" with written-down hypotheses, experiments and conclusions in tickets; follow https://progress.opensuse.org/projects/openqav3/wiki#Further-decision-steps-working-on-test-issues
  • Keep salt states that describe what should not be there
  • Try out older btrfs snapshots on systems for crosschecking and boot with salt disabled: append systemd.mask=salt-minion.service to the kernel cmdline
  • The team should conduct a work backlog check on a daily basis
  • nsinger does not mind if someone else provides a suggestion or takes over the ticket
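
For the first suggestion, a passive throughput measurement could look roughly like the sketch below. It assumes a Linux host that exposes per-interface byte counters in /proc/net/dev and only reads those counters, so it generates no traffic of its own. The function names are hypothetical and not part of any existing monitoring setup; how the resulting rates would be fed into monitoring is left open.

```python
import time
from pathlib import Path


def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def throughput(before, after, seconds):
    """Bytes per second (rx, tx) for every interface present in both samples."""
    rates = {}
    for name in before.keys() & after.keys():
        rx = (after[name][0] - before[name][0]) / seconds
        tx = (after[name][1] - before[name][1]) / seconds
        rates[name] = (rx, tx)
    return rates


def sample(interval=1.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart, and return the rates."""
    first = parse_proc_net_dev(Path(path).read_text())
    time.sleep(interval)
    second = parse_proc_net_dev(Path(path).read_text())
    return throughput(first, second, interval)
```

Calling `sample(5.0)` on a worker would return a dictionary mapping each interface name to its receive and transmit rate in bytes per second over a five-second window. Counter wrap-around and interfaces appearing or disappearing between samples are not handled here.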

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet), Resolved, nicksinger, 2020-10-20, 2020-11-17

Actions #1

Updated by okurz about 4 years ago

  • Copied from action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) added
Actions #2

Updated by livdywan almost 4 years ago

I feel like this is not Workable since it's got all of the suggestions we came up with but it's not clear what result we expect yet i.e. ACs. And I'm not sure why it has an assignee... @okurz did you mean to define the ACs? Or maybe we should have another call to do that?

Actions #3

Updated by okurz almost 4 years ago

hm, actually I think all except the first point could be moved to the wiki as "best practices" or "good to know" as is.
