coordination #40199: [EPIC] Better rollback capabilities of (worker) deployments - openQA Project (public) - openSUSE Project Management Tool

coordination #40199

closed

[EPIC] Better rollback capabilities of (worker) deployments

Added by okurz over 6 years ago. Updated over 4 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

okurz

Category:

Feature requests

Target version:

QA (public) - future

Start date:

2018-08-23

Description

As an outcome of #39743 we learned that complete deployment rollbacks for the whole infrastructure would be nice (including openQA packages, database and test settings, and system packages on both the web UI and the workers), but there will always be factors outside our control that keep changing.

Related issues 1 (0 open — 1 closed)

Updated by okurz over 6 years ago

Related to action #39743: [o3][tools] o3 unusable, often responds with 504 Gateway Time-out added

Updated by coolo over 6 years ago

I have my doubts that this is a reasonable request. Wishful thinking might lead to this, but IMO you need quite a deployment team to roll something like this. And on top of that: I don't think the deployment strategy is part of the 'openQA Project'.

Updated by coolo over 6 years ago

Subject changed from [tools] Better rollback capabilities of deployments to [EIPC] Better rollback capabilities of (worker) deployments
Target version set to future

For the webui I have no good idea, but for the workers we could complement salt with workers reinstalling a defined state on boot. This is still a huge task, and work-intensive to maintain, so I'm not really sure we should invest there.

Updated by okurz over 6 years ago

Subject changed from [EIPC] Better rollback capabilities of (worker) deployments to [EPIC] Better rollback capabilities of (worker) deployments

I guess you meant "EPIC" instead of "EIPC" ;) You are losing some part of the original idea when you restrict it with "(worker)" and no longer cover the web UI part. I am with you that this is no easy "let's hack some perl" task, but I still see it as feasible. And isn't this basically also a business case we sell to customers? At least one feasible (albeit maybe not the best) approach to reach the goal of the (original) ticket description would be:

Use btrfs with snapshots on / for each machine (done for workers, missing for webui)
Only ever upgrade the webui together with a full database dump saved just before the upgrade (script or salt should work)
Train dry-runs of the "worst case scenarios" with all involved admins, so that they are less scared and the recovery time in case of emergencies is reduced
Optional: Save the RPM files used for installation on both webui + worker elsewhere to be able to go back to them, or use automatic maintenance requests for tested packages based on openQA-in-openQA, which makes sure that older versions of packages are saved "automatically"; but the openQA updates are probably too heavy for the maintenance workflow
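
The first two points could be scripted roughly as follows. This is only a minimal sketch under assumptions not stated in the ticket: that the webui host has snapper configured on a btrfs root (the list above says this is still missing for the webui), that the database is PostgreSQL and named `openqa`, and that the upgrade is a plain `zypper update` of a package called `openQA`. The function names `build_upgrade_plan` and `run_plan` are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
import subprocess


def build_upgrade_plan(dump_dir, db_name="openqa", package="openQA", now=None):
    """Return the ordered commands: snapshot, database dump, then upgrade.

    db_name and package are assumed defaults, not taken from the ticket.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dump_file = Path(dump_dir) / f"{db_name}-{stamp}.dump"
    return [
        # Filesystem snapshot of / (assumes snapper on btrfs is set up).
        ["snapper", "create", "--description", f"pre-upgrade {stamp}"],
        # Full database dump saved just before the upgrade.
        ["pg_dump", "--format=custom", f"--file={dump_file}", db_name],
        # The upgrade itself comes last.
        ["zypper", "--non-interactive", "update", package],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run the commands in order and stop at the first failure.

    Because the upgrade is the last step, it never runs unless the
    snapshot and the dump both succeeded. Returns the completed commands.
    """
    done = []
    for cmd in plan:
        result = runner(cmd)
        if result.returncode != 0:
            break
        done.append(cmd)
    return done
```

Calling `run_plan(build_upgrade_plan("/var/backups"))` as root would execute the three steps; passing a fake `runner` instead allows exactly the kind of dry-run training mentioned in the third point without touching the host.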

Updated by okurz almost 6 years ago

Category changed from 168 to Feature requests

Updated by okurz over 5 years ago

So for o3, what works quite well is to have transactional-server worker hosts and, on the o3 webui host, to keep the packages from the devel:openQA repos with a simple keeppackages=1 in the .repo files. We commonly save a database dump when we update the webui host, so that part is also covered. We also have automation for the complete o3 upgrade and are getting nearer to the same on osd.
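
As an illustration of the keeppackages=1 setting, here is a minimal sketch that switches it on in the text of a .repo file. It assumes the file is plain INI as zypper writes it; the helper name `enable_keeppackages` and the sample repository below are hypothetical and not taken from the o3 setup.

```python
import configparser
import io


def enable_keeppackages(repo_text):
    """Return .repo file content with keeppackages=1 set in every section."""
    # Disable interpolation so '%' characters in URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    for section in parser.sections():
        parser[section]["keeppackages"] = "1"
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical sample repository definition, for illustration only.
sample = (
    "[devel_openQA]\n"
    "name=openQA devel\n"
    "enabled=1\n"
    "baseurl=http://example.invalid/repo/\n"
    "keeppackages=0\n"
)
print(enable_keeppackages(sample))
```

Note that `configparser` drops comments when it rewrites a file, so for a hand-maintained .repo file editing the single line directly may be the better choice.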
