coordination #99549

Updated by mkittler over 2 years ago

### motivation 
 It is not possible to simply add CPU cores to the OSD VM (see #97943). However, the host is clearly overloaded at times: when an unusually large number of jobs is scheduled, it becomes very slow and we see 503 responses. 

 It would be possible to request another VM, though. It therefore makes sense to evaluate how the workload can be split across multiple hosts. 

 ### options 
 We've already discussed the topic and found that there are multiple options, some of which could be combined. The following list is in no particular order; the numbers are only there for easier referencing: 

 1. Have an additional, completely independent openQA setup which would only share workers. 
     1. Advantage: This is as easy as setting up a new openQA instance. No further openQA features or special setup tweaks would be required. 
     2. Disadvantage: The split is user-visible and requires a lot of coordination with users. Possibly it is not wanted at all. 
 2. Allow executing certain Minion tasks on a different host. 
     1. It would likely not make sense for (cleanup) tasks which mainly cause filesystem load (they'd just use the main VM's filesystem via NFS after all). 
     2. It is unclear how well Minion supports this use case, in particular when we can only run a subset of tasks on a different host. 
 3. Run openQA web UI workers on the other host. 
     1. This would require sharing the storage, e.g. via NFS. 
     2. Or would it be possible to move only certain routes which do not rely on the storage? 
 4. Run additional services such as the scheduler, websocket server and livehandler on a different host. 
     1. We could likely get away without requiring access to the storage on the additional host. 
     2. Likely only a slight improvement. 
 5. Run the PostgreSQL service on a different host. 
     1. It would be interesting to monitor how much CPU usage the PostgreSQL database causes. 
     2. The queries Telegraf runs against PostgreSQL could be moved to the database host as well. 

 ### suggestions 
 * Think about further options because I surely missed some. 
 * Evaluate certain options more closely. 
 * Strike out unfeasible options. 
 * Monitor the current resource utilization more closely, e.g. to determine how much CPU load the different services like PostgreSQL actually cause. 
 * Create subtasks for further concrete actions. 
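
As a starting point for the monitoring suggestion, here is a minimal sketch that sums CPU usage per command name from `ps -eo comm,pcpu` output. The helper names `cpu_by_command` and `read_ps_output` are made up for this example and are not part of openQA or Telegraf. Note that the `%CPU` value reported by procps `ps` is CPU time divided by process lifetime, so this only gives a rough long-term picture, not the load during a spike.

```python
from __future__ import annotations

import subprocess
from collections import defaultdict


def cpu_by_command(ps_output: str) -> dict[str, float]:
    """Sum the %CPU column per command name.

    Expects the text produced by `ps -eo comm,pcpu`: one header line,
    then one line per process with the command name and its %CPU.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        # Split from the right, because command names may contain spaces.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        command, pcpu = parts
        try:
            totals[command.strip()] += float(pcpu)
        except ValueError:
            continue  # ignore lines without a numeric %CPU value
    return dict(totals)


def read_ps_output() -> str:
    """Hypothetical helper: capture live `ps` output on the host."""
    result = subprocess.run(
        ["ps", "-eo", "comm,pcpu"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `cpu_by_command(read_ps_output())` on the web UI host and sorting the result by value would show roughly how PostgreSQL compares to the openQA services. For actual decisions, a time series collected by the existing monitoring is likely more reliable than such a snapshot.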
