action #129065

Updated by okurz 12 months ago

## Observation 
 https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1683722604024&to=1683725326412&viewPanel=78 alerted on 2023-05-10 15:07 CEST 

 ## Suggestions 
 * Look into the timeframe 
 https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1683723624920&to=1683724305517 and compare it to other panels on OSD to see whether it is visible what made the system busy. DONE: nothing too unusual. IO times were maybe a little high but far from concerning 
 * @okurz suggested in https://suse.slack.com/archives/C02AJ1E568M/p1683724668733689?thread_ts=1683724103.321589&cid=C02AJ1E568M that it might be caused by something we don't collect metrics for. Brainstorm what that could be and implement metrics for it 
   * Open network connections - nsinger observed peaks of >2k, ~75% of them related to httpd-prefork, ~20% to openqa-websocket 
   
 * > (Nick Singer) I'm currently logged into OSD. CPU utilization is quite high with a longterm load of 12 and shortterm of ~14 with only 12 cores on OSD. velociraptor goes up to 200% and is in general quite high in the process list but also telegraf and obviously openqa itself. 
     > (Oliver Kurz) all of that sounds fine. When the HTTP response was high I just took a look and the CPU usage was near 0 same as we suspected in the past. Remember our debugging on why qanet is slow? Comparable to that but here it's likely apache, number of concurrent connections, something like that 

 * Take https://suse.slack.com/archives/C02CANHLANP/p1683723956965209 into account - is there something we can do to improve this situation? 

 > (Joaquin Rivera) is OSD also slow for someone else? (edited)  
 > (Fabian Vogt) That might be partially because of the yast2_nfs_server jobs for investigation. You might want to delete them now that they did their job. (e.g. https://openqa.suse.de/tests/11085729. Don't open, might crash your browser...). those jobs are special. serial_terminal has some race condition so they hammer enter_cmd + assert_script_run in a loop until it fails
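
 For the "open network connections" idea above, a minimal sketch of what such a metric could look like, assuming the numbers come from `ss -Htnp` and are fed to telegraf via an exec input in influx line protocol. The measurement name `open_connections` and all function names are made up for illustration and are not part of the existing OSD setup.

```python
import re
import subprocess
from collections import Counter

# First process name in the `users:(("name",pid=...,fd=...))` column printed by `ss -p`.
PROCESS_RE = re.compile(r'users:\(\("([^"]+)"')


def count_connections(ss_lines):
    """Count TCP connections per owning process name from `ss -Htnp` output lines."""
    counts = Counter()
    for line in ss_lines:
        if not line.strip():
            continue
        match = PROCESS_RE.search(line)
        # Sockets without an owning process (e.g. TIME-WAIT) have no users:(...) column.
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def escape_tag(value):
    """Escape commas, equals signs and spaces as required for line protocol tag values."""
    return re.sub(r"([,= ])", r"\\\1", value)


def to_line_protocol(counts, measurement="open_connections"):
    """Render the counts as influx line protocol, one line per process, sorted by name."""
    return [
        f"{measurement},process={escape_tag(name)} count={count}i"
        for name, count in sorted(counts.items())
    ]


def collect():
    """Run `ss` and return line protocol lines (hypothetical entry point for an exec input)."""
    output = subprocess.run(
        ["ss", "-Htnp"], capture_output=True, text=True, check=True
    ).stdout
    return to_line_protocol(count_connections(output.splitlines()))
```

 Printing `"\n".join(collect())` from a small script would give one data point per process, so peaks like the >2k connections mentioned above would show up split by httpd-prefork, openqa-websocket and so on. Seeing the process names of other users' sockets needs root, so the script would have to run with sufficient privileges.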
