action #130636

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #108209: [epic] Reduce load on OSD

high response times on osd - Try nginx on OSD size:S

Added by livdywan 11 months ago. Updated 2 days ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Category: Feature requests
Target version: Ready
Start date:
Due date: 2024-05-17
% Done: 0%
Estimated time:
Tags: infra

Description

Motivation

Apache in prefork mode uses a lot of resources to provide mediocre performance.

Acceptance criteria

  • AC1: Nginx has been deployed successfully on OSD
  • AC2: No alerts regarding "oh no, apache is down" ;)

Suggestions

  • Make sure there is an easy way to switch back to Apache in case something goes wrong
  • See #129490 for results from O3
  • Adapt OSD nginx config for HTTP + HTTPS (O3 only requires HTTP); a rough config sketch follows this list
  • We can prepare the deployment of nginx in parallel to apache, have it deployed and at any time decide when to switch by just disabling/enabling services accordingly. The deployment needs to consider dehydrated+nginx as well. We can switch OSD to nginx to gather realtime data before we suggest to use nginx as default in our openQA documentation and CI infrastructure.
  • Add changes to salt-states-openqa excluding monitoring
  • Ensure that we have no alerts regarding "oh no, apache is down" ;)
  • If there are any bigger issues observed then just revert and note down in follow-up tickets what needs to be solved first (to limit the ticket to size:S)
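
To illustrate the HTTP + HTTPS point above: a minimal sketch of what an nginx reverse-proxy vhost for OSD could look like. It assumes openQA's default service ports (9526 web UI, 9527 websockets, 9528 livehandler) and the default dehydrated certificate paths; the actual salt-managed configuration may differ.

    server {
        listen 80;
        listen 443 ssl;
        server_name openqa.suse.de;

        # assumption: certificates maintained by dehydrated in its default location
        ssl_certificate     /etc/dehydrated/certs/openqa.suse.de/fullchain.pem;
        ssl_certificate_key /etc/dehydrated/certs/openqa.suse.de/privkey.pem;

        # websocket server and livehandler need the Upgrade headers
        location /api/v1/ws/ {
            proxy_pass http://127.0.0.1:9527;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
        location /liveviewhandler/ {
            proxy_pass http://127.0.0.1:9528;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }

        # everything else goes to the main web UI service
        location / {
            proxy_pass http://127.0.0.1:9526;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }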

Out of scope

  • Whether Nginx's rate limiting features work for our use cases (not yet known; tracked separately in #159651)
  • Full monitoring integration

Rollback steps

  • DONE: delete from workers where host = 'linux-9lzf'; to delete my test workers

Related issues: 9 (4 open, 5 closed)

  • Related to openQA Infrastructure - action #157081: OSD unresponsive or significantly slow for some minutes 2024-03-12 08:30Z (Resolved, okurz, 2024-03-12)
  • Related to openQA Infrastructure - action #158059: OSD unresponsive or significantly slow for some minutes 2024-03-26 13:34Z (Resolved, okurz)
  • Related to openQA Infrastructure - action #159396: Repeated HTTP Response alert for /tests and unresponsiveness due to potential detrimental impact of pg_dump (was: HTTP Response alert for /tests briefly going up to 15.7s) size:M (Feedback, okurz, 2024-06-09)
  • Related to openQA Infrastructure - action #160083: client gets a redirect and downloads an HTML page from microsoft instead of the proper windows .qcow2 image (Resolved, tinita, 2024-05-08)
  • Related to openQA Infrastructure - action #160171: [openQA][assets] Access to openQA assets forbidden auto_review:"Download.*curl.*error for.*http://openqa.suse.de/":retry size:S (Feedback, mkittler, 2024-05-10, due 2024-05-27)
  • Related to openQA Infrastructure - action #160239: [alert] External http responses Salt (https://openqa.suse.de/health) due to "Too many open files" after switch to nginx (Feedback, okurz, 2024-05-12, due 2024-05-29)
  • Copied from openQA Project - action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features (Resolved, kraih)
  • Copied to openQA Project - action #159651: high response times on osd - nginx with enabled rate limiting features size:S (Workable, 2024-04-26)
  • Copied to openQA Infrastructure - action #160367: After switch to nginx on OSD let's investigate how system performance was impacted (Resolved, okurz, 2024-05-14)
Actions #1

Updated by livdywan 11 months ago

  • Copied from action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features added
Actions #2

Updated by okurz 11 months ago

  • Description updated (diff)
Actions #3

Updated by kraih 11 months ago

  • Description updated (diff)
Actions #4

Updated by kraih 11 months ago

During the openQA weekly we talked about this ticket and consider it a good candidate for a mob session. The main problems to solve are the Salt deployment and the SSL configuration, as well as a simple way to roll back the deployment and use Apache again in case something goes wrong.

Actions #5

Updated by kraih 11 months ago

  • Description updated (diff)
Actions #6

Updated by okurz 11 months ago

We can prepare the deployment of nginx in parallel to apache, have it deployed and at any time decide when to switch by just disabling/enabling services accordingly. The deployment needs to consider dehydrated+nginx as well. We can switch OSD to nginx to gather realtime data before we suggest to use nginx as default in our openQA documentation and CI infrastructure.
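
As a rough sketch of that switching idea (assuming the stock openSUSE service names; on OSD the switch would of course be driven through salt-states-openqa rather than run by hand):

    # switch from Apache to nginx
    systemctl disable --now apache2
    systemctl enable --now nginx

    # roll back to Apache if something goes wrong
    systemctl disable --now nginx
    systemctl enable --now apache2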

Actions #7

Updated by okurz 2 months ago

  • Related to action #157081: OSD unresponsive or significantly slow for some minutes 2024-03-12 08:30Z added
Actions #8

Updated by okurz about 2 months ago

  • Related to action #158059: OSD unresponsive or significantly slow for some minutes 2024-03-26 13:34Z added
Actions #9

Updated by okurz 20 days ago

  • Tags set to infra
  • Target version changed from future to Ready

Due to repeated issues with unresponsiveness we should give this more focus and bring it onto the backlog now.

Actions #10

Updated by okurz 20 days ago

  • Copied to action #159651: high response times on osd - nginx with enabled rate limiting features size:S added
Actions #11

Updated by jbaier_cz 20 days ago · Edited

  • Subject changed from high response times on osd - Try nginx on osd with enabled load limiting or load balancing features to high response times on osd - Try nginx on OSD size:S
  • Status changed from New to Workable
Actions #12

Updated by jbaier_cz 20 days ago

  • Description updated (diff)
Actions #13

Updated by mkittler 14 days ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #14

Updated by openqa_review 14 days ago

  • Due date set to 2024-05-17

Setting due date based on mean cycle time of SUSE QE Tools

Actions #15

Updated by mkittler 13 days ago

It seems to generally work with the config I've already put on Slack: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1168

It uses different ports (the normal ports plus 1000). Therefore I also had to add service_port_delta = 0 to the config to make the live mode work (as it would otherwise assume a reverse-proxy-less development setup).
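
For reference, a hedged sketch of that setting (assuming it lives in the [global] section of /etc/openqa/openqa.ini; adjust to wherever the salt states render the config):

    [global]
    # with the web UI behind nginx on shifted ports, don't let openQA assume
    # a reverse-proxy-less development setup for the live view services
    service_port_delta = 0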

I'll test connecting a worker after lunch.

Actions #16

Updated by mkittler 13 days ago · Edited

I connected a worker via

HOST=https://openqa.suse.de:1443
BACKEND=qemu
WORKER_CLASS=qemu_x86_64_poo130636

and it worked (registration, picking up a job and concluding it).
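
A minimal sketch of how such a test worker can be set up, assuming the settings above go into the [global] section of /etc/openqa/workers.ini on the worker host:

    [global]
    HOST = https://openqa.suse.de:1443
    WORKER_CLASS = qemu_x86_64_poo130636

followed by starting a single instance, e.g. systemctl start openqa-worker@1.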

So I'll prepare an MR to switch the ports, which we can merge next week. EDIT: https://gitlab.suse.de/mkittler/salt-states-openqa/-/merge_requests/new?merge_request%5Bsource_branch%5D=nginx-for-real

Actions #17

Updated by livdywan 13 days ago

I also went through the web UI just to see if anything stands out and cloned a bunch of jobs; it seems fine: https://openqa.suse.de:1443/tests/overview?distri=sle&version=15-SP4&build=poo%23130636 - note that the port keeps being reset, and even the output of openqa-clone-job --repeat 100 --within-instance https://openqa.suse.de/tests/14196034 _GROUP=0 BUILD=poo#130636 gave me URLs without a port, so this may have made the manual testing less relevant.

Actions #18

Updated by mkittler 10 days ago

This test is in fact not really relevant as all of those tests probably just ran on workers that connected via Apache. But at least we know that there are no surprises with openqa-clone-job itself (if it actually honored the port).

I created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1169 to use NGINX for real.

I'll have to take a break in two hours, so currently there isn't a big enough window for me to merge it. I'll merge it when I get back or tomorrow.

Actions #19

Updated by livdywan 9 days ago

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1173 to get the same, faster asset handling as on o3

Actions #20

Updated by mkittler 9 days ago

  • Description updated (diff)

We tried to use nginx in production but it didn't work out: the openQA prefork workers quickly used a lot of CPU and everything became very slow.

Maybe this helps: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1174

I was able to connect 200 local workers simultaneously with that via nginx/HTTP and it didn't have any noticeable impact (which is good).
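
(The exact load-test setup isn't recorded here; a hypothetical way to spawn that many local worker instances would be a simple loop like this.)

    # hypothetical: spawn 200 worker instances on one host
    for i in $(seq 1 200); do
        systemctl start openqa-worker@$i
    done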

Actions #21

Updated by okurz 8 days ago

  • Related to action #159396: Repeated HTTP Response alert for /tests and unresponsiveness due to potential detrimental impact of pg_dump (was: HTTP Response alert for /tests briefly going up to 15.7s) size:M added
Actions #22

Updated by mkittler 8 days ago

  • Description updated (diff)

It looks good after https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1174 so I'll keep NGINX enabled. See my messages on Slack for details.

I also deleted my test workers from the OSD database.

Actions #23

Updated by mkittler 8 days ago

  • Status changed from In Progress to Feedback
Actions #24

Updated by mkittler 8 days ago · Edited

Even though implementing the monitoring is out of scope, we should probably at least get rid of

openqa.suse.de:
    2024-05-08T11:24:18Z E! [inputs.apache] Error in plugin: http://localhost/server-status?auto returned HTTP status 404 Not Found
    2024-05-08T11:24:21Z E! [telegraf] Error running agent: input plugins recorded 1 errors

as it makes our pipelines fail.


EDIT: MRs:
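
(The MR list above appears truncated. Conceptually the change boils down to replacing the Apache status input with Telegraf's nginx input pointed at an nginx stub_status endpoint; the paths and URLs below are assumptions, not the actual MR content.)

    # nginx side: expose a local status endpoint
    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }

and on the Telegraf side something like:

    # replaces [[inputs.apache]]
    [[inputs.nginx]]
      urls = ["http://localhost/nginx_status"]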

Actions #25

Updated by jbaier_cz 7 days ago

  • Related to action #160083: client gets a redirect and downloads an HTML page from microsoft instead of the proper windows .qcow2 image added
Actions #26

Updated by rainerkoenig 7 days ago

We also encounter strange failures when clicking on links to YAML schedules in the settings tab. Example:

https://openqa.suse.de/tests/14263283/settings/schedule/yast/maintenance/create_hdd_transactional_server_restapi.yaml

displays the following text instead of the YAML schedule:

File path: /var/lib/openqa/share/tests/sle/schedule/yast/maintenance/create_hdd_transactional_server_restapi.yaml
let mode; let path = document.getElementById('script').dataset.path; if (path && path.endsWith('.pm') || path.endsWith('.pl')) { mode = 'ace/mode/perl'; } var editor = ace.edit("script", { mode: mode, maxLines: Infinity, readOnly: true, }); editor.session.setUseWrapMode(true);
Actions #27

Updated by tinita 6 days ago

Oh, that seems to be a missing closing quote on the data-path attribute. Looking at the source:

<div class="code" id="script" data-path="/var/lib/openqa/share/tests/sle/schedule/yast/maintenance/create_hdd_transactional_server_restapi.yaml>---
...
</div>

  <script type="text/javascript">
let mode;
let path = document.getElementById('script').dataset.path;
...
</script>

</div>

Can't see how that's related to nginx, though...

Fix: https://github.com/os-autoinst/openQA/pull/5631
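
For comparison, with the missing closing quote added the rendered markup should look like this (a sketch of the expected output, not the literal diff from the PR):

    <div class="code" id="script" data-path="/var/lib/openqa/share/tests/sle/schedule/yast/maintenance/create_hdd_transactional_server_restapi.yaml">---
    ...
    </div>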

Actions #28

Updated by mkittler 4 days ago · Edited

Ok, so nothing problematic came up besides https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1180 which has already been merged.

The two monitoring MRs have been merged as well. Of course Grafana still needs to be adjusted (as the 2nd MR only covers Telegraf), but we declared this out of scope for this ticket.

EDIT: Looks like https://progress.opensuse.org/issues/160171 is related, too.

Actions #29

Updated by okurz 4 days ago

  • Related to action #160171: [openQA][assets] Access to openQA assets forbidden auto_review:"Download.*curl.*error for.*http://openqa.suse.de/":retry size:S added
Actions #30

Updated by okurz 3 days ago

  • Related to action #160239: [alert] External http responses Salt (https://openqa.suse.de/health) due to "Too many open files" after switch to nginx added
Actions #31

Updated by mkittler 2 days ago

  • Status changed from Feedback to Resolved

The switch to NGINX generally worked. We created follow-up tickets for some problems which came up.

For now it seems that NGINX provides good performance, but it may be too soon to tell whether it is an improvement.

Actions #32

Updated by okurz 2 days ago

  • Copied to action #160367: After switch to nginx on OSD let's investigate how system performance was impacted added