action #150908

closed

o3 "Unable to fetch build results" and "Internal server error" on some pages size:M

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-11-15
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

From https://suse.slack.com/archives/C02CANHLANP/p1700046210637459

(Dominique Leuenberger) Seems O3 went back into the same fail state as we had seen yesterday:
Unable to fetch build results

and https://openqa.opensuse.org/tests/3727497 says "Internal server error" but not more details. Couldn't find anything obvious in journalctl -u openqa-webui

Rollback actions

  • Re-enable openqa-auto-update on o3 again

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #150845: openqaworker-arm22 broken due to packages automatically removed size:M (Resolved, mkittler, 2023-11-14, 2023-11-29)
Related to openQA Infrastructure (public) - action #151013: o3 yielding "502 Bad Gateway" from nginx 2023-11-19, why was the config overwritten? size:M (Resolved, tinita, 2023-11-19)

Actions #1

Updated by livdywan about 1 year ago

https://status.opensuse.org/ says it's all working. I assume that needs to be fixed?

Actions #2

Updated by tinita about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to tinita
Actions #3

Updated by tinita about 1 year ago

from /var/log/openqa:

[2023-11-15T11:11:50.988143Z] [debug] [pid:28037] Updating seen of worker 982 from worker_status (free)
[2023-11-15T11:11:50.995202Z] [warn] [pid:28037] Unable to verify whether worker 982 runs its job(s) as expected: DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR:  column me.backend_info does not exist
LINE 1: ...result, me.reason, me.clone_id, me.blocked_by_id, me.backend...
                                                             ^ [for Statement "SELECT me.id, me.result_dir, me.archived, me.state, me.priority, me.result, me.reason, me.clone_id, me.blocked_by_id, me.backend_info, me.TEST, me.DISTRI, me.VERSION, me.FLAVOR, me.ARCH, me.BUILD, me.MACHINE, me.group_id, me.assigned_worker_id, me.t_started, me.t_finished, me.logs_present, me.passed_module_count, me.failed_module_count, me.softfailed_module_count, me.skipped_module_count, me.externally_skipped_module_count, me.scheduled_product_id, me.result_size, me.t_created, me.t_updated FROM jobs me WHERE ( ( me.assigned_worker_id = ? AND t_finished IS NULL ) ) ORDER BY t_created DESC" with ParamValues: 1='982'] at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Workers.pm line 245

[2023-11-15T11:11:51.094791Z] [debug] [pid:28037] Updating seen of worker 896 from worker_status (free)
[2023-11-15T11:11:51.102115Z] [warn] [pid:28037] Unable to verify whether worker 896 runs its job(s) as expected: DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR:  column me.backend_info does not exist
LINE 1: ...result, me.reason, me.clone_id, me.blocked_by_id, me.backend...
                                                             ^ [for Statement "SELECT me.id, me.result_dir, me.archived, me.state, me.priority, me.result, me.reason, me.clone_id, me.blocked_by_id, me.backend_info, me.TEST, me.DISTRI, me.VERSION, me.FLAVOR, me.ARCH, me.BUILD, me.MACHINE, me.group_id, me.assigned_worker_id, me.t_started, me.t_finished, me.logs_present, me.passed_module_count, me.failed_module_count, me.softfailed_module_count, me.skipped_module_count, me.externally_skipped_module_count, me.scheduled_product_id, me.result_size, me.t_created, me.t_updated FROM jobs me WHERE ( ( me.assigned_worker_id = ? AND t_finished IS NULL ) ) ORDER BY t_created DESC" with ParamValues: 1='896'] at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Workers.pm line 245
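
To gauge how often the web UI hit the missing column, the matching log lines can be counted. This is only a minimal sketch: the function name is hypothetical, and the pattern assumes the error text appears on a single line exactly as quoted above (two spaces after "ERROR:").

```shell
#!/bin/sh
# Hypothetical helper: count log lines that report a missing database column.
# Example call (log path as quoted in this comment): count_missing_column_errors /var/log/openqa
count_missing_column_errors() {
    # grep -c prints the number of matching lines; it exits with status 1 when
    # nothing matches, so "|| true" keeps the function from failing in that case.
    grep -c 'ERROR:  column .* does not exist' "$1" || true
}
```

Lines that were wrapped mid-word by the terminal will not match this pattern, so the count is a lower bound.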
Actions #4

Updated by okurz about 1 year ago

  • Description updated (diff)
  • Priority changed from Immediate to Urgent

Packages were downgraded because the system was unable to reach download.opensuse.org. I tried with

sed -i 's/http:/https:/g' /etc/zypp/repos.d/*.repo && zypper ref

but same problem.
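
The sed call above rewrites every "http:" in the repo files in place. A narrower variant, sketched here under the assumption that the repo files use plain `baseurl=http://...` lines, touches only those lines and prints the result instead of editing the file, so the change can be reviewed first. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: preview switching baseurl entries from http to https.
# Prints the rewritten file to stdout; the file itself is not modified.
preview_https_baseurl() {
    # Only lines starting with "baseurl=http:" are changed; everything else
    # is passed through untouched.
    sed -E 's#^(baseurl=)http:#\1https:#' "$1"
}
```

Once the preview looks right, the same expression could be applied with `sed -i`, as in the original command.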

Working on this with tina, we installed the openQA and perl-Mojolicious packages from the local zypper cache:

zypper --no-refresh in --oldpackage --allow-vendor-change /var/cache/zypp/packages/devel_openQA/x86_64/openQA-*-4.6.1699952945.e6799a9-lp155.6163.1.x86_64.rpm /var/cache/zypp/packages/devel_openQA_Leap/noarch/perl-Mojolicious-9.340.0-lp154.2.1.noarch.rpm

We then retriggered the affected incomplete jobs with openqa-advanced-retrigger-jobs. openQA is back in action.

Actions #5

Updated by okurz about 1 year ago

  • Related to action #150845: openqaworker-arm22 broken due to packages automatically removed size:M added
Actions #6

Updated by tinita about 1 year ago

  • Tags changed from o3 to o3, infra
Actions #7

Updated by tinita about 1 year ago

We force merged https://github.com/os-autoinst/openQA/pull/5361 which fixes the autoupdate issue, so after that we shouldn't see unwanted downgrades anymore.

I will monitor package build and publishing and then enable autoupdate again

Actions #8

Updated by tinita about 1 year ago

new packages arrived at http://download.opensuse.org/repositories/devel:/openQA/15.5/x86_64/
Started openqa-auto-update service, it's currently updating a lot of packages

Actions #9

Updated by tinita about 1 year ago

  • Status changed from In Progress to Feedback

finished. I did systemctl start openqa-auto-update.service which ran the autoupdate, but I can't enable the service:

systemctl enable openqa-auto-update.service
The unit files have no installation config (WantedBy=, RequiredBy=, Also=,
Alias= settings in the [Install] section, and DefaultInstance= for template
units). This means they are not meant to be enabled using systemctl.

Maybe we should have disabled/enabled the .timer instead?

webui running fine again.

Actions #10

Updated by mkittler about 1 year ago

Yes, you only need to enable --now the timer unit.

EDIT: I see you have already done that now. So this ticket can supposedly be considered resolved.
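
The error in comment #9 is what systemd reports when a unit file has no [Install] section. A quick way to see which of the two units can be enabled is to check the unit files for that section. This is a sketch only: the function name is hypothetical, and the unit file locations depend on how the package installs them.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given unit file has an [Install] section,
# i.e. if "systemctl enable" has something to act on.
has_install_section() {
    grep -q '^\[Install\]' "$1"
}

# Example (paths are assumptions, adjust to the actual unit file locations):
#   has_install_section /usr/lib/systemd/system/openqa-auto-update.service || echo "service: not enableable"
#   has_install_section /usr/lib/systemd/system/openqa-auto-update.timer && echo "timer: enableable"
```

Assuming the timer unit is named after the service, re-enabling the automatic update would then be `systemctl enable --now openqa-auto-update.timer`.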

Actions #11

Updated by tinita about 1 year ago

I wonder why we didn't get a notification from logwarn. We also had this issue yesterday for about 15 minutes.

[2023-11-14T13:14:52.868112Z] [error] [ss5maBT9dSgc] DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR:  column me.backend_info does not exist
...
[2023-11-14T13:30:09.810755Z] [error] [7QU0oTvJCIqm] DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR:  column me.backend_info does not exist
Actions #12

Updated by okurz about 1 year ago

  • Priority changed from Urgent to High

Reducing prio as the urgency was resolved with all your actions, thanks! So, resolve or do you want to look into the logwarn errors?

Actions #13

Updated by tinita about 1 year ago

I checked with a test logfile whether logwarn would report the lines, and it does. I also don't see any errors from cron in the root mailbox, so currently I'm out of ideas why it didn't report...

Actions #14

Updated by livdywan about 1 year ago

  • Subject changed from o3 "Unable to fetch build results" and "Internal server error" on some pages to o3 "Unable to fetch build results" and "Internal server error" on some pages size:M
  • Status changed from Feedback to Resolved

Everything works as intended. Awesome!

Actions #15

Updated by tinita about 1 year ago

  • Related to action #151013: o3 yielding "502 Bad Gateway" from nginx 2023-11-19, why was the config overwritten? size:M added