action #121378

open

load_templates sometimes fails with "unknown error code", then works after a while

Added by AdamWill over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version:
Start date: 2022-12-02
Due date:
% Done: 0%
Estimated time:

Description

On our Fedora deployments, I often notice that a run of load_templates fails with:

unknown error code - host localhost unreachable? at /usr/share/openqa/script/load_templates line 114.

If I run it again, it often fails the same way, but after a few tries it suddenly works fine. There doesn't seem to be any rhyme or reason to when this happens, and I don't see anything in the logs.

If I edit load_templates to dump the response before dying with that rather unhelpful message, I get this:

do {
  my $a = bless({
    content  => bless({
                  asset   => bless({ auto_upgrade => 1 }, "Mojo::Asset::Memory"),
                  events  => { read => [sub { ... }] },
                  headers => bless({ headers => {} }, "Mojo::Headers"),
                  read    => 'fix',
                }, "Mojo::Content::Single"),
    error    => { message => "Premature connection close" },
    events   => {},
    finished => 2,
    json     => undef,
    state    => "finished",
  }, "Mojo::Message::Response");
  $a->{content}{read} = $a->{content}{events}{read}[0];
  $a;
}

so the error seems to be "Premature connection close".
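The dump shows that the real cause is already available in the response's error hash, so the script could report it instead of "unknown error code". The following is only a minimal sketch: `describe_error` is a hypothetical helper, not something load_templates contains, and it assumes the hash has the shape seen in the dump (a `message` key, plus a `code` key when the server actually answered with an HTTP status).

```perl
use strict;
use warnings;

# Hypothetical helper: turn the error hash of a Mojo::Message::Response
# (the "error" entry in the dump) into a readable one-line message.
sub describe_error {
    my ($err) = @_;
    return 'no error recorded' unless ref $err eq 'HASH';
    my $message = $err->{message} // 'unknown error';
    # "code" is assumed to be present only when the server sent a status
    return defined $err->{code}
        ? "$err->{code} response: $message"
        : "connection error: $message";
}

# Error hash copied from the dump in this report
print describe_error({ message => 'Premature connection close' }), "\n";
# prints "connection error: Premature connection close"
```

With something like this in place of the generic message, the failure would at least point at the connection being closed early rather than at an unreachable host.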

Actions #1

Updated by okurz over 1 year ago

  • Category set to Regressions/Crashes
  • Target version set to future

So I assume a valid workaround is to retry?

Actions #2

Updated by livdywan over 1 year ago

AdamWill wrote:

unknown error code - host localhost unreachable? at /usr/share/openqa/script/load_templates line 114.

if I run it again, it'll often fail the same way, but after a few tries, it'll suddenly work fine. There doesn't seem to be any rhyme or reason to when this happens, and I don't see anything in the logs.

It sounds to me like the server is exhibiting a problem and this is the symptom. It would be good if you could check the web UI logs from around that time to see if there are any errors.

Actions #3

Updated by AdamWill over 1 year ago

Forgot to mention that - I did check the server logs and there are no errors. I have also tried just loading regular pages while I'm seeing the errors, and never had a problem doing that.

Yes, retrying eventually makes it work, but that is a problem when, for example, our Ansible scripts run a template load automatically (and crash out if it fails), or when I run it manually and walk away without noticing that it failed.
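For the automated case, the retry workaround could be wrapped in a small script so that a single transient failure does not abort the whole run. This is a sketch under assumptions, not a fix for the underlying problem: `retry` is a hypothetical helper, and the 5 attempts and 10 second delay are arbitrary values chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: call $attempt up to $max_tries times, sleeping
# $delay seconds between tries. Returns the number of the try that
# succeeded, or 0 if every try failed.
sub retry {
    my ($attempt, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        return $try if $attempt->();
        warn "attempt $try of $max_tries failed\n";
        sleep $delay if $try < $max_tries;
    }
    return 0;
}

# When called with arguments, treat them as the command to retry, e.g.
# the load_templates invocation that the deployment normally runs.
if (@ARGV) {
    # system() returns 0 when the command exits successfully
    my $ok = retry(sub { system(@ARGV) == 0 }, 5, 10);
    exit($ok ? 0 : 1);
}
```

The wrapper still exits non-zero when all attempts fail, so a calling script would continue to notice a persistent failure.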

Actions #4

Updated by AdamWill over 1 year ago

For example, I just hit this bug twice in a row on our prod instance. I loaded the homepage in a browser: no problem. I checked the openQA server logs: the last message was from five minutes earlier, long before I hit the problem. There is nothing in the Apache error logs either.
