action #121378
open
load_templates sometimes fails with "unknown error code", then works after a while
0%
Description
On our Fedora deployments, I often notice that a run of load_templates fails with:
unknown error code - host localhost unreachable? at /usr/share/openqa/script/load_templates line 114.
If I run it again, it'll often fail the same way, but after a few tries, it'll suddenly work fine. There doesn't seem to be any rhyme or reason to when this happens, and I don't see anything in the logs.
If I edit load_templates to dump the response before dying with that rather unhelpful message, I get this:
do {
  my $a = bless({
    content => bless({
      asset => bless({ auto_upgrade => 1 }, "Mojo::Asset::Memory"),
      events => { read => [sub { ... }] },
      headers => bless({ headers => {} }, "Mojo::Headers"),
      read => 'fix',
    }, "Mojo::Content::Single"),
    error => { message => "Premature connection close" },
    events => {},
    finished => 2,
    json => undef,
    state => "finished",
  }, "Mojo::Message::Response");
  $a->{content}{read} = $a->{content}{events}{read}[0];
  $a;
}
So the actual error seems to be "Premature connection close".
Updated by okurz almost 2 years ago
- Category set to Regressions/Crashes
- Target version set to future
So I assume a valid workaround is to retry?
Updated by livdywan almost 2 years ago
AdamWill wrote:
unknown error code - host localhost unreachable? at /usr/share/openqa/script/load_templates line 114.
if I run it again, it'll often fail the same way, but after a few tries, it'll suddenly work fine. There doesn't seem to be any rhyme or reason to when this happens, and I don't see anything in the logs.
It sounds to me like the server is exhibiting a problem and this is the symptom. It would be good if you could check the web UI logs from around that time to see if there are any errors.
Updated by AdamWill almost 2 years ago
Forgot to mention that - I did check the server logs and there are no errors. I also have tried just loading regular pages when I'm seeing the errors, and never had a problem doing that.
Yes, retrying eventually makes it work, but it can be a problem when e.g. our ansible scripts automatically run a template load (and crash out if it fails), or just if I do it manually and walk away without noticing it failed.
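As a stopgap for the automated case, the retry workaround could be wrapped in something like the sketch below. This is only an illustration, not part of openQA: run_with_retry and its parameters are hypothetical names, the backoff values are arbitrary, and it assumes load_templates exits non-zero when it dies with the error above.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=2.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or attempts run out; return the last result.

    Hypothetical helper: runner and sleep are injectable only so the
    function can be exercised without running a real command.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Linear backoff: wait a little longer after each failure.
            sleep(delay * attempt)
    return result


# Example usage (the templates file path is a placeholder, not a real one):
#   result = run_with_retry(
#       ["/usr/share/openqa/script/load_templates", "/path/to/templates"])
#   if result.returncode != 0:
#       raise SystemExit(result.stderr)
```

This only hides the symptom, of course. It doesn't explain why the connection gets closed in the first place.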
Updated by AdamWill almost 2 years ago
For example, I just hit this bug twice in a row on our prod instance. I loaded the homepage in a browser - no problem. Checked the openQA server logs - no messages in the last five minutes, i.e. since long before I hit the problem. Nothing in the Apache error logs either.