action #39068
closed
Webui killed by out of memory in o3 (triggered by postgresql)
Start date:
2018-08-01
Due date:
% Done:
0%
Estimated time:
Description
I just noticed that the o3 webui was down. Looking at the journal, there are a lot of the following messages:
Aug 01 16:03:35 ariel openqa[29052]: Use of uninitialized value $distri in string eq at template
Aug 01 16:03:35 ariel openqa[29052]: branding/openSUSE/external_reporting.html.ep line 103 (#1)
Aug 01 16:07:37 ariel openqa[29052]: (in cleanup) Can't call method "stream" on an undefined value at
Aug 01 16:07:37 ariel openqa[29052]: /usr/lib/perl5/vendor_perl/5.18.2/Mojo/RabbitMQ/Client.pm line 544 during global destruction (#2)
Aug 01 16:07:37 ariel openqa[29052]: (W misc) This prefix usually indicates that a DESTROY() method raised
Aug 01 16:07:37 ariel openqa[29052]: the indicated exception. Since destructors are usually called by the
Aug 01 16:07:37 ariel openqa[29052]: system at arbitrary points during execution, and often a vast number of
Aug 01 16:07:37 ariel openqa[29052]: times, the warning is issued only once for any number of failures that
Aug 01 16:07:37 ariel openqa[29052]: would otherwise result in the same message being repeated.
Aug 01 16:07:37 ariel openqa[29052]:
Aug 01 16:07:37 ariel openqa[29052]: Failure of user callbacks dispatched using the G_KEEPERR flag could
Aug 01 16:07:37 ariel openqa[29052]: also result in this warning. See "G_KEEPERR" in perlcall.
Aug 01 16:07:37 ariel openqa[29052]:
Aug 01 16:15:12 ariel openqa[29052]: DBIx::Class::Storage::DBI::_gen_sql_bind(): DateTime objects passed to search() are not supported properly (InflateColumn::DateTime formats and settings are not respected.) See ".. format a Dat
Aug 01 17:47:51 ariel openqa[29052]: (in cleanup) Can't call method "stream" on an undefined value at
Aug 01 17:47:51 ariel openqa[29052]: /usr/lib/perl5/vendor_perl/5.18.2/Mojo/RabbitMQ/Client.pm line 544 during global destruction (#1)
Aug 01 17:47:51 ariel openqa[29052]: (W misc) This prefix usually indicates that a DESTROY() method raised
Aug 01 17:47:51 ariel openqa[29052]: the indicated exception. Since destructors are usually called by the
Aug 01 17:47:51 ariel openqa[29052]: system at arbitrary points during execution, and often a vast number of
Aug 01 17:47:51 ariel openqa[29052]: times, the warning is issued only once for any number of failures that
Aug 01 17:47:51 ariel openqa[29052]: would otherwise result in the same message being repeated.
Aug 01 17:47:51 ariel openqa[29052]:
Aug 01 17:47:51 ariel openqa[29052]: Failure of user callbacks dispatched using the G_KEEPERR flag could
Aug 01 17:47:51 ariel openqa[29052]: also result in this warning. See "G_KEEPERR" in perlcall.
Aug 01 17:47:51 ariel openqa[29052]:
Indeed, the openqa process was killed: the app died because it was not able to fork. I wonder if there's a memory leak:
Aug 01 21:09:16 ariel openqa[29052]: Can't fork: Cannot allocate memory at
Aug 01 21:09:16 ariel openqa[29052]: /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Prefork.pm line 142 (#1)
Aug 01 21:09:16 ariel openqa[29052]: (F) A fatal error occurred while trying to fork while opening a
Aug 01 21:09:16 ariel openqa[29052]: pipeline.
Aug 01 21:09:16 ariel openqa[29052]:
Aug 01 21:09:16 ariel openqa[29052]: Uncaught exception from user code:
Aug 01 21:09:16 ariel openqa[29052]: Can't fork: Cannot allocate memory at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Prefork.pm line 142.
Aug 01 21:09:16 ariel openqa[29052]: Mojo::Server::Prefork::_spawn('Mojo::Server::Prefork=HASH(0x9caabe0)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Prefork.pm line 100
Aug 01 21:09:16 ariel openqa[29052]: Mojo::Server::Prefork::_manage('Mojo::Server::Prefork=HASH(0x9caabe0)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Server/Prefork.pm line 85
Aug 01 21:09:16 ariel openqa[29052]: Mojo::Server::Prefork::run('Mojo::Server::Prefork=HASH(0x9caabe0)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Command/prefork.pm line 31
Aug 01 21:09:16 ariel openqa[29052]: Mojolicious::Command::prefork::run('Mojolicious::Command::prefork=HASH(0x9cae588)', '--proxy', '-i', 100, '-H', 400, '-w', 20, '-G', ...) called at /usr/lib/perl5/vendor_perl/5.18.2/Moj
Aug 01 21:09:16 ariel openqa[29052]: Mojolicious::Commands::run('Mojolicious::Commands=HASH(0x8a33ef0)', 'prefork', '-m', 'production', '--proxy', '-i', 100, '-H', 400, ...) called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo
Aug 01 21:09:16 ariel openqa[29052]: Mojolicious::start('OpenQA::WebAPI=HASH(0x1700280)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojolicious/Commands.pm line 71
Aug 01 21:09:16 ariel openqa[29052]: Mojolicious::Commands::start_app('Mojolicious::Commands', 'OpenQA::WebAPI') called at /usr/share/openqa/script/../lib/OpenQA/WebAPI.pm line 486
Aug 01 21:09:16 ariel openqa[29052]: OpenQA::WebAPI::run() called at /usr/share/openqa/script/openqa line 34
Aug 01 21:09:17 ariel systemd[1]: openqa-webui.service: Main process exited, code=exited, status=12/n/a
Aug 01 21:09:18 ariel systemd[1]: openqa-webui.service: Unit entered failed state.
Aug 01 21:09:18 ariel systemd[1]: openqa-webui.service: Failed with result 'exit-code'.
Files
Updated by szarate over 6 years ago
- Subject changed from out of memory in o3 to Webui killed by out of memory in o3
Updated by szarate over 6 years ago
I think we need to revisit the actual parameters we use to start our openQA instance, as it looks like either Mojo or Apache cannot cope with them.
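To make the "revisit the parameters" point concrete, here is a rough Python sketch that pulls the short flags out of the argument list visible in the stack trace above and multiplies the worker count by a per-worker memory figure. It assumes that `-w` is the prefork worker count, which is how I read the Mojolicious prefork command; the 300 MiB per-worker figure is a made-up placeholder, not a measurement from o3, and the `-G` value is left out because it is truncated in the log.

```python
def parse_short_flags(argv):
    """Pair each short flag (e.g. '-w') with the value that follows it."""
    flags = {}
    i = 0
    while i < len(argv):
        token = str(argv[i])
        is_short_flag = token.startswith("-") and not token.startswith("--")
        has_value = i + 1 < len(argv) and not str(argv[i + 1]).startswith("-")
        if is_short_flag and has_value:
            flags[token] = argv[i + 1]
            i += 2
        else:
            i += 1
    return flags


def worst_case_worker_memory_mib(workers, rss_per_worker_mib):
    """Upper bound if every worker grows to the given resident size."""
    return workers * rss_per_worker_mib


# Arguments as they appear in the journal stack trace (the '-G' value is elided there).
argv = ["prefork", "-m", "production", "--proxy", "-i", 100, "-H", 400, "-w", 20]
flags = parse_short_flags(argv)

# ASSUMPTION: 300 MiB per worker is a hypothetical placeholder, not measured on o3.
ASSUMED_RSS_PER_WORKER_MIB = 300
budget = worst_case_worker_memory_mib(flags["-w"], ASSUMED_RSS_PER_WORKER_MIB)
print(f"{flags['-w']} workers x {ASSUMED_RSS_PER_WORKER_MIB} MiB = {budget} MiB")
```

Replacing the placeholder with the real per-worker RSS (for example from `ps` on the host) would show whether the worker count alone can exhaust memory, before even counting PostgreSQL.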
Updated by szarate over 6 years ago
- Priority changed from Normal to Urgent
- Target version set to Current Sprint
Updated by szarate over 6 years ago
- Related to action #39743: [o3][tools] o3 unusable, often responds with 504 Gateway Time-out added
Updated by szarate over 6 years ago
- Related to action #39629: openQA Scheduler refactor fallout added
Updated by szarate over 6 years ago
By this point in time, o3 already had the blocked_by calculation in place, and I have not seen the OOM killer start again after commenting out the blocked_by calculation / deploying the old scheduler.
Updated by szarate over 6 years ago
- File dmesg_03.txt added
- Priority changed from Urgent to High
This one needs investigation:
[Tue Aug 21 11:40:18 2018] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
[Tue Aug 21 11:40:18 2018] postgres cpuset=/ mems_allowed=0
Perhaps PostgreSQL needs tuning too.
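For going through a longer dmesg dump, a small Python sketch like the following could list which processes invoked the OOM killer and when. The regular expression is written against the line format quoted above only; other kernel versions may print the fields differently, and the function name is just illustrative. Note that the invoking process is the one whose allocation failed, which is not necessarily the process that ends up being killed.

```python
import re

# Matches lines like the dmesg excerpt quoted above.
OOM_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+(?P<proc>.+?) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+)\((?P<gfp_flags>[^)]*)\), "
    r".*?order=(?P<order>\d+), oom_score_adj=(?P<adj>-?\d+)"
)


def parse_oom_invocations(lines):
    """Return one dict per 'invoked oom-killer' line found in the input."""
    events = []
    for line in lines:
        match = OOM_RE.match(line.strip())
        if not match:
            continue
        events.append({
            "when": match.group("when"),
            "process": match.group("proc"),
            "gfp_mask": match.group("gfp_mask"),
            "gfp_flags": match.group("gfp_flags").split("|"),
            "order": int(match.group("order")),
            "oom_score_adj": int(match.group("adj")),
        })
    return events


sample = [
    "[Tue Aug 21 11:40:18 2018] postgres invoked oom-killer: "
    "gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, "
    "order=0, oom_score_adj=0",
    "[Tue Aug 21 11:40:18 2018] postgres cpuset=/ mems_allowed=0",
]
events = parse_oom_invocations(sample)
for event in events:
    print(event["when"], event["process"], event["gfp_flags"])
```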
Updated by szarate over 6 years ago
- Subject changed from Webui killed by out of memory in o3 to Webui killed by out of memory in o3 (triggered by postgresql)
Updated by coolo over 6 years ago
- Status changed from New to Rejected
It's hard to say what was going on there at that time, so dropping this.
Updated by coolo over 6 years ago
- Target version changed from Current Sprint to Done