action #90161
closed
[Alerting] malbec: Memory usage alert triggered briefly and turned OK within the next minute
Start date: 2021-03-16
Due date:
% Done: 0%
Estimated time:
Description
Observation
[Alerting] malbec: Memory usage alert
Metric name: available
Value: 1848351129.600
The alert turned back to OK within the next minute.
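For a quick sense of scale, the reported value can be converted to GiB. This is only a sketch: it assumes the "available" metric is reported in bytes, which the alert text does not state, and the helper name `to_gib` is made up for illustration.

```python
# Assumption: the "available" metric from the alert is in bytes.
# The alert text itself does not state the unit.
available_bytes = 1848351129.600


def to_gib(n_bytes: float) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


available_gib = to_gib(available_bytes)
print(f"available: {available_gib:.2f} GiB")
```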
Suggestion
- See if this comes up again (was it flaky, was it a one-off)
- Find a related open issue (the alert was not mentioned in any open tickets as of this ticket being filed)
Updated by okurz over 3 years ago
- Project changed from openQA Project to openQA Infrastructure
- Status changed from New to Workable
- Priority changed from Normal to High
- Target version set to Ready
The Definition of Done for alert tickets should be clear, so setting to "Workable" without further explicit acceptance criteria (ACs).
Updated by okurz over 3 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Updated by okurz over 3 years ago
https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?viewPanel=12054&orgId=1&from=1615877632079&to=1615880700634 shows that the available memory went down to 0 at 2021-03-16 08:30
Found OOM in system journal:
Mar 16 08:30:12 malbec kernel: ntpd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 16 08:30:12 malbec kernel: CPU: 0 PID: 11378 Comm: ntpd Kdump: loaded Not tainted 5.3.18-lp152.57-default #1 openSUSE Leap 15.2
Mar 16 08:30:12 malbec kernel: Call Trace:
Mar 16 08:30:12 malbec kernel: [c000000fe9be75b0] [c000000000d0cd28] dump_stack+0xbc/0x104 (unreliable)
Mar 16 08:30:12 malbec kernel: [c000000fe9be75f0] [c000000000383c20] dump_header+0x60/0x2e0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7680] [c00000000038444c] oom_kill_process+0x19c/0x2c0
Mar 16 08:30:12 malbec kernel: [c000000fe9be76c0] [c0000000003856c4] out_of_memory+0x114/0x720
Mar 16 08:30:12 malbec kernel: [c000000fe9be7760] [c000000000402a08] __alloc_pages_slowpath+0xa78/0xe20
Mar 16 08:30:12 malbec kernel: [c000000fe9be7910] [c0000000004030c8] __alloc_pages_nodemask+0x318/0x3e0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7990] [c000000000428300] alloc_pages_current+0xa0/0x140
Mar 16 08:30:12 malbec kernel: [c000000fe9be79d0] [c000000000377b68] __page_cache_alloc+0xb8/0x120
Mar 16 08:30:12 malbec kernel: [c000000fe9be7a00] [c00000000037b258] pagecache_get_page+0x128/0x490
Mar 16 08:30:12 malbec kernel: [c000000fe9be7a60] [c00000000037c53c] filemap_fault+0x51c/0xd50
Mar 16 08:30:12 malbec kernel: [c000000fe9be7b80] [c0000000003d21fc] __do_fault+0x5c/0x200
Mar 16 08:30:12 malbec kernel: [c000000fe9be7bc0] [c0000000003d9494] __handle_mm_fault+0x1144/0x1af0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7cb0] [c0000000003d9f6c] handle_mm_fault+0x12c/0x220
Mar 16 08:30:12 malbec kernel: [c000000fe9be7cf0] [c00000000007a408] __do_page_fault+0x298/0xf10
Mar 16 08:30:12 malbec kernel: [c000000fe9be7de0] [c00000000007b0b8] do_page_fault+0x38/0xc0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7e20] [c00000000000a908] handle_page_fault+0x10/0x30
Mar 16 08:30:12 malbec kernel: Mem-Info:
Mar 16 08:30:12 malbec kernel: active_anon:1401138 inactive_anon:628449 isolated_anon:0
active_file:214 inactive_file:0 isolated_file:0
unevictable:4480 dirty:7 writeback:0 unstable:0
slab_reclaimable:2945 slab_unreclaimable:12285
mapped:427 shmem:844311 pagetables:486 bounce:0
free:10996 free_pcp:0 free_cma:39
…
Mar 16 08:30:13 malbec kernel: [ 49233] 479 49233 616370 525869 4550656 0 0 qemu-system-ppc
Mar 16 08:30:13 malbec kernel: [ 49298] 479 49298 4884 4523 90112 0 0 perl
Mar 16 08:30:13 malbec kernel: [ 49304] 479 49304 1846 1429 63488 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 49312] 479 49312 43424 3329 133120 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 49331] 479 49331 51625 11504 200704 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 49350] 479 49350 21086 171 63488 0 0 videoencoder
Mar 16 08:30:13 malbec kernel: [ 49366] 479 49366 610272 525752 4487168 0 0 qemu-system-ppc
Mar 16 08:30:13 malbec kernel: [ 52958] 479 52958 1218 1026 57344 0 0 worker
Mar 16 08:30:13 malbec kernel: [ 53324] 479 53324 2834 2484 73728 0 0 perl
Mar 16 08:30:13 malbec kernel: [ 53334] 479 53334 1784 1411 65536 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 53574] 479 53574 42295 2241 124928 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 53575] 479 53575 47838 7658 169984 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 53612] 479 53612 21086 124 65536 0 0 videoencoder
Mar 16 08:30:13 malbec kernel: [ 53628] 479 53628 93345 31529 468992 0 0 qemu-system-ppc
Mar 16 08:30:13 malbec kernel: [ 55187] 51 55187 302 64 51200 0 0 pickup
Mar 16 08:30:13 malbec kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,16-17,global_oom,task_memcg=/,task=qemu-system->
Mar 16 08:30:13 malbec kernel: Out of memory: Killed process 49233 (qemu-system-ppc) total-vm:39447680kB, anon-rss:33653376kB, file-rss:2176kB, shmem-r
The corresponding openQA test is https://openqa.suse.de/tests/5674784, which unsurprisingly ended up incomplete with the reason "backend died: QEMU exited unexpectedly, see log for details" and was labeled by auto-review with #71188.
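To see which processes held the memory, the task rows of the OOM report can be parsed and ranked by resident size. The following is a minimal sketch, not part of any openQA tooling. It makes two assumptions: the column order is pid, uid, tgid, total_vm, rss, pgtables_bytes, swapents, oom_score_adj, name (the header row is not part of the excerpt above), and pages are 64 KiB. The page size is inferred from the log itself, since total_vm 616370 pages for PID 49233 matches total-vm:39447680kB in the kill message. The names `TASK_ROW`, `parse_tasks` and `SAMPLE` are hypothetical.

```python
import re

# Assumption: 64 KiB pages, inferred from the log
# (616370 pages * 64 KiB == total-vm:39447680kB for PID 49233).
PAGE_KIB = 64

# Assumed column order (the header row is not in the excerpt):
# [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_ROW = re.compile(
    r"kernel: \[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def parse_tasks(lines):
    """Return task rows as dicts, largest resident set first."""
    tasks = []
    for line in lines:
        m = TASK_ROW.search(line)
        if m is None:
            continue  # not a task row (call trace, Mem-Info, ...)
        tasks.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "total_vm_kib": int(m["total_vm"]) * PAGE_KIB,
            "rss_kib": int(m["rss"]) * PAGE_KIB,
        })
    return sorted(tasks, key=lambda t: t["rss_kib"], reverse=True)


# Three rows copied from the journal excerpt above.
SAMPLE = [
    "Mar 16 08:30:13 malbec kernel: [ 49233] 479 49233 616370 525869 4550656 0 0 qemu-system-ppc",
    "Mar 16 08:30:13 malbec kernel: [ 49366] 479 49366 610272 525752 4487168 0 0 qemu-system-ppc",
    "Mar 16 08:30:13 malbec kernel: [ 53628] 479 53628 93345 31529 468992 0 0 qemu-system-ppc",
]

ranked = parse_tasks(SAMPLE)
for task in ranked:
    print(f"{task['pid']:>6} {task['name']:<16} rss={task['rss_kib'] / 2**20:.1f} GiB")
```

Under these assumptions, the two largest qemu-system-ppc processes each hold roughly 32 GiB resident, and the killed process 49233 is the largest of them.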
Updated by okurz over 3 years ago
- Copied to action #90974: Make it obvious if qemu gets terminated unexpectedly due to out-of-memory added
Updated by okurz over 3 years ago
- Status changed from In Progress to Resolved
Created new feature request #90974 as follow-up. With this and the alert resolved we can resolve this ticket.