https://progress.opensuse.org/
2021-03-16T12:35:18Z
openSUSE Project Management Tool
openQA Infrastructure - action #90161: [Alerting] malbec: Memory usage alert triggered briefly and turned OK within the next minute
https://progress.opensuse.org/issues/90161?journal_id=391958
2021-03-16T12:35:18Z
okurz
okurz@suse.com
<ul><li><strong>Project</strong> changed from <i>openQA Project</i> to <i>openQA Infrastructure</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Workable</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Target version</strong> set to <i>Ready</i></li></ul><p>The Definition of Done for alert tickets should be clear, so setting this to "Workable" without further explicit acceptance criteria (ACs).</p>
https://progress.opensuse.org/issues/90161?journal_id=395138
2021-04-06T14:50:09Z
mkittler
marius.kittler@suse.com
<ul></ul><p>It hasn't happened again yet.</p>
https://progress.opensuse.org/issues/90161?journal_id=396500
2021-04-12T07:46:40Z
okurz
okurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li></ul><p>The related panel is <a href="https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?viewPanel=12054&orgId=1&refresh=1m" class="external">https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?viewPanel=12054&orgId=1&refresh=1m</a></p>
https://progress.opensuse.org/issues/90161?journal_id=396506
2021-04-12T08:03:20Z
okurz
okurz@suse.com
<ul></ul><p><a href="https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?viewPanel=12054&orgId=1&from=1615877632079&to=1615880700634">https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?viewPanel=12054&orgId=1&from=1615877632079&to=1615880700634</a> shows that the available memory went down to 0 at 2021-03-16 08:30</p>
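<p>The <code>from</code> and <code>to</code> parameters in the dashboard link above are Unix epoch timestamps in milliseconds, which is how Grafana encodes an absolute time range. The sketch below decodes them and checks that the window covers the OOM kill that the kernel journal quoted in this comment stamps at 08:30:12. It assumes, without confirmation from the ticket, that malbec's journal prints local time (CET, UTC+1, in mid-March 2021); under that assumption 08:30 local is 07:30 UTC.</p>

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def grafana_time_range(url):
    """Return (start, end) as UTC datetimes for an absolute Grafana time range.

    The from/to query parameters hold Unix epoch timestamps in milliseconds;
    integer timedelta arithmetic keeps the conversion exact."""
    query = parse_qs(urlparse(url).query)
    start = EPOCH + timedelta(milliseconds=int(query["from"][0]))
    end = EPOCH + timedelta(milliseconds=int(query["to"][0]))
    return start, end


URL = ("https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec"
       "?viewPanel=12054&orgId=1&from=1615877632079&to=1615880700634")

# Assumption: malbec's journal shows local time, i.e. CET (UTC+1) in mid-March 2021.
CET = timezone(timedelta(hours=1), "CET")
oom_time = datetime(2021, 3, 16, 8, 30, 12, tzinfo=CET)

start, end = grafana_time_range(URL)
print("window:", start.isoformat(), "to", end.isoformat())
print("OOM kill inside window:", start <= oom_time <= end)
```

<p>The window runs from about 06:54 to 07:45 UTC, so an OOM kill at 07:30 UTC sits well inside it, matching the drop in available memory on the panel.</p>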
<p>Found an out-of-memory (OOM) kill in the system journal:</p>
<pre><code>Mar 16 08:30:12 malbec kernel: ntpd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 16 08:30:12 malbec kernel: CPU: 0 PID: 11378 Comm: ntpd Kdump: loaded Not tainted 5.3.18-lp152.57-default #1 openSUSE Leap 15.2
Mar 16 08:30:12 malbec kernel: Call Trace:
Mar 16 08:30:12 malbec kernel: [c000000fe9be75b0] [c000000000d0cd28] dump_stack+0xbc/0x104 (unreliable)
Mar 16 08:30:12 malbec kernel: [c000000fe9be75f0] [c000000000383c20] dump_header+0x60/0x2e0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7680] [c00000000038444c] oom_kill_process+0x19c/0x2c0
Mar 16 08:30:12 malbec kernel: [c000000fe9be76c0] [c0000000003856c4] out_of_memory+0x114/0x720
Mar 16 08:30:12 malbec kernel: [c000000fe9be7760] [c000000000402a08] __alloc_pages_slowpath+0xa78/0xe20
Mar 16 08:30:12 malbec kernel: [c000000fe9be7910] [c0000000004030c8] __alloc_pages_nodemask+0x318/0x3e0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7990] [c000000000428300] alloc_pages_current+0xa0/0x140
Mar 16 08:30:12 malbec kernel: [c000000fe9be79d0] [c000000000377b68] __page_cache_alloc+0xb8/0x120
Mar 16 08:30:12 malbec kernel: [c000000fe9be7a00] [c00000000037b258] pagecache_get_page+0x128/0x490
Mar 16 08:30:12 malbec kernel: [c000000fe9be7a60] [c00000000037c53c] filemap_fault+0x51c/0xd50
Mar 16 08:30:12 malbec kernel: [c000000fe9be7b80] [c0000000003d21fc] __do_fault+0x5c/0x200
Mar 16 08:30:12 malbec kernel: [c000000fe9be7bc0] [c0000000003d9494] __handle_mm_fault+0x1144/0x1af0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7cb0] [c0000000003d9f6c] handle_mm_fault+0x12c/0x220
Mar 16 08:30:12 malbec kernel: [c000000fe9be7cf0] [c00000000007a408] __do_page_fault+0x298/0xf10
Mar 16 08:30:12 malbec kernel: [c000000fe9be7de0] [c00000000007b0b8] do_page_fault+0x38/0xc0
Mar 16 08:30:12 malbec kernel: [c000000fe9be7e20] [c00000000000a908] handle_page_fault+0x10/0x30
Mar 16 08:30:12 malbec kernel: Mem-Info:
Mar 16 08:30:12 malbec kernel: active_anon:1401138 inactive_anon:628449 isolated_anon:0
active_file:214 inactive_file:0 isolated_file:0
unevictable:4480 dirty:7 writeback:0 unstable:0
slab_reclaimable:2945 slab_unreclaimable:12285
mapped:427 shmem:844311 pagetables:486 bounce:0
free:10996 free_pcp:0 free_cma:39
…
Mar 16 08:30:13 malbec kernel: [ 49233] 479 49233 616370 525869 4550656 0 0 qemu-system-ppc
Mar 16 08:30:13 malbec kernel: [ 49298] 479 49298 4884 4523 90112 0 0 perl
Mar 16 08:30:13 malbec kernel: [ 49304] 479 49304 1846 1429 63488 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 49312] 479 49312 43424 3329 133120 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 49331] 479 49331 51625 11504 200704 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 49350] 479 49350 21086 171 63488 0 0 videoencoder
Mar 16 08:30:13 malbec kernel: [ 49366] 479 49366 610272 525752 4487168 0 0 qemu-system-ppc
Mar 16 08:30:13 malbec kernel: [ 52958] 479 52958 1218 1026 57344 0 0 worker
Mar 16 08:30:13 malbec kernel: [ 53324] 479 53324 2834 2484 73728 0 0 perl
Mar 16 08:30:13 malbec kernel: [ 53334] 479 53334 1784 1411 65536 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 53574] 479 53574 42295 2241 124928 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 53575] 479 53575 47838 7658 169984 0 0 /usr/bin/isotov
Mar 16 08:30:13 malbec kernel: [ 53612] 479 53612 21086 124 65536 0 0 videoencoder
Mar 16 08:30:13 malbec kernel: [ 53628] 479 53628 93345 31529 468992 0 0 qemu-system-ppc
Mar 16 08:30:13 malbec kernel: [ 55187] 51 55187 302 64 51200 0 0 pickup
Mar 16 08:30:13 malbec kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,16-17,global_oom,task_memcg=/,task=qemu-system->
Mar 16 08:30:13 malbec kernel: Out of memory: Killed process 49233 (qemu-system-ppc) total-vm:39447680kB, anon-rss:33653376kB, file-rss:2176kB, shmem-r
</code></pre>
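<p>In the task dump above, the <code>rss</code> column is counted in pages while <code>pgtables_bytes</code> is in bytes. The sketch below parses the task lines that are shown (the dump is elided, so this is not the full process list) and ranks them with a simplified version of the kernel's <code>oom_badness()</code> for <code>oom_score_adj</code> 0: resident pages plus swap entries plus page-table pages. It assumes 64 KiB pages, the usual page size on ppc64le kernels; the arithmetic supports this, since 525869 pages of 64 KiB come to within one page of the <code>anon-rss</code> plus <code>file-rss</code> reported in the kill line.</p>

```python
import re

# Column layout of the kernel's OOM task dump in 5.x kernels:
# [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
# total_vm, rss and swapents are in pages; pgtables_bytes is in bytes.
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S.*)$"
)

# Assumption: 64 KiB pages, the usual page size of ppc64le kernels.
PAGE_SIZE = 64 * 1024


def parse_tasks(lines):
    """Turn task-dump lines into dicts; non-matching lines are skipped."""
    tasks = []
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            fields = m.groupdict()
            name = fields.pop("name")
            task = {key: int(value) for key, value in fields.items()}
            task["name"] = name
            tasks.append(task)
    return tasks


def badness(task):
    """Simplified oom_badness() for oom_score_adj == 0: pages the kernel
    would free by killing the task (RSS + swap entries + page tables)."""
    return task["rss"] + task["swapents"] + task["pgtables_bytes"] // PAGE_SIZE


DUMP = [
    "[ 49233] 479 49233 616370 525869 4550656 0 0 qemu-system-ppc",
    "[ 49298] 479 49298 4884 4523 90112 0 0 perl",
    "[ 49366] 479 49366 610272 525752 4487168 0 0 qemu-system-ppc",
    "[ 53628] 479 53628 93345 31529 468992 0 0 qemu-system-ppc",
]

victim = max(parse_tasks(DUMP), key=badness)
rss_kb = victim["rss"] * PAGE_SIZE // 1024
print(victim["pid"], victim["name"], f"{rss_kb} kB = {rss_kb / 1024**2:.1f} GiB")
```

<p>Among the tasks shown, PID 49233 scores highest, which matches the process the kernel killed; the second QEMU instance (PID 49366) is only a few hundred pages behind, so either VM could plausibly have been chosen.</p>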
<p>The corresponding openQA test is <a href="https://openqa.suse.de/tests/5674784">https://openqa.suse.de/tests/5674784</a>, which, as expected, ended as incomplete with the reason "backend died: QEMU exited unexpectedly, see log for details" and was labeled by auto-review with <a class="issue tracker-4 status-12 priority-3 priority-lowest child" title="action: job incomplete with auto_review:&quot;backend died: QEMU exited unexpectedly, see log for details&quot; and... (Workable)" href="https://progress.opensuse.org/issues/71188">#71188</a>.</p>
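<p>The label works because the ticket title carries an <code>auto_review:"…"</code> expression that is matched against the reason of incomplete jobs. The sketch below illustrates that idea in simplified form only; the helper names are hypothetical and the actual openQA auto-review tooling differs in detail. It also shows why the match gives no hint about OOM: the generic reason fits any unexpected QEMU exit.</p>

```python
import re

# Hypothetical sketch: extract the quoted auto_review:"..." expression from a
# ticket title (allowing backslash-escaped quotes) and use it as a regex.
AUTO_REVIEW_RE = re.compile(r'auto_review:"(?P<expr>(?:[^"\\]|\\.)*)"')


def auto_review_expr(ticket_title):
    """Return the auto_review expression from a ticket title, or None."""
    m = AUTO_REVIEW_RE.search(ticket_title)
    return m.group("expr") if m else None


def matches(ticket_title, job_reason):
    """True if the ticket's auto_review expression matches the job reason."""
    expr = auto_review_expr(ticket_title)
    return expr is not None and re.search(expr, job_reason) is not None


TITLE = ('job incomplete with auto_review:"backend died: QEMU exited '
         'unexpectedly, see log for details" and...')
REASON = "backend died: QEMU exited unexpectedly, see log for details"
print(matches(TITLE, REASON))
```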
https://progress.opensuse.org/issues/90161?journal_id=396521
2021-04-12T08:35:05Z
okurz
okurz@suse.com
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" href="/issues/90974">action #90974</a>: Make it obvious if qemu gets terminated unexpectedly due to out-of-memory</i> added</li></ul>
https://progress.opensuse.org/issues/90161?journal_id=396527
2021-04-12T08:35:38Z
okurz
okurz@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Created new feature request <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Make it obvious if qemu gets terminated unexpectedly due to out-of-memory (Resolved)" href="https://progress.opensuse.org/issues/90974">#90974</a> as a follow-up. With that in place and the alert resolved, we can resolve this ticket.</p>
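<p>One way the follow-up could make an OOM kill obvious is to look for the kernel's "Out of memory: Killed process" report naming the QEMU PID once QEMU exits unexpectedly. The sketch below is hypothetical: the function name is made up, and it assumes the worker knows the QEMU PID and can read kernel journal lines (for example from <code>journalctl -k</code>). Because the kernel truncates command names to 15 characters, <code>qemu-system-ppc64</code> appears as <code>qemu-system-ppc</code>, so the name is matched by prefix.</p>

```python
import re

# Kernel OOM kill report. The command name is cut to 15 characters
# (TASK_COMM_LEN is 16 including the NUL byte), so qemu-system-ppc64
# shows up as "qemu-system-ppc".
KILLED_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)"
    r"(?: total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB)?"
)


def find_oom_kill(journal_lines, qemu_pid):
    """Hypothetical helper: return the OOM report fields if the kernel killed
    the QEMU process with PID qemu_pid, otherwise None."""
    for line in journal_lines:
        m = KILLED_RE.search(line)
        if (m and int(m.group("pid")) == qemu_pid
                and m.group("comm").startswith("qemu-system-")):
            return m.groupdict()
    return None


JOURNAL = [
    "Mar 16 08:30:13 malbec kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null)",
    "Mar 16 08:30:13 malbec kernel: Out of memory: Killed process 49233 (qemu-system-ppc) "
    "total-vm:39447680kB, anon-rss:33653376kB, file-rss:2176kB",
]

report = find_oom_kill(JOURNAL, 49233)
if report:
    gib = int(report["anon_rss_kb"]) / 1024**2
    print(f"QEMU (PID {report['pid']}) was killed by the kernel OOM killer, "
          f"anon-rss {gib:.1f} GiB")
```

<p>A message like this could then replace or extend the generic "QEMU exited unexpectedly" reason, so that such jobs no longer fall only under the broad auto-review ticket.</p>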