openSUSE Project Management Tool (https://progress.opensuse.org/, 2019-01-10T12:58:09Z)
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178352, 2019-01-10T12:58:09Z, RBrownSUSE, rbrown@suse.com)
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/178352/diff?detail_id=177485">diff</a>)</li></ul>
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178364, 2019-01-10T13:25:57Z, okurz, okurz@suse.com)
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li></ul>
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178370, 2019-01-10T13:34:11Z, okurz, okurz@suse.com)
<ul><li><strong>Subject</strong> changed from <i>disk_boot test in Kubic textmode scenarios fail after 09th Jan</i> to <i>[functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/178370/diff?detail_id=177524">diff</a>)</li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Target version</strong> set to <i>Milestone 22</i></li></ul><p>Let me see.</p>
<ul>
<li>last good: build 20190105: <a href="https://openqa.opensuse.org/tests/826720/">https://openqa.opensuse.org/tests/826720/</a> running on imagetester:1, os-autoinst 4.5.1546602946.a7be7efa, test git hash f56b364adcd8154fc88bdb8bcbecc5de43195fbf</li>
<li>bad: same build: <a href="https://openqa.opensuse.org/tests/828139/">https://openqa.opensuse.org/tests/828139/</a> running on openqaworker1:10, os-autoinst 4.5.1546949268.cb3fa727, test git hash cf10b19af359c26b61e1b8f50e67c45d78e774df</li>
</ul>
<p>last good on openqaworker1: <a href="https://openqa.opensuse.org/tests/820899">https://openqa.opensuse.org/tests/820899</a> from 16 days ago.</p>
<ul>
<li><strong>E4-1</strong> Compare job settings: <strong>O4-1</strong>:</li>
</ul>
<pre><code>$ diff <(openqa_client_o3 --json-output jobs/826720 | jq '.job | .settings' | sort) <(openqa_client_o3 --json-output jobs/828139 | jq '.job | .settings' | sort)
18,19c18,19
< "MULTI_STEP_KUBIC_FLOW": "1",
< "NAME": "00826720-kubic-Tumbleweed-DVD-x86_64-Build20190105-microos_textmode@64bit-4G-HD40G",
---
> "MULTI_STEP_KUBIC_FLOW": "1"
> "NAME": "00828139-kubic-Tumbleweed-DVD-x86_64-Build20190105-microos_textmode@64bit-4G-HD40G",
31c31
< "TEST": "microos_textmode"
---
> "TEST": "microos_textmode",
</code></pre>
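<p>No significant difference: only the job-specific NAME differs, and the other reported lines differ only in a trailing comma, an artifact of sorting pretty-printed JSON line by line. -> <em>REJECT</em> <strong>H4</strong></p>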
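<p>To keep trailing commas and the job-specific NAME out of such a comparison, both dumps can be normalized before diffing. The following is only a sketch, assuming each settings dump was saved to a file with one <code>"KEY": "VALUE"</code> pair per line as in the output above. The function names <code>normalize_settings</code> and <code>compare_settings</code> are made up for illustration and are not part of any openQA tooling.</p>

```shell
# Hypothetical helper: strip indentation and trailing commas, drop the
# job-specific NAME entry, and sort the remaining lines.
normalize_settings() {
    sed -e 's/^[[:space:]]*//' -e 's/,[[:space:]]*$//' "$1" | grep -v '^"NAME":' | sort
}

# Hypothetical helper: print the remaining differences between two dumps.
# Prints nothing and returns 0 when the settings match.
compare_settings() {
    left=$(mktemp)
    right=$(mktemp)
    normalize_settings "$1" > "$left"
    normalize_settings "$2" > "$right"
    status=0
    diff "$left" "$right" || status=$?
    rm -f "$left" "$right"
    return "$status"
}
```

<p>Called as <code>compare_settings good.json bad.json</code> (both file names are placeholders), this prints only the settings that really differ between the two jobs.</p>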
<ul>
<li><strong>E3-1</strong> Check difference of os-autoinst: <strong>O3-1</strong></li>
</ul>
<pre><code>$ git log1 --no-merges a7be7efa..cb3fa727
036ab540 Add missing network_console.pm to Makefile
631d0f7a Do not incomplete on connection error with ssh based consoles
</code></pre>
<p>Neither commit looks related to booting from disk, so a regression in os-autoinst is unlikely.</p>
<ul>
<li><strong>E2.3-1</strong> Retrigger tests on production until the same worker as in "last good" picks the job: <a href="https://openqa.opensuse.org/tests/828316#live">https://openqa.opensuse.org/tests/828316#live</a> -> <em>passed</em> <a href="https://openqa.opensuse.org/tests/828316#step/disk_boot/1" class="external">disk_boot</a>, supporting <strong>H2.3</strong> and <strong>H6</strong>, <em>REJECT</em> <strong>H5</strong>, <em>REJECT</em> <strong>H1.2</strong> and <strong>H1</strong></li>
<li><strong>E2.3-2</strong> Check average load on our workers: <strong>O2.3-2</strong></li>
</ul>
<pre><code>$ for i in power8 aarch64 imagetester openqaworker1 openqaworker4 ; do echo -n "$i: " && ssh root@$i "cat /proc/loadavg"; done
power8: 8.19 8.37 7.64 8/1424 152970
aarch64: 1.22 1.35 1.27 2/866 24624
imagetester: 6.64 4.74 3.06 5/442 24515
openqaworker1: 10.80 12.80 10.53 17/1458 5511
openqaworker4: 8.19 10.86 9.71 6/1126 24461
$ for i in power8 aarch64 imagetester openqaworker1 openqaworker4 ; do echo -n "$i: " && ssh root@$i "cat /proc/loadavg"; done
power8: 8.35 8.63 7.96 8/1405 153270
aarch64: 0.45 1.00 1.15 1/806 24713
imagetester: 1.26 2.85 2.72 1/419 24721
openqaworker1: 6.49 9.38 9.72 8/1218 6257
openqaworker4: 10.08 9.82 9.50 11/1192 25258
</code></pre>
<p>The 15-minute load average on imagetester is 2.72-3.06, whereas on openqaworker1 and openqaworker4 it is 9.50-10.53. So the load is <em>significantly</em> higher on openqaworker1 and openqaworker4 than on imagetester. <em>REJECT</em> <strong>H2.1</strong> and <strong>H2.2</strong>.</p>
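<p>The figures compared here are the third number in each <code>/proc/loadavg</code> line. As a small sketch, the output of the loop above could be reduced to just that column. The helper name <code>long_term_load</code> is hypothetical, and it assumes input lines of the form <code>host: 1min 5min 15min running/total last_pid</code> as printed above.</p>

```shell
# Hypothetical helper: read lines of the form
#   host: 1min 5min 15min running/total last_pid
# and print "host 15min" for each of them.
long_term_load() {
    awk '{ host = $1; sub(/:$/, "", host); print host, $4 }'
}
```

<p>Piping the <code>for</code> loop from above into <code>long_term_load</code> gives one host and one number per line. Note that load averages are not normalized by the number of CPUs, so the comparison is only indicative if the workers differ in core count.</p>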
<ul>
<li><p><strong>E2-1</strong> Local clone: Running on lord.arch: <a href="http://lord.arch/tests/1966">http://lord.arch/tests/1966</a> -> <em>failed</em> in <a href="http://lord.arch/tests/1966#step/disk_boot/2" class="external">disk_boot</a>, supporting <strong>H2</strong> and <strong>H6</strong></p></li>
<li><p><strong>E6-1</strong> Gather statistics and check workers in particular:</p></li>
</ul>
<pre><code>for i in {001..020}; do openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org --skip-download --skip-chained-deps 828316 TEST=okurz_poo45938_$i _GROUP="Development Tumbleweed" BUILD="20190105:poo45938" EXCLUDE_MODULES=networking,repositories,create_autoyast,libzypp_config,one_line_checks,services_enabled,filesystem_ro,transactional_update,rebootmgr,journal_check,shutdown; done
</code></pre>
<p>-> <a href="https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=20190105%3Apoo45938&groupid=38&distri=kubic">https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=20190105%3Apoo45938&groupid=38&distri=kubic</a></p>
<p>17 of 20 jobs failed. All three passing jobs ran on imagetester, with total execution times of 8:30m, 8:32m and 10:06m (the last one got the winter grub theme, which takes longer in "bootloader"). All failed jobs took between 8:57m and 10:37m, so longer than the passing jobs apart from that one outlier. <em>ACCEPT</em> <strong>H2.3</strong> and <strong>H2</strong>. As all three workers have the same package state, ensured by transactional updates, <em>REJECT</em> <strong>H3</strong>.</p>
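<p>A per-worker breakdown like this can be tallied mechanically. The sketch below assumes the worker and result of each job have already been collected as <code>worker result</code> pairs, one job per line; how that list is obtained is not shown here, and the helper name <code>tally_by_worker</code> is hypothetical.</p>

```shell
# Hypothetical helper: read "worker result" pairs, one job per line, and
# print "worker passed=N failed=M" per worker, sorted by worker name.
# Every result other than "passed" is counted as failed.
tally_by_worker() {
    awk '
        { seen[$1] = 1 }
        $2 == "passed" { passed[$1]++ }
        $2 != "passed" { failed[$1]++ }
        END {
            for (w in seen)
                printf "%s passed=%d failed=%d\n", w, passed[w], failed[w]
        }
    ' | sort
}
```

<p>If a worker shows only passes while all others show only failures, that points at a worker-specific cause rather than a product regression.</p>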
<ul>
<li><p><strong>E6.1-1</strong> lord.arch is currently loaded and running tests slowly. Let's see if bumping the timeout in kubic/disk_boot can help -> <a href="http://lord.arch/tests/1967">http://lord.arch/tests/1967</a>, using <code>wait_boot</code> instead of the custom <code>assert_screen</code>, which effectively bumps the timeout waiting for "grub2" from 30s to 90s. Failed. Setting TIMEOUT_SCALE=5 -> <a href="http://lord.arch/tests/1969">http://lord.arch/tests/1969</a></p></li>
<li><p>Additionally triggered <a href="https://openqa.opensuse.org/tests/overview?distri=kubic&groupid=38&build=20190105%3Apoo45938_scaled_3&version=Tumbleweed">https://openqa.opensuse.org/tests/overview?distri=kubic&groupid=38&build=20190105%3Apoo45938_scaled_3&version=Tumbleweed</a> with <code>TIMEOUT_SCALE=3</code>, which showed that jobs still fail except on imagetester, so the fix goes for a higher timeout.</p></li>
</ul>
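<p>For a rough feeling of what these settings mean in seconds, the sketch below multiplies a base timeout by a scale factor. It assumes that <code>TIMEOUT_SCALE</code> acts as a plain multiplier on the test timeouts; the helper name <code>scaled_timeout</code> is hypothetical.</p>

```shell
# Hypothetical helper: effective timeout in seconds for a base timeout and
# a TIMEOUT_SCALE factor, assuming plain multiplication.
scaled_timeout() {
    echo $(( $1 * $2 ))
}
```

<p>Under that assumption, <code>scaled_timeout 90 3</code> prints 270 and <code>scaled_timeout 90 5</code> prints 450 for the 90s "grub2" timeout mentioned above.</p>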
<p>Fix in <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6521">https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6521</a></p>
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178385, 2019-01-10T13:47:04Z, okurz, okurz@suse.com)
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/178385/diff?detail_id=177542">diff</a>)</li></ul>
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178445, 2019-01-10T16:47:30Z, okurz, okurz@suse.com)
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/178445/diff?detail_id=177602">diff</a>)</li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul>
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178451, 2019-01-10T17:02:37Z, okurz, okurz@suse.com)
<ul></ul><p>PR merged.</p>
<p>Crosschecking:</p>
<pre><code>$ for i in {001..020}; do openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org --skip-download --skip-chained-deps 828316 TEST=okurz_poo45938_scaled_3_$i _GROUP="Development Tumbleweed" BUILD="20190105:poo45938_with_fix" EXCLUDE_MODULES=networking,repositories,create_autoyast,libzypp_config,one_line_checks,services_enabled,filesystem_ro,transactional_update,rebootmgr,journal_check,shutdown; done
</code></pre>
<p>-> <a href="https://openqa.opensuse.org/tests/overview?distri=kubic&version=Tumbleweed&build=20190105%3Apoo45938_with_fix&groupid=38" class="external">https://openqa.opensuse.org/tests/overview?distri=kubic&version=Tumbleweed&build=20190105%3Apoo45938_with_fix&groupid=38</a></p>
openQA Tests - action #45938: [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan (https://progress.opensuse.org/issues/45938?journal_id=178460, 2019-01-10T19:28:50Z, okurz, okurz@suse.com)
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>20/20 passed.</p>