openSUSE Project Management Tool (https://progress.opensuse.org/)
openQA Infrastructure - action #57539: imagetester is incompleting all jobs with existing but empty logs, as /var/lib/openqa/pool is full
https://progress.opensuse.org/issues/57539?journal_id=247121 (2019-09-30T13:37:26Z, okurz, okurz@suse.com)
<ul><li><strong>Subject</strong> changed from <i>imagetester is incompleting all jobs with existing but empty logs, as / is full</i> to <i>imagetester is incompleting all jobs with existing but empty logs, as /var/lib/openqa/pool is full</i></li><li><strong>Priority</strong> changed from <i>Immediate</i> to <i>Normal</i></li></ul><p>Command to restart all jobs that incompleted on this worker since the failure started:</p>
<pre><code>host=openqa.opensuse.org; worker=imagetester; failed_since=2019-09-30; for i in $(ssh $host "sudo -u geekotest psql --no-align --tuples-only --command=\"select id from jobs where (assigned_worker_id in (select id from workers where host='$worker') and result='incomplete' and t_finished >= '$failed_since');\" openqa"); do openqa-client --host $host jobs/$i/restart post; done
</code></pre>
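<p>Before posting any restarts it can help to review which jobs would be affected. The following is only a sketch of a dry-run variant: the function names <code>incomplete_jobs_query</code> and <code>restart_jobs</code> and the <code>DRY_RUN</code> variable are hypothetical, while the query and the <code>openqa-client</code> call are taken from the command above (with the subselect closed after the host condition).</p>
<pre><code># Print the SQL selecting incomplete jobs of one worker host since a date.
incomplete_jobs_query() {
    worker=$1 since=$2
    printf "select id from jobs where assigned_worker_id in (select id from workers where host='%s') and result='incomplete' and t_finished >= '%s';" "$worker" "$since"
}

# Read job ids from stdin, one per line. By default only print the restart
# commands; with DRY_RUN=0 actually call openqa-client.
restart_jobs() {
    host=$1
    while read -r id; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "openqa-client --host $host jobs/$id/restart post"
        else
            openqa-client --host "$host" "jobs/$id/restart" post < /dev/null
        fi
    done
}

# Usage (assumption: same ssh and psql access as in the command above):
#   ssh "$host" "sudo -u geekotest psql --no-align --tuples-only --command=\"$(incomplete_jobs_query imagetester 2019-09-30)\" openqa" | restart_jobs "$host"
</code></pre>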
<p>Checking when this might have started: with <code>journalctl -u openqa-worker@*</code> I found the first "Result: died" coming from the worker with PID 2110, which was started around "Sep 30 03:34:24". The corresponding job log is <a href="https://openqa.opensuse.org/tests/1044019/file/autoinst-log.txt" class="external">https://openqa.opensuse.org/tests/1044019/file/autoinst-log.txt</a>, showing:</p>
<pre><code>[2019-09-30T05:06:02.103 CEST] [debug] /var/lib/openqa/cache/openqa1-opensuse/tests/opensuse/tests/x11/sshxterm.pm:43 called testapi::type_string
[2019-09-30T05:06:02.103 CEST] [debug] <<< testapi::type_string(string='killall xterm
', max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2019-09-30T05:06:02.449 CEST] [debug] <<< testapi::assert_screen(mustmatch='generic-desktop', timeout=30)
libpng error: Write Error
[2019-09-30T05:06:03.828 CEST] [debug] >>> testapi::_handle_found_needle: found generic-desktop-kde-plasma512-leap15.1-aarch64-20190409, similarity 1.00 @ 2/733
[2019-09-30T05:06:03.830 CEST] [debug] ||| finished sshxterm x11 at 2019-09-30 03:06:03 (45 s)
Can't close(GLOB(0x5617dca65888)) filehandle: 'No space left on device' at /usr/lib/os-autoinst/bmwqemu.pm line 322
</code></pre>
<p>imagetester is configured to use tmpfs for the pool, but with only 64GB available and current tests often using 40GB we cannot sustain more than one instance. We would have more room on /dev/sda:</p>
<pre><code>/dev/sda1 3.6T 54G 3.4T 2% /var/lib/openqa/cache
tmpfs 64G 32K 64G 1% /var/lib/openqa/pool
</code></pre>
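<p>The capacity argument above is simple arithmetic. As a minimal sketch, assuming every job may need up to the observed 40GB at the same time (the function name <code>max_instances</code> is hypothetical):</p>
<pre><code># Number of worker instances whose jobs fit into the pool at the same time,
# assuming every job may need up to per_job_gb (integer division rounds down).
max_instances() {
    pool_gb=$1 per_job_gb=$2
    echo $(( pool_gb / per_job_gb ))
}

max_instances 64 40   # prints 1

# The current usage of the pool can be checked with:
#   df -h /var/lib/openqa/pool
</code></pre>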
<p>I am not aware of recent changes regarding this.</p>
openQA Infrastructure - action #57539: imagetester is incompleting all jobs with existing but empty logs, as /var/lib/openqa/pool is full
https://progress.opensuse.org/issues/57539?journal_id=247136 (2019-09-30T14:04:05Z, okurz, okurz@suse.com)
<p>I have asked on <a href="irc://chat.freenode.net/opensuse-factory" class="external">#opensuse-factory</a> and <a href="https://chat.suse.de/group/openqa-dev" class="external">openqa-dev (RC)</a> if anyone knows of recent changes involving the 64GB tmpfs pool dir.</p>
openQA Infrastructure - action #57539: imagetester is incompleting all jobs with existing but empty logs, as /var/lib/openqa/pool is full
https://progress.opensuse.org/issues/57539?journal_id=247157 (2019-09-30T15:52:15Z, okurz, okurz@suse.com)
<ul><li><strong>Due date</strong> set to <i>2019-10-08</i></li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>We should keep this on High and with a due date so that we decide what to do while the workers are completely disabled. I masked the worker target and the worker instances:</p>
<pre><code>systemctl mask --now openqa-worker.target openqa-worker@{1..5}
</code></pre>
<p>Waiting until next week, 2019-10-08, to see if anyone else comes back with a good idea of what happened.</p>
<p>EDIT: As decided in the QA tools meeting on 2019-10-01, we will reduce the worker instances to a number that should be safe. I decided on two instances.</p>
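<p>One way to apply such a reduction with systemd could look like the following sketch. It is an assumption, not a record of the commands actually run on imagetester: it keeps instances 1 and 2 and leaves 3 to 5 masked, reusing the unit names from the mask command above. The function name <code>set_worker_instances</code> and the <code>APPLY</code> variable are hypothetical, and by default the commands are only printed.</p>
<pre><code># Print (or, with APPLY=1, run) the systemctl calls that leave only the first
# "keep" of "total" openqa-worker instances enabled and mask the rest.
set_worker_instances() {
    keep=$1 total=$2
    run=echo
    if [ "${APPLY:-0}" = 1 ]; then
        run=
    fi
    $run systemctl unmask openqa-worker.target
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -le "$keep" ]; then
            $run systemctl unmask "openqa-worker@$i"
            $run systemctl enable --now "openqa-worker@$i"
        else
            $run systemctl mask --now "openqa-worker@$i"
        fi
        i=$((i + 1))
    done
}

# Dry run for two out of five instances:
set_worker_instances 2 5
</code></pre>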
openQA Infrastructure - action #57539: imagetester is incompleting all jobs with existing but empty logs, as /var/lib/openqa/pool is full
https://progress.opensuse.org/issues/57539?journal_id=247325 (2019-10-01T08:51:41Z, okurz, okurz@suse.com)
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>Target version</strong> set to <i>Done</i></li></ul>