https://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842020-03-19T17:53:06ZopenSUSE Project Management ToolopenQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2865052020-03-19T17:53:06Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>Keep track of disk space used by results of job groups</i> to <i>Keep track of disk usage of results by job groups</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/286505/diff?detail_id=283133">diff</a>)</li><li><strong>Category</strong> set to <i>Feature requests</i></li></ul> openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2871452020-03-24T11:15:49Zmkittlermarius.kittler@suse.com
<ul></ul><p>My last comment was lost, so here are just the SQL queries I mentioned:</p>
<pre><code>-- Total result size in bytes per job group, ordered by group ID
select
  group_id,
  (select concat_ws('/', (select name from job_group_parents where id = parent_id), name)
     from job_groups where id = group_id) as group_name,
  sum(result_size) as result_size
from jobs
group by group_id
order by group_id;
</code></pre><pre><code>-- Total result size in GiB per job group, biggest groups first
select
  group_id,
  (select concat_ws('/', (select name from job_group_parents where id = parent_id), name)
     from job_groups where id = group_id) as group_name,
  (sum(result_size) / 1024 / 1024 / 1024) as result_size_gb
from jobs
group by group_id
order by result_size_gb desc;
</code></pre><pre><code>-- The 20 jobs with the biggest results (size in bytes and in MiB)
select
  id,
  test,
  (select concat_ws('/', (select name from job_group_parents where id = parent_id), name)
     from job_groups where id = group_id) as group_name,
  result_size,
  (result_size / 1024 / 1024) as result_size_mb
from jobs
where result_size is not null
order by result_size desc
limit 20;
</code></pre>
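<p>To try the per-group aggregation without access to a real openQA database, the following rough sketch runs a simplified variant against an in-memory SQLite database. The table and column names are taken from the queries above; the rows are made-up toy data, and <code>concat_ws</code> is replaced by <code>coalesce</code> and <code>||</code> because older SQLite versions do not provide it. This is only an illustration, not the query used in production.</p>

```python
import sqlite3

GIB = 1024 ** 3


def group_sizes_gib():
    """Return (group_id, group_name, size in GiB) rows, biggest group first."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        create table job_group_parents (id integer primary key, name text);
        create table job_groups (id integer primary key, name text, parent_id integer);
        create table jobs (id integer primary key, group_id integer, result_size integer);
        """
    )
    # Toy data (assumption): one group with a parent, one without.
    con.execute("insert into job_group_parents values (1, 'openSUSE')")
    con.executemany(
        "insert into job_groups values (?, ?, ?)",
        [(1, "Tumbleweed", 1), (2, "Maintenance", None)],
    )
    con.executemany(
        "insert into jobs values (?, ?, ?)",
        [(1, 1, 2 * GIB), (2, 1, 1 * GIB), (3, 2, GIB // 2)],
    )
    rows = con.execute(
        """
        select group_id,
               (select coalesce((select name from job_group_parents
                                  where id = parent_id) || '/', '') || name
                  from job_groups where id = group_id) as group_name,
               sum(result_size) / 1024.0 / 1024 / 1024 as result_size_gb
          from jobs
         group by group_id
         order by result_size_gb desc
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in group_sizes_gib():
        print(row)
```

<p>Groups without a parent show up with just their own name, which mirrors how <code>concat_ws</code> skips a missing parent name in the PostgreSQL queries.</p>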
<hr>
<p>I'm currently experimenting with using Telegraf locally and have also drafted a MR for our monitoring: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/287" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/287</a></p>
<hr>
<p>It seems the actual disk usage and the disk usage now recorded in the DB are not exactly the same. I guess one problem with my accounting is that it counts a screenshot multiple times if the same screenshot appears more than once in the same test, which is apparently often the case, e.g.:</p>
<pre><code>lrwxrwxrwx 1 martchus users 70 24. Mär 10:07 welcome-1.png -> /hdd/openqa-devel/openqa/images/cbb/22d/ea6b09799ac843a799e2f8578e.png
lrwxrwxrwx 1 martchus users 70 24. Mär 10:07 welcome-2.png -> /hdd/openqa-devel/openqa/images/cbb/22d/ea6b09799ac843a799e2f8578e.png
</code></pre> openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2876672020-03-25T13:09:47Zmkittlermarius.kittler@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed" href="/issues/64809">action #64809</a>: Worker uploads some text results possibly multiple times wasting resources</i> added</li></ul> openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2876702020-03-25T13:19:06Zmkittlermarius.kittler@suse.com
<ul></ul><p>The figures stored in the database are inaccurate mainly because the accounting covers what is being uploaded and not what is actually being stored or linked. That means:</p>
<ol>
<li>Screenshots which are already present from a previous test run are <em>not</em> accounted for. That makes the result size in the DB smaller.</li>
<li>Text results <a href="https://progress.opensuse.org/issues/64809" class="external">might be uploaded twice</a> (but are only stored once). That means the result size in the DB is bigger. With <a href="https://github.com/os-autoinst/openQA/pull/2879" class="external">https://github.com/os-autoinst/openQA/pull/2879</a> in place, this shouldn't be the case anymore.</li>
</ol>
<p>It seems that for the jobs I've tested, effect 2 outweighs effect 1, so the result size in the DB is bigger than what <code>du</code> reports. Nevertheless, I suppose the figures are accurate enough to compare groups and to identify the culprit eating the disk space. However, we should avoid showing them anywhere in the web UI in a way that implies they are exact sizes.</p>
<p>Yet another caveat for people who come from the Grafana dashboard: the accumulated result size for a group only covers results created since the recording of result sizes started. It is not the total result size for that group.</p>
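<p>To illustrate how much repeated screenshot links alone can skew a figure, here is a minimal sketch that compares two ways of sizing a result directory: summing the target size of every entry, and counting each underlying file only once. The helper names and the toy directory layout are assumptions for illustration; this is neither what openQA does internally nor exactly what <code>du</code> computes.</p>

```python
import os
import tempfile


def naive_size(result_dir):
    """Sum target sizes of all entries; a file linked twice is counted twice."""
    total = 0
    for name in os.listdir(result_dir):
        path = os.path.join(result_dir, name)
        if os.path.isfile(path):  # follows symlinks
            total += os.stat(path).st_size
    return total


def deduplicated_size(result_dir):
    """Sum target sizes, counting every underlying file (device, inode) once."""
    seen = set()
    total = 0
    for name in os.listdir(result_dir):
        path = os.path.join(result_dir, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)  # follows symlinks
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_size
    return total


def demo():
    """Build a toy result dir with two links to one image plus one text file."""
    with tempfile.TemporaryDirectory() as root:
        images = os.path.join(root, "images")
        results = os.path.join(root, "results")
        os.mkdir(images)
        os.mkdir(results)
        image = os.path.join(images, "shared.png")
        with open(image, "wb") as f:
            f.write(b"x" * 100)
        os.symlink(image, os.path.join(results, "welcome-1.png"))
        os.symlink(image, os.path.join(results, "welcome-2.png"))
        with open(os.path.join(results, "details.txt"), "wb") as f:
            f.write(b"y" * 10)
        return naive_size(results), deduplicated_size(results)


if __name__ == "__main__":
    print(demo())
```

<p>In the toy layout the shared 100-byte image is linked twice, so the naive sum is 210 bytes while the deduplicated sum is 110 bytes.</p>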
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2878472020-03-26T05:38:50Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-7 priority-highest closed" href="/issues/64824">action #64824</a>: osd /results is at 99%, about to exceed available space</i> added</li></ul> openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2882402020-03-26T16:55:42Zmkittlermarius.kittler@suse.com
<ul></ul><ul>
<li>The PR for the Telegraf query has been merged.</li>
<li>PR for the Grafana dashboard is ongoing: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/293" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/293</a></li>
<li>PR for PostgreSQL permissions of the Telegraf user has been merged but may need amending: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/292" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/292</a></li>
</ul>
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2882432020-03-26T19:52:29Zokurzokurz@suse.com
<ul></ul><p>As you managed to provide the permissions manually, please see <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/292#note_206089" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/292#note_206089</a> for why salt failed to do the same.</p>
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2884712020-03-27T10:52:24Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've updated my PR to include a fix for salt (which will hopefully work).</p>
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2885072020-03-27T13:38:06Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>The PR has been merged and the panel is available: <a href="https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&fullscreen&panelId=19" class="external">https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&fullscreen&panelId=19</a></p>
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2885942020-03-27T20:48:50Zokurzokurz@suse.com
<ul><li><strong>Parent task</strong> set to <i>#64746</i></li></ul> openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2900012020-04-01T15:12:52Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>In Progress</i></li></ul><p>When deleting logs we should keep track of the freed disk space, otherwise the effect of the cleanup is not visible in the graph until the entire job is deleted. PR: <a href="https://github.com/os-autoinst/openQA/pull/2893" class="external">https://github.com/os-autoinst/openQA/pull/2893</a></p>
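<p>The idea can be sketched roughly as follows: when a log file is removed, its size is subtracted from the job's recorded result size so that the accumulated figure drops right away. The <code>Job</code> class, the <code>delete_logs</code> helper and the file names are hypothetical and only serve as an illustration; the actual change lives in the linked openQA PR.</p>

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for a job record with an accounted result size."""
    result_dir: str
    result_size: int = 0  # bytes accounted so far


def delete_logs(job, log_names):
    """Delete the given logs and subtract the freed bytes from job.result_size."""
    freed = 0
    for name in log_names:
        path = os.path.join(job.result_dir, name)
        try:
            size = os.path.getsize(path)
            os.remove(path)
        except FileNotFoundError:
            continue  # already gone, nothing to subtract
        freed += size
    # Never go below zero, since the recorded size is only an approximation.
    job.result_size = max(job.result_size - freed, 0)
    return freed


def demo():
    """Create two toy files, delete one log and report the new accounting."""
    with tempfile.TemporaryDirectory() as result_dir:
        for name, size in (("autoinst-log.txt", 300), ("details.json", 50)):
            with open(os.path.join(result_dir, name), "wb") as f:
                f.write(b"x" * size)
        job = Job(result_dir=result_dir, result_size=350)
        freed = delete_logs(job, ["autoinst-log.txt", "missing.log"])
        return freed, job.result_size


if __name__ == "__main__":
    print(demo())
```

<p>Clamping at zero is a deliberate choice in this sketch: since the recorded size is known to be inexact, subtracting real file sizes could otherwise push it below zero.</p>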
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2906732020-04-03T15:49:29Zmkittlermarius.kittler@suse.com
<ul></ul><p>The mentioned PR has been merged.</p>
<hr>
<p>The retention policy for the data might need to be adjusted. It would also make sense to perform the PostgreSQL query less frequently.</p>
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2930002020-04-15T09:48:26Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>PR to query less frequently (already merged): <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/8401cd98bd4a545ee77020992ccdbe5b6f4893cf" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/8401cd98bd4a545ee77020992ccdbe5b6f4893cf</a></p>
<p>It seems the retention policy can only be set at the database level, so it is likely best to configure a global retention policy at some point (not as part of this task).</p>
openQA Project - action #64574: Keep track of disk usage of results by job groupshttps://progress.opensuse.org/issues/64574?journal_id=2956512020-04-23T13:42:54Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>Target version</strong> deleted (<del><i>Current Sprint</i></del>)</li></ul><p>I've created a follow-up ticket for the retention policy: <a class="issue tracker-4 status-12 priority-3 priority-lowest" title="action: Configure downsampling and a retention policy for InfluxDB (Workable)" href="https://progress.opensuse.org/issues/66019">#66019</a></p>
<p>Not sure what's left to do so I'm closing the ticket as resolved.</p>