action #152392
closed
Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? size:M
Description
Observation
https://openqa.suse.de/tests/13018864#investigation shows
in "diff_to_last_good"
- "PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
+ "PRJDIR" : "/var/lib/openqa/share",
…
- "WORKER_HOSTNAME" : "worker38.oqa.prg2.suse.org",
+ "WORKER_HOSTNAME" : "worker39.oqa.prg2.suse.org",
so there is a difference in PRJDIR, as if worker38 used the cache service and worker39 did not, despite both machines having the cache service enabled.
Acceptance Criteria
- AC1: It is known and documented what PRJDIR does
- AC2: Use of the cache service is confirmed to work for reading the test distribution data
Expected result
- There should be no diff in PRJDIR
Suggestions
- Confirm where PRJDIR is set, and what updates it later
- Restarting workers might reset it to the default value
- Is it correct to use PRJDIR to enable the cache?
- DONE isotovideo probably doesn't care about this folder -> confirmed, there is no reference to PRJDIR (or even just "prj" case-insensitive) in current os-autoinst
- Remove FIXMEs from code
- Remove the note about "compatibility for older os-autoinst" versions in https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Engines/isotovideo.pm if the last os-autoinst version that needs it is some years old -> confirmed to be deletable, the last reference was deleted in "commit 9eb2f564, Author: Marius Kittler mkittler@suse.de, Date: Thu Oct 17 19:05:00 2019 +0200" so old enough
- https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Engines/isotovideo.pm#L215
Updated by tinita about 1 year ago
There are even differences on the same worker, but in a quick check I could only verify worker38 and 39:
https://openqa.suse.de/tests/13018864#downloads
Assigned worker: worker39:24
"PRJDIR" : "/var/lib/openqa/share",
https://openqa.suse.de/tests/12996694#downloads
Assigned worker: worker39:15
"PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
https://openqa.suse.de/tests/12748549#downloads
Assigned worker: worker38:32
"PRJDIR" : "/var/lib/openqa/share",
https://openqa.suse.de/tests/12758728#downloads
Assigned worker: worker38:3
"PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
It has nothing to do with the investigation tab, though.
Updated by tinita about 1 year ago
- Status changed from New to In Progress
- Assignee set to tinita
Updated by tinita about 1 year ago · Edited
I analyzed the vars.json files of the last days, for worker35 and worker38:
% cd /var/lib/openqa/testresults
% find 130* -name vars.json | xargs grep "WORKER_HOSTNAME.*worker35.oqa.prg2.suse.org" -l | tee ~/workers/worker35-files
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/share >~/workers/worker35-files.share
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/cache/openqa.suse.de >~/workers/worker35-files.cache
% wc -l ~/workers/*
2496 /home/tinita/workers/worker35-files
2260 /home/tinita/workers/worker35-files.cache
236 /home/tinita/workers/worker35-files.share
5214 /home/tinita/workers/worker38-files
5035 /home/tinita/workers/worker38-files.cache
179 /home/tinita/workers/worker38-files.share
15420 total
So the distribution is similar - a small part of the tests are using /var/lib/openqa/share.
Looking at the test ids, the cases are spread over the whole set.
Looking at the worker instances, it happens for all instances.
% cat /home/tinita/workers/worker38-files.share | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
1
2
...
40
% cat /home/tinita/workers/worker38-files.cache | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
1
2
...
40
Maybe there are certain conditions where the cache service is used or not. Looking at the code...
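The per-worker counts above can also be produced in one pass. The following is only a sketch: the function name `prjdir_counts` is hypothetical (not part of openQA), and it assumes vars.json is pretty-printed with one `"KEY" : "value"` pair per line, which the grep-based commands above rely on as well. The host name is used as a regular expression, so the dots in it match any character.

```shell
# prjdir_counts DIR HOST: hypothetical helper, not part of openQA.
# Counts the PRJDIR values found in all vars.json files below DIR that were
# written by worker HOST. Assumes one "KEY" : "value" pair per line.
prjdir_counts() {
    dir=$1
    host=$2
    # list the vars.json files that belong to the given worker host
    find "$dir" -name vars.json \
        -exec grep -l "\"WORKER_HOSTNAME\" *: *\"$host\"" {} + |
    while IFS= read -r file; do
        # extract only the value of PRJDIR (empty if the key is missing)
        value=$(sed -n 's/.*"PRJDIR" *: *"\([^"]*\)".*/\1/p' "$file" | head -n 1)
        [ -n "$value" ] || value='(unset)'
        printf '%s\n' "$value"
    done | sort | uniq -c | sort -rn
}
```

Called as `prjdir_counts /var/lib/openqa/testresults worker38.oqa.prg2.suse.org`, it would print one line per distinct PRJDIR value with its count, including a `(unset)` bucket for files that have no PRJDIR at all.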
Updated by tinita about 1 year ago
Both tests seem to be using the cache directory though:
https://openqa.suse.de/tests/13010854/logfile?filename=worker-log.txt
[2023-12-09T01:17:22.703328+01:00] [debug] [pid:114119] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2" to "/var/lib/openqa/pool/26/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2"
https://openqa.suse.de/tests/13018864/logfile?filename=worker-log.txt
[2023-12-11T00:22:04.744680+01:00] [debug] [pid:35080] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2" to "/var/lib/openqa/pool/24/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2"
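To check this for more than two jobs, the "Linked asset" lines can be extracted from a downloaded worker-log.txt. This is a sketch under assumptions: `linked_asset_sources` is a hypothetical helper, the log line format is taken from the two excerpts above, and the default cache directory is the one seen in those excerpts.

```shell
# linked_asset_sources LOGFILE [CACHEDIR]: hypothetical helper, not part of openQA.
# Prints the source path of every 'Linked asset "SRC" to "DST"' line in a
# worker-log.txt, prefixed with "cache" if SRC lies below CACHEDIR and with
# "other" if it does not.
linked_asset_sources() {
    log=$1
    cachedir=${2:-/var/lib/openqa/cache}
    sed -n 's/.*Linked asset "\([^"]*\)" to "[^"]*".*/\1/p' "$log" |
    while IFS= read -r src; do
        case $src in
            "$cachedir"/*) printf 'cache %s\n' "$src" ;;
            *)             printf 'other %s\n' "$src" ;;
        esac
    done
}
```

Note that this only covers assets. It says nothing about where the test distribution was read from.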
The code has a note:
# lib/OpenQA/Worker/Engines/isotovideo.pm
# note: PRJDIR is used as base for relative needle paths by os-autoinst. This is supposed to change
# but for compatibility with current old os-autoinst we need to set PRJDIR for a consistent
# behavior.
I also cannot find an instance of PRJDIR in os-autoinst. Its usage was removed in 2019: https://github.com/os-autoinst/os-autoinst/commit/9eb2f564b325549227824f4fb054effe66861d4e
That doesn't explain why the content differs, though.
Updated by openqa_review about 1 year ago
- Due date set to 2023-12-30
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz about 1 year ago
- Due date changed from 2023-12-30 to 2024-01-13
christmas vacation due date bump :)
Updated by livdywan about 1 year ago
- Subject changed from Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? to Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? size:M
- Description updated (diff)
Updated by tinita about 1 year ago
Some more findings:
I looked into job.json in the pooldir, and it contains PRJDIR most of the time.
I was able to find a running job where PRJDIR was not set at all in job.json: https://openqa.suse.de/tests/13130097
job.json is filled from the job_settings (but PRJDIR is not in the database job settings).
Also regarding PRJDIR and OpenQA::Utils::prjdir():
sub prjdir { ($ENV{OPENQA_BASEDIR} || '/var/lib') . '/openqa' }
It seems to me that prjdir() should really only be that directory and not the sharedir and also not the cachedir.
e.g. here are some instances of prjdir():
pool_directory => prjdir() . "/pool/$instance_number",
my $dblockfile = catfile(prjdir(), 'db', 'db.lock');
my $defaultroot = catdir($prjdir, 'share', 'tests');
So PRJDIR being /var/lib/openqa/cache/... or /var/lib/openqa/share is already confusing and does not match prjdir().
Apart from the seemingly random value we should decide what we want the value in vars.json to represent.
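To make the mismatch concrete, here is a small shell sketch that mirrors the `prjdir()` one-liner quoted above and prints it next to the two PRJDIR values seen in vars.json. The shell function is an illustration only. The two "observed" paths are simply the values from this ticket, not the output of any openQA function.

```shell
# Mirrors: sub prjdir { ($ENV{OPENQA_BASEDIR} || '/var/lib') . '/openqa' }
# Note: ${VAR:-default} falls back when VAR is unset or empty; Perl's ||
# additionally treats the string "0" as false.
prjdir() {
    printf '%s\n' "${OPENQA_BASEDIR:-/var/lib}/openqa"
}

base=$(prjdir)
printf 'prjdir():        %s\n' "$base"
# The two values observed for PRJDIR in vars.json are both subdirectories:
printf 'observed PRJDIR: %s\n' "$base/share"
printf 'observed PRJDIR: %s\n' "$base/cache/openqa.suse.de"
```

Neither observed value equals what `prjdir()` returns, which is the inconsistency described above.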
Updated by okurz about 1 year ago
Updated by tinita about 1 year ago
- Assignee changed from tinita to okurz
Assigning to @okurz
If I have time before vacation I would like to at least log the path to the test code in autoinst-log.txt, so that we can verify if the cache directory was used or not.
Updated by okurz about 1 year ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/openQA/pull/5408 ready for review
Updated by okurz about 1 year ago
- Due date deleted (2024-01-13)
- Status changed from Feedback to Resolved
Both ACs covered and investigation jobs like https://openqa.suse.de/tests/13217106#investigation show no PRJDIR anymore and no other fallout known, good enough to resolve.