action #152392
closed
Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? size:M
Description
Observation
https://openqa.suse.de/tests/13018864#investigation shows
in "diff_to_last_good"
- "PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
+ "PRJDIR" : "/var/lib/openqa/share",
…
- "WORKER_HOSTNAME" : "worker38.oqa.prg2.suse.org",
+ "WORKER_HOSTNAME" : "worker39.oqa.prg2.suse.org",
so there is a difference in PRJDIR, as if worker38 used the cache service and worker39 did not, even though both machines have the cache service enabled.
Acceptance Criteria
- AC1: It is known and documented what PRJDIR does
- AC2: Use of the cache service is confirmed to work for reading the test distribution data
Expected result
- There should be no diff in PRJDIR
Suggestions
- Confirm where PRJDIR is set, and what updates it later
- Restarting workers might reset it to the default value
- Is it correct to use PRJDIR to enable the cache?
- DONE isotovideo probably doesn't care about this folder -> confirmed, there is no reference to PRJDIR (or even just "prj" case-insensitive) in current os-autoinst
- Remove FIXMEs from code
- Remove the note about "compatibility for older os-autoinst" versions in https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Engines/isotovideo.pm if the last such version of os-autoinst is from some years ago -> confirmed to be deletable; the last reference was deleted in "commit 9eb2f564, Author: Marius Kittler mkittler@suse.de, Date: Thu Oct 17 19:05:00 2019 +0200", so it is old enough
- https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Engines/isotovideo.pm#L215
Updated by tinita 10 months ago
There are even differences on the same worker, but in a quick check I could only verify worker38 and 39:
https://openqa.suse.de/tests/13018864#downloads
Assigned worker: worker39:24
"PRJDIR" : "/var/lib/openqa/share",
https://openqa.suse.de/tests/12996694#downloads
Assigned worker: worker39:15
"PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
https://openqa.suse.de/tests/12748549#downloads
Assigned worker: worker38:32
"PRJDIR" : "/var/lib/openqa/share",
https://openqa.suse.de/tests/12758728#downloads
Assigned worker: worker38:3
"PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
It has nothing to do with the investigation tab, though.
Updated by tinita 10 months ago · Edited
I analyzed the vars.json files of the last few days for worker35 and worker38:
% cd /var/lib/openqa/testresults
% find 130* -name vars.json | xargs grep "WORKER_HOSTNAME.*worker35.oqa.prg2.suse.org" -l | tee ~/workers/worker35-files
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/share >~/workers/worker35-files.share
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/cache/openqa.suse.de >~/workers/worker35-files.cache
% wc -l ~/workers/*
2496 /home/tinita/workers/worker35-files
2260 /home/tinita/workers/worker35-files.cache
236 /home/tinita/workers/worker35-files.share
5214 /home/tinita/workers/worker38-files
5035 /home/tinita/workers/worker38-files.cache
179 /home/tinita/workers/worker38-files.share
15420 total
So the distribution is similar: a small part of the tests is using /var/lib/openqa/share.
Looking at the test ids, the cases are spread over the whole set.
Looking at the worker instances, it happens for all instances.
% cat /home/tinita/workers/worker38-files.share | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
1
2
...
40
% cat /home/tinita/workers/worker38-files.cache | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
1
2
...
40
Maybe there are certain conditions where the cache service is used or not. Looking at the code...
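The per-worker counts above can also be gathered in one pass. This is only a sketch: the function name `tally_prjdir` is made up for illustration, and it assumes vars.json is pretty-printed with one key per line, which is the same assumption the grep commands above rely on.

```shell
# Hypothetical helper (not part of openQA): count PRJDIR values per worker host.
# Usage: tally_prjdir DIR, e.g. a directory containing job result folders.
tally_prjdir() {
    find "$1" -name vars.json | while IFS= read -r file; do
        # Line-based extraction; assumes one "KEY" : "VALUE" pair per line.
        host=$(sed -n 's|.*"WORKER_HOSTNAME" *: *"\([^"]*\)".*|\1|p' "$file" | head -n 1)
        prjdir=$(sed -n 's|.*"PRJDIR" *: *"\([^"]*\)".*|\1|p' "$file" | head -n 1)
        # Jobs without PRJDIR are reported as <unset> instead of being dropped.
        printf '%s\t%s\n' "${host:-<no WORKER_HOSTNAME>}" "${prjdir:-<unset>}"
    done | sort | uniq -c
}
```

Each output line is a count followed by the worker hostname and the PRJDIR value, so a worker that reports both directories shows up with two lines.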
Updated by tinita 10 months ago
Both tests seem to be using the cache directory though:
https://openqa.suse.de/tests/13010854/logfile?filename=worker-log.txt
[2023-12-09T01:17:22.703328+01:00] [debug] [pid:114119] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2" to "/var/lib/openqa/pool/26/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2"
https://openqa.suse.de/tests/13018864/logfile?filename=worker-log.txt
[2023-12-11T00:22:04.744680+01:00] [debug] [pid:35080] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2" to "/var/lib/openqa/pool/24/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2"
The code has a note:
# lib/OpenQA/Worker/Engines/isotovideo.pm
# note: PRJDIR is used as base for relative needle paths by os-autoinst. This is supposed to change
# but for compatibility with current old os-autoinst we need to set PRJDIR for a consistent
# behavior.
I also cannot find an instance of PRJDIR in os-autoinst. Its usage was removed in 2019: https://github.com/os-autoinst/os-autoinst/commit/9eb2f564b325549227824f4fb054effe66861d4e
That doesn't explain why the content differs, though.
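To compare more than two jobs, the source directory of the "Linked asset" lines quoted above can be extracted from a downloaded worker-log.txt. A minimal sketch, assuming the log line format stays as shown in the two excerpts; the function name `linked_asset_dirs` is hypothetical:

```shell
# Hypothetical helper: list the directories that assets were linked from,
# based on worker log lines of the form: Linked asset "SRC" to "DST"
linked_asset_dirs() {
    # Capture everything of SRC up to the last slash, i.e. its directory.
    sed -n 's|.*Linked asset "\([^"]*\)/[^"/]*" to ".*|\1|p' "$1" | sort | uniq -c
}
```

If only /var/lib/openqa/cache/openqa.suse.de shows up for a job whose vars.json has PRJDIR set to /var/lib/openqa/share, that supports the observation that PRJDIR does not reflect whether the cache was used for assets.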
Updated by openqa_review 10 months ago
- Due date set to 2023-12-30
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan 10 months ago
- Subject changed from Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? to Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? size:M
- Description updated (diff)
Updated by tinita 10 months ago
Some more findings:
I looked into job.json in the pooldir, and it contains PRJDIR most of the time.
I was able to find a running job where PRJDIR was not set at all in job.json: https://openqa.suse.de/tests/13130097
job.json is filled from the job_settings (but PRJDIR is not in the database job settings).
Also regarding PRJDIR and OpenQA::Utils::prjdir():
sub prjdir { ($ENV{OPENQA_BASEDIR} || '/var/lib') . '/openqa' }
It seems to me that prjdir() should really only be that directory and not the sharedir and also not the cachedir.
E.g. here are some instances of prjdir():
pool_directory => prjdir() . "/pool/$instance_number",
my $dblockfile = catfile(prjdir(), 'db', 'db.lock');
my $defaultroot = catdir($prjdir, 'share', 'tests');
So PRJDIR being /var/lib/openqa/cache/... or /var/lib/openqa/share is already confusing and does not match prjdir().
Apart from the seemingly random value we should decide what we want the value in vars.json to represent.
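The mismatch can be made explicit with a small shell rendering of the prjdir() one-liner quoted above. This is a sketch for illustration only: `classify_prjdir` is a made-up helper, and the labels "sharedir" and "cachedir" are just the terms used in this comment.

```shell
# Shell rendering of OpenQA::Utils::prjdir() as quoted above.
# ${VAR:-default} falls back when OPENQA_BASEDIR is unset or empty, which is
# close to Perl's `||` (Perl would additionally treat "0" as false).
prjdir() {
    printf '%s\n' "${OPENQA_BASEDIR:-/var/lib}/openqa"
}

# Hypothetical helper: say how a PRJDIR value from vars.json relates to prjdir().
classify_prjdir() {
    base=$(prjdir)
    case $1 in
        "$base")         echo "prjdir" ;;
        "$base"/share)   echo "sharedir" ;;
        "$base"/cache/*) echo "cachedir" ;;
        *)               echo "other" ;;
    esac
}
```

With the default base directory, both values seen in vars.json classify as something other than "prjdir": /var/lib/openqa/share is the sharedir and /var/lib/openqa/cache/openqa.suse.de is below the cachedir.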
Updated by okurz 10 months ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/openQA/pull/5408 ready for review
Updated by okurz 9 months ago
- Due date deleted (2024-01-13)
- Status changed from Feedback to Resolved
Both ACs covered and investigation jobs like https://openqa.suse.de/tests/13217106#investigation show no PRJDIR anymore and no other fallout known, good enough to resolve.