Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/" vs. "/var/lib/openqa/share" when both should use the cache service? size:M

Added by okurz about 1 year ago. Updated about 1 year ago.

Observation

The investigation diff shows in "diff_to_last_good":

-   "PRJDIR" : "/var/lib/openqa/cache/",
+   "PRJDIR" : "/var/lib/openqa/share",

So PRJDIR differs as if worker38 used the cache service and worker39 did not, even though both machines have the cache service enabled.

Acceptance Criteria

  • AC1: It is known and documented what PRJDIR does
  • AC2: Use of the cache service is confirmed to work for reading the test distribution data

Expected result

  • There should be no diff in PRJDIR


Updated by tinita about 1 year ago

There are even differences on the same worker, but in a quick check I could only verify worker38 and 39:

Assigned worker: worker39:24
   "PRJDIR" : "/var/lib/openqa/share",

Assigned worker: worker39:15
   "PRJDIR" : "/var/lib/openqa/cache/",

Assigned worker: worker38:32
   "PRJDIR" : "/var/lib/openqa/share",

Assigned worker: worker38:3
   "PRJDIR" : "/var/lib/openqa/cache/",

It has nothing to do with the investigation tab, though.

Updated by tinita about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to tinita

Updated by tinita about 1 year ago · Edited

I analyzed the vars.json files of the last few days for worker35 and worker38:

% cd /var/lib/openqa/testresults
% find 130* -name vars.json | xargs grep "WORKER_HOSTNAME.*" -l | tee ~/workers/worker35-files
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/share >~/workers/worker35-files.share
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/cache/ >~/workers/worker35-files.cache
% wc -l ~/workers/*
   2496 /home/tinita/workers/worker35-files
   2260 /home/tinita/workers/worker35-files.cache
    236 /home/tinita/workers/worker35-files.share
   5214 /home/tinita/workers/worker38-files
   5035 /home/tinita/workers/worker38-files.cache
    179 /home/tinita/workers/worker38-files.share
  15420 total

So the distribution is similar: a small share of the tests uses /var/lib/openqa/share (236 of 2496 on worker35, about 9%; 179 of 5214 on worker38, about 3%).
Looking at the test ids, the cases are spread over the whole set.
Looking at the worker instances, it happens on all instances.

% cat /home/tinita/workers/worker38-files.share | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
% cat /home/tinita/workers/worker38-files.cache | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
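
The share/cache tally above can also be produced in one pass without intermediate files. The following is a minimal sketch, not the commands actually used: the function name tally_prjdir is hypothetical, and it assumes that vars.json holds one setting per line in the form shown in the diff ("PRJDIR" : "/some/path",). Jobs whose vars.json has no PRJDIR line are not counted.

```shell
# Hypothetical helper: count how often each PRJDIR value occurs in all
# vars.json files below the given directory.
# Assumption: one setting per line, e.g.   "PRJDIR" : "/var/lib/openqa/share",
tally_prjdir() {
    find "$1" -name vars.json -exec grep -h '"PRJDIR"' {} + \
        | sed -E 's/^[[:space:]]*"PRJDIR"[[:space:]]*:[[:space:]]*"([^"]*)".*$/\1/' \
        | sort | uniq -c
}
```

Calling it as `tally_prjdir /var/lib/openqa/testresults` prints one line per distinct PRJDIR value, prefixed with its count. Restricting the result to a single worker would still need a filter on WORKER_HOSTNAME as in the commands above.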

Maybe there are certain conditions where the cache service is used or not. Looking at the code...

Updated by tinita about 1 year ago

Both tests seem to be using the cache directory though:

[2023-12-09T01:17:22.703328+01:00] [debug] [pid:114119] Linked asset "/var/lib/openqa/cache/" to "/var/lib/openqa/pool/26/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2"

[2023-12-11T00:22:04.744680+01:00] [debug] [pid:35080] Linked asset "/var/lib/openqa/cache/" to "/var/lib/openqa/pool/24/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2"

The code has a note:

# lib/OpenQA/Worker/Engines/
     # note: PRJDIR is used as base for relative needle paths by os-autoinst. This is supposed to change
     #       but for compatibility with current old os-autoinst we need to set PRJDIR for a consistent
     #       behavior.

I also cannot find an instance of PRJDIR in os-autoinst. Its usage was removed in 2019:

That doesn't explain why the content differs, though.

Updated by openqa_review about 1 year ago

  • Due date set to 2023-12-30

Setting due date based on mean cycle time of SUSE QE Tools

Updated by okurz about 1 year ago

  • Due date changed from 2023-12-30 to 2024-01-13

Christmas vacation due date bump :)

Updated by livdywan about 1 year ago

  • Subject changed from Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/" vs. "/var/lib/openqa/share" when both should use the cache service? to Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/" vs. "/var/lib/openqa/share" when both should use the cache service? size:M
  • Description updated (diff)

Updated by tinita about 1 year ago

Some more findings:
I looked into job.json in the pooldir, and it contains PRJDIR most of the time.
I was able to find a running job where PRJDIR was not set at all in job.json:

job.json is filled from the job_settings (but PRJDIR is not in the database job settings).

Also regarding PRJDIR and OpenQA::Utils::prjdir():

sub prjdir { ($ENV{OPENQA_BASEDIR} || '/var/lib') . '/openqa' }

It seems to me that prjdir() should really only be that directory, neither the sharedir nor the cachedir.
For example, here are some uses of prjdir():

pool_directory => prjdir() . "/pool/$instance_number",
my $dblockfile = catfile(prjdir(), 'db', 'db.lock');
my $defaultroot = catdir($prjdir, 'share', 'tests');

So PRJDIR being /var/lib/openqa/cache/... or /var/lib/openqa/share is already confusing and does not match prjdir().
Apart from the seemingly random value, we should decide what we want the value in vars.json to represent.
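
To make the mismatch concrete, here is a small shell rendering of the quoted prjdir() and of the paths the quoted call sites build from it. This is an illustrative sketch only: the function show_derived_paths and the example instance number are hypothetical, and it assumes that $prjdir in the third call site holds the result of prjdir(). One difference from the Perl original: Perl's || also falls back to '/var/lib' when OPENQA_BASEDIR is the string "0", while the shell expansion only falls back when the variable is unset or empty.

```shell
# Shell rendering of OpenQA::Utils::prjdir() as quoted above:
# OPENQA_BASEDIR if set and non-empty, otherwise /var/lib, with /openqa appended.
prjdir() {
    printf '%s/openqa\n' "${OPENQA_BASEDIR:-/var/lib}"
}

# Hypothetical helper: print the paths the quoted call sites derive from prjdir().
# The default instance number 24 is an arbitrary example.
show_derived_paths() {
    instance_number=${1:-24}
    printf '%s\n' \
        "$(prjdir)/pool/$instance_number" \
        "$(prjdir)/db/db.lock" \
        "$(prjdir)/share/tests"
}
```

With OPENQA_BASEDIR unset, prjdir prints /var/lib/openqa, which equals neither of the two PRJDIR values seen in vars.json.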

Updated by tinita about 1 year ago

  • Assignee changed from tinita to okurz

Assigning to @okurz

If I have time before vacation I would like to at least log the path to the test code in autoinst-log.txt, so that we can verify if the cache directory was used or not.

Updated by okurz about 1 year ago

  • Status changed from In Progress to Feedback

Updated by okurz about 1 year ago


Updated by okurz about 1 year ago

  • Due date deleted (2024-01-13)
  • Status changed from Feedback to Resolved

Both ACs are covered, investigation jobs no longer show PRJDIR, and no other fallout is known. Good enough to resolve.


