action #152392

closed

Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? size:M

Added by okurz 10 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2023-12-11
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/13018864#investigation shows
in "diff_to_last_good"

-   "PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
+   "PRJDIR" : "/var/lib/openqa/share",
…
-   "WORKER_HOSTNAME" : "worker38.oqa.prg2.suse.org",
+   "WORKER_HOSTNAME" : "worker39.oqa.prg2.suse.org",

This shows a difference in PRJDIR, as if worker38 used the cache service and worker39 did not, even though both machines have the cache service enabled.

Acceptance Criteria

  • AC1: It is known and documented what PRJDIR does
  • AC2: Use of the cache service is confirmed to work for reading the test distribution data

Expected result

  • There should be no diff in PRJDIR

Suggestions

Actions #1

Updated by tinita 10 months ago

There are even differences on the same worker, but in a quick check I could only verify worker38 and 39:
https://openqa.suse.de/tests/13018864#downloads

Assigned worker: worker39:24
   "PRJDIR" : "/var/lib/openqa/share",

https://openqa.suse.de/tests/12996694#downloads

Assigned worker: worker39:15
   "PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",

https://openqa.suse.de/tests/12748549#downloads

Assigned worker: worker38:32
   "PRJDIR" : "/var/lib/openqa/share",

https://openqa.suse.de/tests/12758728#downloads

Assigned worker: worker38:3
   "PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",

It has nothing to do with the investigation tab, though.

Actions #2

Updated by tinita 10 months ago

  • Status changed from New to In Progress
  • Assignee set to tinita
Actions #3

Updated by tinita 10 months ago · Edited

I analyzed the vars.json files of the last days, for worker35 and worker38:

% cd /var/lib/openqa/testresults
% find 130* -name vars.json | xargs grep "WORKER_HOSTNAME.*worker35.oqa.prg2.suse.org" -l | tee ~/workers/worker35-files
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/share >~/workers/worker35-files.share
% cat ~/workers/worker35-files | xargs grep PRJDIR | grep /var/lib/openqa/cache/openqa.suse.de >~/workers/worker35-files.cache
% wc -l ~/workers/*
   2496 /home/tinita/workers/worker35-files
   2260 /home/tinita/workers/worker35-files.cache
    236 /home/tinita/workers/worker35-files.share
   5214 /home/tinita/workers/worker38-files
   5035 /home/tinita/workers/worker38-files.cache
    179 /home/tinita/workers/worker38-files.share
  15420 total

So the distribution is similar - a small part of the tests are using /var/lib/openqa/share.
Looking at the test ids, the cases are spread over the whole set.
Looking at the worker instances, it happens for all instances.
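
To put numbers on "a small part", the share of jobs with PRJDIR pointing at /var/lib/openqa/share can be derived from the wc counts above. This is only a minimal sketch; the helper name share_percent is made up for illustration and is not part of openQA:

```shell
# share_percent SHARE_COUNT TOTAL_COUNT
# Prints the percentage of jobs whose PRJDIR pointed at /var/lib/openqa/share.
# (Hypothetical helper, not part of openQA.)
share_percent() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale
    LC_ALL=C awk -v share="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * share / total }'
}

share_percent 236 2496   # worker35
share_percent 179 5214   # worker38
```

With the counts above this gives roughly 9.5% for worker35 and 3.4% for worker38.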

% cat /home/tinita/workers/worker38-files.share | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
1
2
...
40
% cat /home/tinita/workers/worker38-files.cache | perl -pwlE's/: .*//' | xargs cat | jq .WORKER_INSTANCE | sort -un
1
2
...
40
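
The same check could be extended to all workers in one pass, without per-worker list files. The sketch below assumes vars.json is pretty-printed with one setting per line (as the grep commands above also assume); the function names json_string_value and tally_prjdir are hypothetical:

```shell
# json_string_value KEY FILE
# Prints the first string value stored under KEY in a pretty-printed JSON file.
# (Hypothetical helper; assumes one "KEY" : "value" pair per line.)
json_string_value() {
    sed -n -E "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"([^\"]*)\".*/\1/p" "$2" | head -n 1
}

# tally_prjdir DIR...
# Prints "COUNT WORKER_HOSTNAME PRJDIR" for every combination found in the
# vars.json files below the given directories.
tally_prjdir() {
    find "$@" -name vars.json | while IFS= read -r f; do
        host=$(json_string_value WORKER_HOSTNAME "$f")
        dir=$(json_string_value PRJDIR "$f")
        # Missing settings are reported with a placeholder instead of being dropped
        printf '%s %s\n' "${host:-(unknown)}" "${dir:-(unset)}"
    done | sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Usage, from /var/lib/openqa/testresults:
#   tally_prjdir 130*
```

Starting one sed process per setting is slow for thousands of jobs; a single jq invocation over all files would be faster, but the sketch avoids depending on jq.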

Maybe there are certain conditions where the cache service is used or not. Looking at the code...

Actions #4

Updated by tinita 10 months ago

Both tests seem to be using the cache directory though:
https://openqa.suse.de/tests/13010854/logfile?filename=worker-log.txt

[2023-12-09T01:17:22.703328+01:00] [debug] [pid:114119] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2" to "/var/lib/openqa/pool/26/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231208-1-Server-DVD-Updates-64bit.qcow2"

https://openqa.suse.de/tests/13018864/logfile?filename=worker-log.txt

[2023-12-11T00:22:04.744680+01:00] [debug] [pid:35080] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2" to "/var/lib/openqa/pool/24/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20231210-1-Server-DVD-Updates-64bit.qcow2"
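
Checking this by hand for many jobs is tedious, so the "Linked asset" lines could be extracted from a downloaded worker-log.txt as sketched below. The function names are hypothetical, and the default cache path is only taken from the log lines above. Note that this covers assets only and says nothing about where the test distribution was read from:

```shell
# linked_asset_sources LOGFILE...
# Prints the source path of every 'Linked asset "SRC" to "DST"' log line.
# (Hypothetical helper; the log format is assumed from the lines quoted above.)
linked_asset_sources() {
    sed -n -E 's/.*Linked asset "([^"]+)" to "[^"]+".*/\1/p' "$@"
}

# assets_all_from_cache LOGFILE [CACHE_DIR]
# Succeeds only if the log links at least one asset and every linked asset
# comes from below CACHE_DIR (default: /var/lib/openqa/cache).
assets_all_from_cache() {
    cache_dir=${2:-/var/lib/openqa/cache}
    sources=$(linked_asset_sources "$1")
    [ -n "$sources" ] || return 1
    printf '%s\n' "$sources" | while IFS= read -r src; do
        case $src in
            "$cache_dir"/*) ;;       # linked from the cache
            *) exit 1 ;;             # leaves the pipeline subshell with failure
        esac
    done
}

# Usage:
#   assets_all_from_cache worker-log.txt && echo "all assets came from the cache"
```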

The code has a note:

# lib/OpenQA/Worker/Engines/isotovideo.pm
     # note: PRJDIR is used as base for relative needle paths by os-autoinst. This is supposed to change
     #       but for compatibility with current old os-autoinst we need to set PRJDIR for a consistent
     #       behavior.

I also cannot find an instance of PRJDIR in os-autoinst. Its usage was removed in 2019: https://github.com/os-autoinst/os-autoinst/commit/9eb2f564b325549227824f4fb054effe66861d4e

That doesn't explain why the content differs, though.

Actions #5

Updated by openqa_review 10 months ago

  • Due date set to 2023-12-30

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by okurz 10 months ago

  • Due date changed from 2023-12-30 to 2024-01-13

Christmas vacation due date bump :)

Actions #7

Updated by livdywan 10 months ago

  • Subject changed from Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? to Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? size:M
  • Description updated (diff)
Actions #8

Updated by tinita 10 months ago

Some more findings:
I looked into job.json in the pooldir, and it contains PRJDIR most of the time.
I was able to find a running job where PRJDIR was not set at all in job.json: https://openqa.suse.de/tests/13130097

job.json is filled from the job_settings (but PRJDIR is not in the database job settings).

Also regarding PRJDIR and OpenQA::Utils::prjdir():

sub prjdir { ($ENV{OPENQA_BASEDIR} || '/var/lib') . '/openqa' }

It seems to me that prjdir() should really only be that directory and not the sharedir and also not the cachedir.
e.g. here are some instances of prjdir():

pool_directory => prjdir() . "/pool/$instance_number",
my $dblockfile = catfile(prjdir(), 'db', 'db.lock');
my $defaultroot = catdir($prjdir, 'share', 'tests');

So PRJDIR being /var/lib/openqa/cache/... or /var/lib/openqa/share is already confusing and does not match prjdir().
Apart from the seemingly random value we should decide what we want the value in vars.json to represent.
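
The mismatch can be illustrated with a shell rendering of prjdir() as quoted above. This is only a sketch: classify_prjdir is a hypothetical helper, and the shell fallback treats an unset or empty OPENQA_BASEDIR as "/var/lib":

```shell
# Shell rendering of OpenQA::Utils::prjdir() as quoted above:
# base directory (OPENQA_BASEDIR or /var/lib) plus "/openqa".
prjdir() {
    printf '%s/openqa\n' "${OPENQA_BASEDIR:-/var/lib}"
}

# classify_prjdir VALUE
# Reports how an observed PRJDIR value from vars.json relates to prjdir().
# (Hypothetical helper for illustration only.)
classify_prjdir() {
    base=$(prjdir)
    case $1 in
        "$base")         echo "equals prjdir()" ;;
        "$base"/share)   echo "share directory below prjdir()" ;;
        "$base"/cache/*) echo "cache directory below prjdir()" ;;
        *)               echo "unrelated to prjdir()" ;;
    esac
}

classify_prjdir /var/lib/openqa/share
classify_prjdir /var/lib/openqa/cache/openqa.suse.de
```

With the default base directory, neither of the two observed values equals prjdir(); both are subdirectories of it.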

Actions #10

Updated by tinita 10 months ago

  • Assignee changed from tinita to okurz

Assigning to @okurz

If I have time before vacation I would like to at least log the path to the test code in autoinst-log.txt, so that we can verify if the cache directory was used or not.

Actions #11

Updated by okurz 10 months ago

  • Status changed from In Progress to Feedback
Actions #12

Updated by okurz 10 months ago

merged

Actions #13

Updated by okurz 9 months ago

  • Due date deleted (2024-01-13)
  • Status changed from Feedback to Resolved

Both ACs covered and investigation jobs like https://openqa.suse.de/tests/13217106#investigation show no PRJDIR anymore and no other fallout known, good enough to resolve.
