action #65450: workers on o3 power did not restart after upgrade as NFS mount point was stale "Ignoring host 'http://openqa1-opensuse': Working directory does not exist" - openQA Project (public) - openSUSE Project Management Tool

action #65450

open

workers on o3 power did not restart after upgrade as NFS mount point was stale "Ignoring host 'http://openqa1-opensuse': Working directory does not exist"

Added by okurz about 5 years ago. Updated about 5 years ago.

Status: Workable
Priority: Low
Assignee:
Category: Feature requests
Target version: QA (public) - future
Start date: 2020-04-08
Due date:
% Done:
Estimated time:

Description

Observation

After an upgrade with zypper dup, the worker instances on power8 refused to start and showed the confusing error message "Ignoring host 'http://openqa1-opensuse': Working directory does not exist".

Only after starting a worker manually under strace could I find out what was wrong:

[info] [pid:69150] worker 1:
 - config file:           /etc/openqa/workers.ini
 - worker hostname:       power8
 - isotovideo version:    0
 - websocket API version: 1
 - web UI hosts:          http://openqa1-opensuse
 - class:                 qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload
 - no cleanup:            no
 - pool directory:        /var/lib/openqa/pool/1
stat("/var/lib/empty/.config/openqa/client.conf", 0x100174904f0) = -1 ENOENT (No such file or directory)
stat("/etc/openqa/client.conf", {st_mode=S_IFREG|0400, st_size=166, ...}) = 0
stat("/var/lib/openqa/pool/1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/var/lib/openqa/cache/openqa1-opensuse", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[info] [pid:69150] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa1-opensuse
stat("/var/lib/openqa/cache/tmp", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/var/lib/openqa/share", 0x100174904f0) = -1 ESTALE (Stale file handle)
[debug] [pid:69150] Found possible working directory for http://openqa1-opensuse: /var/lib/openqa/share
[error] [pid:69150] Ignoring host 'http://openqa1-opensuse': Working directory does not exist.
+++ exited with 0 +++

After I changed the storage setup on o3, the old mount point, which had vanished, was still present as a stale NFS mount on power8, which is rebooted less often than the other machines.

I don't know yet what the cache code wants to do with /var/lib/openqa/share, which is only provided for backward compatibility with tests that rely on that path and do not use the cache properly. But it seems we are checking it as well. After I unmounted it, the worker started up just fine, so we do not really need it.
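
The strace output shows stat() on /var/lib/openqa/share failing with ESTALE, while the worker reported the directory as missing. As an illustration only, here is a minimal Python sketch of a check that keeps the two cases apart. It is not openQA's actual Perl code; the function name check_working_directory and the injectable stat_func parameter are assumptions made for this example.

```python
import errno
import os
import stat


def check_working_directory(path, stat_func=os.stat):
    """Return (usable, reason) for a candidate working directory.

    Hypothetical helper, not openQA code. stat_func is injectable so the
    stale-handle case can be simulated without a real NFS mount.
    """
    try:
        info = stat_func(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return False, f"{path}: stale file handle (stale NFS mount?)"
        if exc.errno == errno.ENOENT:
            return False, f"{path}: does not exist"
        # Any other failure, e.g. EACCES, keeps the system's own wording.
        return False, f"{path}: {exc.strerror or exc}"
    if not stat.S_ISDIR(info.st_mode):
        return False, f"{path}: exists but is not a directory"
    return True, "ok"
```

With a check like this, a stale mount would no longer be reported with the same wording as a directory that was never created.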

Suggestions

Check why we even read this path
Improve error messages to show without --verbose what is wrong
Potentially improve to not even rely on this path
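
Until the worker reports the cause itself, a stale mount like the one on power8 can be looked for by hand. The following Python sketch is an assumption-laden illustration, not part of openQA: it reads text in the /proc/mounts format, collects the NFS mount points, and reports those where stat() fails with ESTALE. The function names are hypothetical, and on a hard mount whose server is unreachable stat() may block instead of failing.

```python
import errno
import os


def nfs_mount_points(mounts_text):
    """Extract mount points of NFS filesystems from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        # Spaces inside paths are escaped as \040 and are not decoded here.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def stale_mount_points(points, stat_func=os.stat):
    """Return the mount points for which stat() fails with ESTALE."""
    stale = []
    for point in points:
        try:
            stat_func(point)
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                stale.append(point)
    return stale
```

On a Linux host, passing the contents of /proc/mounts to nfs_mount_points and the result to stale_mount_points would list candidates to unmount or remount.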

Related issues 1 (0 open — 1 closed)

Updated by okurz about 5 years ago

Status changed from Workable to Feedback
Assignee set to okurz

The least I can do is improve the error message: https://github.com/os-autoinst/openQA/pull/2922

Updated by okurz about 5 years ago

Category changed from Regressions/Crashes to Feature requests
Status changed from Feedback to Workable
Assignee deleted (okurz)

OK, I tried and I failed. I struggle to understand the logic right now, so I closed my PR and hope someone else can cover this more easily. What I suggest: every log message of a higher log level should carry complete information and not rely on e.g. debug messages to provide context. Hence we need to output which working directory was not usable for a worker.
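
To make the suggestion concrete, here is a minimal Python sketch of an error message that names the directory and the reason itself, so it stays understandable when debug output is disabled. openQA is written in Perl, so this is not the project's code; format_ignored_host_error and report_ignored_host are hypothetical names, and the message wording is only a proposal.

```python
import logging


def format_ignored_host_error(host, working_dir, reason):
    """Build an error line that is complete without debug-level context."""
    return (
        f"Ignoring host '{host}': working directory '{working_dir}' "
        f"is not usable ({reason})"
    )


def report_ignored_host(logger, host, working_dir, reason):
    """Log the debug detail and a self-contained error message."""
    # The debug line stays available, but the error no longer depends on it.
    logger.debug("Found possible working directory for %s: %s", host, working_dir)
    logger.error(format_ignored_host_error(host, working_dir, reason))
```

For the case in this ticket, the error line would read: Ignoring host 'http://openqa1-opensuse': working directory '/var/lib/openqa/share' is not usable (Stale file handle).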

Updated by okurz almost 2 years ago

Related to action #127754: osd nfs-server needed to be restarted but we got no alerts size:M added
