action #158209

open

[Research] Add service check test on migration path from 15SP3 to 15SP5

Added by leli 30 days ago. Updated 12 days ago.

Status:
New
Priority:
Low
Assignee:
-
Target version:
-
Start date:
2024-03-28
Due date:
% Done:

0%

Estimated time:

Description

Motivation

This idea comes from a customer mail titled 'named.service won't start (permission denied) after upgrade 15SP3-->15SP5'; I will paste the content of the mail in the comments.
To test named after migration, we need to add a service check. To cover the migration path from 15SP3 to 15SP5, we need to add a continuous migration test with a service check.

Example: for reference, see the current service check for named in the regression test online_sles15sp4_pscc_live-basesys-srv-desktop-dev-contm-lgm-tsm-wsm-pcm_all_full.

Acceptance criteria

AC1: Add service check test on migration path from 15SP3 to 15SP5.
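
As a rough illustration of what such a service check could verify before and after migration, here is a minimal sketch in Python. It is not the existing openQA service check; the function names (`is_active`, `has_fatal_startup_error`, `service_regressed`) are hypothetical and chosen only for this example. It assumes `systemctl is-active` is available on the system under test, and it looks for the fatal log line quoted in the customer mail.

```python
import subprocess

# Log line taken from the customer's systemctl status output.
FATAL_MARKER = "exiting (due to fatal error)"


def is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active <unit>` reports 'active'.

    `run` is injectable so the check can be exercised without systemd.
    """
    result = run(["systemctl", "is-active", unit], capture_output=True, text=True)
    return result.returncode == 0 and result.stdout.strip() == "active"


def has_fatal_startup_error(log_text):
    """Return True if any log line contains the fatal-exit marker."""
    return any(FATAL_MARKER in line for line in log_text.splitlines())


def service_regressed(active_before, active_after):
    """A regression means the unit was active before migration but not after."""
    return active_before and not active_after
```

A test on the 15SP3 to 15SP5 path could record `is_active("named")` before the migration, run it again afterwards, and fail if `service_regressed(...)` returns True or the unit's log contains the fatal marker.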

Actions #1

Updated by leli 30 days ago

Content of the mail: (named.service won't start (permission denied) after upgrade 15SP3-->15SP5)

Dear research list,

I am preparing to submit an L3 question for help on this, but I figured I'd post to the research list in case someone has a quick answer.  My customer is in a hurry since his primary DNS server is down.

A customer was running named.service on 15 SP3 (in chroot mode) and upgraded to 15 SP5.  After that upgrade, named.service will not start anymore.  But knowing (from release notes) that bind-chrootenv is no longer used in 15 SP4 and above, we also edited /etc/named.conf and changed this from yes to no:

NAMED_RUN_CHROOTED="yes" --> NAMED_RUN_CHROOTED="no"

but that hasn't helped.  

systemctl status reports:

named[10578]: the working directory is not writable
named[10578]: loading configuration: permission denied
named[10578]: exiting (due to fatal error)

I used strace on PID 1 (systemd) to learn a bit more about what was giving "permission denied"  during the start process.  The most pertinent information I found is shown by the following, although I cannot be sure this is the true cause of the above.  But it was the only "permission denied" in the strace.

#  egrep -i "setuid\(|chdir|root\(|permission" systemd-strace.out
<snip>
57771 15:30:02.828432 chdir("/run/systemd/unit-root") = 0 <0.000020>
57771 15:30:02.828567 chroot(".")       = 0 <0.000018>
57771 15:30:02.828620 chdir("/")        = 0 <0.000018>
57771 15:30:02.833209 chdir("/")        = 0 <0.000019>
57771 15:30:02.868446 setuid(44)        = 0 <0.000027>
57773 15:30:03.082538 access(".", W_OK|X_OK) = -1 EACCES (Permission denied) <0.000024>

(57773 is a child of 57772 which is a child of 57771).

So after chdir("/") and setuid(44) (the named user, which named.service runs as), trying to access "." shows permission denied.

At first I wasn't sure if "." represented /run/systemd/unit-root as a chrooted location, or if it represented true / dir.

So we set /run/systemd/unit-root to 777.  No change, still got permission denied on the same spot (user 44 accessing ".").

Then I set true / (root dir) to 777 and tried to start named again.  This time, the error while trying to access "." changes.  It no longer returns permission denied to that call; now it returns EROFS (read-only file system), which is not true of the root file system or any other mount on that machine, other than:
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
which is indeed normal to be ro.

When I set root dir back to 755, the strace error goes back to "permission denied" instead of EROFS.  I can take this error back and forth at will, by changing permissions at /.

I also altered "named" user (UID 44) so it could get a bash shell, and then after "su named" I am able to manually perform (or not perform) operations (create files, write to files) at the root dir as expected, in accordance to whichever permissions I have set on "/" at that time. (sometimes 777, sometimes 755).

No apparmor is running.  I also altered the named.service file to try to run named as the root user instead of named.  That didn't seem to completely take effect, as strace revealed that the process still does "setuid(44)" before attempting this access.

Customer also found TID https://www.suse.com/support/kb/doc/?id=000020820  and tried that (even before I got involved in this) but it didn't help.
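
The strace filtering described in the mail can also be done programmatically. The following is a minimal Python sketch, assuming the line layout shown in the excerpt (PID, timestamp, syscall, return value, optional errno, duration). The helper name `failed_calls` and the regular expression are illustrative assumptions, not part of any existing tool. It lists only the system calls that returned an error.

```python
import re

# Layout assumed from the excerpt in the mail:
#   PID  HH:MM:SS.usec  syscall(args) = ret [ERRNO (message)] <duration>
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<time>[\d:.]+)\s+(?P<call>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>-?\d+)(?:\s+(?P<errno>[A-Z]+)\s+\((?P<msg>[^)]*)\))?"
)


def failed_calls(lines):
    """Return (pid, syscall, args, errno) for every call with a negative result."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group("ret")) < 0:
            failures.append(
                (
                    int(match.group("pid")),
                    match.group("call"),
                    match.group("args"),
                    match.group("errno"),
                )
            )
    return failures


# Lines copied from the strace excerpt in the mail.
SAMPLE = [
    '57771 15:30:02.828432 chdir("/run/systemd/unit-root") = 0 <0.000020>',
    '57771 15:30:02.828567 chroot(".")       = 0 <0.000018>',
    '57771 15:30:02.828620 chdir("/")        = 0 <0.000018>',
    '57771 15:30:02.868446 setuid(44)        = 0 <0.000027>',
    '57773 15:30:03.082538 access(".", W_OK|X_OK) = -1 EACCES (Permission denied) <0.000024>',
]

for failure in failed_calls(SAMPLE):
    print(failure)
```

For the sample lines, the only failure reported is the `access` call made by PID 57773, which matches the mail's observation that it was the only "permission denied" in the trace.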
Actions #2

Updated by JERiveraMoya 12 days ago

  • Priority changed from Normal to Low

We don't have migration in the maintenance product yet (we are working on that), but we certainly would not include complex scenarios with service checks there.
In the future, once we have "basic" migration in maintenance, we can consider adding this regression, but we are still far from that.
