Subject: named.service won't start (permission denied) after upgrade 15 SP3 --> 15 SP5
Dear research list,
I am preparing to submit an L3 request for help on this, but I figured I'd post to the research list in case someone has a quick answer. My customer is in a hurry, since his primary DNS server is down.
The customer was running named.service on 15 SP3 (in chroot mode) and upgraded to 15 SP5. Since the upgrade, named.service no longer starts. Knowing from the release notes that bind-chrootenv is no longer used in 15 SP4 and later, we also edited /etc/named.conf and changed NAMED_RUN_CHROOTED from yes to no:
NAMED_RUN_CHROOTED="yes" --> NAMED_RUN_CHROOTED="no"
but that hasn't helped.
systemctl status reports:
named[10578]: the working directory is not writable
named[10578]: loading configuration: permission denied
named[10578]: exiting (due to fatal error)
I used strace on PID 1 (systemd) to learn more about what was returning "permission denied" during startup. The most relevant output is below. I cannot be sure it is the actual cause of the errors above, but it was the only "permission denied" in the trace.
# egrep -i "setuid\(|chdir|root\(|permission" systemd-strace.out
<snip>
57771 15:30:02.828432 chdir("/run/systemd/unit-root") = 0 <0.000020>
57771 15:30:02.828567 chroot(".") = 0 <0.000018>
57771 15:30:02.828620 chdir("/") = 0 <0.000018>
57771 15:30:02.833209 chdir("/") = 0 <0.000019>
57771 15:30:02.868446 setuid(44) = 0 <0.000027>
57773 15:30:03.082538 access(".", W_OK|X_OK) = -1 EACCES (Permission denied) <0.000024>
(57773 is a child of 57772, which is a child of 57771.)
So after chdir("/") and setuid(44) (UID 44 is the named user, which named.service runs as), the attempt to access "." fails with permission denied.
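The parent/child relationship noted above can be read directly from the same trace, because strace -f records the PID returned by each clone()/fork() call. A minimal sketch, assuming the log has the strace -f format shown above (PID of the traced task in the first column); the function name strace_parents is hypothetical, and threads created with clone() are listed as well.

```shell
# strace_parents FILE
# Print "CHILD <- PARENT" for every fork-like syscall in an strace -f log.
# Assumption: each line starts with the PID of the traced task (as with strace -f -o FILE).
strace_parents() {
    awk '
        # clone/clone3/fork/vfork calls, including "<... clone resumed>" continuation lines
        /(clone3?|v?fork)[(]|<\.\.\. (clone3?|v?fork) resumed>/ {
            # The return value (new PID/TID) follows the last " = "
            n = split($0, parts, " = ")
            if (n < 2) next                  # "<unfinished ...>" line: no result yet
            split(parts[n], ret, " ")
            if (ret[1] ~ /^[0-9]+$/ && ret[1] > 0)
                print ret[1] " <- " $1
        }
    ' "$1"
}
```

Usage: strace_parents systemd-strace.out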
At first I wasn't sure whether "." referred to /run/systemd/unit-root as a chroot location or to the real / directory.
So we set /run/systemd/unit-root to 777. No change: user 44 still got permission denied on the same access to ".".
Then I set the real / (root directory) to 777 and tried to start named again. This time the error on the access to "." changed: instead of permission denied, the call now returns EROFS (read-only file system), although neither the root file system nor any other mount on that machine is read-only, except:
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
which is normally read-only anyway.
When I set / back to 755, the strace error goes back to "permission denied" instead of EROFS. I can switch between the two errors at will by changing the permissions on /.
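To compare the 755 and 777 runs without reading through the whole trace each time, one option is to filter the log down to failing access() calls and their errno. A sketch under the same assumption about the log format; access_failures is a hypothetical name, and faccessat/faccessat2 are matched too because newer glibc builds may issue those instead of access().

```shell
# access_failures FILE
# Print "PID PATH ERRNO" for each failed access()/faccessat()/faccessat2() in an strace -f log.
access_failures() {
    awk '
        / (access|faccessat2?)[(]/ && / = -1 / {
            # The first double-quoted argument is the path being checked
            match($0, /"[^"]*"/)
            path = substr($0, RSTART, RLENGTH)
            # The errno name is the first word after " = -1 "
            rest = substr($0, index($0, " = -1 ") + 6)
            split(rest, e, " ")
            print $1, path, e[1]
        }
    ' "$1"
}
```

For the trace excerpt above, access_failures systemd-strace.out would print 57773 "." EACCES; after the 777 run the errno column should show EROFS instead.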
I also changed the named user (UID 44) so it could get a bash shell. After "su named", I can manually create and write files in / (or fail to) exactly as expected for whichever permissions / has at that time (sometimes 777, sometimes 755).
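The manual test as the named user can also be scripted with runuser from util-linux, using a check that mirrors the access(".", W_OK|X_OK) call from the trace. A sketch; check_dir is a hypothetical name. One caveat, which is an assumption rather than something confirmed here: the chdir("/run/systemd/unit-root") and chroot(".") lines suggest systemd starts named.service in its own mount namespace, so a check run from an interactive shell may not see the same read-only mounts that the service sees.

```shell
# check_dir DIR [USER]
# Succeed if DIR is writable and searchable after cd (like access(".", W_OK|X_OK)),
# either for the calling user or, when USER is given, for that user via runuser (needs root).
check_dir() {
    script='cd -- "$1" 2>/dev/null && [ -w . ] && [ -x . ]'
    if [ -n "${2:-}" ]; then
        runuser -u "$2" -- sh -c "$script" sh "$1"
    else
        sh -c "$script" sh "$1"
    fi
}
```

For example, as root: check_dir / named && echo writable || echo "not writable"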
AppArmor is not running. I also modified the named.service file to try to run named as root instead of the named user. That didn't seem to take full effect: strace still showed the process calling setuid(44) before attempting this access.
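A setuid(44) that survives a change to the unit's user could have more than one source. Besides User= in the unit, BIND's named has its own -u option, which makes it switch to that user after completing privileged startup steps. Whether the 15 SP5 named.service passes -u is not established here, so the following is only a sketch for checking it: it pulls the -u argument out of the ExecStart= property printed by systemctl show. The function name execstart_user_flag is hypothetical, and arguments supplied through an environment variable (shown unexpanded, e.g. $SOME_ARGS) will not be detected this way.

```shell
# execstart_user_flag
# Read `systemctl show -p ExecStart UNIT` output on stdin and print the value
# passed to a -u option on the command line, if there is one.
execstart_user_flag() {
    # Keep only the argv[]= part of the ExecStart= property
    sed -n 's/.*argv\[]=\([^;]*\);.*/\1/p' |
        awk '{
            for (i = 1; i < NF; i++)
                if ($i == "-u") { print $(i + 1); exit }
        }'
}
```

Usage: systemctl show -p ExecStart named.service | execstart_user_flag. It is also worth confirming with systemctl cat named.service that the edited unit (or a drop-in) is the one systemd actually loaded; edits to unit files only take effect after systemctl daemon-reload.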
The customer also found TID https://www.suse.com/support/kb/doc/?id=000020820 and tried it (even before I got involved), but it didn't help.