action #117262
closed
[alert] failed systemd service: ca-certificates on openqa.suse.de, "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17" size:M
Added by okurz about 2 years ago.
Updated over 1 year ago.
Description
Observation
The alert at https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services fired today. ca-certificates on OSD shows:
Sep 27 07:18:52 openqa systemd[1]: Starting Update system wide CA certificates...
Sep 27 07:18:53 openqa update-ca-certificates[7397]: p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17
Sep 27 07:18:53 openqa systemd[1]: ca-certificates.service: Main process exited, code=exited, status=1/FAILURE
Sep 27 07:18:53 openqa systemd[1]: ca-certificates.service: Failed with result 'exit-code'.
Sep 27 07:18:53 openqa systemd[1]: Failed to start Update system wide CA certificates.
A simple restart fixed that.
Suggestions
- Related to action #104172: osd service ca-certificates failed with "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: File exists" added
- Status changed from In Progress to Resolved
During a web search I haven't found anything useful other than our own older ticket #104172, which I linked. There is also nothing useful in the system log.
I found https://bugzilla.suse.com/show_bug.cgi?id=1100241 which mentioned that the ca-certificates.service should be disabled on "normal" installations (which is indeed the case on OSD) and found that there is ca-certificates.path triggering the service.
This .path unit monitors several places where "manual" certificates can be deployed and takes care of automatically calling update-ca-certificates if that happens. All other certificates which are shipped by packages should call update-ca-certificates in their %post hook. I followed this clue and found two certificates which are monitored by this path unit on OSD:
/usr/share/pki/trust/ca-certificates-mozilla.trust.p11-kit
/usr/share/pki/trust/anchors/SUSE_Trust_Root.crt.pem
belonging to the packages ca-certificates-mozilla (coming from the SLE15 update repo) and ca-certificates-suse (coming from the SUSE_CA repo). So one hypothesis is a race condition between the service triggered by the path unit and the %post hook of one of the two packages.
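For reference, errno 17 on Linux is EEXIST ("File exists"), which matches the message in #104172. The following is only a minimal sketch of how such a collision could look, assuming that the temporary file is created exclusively under a fixed name; whether p11-kit really does that has not been verified here. The scratch directory and the write_exclusive helper are made up for illustration.

```shell
#!/bin/sh
# Hypothetical illustration only; this is not the real p11-kit code path.
workdir=$(mktemp -d)
tmpfile="$workdir/ca-bundle.pem.tmp"

# Create the temp file exclusively: with noclobber set, the redirection
# fails if the file already exists instead of truncating it.
write_exclusive() {
    ( set -o noclobber; echo "bundle from $1" > "$tmpfile" ) 2>/dev/null
}

first_status=0
write_exclusive first || first_status=$?

# The temp file left behind by "first" is still there, so this one fails.
second_status=0
write_exclusive second || second_status=$?

echo "first=$first_status second=$second_status"
rm -rf "$workdir"
```

The first writer succeeds and the second one returns a non-zero status, which is the kind of failure two overlapping update-ca-certificates runs would produce under the stated assumption.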
Looking at the journal of ca-certificates.path shows that previously something stopped this watch:
-- Boot 58ce37dfcd7b43578ebac8c0ca8ee2a3 --
Sep 21 17:41:13 openqa systemd[1]: Started Watch for changes in CA certificates.
Sep 25 03:30:15 openqa systemd[1]: ca-certificates.path: Deactivated successfully.
Sep 25 03:30:16 openqa systemd[1]: Stopped Watch for changes in CA certificates.
-- Boot 3a007cbe2d914beeaa138da98e3606c5 --
Sep 25 03:30:56 openqa systemd[1]: Started Watch for changes in CA certificates.
-- Boot 37b8d07bd19743f5b73de54f2d8baa4f --
Sep 26 16:27:48 openqa systemd[1]: Started Watch for changes in CA certificates.
Sep 26 16:51:31 openqa systemd[1]: ca-certificates.path: Deactivated successfully.
Sep 26 16:51:31 openqa systemd[1]: Stopped Watch for changes in CA certificates.
-- Boot 0e3e2adc06df4ad98653780f2955335e --
Sep 26 16:52:16 openqa systemd[1]: Started Watch for changes in CA certificates.
but not since the last boot. I think this is why we see this sporadically.
Possible workarounds/solutions:
- make sure ca-certificates.path is disabled
- figure out why the two mentioned packages write into that location and not, like other certificates (which ones, actually?), into the "proper" location
- Status changed from Resolved to New
- Assignee deleted (okurz)
With the additional information we can work on the mentioned suggestions to improve the situation and prevent further problems.
- Target version changed from Ready to future
- Related to action #131096: [alert] Service `ca-certificates` can fail size:M added
- Target version changed from future to Ready
- Subject changed from [alert] failed systemd service: ca-certificates on openqa.suse.de, "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17" to [alert] failed systemd service: ca-certificates on openqa.suse.de, "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17" size:M
- Description updated (diff)
- Status changed from New to Workable
- Status changed from Workable to In Progress
- Assignee set to nicksinger
- Due date set to 2023-07-06
Setting due date based on mean cycle time of SUSE QE Tools
- Due date deleted (2023-07-06)
- Status changed from In Progress to Workable
- Assignee deleted (nicksinger)
As discussed in the weekly, I'm unassigning myself and leaving this for others to do. The next suggestion still holds: Just report an upstream issue and see if anybody has an idea.
I'm not 100 % sure whether #131096 is a duplicate (as the ticket description suggests). This ticket is about:
Sep 27 07:18:53 openqa update-ca-certificates[7397]: p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17
and the other ticket about:
Jun 18 03:01:49 schort-server update-ca-certificates[29527]: mv: cannot stat '/var/lib/ca-certificates/ca-bundle.pem.new': No such file or directory
There are no problematic scripts in /etc/ca-certificates/update.d on those hosts.
The failing mv might be in /usr/lib/ca-certificates/update.d/99certbundle.run which I had a look at on schort-server:
set -e
cafile="/var/lib/ca-certificates/ca-bundle.pem"
cadir="/var/lib/ca-certificates/pem"
…
trust extract --format=pem-bundle --purpose=server-auth --filter=ca-anchors $cafile.tmp
cat $cafile.tmp >> $cafile.new
rm -f $cafile.tmp
mv "$cafile.new" "$cafile"
The other scripts in that directory don't have a mv command that would produce this error message. However, I'm also still wondering how it can happen in 99certbundle.run because it looks like the script will either fail earlier or succeed. Maybe something else did something to the file in the background (like a 2nd instance of that script running in parallel)? That would be in line with the hypothesis @nsinger stated in #117262#note-4.
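To make the "2nd instance in parallel" idea concrete, here is a hand-interleaved replay of the last steps of the excerpt for two hypothetical instances A and B sharing the same .new file. It is only a sketch under that assumption: it runs in a scratch directory, the append_step and move_step helpers are made up, and the real interleaving (if any) is unknown.

```shell
#!/bin/sh
# Replay of "cat >> $cafile.new" followed by "mv $cafile.new $cafile"
# for two hypothetical instances; all paths are scratch paths.
workdir=$(mktemp -d)
cafile="$workdir/ca-bundle.pem"

append_step() { echo "certs written by instance $1" >> "$cafile.new"; }
move_step()   { mv "$cafile.new" "$cafile"; }

append_step A
append_step B    # B appends to the same .new file as A

# A moves the shared .new file into place.
status_a=0
move_step || status_a=$?

# B then finds that .new is already gone, so its mv fails.
status_b=0
move_step 2>"$workdir/mv.err" || status_b=$?

echo "A=$status_a B=$status_b"
cat "$workdir/mv.err"    # the mv error message of instance B
```

With set -e, as in the real script, instance B would abort at the mv, which would fit a failed service with a "cannot stat" message for ca-bundle.pem.new.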
About the p11-kit issue: I'm not even sure when that would happen. I didn't find an invocation in any of the scripts and also nothing in the ca-certificates Git repo except comments/documentation.
- Status changed from Workable to Feedback
I haven't found an upstream bug so I've just created a new one: https://github.com/openSUSE/ca-certificates/issues/20
Not sure whether it makes sense to try to fix this on our own. At least the p11-kit part is a bit strange to me and maybe upstream has a better idea how to fix this.
As discussed in the weekly: We have the upstream report. Now we should implement a workaround: just add a restart to the systemd service. Likely something like systemctl edit ca-certificates and then add a ticket reference and a Restart=on-failure or something along those lines. Should be easy to do in salt based on the "override.conf" examples we already have.
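A minimal sketch of what such a drop-in could contain. It is rendered into a scratch directory here; on a real host systemctl edit ca-certificates would place it at /etc/systemd/system/ca-certificates.service.d/override.conf. The comment wording is an assumption, and this is not the salt state that was eventually deployed. If the unit is Type=oneshot, it is worth checking that the installed systemd accepts Restart=on-failure for it, as older releases may reject that combination.

```shell
#!/bin/sh
# Render the drop-in into a scratch directory instead of /etc/systemd/system
# so the sketch can be tried without touching the host.
dropin_dir="$(mktemp -d)/ca-certificates.service.d"
mkdir -p "$dropin_dir"

cat > "$dropin_dir/override.conf" <<'EOF'
# Workaround for ticket #117262: retry when update-ca-certificates fails sporadically
[Service]
Restart=on-failure
EOF

cat "$dropin_dir/override.conf"
```

When a file like this is deployed by hand or via salt instead of systemctl edit, a systemctl daemon-reload is needed before the override takes effect.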
- Status changed from Feedback to In Progress
- Status changed from In Progress to Feedback
- Status changed from Feedback to Resolved