action #117262

[alert] failed systemd service: ca-certificates on openqa.suse.de, "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17"

Added by okurz 4 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee: -
Target version:
Start date: 2022-09-27
Due date:
% Done: 0%
Estimated time:
Description

Observation

https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services reported a failed service today. ca-certificates on osd shows:

Sep 27 07:18:52 openqa systemd[1]: Starting Update system wide CA certificates...
Sep 27 07:18:53 openqa update-ca-certificates[7397]: p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17
Sep 27 07:18:53 openqa systemd[1]: ca-certificates.service: Main process exited, code=exited, status=1/FAILURE
Sep 27 07:18:53 openqa systemd[1]: ca-certificates.service: Failed with result 'exit-code'.
Sep 27 07:18:53 openqa systemd[1]: Failed to start Update system wide CA certificates.

A simple restart of the service fixed that.

Suggestions

  • Research if we can find something about this error
  • Look into the system log around this time if there was any other related error

Related issues

Related to openQA Infrastructure - action #104172: osd service ca-certificates failed with "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: File exists" (Resolved, 2021-12-20)

History

#1 Updated by okurz 4 months ago

  • Related to action #104172: osd service ca-certificates failed with "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: File exists" added

#2 Updated by okurz 4 months ago

  • Status changed from In Progress to Resolved

A web search turned up nothing useful other than our own older ticket #104172, which I linked. I also found nothing useful in the system log.

#3 Updated by nicksinger 4 months ago

The only things I could find are two occurrences of the message in the p11-kit source code:
https://github.com/p11-glue/p11-kit/blob/7b1ef9e559e7f7bb2c743abed7688b621cda9f88/trust/save.c#L206-L211
and
https://github.com/p11-glue/p11-kit/blob/7b1ef9e559e7f7bb2c743abed7688b621cda9f88/trust/save.c#L224-L229

As the second one does not pass an errno, I suspect the first one is failing here. Interestingly, the code that prints the message should resolve the errno to a human-readable string:
https://github.com/p11-glue/p11-kit/blob/34b568727ff98ebb36f45a3d63c07f165c58219b/common/message.c#L124 (are we missing a proper locale on OSD? Or is it just broken in p11-kit's environment?)

Anyhow, 17 corresponds to "EEXIST" (errno -l on OSD maps the codes to their names), which could point to leftovers from the recent crashes we suffered.
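
For reference, the mapping from the numeric code to its name can also be checked without the errno tool. The following is a minimal Python sketch, not p11-kit's actual code path: it resolves code 17 and then shows one common way EEXIST arises, namely an exclusive create of a file that already exists. The temporary directory and the reuse of the ca-bundle.pem.tmp file name are assumptions for illustration only.

```python
import errno
import os
import tempfile

# Resolve the numeric code from the log message.
code = 17
name = errno.errorcode[code]  # symbolic name of the error code
text = os.strerror(code)      # message string provided by the C library

# Illustration only (not taken from p11-kit): an exclusive create fails
# with EEXIST when a file of that name is already present, for example a
# leftover from an interrupted earlier run.
with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical stand-in for the real temp file in /var/lib/ca-certificates
    leftover = os.path.join(workdir, "ca-bundle.pem.tmp")
    open(leftover, "w").close()
    try:
        fd = os.open(leftover, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        os.close(fd)
        failed_with = None
    except FileExistsError as exc:
        failed_with = exc.errno

print(name, "-", text, "- exclusive create failed with errno", failed_with)
```

If the message text printed here is readable while p11-kit printed "Unknown error 17", that would hint at a problem in how p11-kit resolves the string rather than at the system's error tables.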

#4 Updated by nicksinger 4 months ago

I found https://bugzilla.suse.com/show_bug.cgi?id=1100241 which mentioned that the ca-certificates.service should be disabled on "normal" installations (which is indeed the case on OSD) and found that there is ca-certificates.path triggering the service.
This .path unit monitors several places where "manual" certificates can be deployed and takes care of automatically calling update-ca-certificates if done so. All other certificates which are shipped by packages should call update-ca-certificates in their %post hook. I followed this clue and found two certificates which are monitored by this path-unit on OSD:

/usr/share/pki/trust/ca-certificates-mozilla.trust.p11-kit
/usr/share/pki/trust/anchors/SUSE_Trust_Root.crt.pem

These belong to the packages ca-certificates-mozilla and ca-certificates-suse; mozilla comes from the SLE15 update repo and suse from the SUSE_CA repo. So one hypothesis is a race condition between the service triggered by the path unit and the %post hook of one of the two packages.
Looking at the journal of ca-certificates.path shows that previously something stopped this watch:

-- Boot 58ce37dfcd7b43578ebac8c0ca8ee2a3 --
Sep 21 17:41:13 openqa systemd[1]: Started Watch for changes in CA certificates.
Sep 25 03:30:15 openqa systemd[1]: ca-certificates.path: Deactivated successfully.
Sep 25 03:30:16 openqa systemd[1]: Stopped Watch for changes in CA certificates.
-- Boot 3a007cbe2d914beeaa138da98e3606c5 --
Sep 25 03:30:56 openqa systemd[1]: Started Watch for changes in CA certificates.
-- Boot 37b8d07bd19743f5b73de54f2d8baa4f --
Sep 26 16:27:48 openqa systemd[1]: Started Watch for changes in CA certificates.
Sep 26 16:51:31 openqa systemd[1]: ca-certificates.path: Deactivated successfully.
Sep 26 16:51:31 openqa systemd[1]: Stopped Watch for changes in CA certificates.
-- Boot 0e3e2adc06df4ad98653780f2955335e --
Sep 26 16:52:16 openqa systemd[1]: Started Watch for changes in CA certificates.

but not since the last boot. I think this is why we see this sporadically.
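
To repeat this check on a longer journal, the per-boot view can be sketched in Python. This is only an illustration: the sample text is copied from the excerpt above, the function name stops_per_boot is made up, and matching on the literal "Stopped Watch for changes in CA certificates." message is an assumption about how the unit logs its stop.

```python
import re

# Journal excerpt copied from the ticket comment above.
JOURNAL = """\
-- Boot 58ce37dfcd7b43578ebac8c0ca8ee2a3 --
Sep 21 17:41:13 openqa systemd[1]: Started Watch for changes in CA certificates.
Sep 25 03:30:15 openqa systemd[1]: ca-certificates.path: Deactivated successfully.
Sep 25 03:30:16 openqa systemd[1]: Stopped Watch for changes in CA certificates.
-- Boot 3a007cbe2d914beeaa138da98e3606c5 --
Sep 25 03:30:56 openqa systemd[1]: Started Watch for changes in CA certificates.
-- Boot 37b8d07bd19743f5b73de54f2d8baa4f --
Sep 26 16:27:48 openqa systemd[1]: Started Watch for changes in CA certificates.
Sep 26 16:51:31 openqa systemd[1]: ca-certificates.path: Deactivated successfully.
Sep 26 16:51:31 openqa systemd[1]: Stopped Watch for changes in CA certificates.
-- Boot 0e3e2adc06df4ad98653780f2955335e --
Sep 26 16:52:16 openqa systemd[1]: Started Watch for changes in CA certificates.
"""

BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]+) --$")
STOP_MESSAGE = "Stopped Watch for changes in CA certificates."


def stops_per_boot(journal_text):
    """Return (boot_id, stop_logged) pairs in journal order."""
    boots = []
    for line in journal_text.splitlines():
        marker = BOOT_MARKER.match(line)
        if marker:
            boots.append([marker.group(1), False])
        elif boots and line.endswith(STOP_MESSAGE):
            boots[-1][1] = True
    return [(boot_id, stopped) for boot_id, stopped in boots]


result = stops_per_boot(JOURNAL)
for boot_id, stopped in result:
    print(boot_id[:8], "stop logged" if stopped else "no stop logged")
```

A boot without a logged stop is either the current boot or one that ended without a clean shutdown; the sketch cannot tell these apart.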

Possible workarounds/solutions:

  1. make sure ca-certificates.path is disabled
  2. figure out why the two mentioned packages write into that location and not, like other certificates (which ones, actually?), into the "proper" location

#5 Updated by okurz 4 months ago

  • Status changed from Resolved to New
  • Assignee deleted (okurz)

With the additional information we can work on the mentioned suggestions to improve the situation and prevent further problems.

#6 Updated by okurz 4 months ago

  • Target version changed from Ready to future
