Project

General

Profile

Actions

action #125039

open

[security][15sp4][15sp5] test fails in audit_tools

Added by pstivanin about 1 year ago. Updated 9 days ago.

Status:
Blocked
Priority:
Normal
Assignee:
-
Category:
Bugs in existing tests
Target version:
-
Start date:
Due date:
% Done:

50%

Estimated time:
4.00 h
Difficulty:
Tags:

Description

Observation

Happens also on 15-SP5: https://openqa.suse.de/tests/10576653#step/audit_tools/26

openQA test in scenario sle-15-SP4-Server-DVD-Updates-x86_64-cc_audit-test@64bit fails in
audit_tools

Test suite description

Testsuite maintained at https://gitlab.suse.de/qe-security/osd-sle15-security.

Reproducible

Fails since (at least) Build 20230223-1

Expected result

Last good: 20230222-1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open1 closed)

Has duplicate openQA auto review - openqa-force-result #136367: [security][15sp4][15sp5] test fails in audit_tools auto_review:"wait_serial.*sysrq.*show.*blocked.*state.: fail[\S\s]*finished audit_tools security/cc":retry:force_result:softfailedResolvedpstivaninActions
Actions #1

Updated by pstivanin about 1 year ago

  • Subject changed from [security][15sp4]]15sp5] test fails in audit_tools to [security][maint][15sp4][devel][15sp5] test fails in audit_tools
Actions #2

Updated by tjyrinki_suse about 1 year ago

  • Tags set to fail
  • Start date deleted (2023-02-24)
  • Estimated time set to 24.00 h

This seems to sometimes pass on 15-SP4 non-uefi: https://openqa.suse.de/tests/10608546

There is no runs on 15-SP4 uefi.

Likewise mostly passing on 15-SP5 non-uefi although sometimes fails: https://openqa.suse.de/tests/10506010#step/audit_tools/28 - notably that was incorrectly soft-failed by a bot with a unremovable comment.

However failing always on 15-SP5 uefi: https://openqa.suse.de/tests/10576653#step/audit_tools/26 (but I see a day ago successful re-run on previous build.

Another 15-SP5 uefi rerun running now at https://openqa.suse.de/tests/10610275

If it's gotten to pass, this ticket is more about how to improve the stableness of the test than there being an actual product problem.

Actions #3

Updated by amanzini about 1 year ago

  • Assignee set to amanzini
Actions #4

Updated by amanzini about 1 year ago

  • Status changed from New to In Progress

First observations

Seems passing on recent builds; tried to reproduce the bug without success, will keep under monitoring in the next days.
My guess is the fixed delay of 2 seconds in the shell scripts used for testing is sometime too short for the service to come up properly.
The script also seems very old (from 2011) and have legacy statements, so my suggestion is to plan a bit of refactoring.

As an example:


#!/bin/bash

# =============================================================================
# (c) Copyright Hewlett-Packard Development Company, L.P., 2011
#
#   This program is free software: you can redistribute it and/or modify
#   it under the terms of version 2 the GNU General Public License as
#   published by the Free Software Foundation.
#   
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#   
#   You should have received a copy of the GNU General Public License
#   along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#  HISTORY:
#    08/2011 created by T.N Santhosh <santhosh.tn@hp.com>
#
# =============================================================================

export AUDIT_SEEK=`stat -c %s /var/log/audit/audit.log`

service auditd restart 63>/dev/null || return 2

# give more time to starting audit service
sleep 2
npid=$(pidof auditd)

auid=$(</proc/$$/loginuid)

if augrok --seek=$AUDIT_SEEK type=DAEMON_END auid=$auid; then
    augrok --seek=$AUDIT_SEEK type=DAEMON_START pid=$npid
else
    exit 1
fi

Actions #6

Updated by amanzini about 1 year ago

seems the audit tools is

  • taking offset of audit.log
  • restarting the service
  • checking the audit.log for some specific string

On some occasion, it can happen that the audit.log is not yet written to disk.
Instead of adding some sleep, my proposal is to sync the file to the filesystem so we are sure we are reading up-to-date information.

Actions #7

Updated by amanzini about 1 year ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by tjyrinki_suse about 1 year ago

  • Status changed from Feedback to Workable

It seems like a great find, feel free to go ahead and merge it. It shouldn't hurt anything and could improve the stability considerably.

Actions #9

Updated by amanzini about 1 year ago

  • Status changed from Workable to Resolved

MR merged, let's keep an eye on tests if the situation improves

Actions #10

Updated by pstivanin about 1 year ago

  • Status changed from Resolved to Feedback
Actions #11

Updated by amanzini about 1 year ago

  • Status changed from Feedback to In Progress

looks like there's a correlation between worker 'kind' and test result:

10796919 worker10:7        SUCCESS
10795289 openqaworker18:11 FAIL
10783953 worker6:24        SUCCESS
10780949 worker9:1         SUCCESS
10780933 openqaworker16:9  FAIL
10779940 openqaworker17:19 FAIL
10759079 worker8:6         SUCCESS
10759066 openqaworker16:20 FAIL
10757771 openqaworker17:8  FAIL
10744094 openqaworker18:10 FAIL
10752921 worker8:16        SUCCESS
10751212 openqaworker18:6  FAIL
10748101 worker5:8         SUCCESS
10744094 openqaworker18:10 FAIL
10744055 openqaworker18:16 FAIL
10744036 openqaworker14:9  FAIL
10742653 openqaworker17:17 FAIL
10737701 worker10:5        SUCCESS
10737677 openqaworker18:10 FAIL
Actions #12

Updated by amanzini about 1 year ago

audit.log from a successful restart test:

type=DAEMON_ROTATE msg=audit(1679920670.833:8269): op=rotate-logs auid=0 pid=7473 subj=unconfined res=success
type=SERVICE_STOP msg=audit(1679920671.616:407): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=augenrules comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=DAEMON_END msg=audit(1679920671.622:8270): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_START msg=audit(1679920671.958:5976): op=start ver=3.0.6 format=raw kernel=5.14.21-150400.24.46-default auid=4294967295 pid=7500 uid=0 ses=4294967295 subj=unconfined  res=success
type=SYSCALL msg=audit(1679920671.652:408): arch=c000003e syscall=44 success=yes exit=60 a0=3 a1=7ffd84ae1d10 a2=3c a3=0 items=0 ppid=1 pid=7300 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=unconfined key=(null)
type=PROCTITLE msg=audit(1679920671.652:408): proctitle="/sbin/auditd"
type=CONFIG_CHANGE msg=audit(1679920671.848:409): op=set audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=unconfined res=1
type=SYSCALL msg=audit(1679920671.848:409): arch=c000003e syscall=44 success=yes exit=60 a0=3 a1=7fff6cf061e0 a2=3c a3=0 items=0 ppid=1 pid=7494 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditctl" exe="/usr/sbin/auditctl" subj=unconfined key=(null)
type=PROCTITLE msg=audit(1679920671.848:409): proctitle=2F7362696E2F617564697463746C002D52002F6574632F61756469742F61756469742D73746F702E72756C6573
type=CONFIG_CHANGE msg=audit(1679920671.956:410): op=set audit_pid=7500 old=0 auid=4294967295 ses=4294967295 subj=unconfined res=1
type=SYSCALL msg=audit(1679920671.956:410): arch=c000003e syscall=44 success=yes exit=60 a0=4 a1=7ffee1df8640 a2=3c a3=0 items=0 ppid=7499 pid=7500 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=unconfined key=(null)
type=PROCTITLE msg=audit(1679920671.956:410): proctitle="/sbin/auditd"
type=SERVICE_START msg=audit(1679920671.960:411): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=SERVICE_START msg=audit(1679920672.080:412): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=augenrules comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

audit.log from a failed restart test:

type=DAEMON_ROTATE msg=audit(1679920530.185:5907): op=rotate-logs auid=0 pid=7474 subj=unconfined res=success
type=DAEMON_END msg=audit(1679920530.714:5908): op=terminate auid=-1 pid=-1 subj=? res=success
type=DAEMON_START msg=audit(1679920530.922:5980): op=start ver=3.0.6 format=raw kernel=5.14.21-150400.24.46-default auid=4294967295 pid=7500 uid=0 ses=4294967295 subj=unconfined  res=success
type=SYSCALL msg=audit(1679920530.743:408): arch=c000003e syscall=44 success=yes exit=60 a0=3 a1=7ffccfbe6cd0 a2=3c a3=0 items=0 ppid=1 pid=7302 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=unconfined key=(null)
type=PROCTITLE msg=audit(1679920530.743:408): proctitle="/sbin/auditd"
type=CONFIG_CHANGE msg=audit(1679920530.855:409): op=set audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=unconfined res=1
type=SYSCALL msg=audit(1679920530.855:409): arch=c000003e syscall=44 success=yes exit=60 a0=3 a1=7fff8a17da00 a2=3c a3=0 items=0 ppid=1 pid=7494 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditctl" exe="/usr/sbin/auditctl" subj=unconfined key=(null)
type=PROCTITLE msg=audit(1679920530.855:409): proctitle=2F7362696E2F617564697463746C002D52002F6574632F61756469742F61756469742D73746F702E72756C6573
type=CONFIG_CHANGE msg=audit(1679920530.923:410): op=set audit_pid=7500 old=0 auid=4294967295 ses=4294967295 subj=unconfined res=1
type=SYSCALL msg=audit(1679920530.923:410): arch=c000003e syscall=44 success=yes exit=60 a0=4 a1=7ffe74106410 a2=3c a3=0 items=0 ppid=7499 pid=7500 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=unconfined key=(null)
type=PROCTITLE msg=audit(1679920530.923:410): proctitle="/sbin/auditd"
type=SERVICE_START msg=audit(1679920530.923:411): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=SERVICE_START msg=audit(1679920530.983:412): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=augenrules comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

the green test is ran on worker3:18 and DAEMON_END line has auid=0 pid=1
the red test is ran on openqaworker16:11 and DAEMON_END line has auid=-1 pid=-1

Actions #13

Updated by amanzini about 1 year ago

auditd restart itself has a flaky behavior. This is probably due to condition rc<=0 in https://github.com/linux-audit/audit-userspace/blob/7d35e1411cb4cb4b1382c63d47eeafe434885e8d/src/auditd.c#L1008

it's possible to reproduce it with a command line like

# while systemctl restart auditd ; do sleep 10 ; ausearch -m DAEMON_END ; done

even on the same system, some entries have auid=0, while others shows op=terminate auid=-1 pid=-1 subj=? res=success

----
time->Wed Mar 29 09:41:10 2023
type=DAEMON_END msg=audit(1680075670.314:2706): op=terminate auid=0 pid=1 subj=unconfined res=success
----
time->Wed Mar 29 09:41:12 2023
type=DAEMON_END msg=audit(1680075672.600:2706): op=terminate auid=-1 pid=-1 subj=? res=success
----
time->Wed Mar 29 09:41:19 2023
type=DAEMON_END msg=audit(1680075679.931:5907): op=terminate auid=0 pid=1 subj=unconfined res=success
----
time->Wed Mar 29 09:41:20 2023
type=DAEMON_END msg=audit(1680075680.191:4132): op=terminate auid=-1 pid=-1 subj=? res=success
----
time->Wed Mar 29 09:41:22 2023
type=DAEMON_END msg=audit(1680075682.455:4132): op=terminate auid=0 pid=1 subj=unconfined res=success
----
time->Wed Mar 29 09:41:22 2023
type=DAEMON_END msg=audit(1680075682.718:7767): op=terminate auid=0 pid=1 subj=unconfined res=success
----
time->Wed Mar 29 09:41:25 2023
type=DAEMON_END msg=audit(1680075685.000:7767): op=terminate auid=-1 pid=-1 subj=? res=success
----
time->Wed Mar 29 15:46:24 2023
type=DAEMON_END msg=audit(1680097584.466:4473): op=terminate auid=0 pid=1 subj=unconfined res=success
----

Actions #14

Updated by amanzini about 1 year ago

  • Status changed from In Progress to Blocked

opened https://bugzilla.suse.com/show_bug.cgi?id=1209910 since the restart/stop of the auditd daemon does not emits log in a predictable way. This is observable also in 15SP5:


# rpm -qa |grep audit
audit-3.0.6-150400.4.6.1.x86_64
libaudit1-3.0.6-150400.2.13.x86_64
system-group-audit-3.0.6-150400.4.6.1.x86_64

while sleep 10 ; do echo > /var/log/audit/audit.log ; systemctl restart auditd ; grep  DAEMON_END /var/log/audit/audit.log ; done 
type=DAEMON_END msg=audit(1680156053.592:6689): op=terminate auid=-1 pid=-1 subj=? res=success
type=DAEMON_END msg=audit(1680156063.728:7524): op=terminate auid=-1 pid=-1 subj=? res=success
type=DAEMON_END msg=audit(1680156073.869:5670): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156084.008:6781): op=terminate auid=-1 pid=-1 subj=? res=success
type=DAEMON_END msg=audit(1680156094.150:290): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156104.292:9331): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156114.432:5629): op=terminate auid=-1 pid=-1 subj=? res=success
type=DAEMON_END msg=audit(1680156124.572:1169): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156134.711:7201): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156144.851:7932): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156154.993:4264): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156165.133:5529): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156175.273:6438): op=terminate auid=-1 pid=-1 subj=? res=success
type=DAEMON_END msg=audit(1680156185.412:3829): op=terminate auid=0 pid=1 subj=unconfined res=success
type=DAEMON_END msg=audit(1680156195.553:7005): op=terminate auid=-1 pid=-1 subj=? res=success

Actions #15

Updated by openqa_review about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: cc_audit-test
https://openqa.suse.de/tests/10926172#step/audit_tools/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #16

Updated by pstivanin 12 months ago

Actions #17

Updated by emiler 10 months ago

  • Subject changed from [security][maint][15sp4][devel][15sp5] test fails in audit_tools to [security][15sp4][15sp5] test fails in audit_tools
Actions #18

Updated by emiler 10 months ago

Keeps appearing in 15-SP5 maintenance: https://openqa.suse.de/tests/11386063#next_previous

Actions #19

Updated by amanzini 10 months ago

  • Assignee deleted (amanzini)

unassigning due to squad rotation - Andrea visiting SAP/HANA Perf squad

Actions #20

Updated by pstivanin 10 months ago

  • Priority changed from High to Normal

lowering the prio since it doesn't seem a fix will be available any time soon

Actions #21

Updated by pstivanin 10 months ago

99.9% of the time the test failed when executed on a worker with the "AMD EPYC 7543P 32-Core Processor" cpu, but I guess this is just a weird coincidence, since it's possible to reproduce the bug also locally (e.g. Intel vms)

Actions #22

Updated by openqa_review 9 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: cc_audit-test
https://openqa.suse.de/tests/11072398#step/audit_tools/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #23

Updated by openqa_review 8 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: cc_audit-test
https://openqa.suse.de/tests/11607732#step/audit_tools/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

Actions #24

Updated by tjyrinki_suse 8 months ago

The blocking bug remains open.

Actions #26

Updated by okurz 7 months ago

  • Priority changed from Normal to Urgent

This test failure impacts releases of SLE maintenance updates, e.g. latest occurence https://openqa.suse.de/tests/12241447#step/audit_tools/269
Please handle this issue with urgency with mitigations, e.g. following http://open.qa/docs/#_suggested_workflow_for_test_review or workarounds in test code.
I created #136367 for auto-review+force_result handling

Actions #27

Updated by okurz 7 months ago

  • Copied to openqa-force-result #136367: [security][15sp4][15sp5] test fails in audit_tools auto_review:"wait_serial.*sysrq.*show.*blocked.*state.: fail[\S\s]*finished audit_tools security/cc":retry:force_result:softfailed added
Actions #29

Updated by pstivanin 7 months ago

  • Assignee set to pstivanin
  • Priority changed from Urgent to Normal
Actions #30

Updated by pstivanin 7 months ago

  • % Done changed from 0 to 50
  • Estimated time changed from 24.00 h to 4.00 h
Actions #31

Updated by pstivanin 7 months ago

  • Has duplicate openqa-force-result #136367: [security][15sp4][15sp5] test fails in audit_tools auto_review:"wait_serial.*sysrq.*show.*blocked.*state.: fail[\S\s]*finished audit_tools security/cc":retry:force_result:softfailed added
Actions #32

Updated by pstivanin 7 months ago

  • Copied to deleted (openqa-force-result #136367: [security][15sp4][15sp5] test fails in audit_tools auto_review:"wait_serial.*sysrq.*show.*blocked.*state.: fail[\S\s]*finished audit_tools security/cc":retry:force_result:softfailed)
Actions #33

Updated by openqa_review 6 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: cc_audit-test
https://openqa.suse.de/tests/12419989

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #34

Updated by openqa_review 6 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: cc_audit-test
https://openqa.suse.de/tests/12748969

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 48 days if nothing changes in this ticket.

Actions #35

Updated by openqa_review 4 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: cc_audit-test
https://openqa.suse.de/tests/11607732#step/audit_tools/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 96 days if nothing changes in this ticket.

Actions #36

Updated by pstivanin 2 months ago

  • Assignee deleted (pstivanin)
Actions #37

Updated by tjyrinki_suse 9 days ago

Bug still open.

Actions

Also available in: Atom PDF