Project

General

Profile

Actions

action #65477

closed

[qe-core][sles][functional][sporadic] test fails in clamav - start clamd', timeout => 400

Added by zluo about 4 years ago. Updated over 1 year ago.

Status:
Rejected
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
SUSE QA - Milestone 30
Start date:
2020-04-09
Due date:
% Done:

0%

Estimated time:
42.00 h
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP2-Online-aarch64-extra_tests_textmode@aarch64 fails in
clamav

Apr 08 22:47:22 susetest freshclam[11579]: ClamAV update process started at Wed Apr  8 22:47:22 2020
Apr 08 22:47:22 susetest freshclam[11579]: Your ClamAV installation is OUTDATED!
Apr 08 22:47:22 susetest freshclam[11579]: Local version: 0.100.3 Recommended version: 0.102.2
Apr 08 22:47:22 susetest freshclam[11579]: DON'T PANIC! Read https://www.clamav.net/documents/upgrading-clamav
Apr 08 22:47:54 susetest freshclam[11579]: Downloading main.cvd [100%]
Apr 08 22:48:20 susetest freshclam[11579]: main.cvd updated (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
Apr 08 22:48:21 susetest freshclam[11579]: Can't query main.59.93.1.0.6810DB54.ping.clamav.net
Apr 08 22:48:39 susetest freshclam[11579]: Downloading daily.cvd [100%]
Apr 08 22:48:50 susetest wickedd-dhcp6[1037]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 08 22:48:50 susetest wickedd-dhcp6[1037]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 08 22:50:01 susetest CRON[11584]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 1 1)
Apr 08 22:57:25 susetest wickedd-dhcp6[1037]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 08 22:57:25 susetest wickedd-dhcp6[1037]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 08 22:57:41 susetest freshclam[11579]: daily.cvd updated (version: 25776, sigs: 2251437, f-level: 63, builder: raynman)
Apr 08 22:57:41 susetest freshclam[11579]: Can't query daily.25776.93.1.0.6810DB54.ping.clamav.net
Apr 08 22:57:42 susetest freshclam[11579]: Downloading bytecode.cvd [100%]
Apr 08 22:57:42 susetest freshclam[11579]: bytecode.cvd updated (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
Apr 08 22:57:43 susetest freshclam[11579]: Can't query bytecode.331.93.1.0.6810DB54.ping.clamav.net
Apr 08 22:57:51 susetest freshclam[11579]: Database updated (6816433 signatures) from database.clamav.net (IP: 104.16.219.84)
Apr 08 22:57:51 susetest freshclam[11579]: Clamd was NOT notified: Can't connect to clamd through /var/run/clamav/clamd-socket: No such file or directory
Apr 08 22:58:28 susetest systemd[1]: Reloading.
Apr 08 22:58:41 susetest systemd[1]: Starting Freshclam virus definitions downloader...
Apr 08 22:58:41 susetest freshclam[11621]: freshclam daemon 0.100.3 (OS: linux-gnu, ARCH: aarch64, CPU: aarch64)
Apr 08 22:58:41 susetest freshclam[11621]: ClamAV update process started at Wed Apr  8 22:58:41 2020
Apr 08 22:58:41 susetest freshclam[11621]: Your ClamAV installation is OUTDATED!
Apr 08 22:58:41 susetest freshclam[11621]: Local version: 0.100.3 Recommended version: 0.102.2
Apr 08 22:58:41 susetest freshclam[11621]: DON'T PANIC! Read https://www.clamav.net/documents/upgrading-clamav
Apr 08 22:58:41 susetest freshclam[11621]: main.cvd is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
Apr 08 22:58:41 susetest freshclam[11621]: daily.cvd is up to date (version: 25776, sigs: 2251437, f-level: 63, builder: raynman)
Apr 08 22:58:41 susetest freshclam[11621]: bytecode.cvd is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
Apr 08 22:58:41 susetest freshclam[11621]: --------------------------------------
Apr 08 22:58:42 susetest systemd[1]: Started Freshclam virus definitions downloader.
Apr 08 22:58:42 susetest systemd[1]: Starting Clamav antivirus Deamon...
Apr 08 22:58:42 susetest clamd[11622]: Received 0 file descriptor(s) from systemd.
Apr 08 22:58:42 susetest clamd[11622]: clamd daemon 0.100.3 (OS: linux-gnu, ARCH: aarch64, CPU: aarch64)
Apr 08 22:58:42 susetest clamd[11622]: Running as user root (UID 0, GID 0)
Apr 08 22:58:42 susetest clamd[11622]: Log file size limited to 1048576 bytes.
Apr 08 22:58:42 susetest clamd[11622]: Reading databases from /var/lib/clamav
Apr 08 22:58:42 susetest clamd[11622]: Not loading PUA signatures.
Apr 08 22:58:42 susetest clamd[11622]: Bytecode: Security mode set to "TrustSigned".
Apr 08 23:00:01 susetest CRON[11625]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 1 1)
Apr 08 23:00:50 susetest wickedd-dhcp6[1037]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 08 23:00:50 susetest wickedd-dhcp6[1037]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 08 23:05:22 susetest systemd[1]: clamd.service: Start operation timed out. Terminating.
Apr 08 23:05:22 susetest systemd[1]: Failed to start Clamav antivirus Deamon.
Apr 08 23:05:22 susetest systemd[1]: clamd.service: Unit entered failed state.
Apr 08 23:05:22 susetest systemd[1]: clamd.service: Failed with result 'timeout'.

Suggestion

  • Get more logs and file a product bug.

Reproducible

Fails since (at least) Build 161.1

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

Investigations

this could be an issue of performance on aarch64, but 400 sec is still not enough?

Actions #1

Updated by SLindoMansilla about 4 years ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Target version set to Milestone 30
  • Estimated time set to 42.00 h
Actions #2

Updated by zluo about 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

take over

Actions #3

Updated by zluo about 4 years ago

clamd.service could not be started. try to collect statics again latest build.
and increase timeout for aarch64 and collect statics as well.

Actions #5

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4158606#next_previous shows clearly that we have sporadic issue with 'start clamd'.

all failed cases belong to openqaworker-arm-1. workers on openqaworker-arm-3 are stable, no failure at all.

Actions #6

Updated by zluo almost 4 years ago

boot_to_desktop has sometimes issue like: http://f40.suse.de/tests/7507#

rogue workqueue lockup bsc#1126782 - Serial error: [ 41.596623] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!

or incomplete status:

http://f40.suse.de/tests/7517#

Actions #7

Updated by zluo almost 4 years ago

add 'systemctl status clamd' to get more information at first and see what kind of issue could blocks starting clamd:

http://f40.suse.de/tests/7545#step/clamav/32

Actions #9

Updated by zluo almost 4 years ago

Apr 24 10:06:55 susetest systemd[1]: Starting Clamav antivirus Deamon...
Apr 24 10:06:55 susetest clamd[15757]: Received 0 file descriptor(s) from systemd.
Apr 24 10:06:55 susetest clamd[15757]: clamd daemon 0.100.3 (OS: linux-gnu, ARCH: aarch64, CPU: aarch64)
Apr 24 10:06:55 susetest clamd[15757]: Running as user root (UID 0, GID 0)
Apr 24 10:06:55 susetest clamd[15757]: Log file size limited to 1048576 bytes.
Apr 24 10:06:55 susetest clamd[15757]: Reading databases from /var/lib/clamav
Apr 24 10:06:55 susetest clamd[15757]: Not loading PUA signatures.
Apr 24 10:06:55 susetest clamd[15757]: Bytecode: Security mode set to "TrustSigned".
Apr 24 10:10:01 susetest CRON[15760]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 1 1)
Apr 24 10:11:54 susetest clamd[15757]: Loaded 6838621 signatures.
Apr 24 10:12:30 susetest wickedd-dhcp6[1040]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 24 10:12:30 susetest wickedd-dhcp6[1040]: eth0: DHCPv6 is disabled by IPv6 router RA
Apr 24 10:12:35 susetest clamd[15757]: TCP: Bound to [127.0.0.1]:3310
Apr 24 10:12:35 susetest clamd[15757]: TCP: Setting connection queue length to 200
Apr 24 10:12:35 susetest clamd[15757]: LOCAL: Unix socket file /var/run/clamav/clamd-socket
Apr 24 10:12:35 susetest clamd[15757]: LOCAL: Setting connection queue length to 200
Apr 24 10:12:35 susetest clamd[15944]: Limits: Global time limit set to 120000 milliseconds.
Apr 24 10:12:35 susetest clamd[15944]: Limits: Global size limit set to 104857600 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: File size limit set to 26214400 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: Recursion level limit set to 16.
Apr 24 10:12:35 susetest clamd[15944]: Limits: Files limit set to 10000.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxEmbeddedPE limit set to 10485760 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxHTMLNormalize limit set to 10485760 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxHTMLNoTags limit set to 2097152 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxScriptNormalize limit set to 5242880 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxZipTypeRcg limit set to 1048576 bytes.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxPartitions limit set to 50.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxIconsPE limit set to 100.
Apr 24 10:12:35 susetest clamd[15944]: Limits: MaxRecHWP3 limit set to 16.
Apr 24 10:12:35 susetest clamd[15944]: Limits: PCREMatchLimit limit set to 100000.
Apr 24 10:12:35 susetest clamd[15944]: Limits: PCRERecMatchLimit limit set to 5000.
Apr 24 10:12:35 susetest clamd[15944]: Limits: PCREMaxFileSize limit set to 26214400.
Apr 24 10:12:35 susetest clamd[15944]: Archive support enabled.
Apr 24 10:12:35 susetest clamd[15944]: BlockMax heuristic detection disabled.
Apr 24 10:12:35 susetest clamd[15944]: Algorithmic detection enabled.
Apr 24 10:12:35 susetest clamd[15944]: Portable Executable support enabled.
Apr 24 10:12:35 susetest clamd[15944]: ELF support enabled.
Apr 24 10:12:35 susetest clamd[15944]: Mail files support enabled.
Apr 24 10:12:35 susetest clamd[15944]: OLE2 support enabled.
Apr 24 10:12:35 susetest clamd[15944]: PDF support enabled.
Apr 24 10:12:35 susetest clamd[15944]: SWF support enabled.
Apr 24 10:12:35 susetest clamd[15944]: HTML support enabled.
Apr 24 10:12:35 susetest clamd[15944]: XMLDOCS support enabled.
Apr 24 10:12:35 susetest clamd[15944]: HWP3 support enabled.
Apr 24 10:12:35 susetest clamd[15944]: Self checking every 600 seconds.
Apr 24 10:13:35 susetest systemd[1]: clamd.service: Start operation timed out. Terminating.
Apr 24 10:14:27 susetest rsyslogd[1030]: -- MARK --

found in
https://openqa.suse.de/tests/4161390/file/clamav-journal.log

Actions #10

Updated by zluo almost 4 years ago

http://f40.suse.de/tests/7623#next_previous shows no single failure at all.

Actions #11

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4167250#step/clamav/10 shows even network issue, freshclam couldn't finish in 700 seconds...

Actions #13

Updated by zluo almost 4 years ago

Actions #14

Updated by zluo almost 4 years ago

http://f40.suse.de/tests/7647#step/boot_to_desktop/8

rogue workqueue lockup bsc#1126782 - Serial error: [ 231.471267] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 151s!

suddenly all tests failed at boot_to_desktop

Actions #15

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4177938 new try because old qcow2 got removed so that my test runs got failed...

Actions #16

Updated by zluo almost 4 years ago

https://openqa.suse.de/tests/4179769 failed even when I assign QEMURAM=4096, QEMUCPUS=4

[2020-04-29T12:37:56.236 UTC] [debug] starting: /usr/bin/qemu-system-aarch64 -device virtio-gpu-pci -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw hda -m 4096 -machine virt,usb=off,gic-version=3,its=off -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -boot menu=on,splash-time=5000 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 4 -enable-kvm -no-shutdown -vnc :114,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/24/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on -device virtio-blk-device,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=hd1-file,filename=/var/lib/openqa/pool/24/raid/hd1,cache.no-flush=on -blockdev driver=qcow2,node-name=hd1,file=hd1-file,cache.no-flush=on -device virtio-blk-device,id=hd1-device,drive=hd1,serial=hd1 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/24/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/24/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/24/raid/pflash-vars-overlay0,unit=1

but it works on seattle:
http://f40.suse.de/tests/7652

Actions #17

Updated by zluo almost 4 years ago

https://bugzilla.suse.com/show_bug.cgi?id=1170825 for record and hope developer can take a look at it and provide feedback on this issue.

Actions #18

Updated by zluo almost 4 years ago

workers on openqaworker-arm-2 looks bad while workers openqaworker-arm-1 still perform good.

it looks we have problem with openqaworker-arm-2 on osd.

Actions #19

Updated by zluo almost 4 years ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/241/diffs

this is clearly now. I will check that this issue later on osd.

Actions #20

Updated by okurz almost 4 years ago

For additional information: With https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/241 I reduced the number of worker instances for openqaworker-arm-2 from 30 to 20 which might help with stabilization. Please keep in mind that I also increased the number of worker instances on openqaworker-arm-1 and openqaworker-arm-2 as we have a minimum need of available worker instances to be able to complete our regular test schedule. If you find good indications backed by proper statistics we can also change worker instance numbers again. The "clamav" test seems to be a very good test because the systemd service times out internally so not "test instability" on the top level. For any statistical evaluation please keep in mind to provide sensible numbers, e.g. fail-ratio, mean value, standard deviation, and also counter-experiments, e.g. compare to stability on x86_64 or to a local, stable environment.

Actions #22

Updated by SLindoMansilla almost 4 years ago

  • Assignee changed from zluo to SLindoMansilla

Taking over. We need to address hardware impediments.

Actions #23

Updated by okurz almost 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_textmode
https://openqa.suse.de/tests/4246796

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #25

Updated by okurz almost 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_textmode
https://openqa.suse.de/tests/4308011

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #26

Updated by okurz almost 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_textmode
https://openqa.suse.de/tests/4341542

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #27

Updated by okurz almost 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_textmode
https://openqa.suse.de/tests/4341542

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #28

Updated by okurz almost 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_textmode
https://openqa.suse.de/tests/4341542

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #29

Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_textmode
https://openqa.suse.de/tests/4341542

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #30

Updated by SLindoMansilla over 3 years ago

  • Status changed from In Progress to Workable
Actions #31

Updated by tjyrinki_suse over 3 years ago

  • Subject changed from [sles][functional][u]test fails in clamav - start clamd', timeout => 400 to [qe-core][sles][functional]test fails in clamav - start clamd', timeout => 400
Actions #32

Updated by SLindoMansilla about 3 years ago

  • Subject changed from [qe-core][sles][functional]test fails in clamav - start clamd', timeout => 400 to [qe-core][sles][functional][sporadic] test fails in clamav - start clamd', timeout => 400
  • Status changed from Workable to New
  • Assignee deleted (SLindoMansilla)

No time to work on this :(

Actions #33

Updated by slo-gin over 1 year ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #34

Updated by zluo over 1 year ago

  • Status changed from New to Rejected
  • Assignee set to zluo

since this is a very old ticket, and we don't have the issue for now, reject it

Actions

Also available in: Atom PDF