action #119278

closed

[qem][qe-core]test fails in valgrind

Added by pdostal over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2022-10-24
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

# bash -e valgrind-test.sh; echo ZXZuM-$?-
Compiling test program ... 
Testing valgrind ... 
ZXZuM-1-

openQA test in scenario sle-15-SP4-Server-DVD-Updates-aarch64-mau-extratests2@aarch64-virtio fails in valgrind

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Run console tests against aggregated test repo

Reproducible

Fails since (at least) Build 20221019-1

Expected result

Last good: 20221018-1 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by ph03nix over 1 year ago

This is likely a timeout issue. We might need to set a higher timeout on line 47, which currently reads assert_script_run 'bash -e valgrind-test.sh':

assert_script_run('bash -e valgrind-test.sh', timeout => 300);
Actions #2

Updated by rfan1 over 1 year ago

  • Assignee set to rfan1

It might have something to do with bad performance on arm workers.

Actions #3

Updated by rfan1 over 1 year ago

  • Status changed from New to In Progress

It is not a performance issue. After changing the test to print the valgrind output, I can see the error below:

-valgrind --tool=memcheck --trace-children=yes ./valgrind-test 2>/dev/null
+valgrind -v --tool=memcheck --trace-children=yes ./valgrind-test

logs:

+ valgrind -v --tool=memcheck --trace-children=yes ./valgrind-test
==2050== Memcheck, a memory error detector
==2050== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2050== Using Valgrind-3.18.1-42b08ed5bd-20211015 and LibVEX; rerun with -h for copyright info
==2050== Command: ./valgrind-test
==2050== 
--2050-- Valgrind options:
--2050--    -v
--2050--    --tool=memcheck
--2050--    --trace-children=yes
--2050-- Contents of /proc/version:
--2050--   Linux version 5.14.21-150400.24.28-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.37) #1 SMP PREEMPT_DYNAMIC Mon Oct 10 15:21:12 UTC 2022 (f82da2c)
--2050-- 
--2050-- Arch and hwcaps: ARM64, LittleEndian, v8-atomics
--2050-- Page sizes: currently 4096, max supported 65536
--2050-- Valgrind library directory: /usr/lib/valgrind
--2050-- Reading syms from /var/tmp/valgrind-test
--2050-- Reading syms from /lib64/ld-2.31.so
--2050-- Reading syms from /usr/lib/valgrind/memcheck-arm64-linux
--2050--    object doesn't have a symbol table
--2050--    object doesn't have a dynamic symbol table
--2050-- Scheduler: using generic scheduler lock implementation.
--2050-- Reading suppressions file: /usr/lib/valgrind/default.supp
==2050== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-2050-by-root-on-susetest
==2050== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-2050-by-root-on-susetest
==2050== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-2050-by-root-on-susetest
==2050== 
==2050== TO CONTROL THIS PROCESS USING vgdb (which you probably
==2050== don't want to do, unless you know exactly what you're doing,
==2050== or are doing some strange experiment):
==2050==   /usr/lib/valgrind/../../bin/vgdb --pid=2050 ...command...
==2050== 
==2050== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==2050==   /path/to/gdb ./valgrind-test
==2050== and then give GDB the following command
==2050==   target remote | /usr/lib/valgrind/../../bin/vgdb --pid=2050
==2050== --pid is optional if only one valgrind process is running
==2050== 

VEX: Mismatch detected between RDMA and atomics features.
     Found: v8-atomics
Cannot continue. Good-bye

vex storage: T total 0 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().
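
The original call discarded stderr, which is why the failure looked like a silent exit code 1. A minimal sketch of one way to keep the output quiet on success while still showing it on failure; run_logged and the log path are hypothetical and not part of the existing test:

```shell
# Hypothetical helper: run a command, keep its stderr in a log file,
# and print that log only when the command fails.
run_logged() {
    log="$1"
    shift
    if "$@" 2>"$log"; then
        return 0
    else
        status=$?          # exit status of the failed command
        cat "$log" >&2     # surface the captured stderr
        return "$status"
    fi
}

# Possible use in valgrind-test.sh (assumes valgrind and ./valgrind-test exist):
# run_logged /tmp/valgrind-test.log valgrind --tool=memcheck --trace-children=yes ./valgrind-test
```

The helper returns the command's own exit status, so bash -e still aborts the script on failure.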

However, I can't reproduce the issue on my local setup, so it might be related to the hardware configuration.

My setup:
# lscpu|grep Flags
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

openQA workers:
# lscpu; echo 5GZT6-$?-
Architecture:           aarch64
  CPU op-mode(s):       64-bit
  Byte Order:           Little Endian
CPU(s):                 4
  On-line CPU(s) list:  0-3
Vendor ID:              Cavium
  Model name:           ThunderX 88XX
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           0x1
    BogoMIPS:           200.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
NUMA:                   
  NUMA node(s):         1
  NUMA node0 CPU(s):    0-3
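
The two flag lists differ only in atomics, the feature named in the VEX error. A minimal sketch for checking an lscpu-style Flags line for a given feature; has_cpu_flag is a hypothetical helper, and the two strings are copied from the outputs above:

```shell
# Hypothetical helper: succeed if a feature appears in an lscpu-style "Flags:" line.
has_cpu_flag() {
    flag="$1"
    flags_line="$2"
    # Drop everything up to the "Flags:" label, then compare word by word.
    for f in ${flags_line#*Flags:}; do
        [ "$f" = "$flag" ] && return 0
    done
    return 1
}

local_flags='Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid'
worker_flags='    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid'

if has_cpu_flag atomics "$local_flags"; then
    echo "local: atomics present"
else
    echo "local: atomics absent"
fi

if has_cpu_flag atomics "$worker_flags"; then
    echo "worker: atomics present"
else
    echo "worker: atomics absent"
fi
```

On a live machine the same check could be fed from lscpu, for example has_cpu_flag atomics "$(lscpu | grep Flags)".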

I will try to file a bug.

Actions #4

Updated by rfan1 over 1 year ago

  • Status changed from In Progress to Blocked
Actions #5

Updated by mgrifalconi over 1 year ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15779 removes the test for now, since it never worked on 15-sp4.

Before adding it back, and to debug further, we should test it in a development job group.

Actions #6

Updated by apappas over 1 year ago

  • Tags changed from bugbusters to bugbusters, qe-core-coverage
Actions #7

Updated by rfan1 over 1 year ago

The bug is fixed. I will try to re-test it.

However, the parent job failed due to https://bugzilla.suse.com/show_bug.cgi?id=1204924.

Actions #8

Updated by rfan1 over 1 year ago

  • Status changed from Blocked to In Progress

Check if the fix is checked in: http://openqa.suse.de/tests/10071023

Actions #10

Updated by rfan1 over 1 year ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by rfan1 over 1 year ago

  • Status changed from Feedback to Resolved