action #48434

[functional][u] test Tumbleweed s390x again

Added by okurz over 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Low
Category: New test
Target version:
Start date: 2019-02-26
Due date:
% Done: 0%
Estimated time:
Difficulty:

Related issues

Related to openQA Project - action #46349: [s390x] Add qemu backend support (New, 2019-01-17)

Related to openQA Tests - action #49820: [functional][u] Make the svirt backend work with AppArmor enabled and under company policies (Rejected, 2019-03-29)

Related to openQA Tests - action #43655: [functional][u] Increase robustness of using bootloader parameter with info-file instead of long typing (Workable)

History

#1 Updated by okurz over 1 year ago

[26/02/2019 15:45:16] <SergioAtSUSE> okurz, then here. There is already a built Tumbleweed IDO for s390x that can be installed on a nested ZVM/KVM. We want to start performing tests on O3. We need to have the assets synced there. Do you have access to the script that syncs Tumbleweed?
[26/02/2019 15:45:22] <SergioAtSUSE> ISO*
[26/02/2019 15:46:18] <okurz> SergioAtSUSE: every SUSE employee has access. And there are already tests enabled for s390x. You don't need to "start" anything but just make sure the media are released to the according :ToTest repo
[26/02/2019 15:47:17] <okurz> SergioAtSUSE: https://gitlab.suse.de/openqa/scripts/blob/master/rsync_opensuse.pm#L72
[26/02/2019 15:49:29] <okurz> SergioAtSUSE: you can compare https://build.opensuse.org/package/show/openSUSE:Factory/000product with https://build.opensuse.org/package/show/openSUSE:Factory:zSystems:ToTest/000product
[26/02/2019 15:50:52] <okurz> SergioAtSUSE: let me check the current sync status on o3
[26/02/2019 15:56:01] <okurz> SergioAtSUSE: hm, actually seems like the assets are more or less fine but "ERROR: destination must be a directory when copying more than 1 file"
[26/02/2019 15:56:49] <SergioAtSUSE> okurz, where have you got that error?
[26/02/2019 15:57:03] <okurz> SergioAtSUSE: from o3
[26/02/2019 16:01:27] <SergioAtSUSE> okurz, could you create the missing directory on O3?
[26/02/2019 16:52:25] <okurz> DimStar: maybe you can crosscheck nevertheless. rsync://openqa@obs-backend.publish.opensuse.org/opensuse-internal/build//openSUSE:Factory:zSystems:ToTest/images//local/*product:openSUSE-ftp-ftp-s390x/openSUSE-*-s390x-Media1/media.1/media should be right to read out build information, right?
[26/02/2019 16:54:51] <okurz> yes, that could help
[26/02/2019 16:57:08] <DimStar> okurz: cleaned; let's give OBS a couple minutes then retry.. maybe that's already all we need
[26/02/2019 17:02:56] <okurz> DimStar: seems like that helped already, sync started
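
The rsync error quoted in the log is raised when a source pattern expands to more than one file while the destination exists and is not a directory. Below is a minimal Python sketch of that rule. It assumes the wildcards in the source path matched more than one build, which would be consistent with the cleanup helping but is not confirmed in the log. `check_copy_plan` is a hypothetical helper, not part of rsync or of rsync_opensuse.pm.

```python
import os
import tempfile


def check_copy_plan(sources, destination):
    """Mimic rsync's rule: several sources need a directory as destination.

    Hypothetical helper for illustration only.
    """
    is_plain_file = os.path.exists(destination) and not os.path.isdir(destination)
    if len(sources) > 1 and is_plain_file:
        raise ValueError(
            "destination must be a directory when copying more than 1 file"
        )
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target_file = os.path.join(tmp, "media")
        open(target_file, "w").close()  # existing plain file as destination
        # One matching source: a plain file destination is acceptable.
        print(check_copy_plan(["new/media"], target_file))
        # Two matching sources (e.g. a stale and a current build): rejected.
        try:
            check_copy_plan(["old/media", "new/media"], target_file)
        except ValueError as err:
            print("ERROR:", err)
```

If the assumption holds, removing the stale match on the source side leaves a single file, and the same destination works again.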

#2 Updated by okurz over 1 year ago

  • Related to action #46349: [s390x] Add qemu backend support added

#3 Updated by okurz over 1 year ago

currently syncing on o3

#4 Updated by okurz over 1 year ago

  • Status changed from New to In Progress

successfully synced. https://openqa.opensuse.org/tests/863522/file/autoinst-log.txt shows missing dependencies on the worker "imagetester".

imagetester:~ # transactional-update pkg install icewm x3270 xterm-console && reboot
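
To verify after the reboot that the tools are available, a small check along these lines could be run on the worker. This is only a sketch: the executable names are assumed to match the package names above, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions derived from the package names above.
REQUIRED_TOOLS = ("icewm", "x3270", "xterm-console")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing on this worker:", ", ".join(missing))
    else:
        print("all required tools found")
```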

#5 Updated by okurz over 1 year ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

https://openqa.opensuse.org/tests/863533/file/autoinst-log.txt shows failures, e.g. "IceWM: Warning: Failed to load theme default/default.theme: Permission denied" . I guess the way we used for the SLE workers is to have apparmor disabled on the specific worker machines.

I suggest to crosscheck with coolo

#6 Updated by SLindoMansilla over 1 year ago

coolo is ok to deactivate apparmor if there is no other service running on that machine (IT policy for company public services).

  • imagetester is a worker for x86_64 and it is not possible to disable apparmor.
  • remote-backend workers ssh into the wild, which cannot be properly confined in apparmor

Proposed solution: we need to find a dedicated machine for remote-backend workers for openqa.opensuse.org.

#7 Updated by okurz over 1 year ago

SLindoMansilla wrote:

  • imagetester is a worker for x86_64 and it is not possible to disable apparmor.

why should it be not possible?

#8 Updated by SLindoMansilla over 1 year ago

Trying to set up a Raspberry Pi as a worker for svirt (with the intention of putting it on the O3 network if the experiment works).

#9 Updated by SLindoMansilla over 1 year ago

  • Assignee set to SLindoMansilla

Possible solutions for a dedicated openQA-worker for svirt backends

Glossary:

  • O3: openqa.opensuse.org
  • dedicated openQA worker: a virtual or physical machine with AppArmor disabled, the package openQA-worker installed and offering only this service.

Linux container

Running a dedicated openQA worker in a Linux container is not feasible because AppArmor is also applied to processes running inside containers (see https://cloud.google.com/container-optimized-os/docs/how-to/secure-apparmor). It would require a dedicated AppArmor rule in the profile. So, using a container is not an option.
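
Since a container shares the host kernel, the AppArmor state seen from inside it is the host's. A minimal sketch for checking that state reads the module parameter under /sys, assuming /sys is readable in the environment where it runs; `apparmor_enabled` is a hypothetical helper.

```python
from pathlib import Path

# Location of the AppArmor module switch on kernels that ship AppArmor.
APPARMOR_FLAG = Path("/sys/module/apparmor/parameters/enabled")


def apparmor_enabled(flag_path=APPARMOR_FLAG):
    """Return True if the kernel reports AppArmor as enabled ("Y")."""
    try:
        return Path(flag_path).read_text().strip() == "Y"
    except OSError:
        # File absent or unreadable: treated here as "not enabled".
        return False


if __name__ == "__main__":
    print("AppArmor enabled:", apparmor_enabled())
```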

Virtual machine in imagetester

imagetester is a machine serving openQA workers on O3 (see https://openqa.opensuse.org/admin/workers/44). A virtual machine running there could be used as a dedicated openQA worker.
We need confirmation that IT policies allow this.

Raspberry Pi 3 B+

This cost-effective micro computer could be used as a dedicated openQA worker.
At least the following packages are necessary:

  • x11-vnc
  • icewm
  • xterm-console
  • x3270

We need confirmation that IT policies allow this.

#10 Updated by SLindoMansilla over 1 year ago

I was able to set up a Raspberry Pi 3 B+ as an openQA-worker for remote z/VM SUT's host.

Internal links:

#11 Updated by SLindoMansilla over 1 year ago

After analyzing the situation with the R&D admins: the server room that hosts the backend for openqa.opensuse.org has a strict policy of allowing only certified machines.
A Raspberry Pi does not qualify.

About getting a certified machine for that server room:
According to the openSUSE chairman, the existing machines were donated to openSUSE by SUSE, so I will ask my department whether we have the budget for that.

#12 Updated by SLindoMansilla over 1 year ago

  • Related to action #46919: [functional][u][svirt][sporadic] auto_review:"IO::Socket::INET: connect: Connection timed out" added

#13 Updated by SLindoMansilla over 1 year ago

  • Related to deleted (action #46919: [functional][u][svirt][sporadic] auto_review:"IO::Socket::INET: connect: Connection timed out")

#14 Updated by SLindoMansilla over 1 year ago

Machine requirement from R&D admins for server room 1: "serverlike shape, so 19" size 1 or 2 HE 1-2 powerplugs and normally a BMC port"

azouhr suggested this machine: https://www.ebay.de/itm/Supermicro-SuperServer-502-200-X7SLA-H-Intel-Atom-330-1-60GHz-2GB-RAM-250G-HDD/223216622388?hash=item33f8bf5b34:g:MwMAAOSwa81aEtG9

R&D admins told me that such a machine would be valid.

Waiting for feedback from my department.

#15 Updated by SLindoMansilla over 1 year ago

For several reasons, the company is not allowed to buy hardware from eBay.

In this chicken-and-egg situation, I need to adapt the backend to work with AppArmor enabled before enabling openSUSE tests on s390x.
For this, I need a z/VM user that I can use for this purpose. Waiting for feedback from the Z system admins.

#16 Updated by SLindoMansilla over 1 year ago

  • Status changed from Workable to Blocked

Blocked by: #49820

#17 Updated by SLindoMansilla over 1 year ago

  • Blocked by action #49820: [functional][u] Make the svirt backend work with AppArmor enabled and under company policies added

#18 Updated by SLindoMansilla about 1 year ago

  • Blocked by action #56045: [functional][u][sporadic] command 'dasd_configure 0.0.0150 0' timed out at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/bootloader_s390.pm added

#19 Updated by SLindoMansilla about 1 year ago

  • Blocked by deleted (action #49820: [functional][u] Make the svirt backend work with AppArmor enabled and under company policies)

#20 Updated by SLindoMansilla about 1 year ago

  • Related to action #49820: [functional][u] Make the svirt backend work with AppArmor enabled and under company policies added

#21 Updated by SLindoMansilla about 1 year ago

  • Related to action #43655: [functional][u] Increase robustness of using bootloader parameter with info-file instead of long typing added

#22 Updated by okurz about 1 year ago

@SLindoMansilla I think you have found a different approach since your last ticket update, haven't you? We have "rebel" as an o3 worker. I am not sure how you want to control it. If you want, I think we can include it and try to manage it just like the other o3 workers, if you think this is the best approach.

dmcvicar asked in https://chat.suse.de/channel/suse?msg=6H9kDMt4gGxc3tAYF about the status of Tumbleweed s390x, and dimstar force-pushed a build of openSUSE:Factory:zSystems to o3. The jobs were scheduled but not triggered because no openQA worker instance was running on rebel; the machine had not updated properly for about 120 days.

I fixed the update on rebel and rebooted into the new snapshot. The worker started correctly and picked up the job https://openqa.opensuse.org/tests/1054455, which failed in https://openqa.opensuse.org/tests/1054455#step/bootloader_s390/35 with an Input/output error on the DASD device when trying to start the installation. Maybe the DASD device previously assigned to the z/VM instance is now used elsewhere as well or needs to be re-initialized?

#23 Updated by SLindoMansilla 8 months ago

  • Blocked by deleted (action #56045: [functional][u][sporadic] command 'dasd_configure 0.0.0150 0' timed out at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/bootloader_s390.pm)

#24 Updated by SLindoMansilla 8 months ago

  • Status changed from Blocked to Resolved

Not blocked anymore since we have a workaround hardened_usercopy=off
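
To confirm that the workaround is active on a given system, the kernel command line can be inspected. The sketch below only checks whether a `name=value` token is present; it makes no assumption about where the parameter is configured. `has_kernel_param` is a hypothetical helper, and the sample command line is made up for illustration.

```python
def has_kernel_param(cmdline, name, value):
    """Return True if `name=value` appears as a token on a kernel command line."""
    return f"{name}={value}" in cmdline.split()


if __name__ == "__main__":
    # Illustrative command line; on a live system read /proc/cmdline instead.
    sample = "root=/dev/dasda1 hardened_usercopy=off"
    print(has_kernel_param(sample, "hardened_usercopy", "off"))
```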

First two tests are already working: https://openqa.opensuse.org/group_overview/34

More test suites will be added, but no need for this ticket anymore. Updates will be announced in ML opensuse-factory and opensuse-zsystems (see https://lists.opensuse.org/)

#25 Updated by pvorel 8 months ago

Nice, congratulations!
Can I add s390x LTP tests to the kernel Tumbleweed tests (https://openqa.opensuse.org/group_overview/32)?
I'd start with just the 3 most important: install_ltp, ltp_syscalls and ltp_cve

#26 Updated by SLindoMansilla 8 months ago

pvorel wrote:

Nice, congratulations!
Can I add s390x LTP tests to the kernel Tumbleweed tests (https://openqa.opensuse.org/group_overview/32)?
I'd start with just the 3 most important: install_ltp, ltp_syscalls and ltp_cve

Thanks!

More coverage is always better, but could you add them first to "Development Tumbleweed" (https://openqa.opensuse.org/group_overview/38)?
Once they are green or orange, no problem to add them to "PROD"

#27 Updated by pvorel 8 months ago

SLindoMansilla wrote:

More coverage is always better, but could you add them first to "Development Tumbleweed" (https://openqa.opensuse.org/group_overview/38)?
Once they are green or orange, no problem to add them to "PROD"

Sure, I'll add them to development first. Just a note: syscalls are always red, as that's a big collection of tests, and they are very rarely all green, even on SLES (very few tests fail, but of course it's still good to run them on both SLES and Tumbleweed).

#28 Updated by okurz 8 months ago

pvorel wrote:

Just a note: syscalls are always red, as that's a big collection of tests, and they are very rarely all green, even on SLES (very few tests fail, but of course it's still good to run them on both SLES and Tumbleweed).

Then please don't add them. openQA should be used, and is used by many, as a validation platform, especially openqa.opensuse.org, meaning that tests should only fail when they show new regressions, which are then also handled accordingly. We have limited resources, especially on o3, and there are already enough tests ignored and wasting resources in "Development".

#29 Updated by pvorel 8 months ago

Oliver, kernel syscalls is the main validation for the kernel and is used on the other archs. The failing tests mostly fail on all archs. BTW, we are talking about 5 failing tests out of 1304 (you know, the kernel is never "fixed", there is always some error, so stopping testing because 0.5% of the tests do not pass doesn't make sense :)).
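
For reference, the ratio quoted above works out to slightly under the stated 0.5%. A quick check in Python, where `failure_rate_percent` is a hypothetical helper:

```python
def failure_rate_percent(failed, total):
    """Share of failing tests as a percentage."""
    return 100.0 * failed / total


if __name__ == "__main__":
    print(f"{failure_rate_percent(5, 1304):.2f}%")  # prints 0.38%
```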

#30 Updated by okurz 8 months ago

you seem to have a different understanding of what a "test" is. IMHO, especially for openQA tests, every test result on top level is boolean, either it passed or not. If "not" then this means "block the release of the product that is tested" until resolved. Just keep in mind that openSUSE Tumbleweed and openSUSE Leap use a bot that releases when there are no not-ignored, failing test cases left, i.e. needs review of test failures and reliable results. Everything else has simply no place on o3 outside the "Development" job groups. And I hope to arrive at this approach for osd as well to get rid of the tedious very manual approach for test result reviewing.
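
The release rule described above can be sketched as follows. This is an illustration of the stated policy only, not the actual release bot: `JobResult`, `blocking_failures` and `can_release` are hypothetical names, and the `ignored` flag stands in for whatever mechanism marks a failure as reviewed and accepted.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    name: str
    passed: bool
    ignored: bool = False  # failure was reviewed and explicitly accepted


def blocking_failures(results):
    """Names of failed jobs that nobody has marked as ignored."""
    return [r.name for r in results if not r.passed and not r.ignored]


def can_release(results):
    """Release only when no non-ignored failure is left."""
    return not blocking_failures(results)


if __name__ == "__main__":
    snapshot = [
        JobResult("textmode", passed=True),
        JobResult("ltp_syscalls", passed=False),
        JobResult("ltp_cve", passed=False, ignored=True),
    ]
    print(can_release(snapshot))        # False
    print(blocking_failures(snapshot))  # ['ltp_syscalls']
```

Under this rule, a test suite that is permanently red either blocks every release or has to be ignored permanently, which is the concern raised in this comment.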

#31 Updated by pvorel 8 months ago

Regarding the failures: we're slowly working on fixing them (an ongoing process). A much bigger problem than a few failures is #63373, as it breaks all kernel testing on Intel. That's what really spoils the validation (and nobody cares).

These tests: we check the results (it's very useful to see bugs on Tumbleweed, which is close to mainline, but not the same). That's the only reason why I'm adding also s390x tests.

Added these 3 tests install_ltp, ltp_cve and ltp_syscalls into https://openqa.opensuse.org/group_overview/38. If they work ok, I'll move them to kernel (https://openqa.opensuse.org/group_overview/32) and add some more (just a few, as s390x is probably busy a lot).

#32 Updated by okurz 7 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode
https://openqa.opensuse.org/tests/1227549

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
