action #77116

test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel)

Added by SLindoMansilla 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Category: Bugs in existing tests
Target version: -
Start date: 2020-11-08
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

  • The S390_NETWORK_PARAMS needs to be adapted for each SUT. DONE
  • The terminal settings need to be adapted for openQA tests. profile exec a and profile xedit a. DONE
  • The installation media URL seems to be cropped: https://openqa.opensuse.org/tests/1464483#step/bootloader_s390/29
    • solution 1: Use the host name openqa instead of the IP address. An IPv6 interface needs to be added to the z/VM guests.
    • solution 2: rsync.pl creates symlinks with shorter names for the repo directories.
    • solution 3: Use the linuxrc info parameter to provide parameters from a file via HTTP. The info parameter itself would still be limited by the line length, but the parameters in the info file would not.
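
Solution 3 could be sketched roughly as follows. This is an illustration in Python, not openQA code: the helper names, the info URL, and the parameter values are hypothetical, and it assumes that linuxrc accepts one "key: value" entry per line in the file named by info=.

```python
PARMFILE_COLUMNS = 72  # usable columns per parmfile line, as observed in this ticket


def build_info_file(params: dict) -> str:
    """Render parameters as one 'key: value' entry per line (assumed format)."""
    return "".join(f"{key}: {value}\n" for key, value in params.items())


def build_parmfile_line(info_url: str) -> str:
    """Return the single short parameter that points linuxrc at the info file."""
    line = f"info={info_url}"
    if len(line) > PARMFILE_COLUMNS:
        raise ValueError(
            f"info parameter is {len(line)} characters, limit is {PARMFILE_COLUMNS}"
        )
    return line


# Hypothetical values for illustration only.
params = {
    "install": "ftp://192.168.112.100/openSUSE-Tumbleweed-oss-s390x-Snapshot20201108",
    "hostname": "linux144",
}
info_text = build_info_file(params)
parm_line = build_parmfile_line("http://192.168.112.100/info/linux144")
```

Only parm_line would have to be typed into the parmfile; the long install URL lives in info_text, which is served over HTTP and is not subject to the column limit.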

Reproducible

Further details

Always latest result in this scenario: latest

x3scr.3334.txt (19.3 KB) SLindoMansilla, 2020-11-10 14:36

Related issues

Related to openQA Tests - action #69328: [o3][s390x] Early fail on s390x workers: connection refused (Resolved, 2020-07-24, 2020-11-13)

Related to openQA Tests - action #77254: test fails in bootloader_s390 - Timeout of ftp boot (192.168.112.100) - Tumbleweed s390x snapshot (Rejected, 2020-11-10)

Blocked by openQA Infrastructure - action #77209: workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service (Resolved, 2020-11-09)

History

#1 Updated by SLindoMansilla 11 months ago

  • Related to action #69328: [o3][s390x] Early fail on s390x workers: connection refused added

#2 Updated by SLindoMansilla 11 months ago

  • Subject changed from test fails in bootloader_s390 - ftp installation media is not reachable from rebel:5 to test fails in bootloader_s390 - ftp installation media is not reachable from linux144(rebel:1), linux147(rebel:4)
  • Description updated (diff)

#3 Updated by okurz 11 months ago

but at least https://openqa.opensuse.org/tests/1464522#step/await_install/69 seemed to have progressed. Do you have an idea why this worked then?

#4 Updated by SLindoMansilla 11 months ago

  • Blocked by action #77209: workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service added

#5 Updated by SLindoMansilla 11 months ago

okurz wrote:

but at least https://openqa.opensuse.org/tests/1464522#step/await_install/69 seemed to have progressed. Do you have an idea why this worked then?

You have already fixed the S390_NETWORK_PARAMS where the four machines were trying to use the same IP address. That may be enough. I will check.
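
A check for this kind of clash could look like the sketch below. It assumes that S390_NETWORK_PARAMS is a space-separated list of Key=value pairs with the address under a HostIP key; the guest settings and addresses shown are made up for the example.

```python
from collections import defaultdict


def parse_network_params(value: str) -> dict:
    """Split a space-separated 'Key=value' string into a dictionary."""
    return dict(item.split("=", 1) for item in value.split())


def find_shared_ips(guests: dict) -> dict:
    """Map each HostIP used by more than one guest to the guests using it."""
    by_ip = defaultdict(list)
    for guest, value in guests.items():
        host_ip = parse_network_params(value).get("HostIP")
        if host_ip is not None:
            by_ip[host_ip].append(guest)
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical settings: the addresses are made up for the example.
guests = {
    "linux144": "HostIP=192.0.2.10/24 Hostname=linux144 Layer2=0",
    "linux145": "HostIP=192.0.2.10/24 Hostname=linux145 Layer2=0",
    "linux146": "HostIP=192.0.2.12/24 Hostname=linux146 Layer2=0",
}
shared = find_shared_ips(guests)
```

An empty result would mean that every guest has its own address.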

#6 Updated by okurz 11 months ago

In https://openqa.opensuse.org/tests/1465662#step/bootloader_s390/49 I could see the error when trying to access openSUSE-Tumbleweed-oss-s390x-Snapshot20201106/suse.ins . The repo folder does not exist (anymore?) on o3. Instead there is openSUSE-Tumbleweed-oss-s390x-Snapshot20201108/ by now. Seems like the quota is exceeded. I crosschecked the source and debuginfo folders but they are not taking up too much:

okurz@ariel:~> du -sh /var/lib/openqa/share/factory/repo/openSUSE-Tumbleweed-oss-s390x-Snapshot20201108*
55G /var/lib/openqa/share/factory/repo/openSUSE-Tumbleweed-oss-s390x-Snapshot20201108
116K    /var/lib/openqa/share/factory/repo/openSUSE-Tumbleweed-oss-s390x-Snapshot20201108-debuginfo
36M /var/lib/openqa/share/factory/repo/openSUSE-Tumbleweed-oss-s390x-Snapshot20201108-source

The current assets size limit for the job group "openSUSE Tumbleweed s390x" is 80GB meaning that only a single snapshot can be kept. As we have an okaish storage situation on o3 right now I increased the size limit now to 240GB so that 4 snapshots could be kept.
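
The arithmetic behind these numbers, as a small Python sketch. It only divides the limit by the snapshot size; the actual asset cleanup in openQA considers more than that, so treat it as a rough estimate.

```python
def snapshots_that_fit(limit_gb: float, snapshot_gb: float) -> int:
    """Number of whole snapshots that fit into the size limit."""
    return int(limit_gb // snapshot_gb)


# Size reported by du above, ignoring the small source and debuginfo folders.
SNAPSHOT_GB = 55

old_count = snapshots_that_fit(80, SNAPSHOT_GB)
new_count = snapshots_that_fit(240, SNAPSHOT_GB)
```

With 55 GB per snapshot, the 80 GB limit leaves room for one snapshot and the 240 GB limit for four.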

Another job, https://openqa.opensuse.org/tests/1465663#step/bootloader_s390/29, looks like it is trying to reach an ftp folder whose name has been truncated. Be aware of the line length limit in xedit, see https://openqa.opensuse.org/tests/1465663#step/bootloader_s390/28 where the first entry line is wrapped. These characters are likely lost. I suggest you try to replace the IPv4 address 192.168.112.100 with a hostname for the openQA webUI host.

#7 Updated by SLindoMansilla 11 months ago

Yes, the problem still happens: https://openqa.opensuse.org/tests/1465663#step/bootloader_s390/30
And now I can reproduce it with all z/VM guests.

#8 Updated by SLindoMansilla 11 months ago

The funny thing is that it was working with linux146: https://openqa.opensuse.org/tests/1464522#step/bootloader_s390/28
I thought about the cropping but suspected something related to the guest profile or Xedit profile. I am going to apply the same profiles we use in the z/VM guests for SLE.

#9 Updated by SLindoMansilla 11 months ago

I have already set the guest profile and Xedit profile on linux144 to behave better, but a manual installation cannot reach the repository either, not even http://download.opensuse.org/tumbleweed/repo/oss/

I have also tried with Layer2=0, because I had this problem with one machine once, but I get the same error.

I am thinking that the Channels are wrong (ReadChannel=0.0.0800 WriteChannel=0.0.0801 DataChannel=0.0.0802), which also happened to me once. I need to ask Ihno.

#10 Updated by SLindoMansilla 11 months ago

Also, the z/VM guest can resolve openqa to 2620:113:80c0:8080:10:160:0:207, but it is not able to ping an IPv6 address: DTCPIN0029E SendTo(): EDC8118I Network is unreachable. (errno2=0x000005DF)

#11 Updated by AdaLovelace 11 months ago

Could it be a VLAN issue? The last successful step is the VLAN configuration...
https://openqa.opensuse.org/tests/1465663#step/bootloader_s390/30

#12 Updated by SLindoMansilla 11 months ago

AdaLovelace wrote:

Could it be a VLAN issue? The last successful step is the VLAN configuration...
https://openqa.opensuse.org/tests/1465663#step/bootloader_s390/30

Yes, it could be.
I am asking the YaST developers how to debug linuxrc to find out what is happening in the SUT.

#13 Updated by SLindoMansilla 11 months ago

To strike one item from the list, the four z/VM guests now have the profile exec a and profile xedit a that openQA expects.
I keep a copy of the templates in:

#14 Updated by okurz 11 months ago

  • Related to action #77254: test fails in bootloader_s390 - Timeout of ftp boot (192.168.112.100) - Tumbleweed s390x snapshot added

#15 Updated by SLindoMansilla 11 months ago

x3270 screen save of cat /var/log/linuxrc.log

This section shows no network interfaces (except lo)

14:24:15 <4>: exec: insmod /modules/btrfs.ko.xz = 0                                                                    
14:24:15 <1>:  ok                                                                                                      
14:24:15 <2>: has I/O device auto-config data: 0                                                                       
14:24:15 <2>: registering url scheme: hmc: Hardware Management Console                                                 
14:24:15 <4>: efi = 0                                                                                                  
14:24:15 <4>: fcoe_check: no edd                                                                                       
14:24:15 <2>: not booted via FCP                                                                                       
14:24:15 <4>: wicked ifup all                                                                                          
14:24:20 <4>: exec: wicked ifup all = 0                                                                                
14:24:20 <4>: stdout + stderr:                                                                                         
lo              up                                                                                                     
14:24:22 <4>: net_update_state:                            
14:24:22 <4>: lo: up                                                                                                   
14:24:22 <4>:                                                                                                          
14:24:22 <2>: Starting hardware detection...
14:24:22 <2>: Hardware detection finished.
14:24:22 <1>: (If a driver is not working for you, try booting with brokenmodules=driver_name.)

14:24:22 <1>: IBM OSA Express Network card
14:24:22 <1>:   drivers: qeth*
14:24:22 <1>:
14:24:22 <1>: IBM OSA Express Network card
14:24:22 <1>:   drivers: qeth*
14:24:22 <1>:
14:24:22 <1>: IBM OSA Express Network card
14:24:22 <1>:   drivers: qeth*
14:24:22 <1>:
14:24:22 <2>: update_device_list(1)
14:24:22 <2>: scanning devices
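
The same check can be done on a saved log with a few lines of Python. This is a sketch that assumes the line format seen in the excerpt above (timestamp, level in angle brackets, then "name: state" after net_update_state:); it is not part of linuxrc or openQA.

```python
import re

# Matches state lines such as "14:24:22 <4>: lo: up".
STATE_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2} <\d>: (?P<name>[\w.-]+): (?P<state>\w+)\s*$"
)


def interfaces_up(log_text: str) -> list:
    """Return interface names reported as 'up' after 'net_update_state:'."""
    names = []
    in_state_block = False
    for line in log_text.splitlines():
        if "net_update_state:" in line:
            in_state_block = True
            continue
        if in_state_block:
            match = STATE_LINE.match(line)
            if match is None:
                in_state_block = False  # block ends at the first non-matching line
            elif match.group("state") == "up":
                names.append(match.group("name"))
    return names


sample = """\
14:24:20 <4>: stdout + stderr:
lo              up
14:24:22 <4>: net_update_state:
14:24:22 <4>: lo: up
14:24:22 <4>:
14:24:22 <2>: Starting hardware detection...
"""
non_loopback = [name for name in interfaces_up(sample) if name != "lo"]
```

For the excerpt, non_loopback stays empty, which matches the observation that only lo came up.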

#16 Updated by SLindoMansilla 11 months ago

While trying to set up a machine so that I could also gather the linuxrc logs, I found some errors that explain the network problem.
It was always failing in the same spot, but for different reasons.
One problem was that I didn't know the role that Hostname plays for openQA. Before there was one machine; now there are several, and each one needs a proper setting that matches /etc/hosts on ariel (which okurz added).
Then, I also had to add those entries to /etc/hosts on the worker (rebel).
Then, the failing job was still using the IP from the old broken z/VM guest. okurz fixed the settings in openQA, but I made the mistake of cloning the job without overriding the IP with the new ones for the 4 new z/VM guests.
And finally, there is the problem of a very long URL that overflows the maximum line length in the parmfile. Each line of the parmfile is a virtual punched card, physically 80 columns at most, but columns 73-80 were used for comments and are therefore ignored by the reader. I don't know why it was working before. I assume there is a setting I don't know about that regulates this maximum line length, and on those 4 machines it is 72.
Verification run using a symbolic link on ariel to make the repo directory name shorter: https://openqa.opensuse.org/tests/1466408#step/bootloader_s390/34
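
The effect of the 72-column limit can be simulated as follows. This is a sketch in Python under the assumption stated above, namely that everything from column 73 onwards is silently dropped; the sample line is made up and only ends in a snapshot date.

```python
USABLE_COLUMNS = 72  # columns 73-80 were observed to be dropped on these guests


def read_cards(lines: list, usable: int = USABLE_COLUMNS) -> list:
    """Keep only the columns the reader is assumed to honour on each line."""
    return [line[:usable] for line in lines]


def lost_text(lines: list, usable: int = USABLE_COLUMNS) -> list:
    """Return, per line, the characters that fall beyond the usable columns."""
    return [line[usable:] for line in lines]


# Hypothetical parmfile line: 70 filler characters followed by a date.
line = "x" * 70 + "20201106"
kept = read_cards([line])[0]
dropped = lost_text([line])[0]
```

Here kept ends after the first two digits of the date and dropped holds the rest, which is the kind of cropped directory name seen in the failing jobs.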

So, I think it is time now to implement my idea of info files. Another option would be to tell rsync.pl to create symbolic links with shorter names for the s390 repositories. But in any case, an info file requires less typing from openQA and therefore causes fewer typing issues in the bootloader.
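
The symlink option could look like this sketch. It is written in Python purely for illustration (rsync.pl itself is Perl), the naming scheme of dropping a fixed prefix is hypothetical, and it runs in a temporary directory instead of the real repo folder.

```python
import os
import tempfile


def short_name(repo_name: str, prefix: str = "openSUSE-Tumbleweed-") -> str:
    """Derive a shorter link name by dropping a known prefix (hypothetical scheme)."""
    return repo_name[len(prefix):] if repo_name.startswith(prefix) else repo_name


def link_short(repo_dir: str, repo_name: str) -> str:
    """Create a relative symlink with the short name next to the repository."""
    link_path = os.path.join(repo_dir, short_name(repo_name))
    if not os.path.islink(link_path):
        # Relative target, so the link stays valid if the tree is moved.
        os.symlink(repo_name, link_path)
    return link_path


# Demonstration in a temporary directory instead of the real repo folder.
with tempfile.TemporaryDirectory() as repo_dir:
    name = "openSUSE-Tumbleweed-oss-s390x-Snapshot20201108"
    os.mkdir(os.path.join(repo_dir, name))
    link = link_short(repo_dir, name)
    link_name = os.path.basename(link)
    resolves = os.path.isdir(link)
    saved = len(name) - len(link_name)
```

In this example the link name is 20 characters shorter than the directory name, which may or may not be enough depending on the rest of the URL.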

#17 Updated by SLindoMansilla 11 months ago

  • Subject changed from test fails in bootloader_s390 - ftp installation media is not reachable from linux144(rebel:1), linux147(rebel:4) to test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel)

#18 Updated by SLindoMansilla 10 months ago

  • Description updated (diff)

Names for 192.168.112.100 like openqa don't work because they resolve to IPv6 addresses and the z/VM guests don't have an IPv6 interface.
https://openqa.opensuse.org/tests/1466492/file/autoinst-log.txt

VM TCP/IP FTP Level 640
Connecting to OPENQA 2620:113:80C0:8080:10:160::207, port 21
Destination network is unreachable
Unable to connect to OPENQA 2620:113:80C0:8080:10:160::207
Destination network is unreachable

Adding IPv6 would be another option.
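
The reasoning can be written down as a small check with the Python standard library. The sketch assumes that the list of addresses a name resolves to is already known; the helper names are made up for the example.

```python
import ipaddress


def address_versions(addresses: list) -> set:
    """Return the set of IP versions (4 and/or 6) present in the list."""
    return {ipaddress.ip_address(addr).version for addr in addresses}


def reachable_from_ipv4_only_guest(addresses: list) -> bool:
    """A guest without an IPv6 interface needs at least one IPv4 address."""
    return 4 in address_versions(addresses)


# The IPv6 address is taken from the FTP output above,
# the IPv4 address is the one used elsewhere in this ticket.
by_name = ["2620:113:80C0:8080:10:160::207"]
by_ip = ["192.168.112.100"]

name_ok = reachable_from_ipv4_only_guest(by_name)
ip_ok = reachable_from_ipv4_only_guest(by_ip)
```

As long as the name resolves only to an IPv6 address, name_ok is False, so the guests need either an IPv6 interface or a name that also resolves to IPv4.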

#19 Updated by okurz 10 months ago

IPv6 sounds like a good idea. Can we provide an IPv6 address to the z/VM guests with the corresponding configuration? Feel free to (carefully) apply changes to o3 or rebel where needed to make sure IPv6 resolution works over these steps as well.

#20 Updated by SLindoMansilla 10 months ago

okurz wrote:

IPv6 sounds like a good idea. Can we provide an IPv6 address to the z/VM guests with the corresponding configuration? Feel free to (carefully) apply changes to o3 or rebel where needed to make sure IPv6 resolution works over these steps as well.

AFAIK neither rebel nor the Linux SUT is involved; both can resolve IPv6. The problem is the z/VM guests, and I think that setup has to be done by Ihno.

#21 Updated by SLindoMansilla 10 months ago

Ihno has clarified some things for me:

You can have lines longer than 80 characters in a parmfile.
When the parmfile is handed over to the kernel all lines
are just concatenated.
When a line has 80 characters and in the next line
characters are in the first column, and both lines are
concatenated you have a longer line.

At no time (not even when editing in z/VM) should
anything be added to the end of the line.

The parmfile is not allowed to have more than 8 lines.
Everything else will get lost/truncated.
I found a file on linux144 (linux144.parmfile) where
every parameter was in one line. This will not work.

= parmfile with lines longer than 80 characters =
So, that resolves the question of why it was working before even though the parameter was longer than 80 characters.
The question is why openQA is no longer able to split that line properly.

= parmfile 8 line limit =
Good to know! I was frustrated because I wasn't able to get the manual installation running while openQA could.
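
Ihno's two rules (lines are concatenated without a separator, and at most 8 lines are read) can be sketched in Python like this. The parameter string is hypothetical and the line width is left as an argument; whether trailing blanks at the end of a line survive editing in z/VM is not covered here.

```python
MAX_LINES = 8  # limit named by Ihno


def split_parmfile(params: str, width: int) -> list:
    """Cut the parameter string into fixed-width lines, each starting in column 1."""
    lines = [params[i:i + width] for i in range(0, len(params), width)]
    if len(lines) > MAX_LINES:
        raise ValueError(f"{len(lines)} lines needed, but only {MAX_LINES} are allowed")
    return lines


def join_parmfile(lines: list) -> str:
    """Concatenate lines without a separator, as described for the kernel hand-over."""
    return "".join(lines)


# Hypothetical parameter string for illustration.
params = (
    "install=ftp://192.168.112.100/openSUSE-Tumbleweed-oss-s390x-Snapshot20201108 "
    + "x" * 60
)
lines = split_parmfile(params, width=80)
round_trip = join_parmfile(lines) == params
```

Putting every parameter on its own line, as in the linux144.parmfile mentioned above, would exceed the 8 line limit quickly, while fixed-width splitting keeps the line count as low as possible.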

#22 Updated by SLindoMansilla 10 months ago

The info file option doesn't work if the info file is provided from the worker because the URL is very long and the URL shortener used is only reachable inside suse.de: https://openqa.opensuse.org/tests/1467731#step/bootloader_s390/26

I also tried using v.gd, but it fails: https://openqa.opensuse.org/tests/1467718#step/bootloader_s390/6

#23 Updated by SLindoMansilla 10 months ago

I also tried the concatenation of parmfile lines that Ihno told me about.

The limit is 72 characters.
Character 73 is the first digit of the month (11) in the snapshot date, and it gets lost: https://openqa.opensuse.org/tests/1467778#step/bootloader_s390/28
If I use 72 characters the parmfile looks ok: https://openqa.opensuse.org/tests/1467779#step/bootloader_s390/28
But linuxrc still gets the URL cropped: https://openqa.opensuse.org/tests/1467779#step/bootloader_s390/29
I should ask the YaST team tomorrow.

The two options left are IPv6 in the z/VM guests (Ihno wanted to look into it) and short symlinks created by rsync.pl.

#24 Updated by SLindoMansilla 10 months ago

I didn't have time today to ask Ihno about his progress with IPv6 on the z/VM guests, nor the YaST team about the unexpected parmfile->linuxrc behavior.

I am looking at https://github.com/os-autoinst/openqa-trigger-from-obs to see if I can teach it to create short symlinks to the s390x repos.

#25 Updated by SLindoMansilla 10 months ago

After I talked to https://github.com/andrii-suse, he suggested the following change as a simple solution: https://github.com/os-autoinst/openqa-trigger-from-obs/pull/107

#26 Updated by SLindoMansilla 10 months ago

PR merged and deployed; the next snapshot will use a shorter name.

#28 Updated by okurz 10 months ago

I fear you should clone it to openSUSE to be public :(

#29 Updated by SLindoMansilla 10 months ago

okurz wrote:

I fear you should clone it to openSUSE to be public :(

https://bugzilla.opensuse.org/show_bug.cgi?id=1178790 :)
