action #56006

closed

coordination #9576: [epic][opensuse][sle][functional][y] VNC+SSH Installations

[functional][y][Timebox:24] Fix remote installation over ssh for openSUSE

Added by riafarov over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
New test
Target version:
SUSE QA - Milestone 33
Start date:
2019-05-29
Due date:
2020-03-24
% Done:

0%

Estimated time:
Difficulty:

Description

See parent ticket for the motivation.

Follow-up on #52313. We got the test suite running, but the installer now fails due to configuration issues (possibly DNS or NAT). As we boot from the ISO, we can disable online repos; see this module:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/installation/online_repos.pm
So setting DISABLE_ONLINE_REPOS will do the trick.
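
One way to try the setting on an existing run is to clone the job with an override. The following is a minimal sketch, assuming `openqa-clone-job` is installed and accepts `KEY=VALUE` override arguments after the job id; the helper name `clone_job_command`, the job id, and the value `1` are illustrative assumptions, not taken from the ticket description.

```python
import shlex


def clone_job_command(job_id, overrides, source="https://openqa.opensuse.org"):
    """Build an openqa-clone-job argument list (hypothetical helper).

    Overrides are appended as KEY=VALUE arguments, sorted by key so the
    resulting command line is reproducible.
    """
    argv = ["openqa-clone-job", "--from", source, str(job_id)]
    argv += [f"{key}={value}" for key, value in sorted(overrides.items())]
    return argv


# The value "1" is an assumption; check online_repos.pm for what the module expects.
cmd = clone_job_command(1017908, {"DISABLE_ONLINE_REPOS": "1"})
print(shlex.join(cmd))
```

The function only builds the command line; running it (for example with `subprocess.run(cmd, check=True)`) is left out on purpose so the sketch has no side effects.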

As it's a follow-up, increasing priority to get it done.

Acceptance criteria

  1. Remote installation over ssh is tested on o3 for Leap and TW
  2. New test suite is added to the development job group first; once proven stable, it is moved to the main job group

Suggestions

Use SLES remote_ssh_controller and remote_ssh_target_nfs test suites as a base.
See https://openqa.suse.de/tests/latest?test=remote_ssh_controller&machine=64bit&flavor=Installer-DVD&distri=sle&arch=x86_64&version=15-SP1
https://openqa.suse.de/tests/latest?arch=x86_64&version=15-SP1&test=remote_ssh_target_ftp&machine=64bit&distri=sle&flavor=Installer-DVD

There is a worker class tap.
riafarov has access to the machine, so is a candidate to pair up with.
The setup should just work, so fixing it is out of scope.

We need to create a support_server image which works with the dhcp and dns roles.
Secondly, we will have to adjust the network configuration code, which works with wicked, as we have NetworkManager.
The alternative is to get images which have wicked.

For development we can trigger support_server as a normal (non-MM) test and, once everything works there, trigger it as part of an MM test.
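
The difference between a standalone run and a multi-machine run can be sketched as a difference in job settings. This is only an illustration: the setting names (`SUPPORT_SERVER`, `SUPPORT_SERVER_ROLES`, `NICTYPE`, `PARALLEL_WITH`) are assumptions based on common openQA multi-machine conventions and should be checked against the test distribution, and `as_parallel_child` is a hypothetical helper.

```python
# Assumed settings for a support server run on its own (not as an MM test).
# Only the tap worker class and the dhcp/dns roles come from the ticket text.
SUPPORT_SERVER = {
    "SUPPORT_SERVER": "1",
    "SUPPORT_SERVER_ROLES": "dhcp,dns",
    "NICTYPE": "tap",
    "WORKER_CLASS": "tap",
}


def as_parallel_child(settings, parent_suite):
    """Return a copy of settings for a job that runs in parallel with parent_suite."""
    child = dict(settings)  # copy so the caller's dict is left untouched
    child["PARALLEL_WITH"] = parent_suite
    return child


# Once the standalone run is stable, dependent jobs would point at it.
controller = as_parallel_child(
    {"NICTYPE": "tap", "WORKER_CLASS": "tap"}, "support_server"
)
print(controller)
```

Keeping the standalone settings free of `PARALLEL_WITH` mirrors the suggestion above: get the support server working alone first, then add the dependency.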

Actions #1

Updated by riafarov over 4 years ago

  • Copied from action #52313: [functional][y] Enable remote installation over ssh for openSUSE added
Actions #2

Updated by riafarov over 4 years ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Estimated time set to 5.00 h
Actions #3

Updated by JERiveraMoya over 4 years ago

  • Status changed from Workable to In Progress
Actions #5

Updated by JERiveraMoya over 4 years ago

We were already clicking "no" (the default behaviour) when asked to activate online repos in https://openqa.opensuse.org/tests/1017908. The previous verification run with DISABLE_ONLINE_REPOS does not work because it tries to disable them explicitly after clicking "yes", and it seems that this path does not work: https://openqa.opensuse.org/tests/1019417#step/online_repos/4. Perhaps using ftp in SLE was a requirement and not an additional step in the test?

Actions #6

Updated by JERiveraMoya over 4 years ago

  • Status changed from In Progress to Feedback
Actions #7

Updated by JERiveraMoya over 4 years ago

This run is better for comparison with SLE 15 SP2 because I set the ISO to Leap 15, but it obviously fails because it is not using wicked.

Actions #9

Updated by JERiveraMoya over 4 years ago

In this one, using the new Leap 15.0 image (all inclusive!), I don't get the error with the medium in the target. I wonder if it is really OK, because I have to create needles again, but I don't remember getting this far in previous runs:
https://openqa.opensuse.org/tests/1024855#step/remote_controller/15
https://openqa.opensuse.org/tests/1024856#step/bootloader/7
In parallel, I'm creating a script with the commands required to set up the support server in one VM.

Actions #10

Updated by JERiveraMoya over 4 years ago

Good news: the controller finally installed the system (no issues about the medium not being found) https://openqa.opensuse.org/tests/1025002#step/wait_children/6. Bad news: the target does not boot :( https://openqa.opensuse.org/tests/1025003#step/remote_target/3. This happens using NET installation over HTTP on the target.
DVD installation on the target presents the same issue: https://openqa.opensuse.org/tests/1025005

Actions #11

Updated by JERiveraMoya over 4 years ago

I caught something on the video of the DVD setup that could be a bug: https://openqa.opensuse.org/tests/1025004/file/video.ogv#t=71.04,71.08

Actions #12

Updated by JERiveraMoya over 4 years ago

  • Status changed from Feedback to Blocked
Actions #13

Updated by riafarov over 4 years ago

  • Due date changed from 2019-09-10 to 2019-10-08
  • Target version changed from Milestone 27 to Milestone 28
  • Estimated time deleted (5.00 h)

After the bug fix it should "just" work, so let's cross-check once we have the patch.

Actions #14

Updated by JERiveraMoya over 4 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (JERiveraMoya)

The target is still failing to boot.

Actions #15

Updated by JERiveraMoya over 4 years ago

  • Subject changed from [functional][y] Fix remote installation over ssh for openSUSE to [functional][y][Timebox:24] Fix remote installation over ssh for openSUSE
  • Due date changed from 2019-10-08 to 2019-10-22
Actions #16

Updated by riafarov over 4 years ago

  • Priority changed from High to Normal
Actions #17

Updated by ybonatakis over 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to ybonatakis
Actions #18

Updated by ybonatakis over 4 years ago

I bumped into a bug while running the job locally, which prevents me from investigating further. It probably has nothing to do with the main problem as it originally appeared (https://openqa.opensuse.org/tests/1025002). However, I think that something is also happening to the controller and it loses the connection with the target. There are no logs in the job, and as I can't reproduce the same result locally, I can't be sure what causes it. I filed a bug ticket [0] against the first finding.

[0] https://bugzilla.suse.com/show_bug.cgi?id=1153771

Actions #19

Updated by ybonatakis over 4 years ago

There is the following trace in the serial output.

starting setsid -wc inst_setup yast
[  596.254515] general protection fault: 0000 [#1] SMP PTI
[  596.255083] CPU: 0 PID: 3793 Comm: y2start Tainted: G        W         5.3.4-1-default #1 openSUSE Tumbleweed (unreleased)
[  596.256139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[  596.257161] RIP: 0010:__queue_work+0x1f/0x3d0
[  596.257571] Code: 00 49 39 d6 75 bd e9 6e ff ff ff 0f 1f 44 00 00 41 57 49 89 d7 41 56 41 89 fe 41 55 41 89 fd 41 54 49 89 f4 55 53 48 83 ec 10 <f6> 86 02 01 00 00 01 0f 85 df 02 00 00 48 bd eb 83 b5 80 46 86 c8
[  596.259475] RSP: 0018:ffffab3a80f87aa0 EFLAGS: 00010086
[  596.260145] RAX: 0000000080000000 RBX: 0000000000000286 RCX: ffff8ca05d2f7ed0
[  596.261036] RDX: ffff8ca00c8ebdc0 RSI: eba702d91d300a72 RDI: 0000000000000200
[  596.261899] RBP: 0000000000000200 R08: ffffffffc0690750 R09: ffff8ca05adbb830
[  596.262810] R10: 0000000000000004 R11: ffffffffffffffff R12: eba702d91d300a72
[  596.263544] R13: 0000000000000200 R14: 0000000000000200 R15: ffff8ca00c8ebdc0
[  596.264237] FS:  00007f0300a82740(0000) GS:ffff8ca057600000(0000) knlGS:0000000000000000
[  596.265001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  596.265535] CR2: 00007f0fa801a218 CR3: 000000005bcb0000 CR4: 00000000000006f0
[  596.266221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  596.266923] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  596.267620] Call Trace:
[  596.267858]  queue_work_on+0x85/0x90
[  596.268216]  btrfs_wq_submit_bio+0xa9/0xc0 [btrfs]
[  596.268687]  btree_submit_bio_hook+0x53/0xc0 [btrfs]
[  596.269159]  ? btree_csum_one_bio+0x1f0/0x1f0 [btrfs]
[  596.269641]  submit_one_bio+0x31/0x50 [btrfs]
[  596.270096]  btree_write_cache_pages+0x313/0x330 [btrfs]
[  596.270739]  ? __switch_to_asm+0x36/0x70
[  596.271131]  ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
[  596.271649]  ? __switch_to_asm+0x34/0x70
[  596.272032]  ? __switch_to_asm+0x40/0x70
[  596.272399]  ? __switch_to_asm+0x34/0x70
[  596.272766]  ? __switch_to_asm+0x40/0x70
[  596.273134]  ? __switch_to_asm+0x34/0x70
[  596.273501]  ? __switch_to_asm+0x40/0x70
[  596.273868]  ? __switch_to_asm+0x34/0x70
[  596.274235]  ? __switch_to_asm+0x40/0x70
[  596.274604]  ? __switch_to_asm+0x34/0x70
[  596.275002]  do_writepages+0x43/0xd0
[  596.275359]  ? __schedule+0x2c6/0x6d0
[  596.275724]  __writeback_single_inode+0x3d/0x340
[  596.276175]  writeback_single_inode+0xaf/0x120
[  596.276595]  write_inode_now+0x86/0xc0
[  596.276950]  iput+0x16e/0x1d0
[  596.277248]  close_ctree+0x1a6/0x310 [btrfs]
[  596.277653]  generic_shutdown_super+0x6c/0x100
[  596.278073]  kill_anon_super+0x14/0x30
[  596.278434]  btrfs_kill_super+0x12/0xa0 [btrfs]
[  596.278901]  deactivate_locked_super+0x36/0x70
[  596.279341]  cleanup_mnt+0x104/0x150
[  596.279697]  task_work_run+0xa1/0xc0
[  596.280054]  exit_to_usermode_loop+0x10c/0x130
[  596.280472]  do_syscall_64+0x1bc/0x200
[  596.280826]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  596.281300] RIP: 0033:0x7f03010d0e2b
[  596.281638] Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 a3 4e f9 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 0c e8 e1 4e f9 ff 8b 44
[  596.283452] RSP: 002b:00007ffdc90e7370 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[  596.284169] RAX: 0000000000000000 RBX: 000055dcb0461770 RCX: 00007f03010d0e2b
[  596.284826] RDX: 000055dcaf945d50 RSI: 0000000000000001 RDI: 0000000000000020
[  596.285505] RBP: 000055dcb0461648 R08: 0000000000000000 R09: 000000000000000f
[  596.286223] R10: 000055dca9739f00 R11: 0000000000000293 R12: 00000000000003b0
[  596.286920] R13: 00000000000003e5 R14: 00007f03011a5e68 R15: 000055dcaa6eb0a0
[  596.287625] Modules linked in: fuse nls_utf8 isofs usb_storage parport_pc parport btrfs xor raid6_pq libcrc32c dm_multipath dm_mod 8021q garp mrp stp llc arc4 libarc4 fan thermal nfs lockd grace fscache nls_iso8859_1 nls_cp437 af_packet sg st iscsi_ibft iscsi_boot_sysfs hid_generic usbhid sunrpc sr_mod cdrom bochs_drm drm_vram_helper ttm drm_kms_helper ata_generic drm ehci_pci ehci_hcd usbcore ata_piix joydev serio_raw pcspkr virtio_net syscopyarea sysfillrect virtio_blk virtio_scsi net_failover failover sysimgblt fb_sys_fops i2c_piix4 qemu_fw_cfg floppy button scsi_dh_rdac scsi_dh_emc scsi_dh_alua edd squashfs loop [last unloaded: ppa]
[  596.292887] ---[ end trace bf834b0d98e4e454 ]---
[  596.293318] RIP: 0010:__queue_work+0x1f/0x3d0
[  596.293725] Code: 00 49 39 d6 75 bd e9 6e ff ff ff 0f 1f 44 00 00 41 57 49 89 d7 41 56 41 89 fe 41 55 41 89 fd 41 54 49 89 f4 55 53 48 83 ec 10 <f6> 86 02 01 00 00 01 0f 85 df 02 00 00 48 bd eb 83 b5 80 46 86 c8
[  596.295516] RSP: 0018:ffffab3a80f87aa0 EFLAGS: 00010086
[  596.296031] RAX: 0000000080000000 RBX: 0000000000000286 RCX: ffff8ca05d2f7ed0
[  596.296687] RDX: ffff8ca00c8ebdc0 RSI: eba702d91d300a72 RDI: 0000000000000200
[  596.297343] RBP: 0000000000000200 R08: ffffffffc0690750 R09: ffff8ca05adbb830
[  596.297999] R10: 0000000000000004 R11: ffffffffffffffff R12: eba702d91d300a72
[  596.298688] R13: 0000000000000200 R14: 0000000000000200 R15: ffff8ca00c8ebdc0
[  596.299380] FS:  00007f0300a82740(0000) GS:ffff8ca057600000(0000) knlGS:0000000000000000
[  596.300156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  596.300688] CR2: 00007f0fa801a218 CR3: 000000005bcb0000 CR4: 00000000000006f0
[  596.301347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  596.302004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
install program exit code is 139
sync...[ 1851.418702] sysrq: Show Blocked State
[ 1851.419854]   task                        PC stack   pid father
[ 1851.421652] init            D    0     1      0 0x00004000
[ 1851.423324] Call Trace:
[ 1851.424091]  ? __schedule+0x2be/0x6d0
[ 1851.425209]  ? __switch_to_asm+0x34/0x70
[ 1851.426415]  ? __switch_to_asm+0x34/0x70
[ 1851.427610]  schedule+0x39/0xa0
[ 1851.428575]  rwsem_down_read_slowpath+0x171/0x4b0
[ 1851.430064]  ? page_cache_pipe_buf_steal.cold+0x21/0x21
[ 1851.431667]  iterate_supers+0x7e/0x100
[ 1851.432798]  ksys_sync+0x40/0xb0
[ 1851.433787]  __ia32_sys_sync+0xa/0x10
[ 1851.434532]  do_syscall_64+0x6e/0x200
[ 1851.435268]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1851.436277] RIP: 0033:0x7f5a99a7c1d7
[ 1851.437002] Code: Bad RIP value.
[ 1851.437664] RSP: 002b:00007ffc8bc58d28 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[ 1851.439246] RAX: ffffffffffffffda RBX: 00000000000007e1 RCX: 00007f5a99a7c1d7
[ 1851.440684] RDX: 000055df4d9438a0 RSI: 000055df4d3255c0 RDI: 000055df4d15e010
[ 1851.442118] RBP: 00007ffc8bc58d40 R08: 0000000000000006 R09: 0000000000000000
[ 1851.443534] R10: 000055df4befc90a R11: 0000000000000206 R12: 00007ffc8bc58d3c
[ 1851.444610] R13: 000055df4bf3d66f R14: 000055df4bf49d90 R15: 000055df4bf49d90

I think we have to file a bug and pass it to the YaST team.
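
For a bug report it can help to condense a trace like the one above into the fault line, the faulting kernel symbol, and the non-speculative call frames with their modules. The following is a rough sketch, assuming the serial output uses the `[ timestamp ]` line prefix seen above; `summarize_oops` is a hypothetical helper and the sample is a shortened excerpt of the trace.

```python
import re

PREFIX = r"^\[\s*\d+\.\d+\]"
FAULT_RE = re.compile(PREFIX + r" (general protection fault.*)$")
RIP_RE = re.compile(PREFIX + r" RIP: 0010:([^\s+]+)\+")
# Frame lines have two spaces after the timestamp; "? " marks unreliable frames.
FRAME_RE = re.compile(
    PREFIX + r"  (\? )?([^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$"
)


def summarize_oops(serial_text):
    """Extract fault line, faulting symbol and reliable frames of the first oops."""
    summary = {"fault": None, "rip": None, "frames": []}
    in_trace = False
    for line in serial_text.splitlines():
        if summary["fault"] is None:
            match = FAULT_RE.match(line)
            if match:
                summary["fault"] = match.group(1)
            continue
        if summary["rip"] is None:
            match = RIP_RE.match(line)
            if match:
                summary["rip"] = match.group(1)
                continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.match(line)
            if not match:
                break  # first non-frame line ends the call trace
            if match.group(1) is None:  # skip "? " frames
                summary["frames"].append((match.group(2), match.group(3)))
    return summary


sample = """\
[  596.254515] general protection fault: 0000 [#1] SMP PTI
[  596.257161] RIP: 0010:__queue_work+0x1f/0x3d0
[  596.267620] Call Trace:
[  596.267858]  queue_work_on+0x85/0x90
[  596.268216]  btrfs_wq_submit_bio+0xa9/0xc0 [btrfs]
[  596.269159]  ? btree_csum_one_bio+0x1f0/0x1f0 [btrfs]
[  596.281300] RIP: 0033:0x7f03010d0e2b
"""
print(summarize_oops(sample))
```

Each frame is a `(symbol, module)` pair, with `None` as the module for frames that carry no bracketed module name.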

Actions #20

Updated by ybonatakis over 4 years ago

  • Status changed from In Progress to Feedback
Actions #21

Updated by ybonatakis over 4 years ago

  • Status changed from Feedback to Blocked
Actions #22

Updated by riafarov over 4 years ago

  • Due date changed from 2019-10-22 to 2019-12-03
  • Target version changed from Milestone 28 to Milestone 30+
Actions #23

Updated by riafarov over 4 years ago

  • Due date changed from 2019-12-03 to 2019-12-17
  • Assignee changed from ybonatakis to riafarov

Should be fixed rather soon.

Actions #24

Updated by riafarov over 4 years ago

  • Due date changed from 2019-12-17 to 2020-01-28
Actions #25

Updated by mgriessmeier over 4 years ago

  • Target version changed from Milestone 30+ to Milestone 30

bulk moved to M30 for revisiting

Actions #26

Updated by riafarov over 4 years ago

  • Due date changed from 2020-01-28 to 2020-03-10
  • Status changed from Blocked to Workable
  • Assignee deleted (riafarov)
  • Target version changed from Milestone 30 to Milestone 33

As per the comments in https://bugzilla.suse.com/show_bug.cgi?id=1153771, this should work now, so we can attempt to finally get it running for openSUSE.

Actions #27

Updated by JERiveraMoya about 4 years ago

I cleaned up some unneeded settings, following what we did for VNC, and used a recent TW support server:
https://openqa.opensuse.org/tests/1199178. There are still some missing needles and we need to check for other issues.

Actions #28

Updated by JERiveraMoya about 4 years ago

Seems ok: https://openqa.opensuse.org/tests/1199268
Enabled in TW.
The missing part is to remove the soft-failure where we type 'ctrl-alt-del' in the target and comment in the bug if it is still not working.

Actions #29

Updated by JERiveraMoya about 4 years ago

  • Due date changed from 2020-03-10 to 2020-03-24
  • Priority changed from Normal to High
Actions #30

Updated by riafarov about 4 years ago

  • Copied from deleted (action #52313: [functional][y] Enable remote installation over ssh for openSUSE)
Actions #31

Updated by riafarov about 4 years ago

  • Parent task set to #9576
Actions #32

Updated by riafarov about 4 years ago

JERiveraMoya wrote:

Seems ok: https://openqa.opensuse.org/tests/1199268
Enabled in TW.
The missing part is to remove the soft-failure where we type 'ctrl-alt-del' in the target and comment in the bug if it is still not working.

Hmm, I believe these are two different bugs; VNC still needs a workaround as https://bugzilla.suse.com/show_bug.cgi?id=1164503 is still open. But you are right that we should not apply that step in the SSH installation.

Actions #33

Updated by riafarov about 4 years ago

  • Assignee set to riafarov
Actions #34

Updated by riafarov about 4 years ago

  • Status changed from Workable to Feedback

So the test works fine in TW, and the relevant soft-failure is provided. Once the bug is fixed, we can remove it along with the workaround.
I've contacted Lubos regarding enabling the scenario for Leap; as it's downstream of SLES, we might not want to increase the execution time for the builds.

Actions #35

Updated by riafarov about 4 years ago

  • Status changed from Feedback to Resolved

No response from Lubos, resolving.
