action #55883
closedknown_issue:"Can.t fcntl" [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling
0%
Description
When https://github.com/os-autoinst/os-autoinst/pull/1182 was deployed, we encountered a lot of incomplete
jobs on o3.
There seems to be a correlation between save_memory_dump()
and qemu left overs, if this PR was included.
This ticked https://progress.opensuse.org/issues/55595 show some investigation on save_memory_dump()
in short, I think it was that xz
return with 2 on warnings. And we have use autodie :all
in qemu.pm which then just let this process die() and the job end's with ´incomplete`.
So I guess that qemu sometimes have problem in exact that "unexpected" end, to cleanup and kill all qemu instances.
The investigation documented in https://progress.opensuse.org/issues/55505 pinpoints to the PR#1182 and also mention a reproducer like:
openqa-clone-job --from https://openqa.opensuse.org/tests/1006973 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/poo55505_migration_incompletes WORKER_CLASS=qemu_x86_64_tw
Unfortunately, I wasn't able to reproduce it till now...
Updated by cfconrad about 5 years ago
- Related to action #54260: [kernel][s390x] Occasional failures due timeout added
Updated by cfconrad almost 5 years ago
- Status changed from In Progress to Feedback
Added some cleanup, but still not able to reproduce.
Reopened the PR
https://github.com/os-autoinst/os-autoinst/pull/1274
=> waiting for merge.
Updated by cfconrad almost 5 years ago
PR was merged, but introduced a regression!!
From autoinst-log.txt
[2019-12-04T13:40:28.465 CET] [debug] <<< consoles::serial_screen::type_string(text='
', json_cmd_token='bjYlPJkG', cmd='backend_type_string')
[2019-12-04T13:40:28.466 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106.
Issue was introduced with https://github.com/os-autoinst/os-autoinst/pull/1274. Looks like I forgot on rebase to update ssh_serial.pm which was introduced in https://github.com/os-autoinst/os-autoinst/pull/1176.
This PR https://github.com/os-autoinst/os-autoinst/pull/1313 should fix it.
If someone knows a better way to call super class constructor, plz let me know.
Updated by cfconrad almost 5 years ago
- Related to action #60725: [kernel]Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106. added
Updated by okurz almost 5 years ago
https://openqa.suse.de/tests/3678200 incompleted with
[2019-12-07T14:37:30.891 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console1')
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
/usr/lib/os-autoinst/consoles/vnc_base.pm:62:{
'connect_timeout' => 3,
'port' => 5997,
'hostname' => 'localhost'
}
[2019-12-07T14:37:30.894 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console')
[2019-12-07T14:37:30.894 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:37:30.896 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't fcntl($fh, '1031', '1048576'): Operation not permitted at /usr/lib/os-autoinst/consoles/virtio_terminal.pm line 147
https://github.com/os-autoinst/os-autoinst/pull/1313 was already deployed on the affected worker. can you take a look please?
Updated by okurz almost 5 years ago
- Subject changed from [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling to known_issue:"Can.t fcntl" [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling
Updated by cfconrad almost 5 years ago
Can't fcntl($fh, '1031', '1048576'): Operation not permitted at /usr/lib/os-autoinst/consoles/virtio_terminal.pm line 147
I think it is related to https://github.com/os-autoinst/openQA/pull/2257 which should adopt the apparmor profile.
Going to double check the worker.
Updated by cfconrad almost 5 years ago
The workers /etc/apparmor.d/usr.share.openqa.script.worker looks ok. Need further investigation.
Updated by cfconrad almost 5 years ago
From my point of view, it cannot be related to apparmor. The same IOCTLs are used before on the second console:
[2019-12-07T14:37:30.891 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console1')
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
But then they fail on /var/lib/openqa/pool/7/virtio_console
.
Maybe the worker was out of ram?
Updated by cfconrad almost 5 years ago
@okurz, can you tell me if this happen often? And if so, how can I track it.
Updated by pvorel almost 5 years ago
IMHO missing RAM wouldn't return Operation not permitted (EPERM). It might not be apparmor, but I bet it's security related.
Updated by cfconrad almost 5 years ago
@pvorel I don't understand why this happen on one pipe but close after on other not and both pipes are created by the same process.
When you search in autoinst-log.txt for open_pipe
you will find the same command multiple times succeed without any problem.
So most of the time we see this:
[2019-12-07T14:28:03.397 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console1')
[2019-12-07T14:28:03.398 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:28:03.398 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:28:03.398 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console')
[2019-12-07T14:28:03.398 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:28:03.399 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
And only once the error.
Maybe we should just not autodie on such error and continue or try with less size.
Updated by okurz almost 5 years ago
cfconrad wrote:
@okurz, can you tell me if this happen often? And if so, how can I track it.
https://gitlab.suse.de/openqa/auto-review/pipelines runs recurringly every day and labels incomplete jobs with this ticket whenever the issue is encountered. To find these issues you can query the database on osd, e.g. with
select job_id from comments where text ~ 'poo#55883';
in SQL which as of now only shows a single job, the one I reported about. Maybe there are no more but auto-review does not yet parse the "known_issue" from tickets, waiting for https://github.com/os-autoinst/scripts/pull/14
Updated by cfconrad almost 5 years ago
Thx for explanation.
I actually never took a look into os-autoinst/scripts :) nice!
Updated by cfconrad almost 5 years ago
- Status changed from Feedback to Resolved
Updated by pcervinka about 4 years ago
- Target version changed from 457 to QE Kernel Done