action #55883: known_issue:"Can.t fcntl" [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling - openQA Tests (public) - openSUSE Project Management Tool

Custom queries

All open Feature tests
openQA Infrastructure Project
openqa-review - Closed tickets last updated by openqa-review, last 30 days
QA roadmap long-term
QA SLE functional
QA SLE Functional - closed in last 14 days
QA SLE Functional - High, need to be refined
QA SLE Functional - over cycle time median
QA SLE u
QA SLE y
QA tools (tag not necessary in openQA and subprojects)
QA tools tag (tag not necessary in openQA and subprojects; excluding tickets in "Ready" version as they are already on the backlog)
QAC - Backlog
QAM
QE tools team - backlog (dev)
QE tools team - backlog (ready issues)
QE tools team - backlog SLA high
QE tools team - backlog SLA immediate
QE tools team - backlog SLA no immediate/urgent in feedback/blocked
QE tools team - backlog SLA normal
QE tools team - backlog SLA urgent
QE tools team - backlog SLO high
QE tools team - backlog SLO normal
QE tools team - backlog SLO urgent
QE tools team - backlog, high-level view (epics and higher)
QE tools team - backlog, non-reactive work, needs parent
QE tools team - backlog, top-level view (all sagas)
QE Tools Team - Beginner
QE tools team - closed within last 14 days
QE tools team - closed within last 60 days
QE tools team - closed yesterday
QE Tools Team - Collaborative Session
QE tools team - due date forecast
QE tools team - exceeding due-date
QE Tools Team - Expert
QE tools team - infrastructure backlog
QE tools team - next - sorted by update time
QE tools team - next issues
QE tools team - non-estimated (unblocked) issues (dev)
QE tools team - non-estimated (unblocked) issues (infra)
QE tools team - ready issues - Workable
QE tools team - ready, not assigned/blocked/low
QE tools team - SLO high forecast
QE tools team - update forecast
QE tools team - updated by priority
QE tools team - what members of the team are working on - Feedback (not-low)
QE Tools Team Backlog By Assignee
SLE15 Migration Open Tickets
SLE15 SP1 Migration Open Tickets
SLE15SP3 Migration open ticket
SLE15SP3 Security open ticket
Tools Team Retrospective
Tools Team Retrospective (not estimated or assigned)

Actions

Copy link

action #55883

closed

known_issue:"Can.t fcntl" [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling

Added by cfconrad almost 6 years ago. Updated over 4 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

cfconrad

Category:

Bugs in existing tests

Target version:

QE Kernel - QE Kernel Done

Start date:

2019-08-23

Due date:

% Done:

Estimated time:

Difficulty:

Description

When https://github.com/os-autoinst/os-autoinst/pull/1182 was deployed, we encountered a lot of incomplete jobs on o3.
There seems to be a correlation between save_memory_dump() and qemu left overs, if this PR was included.

This ticked https://progress.opensuse.org/issues/55595 show some investigation on save_memory_dump() in short, I think it was that xz return with 2 on warnings. And we have use autodie :all in qemu.pm which then just let this process die() and the job end's with ´incomplete`.

So I guess that qemu sometimes have problem in exact that "unexpected" end, to cleanup and kill all qemu instances.

The investigation documented in https://progress.opensuse.org/issues/55505 pinpoints to the PR#1182 and also mention a reproducer like:

openqa-clone-job --from https://openqa.opensuse.org/tests/1006973 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/poo55505_migration_incompletes WORKER_CLASS=qemu_x86_64_tw

Unfortunately, I wasn't able to reproduce it till now...

Related issues 2 (0 open — 2 closed)

Related to openQA Tests (public) - action #54260: [kernel][s390x] Occasional failures due timeout

Rejected

cfconrad

2019-07-15

Actions

Related to openQA Tests (public) - action #60725: [kernel]Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106.

Resolved

cfconrad

2019-12-05

Actions

Issue # Delay: days Cancel

History
Notes
Property changes

Actions

Copy link

Updated by cfconrad almost 6 years ago

Related to action #54260: [kernel][s390x] Occasional failures due timeout added

Actions

Copy link

Updated by okurz over 5 years ago

Category set to Bugs in existing tests

Actions

Copy link

Updated by cfconrad over 5 years ago

Status changed from New to In Progress

Actions

Copy link

Updated by cfconrad over 5 years ago

Status changed from In Progress to Feedback

Added some cleanup, but still not able to reproduce.
Reopened the PR

https://github.com/os-autoinst/os-autoinst/pull/1274

=> waiting for merge.

Actions

Copy link

Updated by cfconrad over 5 years ago

PR was merged, but introduced a regression!!

From autoinst-log.txt

[2019-12-04T13:40:28.465 CET] [debug] <<< consoles::serial_screen::type_string(text='
', json_cmd_token='bjYlPJkG', cmd='backend_type_string')
[2019-12-04T13:40:28.466 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106.

Issue was introduced with https://github.com/os-autoinst/os-autoinst/pull/1274. Looks like I forgot on rebase to update ssh_serial.pm which was introduced in https://github.com/os-autoinst/os-autoinst/pull/1176.

This PR https://github.com/os-autoinst/os-autoinst/pull/1313 should fix it.

If someone knows a better way to call super class constructor, plz let me know.

Actions

Copy link

Updated by cfconrad over 5 years ago

Related to action #60725: [kernel]Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106. added

Actions

Copy link

Updated by okurz over 5 years ago

https://openqa.suse.de/tests/3678200 incompleted with

[2019-12-07T14:37:30.891 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console1')
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
/usr/lib/os-autoinst/consoles/vnc_base.pm:62:{
  'connect_timeout' => 3,
  'port' => 5997,
  'hostname' => 'localhost'
}
[2019-12-07T14:37:30.894 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console')
[2019-12-07T14:37:30.894 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:37:30.896 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't fcntl($fh, '1031', '1048576'): Operation not permitted at /usr/lib/os-autoinst/consoles/virtio_terminal.pm line 147

https://github.com/os-autoinst/os-autoinst/pull/1313 was already deployed on the affected worker. can you take a look please?

Actions

Copy link

Updated by okurz over 5 years ago

Subject changed from [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling to known_issue:"Can.t fcntl" [kernel] Investigate qemu left over, when switching from unix-domain sockets to named-pipes used for virtio_console handling

Actions

Copy link

Updated by cfconrad over 5 years ago

Can't fcntl($fh, '1031', '1048576'): Operation not permitted at /usr/lib/os-autoinst/consoles/virtio_terminal.pm line 147

I think it is related to https://github.com/os-autoinst/openQA/pull/2257 which should adopt the apparmor profile.
Going to double check the worker.

Actions

Copy link

#10

Updated by cfconrad over 5 years ago

The workers /etc/apparmor.d/usr.share.openqa.script.worker looks ok. Need further investigation.

Actions

Copy link

#11

Updated by cfconrad over 5 years ago

From my point of view, it cannot be related to apparmor. The same IOCTLs are used before on the second console:

[2019-12-07T14:37:30.891 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console1')
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:37:30.891 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576

But then they fail on /var/lib/openqa/pool/7/virtio_console.

Maybe the worker was out of ram?

Actions

Copy link

#12

Updated by cfconrad over 5 years ago

@okurz, can you tell me if this happen often? And if so, how can I track it.

Actions

Copy link

#13

Updated by pvorel over 5 years ago

IMHO missing RAM wouldn't return Operation not permitted (EPERM). It might not be apparmor, but I bet it's security related.

Actions

Copy link

#14

Updated by cfconrad over 5 years ago

@pvorel I don't understand why this happen on one pipe but close after on other not and both pipes are created by the same process.

When you search in autoinst-log.txt for open_pipe you will find the same command multiple times succeed without any problem.

So most of the time we see this:

[2019-12-07T14:28:03.397 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console1')
[2019-12-07T14:28:03.398 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:28:03.398 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:28:03.398 CET] [debug] <<< consoles::virtio_terminal::open_pipe(pipe_prefix='/var/lib/openqa/pool/7/virtio_console')
[2019-12-07T14:28:03.398 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2019-12-07T14:28:03.399 CET] [info] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576

And only once the error.

Maybe we should just not autodie on such error and continue or try with less size.

Actions

Copy link

#15

Updated by okurz over 5 years ago

cfconrad wrote:

@okurz, can you tell me if this happen often? And if so, how can I track it.

https://gitlab.suse.de/openqa/auto-review/pipelines runs recurringly every day and labels incomplete jobs with this ticket whenever the issue is encountered. To find these issues you can query the database on osd, e.g. with

select job_id from comments where text ~ 'poo#55883';

in SQL which as of now only shows a single job, the one I reported about. Maybe there are no more but auto-review does not yet parse the "known_issue" from tickets, waiting for https://github.com/os-autoinst/scripts/pull/14

Actions

Copy link

#16