Project

General

Profile

Actions

action #43949

closed

developer mode: commands process is running amok

Added by coolo about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2018-11-18
Due date:
% Done:

0%

Estimated time:

Description

I enabled developer mode to timeout on assert_screen on my own instance and qemu stalls,
because all CPU is eaten by command process doing basically this:

poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 892) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 891) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 890) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=5, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI|POLLOUT}, {fd=3, events=POLLIN|POLLPRI|POLLOUT}], 3, 889) = 1 ([{fd=4, revents=POLLOUT}])
Actions #1

Updated by coolo about 6 years ago

  • Priority changed from Normal to Urgent
  • Target version set to Current Sprint

Also showing in production: http://paste.opensuse.org/48159434

Actions #2

Updated by mkittler about 6 years ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #3

Updated by mkittler about 6 years ago

That's how it looks like on my system:

  • It looks like the following processes are busy (occupying each one core on my system) when paused on asset screen: backend (see comment), command-server, qemu-system-x86_64, scheduler (!)
  • The following processes are still idling: isotovideo, autotest, videoencoder, liveviewhandler, worker, web UI, resource allocator
  • The backend is only busy the first few minutes. After a while it was as stayed completely idle.
  • QEMU is only busy in the bootloader. When I connect via VNC and start e.g. the rescue system it is also idling. So likely it is just GRUB which consumes all the CPU here.
  • The scheduler is and stays idle after restarting it.
  • This leaves only the command server being busy.
  • According to Wireshark there's not much HTTP traffic going on on my machine at this time.

Conclusions so far:

  • So the interaction between isotovideo and autotest which I added for the developer mode is not causing the high CPU load.
  • Closing all open browser tabs for the job doesn't help. So the high load isn't caused by the developer mode's client code as well.
  • I'm wondering why the scheduler is on the list of busy processes. But this is likely a different problem (which should be investigated as well).
Actions #4

Updated by mkittler about 6 years ago

Seems to be the Mojo::IOLoop->singleton->reactor->io($isotovideo => sub { ... callback in commands.pm.

Actions #5

Updated by mkittler about 6 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (mkittler)
  • Priority changed from Urgent to Normal
  • Target version deleted (Current Sprint)

PR for the actual command server issue: https://github.com/os-autoinst/os-autoinst/pull/1061

Actions #6

Updated by mkittler about 6 years ago

  • Status changed from Workable to Feedback
  • Assignee set to mkittler
  • Target version set to Current Sprint

PR has been merged. I keep it as feedback because we likely still want to have an eye on it when the fix has been deployed.

Actions #7

Updated by mkittler almost 6 years ago

  • Status changed from Feedback to Resolved

The changes have been merged now. The impact of the issue for production wasn't as big as I thought so I'm marking it as resolved.

Actions

Also available in: Atom PDF