action #18608

open

[qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring

Added by mgriessmeier over 7 years ago. Updated 10 months ago.

Status: Blocked
Priority: Low
Assignee:
Category: -
Target version: QA (public, currently private due to #173521) - future
Start date:
Due date:
% Done: 100%
Estimated time: 42.00 h

Description

Motivation

We recently saw many failures due to a full disk on s390pb, caused by the lack of proper cleanup and monitoring.

Acceptance criteria

  • AC1: Disks on jump hosts do not run full with assets
  • AC2: The cleanup is not just a custom script in a custom cron job but is linked to openQA
  • AC3: Limit is set to 50%
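
A minimal sketch of the check behind AC1 and AC3, assuming the limit refers to the used-space percentage of the filesystem holding the images. The helper names `usage_percent` and `over_limit` are hypothetical and not part of openQA; `/var/lib/libvirt/images` is the path used on s390pb elsewhere in this ticket.

```shell
#!/bin/sh
# Sketch only: usage_percent and over_limit are hypothetical helper names.

# Print the used-space percentage (digits only) of the filesystem holding "$1".
usage_percent() {
    # -P forces the POSIX one-line-per-filesystem format; field 5 is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage "$1" is strictly above limit "$2".
over_limit() {
    [ "$1" -gt "$2" ]
}

# Example use with the 50% limit from AC3:
#   if over_limit "$(usage_percent /var/lib/libvirt/images)" 50; then
#       echo "image store above limit" >&2
#   fi
```

Keeping the measurement and the comparison separate means the same check could feed either a cleanup step or a monitoring alert.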

Suggestions

  • Ask mnowak whether this is also a problem for the hyperv host, or whether he has already solved it better since #18608#note-3
  • Harmonize existing solutions, if any, or at least collect them here
  • Come up with a proper approach covered by openQA, e.g. also mention it in the openQA or os-autoinst documentation regarding jump hosts

Related issues 6 (1 open, 5 closed)

  • Related to openQA Tests (public) - action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs (Resolved, okurz, 2018-03-08)
  • Related to openQA Tests (public) - action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) (Resolved, okurz, 2018-03-08, 2018-05-22)
  • Related to openQA Tests (public) - action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found (Rejected, okurz, 2018-03-08, 2018-07-03)
  • Related to openQA Tests (public) - action #31507: extend storage for /var/lib/libvirt/images on s390pb (Resolved, mgriessmeier, 2018-02-08)
  • Related to openQA Infrastructure (public) - action #154180: Proper kvm asset cleanup for s390x kvm backend (svirt) and tests (Workable)
  • Follows openQA Tests (public) - action #19080: [s390x][zkvm] test cases fails by no space left on device to download zkvm-image (Resolved, mgriessmeier, 2017-05-10)

Actions #1

Updated by mgriessmeier over 7 years ago

  • Status changed from New to Resolved

Workaround: a cron job on s390pb, running every 6 hours, deletes unused qcow2 images.

[root@s390pb images]# cat /usr/local/bin/cleanup-openqa-assets
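#!/bin/bash -e
# Only clean up when the images filesystem is more than 80% full.
# [[ ]] is a bash construct, so the interpreter must be bash rather than plain sh.
if [[ $(df | grep "/var/lib/libvirt/images" | awk '{print $5}' | sed "s/\%//") -gt 80 ]] ; then
    # Remove every qcow2 image that no process has open (fuser -s exits non-zero).
    find /var/lib/libvirt/images/*.qcow2 ! -exec sudo fuser -s "{}" 2>/dev/null \; -exec rm -f {} \;
fi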
Actions #2

Updated by mgriessmeier over 7 years ago

  • Parent task set to #17574
Actions #3

Updated by michalnowak over 7 years ago

Thanks! I just added a similar script targeting qcow2, iso and img files on the Xen & KVM openQA virt hosts.
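
For comparison, a cleanup covering those three file types could look roughly like the following. This is a sketch, not the script actually deployed on the Xen & KVM hosts, which is not shown in this ticket; the function names, the dry-run default and the directory argument are assumptions.

```shell
#!/bin/sh
# Sketch only: list_candidates and cleanup_images are hypothetical names.

# Print qcow2, iso and img files directly inside directory "$1".
list_candidates() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.qcow2' -o -name '*.iso' -o -name '*.img' \)
}

# Delete candidates that no process has open. Without a second argument
# of "delete" this only prints what would be removed (dry run).
# Run as root so that fuser can see processes of other users.
cleanup_images() {
    # Refuse to run without fuser: otherwise every file would look unused.
    command -v fuser >/dev/null 2>&1 || { echo "fuser not found" >&2; return 1; }
    list_candidates "$1" | while IFS= read -r file; do
        # fuser -s is silent and exits 0 when some process accesses the file.
        if fuser -s "$file" 2>/dev/null; then
            continue
        fi
        if [ "${2:-}" = "delete" ]; then
            rm -f -- "$file"
        else
            echo "would remove: $file"
        fi
    done
}
```

Calling `cleanup_images /var/lib/libvirt/images` would only list removable files; appending `delete` as a second argument performs the removal.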

Actions #4

Updated by mgriessmeier over 7 years ago

  • Follows action #19080: [s390x][zkvm] test cases fails by no space left on device to download zkvm-image added
Actions #5

Updated by mgriessmeier over 7 years ago

  • Status changed from Resolved to New
  • Assignee deleted (mgriessmeier)
  • Priority changed from Urgent to Normal

Reopening, because apparently a cron job is not the proper way of doing it. Anyway, lowering priority and unassigning, since I adjusted the cron job to run more often and don't plan to work on this in the near future. Feel free to take it.

Actions #6

Updated by okurz about 7 years ago

  • Subject changed from [tools][sles][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring
  • Due date set to 2018-01-30
  • Target version set to Milestone 14

we might have an idea about it again when we discuss with others how to do it properly.

Actions #7

Updated by okurz almost 7 years ago

  • Due date changed from 2018-01-30 to 2018-02-13

M14 only starts after 2018-01-30

Actions #8

Updated by okurz almost 7 years ago

  • Subject changed from [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Description updated (diff)
  • Due date deleted (2018-02-13)
  • Status changed from New to Workable
  • Target version deleted (Milestone 14)

@coolo is it "Ready"?

Actions #9

Updated by okurz almost 7 years ago

ok, great. http://lord.arch/tests/479 failed for me with

[2018-02-08T07:52:53.0208 CET] [debug] MATCH(rebootnow-390x-20160506:0.00)
[2018-02-08T07:52:53.0338 CET] [debug] MATCH(install_and_reboot-additional-packages-20170823:0.00)
[2018-02-08T07:52:53.0343 CET] [debug] no match: 3905.2s
[2018-02-08T07:52:53.0849 CET] [debug] no change: 3904.2s
[2018-02-08T07:52:54.0849 CET] [debug] no change: 3903.2s
[2018-02-08T07:52:55.0849 CET] [debug] no change: 3902.2s
[2018-02-08T07:52:56.0350 CET] [debug] considering VNC stalled, no update for 4.18 seconds
DIE Error connecting to host <10.161.145.7>: IO::Socket::INET: connect: Connection timed out
 at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
    backend::baseclass::die_handler('OpenQA::Exception::VNCSetupError=HASH(0x6f2c320)') called at /usr/lib/perl5/vendor_perl/5.18.2/Exception/Class/Base.pm line 85
    Exception::Class::Base::throw('OpenQA::Exception::VNCSetupError', 'error', 'Error connecting to host <10.161.145.7>: IO::Socket::INET: co...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 151
    consoles::VNC::login('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 842
    consoles::VNC::send_update_request('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 82
    consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x55da488)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 598
    backend::baseclass::bouncer('backend::svirt=HASH(0x7a7c2d8)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 581
    backend::baseclass::request_screen_update('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 177
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 156
    backend::baseclass::run_capture_loop('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 129
    backend::baseclass::run('backend::svirt=HASH(0x7a7c2d8)', 5, 8) called at /usr/lib/os-autoinst/backend/driver.pm line 85
    backend::driver::start('backend::driver=HASH(0x6cbe360)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
    backend::driver::new('backend::driver', 'svirt') called at /usr/bin/isotovideo line 211
    main::init_backend() called at /usr/bin/isotovideo line 280
[2018-02-08T07:55:03.0597 CET] [debug] Destroying openQA-SUT-12 virtual machine

Reason: Space depleted :(
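
A failure like this could in principle be caught earlier by checking free space on the jump host before an image is copied there. The following is a minimal sketch of such a pre-flight check, not existing os-autoinst behaviour; `has_free_space` and the size used in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: has_free_space is a hypothetical helper, not an os-autoinst API.

# Succeed when the filesystem holding "$1" has at least "$2" KiB available.
has_free_space() {
    # df -Pk reports sizes in 1024-byte blocks; field 4 is "Available".
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

# Example: require 20 GiB before downloading an image (value is illustrative).
#   has_free_space /var/lib/libvirt/images $((20 * 1024 * 1024)) \
#       || { echo "not enough space on jump host" >&2; exit 1; }
```

Failing fast with an explicit message would make the cause visible in the job log instead of surfacing as a VNC connection timeout.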

Actions #10

Updated by okurz almost 7 years ago

  • Target version set to Milestone 16
Actions #12

Updated by okurz over 6 years ago

  • Related to action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs added
Actions #13

Updated by okurz over 6 years ago

  • Related to action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) added
Actions #14

Updated by okurz over 6 years ago

  • Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added
Actions #15

Updated by okurz over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #16

Updated by okurz over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #17

Updated by okurz over 6 years ago

  • Related to action #31507: extend storage for /var/lib/libvirt/images on s390pb added
Actions #18

Updated by okurz over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #19

Updated by okurz over 6 years ago

  • Subject changed from [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Due date set to 2018-07-03
  • Target version changed from Milestone 16 to Milestone 17

-> S20

Actions #20

Updated by okurz over 6 years ago

  • Target version changed from Milestone 17 to Milestone 17
Actions #21

Updated by riafarov over 6 years ago

  • Subject changed from [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Priority changed from Normal to Low

Next step: try to come up with a better solution, if possible, and then propose it. As of now we don't have a better solution in mind.

Lowering priority for this sprint.

Actions #22

Updated by mgriessmeier over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #23

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-07-03 to 2018-07-31

low prio, due to hackweek - moving to sprint 22

Actions #24

Updated by okurz over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #25

Updated by okurz over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #26

Updated by okurz over 6 years ago

  • Due date deleted (2018-07-31)
  • Target version changed from Milestone 17 to future
Actions #27

Updated by okurz over 6 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #28

Updated by okurz over 6 years ago

https://bugzilla.suse.com/show_bug.cgi?id=1103826 was created about all suse-kvm tests failing because of a full disk. mgriessmeier set it to RESOLVED INVALID with a reference to this ticket, rightly so.

Actions #29

Updated by okurz over 5 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #30

Updated by okurz over 5 years ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Category deleted (168)
Actions #31

Updated by SLindoMansilla over 5 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #32

Updated by SLindoMansilla over 5 years ago

  • Description updated (diff)
Actions #33

Updated by SLindoMansilla over 5 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #34

Updated by SLindoMansilla over 5 years ago

  • Target version changed from future to Milestone 27
Actions #35

Updated by SLindoMansilla over 5 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #36

Updated by SLindoMansilla over 5 years ago

  • Estimated time set to 42.00 h
Actions #37

Updated by mgriessmeier about 5 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #38

Updated by mgriessmeier about 5 years ago

  • Target version changed from Milestone 27 to Milestone 30+
Actions #39

Updated by mgriessmeier almost 5 years ago

  • Start date set to 2017-04-29

due to changes in a related task

Actions #40

Updated by mgriessmeier almost 5 years ago

  • Target version changed from Milestone 30+ to Milestone 35+
Actions #41

Updated by tjyrinki_suse about 4 years ago

  • Subject changed from [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Parent task deleted (#17574)
Actions #42

Updated by szarate over 2 years ago

  • Target version changed from Milestone 35+ to future
Actions #43

Updated by okurz 10 months ago

  • Related to action #154180: Proper kvm asset cleanup for s390x kvm backend (svirt) and tests added
Actions #44

Updated by okurz 10 months ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz