action #18608

[qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring

Added by mgriessmeier over 5 years ago. Updated 3 months ago.

Status: Workable
Priority: Low
Assignee: -
Target version:
Start date:
Due date:
% Done: 100%
Estimated time: 42.00 h

Description

Motivation

We recently saw a lot of failures due to a full disk on s390pb, caused by the lack of proper cleanup and monitoring.

Acceptance criteria

  • AC1: Disks on jump hosts do not run full with assets
  • AC2: It's not just a custom script in custom cron job but linked to openQA
  • AC3: Limit is set to 50%
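
A minimal sketch of how the 50% limit from AC3 could be checked, assuming a POSIX shell and `df -P` on the jump host. The function names `usage_percent` and `over_limit` are hypothetical and not part of openQA or os-autoinst.

```shell
#!/bin/sh
# Hypothetical helper: print the usage percentage (without "%") of the
# filesystem that holds the given path.
usage_percent() {
    # df -P prints one line per filesystem; field 5 is the capacity, e.g. "42%"
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed (exit 0) when usage exceeds the limit.
# $1 = path, $2 = limit in percent (defaults to 50, the value from AC3)
over_limit() {
    [ "$(usage_percent "$1")" -gt "${2:-50}" ]
}
```

For example, `over_limit /var/lib/libvirt/images 50 && echo "image pool above limit"` would only report the condition and delete nothing.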

Suggestions

  • Ask mnowak whether this is also a problem on the hyperv host, or whether he has already solved it better since #18608#note-3
  • Harmonize existing solutions, if any; at least collect them here
  • Come up with a proper approach covered by openQA, e.g. also describe it in the openQA or os-autoinst documentation on jump hosts

Related issues

Related to openQA Tests - action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs (Resolved, 2018-03-08)

Related to openQA Tests - action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) (Resolved, 2018-03-08, 2018-05-22)

Related to openQA Tests - action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found (Rejected, 2018-03-08, 2018-07-03)

Related to openQA Tests - action #31507: extend storage for /var/lib/libvirt/images on s390pb (Resolved, 2018-02-08)

Follows openQA Tests - action #19080: [s390x][zkvm] test cases fails by no space left on device to download zkvm-image (Resolved, 2017-05-10)

History

#1 Updated by mgriessmeier over 5 years ago

  • Status changed from New to Resolved

Workaround: a cronjob that runs every 6 hours and deletes unused qcow images on s390pb

[root@s390pb images]# cat /usr/local/bin/cleanup-openqa-assets
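#!/bin/bash -e
# Clean up only when the image volume is more than 80% full
if [[ $(df -P /var/lib/libvirt/images | awk 'NR==2 {print $5}' | sed "s/%//") -gt 80 ]] ; then
    # fuser -s exits non-zero for files no process has open; only those are deleted
    find /var/lib/libvirt/images -maxdepth 1 -name '*.qcow2' ! -exec sudo fuser -s {} \; -exec rm -f {} \; 2>/dev/null
fi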
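
Before letting such a cron script delete anything, it may help to see which files it would pick. The following is a dry-run sketch under the same assumptions as the script above (GNU `find`, `fuser` from psmisc). The function name `list_unused_images` is hypothetical.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the qcow2 images in a directory that no
# process currently has open, without deleting anything.
# Note: without root (the original uses sudo), fuser may only see the
# caller's own processes, so the result can include images in use by others.
list_unused_images() {
    # fuser -s exits non-zero when no process uses the file, so "! -exec"
    # keeps exactly the files that are reported as not in use.
    find "$1" -maxdepth 1 -type f -name '*.qcow2' \
        ! -exec fuser -s {} \; -print 2>/dev/null
}
```

Calling `list_unused_images /var/lib/libvirt/images` prints one path per line. An image that was just copied to the host but not yet opened by a VM would also be listed, which is one reason a plain `fuser` check may not be a sufficient long-term solution.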

#2 Updated by mgriessmeier over 5 years ago

  • Parent task set to #17574

#3 Updated by michalnowak over 5 years ago

Thanks! I just added a similar script targeting qcow2, iso and img files on Xen & KVM openQA virt hosts.
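
The script mentioned in this note is not shown in the ticket. As an illustration only, selecting the three file types with one `find` call could look like the sketch below; the function name `list_asset_files` is hypothetical and the directory is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper: list qcow2, iso and img files directly inside a
# directory. This is an illustration, not the script used on the virt hosts.
list_asset_files() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.qcow2' -o -name '*.iso' -o -name '*.img' \) -print
}
```

The parentheses group the three `-name` tests so that `-type f` and `-print` apply to all of them.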

#4 Updated by mgriessmeier about 5 years ago

  • Follows action #19080: [s390x][zkvm] test cases fails by no space left on device to download zkvm-image added

#5 Updated by mgriessmeier about 5 years ago

  • Status changed from Resolved to New
  • Assignee deleted (mgriessmeier)
  • Priority changed from Urgent to Normal

Reopening, because apparently a cronjob is not the proper way of doing it. Lowering priority and unassigning, since I adjusted the cronjob to run more often and don't plan to work on this in the near future. Feel free to take it.

#6 Updated by okurz over 4 years ago

  • Subject changed from [tools][sles][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring
  • Due date set to 2018-01-30
  • Target version set to Milestone 14

we might have an idea about it again when we discuss with others how to do it properly.

#7 Updated by okurz over 4 years ago

  • Due date changed from 2018-01-30 to 2018-02-13

M14 only starts after 2018-01-30

#8 Updated by okurz over 4 years ago

  • Subject changed from [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Description updated (diff)
  • Due date deleted (2018-02-13)
  • Status changed from New to Workable
  • Target version deleted (Milestone 14)

coolo is it "Ready"?

#9 Updated by okurz over 4 years ago

ok, great. http://lord.arch/tests/479 failed for me with

[2018-02-08T07:52:53.0208 CET] [debug] MATCH(rebootnow-390x-20160506:0.00)
[2018-02-08T07:52:53.0338 CET] [debug] MATCH(install_and_reboot-additional-packages-20170823:0.00)
[2018-02-08T07:52:53.0343 CET] [debug] no match: 3905.2s
[2018-02-08T07:52:53.0849 CET] [debug] no change: 3904.2s
[2018-02-08T07:52:54.0849 CET] [debug] no change: 3903.2s
[2018-02-08T07:52:55.0849 CET] [debug] no change: 3902.2s
[2018-02-08T07:52:56.0350 CET] [debug] considering VNC stalled, no update for 4.18 seconds
DIE Error connecting to host <10.161.145.7>: IO::Socket::INET: connect: Connection timed out
 at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
    backend::baseclass::die_handler('OpenQA::Exception::VNCSetupError=HASH(0x6f2c320)') called at /usr/lib/perl5/vendor_perl/5.18.2/Exception/Class/Base.pm line 85
    Exception::Class::Base::throw('OpenQA::Exception::VNCSetupError', 'error', 'Error connecting to host <10.161.145.7>: IO::Socket::INET: co...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 151
    consoles::VNC::login('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 842
    consoles::VNC::send_update_request('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 82
    consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x55da488)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 598
    backend::baseclass::bouncer('backend::svirt=HASH(0x7a7c2d8)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 581
    backend::baseclass::request_screen_update('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 177
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 156
    backend::baseclass::run_capture_loop('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 129
    backend::baseclass::run('backend::svirt=HASH(0x7a7c2d8)', 5, 8) called at /usr/lib/os-autoinst/backend/driver.pm line 85
    backend::driver::start('backend::driver=HASH(0x6cbe360)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
    backend::driver::new('backend::driver', 'svirt') called at /usr/bin/isotovideo line 211
    main::init_backend() called at /usr/bin/isotovideo line 280
[2018-02-08T07:55:03.0597 CET] [debug] Destroying openQA-SUT-12 virtual machine

Reason: Space depleted :(

#10 Updated by okurz over 4 years ago

  • Target version set to Milestone 16

#12 Updated by okurz over 4 years ago

  • Related to action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs added

#13 Updated by okurz over 4 years ago

  • Related to action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) added

#14 Updated by okurz over 4 years ago

  • Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added

#15 Updated by okurz over 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#16 Updated by okurz over 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#17 Updated by okurz over 4 years ago

  • Related to action #31507: extend storage for /var/lib/libvirt/images on s390pb added

#18 Updated by okurz over 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#19 Updated by okurz over 4 years ago

  • Subject changed from [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Due date set to 2018-07-03
  • Target version changed from Milestone 16 to Milestone 17

-> S20

#20 Updated by okurz about 4 years ago

  • Target version changed from Milestone 17 to Milestone 17

#21 Updated by riafarov about 4 years ago

  • Subject changed from [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Priority changed from Normal to Low

Next step: try to come up with a better solution, if possible, and then propose it. As of now we don't have a better solution in mind.

Lowering priority for this sprint.

#22 Updated by mgriessmeier about 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#23 Updated by mgriessmeier about 4 years ago

  • Due date changed from 2018-07-03 to 2018-07-31

low prio, due to hackweek - moving to sprint 22

#24 Updated by okurz about 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#25 Updated by okurz about 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#26 Updated by okurz about 4 years ago

  • Due date deleted (2018-07-31)
  • Target version changed from Milestone 17 to future

#27 Updated by okurz about 4 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#28 Updated by okurz about 4 years ago

https://bugzilla.suse.com/show_bug.cgi?id=1103826 was created because all suse-kvm tests were failing due to a full disk. mgriessmeier closed it as RESOLVED INVALID with a reference to this ticket, rightly so.

#29 Updated by okurz about 3 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#30 Updated by okurz about 3 years ago

  • Project changed from openQA Project to openQA Infrastructure
  • Category deleted (168)

#31 Updated by SLindoMansilla almost 3 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#32 Updated by SLindoMansilla almost 3 years ago

  • Description updated (diff)

#33 Updated by SLindoMansilla almost 3 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#34 Updated by SLindoMansilla almost 3 years ago

  • Target version changed from future to Milestone 27

#35 Updated by SLindoMansilla almost 3 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#36 Updated by SLindoMansilla almost 3 years ago

  • Estimated time set to 42.00 h

#37 Updated by mgriessmeier almost 3 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#38 Updated by mgriessmeier almost 3 years ago

  • Target version changed from Milestone 27 to Milestone 30+

#39 Updated by mgriessmeier over 2 years ago

  • Start date set to 2017-04-29

due to changes in a related task

#40 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 30+ to Milestone 35+

#41 Updated by tjyrinki_suse almost 2 years ago

  • Subject changed from [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Parent task deleted (#17574)

#42 Updated by szarate 3 months ago

  • Target version changed from Milestone 35+ to future
