action #18608
[qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
100%
Description
Motivation
We recently saw a lot of failures due to a full disk on s390pb, caused by the lack of proper clean up and monitoring.
Acceptance criteria
- AC1: Disks on jump hosts do not run full with assets
- AC2: It's not just a custom script in custom cron job but linked to openQA
- AC3: Limit is set to 50%
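As an illustration of AC3, here is a minimal sketch of how such a limit could be checked. It assumes that "limit" means the disk usage percentage of the file system holding the images. The function names, the `IMAGE_DIR` and `LIMIT` variables, and the warning text are hypothetical and not part of openQA.

```shell
#!/bin/sh
# Sketch only: all names below are illustrative assumptions, not openQA code.

# Read "df -P" output on stdin and print the usage percentage of the
# first listed file system, without the trailing "%".
parse_usage() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the usage percentage of the file system that holds path $1.
# -P selects the portable format with one line per file system.
usage_percent() {
    df -P "$1" | parse_usage
}

# Succeed when usage of the file system holding $1 is above $2 percent.
over_limit() {
    [ "$(usage_percent "$1")" -gt "$2" ]
}

# Example: warn when the image directory is above the 50% limit.
IMAGE_DIR=${IMAGE_DIR:-/var/lib/libvirt/images}
LIMIT=${LIMIT:-50}
if [ -d "$IMAGE_DIR" ] && over_limit "$IMAGE_DIR" "$LIMIT"; then
    echo "WARNING: $IMAGE_DIR is above ${LIMIT}% usage" >&2
fi
```

Keeping the parsing separate from the `df` call makes the check easy to test with canned input and easy to hook into whatever monitoring is chosen.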
Suggestions
- Ask mnowak if that is not also a problem for the hyperv host or maybe he has it already solved better since #18608#note-3?
- Harmonize existing solutions if existing, at least collect them here
- Come up with a proper approach covered by openQA, e.g. also mention it in the openQA or os-autoinst documentation regarding jump hosts
Related issues
History
#1
Updated by mgriessmeier over 5 years ago
- Status changed from New to Resolved
Workaround: a cronjob on s390pb that runs every 6 hours and deletes unused qcow images
[root@s390pb images]# cat /usr/local/bin/cleanup-openqa-assets
#!/bin/sh -e
if [[ $(df | grep "/var/lib/libvirt/images" | awk '{print $5}' | sed "s/\%//") -gt 80 ]] ; then
    find /var/lib/libvirt/images/*.qcow2 ! -exec sudo fuser -s "{}" 2>/dev/null \; -exec rm -f {} \;
fi
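The same idea can be sketched in a more defensive form. This is not the script deployed on s390pb but a hypothetical variant: it avoids `[[ ]]`, which is not POSIX sh, queries `df` for the image directory directly instead of grepping the full listing, and wraps the `fuser` check in a function so that it can be stubbed. The function names are illustrative, and unlike the original it does not call `sudo`, so it assumes it runs with sufficient privileges.

```shell
#!/bin/bash
# Sketch only: function names are illustrative assumptions.

# Succeed when some process has file $1 open
# (fuser exits non-zero when nothing accesses the file).
is_in_use() {
    fuser -s "$1" 2>/dev/null
}

# Delete every *.qcow2 file directly inside directory $1 that is not in use.
# Prints the path of each removed file.
remove_unused_images() {
    local dir=$1 image
    for image in "$dir"/*.qcow2; do
        [ -f "$image" ] || continue   # also skips an unexpanded glob
        if ! is_in_use "$image"; then
            rm -f -- "$image" && echo "removed $image"
        fi
    done
}

# Run the cleanup only when usage of the file system holding $1
# is above $2 percent.
cleanup_if_above() {
    local dir=$1 limit=$2 used
    used=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -gt "$limit" ]; then
        remove_unused_images "$dir"
    fi
}

# Example invocation, e.g. from cron:
# cleanup_if_above /var/lib/libvirt/images 80
```

Like the original, this only removes images that no process has open, so a disk can still fill up when many jobs run in parallel.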
#2
Updated by mgriessmeier over 5 years ago
- Parent task set to #17574
#3
Updated by michalnowak over 5 years ago
Thanks! I just added a similar script targeting qcow2, iso and img files on Xen & KVM openQA virt hosts.
#4
Updated by mgriessmeier about 5 years ago
- Follows action #19080: [s390x][zkvm] test cases fails by no space left on device to download zkvm-image added
#5
Updated by mgriessmeier about 5 years ago
- Status changed from Resolved to New
- Assignee deleted (mgriessmeier)
- Priority changed from Urgent to Normal
Reopening, because apparently a cronjob is not the proper way of doing it. Anyway, lowering priority and unassigning since I adjusted the cronjob to run more often and don't plan to work on it in the near future. Feel free to take.
#6
Updated by okurz over 4 years ago
- Subject changed from [tools][sles][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring
- Due date set to 2018-01-30
- Target version set to Milestone 14
we might have an idea about it again when we discuss with others how to do it properly.
#7
Updated by okurz over 4 years ago
- Due date changed from 2018-01-30 to 2018-02-13
M14 only starts after 2018-01-30
#8
Updated by okurz over 4 years ago
- Subject changed from [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
- Description updated (diff)
- Due date deleted (2018-02-13)
- Status changed from New to Workable
- Target version deleted (Milestone 14)
coolo is it "Ready"?
#9
Updated by okurz over 4 years ago
ok, great. http://lord.arch/tests/479 failed for me with
[2018-02-08T07:52:53.0208 CET] [debug] MATCH(rebootnow-390x-20160506:0.00)
[2018-02-08T07:52:53.0338 CET] [debug] MATCH(install_and_reboot-additional-packages-20170823:0.00)
[2018-02-08T07:52:53.0343 CET] [debug] no match: 3905.2s
[2018-02-08T07:52:53.0849 CET] [debug] no change: 3904.2s
[2018-02-08T07:52:54.0849 CET] [debug] no change: 3903.2s
[2018-02-08T07:52:55.0849 CET] [debug] no change: 3902.2s
[2018-02-08T07:52:56.0350 CET] [debug] considering VNC stalled, no update for 4.18 seconds
DIE Error connecting to host <10.161.145.7>: IO::Socket::INET: connect: Connection timed out at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
    backend::baseclass::die_handler('OpenQA::Exception::VNCSetupError=HASH(0x6f2c320)') called at /usr/lib/perl5/vendor_perl/5.18.2/Exception/Class/Base.pm line 85
    Exception::Class::Base::throw('OpenQA::Exception::VNCSetupError', 'error', 'Error connecting to host <10.161.145.7>: IO::Socket::INET: co...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 151
    consoles::VNC::login('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 842
    consoles::VNC::send_update_request('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 82
    consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x55da488)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 598
    backend::baseclass::bouncer('backend::svirt=HASH(0x7a7c2d8)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 581
    backend::baseclass::request_screen_update('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 177
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 156
    backend::baseclass::run_capture_loop('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 129
    backend::baseclass::run('backend::svirt=HASH(0x7a7c2d8)', 5, 8) called at /usr/lib/os-autoinst/backend/driver.pm line 85
    backend::driver::start('backend::driver=HASH(0x6cbe360)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
    backend::driver::new('backend::driver', 'svirt') called at /usr/bin/isotovideo line 211
    main::init_backend() called at /usr/bin/isotovideo line 280
[2018-02-08T07:55:03.0597 CET] [debug] Destroying openQA-SUT-12 virtual machine
Reason: Space depleted :(
#10
Updated by okurz over 4 years ago
- Target version set to Milestone 16
#12
Updated by okurz over 4 years ago
- Related to action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs added
#13
Updated by okurz over 4 years ago
- Related to action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) added
#14
Updated by okurz over 4 years ago
- Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added
#17
Updated by okurz over 4 years ago
- Related to action #31507: extend storage for /var/lib/libvirt/images on s390pb added
#19
Updated by okurz over 4 years ago
- Subject changed from [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
- Due date set to 2018-07-03
- Target version changed from Milestone 16 to Milestone 17
-> S20
#20
Updated by okurz about 4 years ago
- Target version changed from Milestone 17 to Milestone 17
#21
Updated by riafarov about 4 years ago
- Subject changed from [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
- Priority changed from Normal to Low
Next step: try to come up with a better solution, if possible, and then propose it. As of now we don't have a better solution in mind.
Lowering priority for this sprint.
#22
Updated by mgriessmeier about 4 years ago
- Start date set to 2017-04-29
due to changes in a related task
#23
Updated by mgriessmeier about 4 years ago
- Due date changed from 2018-07-03 to 2018-07-31
low prio, due to hackweek - moving to sprint 22
#24
Updated by okurz about 4 years ago
- Start date set to 2017-04-29
due to changes in a related task
#25
Updated by okurz about 4 years ago
- Start date set to 2017-04-29
due to changes in a related task
#26
Updated by okurz about 4 years ago
- Due date deleted (2018-07-31)
- Target version changed from Milestone 17 to future
#27
Updated by okurz about 4 years ago
- Start date set to 2017-04-29
due to changes in a related task
#28
Updated by okurz about 4 years ago
https://bugzilla.suse.com/show_bug.cgi?id=1103826 created about problems of all suse-kvm tests failing because of disk full, RESOLVED INVALID by mgriessmeier relating to this ticket here – rightly so.
#29
Updated by okurz about 3 years ago
- Start date set to 2017-04-29
due to changes in a related task
#30
Updated by okurz about 3 years ago
- Project changed from openQA Project to openQA Infrastructure
- Category deleted (168)
#31
Updated by SLindoMansilla almost 3 years ago
- Start date set to 2017-04-29
due to changes in a related task
#32
Updated by SLindoMansilla almost 3 years ago
- Description updated (diff)
#33
Updated by SLindoMansilla almost 3 years ago
- Start date set to 2017-04-29
due to changes in a related task
#34
Updated by SLindoMansilla almost 3 years ago
- Target version changed from future to Milestone 27
#35
Updated by SLindoMansilla almost 3 years ago
- Start date set to 2017-04-29
due to changes in a related task
#36
Updated by SLindoMansilla almost 3 years ago
- Estimated time set to 42.00 h
#37
Updated by mgriessmeier almost 3 years ago
- Start date set to 2017-04-29
due to changes in a related task
#38
Updated by mgriessmeier almost 3 years ago
- Target version changed from Milestone 27 to Milestone 30+
#39
Updated by mgriessmeier over 2 years ago
- Start date set to 2017-04-29
due to changes in a related task
#40
Updated by mgriessmeier over 2 years ago
- Target version changed from Milestone 30+ to Milestone 35+
#41
Updated by tjyrinki_suse almost 2 years ago
- Subject changed from [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
- Parent task deleted (#17574)