action #18608

openQA Project - action #17574: [tools]Add caching/syncing of assets

[tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring

Added by mgriessmeier almost 3 years ago. Updated about 1 month ago.

Status: Workable
Start date:
Priority: Low
Due date:
Assignee: -
% Done: 100%
Category: -
Estimated time: 42.00 hours
Target version: SUSE QA tests - Milestone 35+
Duration:

Description

Motivation

We recently saw a lot of failures caused by a full disk on s390pb, which results from the lack of proper clean up and monitoring.

Acceptance criteria

  • AC1: Disks on jump hosts do not run full with assets
  • AC2: It's not just a custom script in custom cron job but linked to openQA
  • AC3: Limit is set to 50%
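
A check for the limit in AC3 could be sketched as below. This is only an illustration, not something that exists in openQA: the function names `disk_usage_percent` and `check_usage` are made up, and the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL) on the assumption that the result would be fed into some monitoring system.

```shell
#!/bin/bash
# Sketch of a disk usage check; the function names are hypothetical.

# Print the usage of the filesystem holding $1 as a bare integer (e.g. 47).
# "df -P" keeps every filesystem on a single line, field 5 is the capacity.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a usage value ($1) against a limit ($2, default 50), print a
# status line and return 0 (OK) or 2 (CRITICAL).
check_usage() {
    local usage=$1 limit=${2:-50}
    if [ "$usage" -gt "$limit" ]; then
        echo "CRITICAL - ${usage}% used, limit is ${limit}%"
        return 2
    fi
    echo "OK - ${usage}% used, limit is ${limit}%"
}
```

On a jump host this could be called as `check_usage "$(disk_usage_percent /var/lib/libvirt/images)" 50`.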

Suggestions

  • Ask mnowak whether this is also a problem for the hyperv host, or whether he has already solved it in a better way since #18608#note-3
  • Harmonize existing solutions, if there are any, or at least collect them here
  • Come up with a proper approach covered by openQA, e.g. also mention it in the openQA or os-autoinst documentation regarding jump hosts

Related issues

Related to openQA Tests - action #32932: [sle][functional][u][hyperv] test fails in logs_from_inst... Resolved 08/03/2018
Related to openQA Tests - action #32926: [sle][functional][y][hyperv][medium] avoid typing usernam... Resolved 08/03/2018 22/05/2018
Related to openQA Tests - action #32929: [sle][functional][u][hyperv] test fails in postgresql_ser... Rejected 08/03/2018 03/07/2018
Related to openQA Tests - action #31507: extend storage for /var/lib/libvirt/images on s390pb Resolved 08/02/2018
Follows openQA Tests - action #19080: [s390x][zkvm] test cases fails by no space left on device... Resolved 10/05/2017

History

#1 Updated by mgriessmeier almost 3 years ago

  • Status changed from New to Resolved

Workaround: a cronjob on s390pb that runs every 6 hours and deletes unused qcow images

[root@s390pb images]# cat /usr/local/bin/cleanup-openqa-assets
#!/bin/sh -e
if [[ $(df | grep "/var/lib/libvirt/images" | awk '{print $5}' | sed "s/\%//") -gt 80 ]] ; then
    find /var/lib/libvirt/images/*.qcow2 ! -exec sudo fuser -s "{}" 2>/dev/null \; -exec rm -f {} \;
fi
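
A more defensive variant of the script above might look like the following sketch. It is not what is deployed on s390pb; the function names and the `DRY_RUN` switch are made up for illustration, and it assumes it runs as root (so no `sudo`). It declares bash explicitly, because `[[ ... ]]` only works under `/bin/sh` where `sh` happens to be bash, and it reads the usage with `df -P` instead of grepping the full `df` output.

```shell
#!/bin/bash
# Sketch of a threshold-based cleanup; all names here are hypothetical.

# Print the usage of the filesystem holding $1 as a bare integer (e.g. 83).
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if some process has $1 open. fuser exits with 1 when nobody
# accesses the file; any other status (for example 127 when fuser is not
# installed) is treated as "in use" to stay on the safe side.
is_in_use() {
    local rc=0
    fuser -s "$1" 2>/dev/null || rc=$?
    [ "$rc" -ne 1 ]
}

# Delete the *.qcow2 files directly in $1 that are not in use.
# With DRY_RUN=1 the candidates are only printed.
cleanup_unused_images() {
    local dir=$1 file
    for file in "$dir"/*.qcow2; do
        [ -e "$file" ] || continue      # the glob matched nothing
        if is_in_use "$file"; then
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would remove $file"
        else
            rm -f -- "$file"
        fi
    done
}

# Run the cleanup only when the usage of $1 exceeds $2 percent.
cleanup_if_over_limit() {
    local dir=$1 limit=$2
    if [ "$(usage_percent "$dir")" -gt "$limit" ]; then
        cleanup_unused_images "$dir"
    fi
}
```

With the values from the script above, the cronjob would call `cleanup_if_over_limit /var/lib/libvirt/images 80`. Running it once with `DRY_RUN=1` first shows which files would be removed.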

#2 Updated by mgriessmeier almost 3 years ago

  • Parent task set to #17574

#3 Updated by michalnowak almost 3 years ago

Thanks! I just added a similar script targeting qcow2, iso and img files on the Xen & KVM openQA virt hosts.

#4 Updated by mgriessmeier almost 3 years ago

  • Follows action #19080: [s390x][zkvm] test cases fails by no space left on device to download zkvm-image added

#5 Updated by mgriessmeier almost 3 years ago

  • Status changed from Resolved to New
  • Assignee deleted (mgriessmeier)
  • Priority changed from Urgent to Normal

Reopening, because apparently a cronjob is not the proper way of doing it. Anyway, I am lowering the priority and unassigning myself, since I adjusted the cronjob to run more often and don't plan to work on this in the near future. Feel free to take it.

#6 Updated by okurz about 2 years ago

  • Subject changed from [tools][sles][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring
  • Due date set to 30/01/2018
  • Target version set to Milestone 14

We might have an idea about it again once we discuss with others how to do it properly.

#7 Updated by okurz about 2 years ago

  • Due date changed from 30/01/2018 to 13/02/2018

M14 only starts after 2018-01-30

#8 Updated by okurz about 2 years ago

  • Subject changed from [tools][sle][functional] Implement proper clean up for images on s390pb and a proper monitoring to [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Description updated (diff)
  • Due date deleted (13/02/2018)
  • Status changed from New to Workable
  • Target version deleted (Milestone 14)

@coolo is it "Ready"?

#9 Updated by okurz about 2 years ago

ok, great. http://lord.arch/tests/479 failed for me with

[2018-02-08T07:52:53.0208 CET] [debug] MATCH(rebootnow-390x-20160506:0.00)
[2018-02-08T07:52:53.0338 CET] [debug] MATCH(install_and_reboot-additional-packages-20170823:0.00)
[2018-02-08T07:52:53.0343 CET] [debug] no match: 3905.2s
[2018-02-08T07:52:53.0849 CET] [debug] no change: 3904.2s
[2018-02-08T07:52:54.0849 CET] [debug] no change: 3903.2s
[2018-02-08T07:52:55.0849 CET] [debug] no change: 3902.2s
[2018-02-08T07:52:56.0350 CET] [debug] considering VNC stalled, no update for 4.18 seconds
DIE Error connecting to host <10.161.145.7>: IO::Socket::INET: connect: Connection timed out
 at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
    backend::baseclass::die_handler('OpenQA::Exception::VNCSetupError=HASH(0x6f2c320)') called at /usr/lib/perl5/vendor_perl/5.18.2/Exception/Class/Base.pm line 85
    Exception::Class::Base::throw('OpenQA::Exception::VNCSetupError', 'error', 'Error connecting to host <10.161.145.7>: IO::Socket::INET: co...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 151
    consoles::VNC::login('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 842
    consoles::VNC::send_update_request('consoles::VNC=HASH(0x6f2ac48)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 82
    consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x55da488)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 598
    backend::baseclass::bouncer('backend::svirt=HASH(0x7a7c2d8)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 581
    backend::baseclass::request_screen_update('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 177
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 156
    backend::baseclass::run_capture_loop('backend::svirt=HASH(0x7a7c2d8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 129
    backend::baseclass::run('backend::svirt=HASH(0x7a7c2d8)', 5, 8) called at /usr/lib/os-autoinst/backend/driver.pm line 85
    backend::driver::start('backend::driver=HASH(0x6cbe360)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
    backend::driver::new('backend::driver', 'svirt') called at /usr/bin/isotovideo line 211
    main::init_backend() called at /usr/bin/isotovideo line 280
[2018-02-08T07:55:03.0597 CET] [debug] Destroying openQA-SUT-12 virtual machine

Reason: Space depleted :(

#10 Updated by okurz about 2 years ago

  • Target version set to Milestone 16

#12 Updated by okurz almost 2 years ago

  • Related to action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs added

#13 Updated by okurz almost 2 years ago

  • Related to action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) added

#14 Updated by okurz almost 2 years ago

  • Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added

#15 Updated by okurz almost 2 years ago

  • Start date set to 29/04/2017

due to changes in a related task

#16 Updated by okurz almost 2 years ago

  • Start date set to 29/04/2017

due to changes in a related task

#17 Updated by okurz almost 2 years ago

  • Related to action #31507: extend storage for /var/lib/libvirt/images on s390pb added

#18 Updated by okurz almost 2 years ago

  • Start date set to 29/04/2017

due to changes in a related task

#19 Updated by okurz almost 2 years ago

  • Subject changed from [tools][sle][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Due date set to 03/07/2018
  • Target version changed from Milestone 16 to Milestone 17

-> S20

#20 Updated by okurz over 1 year ago

  • Target version changed from Milestone 17 to Milestone 17

#21 Updated by riafarov over 1 year ago

  • Subject changed from [tools][sle][u][functional] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring to [tools][sle][u][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring
  • Priority changed from Normal to Low

Next step: try to come up with a better solution, if possible, and then propose it. As of now we don't have a better solution in mind.

Lowering priority for this sprint.

#22 Updated by mgriessmeier over 1 year ago

  • Start date set to 29/04/2017

due to changes in a related task

#23 Updated by mgriessmeier over 1 year ago

  • Due date changed from 03/07/2018 to 31/07/2018

low prio, due to hackweek - moving to sprint 22

#24 Updated by okurz over 1 year ago

  • Start date set to 29/04/2017

due to changes in a related task

#25 Updated by okurz over 1 year ago

  • Start date set to 29/04/2017

due to changes in a related task

#26 Updated by okurz over 1 year ago

  • Due date deleted (31/07/2018)
  • Target version changed from Milestone 17 to future

#27 Updated by okurz over 1 year ago

  • Start date set to 29/04/2017

due to changes in a related task

#28 Updated by okurz over 1 year ago

https://bugzilla.suse.com/show_bug.cgi?id=1103826 was created about all suse-kvm tests failing because of a full disk. mgriessmeier set it to RESOLVED INVALID with a reference to this ticket, and rightly so.

#29 Updated by okurz 8 months ago

  • Start date set to 29/04/2017

due to changes in a related task

#30 Updated by okurz 8 months ago

  • Project changed from openQA Project to openQA Infrastructure
  • Category deleted (168)

#31 Updated by SLindoMansilla 6 months ago

  • Start date set to 29/04/2017

due to changes in a related task

#32 Updated by SLindoMansilla 6 months ago

  • Description updated (diff)

#33 Updated by SLindoMansilla 6 months ago

  • Start date set to 29/04/2017

due to changes in a related task

#34 Updated by SLindoMansilla 6 months ago

  • Target version changed from future to Milestone 27

#35 Updated by SLindoMansilla 6 months ago

  • Start date set to 29/04/2017

due to changes in a related task

#36 Updated by SLindoMansilla 6 months ago

  • Estimated time set to 42.00

#37 Updated by mgriessmeier 5 months ago

  • Start date set to 29/04/2017

due to changes in a related task

#38 Updated by mgriessmeier 5 months ago

  • Target version changed from Milestone 27 to Milestone 30+

#39 Updated by mgriessmeier about 1 month ago

  • Start date set to 29/04/2017

due to changes in a related task

#40 Updated by mgriessmeier about 1 month ago

  • Target version changed from Milestone 30+ to Milestone 35+
