action #25404

[opensuse][o3]openQA's failure rate on typing commands increased again

Added by dimstar over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Category: Infrastructure
Target version: -
Start date: 2017-09-19
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Since yesterday, the number of jobs failing due to mistyped commands has increased a lot, for example:

https://openqa.opensuse.org/tests/488048#step/setup_zdup/20

the 'tput sgr0' is in caps instead of lowercase; this is the command where typing often goes wrong, either with a missing '$' sign or in caps

History

#1 Updated by okurz over 5 years ago

  • Subject changed from openQA's failure rate on typing commands increased again to [opensuse][o3]openQA's failure rate on typing commands increased again
  • Category changed from Bugs in existing tests to Infrastructure
  • Priority changed from Normal to Urgent

IRC log from [#opensuse-factory](irc://chat.freenode.net/opensuse-factory):

[18 Sep 2017 17:30:20] <DimStar> https://openqa.opensuse.org/tests/487699#step/setup_zdup/20 - somebody trying to change the prompt again?
[18 Sep 2017 17:33:19] <DimStar> seems the code is fine - but openQA is very drunk when typing (test run 487697 typed it differently wrong)
[18 Sep 2017 17:34:36] <DimStar> sysrich: I don't think that update was a good idea; openQA can't type at all anymore!?! https://openqa.opensuse.org/tests/487691#step/user_settings/4 (password does not match on the confirm field)
[18 Sep 2017 17:35:26] <sysrich> DimStar, himm..thats weird, there is no major changes to the worker AFAIK..okurz?
[18 Sep 2017 17:35:44] <DimStar> hm.. strange; well, in worst case it IS a product bug; but good luck with that :)
[18 Sep 2017 17:37:01] <sysrich> DimStar, "at all" is an exaggeration - https://openqa.opensuse.org/tests/487685
[18 Sep 2017 17:37:29] <DimStar> sysrich: ok, it has a few cases it manages to type stuff right
[18 Sep 2017 17:38:10] <sysrich> DimStar, worker1 seems fine, it's worker4 that seems overloaded
[18 Sep 2017 17:38:16] <sysrich> by a factor of about 70% more load than worker1
[18 Sep 2017 17:39:00] <okurz> I don't recall any related changes to os-autoinst, maybe there was a qemu update?
[18 Sep 2017 17:39:13] <okurz> No, does not seem like it
[18 Sep 2017 17:39:34] <tacit> Yes there was a qemu update.
[18 Sep 2017 17:39:49] <okurz> aha!
[18 Sep 2017 17:40:35] <sysrich> yeah we got the latest SLE/Leap qemu updates
[18 Sep 2017 17:40:49] <sysrich> okurz, what qemu do we have on osd?
[18 Sep 2017 17:41:18] <okurz> sysrich: qemu-2.6.2-31.3.3 from openSUSE-Leap-42.2-Update installed today
[…]
[18 Sep 2017 17:43:31] <okurz> the version is on o3. osd has 2.6.2-29.4
[18 Sep 2017 17:44:16] <okurz> well then, maybe downgrade one worker and hammer it over night?

#2 Updated by okurz over 5 years ago

  • Status changed from New to In Progress
  • Assignee set to SLindoMansilla

Assigning to @SLindoMansilla, who wants to help here. Please stay in close contact with people on [#opensuse-factory](irc://chat.freenode.net/opensuse-factory), especially DimStar and sysrich. nsinger, mgriessmeier, foursixnine might also be able to help regarding administration.

To gather statistics, it should be helpful to trigger more tests and check how many fail with the new and with the old qemu version, e.g.

  • trigger for i in {1..100}; do openqa_clone_job_o3 --skip-download --skip-chained-deps 488189 TEST=slindomansilla_check_mistyping_$i _GROUP="Development Tumbleweed" ; done
  • check how many fail with mistyping
  • downgrade qemu version on worker(s)
  • conduct check again
  • crosscheck statistics
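The crosscheck in the last step could be sketched roughly as follows. This is only an illustration: it assumes the results of each batch have already been exported into a plain text file with one result per line (`passed`, `failed`, ...). The `failure_rate` helper and the file names are hypothetical and not part of openQA.

```shell
#!/bin/sh
# Hypothetical helper (not an openQA tool): print the percentage of
# "failed" lines in a result file that holds one job result per line.
failure_rate() {
    LC_ALL=C awk '
        NF { total++ }                  # count every non-empty line as one job
        $1 == "failed" { failed++ }     # count jobs reported as failed
        END {
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * failed / total
        }
    ' "$1"
}

# Assumed usage, with one file per qemu version:
#   failure_rate results_new_qemu.txt
#   failure_rate results_old_qemu.txt
```

Note that this counts every failed job; jobs failing for reasons other than mistyping would still need to be sorted out by hand before comparing the two numbers.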

#4 Updated by okurz over 5 years ago

#5 Updated by SLindoMansilla over 5 years ago

Currently no mistyping issues found after 100 jobs: https://openqa.opensuse.org/tests?match=slindomansilla_check_mistyping
They were executed after the rollback.

#6 Updated by SLindoMansilla over 5 years ago

  • Status changed from In Progress to Resolved

Since the problem no longer appears after the rollback, and after 100 successful jobs, this ticket can be closed.
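As a rough sanity check of how much 100 clean runs say, one can estimate how likely such a streak would be if the bug were still present. The sketch below assumes that jobs fail independently with a fixed per-job probability; the `prob_all_pass` helper is hypothetical, and the failure rate passed to it is an assumption, since the ticket does not record the actual rate before the rollback.

```shell
#!/bin/sh
# Hypothetical helper: probability (in percent) that n independent jobs
# all pass although each job still fails with probability p.
#   $1 = assumed per-job failure probability p (0..1)
#   $2 = number of jobs n
prob_all_pass() {
    LC_ALL=C awk -v p="$1" -v n="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - p) ^ n }'
}
```

For example, `prob_all_pass 0.05 100` shows that with an assumed failure rate of 5% (chosen only for illustration) a run of 100 passing jobs would happen by chance in well under 1% of cases.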

#7 Updated by okurz over 5 years ago

tacit asked me if I can test a qemu update in http://download.opensuse.org/repositories/openSUSE:/Maintenance:/7326/openSUSE_Leap_42.2_Update/openSUSE:Maintenance:7326.repo

so I added that repo to openqaworker4, replaced the worker class "qemu_x86_64" by "qemu_x86_64_okurz_incident_7326" and started tests with

for i in {1..100}; do openqa_clone_job_o3 --skip-download --skip-chained-deps 494550 TEST=okurz_check_mistyping_poo25404_$i _GROUP="Development Tumbleweed" WORKER_CLASS=qemu_x86_64_okurz_incident_7326 ; done

will check results tomorrow.

EDIT: All jobs passed. Updated bug report https://bugzilla.suse.com/show_bug.cgi?id=1059369#c39. I removed the repo again and downgraded again to 2.6.2-29.4 to have a consistent state among all workers.
