action #6618
closed
The job modules aren't updated correctly
Added by coolo over 9 years ago.
Updated over 9 years ago.
Category:
Regressions/Crashes
Description
I'm still looking for a theory how this can happen, but the result of the modules aren't updated reliable:
id | job_id | name | script | category | t_created | t_updated | result | soft_failure | milestone | important | fatal
--------+--------+-----------------------+---------------------------------------------+--------------+---------------------+---------------------+---------+--------------+-----------+-----------+-------
510801 | 16276 | isosize | tests/installation/isosize.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:26:24 | passed | 0 | 0 | 1 | 0
510802 | 16276 | bootloader | tests/installation/bootloader.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:26:24 | passed | 0 | 0 | 0 | 1
510803 | 16276 | welcome | tests/installation/welcome.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:27:41 | passed | 0 | 0 | 0 | 1
510804 | 16276 | scc_registration | tests/installation/scc_registration.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:28:00 | passed | 0 | 0 | 0 | 1
510805 | 16276 | addon_products_sle | tests/installation/addon_products_sle.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:28:16 | passed | 0 | 0 | 0 | 1
510806 | 16276 | partitioning | tests/installation/partitioning.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:28:26 | passed | 0 | 0 | 0 | 1
510807 | 16276 | partitioning_finish | tests/installation/partitioning_finish.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:28:26 | passed | 0 | 0 | 0 | 1
510808 | 16276 | installer_timezone | tests/installation/installer_timezone.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:29:05 | passed | 0 | 0 | 0 | 1
510809 | 16276 | logpackages | tests/installation/logpackages.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:29:05 | passed | 0 | 0 | 0 | 1
510810 | 16276 | user_settings | tests/installation/user_settings.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:29:05 | running | 0 | 0 | 0 | 1
510811 | 16276 | user_settings_root | tests/installation/user_settings_root.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:29:25 | passed | 0 | 0 | 0 | 1
510812 | 16276 | installation_overview | tests/installation/installation_overview.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:29:35 | passed | 0 | 0 | 0 | 1
510813 | 16276 | start_install | tests/installation/start_install.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:29:48 | passed | 0 | 0 | 0 | 1
510814 | 16276 | livecdreboot | tests/installation/livecdreboot.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:38:56 | passed | 0 | 0 | 0 | 1
510815 | 16276 | first_boot | tests/installation/first_boot.pm | installation | 2015-03-10 11:25:25 | 2015-03-10 11:39:30 | passed | 0 | 1 | 1 | 1
510816 | 16276 | consoletest_setup | tests/console/consoletest_setup.pm | console | 2015-03-10 11:25:25 | 2015-03-10 11:41:40 | passed | 0 | 1 | 1 | 1
510817 | 16276 | textinfo | tests/console/textinfo.pm | console | 2015-03-10 11:25:25 | 2015-03-10 11:42:30 | passed | 0 | 0 | 0 | 0
510818 | 16276 | hostname | tests/console/hostname.pm | console | 2015-03-10 11:25:25 | 2015-03-10 11:42:56 | passed | 0 | 1 | 1 | 1
510819 | 16276 | xorg_vt | tests/console/xorg_vt.pm | console | 2015-03-10 11:25:25 | 2015-03-10 11:43:05 | passed | 0 | 0 | 1 | 0
510820 | 16276 | zypper_lr | tests/console/zypper_lr.pm | console | 2015-03-10 11:25:25 | 2015-03-10 11:44:05 | passed | 0 | 0 | 0 | 0
510821 | 16276 | zypper_ref | tests/console/zypper_ref.pm | console | 2015-03-10 11:25:25 | 2015-03-10 11:46:35 | passed | 0 | 1 | 1 | 0
510822 | 16276 | yast2_lan | tests/console/yast2_lan.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:48:41 | passed | 0 | 0 | 0 | 0
510823 | 16276 | aplay | tests/console/aplay.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:50:15 | passed | 0 | 0 | 0 | 0
510824 | 16276 | glibc_i686 | tests/console/glibc_i686.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:50:45 | passed | 0 | 0 | 0 | 0
510825 | 16276 | zypper_up | tests/console/zypper_up.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:52:56 | passed | 0 | 1 | 0 | 0
510826 | 16276 | zypper_in | tests/console/zypper_in.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:52:56 | running | 0 | 0 | 0 | 0
510827 | 16276 | yast2_i | tests/console/yast2_i.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:25:26 | none | 0 | 0 | 0 | 0
510828 | 16276 | yast2_bootloader | tests/console/yast2_bootloader.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:57:24 | passed | 0 | 0 | 0 | 0
510829 | 16276 | sshd | tests/console/sshd.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:59:51 | passed | 0 | 1 | 0 | 0
510830 | 16276 | mtab | tests/console/mtab.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:59:51 | running | 0 | 0 | 0 | 0
510831 | 16276 | consoletest_finish | tests/console/consoletest_finish.pm | console | 2015-03-10 11:25:26 | 2015-03-10 11:25:26 | none | 0 | 1 | 1 | 1
Files
l (113 KB)
l |
|
coolo, 2015-03-10 16:12
|
|
- Status changed from New to Resolved
- Status changed from Resolved to New
No, that's not the only reason
id | job_id | name | script | category | t_created | t_updated | result | soft_failure | milestone | important | fatal
--------+--------+-----------------------+---------------------------------------------+--------------+---------------------+---------------------+---------+--------------+-----------+-----------+-------
512108 | 16300 | isosize | tests/installation/isosize.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 1 | 0
512109 | 16300 | bootloader_uefi | tests/installation/bootloader_uefi.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 0 | 1
512110 | 16300 | welcome | tests/installation/welcome.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 0 | 1
512111 | 16300 | scc_registration | tests/installation/scc_registration.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 0 | 1
512112 | 16300 | addon_products_sle | tests/installation/addon_products_sle.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 0 | 1
512113 | 16300 | partitioning | tests/installation/partitioning.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 0 | 1
512114 | 16300 | partitioning_finish | tests/installation/partitioning_finish.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:26 | none | 0 | 0 | 0 | 1
512115 | 16300 | installer_timezone | tests/installation/installer_timezone.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:25:57 | passed | 0 | 0 | 0 | 1
512116 | 16300 | logpackages | tests/installation/logpackages.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:25:57 | passed | 0 | 0 | 0 | 1
512117 | 16300 | user_settings | tests/installation/user_settings.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:26:06 | passed | 0 | 0 | 0 | 1
512118 | 16300 | user_settings_root | tests/installation/user_settings_root.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:26:11 | passed | 0 | 0 | 0 | 1
512119 | 16300 | installation_overview | tests/installation/installation_overview.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:26:21 | passed | 0 | 0 | 0 | 1
512120 | 16300 | start_install | tests/installation/start_install.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:26:47 | passed | 0 | 0 | 0 | 1
512121 | 16300 | livecdreboot | tests/installation/livecdreboot.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:26:47 | running | 0 | 0 | 0 | 1
512122 | 16300 | first_boot | tests/installation/first_boot.pm | installation | 2015-03-10 13:22:26 | 2015-03-10 13:22:27 | none | 0 | 1 | 1 | 1
512123 | 16300 | consoletest_setup | tests/console/consoletest_setup.pm | console | 2015-03-10 13:22:27 | 2015-03-10 13:22:27 | none | 0 | 1 | 1 | 1
Mär 10 14:24:23 ix64ph1072 worker[32602]: 8515: WORKING 16300
Mär 10 14:25:43 ix64ph1072 worker[32602]: 2: Connection error: Premature connection close at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 194.
Mär 10 14:26:33 ix64ph1072 worker[32602]: 2: Connection error: Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 194.
Mär 10 14:27:28 ix64ph1072 worker[32602]: 2: Connection error: Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 194.
the api log has interesting gaps
10.162.2.72 - - [10/Mar/2015:14:22:26 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:22:31 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:22:36 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:22:41 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:22:46 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:22:51 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:22:56 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:23:01 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 - "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:23:46 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:23:51 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 - "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:24:36 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:24:41 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:24:46 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 - "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:25:31 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
10.162.2.72 - - [10/Mar/2015:14:25:36 +0100] "POST /api/v1/jobs/16300/status HTTP/1.1" 200 12 "-" "Mojolicious (Perl)"
[Tue Mar 10 14:23:50 2015] [debug] 200 OK (1.200226s, 0.833/s).
[Tue Mar 10 14:23:50 2015] [debug] POST "/api/v1/jobs/16295/status".
[Tue Mar 10 14:23:50 2015] [debug] Routing to controller "OpenQA::Controller::API::V1" and action "auth".
[Tue Mar 10 14:23:50 2015] [debug] API key from client: AD52F5D0A6FDDB53
[Tue Mar 10 14:23:50 2015] [debug] API auth by user: https://www.opensuse.org/openid/user/lnussel, operator: 1
[Tue Mar 10 14:23:50 2015] [debug] Routing to controller "OpenQA::Controller::API::V1::Job" and action "update_status".
[Tue Mar 10 14:24:35 2015] [debug] DIR /var/lib/openqa/testresults/00016295-sle-11-SP4-Alpha-Server-DVD-i586-Build0941-RAID5
[Tue Mar 10 14:24:35 2015] [debug] DIR /var/lib/openqa/testresults/00016295-sle-11-SP4-Alpha-Server-DVD-i586-Build0941-RAID5
[Tue Mar 10 14:24:36 2015] [debug] 200 OK (45.633372s, 0.022/s).
[Tue Mar 10 14:24:36 2015] [debug] Inactivity timeout.
[Tue Mar 10 14:24:36 2015] [debug] Worker 9 websocket connection closed - 1006
[Tue Mar 10 14:24:36 2015] [debug] Inactivity timeout.
[Tue Mar 10 14:24:36 2015] [debug] Worker 6 websocket connection closed - 1006
[Tue Mar 10 14:24:36 2015] [debug] Inactivity timeout.
[Tue Mar 10 14:24:36 2015] [debug] Worker 7 websocket connection closed - 1006
[Tue Mar 10 14:24:36 2015] [debug] Inactivity timeout.
[Tue Mar 10 14:24:36 2015] [debug] Worker 11 websocket connection closed - 1006
[Tue Mar 10 14:24:36 2015] [debug] Inactivity timeout.
[Tue Mar 10 14:24:36 2015] [debug] Worker 8 websocket connection closed - 1006
[Tue Mar 10 14:24:36 2015] [debug] Inactivity timeout.
[Tue Mar 10 14:24:36 2015] [debug] Worker 10 websocket connection closed - 1006
[Tue Mar 10 14:24:36 2015] [debug] POST "/api/v1/jobs/16300/status".
Note how one POST /status takes 45s to save the PNGs - that means there were 8 more queued in that time ;(
45s for upload is really bad. How is the WebUI handling this? Isn't it stuck during uploads? After #241 update_status calls are rejected when api call is in progress. I'm afraid that may skip some screen shots, I don't yet fully understand new upload_status. :(
I can split the upload_status function into several /status calls, but I'm afraid this will open the whole for a 2nd timer even more. Possibly we should make update_status not reoccuring but one shot timers triggered by the previous call?
Hmm.. how will that solve this problem? It won't make changes any smaller since we will still need to wait until everything from previous call is uploaded. That is effectively what we have now (after #241), if api call is running event from recurrent timer is ignored.
What about we stop uploading screen shots for live view and proxy VNC directly? (e.g. https://github.com/InstantWebP2P/peer-vnc/tree/master/ ), or just upload them only when someone is actually looking?
And the only screen shots to be uploaded are those passed from needle match (e.g. assert_screen)?
Edit: I'm assuming here that the screen shots are the main bandwidth hog.
uploading the live screenshot only when someone looks is ok - as long as we upload one as soon as we get the command. Possibly the same for the log
But the 45 seconds are spend in openQA not in the worker, so my approach was to split the one huge api call into several smaller. And let's get prefork done :)
I deployed #241 now and still get this:
id | job_id | name | script | category | t_created | t_updated | result | soft_failure | milestone | important | fatal
--------+--------+----------------------------------+--------------------------------------------------------+--------------+---------------------+---------------------+---------+--------------+-----------+-----------+-------
513240 | 16340 | bootloader | tests/installation/bootloader.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:00:22 | passed | 0 | 0 | 0 | 1
513241 | 16340 | welcome | tests/installation/welcome.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:00:22 | running | 0 | 0 | 0 | 1
513242 | 16340 | check_medium | tests/installation/check_medium.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:01:26 | passed | 0 | 0 | 0 | 0
513243 | 16340 | installation_mode | tests/installation/installation_mode.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:01:34 | passed | 0 | 0 | 0 | 1
513244 | 16340 | installer_timezone | tests/installation/installer_timezone.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:01:34 | running | 0 | 0 | 0 | 1
513245 | 16340 | server_base_scenario | tests/installation/server_base_scenario.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:01:54 | passed | 0 | 0 | 0 | 0
513246 | 16340 | logpackages | tests/installation/logpackages.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 16:01:54 | running | 0 | 0 | 0 | 1
513247 | 16340 | partitioning_raid_sle11 | tests/installation/partitioning_raid_sle11.pm | installation | 2015-03-10 15:59:13 | 2015-03-10 15:59:13 | none | 0 | 0 | 0 | 1
But the worker log shows nothing too obvious - note though that the time on worker and webui are off. I'll enable ntp now :)
I was looking into how results are collected in upload_status. I'm not yet completely in the picture, but doesn't it always write the complete status of indivitual tests from beggining to current_running?
1) read tests from $test_order (:257)
2) and then collects results for each test in read_result_file (:319, called :288).
If so, than there must be problem somewhere else and not in upload (storing on isotovideo or evaluation on scheduler side)
no, it doesn't. This removes the test when uploading:
my $test = (shift @$test_order)->{name};
- Status changed from New to Resolved
I didn't spot it today at all
- Status changed from Resolved to New
- Assignee set to oholecek
- Priority changed from Normal to High
the problem seems to be that POST hits a busy mojo and as we use a single thread the POST is discarded. Apache sends a 200 for this error, so the worker will ignore it too ;(
The most straightforward (hopefully) solution is using the websocket.
It didn't happen yesterday most likely as we had not as much TODO
- Status changed from New to In Progress
- Status changed from In Progress to Resolved
Also available in: Atom
PDF