action #67972
one module failed to upload results ending an otherwise fine job as failed (was: Some tests are flagged as failed whereas they passed)
0%
Description
Observation¶
Some tests are flagged as failed whereas they passed. But one test result is none
and no screenshot uploaded.
Example: https://openqa.opensuse.org/tests/1296756
Within OSD infrastructure, probably since last deployment we observe the same, e.g.
https://openqa.suse.de/tests/4348455 - worker.log: Unable to read result-consoletest_finish.json: No such file or directory
https://openqa.suse.de/tests/4348369 - worker.log: Unable to read result-disable_grub_timeout.json: No such file or directory
https://openqa.suse.de/tests/4357369
Reproducible¶
Fails since (at least) Build 20200615-1 (current job)
Expected result¶
Last good: 20200614-1 (or more recent)
Further details¶
Always latest result in this scenario: latest
Related issues
History
#1
Updated by okurz 7 months ago
- Subject changed from Some tests are flagged as failed whereas they passed to one module failed to upload results ending an otherwise fine job as failed (was: Some tests are flagged as failed whereas they passed)
- Category set to Concrete Bugs
It is expected behaviour that a job can only end up as "passed" if all test modules are "passed" as well. So the question is really why this one job module failed to upload.
#3
Updated by okurz 7 months ago
- Priority changed from Normal to Low
- Target version set to future
- Difficulty set to hard
Unless we can reproduce this I consider the issue infeasible to be fixed.
ggardet_arm I hope you can help to find jobs where this happens again if at all.
#4
Updated by okurz 7 months ago
- Has duplicate action #68104: [tools] test fails being unable to read result-<test>.json: No such file or directory added
#5
Updated by okurz 7 months ago
- Description updated (diff)
- Status changed from New to Workable
- Priority changed from Low to Urgent
- Target version changed from future to Ready
- Difficulty deleted (
hard)
seems to be a problem we recently introduced and it can be reproduced on osd as well, see #68104
I merged in the content from #68104
Likely the change is one of the following from http://mailman.suse.de/mlarch/SuSE/openqa/2020/openqa.2020.06/msg00003.html :
- Update to version 4.6.1591699942.8543386ed: * Dependency cron 200609 * Add notes about arbitrary reasons * Remove out-dated comments; stopping will no longer be delayed as stated * Use newly introduced constants for worker commands and stop reasons * Improve dealing with worker commands and stop reasons * Add missing copyright header in constants modules * Remove stop type 'scheduler_abort' which is not used anymore * Improve coding style in set_running * Prevent Perl warning in downloader code * Improve UI for selecting the display order of builds * Select interval checkbox for interval from query parameter if applicable * Improve wording in the index page form, mainly to improve consistency * Improve coding style in JavaScript/CSS/template files * Avoid accumulating `comments_page` query parameters when using pagination * Fix typo in pagination code * Add autorefresh in the index page (#3141) * Maintain pagination when an user role changes * Fix pagination in the group overview page * Dependency cron 200605 * Turn two schema functions into methods * Hide livestream when loading test details and job is currently uploading * Update out-dated comment; the image upload is async via Mojo::IOLoop->subprocess * t: Use Test::MockModule->redefine in t/28-logging.t * Only reload test schedule when test_order.json has been changed * Make restart button work again after the job state changed * Dependency cron 200604 * t: Allow to configure amount of unstable/rejective workers in 05-scheduler-full.t * t: Simplify worker startup in 05-scheduler-full with map * t: Improve test description strings in 05-scheduler-full * t: Fix warning in 28-logging.t with strict MockModule and Test::Warnings * Extract methods for "ignore_known_images" and "ignore_known_files" in Worker::Job * Simplify Worker::Job slightly * Remove unnecessary single-line step in Worker::Job * Use the more idiomatic ->find instead of ->single * Replace duplicate method OpenQA::Schema::Result::Workers::to_string * Delete unnecessary pre-declaration of log_warning in OpenQA::Log * Generalize handling the video * Remove dead code from OpenQA::WebAPI::Plugin::ObsRsync * doc: Improve virtio/svirt serial terminal docs * Fix tests on OBS when "ps" is not installed * Use :report_warnings feature of Test::Warnings * Mark "25-cache-service.t" as unstable test as observed in circleci * Mark old issues and pull requests as stale * Extract methods for "ignore_known_images" and "ignore_known_files" in Worker::Job * Simplify Worker::Job slightly * Remove unnecessary single-line step in Worker::Job * t: Fix "Dubious, test returned 9" in 05-scheduler-full.t * t: Fix unstable test 'Allocated maximum number' in 05-scheduler-full * t: Sleep only when necessary when waiting for worker * t: Fix sporadic test failures on mis-assignment of jobs in scheduling * t: Increase robustness of 05-scheduler-full with reduced scheduler tick * t: Prevent use of occupied port in all full-stack/scalability tests * Makefile: Bump timeout limit for various tests for slow circleci * t: Provide unique process name for worker in scalability test as well * t: More cleanup in t/05-scheduler-full.t * Tue Jun 02 2020 coolo@suse.com - Update to version 4.6.1591092399.b9fb68988: * t: Make output nicer on time limit * Dependency cron 200601 * Avoid stopping the live log when the worker already works on different job * worker: Keep uploading results until logs and assets have been uploaded * worker: Log every 10th uploaded chunk to autoinst-log.txt as well * Keep 'live' tab shown while uploading for improved feedback * Avoid rendering 'live' tab when job is already done * Update comments in worker function _upload_result * Fix style issue in running.js so jshint shows no errors * Do not shadow $errors in utils-job-templates test * Capture base_state.json JSON error warning * Add '--merge' option to prove * Show an explicit warning when failing to wakeup scheduler * Prefetch parent job in JobModules::job_modules() * Capture screenshot created message in screenshot test * Dependency cron 200529 * t: Make process handling more robust with IPC::Run * Log errors when claiming OpenID identity * Fix visual artifact in audio result box * t: Use OpenQA::Test::TimeLimit in basic.t for a start * t: Add test module to check for test runtime os-autoinst changes: * Tue Jun 09 2020 okurz@suse.com - Update to version 4.6.1591696864.96b3acb2: * Move MakeMaker dependency to build_requires * Fix missing qemu-kvm dependency on non-x86_64 * Slightly simplify amt backend * spec: Fix 'update-deps' not to be packaged * Deps: rename %requires to %main_requires * Fix wrong comment about replacerect * Fix YAML::PP dependency * Add flags for non-verbose output to ffmpeg example * Use :report_warnings feature of Test::Warnings * Update documentation of videoencoder.cpp * Support using an external video encoder * Mark old issues and pull requests as stale * Capture isotovideo output in isotovideo test * Dependencies: Add code to update Dockerfile * Makefiles: set CXXFLAGS not CFLAGS or CPPFLAGS * Improve coding style in code file migration of QEMU backend * Improve 'Migrate to file failed [...]' error message to include unit * make update-deps * Dependencies: add tools/update-deps * Add YAML::PP dependency * Simplify use of Test::Output with bare blocks rather than sub refs
Judging from the changelog text I suspect it is one of
* Dependency cron 200609 * Add notes about arbitrary reasons * Remove out-dated comments; stopping will no longer be delayed as stated * Use newly introduced constants for worker commands and stop reasons * Improve dealing with worker commands and stop reasons * Remove stop type 'scheduler_abort' which is not used anymore * Improve coding style in set_running * Prevent Perl warning in downloader code * Dependency cron 200605 * Hide livestream when loading test details and job is currently uploading * Update out-dated comment; the image upload is async via Mojo::IOLoop->subprocess * Only reload test schedule when test_order.json has been changed * Dependency cron 200604 * Dependency cron 200601 * Avoid stopping the live log when the worker already works on different job * worker: Keep uploading results until logs and assets have been uploaded * worker: Log every 10th uploaded chunk to autoinst-log.txt as well * Keep 'live' tab shown while uploading for improved feedback * Avoid rendering 'live' tab when job is already done * Update comments in worker function _upload_result * Do not shadow $errors in utils-job-templates test * Capture base_state.json JSON error warning * Prefetch parent job in JobModules::job_modules() * Capture screenshot created message in screenshot test * Dependency cron 200529
One can reduce the diff further by looking into actual source code changes.
EDIT: Based on looking into actual source code changes:
36d40a917 Dependency cron 200609 989c72f81 Use newly introduced constants for worker commands and stop reasons 86da6e1e8 Improve dealing with worker commands and stop reasons c12830442 Remove stop type 'scheduler_abort' which is not used anymore a58b3e925 Dependency cron 200605 3a566d824 Turn two schema functions into methods 0d1061afd Only reload test schedule when test_order.json has been changed 857c44176 Dependency cron 200604 f3c04a623 Avoid stopping the live log when the worker already works on different job a61c520a9 worker: Keep uploading results until logs and assets have been uploaded aab58ce63 worker: Log every 10th uploaded chunk to autoinst-log.txt as well ea13c22d6 Prefetch parent job in JobModules::job_modules() 3d5a2b7fd Dependency cron 200529
#7
Updated by tinita 7 months ago
I was debugging, and found out that I can reproduce it by adding a sleep here:
# os-autoinst/basetest.pm sub runtest { ... $self->done(); sleep 5; # happens sometimes, using 10 seconds it happens always $self->{execution_time} = execution_time($starttime); ...
In that case I get the Unable to read result-boot.json: No such file or directory
message, the test is labeled as none
, and the job is marked as failed.
So a timing issue as expected. That might give us something to work with for fixing it.
#10
Updated by tinita 7 months ago
The fix (revert) has been deployed.
Created a new PR for https://progress.opensuse.org/issues/58700