# Ready

Next tasks to be picked up by the QE tools team

* tickets #88751: Problem login into openqa.suse.de and openqa.opensuse.org
* tickets #114490: Tumbleweed snapshot URL fails
* action #11048: copyright checker test for changed files
* coordination #12876: [epic] Offer a way for jobs to dynamically schedule children
* coordination #15850: [epic] Improve displaying job dependencies
* action #17886: [dashboard] Create full screen view for openQA dashboard
* coordination #19720: [epic] Simplify investigation of job failures
* action #27955: Allow the worker_bridge to sync job status from a slaveUI to a masterUI
* action #29634: Enable QEMU snapshots function for virtio-gpu
* action #32545: Provide warning when user creates/triggers misconfigured multi-machine clusters
* action #34486: database of "test cases" or how to search for tests we have in openQA
* action #36250: Add automated check for isotovideo::INTERFACE
* coordination #37958: [epic] self-tests in os-autoinst-distri-opensuse for impact on staging test schedule
* coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
* action #39905: Job trying to download worker local file "aavmf-aarch64-vars.bin" into cache and fails with 404
* action #42047: Mark (non) keyframes in OGV file correctly size:M
* action #43619: Improve workflow for dealing with openQA's dependencies
* action #43712: Update upstream dockerfiles to provide an easy to use docker image of openQA-webui
* action #43715: Update upstream dockerfiles to provide an easy to use docker image of workers
* action #43718: Docker image for webui and workers are versioned and uploaded to obs registry
* action #45062: Better visualization of incompletes - show module in which incomplete happens
* action #48389: self-tests in os-autoinst-distri-opensuse executing a simple (staging) test using isotovideo
* action #49202: Make audit events more accessible to testers
* action #52532: Harmonisation for team text messaging communication structure
* action #53264: Show actual "version" in job groups for builds when configured + multiple versions are available
* action #54557: All openQA tests incomplete but neither package build nor unit tests fail when a new file is added to os-autoinst without mentioning in Makefile.am
* coordination #55364: [epic] Let's make codecov reports reliable
* action #56534: support list of machines for YAML job templates
* action #56591: Improve feedback for jobs in "Uploading"
* action #57689: asset cleanup jobs do not run on o3 (results cleanup works), workaround: unlock locks manually
* coordination #58184: [saga][epic][use case] full version control awareness within openQA
* action #58304: A personal activity view for developers
* action #58823: os-autoinst is too slow pressing F2 causing ARM tests to fail in "boot_to_desktop"
* action #59043: Fix unstable/flaky full-stack test, i.e. remove sleep, and ui tests
* action #59273: module result missing for incompleting job
* action #59984: unstable test: t/05-scheduler-full.t
* action #60029: OBS builds of os-autoinst fail for SLE12SP5 but not for SLE12SP3-4
* action #60443: job incomplete with "(?s)process exited: 0.*isotovideo failed.*EXIT 1":retry but no further details what is wrong
* action #61164: Use of uninitialized value / unhandled debug output in os-autoinst t/17-basetest.t
* action #62159: Asset GRU download not done by web UI host if job scheduled by `isos post`, fails to download and then cloned (was: … using the Web UI)
* coordination #62420: [epic] Distinguish all types of incompletes
* action #63451: Improve openqa-monitor-incompletes and openqa-label-known-issues to not report about incompletes with clone
* action #63718: incomplete reason with just "quit"/"died" could provide more information
* action #64120: Make weekly meetings available to community members
* action #64520: Deal with jobs stuck in assigned state
* coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results
* action #64776: [cache][worker] cache service suddenly stopped to download assets, all subsequent jobs needing download incomplete auto_review:"setup failure: Cache service status error: Premature connection close":retry
* action #65004: The asset generated by FORCE_PUBLISH works fine but does not show up in "Logs & Assets" and is also not mentioned in logs
* coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons
* action #66066: incomplete with reason "died: terminated prematurely" but log shows error 404 failing to download asset into cache auto_review:"(?s)Download.*failed: 404.*No scripts"
* action #66071: TEST is overridden in parent job when doing `openqa-clone-custom-git-refspec`
* action #66376: MM tests fail in obscure way when tap device is not present
* action #66664: circleci jobs do not retry, are aborted with "Too long with no output (exceeded 10m0s): context deadline exceeded"
* action #66682: `apache2.service` and `openqa-webui.service` wrongly started by `openqa-worker.service`
* action #66718: unstable/flaky/sporadic test: t/ui/27-plugin_obs_rsync_status_details.t
* action #66721: Use GitHub actions for os-autoinst
* coordination #66727: [epic] Define structure to define test suites not in openQA database
* action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry
* action #67039: [sporadic] os-autoinst tests stuck in OBS, likely t/18-backend-qemu.t ?
* action #67573: "OpenID Connect" support in openQA
* coordination #67723: [epic] Remote openQA worker fails to run tests from openqa-clone-custom-git-refspec
* action #67810: trigger OBS builds of openQA github pull requests
* action #67972: one module failed to upload results ending an otherwise fine job as failed (was: Some tests are flagged as failed whereas they passed)
* action #68131: [cache] Cache service misses requests from workers
* action #68167: [tests][services] Tests for systemd services and/or the daemon wrapper scripts and wrong arguments should exit services with failure
* action #68260: [packages] prevent submission of failed packages from devel:openQA:tested to Factory
* action #68350: Improve package and base OS version support
* action #68362: Click on link to *.qcow2 file display raw content inside the web browser
* action #68473: The test job is stuck on BIOS.
* action #68684: Port os-autoinst's build system to CMake
* action #68851: Allow qemu host IP to be something other than 10.0.2.2
* action #68938: Try to reduce waiting time in case of qemu (early-)exits auto_review:"QEMU terminated before QMP connection could be established at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 443":retry
* action #68956: Restart the parent and child jobs of a test in a START_DIRECTLY_AFTER_TEST test chain
* action #69058: openqa-review fails to download job module detail step text info files since around 2020-07-08
* action #69082: javascript lint tests
* action #69085: Make "last good" a link to a job instead of plain job ID
* action #69088: Present changes between packages on openQA worker machines in "investigation"
* action #69133: os-autoinst self-tests fail to remove tempdir on test failure
* action #69148: regression: output of self-tests on circle CI is missing from log view, "tests" tab shows nothing (due to timeout)
* action #69154: Improve package and base OS version support: Bump versions to Leap 15.2
* action #69157: Improve package and base OS version support: Update contribution hints or review checklists for package impact
* action #69160: Improve package and base OS version support: Enable tests for more repos
* action #69178: workaround for #64776 using https://github.com/os-autoinst/scripts/blob/master/openqa-label-known-issues
* action #69313: When using refspec for ppc for a particular job, PRODUCTDIR is set wrong
* action #69316: Benefit from automatic issue updates on GitHub
* action #69343: all jobs on o3 incomplete with auto_review:"Can.t locate object method.*pid.*"
* action #69346: flaky/unstable os-autoinst test "22-svirt.t"
* action #69355: [spike] redundant/load-balancing webui deployments of openQA
* action #69439: not possible to delete expiration for api keys anymore over UI
* action #69460: Video not available anymore from WebUI for some tests
* action #69484: perl warning in os-autoinst t/99-full-stack.t "Subroutine OpenQA::Isotovideo::Utils::diag redefined at …os-autoinst/t/data/tests/main.pm line 19." not caught
* action #69490: openqa webui sometimes displays the test suite instead of the needles in the needles drop down
* action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback
* action #69637: Faster localhost uploads from worker to webui
* action #69784: Workers not considered offline after ungraceful disconnect; stale job detection has no effect in that case
* action #69820: os-autoinst: unstable/flaky/sporadic test 28-signalblocker.t
* action #69946: [tools] when needle does not exist, test is paused until MAX_JOB_TIME
* action #69976: Show dependency graph for cloned jobs
* action #69979: Advanced job restarting via the web UI
* action #69997: Streamline "restart" and "duplicate" routes of the REST-API
* action #70120: os-autoinst: failed 18-qemu.t, prevents submission to Factory
* action #70189: openQA-common package broken on Tumbleweed
* action #70252: Dependency problem: Parent shown as passed instead of failed
* action #70384: Align tools/tidy scripts
* action #70648: Show serial tests in a more feasible way
* action #70654: Create git-subrepo for tools/update-deps
* action #70687: Download gru is attached to all scheduled jobs when doing 'isos post'
* action #70720: Unable to restart a child from START_DIRECTLY_AFTER_TEST chain if another child has been restarted already
* action #70723: Fix tests not to rely on `/var/lib/openqa/share` mountpoint
* action #70792: Optional link to fixed timestamp for tests overview
* action #70873: Test fails because auto_review:"Encoder not accepting data":retry, video is missing
* action #70972: failed minion jobs with ""malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before \"(end of string)\") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 31.\n","
* action #71095: [ci][os-autoinst] unable to read files in codecov reports, probably due to "opt" prefix
* action #71110: Reduce waiting time in case of os-autoinst shutdown
* action #71137: osd deployment fails due to python-openqa_client failing in OBS
* action #71146: [os-autoinst][backends] Prevent undocumented test variables in backends / prevent use of external testapi methods
* action #71176: Fail to show job details on OSD when using Firefox
* action #71185: job incompletes with auto_review:"setup failure: Cache service status error: Premature connection close":retry and does not retry, should we just automatically retry the connection?
* action #71236: job incompletes with auto_review:"backend died: Error connecting to VNC server : IO::Socket::INET: connect: Connection refused"
* action #71251: Forbid duplicate mapping keys in Job Templates YAML files
* action #71323: outdated references to "coveralls" in os-autoinst
* action #71368: 25-cache-service.t fails after passing all tests
* action #71386: Stale job detection fails with "Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm"
* action #71449: 25-cache-service.t fails repeatedly but circleCI receives the status as "success"
* action #71464: Show output of failed tests within circleci log view, like on local prove calls, not only "Tests" tab
* action #71467: add specific timeout values to all applicable openQA tests in t/
* action #71476: Migrate from AssetPack to Webpack
* action #71500: Potential optimization by skipping deployment checks in our tests (all except explicit deployment check tests)
* action #71536: Unhandled perl warnings in t/ui/15-admin-workers.t, not failing tests as expected (possibly other test modules as well)
* action #71551: unstable/flaky/sporadic t/04-scheduler.t test failing
* action #71554: unstable/flaky/sporadic t/full-stack.t test failing in script waits on CircleCI
* action #71644: job stuck in "uploading", can not be cancelled, after what should have incompleted early
* action #71758: [spike][timeboxed:20h] complete test definition from a yaml schedule file in local test distribution folder
* action #71827: test incompletes with auto_review:"(?s)Failed to download.*Asset was pruned immediately after download":retry because worker cache prunes the asset it just downloaded
* action #71845: tests fail in circleci with "Non-zero wait status: " but "All subtests passed"
* action #71857: flaky/unstable/sporadic test coverage from t/34-developer_mode-unit.t
* action #72055: download_asset tasks are not reliable
* action #72082: Reduce test runtime, e.g. less reliance on test fixtures or test database instances
* action #72127: check and/or reduce runtime of t/44-scripts.t
* action #72196: t/24-worker-jobs.t fails in OBS
* action #72238: websocket connection retry on flaky connections (was: SLE15-SP2 on AWS M6g (aarch64) machine fails to run a worker properly due to lots of Websocket connections lose)
* action #72289: web-ui - test display is incomplete
* action #72292: openqa-review fails to reference bugs from soft-fail references
* action #72316: [tests][ci] circleci can fail in `zypper ref` due to temporary repository problems
* action #72319: unstable test: t/ui/15-comments.t "got: 'Demo wrote less than a minute ago (last edited less than a minute ago)'" expected: 'Demo wrote less than a minute ago' (2nd try)
* action #73123: t/14-grutasks.t shows errors but still succeeds
* action #73126: No tests show unhandled output in main test summary log
* action #73156: test not failing but error showing up t/25-downloader.t: "Cannot determine file type for '/tmp/DptzoOZjyt/test"
* action #73162: t/01-test-utilities.t fails in "test would have failed"
* action #73231: [microos]job incompletes with auto_review:"backend died: Virtio terminal and svirt serial terminal do not support send_key. Use"
* action #73285: test incompletes with auto_review:"(?s)Download of.*processed[^:].*Failed to download":retry , not helpful details about reason of error
* action #73321: all jobs run on openqaworker8 incomplete:"Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed*"
* action #73339: auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
* action #73366: auto-review: Improve output
* action #73396: job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
* action #73447: POC: Create openQA Web Application container image (feature)
* action #73450: POC: Create openQA worker container image (feature)
* action #73486: Package devel:openQA:Leap:15.2/perl-DBD-Pg failed to build in openSUSE_Leap_15.2/ppc64le
* action #73567: openQA tour does not reliably save its state
* action #75073: finalize_job_results minion task fails because 'Job xxx does not exist.'
* action #75214: openqa-review fails to post reminder comments on bugzilla, errors in log "Encountered error trying to post a reminder comment on issue"
* action #75232: error message when worker has no network (yet): Unable to serialize fatal error: Can't open file "base_state.json": Permission denied at /usr/lib/os-autoinst/bmwqemu.pm line 86."
* action #75256: Try out AV1 video codec as potential new default
* action #75265: sporadic errors in test suite of perl-Mojo-IOLoop-ReadWriteProcess
* action #75346: t/api/08-jobtemplates.t started failing in OBS checks
* action #75370: unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute"
* action #75454: sometimes clone job is incomplete because of api failure
* action #76741: help popup next to "Submit comment" does not work anymore
* action #76900: unstable/flaky/sporadic t/full-stack.t test failing in CircleCI "worker did not propagate URL for os-autoinst cmd srv within 1 minute"
* action #76912: OpenQA::Script::Client throws perl warning "Wide character in print", should not be there
* action #76978: How to run an openQA test in 5 minutes size:M
* action #76990: Improve documentation for redundant/load-balancing webui deployments of openQA
* action #77008: Conduct openQA-in-openQA test on the latest *published* TW snapshot
* action #77032: Improve t/25-cache.t runtime
* action #77215: In jobs API data reference a parent job group name as well
* action #77320: why does this show leap 15.1? https://build.opensuse.org/package/view_file/devel:openQA/os-autoinst_dev/_service:download_url:Dockerfile?expand=1u
* action #77905: CI pipeline proof-of-concept running isotovideo
* action #77929: UI build bar alignment broken on index page
* action #78019: [sporadic] os-autoinst t/18-backend-qemu.t timed out in OBS checks after 10s
* action #78043: unstable/flaky/inconsistent statement coverage in t/lib/OpenQA/SeleniumTest
* action #78052: system stalled and then no output for three seconds, then test shuts down with no explicit message about reason and ends up incomplete
* action #78163: After OSD upgrade, many jobs incomplete with "Cache service status error 500: Internal Server Error"
* action #78169: after osd-deploy 2020-11-18 incompletes with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry
* action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs)
* action #80108: HDD images not available for aarch64 Tumbleweed (cleaned-up too early?)
* action #80118: test incompletes with auto_review:"(?s)Failed to download.*Asset was pruned immediately after download":retry, not effective on osd, or second fix needed
* action #80202: jobs incomplete with auto_review:"setup failure: No workers active in the cache service":retry
* action #80264: multimachine tests unable to get vars from its pair job
* action #80268: Fix flaky coverage - t/05-scheduler-full.t
* action #80274: Fix flaky coverage - t/lib/OpenQA/Test/Utils.pm size:M
* action #80298: Fix flaky coverage - lib/OpenQA/WebSockets.pm
* action #80334: job incompletes with auto_review:"(?s)terminated prematurely with corrupted state file.*No space left on device":retry , should automatically retrigger
* coordination #80372: [epic] Cleanup vars.json as initial information container between openQA worker and isotovideo
* action #80374: unstable/flaky/sporadic test t/ui/25-developer_mode.t
* action #80412: tests fail with auto_review:"(?s)version is 4\.6\.1606298538\.191b5988.*Can.*t locate object method.*code.*via package":retry
* action #80492: [easy] API help document says "Deletes …" for routes like "test_suites/:id" PUT and POST and looks inconsistent
* action #80518: provide container images for aarch64
* action #80534: publication+demo for updated openQA containers
* coordination #80546: [epic] Scale up: Enable to store more results
* action #80576: [learning] perl warning in openqa-livehandler "Use of uninitialized value $ret in exit at /usr/share/openqa/script/openqa-livehandler line 31.", similar for openqa-websockets
* action #80662: uninitialized value in string eq at .../load_templates line 137
* action #80682: Automatic tests for our openQA containers - worker only
* action #80684: Automatic tests for our openQA containers - worker+webui connection
* action #80686: wrong label carry-over to a job where an additional module failed before the one failing in before
* action #80736: Trigger 'auto-review' from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause"
* action #80746: test fails in await_install - openQA hasn't inherit the comments for incomplete job
* action #80800: flaky/unstable t/full-stack.t, Failed test 'test 1 is running', Bailout called. Further testing stopped: URL for os-autoinst cmd srv not available
* action #80826: Trigger 'auto-review' from within openQA when jobs incomplete on osd as well
* coordination #80828: [epic] Trigger 'auto-review' and 'openqa-investigate' from within openQA when jobs incomplete or fail on o3+osd
* action #80830: Trigger 'openqa-investigate' from within openQA when jobs fail on o3
* coordination #80908: [epic] Continuous deployment (package upgrade or config update) without interrupting currently running openQA jobs
* action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobs
* action #80962: os-autoinst development container image is still based on 15.1 but repo is switched to 15.2 already
* action #80986: terminate worker process after executing all currently assigned jobs based on config/env variable
* action #81038: Avoid using Mojolicious::Routes::Route::route which is DEPRECATED in favor of Mojolicious::Routes::Route::any in asset pack
* action #81068: openQA tests fail on OBS Tumbleweed
* action #81116: openqa-webui is failed to start because /var/lib/openqa/db/db.lock owned by vnc user
* action #81118: automatic container tests for os-autoinst
* action #81150: sporadic fail of os-autoins t/10-virtio_terminal.t
* action #81180: fetchneedles fails on uncommited needles
* action #81206: Trigger 'openqa-investigate' from within openQA when jobs fail on osd
* action #81386: python-openqa_client fails to build on OBS for SLE 15/ 15 SP1
* action #81492: openQA-in-openQA fails with 'function' object has no attribute 'func_name' and label:non_existing asset, candidate for removal or wrong settings
* action #81703: The values of 'ISO' and 'HDD' includes absolute path in vars.json
* action #81816: OSD deployment failed at 2021-01-06
* action #81828: Jobs run into timeout_exceeded after the 'chattr' call, no output until timeout, auto_review:"(?s)Refusing to save an empty state file to avoid overwriting a useful one.*Result: timeout":retry
* action #81859: openqa-investigate triggers incomplete sets for multi-machine scenarios
* action #81890: Extend help for openQA client config, e.g. mention the environment variable OPENQA_CONFIG
* action #81899: Move code from isotovideo to a module size:M
* action #82067: Unable to use openqa-clone-custom-git-refspec - Returns 403 Not authorized
* action #87695: Full openQA test development, maintenance and administration from browser without the need of a local terminal size:M
* action #87698: openQA jobs can be triggered with single curl calls
* action #87898: Add grafana alert for "broken workers" as reported by openQA
* action #88121: Trigger cleanup of results (or assets) if not enough free space based on configuration limit
* action #88125: Get feedback and adapt the population scripts for testreports
* action #88187: Set the addresses in the "internal clients" configurable
* action #88247: manage password for multiple test users in os-autoinst for openQA tests
* action #88347: upload_asset function fails for larger files
* action #88363: openqa-client cannot return job info
* action #88451: Static validation for containers
* action #88452: script_run syntax error when & is at the end of command
* action #88459: low-prio single-machine jobs can starve out high-prio multi-machine tests
* action #88482: Two absolute paths concatenated to form a default needle dir when PRODUCT_DIR/needles doesn't exist
* action #88496: openQA "t" tests time out (again) and take long
* action #88538: Since 2021-02-11 all openQA CI tests fail in codecov "406 Not Acceptable {'detail': ErrorDetail(string='Could not satisfy the request Accept header.', code='not_acceptable')}"
* action #88564: text field for git commit details in needle editor
* action #88594: codecov reports for individual files yield "forbidden" and files can not be shown
* action #88603: Two identical jobs are created from one abandoned test
* action #88609: Mojolicious 9.0 compatibility
* action #88696: Executing /usr/share/openqa/script/openqa-gru fails
* action #88745: [easy] Clicking on "Untracked" on /admin/assets yields "Not Found" page, better have no clickable link at all
* action #88754: openQA-in-openQA tests always fail and results do not impact submission pipeline
* action #88915: Codecov always fails with 404 Not Found - Build has already finished, uploads rejected
* action #89056: incomplete jobs with auto_review:"isotovideo died: isotovideo received signal HUP":retry
* action #89059: package checks fail due to timeout in openSUSE:Factory:ARM/aarch64
* coordination #89062: [epic] Simplify review for SUSE QAM
* action #89077: os-autoinst Makefile is missing symlinks configuration
* action #89200: Switch OSD deployment to two-daily deployment
* action #89206: openqa-review CI failure about "ImportError: No module named enum" in python 2.7 tests
* action #89221: Statistics about the time it takes developers from QA to close tickets related to failing tests (creation - closing)
* action #89224: Limit execution time of hook scripts run within Minion
* action #89281: Prevent investigation jobs to do any asset uploads to prevent overriding production assets
* action #89545: [epic] Presence on openSUSE discord? Or matrix channel? reddit?
* action #89548: presence on discord
* action #89554: presence on reddit
* action #89557: presence on matrix
* action #89620: openqa-clone-custom-git-refspec fails with: env: git: Permission denied
* action #89710: Add redirection after login compatible to the new IDP login system size:M
* action #89719: docker-compose up fails on master
* action #89722: Need automatic check for docker-compose
* action #89731: containers: The deploy using docker-compose is not stable and eventually fails
* action #89752: containers: Add a worker service as part of the docker-compose
* action #89899: Fix flaky coverage - t/ui/27-plugin_obs_rsync_status_details.t
* action #89935: t/ui/27-plugin_obs_rsync_status_details.t fails in circleCI, master branch even
* action #90038: Better error handling when reading API key+secret from ~/.config/openqa/client.conf
* action #90131: don't obsolete jobs with the same build number in different job groups when using `_OBSOLETE`
* action #90152: module results missing on quick job (on auto-restarting worker)
* action #90164: Make gitlab.suse.de/openqa/salt-states-openqa public
* action #90290: Relative paths for CASEDIR and others as default to be not bound to specific workers
* action #90293: Optional relative paths for CASEDIR and others to be not bound to specific workers
* action #90299: 414 failure when cloning job with very long job setting values
* action #90302: Remote openQA worker fails to run tests from openqa-clone-custom-git-refspec due to differing paths
* action #90362: Allow to customize worker engine by configuration
* action #90371: Warnings "Subroutine JSON::PP::Boolean::(0+ redefined"
* action #90614: CI test webui-docker-compose failed but PR was merged anyway
* coordination #90758: [epic] python bindings for openQA
* action #90761: os-autoinst-distri-opensuse CI checks fail due to `cpanm --installdeps` failing on Inline::Python
* action #90767: containers: Fix github test "webui-docker-compose" timeout
* action #90818: [openqa][tool] Not able to get group_overview json output.
* action #90872: openQA / os-autoinst 'either does not dequeue its messages, or exhibits some other buggy client-behavior'
* action #90929: get OAuth2 to work with salsa.debian.org (gitlab)
* action #90974: Make it obvious if qemu gets terminated unexpectedly due to out-of-memory
* action #91097: CI: "webui-docker-compose" eventually fails building images
* action #91157: [Alerting] web UI: Too many Minion job failures alert: limit_results_and_logs failed
* action #91163: Many jobs on OSD and o3 are incomplete because of auto_review:"backend died: missing input at /usr/lib/os-autoinst/bmwqemu.pm line 202"
* action #91232: Test t/ui/25-developer_mode.t failed in CI
* action #91250: handle codecov Bash Uploader Security Update
* action #91257: try out python backend for production tests in a new test distribution or os-autoinst-distri-openQA
* action #91284: Prevent recursive apparmor profile inclusion
* action #91347: [spike][timeboxed:18h] Support for archived jobs
* action #91377: CI: fix static-check-containers
* action #91461: Test is missing webui results and fail despite all tests passed
* coordination #91467: [epic] Surface openQA failures per squad in a single place
* action #91488: containers: openqa test "single_container_webui" eventually fails
* action #91491: potential performance regression, failures in os-autoinst t/18-qemu-options.t
* coordination #91518: [epic] Provide 'first bad' vs. 'last good' difference in investigation info
* action #91521: link to "first bad" in investigation tab
* action #91527: Cleanup logging in autoinst-log.txt
* action #91542: openQA API what jobs were/are testing X incident/package
* action #91578: [openQA][environmen][module] Missing perl use NetAddr::IP on openQA environmen
* action #91601: Add "return to top" button on openQA pages, e.g. job details, index, group overview
* action #91605: notifications about failed and unreviewed jobs - but using Slack (was: Rocket.Chat) size:M
* action #91638: Ensure standard javascript code
* action #91647: Making option to filter by flavor, test name on /tests/overview more prominent
* action #91650: Resolve the most recent builds per job group on /tests/overview when showing multiple job groups
* action #91652: Remind about the use of openqa-review in squads
* action #91653: Python tests fail with generic error message regardless of the problem size:M
* action #91658: Make "black certificate" stricter to only show when /tests/overview?todo=1 is empty, i.e. no unlabeled failures
* action #91659: Add Playwright to devel:languages:perl
* action #91710: openQA API documentation mentions non-functional route /groups
* action #91752: jenkins: Multiple missing fields and errors in configuration of openQA-in-openQA
* action #91764: openqaworker13 has active jobs but not doing anything useful for hours
* action #91773: Automatic replacement of openQA job URLs preview of openQA size:M
* action #91782: Add support for archived jobs
* action #91785: UI representation for archived jobs
* action #91902: Tests incomplete with reason "Failed modules: …"
* coordination #91914: [epic] Make reviewing openQA results per squad easier
* action #91971: crosscheck if testapi::eject_cd actually works
* action #92161: CI: build-docs-nightly broken since May 3
* action #92164: t/01-test-utilities.t fails in 'test would have failed' in unrelated PR
* action #92179: `make coverage` on openSUSE Leap 15.2 can fail with "Bad Sereal header: Not a valid Sereal document"
* action #92188: test reviewers are pointed to the "first bad vs. last good" comparison if current job is not already the first bad
* action #92311: Complete test definition from a single yaml schedule file in local test distribution folder
* action #92344: many temp-folders left over from live openQA jobs, regression?
* action #92497: Perl test execution in os-autoinst runs succesfully but then fails with error * action #92584: Leap 15.3 support for os-autoinst+openQA * action #92665: Automatically validate code style for python code * action #92746: Log viewer in openQA webUI with color parsing * action #92761: devel:openQA/openqa_dev failed to build in containers15.2/x86_64 "no space left on device" * action #92764: Possible spam from check_for_unsent_reports.sh, repeated sending of "PASSED" and "FAILED" emails * action #92788: Use openQA archiving feature on osd size:S * action #92800: tests: Can't open file "t/data/openqa/share/tests/opensuse/needles/inst-timezone-text.json" * action #92833: containers: Web UI cannot connect to scheduler * action #92893: containers, docker-compose: Ensure that the scheduler can connect to the websockets container size:M * action #92902: Remove unnecessary linebreaks in cache service logging * action #92905: openqa-investigate creates jobs with wrong Git hash when triggered for jobs which already have a CASEDIR set to a git repo * action #92957: Add option to openqa-review to skip displaying all passed results * action #93065: "parallel_failed" jobs show up on /tests/overview?todo=1 but these do not need a label * action #93083: unstable test in os-autoinst master * action #93086: unstable test in openQA master t/10-jobs.t exceeding runtime of 280s * action #93101: investigation jobs scheduled against specific git hashes load wrong product dir files * action #93141: t/18-qemu-options.t fails on Leap 15.2 with coverage enabled * coordination #93246: [epic] List all unreviewed failed (or incomplete) jobs on /tests on request size:M * action #93345: Source URLs ending in .pm/steps/1/src.txt produce 404 * action #93423: Migrate openqa_review CI to supported solution after travis-ci.org shutdown expected at or after 2021-06-15, e.g. 
Github Actions
* action #93453: openQA CI: Certain Selenium tests are failing
* coordination #93501: openqa-bootstrap fails to guess a value for platform on Leap 15.3
* coordination #93609: [epic] openqa-bootstrap support on Leap 15.3
* action #93677: t/full-stack.t runs into 9m make-level timeout exceeding test module internal timeout size:S
* action #93713: openqa-in-OpenQA fails in openqa-from-containers
* action #93724: openqa-review: "Unassigned bugs" includes soft-fails to bugs with assignees
* action #93727: Publish openqa-review reports with "--skip-passed"
* action #93811: [timeboxed:20h] vanastasiadis: Learn about Perl and Perl development
* action #93853: os-autoinst t/01-test_needle.t fails in OBS on aarch64 in float number comparison
* action #93880: os-autoinst "ci extended" CI job fails due to boo#1182451
* coordination #93883: [epic] Speedup openQA coverage tests with running minion jobs synchronously using new upstream "perform_jobs_in_foreground" mojo function
* action #93925: Optimize SQL on /tests/overview
* action #93940: text thumbnail preview feels inconsistent to other screenshots size:M
* action #94060: unstable test t/ui/18-tests-details.t
* action #94111: Optimize /api/v1/jobs
* action #94114: OpenQA/Os-autoinst youtube channel.
* action #94231: lessons learned from "make needle matching less forgiving"
* coordination #94258: [epic] deployment pipeline failed, alerts not handled
* action #94315: [learning][easy][beginner] Add test coverage for mouse_tclick testapi function
* action #94354: Optimize /dashboard_build_results and /group_overview/* pages
* action #94606: New builds of aggregate tests should not obsolete old ones size:M
* action #94667: Optimize products/machines/test_suites API calls size:M
* action #94678: Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M
* action #94732: Provide link to /tests/overview of latest builds of all job groups within a parent job group size:M
* action #94735: needles not found in `needles` subdirectory when CASEDIR is a git repository
* action #94753: Try out munin on o3
* action #94762: openqa-review: Add mode of single-line todo lists size:M
* action #94774: bug referral links only point to bugzilla.suse.com, not the specific bug size:M
* action #94792: Also show "investigation" tabs for incomplete jobs
* action #94838: Make qem-dashboard a proper public open source project size:M
* action #94850: QEMU 6.0 fails to start if job has QEMU_NUMA=1
* action #94880: Carry-over despite mismatch of failing job modules
* action #94901: Simple validation of parameters on /tests/overview
* action #94937: Distinguish comment types on jobs on /tests (maybe optional) size:S
* action #94952: [easy][beginner] Increase code coverage of os-autoinst basetest.pm size:M
* action #94991: os-autoinst OBS checks fail in t/10-virtio_terminal.t (https://github.com/os-autoinst/os-autoinst/issues/1686), sporadic failure or only in OBS size:S
* action #95006: jenkins job trigger-openQA_in_openQA-TW fails with "/opt/openqa-scripts/trigger-openqa_in_openqa: line 68: client_prefix: unbound variable" size:S
* action #95009: unstable/flaky test t/ui/12-needle-edit.t size:M
* action #95024: openQA test
t/ui/26-jobs_restart.t very unstable (already marked as unstable) size:M
* action #95075: Find jobs matching search parameters over /api/v1/jobs (especially documentation) size:S
* action #95105: osd-deployment pipelines fail and alerts are not handled size:M
* action #95164: OBS Package build in github fails to provide updates, "OBS Package Build Expected — Waiting for status to be reported" size:M
* action #95170: Increase code coverage of critical component OpenQA::Worker::Job without introducing slow-down due to subprocess coverage collection
* action #95179: [sporadic] containers: eventually the tests fails on single_container_webui step with the error "Mojolicious::Plugin::AssetPack::Pipe::Sass requires"
* action #95185: openqa-bootstrap ignores errors when systemd is not available
* action #95188: Document how to properly configure GitLab pipeline notifications size:M
* action #95281: error on "Next & previous results": ajax error message and no results showing up
* action #95290: error in openqa-webui systemd service log "Mojo::Reactor::Poll: Timer failed: Can't open file "/tmp/RbB4AGlDAq/": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/WebAPI/Controller/Running.pm line 211."
* action #95296: openQA-in-openQA container tests fail with "/root/run_openqa.sh: line 6: This: command not found"
* action #95299: Tests timeout with reason 'setup exceeded MAX_SETUP_TIME' on osd ppc64le workers auto_review:"Result: timeout":retry size:M
* action #95317: openqa-bootstrap on Leap 15.3 from devel:openQA passes
* action #95320: openqa-bootstrap on Leap 15.3 from official repos passes
* action #95323: openqa-bootstrap support on current version of Leap - automatic test size:S
* action #95374: All pull request openQA CI checks fail now in "webui-docker-compose": "Container is unhealthy" size:M
* action #95581: ci: Use a git commit message style checker size:S
* action #95662: Codecov: Green/red markers are off by one or more lines sometimes size:S
* action #95715: Investigate non-fatal openqa-review pipeline error "ERROR:openqa_review.openqa_review:Could not find any soft failure reference within details of soft-failed job" size:M
* action #95721: [Sporadic] containers: tests fail with "Test died: no candidate needle with tag(s) 'inst-console' matched" size:M
* action #95730: on o3 in openqa-webui journal: "Jul 20 08:06:54 ariel openqa-webui-daemon[31559]: Use of uninitialized value in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 661."
size:S
* action #95783: Provide support for multi-machine scenarios handled by openqa-investigate size:M
* action #95830: Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M
* action #95836: [sporadic] CircleCI: openQA t/full-stack.t flaky size:M
* action #95839: [sporadic] CircleCI: openQA t/ui/14-dashboard.t flaky size:M
* action #95842: URLs containing "#" should be parsed as complete URLs size:S
* action #95845: IP address for accessing host network from SUT no longer automatically assigned (under Tumbleweed) size:M
* action #95848: [sporadic] CircleCI: openQA t/05-scheduler-full.t flaky size:M
* action #95851: [sporadic] CircleCI: openQA t/43-scheduling-and-worker-scalability.t flaky size:M
* action #95995: [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
* action #96007: OpenQA jobs randomly time out during setup phase
* action #96010: [qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'
* action #96019: incomplete jobs with no logs get labeled with poo#61922 which is closed for long
* action #96058: [spike] Filter test results on /tests or /tests/overview by regex match in modules size:M
* action #96122: Errors from hook scripts in openqa-gru service log ("grep: range out of order in character class") size:M
* coordination #96185: [epic] Multimachine failure rate increased
* action #96191: Provide "fail-rate" of tests, especially multi-machine, in grafana size:M
* action #96197: [alert] web UI: Too many Minion job failures alert size:M
* action #96260: Failed to add GRE tunnel to openqaworker10 on most OSD workers, recent regression explaining multi-machine errors?
size:M
* action #96311: qemu error message is still "debug", should be "warn" or more severe size:S
* action #96317: Unexpected warnings fail 25-cache.t if run with recent perl-DBD-SQLite
* action #96519: Fix flaky coverage - lib/OpenQA/Scheduler.pm size:M
* action #96545: t/43-scheduling-and-worker-scalability.t fails in multiple OBS checks size:S
* action #96557: jobs run into MAX_SETUP_TIME, one hour between 'Downloading' and 'Download processed' and no useful output in between auto_review:"timeout: setup exceeded MAX_SETUP_TIME":retry
* action #96561: Speed up `t/25-cache-service.t` by avoiding forking to run Minion jobs
* action #96564: Speed up `t/ui/12-needle-edit.t` and `t/ui/21-admin-needles.t` by avoiding forking to run Minion jobs size:M
* action #96623: Let workers declare themselves as broken if asset downloads are piling up size:M
* action #96632: A lot of time is being spent stabilizing existing tests
* action #96636: static-check-containers is flaky on GHA size:M
* action #96684: Abort asset download via the cache service when related job runs into a timeout (or is otherwise cancelled) size:M
* action #96707: Failed systemd services alert for osd in session-c26076.scope
* action #96744: Report Product Bug from 15SP4 jobs failed due to wrong project name
* action #96959: Job hooks trigger investigate jobs for passed/soft-failed size:M
* action #97034: openqa-gru log is difficult to read
* action #97046: Fix flaky coverage - t/05-scheduler-full.t size:M
* action #97241: build failure of os-autoinst in 18-backend-qemu.t on OBS (timeout) size:S
* action #97304: Assets deleted even if there are still pending jobs size:M
* action #97508: openqa-label-known-issues not triggered correctly on o3 size:S
* action #97541: Monitoring alerts on errors in logs on o3
* action #97580: Automatic check for qa-tools backlog limits in Github Actions
* action #97763: Event-based cleanup jobs triggered based on quota size:M
* action #97856: [sporadic] openqa-review
pipeline failed: ConnectionResetError size:M
* action #97952: Handle "unable to upgrade ws to command server" better size:S
* action #97979: Asset cleanup takes very long to process 60k files in "other" size:M
* action #98087: jobs on s390 fail with "Too few arguments for subroutine 'consoles::sshVirtsh::define_and_start'" after OSD deployment on 2021-09-03
* action #98117: qemu-img calls fail with qemu 6.1.0 (need to specify format for backing file)
* action #98153: OSD deployment fails in check state of packages at 2021-09-06
* action #98186: Backlog checker does not take `QA` project issues into account for untriaged issues
* action #98258: No results on /tests/overview w/o build
* action #98388: Non-existing asset "uefi-vars" is still shown up on #downloads
* action #98403: [easy][beginner] Make sections on build bars clickable size:S
* action #98445: improve description for "Test module" UI element as followup to #96058
* action #98448: As followup to #98087 tell mergify to only accept changes with 100% patch statement coverage
* action #98460: Filter actual test results on /tests or /tests/overview by regex match in modules
* coordination #98472: [epic] Scale out: Disaster recovery deployments of existing openQA infrastructures
* action #98496: nightly circle CI job fails in "Install cached packages" with "rpm: no packages given for install" size:M
* action #98577: Unknown ARRAY( variables matching HDD_1 or ISO in job settings
* action #98604: Provide data about ratio of automatically approved SLE Maintenance incidents size:M
* action #98655: Fix flaky coverage - lib/OpenQA/Worker/Job.pm size:M
* action #98664: Hiding test data
* action #98727: [tools][sle][aarch64] the published hdd can't be booted up due to wrong format
* action #98769: Unresolvables in devel:openQA Leap 15.3 size:S
* action #98841: qemu randomly fails to start on QA-Power8-5-kvm auto_review:"(?s)QA-Power8.*Failed to allocate KVM HPT of order 25 (try smaller maxmem?): Cannot allocate
memory":retry size:M
* action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M
* action #98898: `t/05-scheduler-full.t` sometimes fails in CircleCI size:M
* action #98901: [alert] Incomplete jobs (not restarted) of last 24h alert
* action #98940: mmapi calls can still fail despite retries
* action #98946: Argument "4.2.1" isn't numeric in numeric ge (>=) at /usr/lib/os-autoinst/backend/qemu.pm line 686.
* coordination #98952: [epic] t/full-stack.t sporadically fails "clickElement: element not interactable" and other errors
* action #99009: Occasional "Unhandled rejected promise" failure when publishing AMQP messages size:M
* coordination #99030: [epic] openQA bare-metal test dies due to lost SSH connection auto_review:"backend died: Lost SSH connection to SUT: Failure while draining incoming flow":retry
* action #99108: Automatically retry openQA bare-metal tests size:S
* action #99111: Confirm or disprove that openQA bare-metal test loses SSH connection due to package updates size:M
* action #99123: ssh based backends can run into timeout if ssh connection is stuck
* action #99126: os-autoinst CI GHA fails with weird C++ error?
* action #99135: Provide ratio of tests by result in monitoring - by worker
* action #99153: [Alerting] Incomplete jobs (not restarted) of last 24h alert on 2021-9-24
* action #99168: drop unsupported distributions from devel:openQA
* action #99234: Proposal: Remove https://build.opensuse.org/project/show/devel:openQA:stable and https://build.opensuse.org/project/show/devel:openQA:V2
* action #99246: Published QCOW images appear to be uncompressed
* action #99327: openqa-webui-daemon: DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: syntax error at or near "AND"
* action #99396: Incompletes with auto_review:"api failure: Failed to register .* 503":retry should be restarted automatically
* action #99402: Incompletes with "backend died: Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retry should be restarted automatically
* action #99426: Asset cleanup takes very long to process 60k files in "other" - suboptimal logging?
* action #99519: Investigation jobs triggered but missing comment on original job size:M
* coordination #99579: [epic][retro] Follow-up to "Published QCOW images appear to be uncompressed"
* action #99594: Fix flaky coverage - t/lib/OpenQA/Test/FullstackUtils.pm size:M
* action #99597: Fix flaky coverage - lib/OpenQA/Worker/WebUIConnection.pm size:M
* action #99654: Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S
* coordination #99660: [epic] Use more perl signatures in our perl projects
* action #99663: Use more perl signatures - os-autoinst size:M
* action #99672: [openqa][tools] Non-existing `…-uefi-vars.qcow2` asset disturbs openqa-clone-job workflow
* action #99675: Useless use of a constant ("can load python test module at a"...) in void context at ./08-autotest.t line 342.
* action #99678: Speedup tidy
* action #100503: Identify all "finalize_job_results" failures and handle them (report ticket or fix)
* coordination #100688: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0
* action #100709: openqa-review pipeline failed because details-* JSON is empty
* action #100850: Chromedriver crashes in openQA's CI
* action #100973: Cancel any scheduled jobs after a configurable timeout, e.g. days size:M
* action #101015: [tools][sle][x86_64][aarch64][QEMUTPM] can openqa create swtpm device automatically? size:M
* action #101030: Typing problems on aarch64
* action #101262: Document running os-autoinst full-stack.t on OSD workers size:M
* action #101265: Upgrade arm3 to Leap 15.3 and compare failure rate size:M
* action #101379: Reduce amount of unhelpful log messages at debug level
* action #101478: openqa-review pipeline failed because details-* JSON contains non-UTF8 char ? size:S
* action #101520: [bot-ng] Stop very frequent scheduling of single incident jobs size:M
* action #101533: Make text thumbnails easily distinguishable from info thumbnails
* action #101716: Improve error reporting for invalid "force_result" comments
* action #101734: Prevent "called on undefined file descriptor" in myjsonrpc.pm:40 in openQA t/full-stack.t
* action #101779: osd deployment failed with non-zero status openqa-worker5
* action #101884: openqa-review: list index out of range
* action #101950: open.qa document need update base latest opensuse release for trace/debug firewalld size:S
* action #102146: Deprecate os-autoinst backend::pvm
* action #102206: Make bot-ng a proper public open source project size:M
* action #102221: t/25-cache-service.t fails exceeding 90s timeout consistently size:M
* action #102332: Unable to read *.json: Can't open file in o3 openQA logs /var/log/openqa size:M
* action #102347: bot-ng: repohash calculation
* action #102374: Support use of force_result via ticket title in auto-review
size:M
* action #102428: Provide "fail-rate" alerting with ratio_mm_failed 5.360 size:M
* action #102437: Job age alert median followed by max size:S
* action #102440: openqa-review pipeline failed with assert self.issue_type == "bugzilla"
* action #102464: Upgrade OBS package CI checks to Leap 15.3 (os-autoinst+openQA) size:M
* action #102467: test fails in reconnect_mgmt_console with auto_review:"Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190."
* action #102539: Non-fatal test modules aborting the whole job size:M
* action #102578: [sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' size:M
* coordination #102581: Proof of concept of t/full-stack.t on GitHubActions
* action #102584: openqa_review: Cannot run tox successfully locally
* action #102641: openqa_review: Cannot build doctests successfully size:M
* coordination #102710: [epic] Improve our backup
* action #102786: Text results "unable to read" when showing result page during test execution, i.e. while an openQA test is running size:M
* coordination #102951: [epic] Better network performance monitoring
* action #102957: Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M
* action #102975: Fix missing openqa.o.o data on metrics.o.o size:M
* action #103029: Fix problems with os-autoinst's thread creation and the latest TBB version
* action #103032: openQA "investigation" does not show "diff_to_last_good" content anymore?
* action #103287: Ensure openQA within standard repos of Leap 15.3 is working and no problem with object method "route" size:M
* action #103329: os-autoinst: unhandled and confusing CI test ouput from git calls size:M
* action #103398: Send an AMQP message on comments taken over size:M
* action #103416: Better handle minion tasks failing with "Job terminated unexpectedly" - "limit_results_and_logs" size:M
* action #103422: [sporadic] os-autoinst: 13-osutils.t:167 Failed test 'Exit code appear in log' in GHA size:M
* action #103425: Ratio of multi-machine tests alerting with ratio_mm_failed 5.280 size:M
* action #103467: [sporadic] os-autoinst: 14-isotovideo.t:133 'no fatal error recorded' size:M
* action #103485: [sporadic] os-autoinst: flaky test t/13-osutils.t
* action #103527: osd-deployment pipelines fail and alerts are not handled size:M
* action #103581: Many jobs on openqa.opensuse.org incomplete in ' timeout: setup exceeded MAX_SETUP_TIME'
* action #103584: job incompletes with exception in OpenCV code "Assertion failed) ksize.width > 0 && ksize.width % 2 == 1 && ksize.height > 0 && ksize.height % 2 == 1 in function 'createGaussianKernels'"
* action #103605: Most tests time out with perl-Mojo-IOLoop-ReadWriteProcess 0.28
* action #103611: test failure: os-autoinst: 29-backend-driver.t:31 Failed test 'exit logged'
* action #103617: Cover unhandled output in openQA "t" tests size:M
* action #103692: Only upgrade o3 workers if package checks are good, same as for o3 webui size:M
* action #103765: Support for "todo" query parameter on /tests, same as /tests/overview size:M
* action #103788: Run openQA full-stack test as part of os-autoinst CI tests size:M
* action #103791: After module failure, the console is broken size:M
* action #103864: Support arch as filter parameter in jobs/overview API route (was: The openqa-cli api command seems not support …)
* coordination #103947: [saga][epic] Scale up: Future proof backup of o3+osd
* action #103953: Use openQA
archiving feature on o3 size:S
* coordination #103962: [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens
* coordination #103965: [epic] Easy triggering of multi-machine tests, similar as for single-machine tests
* coordination #103971: [epic] Easy *re*-triggering and cloning of multi-machine tests
* action #104007: Support retry of openQA jobs based on test variables
* action #104037: Can not save description in job group unless other values are changed as well
* action #104077: backend died: Can't syswrite(IO::Socket::UNIX=GLOB(0x558d9dd5cb68), ): Broken pipe at /usr/lib/os-autoinst/backend/qemu.pm line 985 size:M
* action #104116: Better handle minion tasks failing with "Job terminated unexpectedly" - "scan_needles" size:M
* action #104136: "archive_job_results" fail with "Unable to copy ... File exists" size:M
* action #104149: openqa_bugfetcher fails to build size:S
* action #104164: The openSUSE package perl-App-cpanminus was suggested for removal but we rely on it within openQA size:M
* action #104178: Increase OSD deployment rate from every second day to daily
* action #104199: Prevent confusion when openQA comments look like both a bugref as well as label at the same time size:M
* action #104220: openQA-inopenQA tests failing in boot step
* action #104262: o3 responds with 500 on softfailed for job that "failed to load modules"
* action #104350: [alert] failed systemd service on grenache-1, os-autoinst-openvswitch, turned to "ok" automatically size:M
* action #104499: s390 tests fail with backend died: unable to extract assets: Too many arguments for subroutine 'backend::baseclass::do_extract_assets
* action #104517: Connections to download.opensuse.org affecting openQA CI and o3
* action #104616: Make openQA labels clearly visible
* action #104619: openqa_from_containers needs adapted post_fail_hook (was: tests fails due to openSUSE mirror problems with can't locate Mojo/Base.pm in INC in script/client)
* action #104670: Fix
circular dependency of autotest <-> bmwqemu
* action #104751: Extend "_SECRET_" variable handling to os-autoinst backend password variables
* action #104827: openQA documentation generation deletes test API content
* action #104836: [easy] perl warning about uninitialized value in pattern match when localhost not reachable size:S
* action #104841: Prevent empty changelog messages from osd-deployment when there are no changes size:M
* action #104917: Raw (escaped) html shown in developer mode error message box
* action #104971: [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
* action #104986: tests incomplete with: auto_review:"backend died: Too many arguments for subroutine.*consoles::vnc_base::get_last_mouse_set":retry
* action #104995: Add UI element and help text for "todo" query parameter on /tests, similar as /tests/overview
* action #105001: Add doc for "todo" query parameter on /tests, similar to /tests/overview size:S
* action #105019: Make the number of previous restarts of a job more discoverable size:M
* action #105040: [tools][sle][s390x] handle select_console 'root-console' failure if root ssh is not permitted in system with Common Criteria role
* action #105061: [sporadic] os-autoinst: 29-backend-driver.t: Failed test 'log output for backend driver creation' size:S
* action #105064: Reduce verbosity of openQA logging to improve performance and reduce storage requirements size:M
* action #105127: Use more perl signatures - openQA - some simple classes size:S
* action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M
* action #105157: Needle editor canvas has broken border size:M
* action #105217: deprecate and prepare removal of os-autoinst backend::amt because it is likely unused and remove if we have no significant code coverage
* action #105310: OBS checks fail in os-autoinst with " backend::svirt::run_ssh does not exist!
at ./22-svirt.t line 119."
* action #105370: https://openqa.opensuse.org/tests/2150931#next_previous as well as the according /tests/overview page shows a comment, should show a label icon size:M
* action #105379: Continuous deployment of o3 workers - one worker first size:M
* action #105382: Reconsider coloring of "failed modules" to make it obvious that they actually fail
* action #105417: OBS checks stuck in openQA
* action #105429: openQA's fullstack test fails in `shutdown` module
* action #105432: Multiple build errors on devel:openQA size:M
* action #105512: Multiple build errors on devel:openQA:tested
* action #105690: s390x svirt jobs incomplete with auto_review:"unable to extract assets:.*/var/lib/libvirt/images/a.img":retry
* coordination #105699: [epic] 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" size:S
* action #105759: flaky code coverage in os-autoinst consoles/virtio_terminal.pm
* action #105804: Job age (scheduled) (median) alert size:S
* action #105882: Test using svirt backend fails with auto_review:"Error connecting to VNC server.*localhost.*Connection refused"
* action #105885: Continuous deployment of o3 workers - all the other o3 workers size:M
* action #105909: o3 logreports - Ignoring invalid group {"name":"123"} when creating new job
* action #105924: o3 logreports - Template was modified
* action #105984: [os-autoinst][flaky] flaky coverage in backend/baseclass.pm size:M
* action #106083: [virtualization][3rd party hypervisor][timeboxed:10h][research] Learn about VMWare VirtualMachine.AcquireTicket("webmks") API size:S
* action #106245: o3 logreports - Testsuite 'xyz' is invalid
* action #106470: Selenium's `send_keys` function broken as of ChromeDriver 98.0.4758.80
* action #106654: [ipmi][openqa][vnc] Massive test run failures with 'IO::Socket::INET: connect: Connection refused' due to "Use of uninitialized value.*connect_timeout in addition.*consoles/VNC.pm line
13.*":retry
* action #106684: tests incomplete with "(?s)testapi::record_info.*Use of uninitialized value.*in join or string at.*Mojo/File.pm line 14.*"
* action #106759: Worker xyz has no heartbeat (400 seconds), restarting repeatedly reported on o3 size:M
* action #106783: Cover manual testing steps, especially for "exotic" backends, in os-autoinst contribution hints size:M
* action #106865: os-autoinst git test alters user configuration
* action #106867: os-autoinst: local svirt testing instructions size:M
* action #106898: Protection against asset clobbering
* action #106901: Expose bandwidth data for worker cache via influxdb size:M
* action #106904: Monitoring for worker specific bandwidth size:M
* action #106912: Fullstack test can still fail due to `shutdown` module size:M
* action #106996: os-autoinst: describe how to take out a production worker instance for testing backend changes size:M
* action #106999: os-autoinst: Document the use of custom openQA backend commands to test os-autoinst changes on production workers size:M
* action #107002: Expose fullstack test video from pool directory in CI size:M
* action #107005: Automatic review checklists on pull requests, especially for os-autoinst non-qemu backend tests
* action #107026: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) size:M
* action #107029: Consider removing support for ikvm size:M
* action #107032: [timeboxed:20h] [spike] Create integration test of os-autoinst's VNC module with VMWare's VNC-over-websockets size:S
* action #107197: Wrong arch information showed up for worker 's390-kvm-sle12'
* action #107254: Flaky code coverage in consoles/VNC.pm
* action #107311: The dependency tree of `openqa-clone-job --clone-children` is broken
* action #107470: [openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continuously failing on some workers/SUTs size:M
* action #107497: [qe-tools] openqaworker14
(and openqa15) - developer mode fails size:M
* action #107533: Better handle minion tasks failing with "Job terminated unexpectedly" - "finalize_job_results" size:M
* action #107701: [osd] Job detail page fails to load
* action #107746: Some directly chained jobs are skipped by openQA
* action #107878: number of failed job provides wrong value on the build's status bar
* action #107881: [retro] Conduct a zombie scrum team survey
* action #107926: Broken SQL query in `lib/OpenQA/WebAPI/Plugin/Helpers.pm line 403`
* action #107941: [sporadic] openQA Fullstack test t/full-stack.t can still fail with "udevadm" log message size:M
* action #107986: [qa-tools] openqa-clone-custom-git-refspec cannot set CASEDIR correctly
* action #107998: [sporadic][os-autoinst] t/29-backend-driver.t fails in "log output for backend driver creation"
* action #108004: Consider jobs important so long as they reference open bugs
* action #108007: Ensure jobs are re-run with any prerequisite modules
* action #108091: Most systemd units should not Want= or Require= network.target (bsc#1196359) size:M
* action #108125: openqa-worker-cacheservice(-minion) can trigger "UNIQUE constraint failed: mojo_migrations.name at OpenQA/CacheService/Model/Cache.pm line 77."
* action #108272: openQA 25-cache-client.t fails in OBS
* action #108281: test fails in svirt_upload_assets - can not upload qcow, error "File is too big"
* action #108323: Subroutine consoles::sshVirtsh::has redefined at .../Class/Accessor.pm
* action #108443: [sporadic][timeboxed:10h] OBS os-autoinst fails in 18-qemu.t: Can't open file "base_state.json": No such file or directory size :M
* action #108452: test incompletes in patch_and_reboot with 'Seems like os-autoinst has produced a result which openQA can not display.', reason auto_review:"backend died: encountered object.*consoles::VNC.*, but allow_blessed.*myjsonrpc":retry
* action #108476: The siblings jobs with START_DIRECTLY_AFTER_TEST are all cancelled
* action #108530: os-autoinst wheels: x11_start_program from os-autoinst-distri-openQA dynamically loaded from another git repo size:M
* action #108659: t/22-dashboard.t fails in one step but overall test still succeeds, maybe Test::Warnings?
* action #108662: Can't call method "id" on an undefined value at WebAPI/Controller/Test.pm
* action #108701: Individual groups return 500 via group_overview route
* action #108824: Some of the daily aggregate tests are cancelled without a reason size:M
* action #108971: [tools][tw][sle] with job setting "RETRY=1" , openQA should not re-trigger passed jobs
* action #109232: Document relevant differences of arm-4/5 vs.
arm-1/2/3 and aarch64.o.o, involve domain experts in asking what parameters are important to be able to run openQA tests size:M
* action #109235: Errors with custom ISOTOVIDEO command should fail clearly
* action #109292: OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create
* action #109310: qem-bot/dashboard - mixed old and new incidents size:M
* action #109313: Duplicates in dependencies.yaml should only show once in Dockerfile
* action #109319: [qe-core] aarch64 tests failing in qemu-img due to broken image (was: "with cache error") size:S
* action #109376: [retro] Due date reminders seem more important than resolving tickets
* action #109443: os-autoinst-openvswitch sometimes fails with `br1 setup-in-progress`
* action #109551: [retro] The extension on-demand overlaps with two calls (three counting the virtual coffee break)
* action #109620: os-autoinst: Improve unit-test code coverage for backend::svirt size:M
* coordination #109656: [epic] Stable non-qemu backends
* coordination #109668: [saga][epic] Stable and updated non-qemu backends for SLE validation
* action #109734: Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants size:M
* coordination #109740: [epic] Stable os-autoinst unit tests with good coverage
* action #109809: Show job relations for parallel/children jobs also on the tests overview page /tests/overview (mainly for parallel jobs) size:M
* action #109815: Add retry for HTTP requests in openQABot where missing
* action #109836: All Jobs on OSD are incomplete since 2022-04-12
* coordination #109846: [epic] Ensure all our database tables accomodate enough data, e.g.
bigint for ids * action #109849: Migrate job_modules table to bigint * action #109851: os-autoinst was removed from o3 openqaworker7 * openqa-force-result #109857: Secure auto-review+force_result size:M auto_review:"Failed to download gobbledeegoop":force_result:softfailed * action #109864: Conduct Five Whys for "All Jobs on OSD are incomplete since 2022-04-12" size:M * action #109920: Identify reproducible product issues using openqa-investigate size:M * action #110032: Migration group can not trigger, missing minion jobs? size:M * action #110089: openQA in openQA: openqa_webui failing because postgresql is not installed * action #110136: [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for postgreSQL software development and admins best practices size:S * action #110142: Sync tools/tidy and .perltidyrc * action #110181: ci:build-docs broken and not clear how to use tools/generate-documentation * action #110196: A big number of tests fail with networking (all workers) due to SLE libslirp0 update * action #110260: o3 logreports - /var/lib/openqa/share/tests/obs/needles is not a git directory size:M * action #110389: o3 logreports - fatal: ambiguous argument: unknown revision or path not in the working tree size:M * action #110392: o3 logreports - Can't open file "/var/lib/openqa/testresults/.../vars.json" size:M * action #110491: Failed pipeline for osd-deployment, check state of packages job failed size:S * action #110497: Minion influxdb data causing unusual download rates size:M * action #110509: Test 03-auth.t fails after Mojolicious update * action #110515: Command export feature for openqa-clone-job size:M * action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M * action #110524: [timeboxed:20h][spike] openQA proof-of-concept within kubernetes size:M * action #110530: Do NOT call job_done_hooks if requested by test setting * action #110533: Minion 
Jobs page keeps retrieving stats even when logged out size:M * action #110536: unhandled test output in openQA test t/35-script_clone_job.t * action #110542: Try to mitigate "VNC typing issues" with disabled key repeat * action #110566: [tools] Problem with ISOS Post command from OBS sync * action #110629: openqa-label-known-issues: Fallback notification address in openqa-label-known-issues if no email address could be parsed from group_overview * action #110677: Investigation page shouldn't involve blocking long-running API routes size:M * action #110680: Overview page shouldn't allow long-running requests without limits size:M * action #110719: Fold ok clusters by default, unfold if there is any non-ok result * action #110785: OSD incident 2022-05-09: Many scheduled jobs not picked up despite idle workers, blocked by one worker instance that should be broken? * action #110881: Investigation jobs run because of the lack of automatic takeover size:S * action #110899: Missing audit events for comments * action #110911: Do not trigger investigation jobs and notifications on jobs where openQA automatically retries size:M * action #110914: Extend the group exclude regex from openqa-investigate to openqa-trigger-bisect-jobs * action #110983: Wrong signatures auto_review:"Too.*arguments for subroutine":retry * action #111004: Timeout of test API functions not enforced if backend gets stuck, e.g. 
on the VNC socket size:M * action #111028: Continuous update of o3 webUI * action #111066: Document suggested workflows for multiple teams reviewing openQA test results size:M * action #111152: Investigation jobs run because of the lack of automatic takeover size:M * action #111215: Various improvements for email notification about unreviewed jobs size:M * action #111251: Cover code of os-autoinst path OpenQA/ fully (statement coverage) size:M * action #111254: Cover code of os-autoinst path backend/ fully (statement coverage) size:M * action #111290: Troubles upgrading os-autoinst on my SLE15-SP3 workstation * action #111293: Template button for force_result size:M * action #111323: Simplified web proxy setup (remove path rewrite) * action #111329: openQA within kubernetes with tested helm charts size:M * action #111470: Reconsider folding ok results by default and/or uncollapse all on /tests/overview size:M * action #111539: [sporadic] audit log comment test addition fails sometimes size:M * action #111542: [sporadic] openQA test t/ui/15-comments.t fails in 'heading text' size:M * action #111545: openQA-in-openQA test fails in search due to new Firefox pop-ups. Can Firefox be asked to not create any pop-ups? 
auto_review:"(?s)openQA/login.pm:6 called testapi::assert_and_click.*match=openqa-login timed out" size:M * action #111590: [alert] HPC jobs not picked up for multiple days, job age alert triggered * action #111602: 18-qemu-options.t makes apparently unsafe assumptions about qemu behaviour with multiple params size:M * action #111605: Moving isotovideo version patch from RPM spec to cmake made it not work in tests * action #111608: 27-consoles-vnc.t 'update framebuffer' test fails on s390x (big-endian) auto_review:"(?s)s390x.*Test died: no candidate needle.*'installation'":retry * action #111770: Limit finished tests on /tests, but query configurable and show complete number of jobs size:S * action #111833: Allow tests/overview page to handle more than 500 jobs with result filter conditions size: M * coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4 * action #111881: Upgrade CI container image versions to Leap 15.4 size:M * action #111887: Upgrade OBS package CI checks to Leap 15.4 (os-autoinst+openQA) size:S * action #111920: [sporadic] Flaky test t/ui/10-tests_overview.t size:S * action #111989: Seems like o3 machines do not automatically reboot anymore, likely because we continuously call `zypper dup` so that the nightly upgrades don't find any changes? 
size:M * action #111992: Deal with QEMU and OVMF default resolution being 1280x800, affecting (at least) qxl size:M * action #112130: os-autoinst tests fail due to unexpected warning with perl 5.36 * action #112265: Just use bigint in all our database tables (for auto-incremented ID-columns) size:M * action #112403: 18-qemu-options.t fails on Leap 15.4 and Tumbleweed size:M * action #112433: openQA-inopenQA webui test failing * action #112484: https://openqa.opensuse.org/tests/?match=:investigate: shows the last match 15 days ago, so horribly broken since then and no investigation jobs size:M * action #112523: Make hook scripts restartable with a special exit code * action #112535: openQA-in-openQA test fails in shutdown: no matching needle 'root-console' size:S * action #112595: continous deployment installed old version of openQA due to timeout accessing a repo size:M * action #112736: Better alert based on 2022-06-18 incident size:M * action #112859: Conduct Five Whys for "[alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend" * action #112946: Extend openQA documentation with best practices what to do after migration, e.g. look at pg_stats size:S * action #113030: test distribution directory git revision can be parsed as "UNKNOWN" and openQA investigation fails to show test git log size:M * action #113039: Include original os-autoinst story within docs size:M * action #113138: sporadic failure in openQA test "t/ui/23-audit-log.t" size:M * action #113141: [sporadic] OBS checks fail os-autoinst test "Calling 'isotovideo --help' returns exit code 0" from t/44-scripts.t but only on aarch64? 
* action #113189: Research where we need limits size:S * action #113201: Integrate spike solution for accessing VMWare's VNC-over-websockets into os-autoinst's VNC console size:M * action #113219: Create a master label at job group level to label all the jobs in the same build * action #113282: Many incompletes due to VNC error "backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 183.", especially on o3/aarch64 size:M * action #113312: passwords (again) showing up in logs, this time in video_base size:M * action #113318: openQA live view stays blank when browser tab is staying open on scheduled jobs until jobs start size:M * action #113348: [timeboxed:2h] OBS checks fail os-autoinst test "Calling 'isotovideo --help' returns exit code 0" from t/44-scripts.t but only on aarch64? -- initial investigation of "--help" route size:S * action #113507: [logwarn] fatal: ambiguous argument '(unreadable git hash)..abcdef': unknown revision or path not in the working tree * action #113549: perl-Inline-Python will break on TW 0713+ size:M * action #113704: compile warning in tinycv.xs with newer GCC size:S * action #113776: [qa-tools] wrong test result 'failed' and wrong orders of running tests showed, autoinst-log is missing * action #113794: Use prepared OVMF image with expected settings size:M * action #114412: Add support for "wait_screen_change" with "no_wait" option to allow to use on cases like "wait for every character to be typed" size:M * action #114421: Add a limit where it makes sense after we have it for /tests, query configurable size:M * action #114451: Incidents from all test issues variables are collected during bisect size:M * action #114820: Error connecting to VNC over WebSockets server provided by VMWare * action #114829: o3 logreport: malformed JSON string - size:S * action #114878: isotovideo: Add option to disable color output size:S * action #114881: [sporadic] OBS checks fail os-autoinst test "exceeds runtime limit of '200' 
seconds " from t/27-consoles-vmware.t on ppc64le size:M * action #115022: Complete unit test coverage (with coverage analysis) in os-autoinst/wheel-launcher * action #115106: Cancelled jobs can end up being stuck associated with worker * action #115112: Conduct 5 Whys for "QEMU 6.2.0 assigns all CPUs to NUMA node 0 by default" size:M * action #115178: openqa-investigate: Ensure proper error handling size:M * action #115334: [tools][openqa-in-openqa] test fails in start_test - likely due to o3 temporarily not available * action #115784: openqa-bootstrap requires ssh key size:M * action #115943: openQA message box becomes invisible when scroll bar is present * action #116107: openQA-in-OpenQA openqa_from_containers test fails in build size:M * action #116134: o3 logreport: Unknown 'min2022-03-01T15:21' * action #116554: Make sleeping time in "no_wait" scenarios consistent size:M * action #116593: Dependency cron openqabot needs to be recovered or replaced size:M * action #116596: CircleCI not reporting subtests anymore size:M * action #116614: openqa-label-known-issues might label jobs incorrectly * action #117133: Improve NEEDLES_DIR documentation and/or behaviour * action #117136: Prevent the stale bot on github to close issues as they might still be valid just not worked on * action #117196: virsh domain XML does not specify VM ID which is required from VNC over WebSockets * action #117340: openQA-in-openQA test openqa_install+publish fails in start_test size:M * action #117352: OBS build fails in t/29-backend-generalhw.t size:M * action #117553: multiple people can not access openqa.suse.de but can access openqa.nue.suse.com, we should clarify the difference and maybe change our wording * action #117655: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does size:M * action #117784: [os-autoinst] Support both empty and no password for ssh connections * action #117925: generalhw workers running on Tumbleweed are currently 
broken (Tests on Raspberry Pi 2, 3, 4 on o3) * action #118351: [openQA] clone a job but read schedule from custom git branch * action #118597: openqa-in-openqa test does not stop immediately on "Connect timeout", missing pipefail option? Job fails later size:M * action #118633: Re-try on cloning of wheel repositories size:M * action #118969: [alert] web UI: Too many Minion job failures alert * action #119032: Help with dark mode development size:M * action #119077: openQA infrastructure issues for s390x and PowerPC * action #119110: Job status of running jobs on worker page is shown wrong * action #119182: openQA job that should download an ISO file specified in ISO_URL does not seem to have made any download attempt * action #119185: http://jenkins.qa.suse.de/job/gnome_next-openqa/ needs to be migrated to openqa-cli size:S * action #119245: openQA in openQA test fails in docker build * action #119362: [darkmode] Test overview table header is broken * action #119365: [darkmode] Some form buttons are invisible by default * action #119380: Pause on module failure in developer mode size:M * action #119386: [darkmode] Some large pages show up with light theme briefly before switching to dark theme * action #119461: Serial failure autodetection overrides test result when it shouldn't size:M * action #119464: The CI badge in https://github.com/os-autoinst/os-autoinst shows failed - cause: long commit msg size:M * action #119467: "Internal server error" on opening any job group front page at OSD * action #119473: [openqa][group overview] Group view is not available * action #119713: Leap tests are failing because of failed log file uploading in multiple tests on s390x size:M * action #119746: [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S * action #119866: o3 logreport message DBIx::Class::Row::update(): Can't update OpenQA::Schema::Result::Jobs=HASH(...): row not found at lib/OpenQA/Schema/Result/Jobs.pm line 1283 size:M * 
action #120040: OBS Package devel:openQA/openQA_container_worker failed to build in containers15.4/x86_64 size:S * action #120175: [sporadic]The CI badge in https://github.com/os-autoinst/os-autoinst shows failed - github <--> codecov api 404 size:M * action #120226: Ensure openQA t/full-stack.t is stable again and not tracked as unstable test size:S * action #120315: openqa-client does not get complete asset list size:S * action #120333: [os-autoinst][ipmi] Add support for ssh jump host in IPMI backend * action #120405: Failed log file tests are transferred from Leap to Tumbleweed for s390x size:M * action #120570: [qe-core][functional][tools] test fails in bootloader because root device is not ready and it leads to kernel panic size:M * action #120579: test fails in openqa_worker * action #120732: Reference more blogs on http://open.qa/documentation/ * action #120786: Jobs are now incomplete when postfail hook fails size:S * action #120841: Add pagination for GET /api/v1/assets size:M * action #120853: openQA-in-openQA test fails in openqa_webui size:M * action #120891: Product not scheduled: DBD::Pg::st execute failed: ERROR: deadlock detected size:M * action #121042: [sporadic] typing issue in comments UI test size:M * action #121045: Performance regression of `os-autoinst` size:M * action #121048: Add pagination for GET /api/v1/bugs * action #121054: bigint conversion fails due to idx_job_id_value_settings index being too wide size:S * action #121102: Add pagination for GET /api/v1/jobs size:M * action #121105: Add pagination for GET /api/v1/test_suites, GET /api/v1/test_suites/:id, GET /api/v1/machines, GET api/v1/machines/:id, GET /api/v1/products, GET /api/v1/products:id size:M * action #121108: Add pagination for GET /api/v1/workers * action #121222: Add ssh support to terraform recipe size:M * action #121357: [easy][beginner] Ensure 100% test coverage of t/ in os-autoinst size:M * action #121366: unhandled error in t/04-scheduler that is not causing the 
test to fail * action #121429: [qe-tools] qe-review bot sends wrong notification even the issue got fixed and the test case failed on different place size:M * action #121441: logreport o3: Can't call method "results" on an undefined value at /.../OpenQA/WebAPI/Controller/Test.pm line 735 * action #121444: logreport o3: Can't open file ".../details-gd.json": No such file or directory at OpenQA/Schema/Result/JobModules.pm line 95 size:M * action #121567: test fails in test_running * action #121768: openQA jobs have been restarted but are stuck in "running" size:M * action #121777: PostgreSQL update ordering deadlock on jobs table * action #122047: [tools] tesseract ocr test fails * action #122440: [sporadic] openQA Assetpack download can fail on initial download size:M * action #122458: O3 ipmi worker rebel:5 is broken size:M * action #122578: [alert] OpenQA logreport for ariel.suse-dmz.opensuse.org, problems connecting to the database when database shuts down size:M * action #122584: build-docs-nightly fails with "Error installing asciidoctor-pdf" size:M * action #122608: exit code of shell command not received by script_run * coordination #122650: [epic] Fix firewall block and improve error reporting when test fails in curl log upload * action #122659: Improved error reporting in openQA tests when curl times out on connection attempts * action #122782: CircleCI openQA "cache" ob fails with: Load key "/home/squamata/.ssh/id_rsa": invalid format * action #122929: [os-autoinst] Unhandled test output in t/18-backend-qemu.t size:S * action #123193: 02-test_ocr.t fails in OBS size:M * action #123445: [tools] ShellCheck test in os-autoinst fails in CI ( on Tumbleweed ) * action #123451: [retro] Open questions on how a ticket about update_install on PowerPC was handled size:M * action #123556: os-autoinst git cloning of test case repo can fail with auto_review:"fatal.*unable to access.*Connection timed out":retry size:M * action #123649: Error message in gru logs: Could 
not chdir back to start dir '' size:M * action #123661: Use non-personal or in-team tokens for openQA OBS CI integration size:M * action #123724: auto_review not working despite ticket in openQA auto review project size:M * action #123867: [sporadic][ci] circleCI job "build-docs-nightly" failed * action #123873: openQA test using wheel repo fails to clone: auto_review:"unable to access.*wheel.*Could not resolve host: github.com":retry * action #123888: [os-autoinst] Clone retry attempts seem to be the wrong way around, retrying when not necessary and vice versa size:M * action #124029: Bugrefs may be picked up even in clarifying comments and there's no clear way to tell if they will be * action #124143: openqa-in-openqa test fails because text color changed - missing CSS? size:M * action #124212: Unreviewed issue for "obvious" needle mismatch without any indication what unknown error was found size:M * action #124230: "Off-by-one" error when using the limit=1 openQA API parameter * action #124274: openQA reports non-sporadic issue when retry job just softfailed size:M * action #124316: [tools][openQA-in-openQA] test fails in openqa_worker auto_review:"zypper -n --gpg-auto-import-keys ref && zypper --no-cd -n in openQA-worker.*failed" size:M * coordination #124466: [epic] Put open points from okurz's hackweek 22 project into proper tickets * action #124469: Allow partial product retrigger size:M * action #124476: Rendering of git log is off by one on the webUI side. 
* action #124484: Flaky coverage in lib/openQA/Worker/Job.pm * action #124493: openqa-clone-job --skip-deps behavior contradicts documentation size:M * action #124497: [openQA-in-openQA] test fails in openqa_webui - "perl-Mojolicious-Plugin-Assetpack = 2.13" needed size:M * action #124502: [spike][timeboxed:20h] complete test definition from yaml schedule in git checked out test distribution * action #124565: Avoid sending mails about unreviewed e-mails by default size:S * action #124649: Spotty responses from OSD 2023-02-15 * action #124652: gtk glitch not showing dialog window decoration on openQA size:M * action #124670: obs: openQA does not build on ppcle64 anymore since we removed noarch size:M * action #124694: Redundant email about new comment in OBS * action #124739: o3 deployment failed due to "run(/usr/bin/ruby) failed: Permission denied at /usr/lib/perl5/vendor_perl/5.26.1/Mojolicious/Plugin/AssetPack/Pipe.pm line 27" and then "run(/bin/sass) failed: No such file or directory" size:M * action #124757: Move AssetPack plugins into a config file * action #124913: os-autoinst broken (incomplete) with Reason: isotovideo died: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC * action #124934: os-autoinst: Misleading error message in the openQA info panel reason size:M * action #124961: openQA restarts user_cancelled jobs with RETRY=N (N>0) size:M * action #124991: Copy ids of other investigate jobs to retry job * action #125237: os-autoinst codecov check "fully_covered" returns 99% but codecov reports look like 100% size:M * action #125276: Ensure that the incomplete jobs with "cache service full" are properly restarted size:M * action #125372: o3 jobs failed with auto_review:"api failure: 403 response: timestamp mismatch":retry because chrony is not installed on w19+w20 size:M * action #125378: [timeboxed:20h] Proof of concept for supporting creating demo videos with pauses and mouse moving size:M * action #125459: [o3-logwarn] error 
naive_verify_failed_return: Direct contact invalidated ID provider response. size:M * action #125663: [openqa-in-openqa][sporadic] Test is stuck on Firefox 'Make yourself at home' dialog auto_review:"Test died: no candidate needle.*openqa-logged-in.*matched" size:M * action #125720: [spike][timeboxed:20h] Add monitoring-support into openqa-cli * action #125723: Provide a ready-to-use container image or GitHub action repository to trigger/monitor openQA jobs as CI checks size:M * action #125903: Database connection error occurs when database is restarted size:M * action #126032: iso posts do not start all the children chain * action #126179: Some presentations reference the "openQA DSL" but open.qa doesn't mention it anywhere size:S * action #126188: [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that that leads to tangible test run failure size:M * action #126527: [spike] Parse comments to identify reproducible product issues using openqa-investigate size:M * action #126530: [openQA-in-openQA] Repository 'devel_openQA' is invalid * action #126623: Tags can't coexist with builds that contain colons size:M * action #126665: Worker did not upload details when running into error size:M * action #126680: [openQA-in-openQA] no candidate needle with tag(s) 'boot-menu, openqa-desktop' matched in worker size:M * action #126704: [regression] developer mode does not allow to create needle after use of "skip timeout" size:M * action #126950: [openQA-in-openQA] openQA tests in pull requests to github.com/os-autoinst/os-autoinst-distri-openQA/ size:M * action #126959: Make testapi::script_run die on timeout by default without needing any additional options * action #127034: [spike][timeboxed:20h] Run openQA (webUI+worker) based on SLE to find out problems size:M * action #127037: os-autoinst on SLE+packagehub size:M * action #127091: openQA default container should provide a certificate for starting * action #127139: Docs for containerized setup 
lead to errors and ambiguity * action #127280: Force_result comment cannot be added on a cancelled child job size:M * action #127412: [openQA-in-openQA] test fails in test_distribution size:M * action #127532: CI checks failing on os-autoinst master size:M * action #127541: Test os-autoinst+openQA against openSUSE:Backports:SLE-X in pull request OBS CI checks size:M * action #127622: [openQA-in-openQA] openqa_install+publish takes very long - Use example distribution instead of os-autoinst-distri-opensuse size:M * action #127688: Make 2FA mandatory for os-autoinst GitHub org size:M * action #127757: Cover SLE in openQA docs * action #127826: [tools] openqa-review CI pipelines failing * action #127883: Cleanup OBS project devel:openQA:Leap:15.4 * action #127949: [spike][timeboxed:20h] Research native GitHub for running openQA tests as CI checks size:M * action #128087: Regular cleanup of OBS project devel:openQA:* size:M * action #128129: codecov checks missing in pull requests size:M * action #128153: [sporadic] typing issue in comments UI test t/15-comments.t size:M * action #128216: isotovideo -d in empty directly does not tell what the error is anymore * action #128267: Restarting jobs (e.g. due to full cache queue) can lead to weird behavior for certain job dependencies (was: Ensure that the incomplete jobs with "cache service full" are properly restarted (take 2)) size:M * action #128276: Handle workers with busy cache service gracefully by a two-level wait size:M * action #128318: [spike][timeboxed:20h] Current openQA+os-autoinst+dependencies in pure SLE size:M * action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:M * action #128360: Supporting fork based development model size:M * action #128405: Missing investigate jobs on both o3+osd since months? 
size:M * action #128588: Investigation Tab broken for some jobs size:M * action #128591: [openqa_logwarn] logwarn reports the same entry over and over size:M * action #128603: Implement new multi-specfile approach for Factory submissions size:M * action #128651: [spike][timeboxed:20h] Current openQA+os-autoinst+dependencies updated in SLE+packagehub size:M * action #128783: failed job red is now too dark and similar to incomplete * action #128795: Error when registering a worker when base_url has trailing / (https://github.com/os-autoinst/openQA/issues/5115) size:M * action #128807: [sporadic] ci: openQA circleci fullstack-unstable fails with new error size:M * action #128909: Comments from investigation jobs contain warnings failing to parse SCC_ADDONS setting * action #128930: [os-autoinst][CI] Unhandled output in t/17-basetest.t size:S * action #128975: Try to fix openSUSE Krypton tests * action #129068: Limit the number of uploadable test result steps size:M * action #129262: [openQA][webui] Can not access or refresh openQA website due to SSL failure * action #129316: [spike][timeboxed:20h] openQA container on ALP * action #129340: [regression] openqa cannot start jobs with symlinked assets size:M * action #129412: Verify cleanup behavior of groupless job results * action #129487: high response times on osd - Limit the number of concurrent job upload handling on webUI side. Can we use a semaphore or lock using the database? 
size:M * action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features * action #129619: high response times on osd - simple limit of jobs running concurrently in openQA size:M * action #129724: [tools] test fails in test_distribution, can not reach server size:M * action #129730: Adapt http://open.qa/docs/#_running_openqa_jobs_as_ci_checks for the use of github pull_request_target size:M * action #129745: Enable apache response time alert and apache log alert again after we think it's good now size:M * action #129871: [tools] [openQA] Optically split build in group and search fields. * action #129877: Failed logrotate service on staging2 * action #129883: all-in-one openQA container solution * action #129946: [tools][ci][ui] Randomly failing t/ui/15-comments.t * action #129949: Enable build+test of openQA and deps on s390x * action #129955: Second attempt to try out AV1 video codec as potential new default as of 2023 size:M * action #129958: Ensure openSUSE container best practices are followed for our container images in devel: openQA size:M * action #130096: openqa-clone-custom-git-refspec modifies PRODUCTDIR setting * action #130258: [openqa_logwarn] git warning: exhaustive rename detection (investigation tab) size:M * action #130369: [spike][timeboxed:20h] Reduce duplication of openQA-in-openQA tests in os-autoinst-distri-opensuse and os-autoinst-distri-openQA size:S * action #130477: [O3]http connection to O3 repo is broken sporadically in virtualization tests, likely due to systemd dependencies on apache/nginx size:M * coordination #130582: [epic] Upgrade all our infrastructure, e.g. 
o3+osd workers+webui and production workloads, to openSUSE Leap 15.5 * action #130585: Upgrade o3 workers to openSUSE Leap 15.5 * action #130588: Upgrade osd workers to openSUSE Leap 15.5 * action #130591: Upgrade o3 webUI host to openSUSE Leap 15.5 * action #130594: Upgrade osd webUI host to openSUSE Leap 15.5 * action #130597: Upgrade CI container image versions to Leap 15.5 * action #130600: Upgrade qam hosts maintained by us to latest stable, i.e. Leap 15.5 * action #130603: Upgrade OBS package CI checks to Leap 15.5 (os-autoinst+openQA) size:M * action #130627: Ensure devel:openQA is built correctly for Leap 15.5 * action #130636: high response times on osd - Try nginx on OSD size:S * action #130648: Upgrade all other LSG QE salt controlled machines to openSUSE Leap 15.5 * action #130682: perl-DBD-Pg failed to built correctly in devel:openQA:Leap:15.5 size:M * action #130922: Provide a way to get a VM snapshot at a certain step size:M * action #130934: Trigger openQA tests mentioned in github description as part of CI size:M * action #130940: Trigger openQA tests mentioned in github comments as part of automatic testing as well - trusted group "tests-maintainer" only size:M * action #130943: Test parameterization for github description/comments mentioned openQA job clones as part of CI size:S * action #131012: [openqa_logwarn] Nested quantifiers in regex;lib/OpenQA/Schema/Result/JobGroups.pm size:M * action #131024: Ensure both nginx+apache are properly covered in packages+testing+documentation size:S * action #131027: openqa-trigger-from-obs circleCI job running against master fails due to "Package 'python3-black' not found." 
size:M * action #131102: [openQA-in-openQA] test fails in test_running * action #131204: openqa-trigger-bisect-jobs is called on passed jobs size:M * action #131279: [timeboxed:6h][spike solution] a single command line or openQA webUI search view to show all tests blocking an incident by squad size:S * action #131447: Some jobs incomplete due to auto_review:"api failure: 400.*/tmp/.*png.*No space left on device.*Utils.pm line 285":retry but enough space visible on machines * action #131465: Make temporary files and directories created by openQA services easier to identify size:M * action #131486: Fix comment posting in openqa-trigger-bisect-jobs * action #131555: Broken OBS CI integration for os-autoinst size:S * action #132125: Automatic submission from devel:openQA:tested into openSUSE:Factory no longer working for os-autoinst size:M * action #132167: asset uploading failed with http status 502 size:M * action #132236: Current openQA+os-autoinst+dependencies updated in SLE+packagehub size:M * action #132272: Identify reproducible *TEST* issues (not product issues anymore) using openqa-investigate size:M * action #132332: Multiple investigation comments for multimachine tests size:M * action #132335: In openqa-in-openqa use scenario definitions instead of job group templates size:M * action #132410: [sporadic][ci] t/ui/18-tests-details.t fails size:M * action #132413: Missing url encoding for build links in test overview page * action #132434: Minion job `cache_asset` failed and jobs runs into `setup exceeded MAX_SETUP_TIME` size:M * action #132446: Review how the SUSE QE Tools team does without moderation duty six weeks in size:M * action #132545: [o3-logwarn] invalid input syntax for type bigint * action #132656: Runs only the last Python test in os-autoinst size:M * action #132665: [alert] openqa-label-known-issues-and-investigate minion hook failed on o3 size:S * action #132668: [alert] Cron job from openqa-service failed: fetch_openqa_bugs size:M * action 
#133025: Configure Virtual Interfaces instructions do not work on Leap 15.5 size:M * action #133232: o3 hook scripts are triggered but no comment shows up on job * action #133301: quick-fix for openQA-in-openQA lockscreen race condition * action #133349: openQA-in-openQA tests fail in zypper command with 'Maximum (6) redirects followed' auto_review:"Test died.*retry -s 30.*zypper.*nginx":retry size:S * action #133352: Activating systemd target openqa-worker.target when openqa-worker-auto-restart@ is already used causes havoc size:M * action #133451: Backend dies when passing non-US-keyboard letters to the 'type_string' function size:M * action #133496: [openqa-in-openQA] test fails due to not loaded install test modules * action #133520: [tools] please add useful information for openqa-clone-custom-git-refspec * action #133673: nginx on ariel logs in multiple locations size:M * action #133769: hydra.opensuse.org causing excessive load on o3 size:S * action #133772: Munin making many requests on o3 size:M * action #133889: [alert] Minion jobs failed hook alert * action #133979: test fails in openqa_webui * action #134114: Ensure to call OpenQA::Setup::read_config in unit tests * action #134279: jenkins build submit-openQA-TW-to-oS_Fctry failed * action #134390: Log proactively where wheel components are located to ease debugging size:M * action #134600: [OSD] Figure out that some 64bit-ipmi workers change to iPXE bootmod size:M * action #134693: Only one or two jobs running on osd for several hours * action #134717: salt-states-openqa pipeline fails because of telegraf checks * action #134732: o3 logreport and osd journal - Publishing opensuse.openqa.job.done failed: Connect timeout * action #134810: [tools] GitlabCI deploy on salt-states-openqa took too much time * action #134837: SLE test repo not updated on OSD, cron service was not running since 2023-08-29, fetchneedles not called size:M * action #134876: Many test suites failed to be triggered in SLE16SP6 Build 
16.1 * action #134933: Filter openQA todo-jobs on /tests belonging to one "review squad" size:M * action #135035: Optionally restrict multimachine jobs to a single worker * coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert * action #135134: [tools] GitlabCI salt-pillars-openqa deploy failed on baremetal-support.qa.suse.de * action #135239: Conduct lessons learned "Five Why" analysis for "OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert" size:M * action #135347: openqa-investigate needs more documentation * action #135362: Optimize worker status update handling in websocket server size:M * action #135407: [tools] Measure to mitigate websockets overload by workers and revert it size:M * action #135482: Move to systemd journal only on o3+osd (was: Missing openqa_websockets log file on OSD for websocket server) size:M * action #135521: os-autoinst test t/14-isotovideo.t commonly runs into timeout executed locally, in subtest 5 "standard tests based on simple vars.json file" size:M * action #135803: hook_scripts apparently stuck for 8h (by now back to good) size:M * action #135914: Extend/add initial validation steps and "best practices" for multi-machine test setup/debugging to openQA documentation size:M * action #136013: Ensure IP forwarding is persistent for multi-machine tests also in our salt recipes size:M * action #136151: qem-bot unit testing broken with openqa-client==4.2.2 * action #136154: multimachine tests restarted by RETRY test variable end up without the proper dependency size:M * action #136244: Ensure arbitrary comments can be taken over to new jobs size:M * action #136364: openQA-in-openQA tests fail due to new gnome version * action #137105: Handle Perl deprecation warning messages gracefully (Mojo::File::spurt is deprecated in favor of Mojo::File::spew) size:M * action #137300: [FIRING:1] (Incomplete jobs (not restarted) of last 24h alert Salt size:M * action 
#137303: CircleCI t/api/14-plugin_obs_rsync_async.t failure size:M * action #137426: test fails in openqa_webui auto_review:"Test died:.*Listening at.*timed out" * action #137525: os-autoinst codecov "fully_covered" check fails in unrelated PRs, maybe since https://github.com/os-autoinst/os-autoinst/pull/2364 * action #137765: logwarn does not work on new o3 (anymore?) size:M * action #137825: Urgent/Immediate tickets can only be in new/workable/progress/resolved - this needs to be mentioned in the wiki and also reflected in the backlog status * action #137828: [spike solution][timeboxed:10h] Notification if one of the queries on https://os-autoinst.github.io/qa-tools-backlog-assistant/ is red, e.g. write email to our Slack or o3-admins from backlogger size:S * action #137831: Reduce limit on feedback tickets to 10 * action #137834: Introduce a rule or guideline how to communicate clearly who is on the next steps size:S * action #137837: [spike solution][timeboxed:10h] Come up with a ticket template extension that strongly encourages reproducers and impact be included size:S * action #138029: [research][timeboxed:10h] How to cache "wheel" repositories which are stored on github size:M * action #138032: Find out most/least used testapi functions to decide about where to extend/cleanup size:M * action #138203: [openQA-in-openQA] CI jobs show error but don't fail the CI job as they should *and* openqa_install+publish missing size:M * action #138287: petrol sometimes take a long time to respond/render http://localhost:9530/influxdb/minion * action #138299: Make the final aggregation messages from openqa-investigate more prominent size:S * action #138302: Ensure automated openQA tests verify that os-autoinst-setup-multi-machine sets up valid networking size:M * action #138416: Unify GitHub Actions for QA Projects size:M * action #138440: devel:openQA container files follow cha.obs.supported_formats#id-1.5.10.8.5 and set BuildVersion+BuildName consistently size:S * 
action #138464: [qe-tools] openqa-worker stopped to work or wait for : A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 114.58 s * action #138593: Restart of scheduled products is prone to retriggers by humans * action #138674: Reusable github workflow in openQA is causing a problem for os-autoinst-bot dependency PR * action #138698: significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup size:M * action #139055: Comments mentioning bugrefs as part of a sentence are treated like bug refs and taken over size:S * action #139073: ObsRsync plugin needs to support authentication with 2FA size:M * action #139136: Conduct "lessons learned" with Five Why analysis for "test fails in iscsi_client due to salt 'host'/'nodename' confusion" size:M * action #139262: Request for optimse opening LTP test case * action #150917: Restarting a job together with failed children will break dependencies of the new job * action #150959: openQA fails to build on SLE/Leap 15.6 Backports due to test-unit-and-integration failures size:M * action #150992: [timeboxed][spike solution:20h] openQA tests in pull requests to github.com/os-autoinst/os-autoinst-distri-opensuse/ size:M * action #151138: [openQA][aarch64][media] 15-SP6 Build39.1 media for aarch64 cleaned up while suse.asia instance was still using it size:S * action #151219: [bug][os-autoinst-distri-opensuse] No .perltidyrc * action #151258: Boot order handling in os-autoinst has various problems due to confusion between `-boot order`, `-boot once` and `bootindex`, e.g. 
PXEBOOT=once does not work as intended size:M * action #151310: [regression] significant increase of parallel_failed+failed since 2023-11-21 size:M * action #151399: Identify reproducible *infrastructure* issues using openqa-investigate size:M * action #151402: [spike solution][timeboxed:20h] Allow to search for tests by comment on the UI size:M * action #151522: openqa-clone-job succeeds even when assets are missing size:M * action #151576: Provide FQDN worker information in the openQA webUI * action #152077: [sporadic][unstable] openQA unit test t/ui/15-comments.t failed "four pagination buttons present (one is >>)" size:M * action #152170: Run openQA tests in pull requests to github.com/os-autoinst/os-autoinst-distri-opensuse/ size:M * action #152281: Schedule openQA SLE maintenance bisect jobs with lower priority same as openqa-investigate * action #152287: [FIRING:1] (Packet loss between worker hosts and other hosts alert Salt 2Z025iB4km): qe-jumpy.suse.de unreachable * action #152365: os-autoinst-openvswitch.service fails on start-up size:S * action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M * action #152392: Why does the investigation diff show PRJDIR value "/var/lib/openqa/cache/openqa.suse.de" vs. "/var/lib/openqa/share" when both should use the cache service? 
size:M * action #152485: test fails in mr_test * action #152545: Files have been deleted from the cache while the job was running size:M * action #152560: [alert] Incomplete jobs (not restarted) of last 24h alert Salt * action #152569: Many incomplete jobs endlessly restarted over several weeks size:M * action #152657: os-autoinst-scripts OBS build fails * action #152681: openqa-investigate creates wrong CASEDIR setting for some tests size:M * coordination #152847: [epic] version control awareness within openQA for test distributions * action #152853: Prevent faulty openQA workers causing wrong openqa-investigate conclusions size:M * action #152855: ci: Warning in test output about Devel::Cover: This version of Devel::Cover was built with Perl version 5.038000. size:M * action #152889: test fails in boot * action #152939: Find "last build" of a product over API size:M * action #152957: Weekly evaluation of cycle-times within SUSE QE Tools size:S * action #153145: [ci] Failing openQA pull requests in generate-packed-assets step * action #153235: Running jobs using --no-cleanup worker fails when using SKIPTO * action #153340: OBS CI checks for os-autoinst subpackages size:M * action #153346: Update Grafana API call to get firing alerts to new API size:M * action #153427: Improve updating cached assets size:M * action #153466: Notification if one of the queries on https://os-autoinst.github.io/qa-tools-backlog-assistant/ is red, e.g. write email to our Slack or o3-admins from backlogger size:M * action #153475: Reconsider the formatting of variable-names in the reason field, e.g. 
"$auto_clone_regex" size:S * action #153499: Ensure the openQA developer mode works straight-forward in the container setup when following our documentation size:M * action #153616: Prevent `duplicate key value violates unique constraint` on image uploads size:S * action #153763: [regression] Fix dry-run of openqa-investigate size:S * action #153769: Better handle changes in GRE tunnel configuration size:M * action #153793: Ensure proper whitespace in os-autoinst scripts size:M * action #153859: Preview images for backlog status are broken when they shouldn't size:S * action #153874: [regression] openqa-investigate post investigate feature is broken * action #154021: [alert] Ratio of not restarted multi-machine tests by result * action #154156: [spike][timeboxed:10h] Cache test distributions from git on production size:S * action #154237: [spike][timeboxed:10h] Ensure the worker cache doesn't duplicate git caching of test distributions on o3 size:S * action #154240: Ensure cloning openQA jobs with GIT_CACHE_DIR works in usual use cases * action #154261: [spike][timeboxed:20h] batch commenting on all openQA jobs, e.g. 
involving a specified SLE maintenance incident in webUI size:M * action #154537: Clicking "Apply" on /tests/overview filter box looses all distri selections but one size:M * action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de * action #154723: Complete list of openQA+os-autoinst+dependencies packages not currently in current SLE in development is known size:M * action #154783: [spike][timeboxed:10h] Run os-autoinst-distri-example directly from git and ensure candidate needles show up on the web UI size:S * action #154816: "Mojo::Reactor::Poll: Timer failed: Invalid characters in X-API-Key" when starting registry.suse.de/home/okurz/bci/15.5/containers_backports_updates/suse/openqa-single-instance:latest * action #155062: Unify GitHub Actions for QA Projects - perltidy in os-autoinst size:M * action #155068: [sporadic] failing openQA unit test t/ui/16-tests_job_next_previous.t "Failed test '.dataTables_wrapper present'" * action #155170: [openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed size:M * action #155173: [openqa-in-openqa] [sporadic] test fails in openqa_worker: os-autoinst-setup-multi-machine timed out size:M * action #155218: [spike][timeboxed:30h] Use scenario definitions instead of job group templates for os-autoinst-distri-opensuse size:M * action #155278: o3 aarch64 multi-machine tests on openqaworker-arm21 and 22 fail to resolve codecs.opensuse.org size:M * action #155305: Ensure our commonly used openQA installation autoyast profile covers chronyd size:M * action #155512: Complete list of openQA+os-autoinst+dependencies packages not currently in current SLE in development is known - automatically and recurringly generated size:M * action #155713: openQA local tests should work out of the box even if node assets are not yet available size:M * action #155716: [alert] openqa-worker-cacheservice fails to start on worker29.oqa.prg2.suse.org with "Database has 
been corrupted: DBD::SQLite::db commit failed: disk I/O error" size:S * action #156052: [alert] Scripts CI pipeline failing after logging multiple Job state of job ID 13603796: running, waiting size:S * action #156394: [tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] size:M * action #156625: [alert] Scripts CI pipeline failing due to osd yielding 503 - take 2 size:M * action #156748: investigation script should not schedule a job with a broken CASEDIR size:S * action #156754: "DBIx::Class::Row::update(): Can't update OpenQA::Schema::Result::JobLocks=HASH(0x55b77ea45e28): row not found at /usr/share/openqa/script/../lib/OpenQA/Resource/Locks.pm line 139" size:S * action #156769: openQA nightly documentation build CI jobs fail with "ERROR: Error installing asciidoctor-pdf: /usr/bin/ruby.ruby2.5…can't find header files for ruby", possibly needs update to more recent ruby version? size:S * action #156907: [tools][qe-core][leap15.6 Beta]test fails in openqa_bootstrap with Beta build size:M * action #157018: [sporadic] Build failed in Jenkins: submit-openQA-TW-to-oS_Fctry - Error 503: Service Unavailable size:S * action #157147: Documentation for OSD worker region, location, datacenter keys in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls size:S * action #157270: [spike solution][timeboxed:20h] Run os-autoinst-distri-openQA directly from git without anything related in /var/lib/openqa/share/tests * action #157273: Run os-autoinst-distri-openQA directly from git without anything related in /var/lib/openqa/share/tests - Follow-up with ideas not tried out in the spike solution #157270 size:M * action #157333: Log all job setting changes in autoinst-log.txt * action #157339: os-autoinst t/14-isotovideo.t is again taking too long (>20s on my setup) size:M * action #157369: Handle all node dependabot updates, not just security updates in our usual work processes * 
action #157432: parted /dev/sda disk got error at powerVM worker * action #157534: Multi-Machine Job fails in suseconnect_scc due to worker class misconfiguration when we introduced prg2e machines * coordination #157537: [epic] Secure setup of openQA test machines with secure network+secure authentication * action #157540: [sporadic] ci openQA: t/33-developer_mode.t fails size:M * action #157543: [sporadic] ci openQA: t/ui/23-audit-log.t fails size:M * action #157576: Cropped openQA version string on bottom of page if custom links_footer_left and/or links_footer_right is used size:S * action #157774: Empty scenario definitions causing Use of uninitialized value $testsuite_name in hash element in lib/OpenQA/Schema/Result/JobGroups.pm size:S * action #157912: Scheduling a product can fail on asset creation because unique constraint is violated * coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6 * action #157984: Upgrade CI container image versions to Leap 15.6 * action #157987: Upgrade qam hosts maintained by us to latest stable, i.e. 
Leap 15.6 * action #157990: Upgrade OBS package CI checks to Leap 15.6 (os-autoinst+openQA) size:M * action #157993: Ensure devel:openQA is built correctly for Leap 15.6 * action #157996: Upgrade all other LSG QE salt controlled machines to openSUSE Leap 15.6 * action #158125: typing issue on ppc64 worker - only pick up (or start) new jobs if CPU load is below configured threshold size:M * action #158236: Backlog Limits Checker github workflow fails on pull requests from forks size:S * action #158422: flaky sporadic test failures t/ui/13-admin.t * action #158455: [spike][timeboxed:10h] openQA worker native on s390x * action #158553: [tools] test fails in openqa_worker trying to download rpm files from devel:openQA ending in HTTP error 404 despite retries * action #158628: Prevent passwords being logged in s390x kvm test cases * action #158805: Job details on minion dashboard inaccessible, observed on both o3+osd * action #158808: Prevent HTTP response codes 500 as observed in OSD monitoring size:M * action #158814: o3 logreport: Active version 24 is greater than the latest version 23 at .../Mojo/Pg.pm size:M * action #158826: openqa-in-openqa - test fails in start_test due to selecting the wrong base qcow image size:S * action #158985: openQA worker native on s390x * action #159168: [openqa-in-openqa] Builds in openQA job group very broken since 2024-04-16 * action #159171: Create and maintain up to date version of test distri/needles for webui size:M * action #159348: s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" size:M * action #159384: Add CORS headers size:S * action #159408: Upgrade bootstrap from 4.6.1 to 5.3.3 size:M * action #159411: Upgrade codemirror from 5.58.2 to 6.0.1 size:M * action #159444: Many minion jobs failing with rc_hook error code because progress is unavailable * action #159447: logreport o3: Can't open file ".../libssh-18.txt": 
No such file or directory at OpenQA/Schema/Result/JobModules.pm line 113 size:M * action #159663: Prevent missing js assets in PR checks as observed in #159411-11 * action #159720: [OBS] openQA packages failing with "File not found: /home/abuild/…/usr/share/openqa/node_modules" apparently on all architectures but x86_64 * action #159747: Consolidate openQA related containers size:S * action #159792: Add better logging for 500 errors on websocket routes size:M * action #159888: Open pull requests on Backlog Limits Checker use up too much space * action #160089: Handle uncommented package lock on "kernel-default" and "kernel-default-base" on openqa-piworker * action #160095: s390x libvirt started kvm machines on Leap 15.6 fail with "unsupported configuration: machine type 's390-ccw-virtio-8.2' does not support ACPI" * action #160098: After upgrade to Leap 15.6 osiris again showed no proper mount points for libvirt VMs * action #160111: Latest SR of openQA to fctry was declined, https://build.opensuse.org/request/show/1172817 * action #17412: [qam] Work for Maintenance Incidents tests * action #46502: [tools] Support "record_info" in serial_failure_detection * action #69328: [o3][s390x] Early fail on s390x workers: connection refused * action #69432: test fails with no module details after boot_ltp, broken run-time scheduling? 
* action #75373: Reserve repo links of SLE-15-SP2-Online-s390x-GM-Media1 and SLE-15-SP2-Online-x86_64-Build209.2-Media1 * action #80670: [opensuse][gnuhealth] test fails in gnuhealth_client_preconfigure due to incorrect needle about "host field selected" when it is not selected * action #89990: Error on tests/kernel/run_ltp.pm: Can't locate sle/tests/kernel/run_ltp.pm * action #91341: [qac][jeos][opensuse] test fails in vim: ssh test use the wrong console font * action #93826: test fails in openqa_webui - openqa_from_containers - package installation times out, just seems to be slow due to recent opensuse mirror problems * action #94465: [tools] zkvm tests are scheduled by retriggering month old jobs even though we do not have any "svirt" workers anymore * action #94714: [tools] Prevent rogue SCC servers to break maintenance tests auto_review:"2021-06-25.*no candidate needle.*scc-invalid-url":retry * action #99345: [tools][qem] Incomplete test runs on s390x with auto_review:"backend died: Error connecting to VNC server.*s390.*Connection timed out":retry size:M * action #105506: [sporadic][tools] openQA-in-openQA test sporadically fails in shutdown * action #106257: [sle][security][migration][PowerVM][hardware]test fails in await_install which is caused by disk error on redcurrant-*.qa * action #108953: [tools] Performance issues in some s390 workers * action #109046: [tools] auto_review:"Unable to find image SLES15-SP3-JeOS.x86_64-15.3-kvm-and-xen-GM.qcow2.*svirt":retry * action #109737: [opensuse][sporadic] test fails in chromium due to lost characters when typing in the address bar size:M * action #110803: [tools] openqa_from_containers: test fails in setup_env size:S * action #111416: [tools] Firefox: new screen for total cookie protection * action #111854: Leap 15.4: Please the fix link on openqa.opensuse.org for goldmaster DVD (maintenance) * action #112415: [qa-tools] handle new openSUSE-Leap-Micro-5.2 * action #114944: test fails in openqa_webui due to 
missing Tumbleweed assets * action #116704: [qe-core] nvme@uefi: boot from DVD post install instead of from HDD * action #117280: [sporadic] openQA-in-openQA test fails in openqa_worker trying to download a file, more retries? size:S * action #119566: [qe-core] Allow to send keyboard commands to rpi realhw SUTs size:M * action #119851: backend died: Failed to select socket for reading * action #119923: [tools][qe-core][leap-on-osd] test fails in setup_online_repos or await_install due to seemingly very slow repositories from download.opensuse.org * action #120288: [tools] cloud based tests fail due to traffic to cloud blocked auto_review:"2022-11-0.*Test died: (Waiting for Godot.*ssh|Cannot find image after upload)":retry * action #122830: [tools][openQA-in-openQA][sporadic] test fails in login, likely due to new Firefox popup about a "survey" size:M * action #123106: [tools][openQA-in-openQA][sporadic] test fails in docker build size:M * action #123496: [tools][openQA-in-openQA][sporadic] test fails in worker because of GNOME authentication dialog size:M * action #123797: Continuous retriggers of GNOME:Next openQA tests * action #123864: [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M * action #124538: generation of png from sound files creates a backtrace auto_review:"sh:.*snd2png.*HASH":retry * action #125507: [qe-core] test fails in firefox: new 'Still trying Firefox' popup needs handling * action #125783: [jeos] Test fails in kdump_and_crash on SLE 12sp5 and 15sp4 XEN after worker migration from SLES to Leap 15.4 * action #125909: [tools] Support the task "openQA: eliminate i586 from openSUSE:Factory" size:M * action #126734: incomplete job with 404 for hdd asset: sle-15-SP4-Server-DVD-Updates-s390x-Build20230326-1-mau-extratests_fips_kernelmode@s390x-kvm-linuxOne 0 * action #126923: [tools] cannot clone job from O3 because assert cannot be downloaded * action #127550: test fails in bootloader_zkvm * action #128804: 
[openQA-in-openQA] openqa_from_containers test fails in search * action #128810: [openQA-in-openQA] openqa_from_git fails in search * action #130510: [tools] https://github.com/openSUSE/os-autoinst-distri-opensuse should be deleted, since not used in favor to https://github.com/os-autoinst/os-autoinst-distri-opensuse * action #131051: [openQA-in-openQA] test fails in openqa_worker: python310 vs python311 * action #134285: [tools] Openqa bootstrap: test fails in test_results size:M auto_review:"Test died: no candidate needle with tag.+openqa-testresult" * action #135143: [tools] test fails in openqa_from_git -> dashboard size:M auto_review:"no candidate needle.*boot-menu, openqa-desktop":retry * action #135731: [tools][opensuse] Enable Slowroll tests size:M * action #136130: test fails in iscsi_client due to salt 'host'/'nodename' confusion size:M * action #137006: Maintenance bot re-schedules incident each time it runs job "schedule incident" * action #137384: [tools][s390x] worker imagetester can't reach SUT auto_review:"backend done: Error connecting to : Connection timed out" size:M * action #150911: remote_{vnc,ssh}_controller: unable to refresh repo download.o.o * action #151612: [kernel][tools] test fails in suseconnect_scc - SUT times out trying to reach https://scc.suse.com * action #152461: [core][tools] test fails in various s390x-kvm tests with "s390x-kvm[\S\s]*(command 'zypper -n in[^\n]*timed out|sh install_k3s.sh[^\n]*failed)" * action #152667: openQA-in-openQA tests fail since 2023-12-14 in gnome login dialog * action #152755: [tools] test fails in scc_registration - SCC not reachable despite not running multi-machine tests? 
size:M * action #153057: [tools] test fails in bootloader_start because openQA can not boot for s390x size:M * action #156067: [alert] test fails in setup_multimachine * action #156919: [sporadic] test fails in start_test - openqa-cli api request to o3 itself often times out trying to get latest 'ping_client' auto_review:"openqa-cli api --host.*openqa.opensuse.org.*ping_client":retry size:S * action #157414: Network broken with multimachine on multiple workers (broken packet forwarding / NAT) size:M * action #157864: [qe-core][jeos] test fails in apache_ssl * action #158245: test fails in openqa_worker * action #159558: network unreachable on aarch64-o3 * coordination #37910: [tools][epic] Migration of or away from qanet.qa.suse.de * action #38012: [tools][labs][medium] Setup DHCPv6 and DNS AAAA records for VLAN12 * action #38018: [labs][tools] Setup new qanet * action #52655: [epic] Move openqa-review from cron-jobs on lord.arch to a more sustainable long-term solution * coordination #69310: [epic] SUSE QA tools team ticket process helpers * action #69322: Automatic check for SUSE QA tools WIP-Limit based on tickets * action #73468: SUSE QA tools team ticket process helpers: Set due date on tickets in redmine based on SLOs * coordination #77899: [epic] Extend "auto-review" for failed jobs as well * action #77944: Run "auto-review" more often but alarm less * action #78187: Import and enable Testreport data into SMELT * action #80414: [proof-of-concept] Extend "auto-review" for failed jobs as well, start with o3 * action #80418: [learning] Fix parse errors in "openqa-investigate" "parse error: Invalid numeric literal at line 1, column 10" * action #80690: [qem] Testing of packages in different codestreams but in one incident * action #80806: Extend "auto-review" for failed jobs as well - Generalize openqa-monitor-investigation-candidates to look at more than just one job group * action #80808: Extend "auto-review" for failed jobs as well - enable same as on o3 but 
on osd * action #81192: [tools] Migrate (upgrade or replace) qanet.qa.suse.de to a supported, current OS size:M * action #81200: [tools][labs] some partitions on qanet are 100% full, seems like /data/backups has no new archives since 20201009 due to that * action #87755: [teregen] Replace productdefs by API call size:M * action #88127: [tools][qem] Test coverage DB for maintenance updates * action #88183: [spike][timeboxed:20h] rancher: Create a simple selenium test for the UI as proof of concept * action #88485: [teregen] Fetch and store coverage info for each incident * action #89080: Flatpak OBS builds fail on tumbleweed * action #89491: Add flatpak build test to obs-build CI * action #90047: Improve dependency detection in cpanspec * action #90401: [teregen] Integrate coverage information in a presentable way into test template * action #90404: [teregen] Update TeReGen for deployment on qam2 * action #90428: proposal: Plan to demo features in weekly meetings depending on due-date * action #90441: Only set due date on tickets in progress * action #90443: Reset due date on tickets when it doesn't apply / only apply due-date when assignee actually works on tickets * action #90764: auto-merge github feature can be enabled for os-autoinst-distri-opensuse * action #91356: Save openqa-review reports as gitlab CI artifacts * coordination #91646: [saga][epic] SUSE Maintenance QA workflows with fully automated testing, approval and release * action #91665: Add projects that we maintain to "SUSE Projects" list * action #91674: SUSE QE Tools process proposal: Go further on regression fixing * action #92007: proposal: Move SUSE QE Tools workshop to 0700 UTC after multiple conflicts * action #92122: Improve dependency detection via MYMETA.json in cpanspec * action #92125: Move "MR" on submission tests into a separate job group * action #92149: crosscheck status/goal/maintainership of a machine "qam2" * action #92221: support for SLE-Module-NVIDIA-Compute_15 * action #92287: 
Document access to raw openqa database * action #92341: fix potential leak of tempfiles of openqa-label-known-issues * action #92755: Investigate problems with post-commit hook for testreports * action #92851: Workshop series proposal "How SUSE QE teams review openQA test results" * action #93522: [tools][qem] auto-approval of kgrafts and live-patches * action #93710: Reference individual openqa-review reports in gitlab CI artifacts, e.g. using gitlab pages * action #93799: teregen: Improvement of usability of disabled testcases notification size:M * action #93838: Use new OBS SCM integration to trigger OBS checks on pull/merge requests for our projects * action #93916: idea from SUSE QE Tools retro 2021-06-11: Ask stakeholders for their list of priorities regarding our tasks * action #93931: [tools][qem] MTUI support multiple versions of package in incident * action #93934: [tools][qem] template generator - multiple package version in incident * action #93943: openqa-review pipeline fails accessing OSD test overview pages sometimes, more retries? * action #93988: build openqa-review container within OBS with ca-certificates-suse package * action #94246: Collect data on update/packages and maintenance workload for 2020/2021 * action #94486: Complete and deploy testreport db and page on smelt * action #94661: [teregen] template generator - make sure to skip SUSE:Updates:openSUSE-* release targets for test report generation * action #95000: Remove SUSE-internal links to line teams in our team roster which should not be needed size:XS * action #95033: openqa-review fails upon trying to access openqa with no-urlencoded addresses * action #95221: Provide more people with administrative access to services on qam2.suse.de, at least qa-maintenance/openQABot, i.e. 
increase bus factor size:M * action #95254: [teregen] Out of memory exception during template generation * action #95335: openqa-review: Only send the openqa_suse_de_status to openqa-suse-status@suse.de, ignore all others size:S * action #95503: qam.suse.de api returns 500(Internal Server Error) for requests with valid logs * action #95742: In openqa-investigate jobs add URL to original job as setting * action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M * action #95765: Provide more people with administrative access to services on qam2.suse.de, adding ssh keys for existing tools team members * action #95822: qa-maintenance/openQABot failed to trigger aggregate tests with "urllib.error.HTTPError: HTTP Error 500: Internal Server Error" * action #95854: Grafana doesn't show information during some minutes, but also we got alerts on the CPU after the recovery * action #95989: openqa-review gitlab CI pipeline jobs fail with "AttributeError: 'NoneType' object has no attribute 'group'" * action #96016: [oscqam] Traceback on running ibs qam assigned -G qam-manager * action #96350: Improve openqa-review generation: Add date to index size:S * action #96353: Improve openqa-review generation: If individual reports can not be generated, mention that as well with an explicit message * action #96356: Improve openqa-review generation: Try to preserve old reports if individual new ones can not be generated * action #96362: Improve openqa-review generation: Passthrough all vars and use git managed file for config * action #96539: Conclude migration of qam.suse.de * action #96560: openqa-review gitlab CI pipeline failed to access remote repo * action #96752: 'openSUSE-SLE' product schedule jobs to OSD * action #96792: openqa-review index page doesn't fail when a report is missing * action #96827: gitlab CI pipeline failed with Job failed: pod status is Failed size:S * action #96899: jenkins.qa.suse.de is unavailable, error from 
/var/log/jenkins/jenkins.log "java.io.IOException: NSS initialization failed" size:M * action #96968: qem-dashboard jobs api extension * action #96998: Increase bus factor for bot-ng size:M * action #97091: teach SUSE QE Tools team about "Extreme Programming" and Kanban size:M * action #97367: Update CI image QA:Maintenance/openSUSE-Leap-Container to Leap 15.3 * action #97403: openqa-review: Polish job group section titles in todo-only mode size:S * action #97481: qam dashboard including development job group * action #97535: cpanspec: Fix package (dependency and missing script) * action #97661: [retro] find a better time for "estimation" meeting which is APAC friendly size:S * action #97733: Bot fails on Failed to query latest publiccloud tools image using {settings['PUBLICCLOUD_TOOLS_IMAGE_QUERY']} and no aggregates are scheduled * action #97847: Feature request: Ability for bot-ng to schedule test runs on custom openQA instance * action #97955: [openqabot] Possible TypeError during execution "'NoneType' object is not subscriptable" * action #98198: [bot-ng] Fix auto-approve on updates without Incidents size:M * action #98262: MTUI fails to export version of the package prior installation of the update * action #98394: Fix MicroOS scheduling to have incidents * coordination #98457: [epic] Handle openqa-review reminder comments on very old jobs better * action #98475: links in https://progress.opensuse.org/projects/qa/wiki#Definition-of-DONE are dead size:S * action #98637: [timeboxed:20h] try to enable comments on IBS (and smelt) again from SUSE QA maintenance openQA test results size:M * action #98667: Unhandled [Alerting] Queue: State (SUSE) alert for > 4h size:M * action #98673: [retro] Unhandled alert about job queue for nearly a day, users brought it up in chat, should have been picked up sooner size:S * action #98916: Improve alert handling - weekly alert duty * action #98997: [tools] [smelt] [python3.6] Replace all older text formatting with f-strings * 
action #99339: Find out with SUSE-IT what is the best way to collaborate based on tickets * action #99411: openqa-review report openqa_suse_de_status.html missing from https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html, page is 404 * action #99534: qa-maintenance / openQABot fails trying to access download.suse.de, we should provide certificates already in the container image size:S * action #100982: openqa-review: Do not post reminder comment if comment would be exactly the same as the last size:M * action #101045: presenter needed for SUSE QE Tools roadmap workshop 2021-11 on 2021-11-05 * action #101073: Smoke test of basic rust/cargo functionality size:S * action #101722: openqa-review: Do not post reminder comment in progress.opensuse.org if comment would be exactly the same as the last size:M * action #102059: Integrate the Slack feed notifications feature for progress queries * action #102200: openqa-review pipeline failed: 'NoneType' object is not subscriptable, or failed with `assert self.issue_type == "bugzilla"` size:M * action #102335: qa-tools-backlog-assistant: Automated runs of the workflow are making pull requests impossible to accept * action #102350: Move openqa-review CI to Github Actions * action #103197: qem-dashboard: Failed deployment pipeline * action #103464: qa-tools-backlog-assistant: Extract code into a GitHub Action for easier reusing * action #103762: gitlab CI pipeline failed with Error cleaning up pod: Delete ... connect: connection refused Job failed (system failure): prepare environment: waiting for pod running ... i/o timeout. Check ... 
for more information * coordination #103845: [epic] Get involved with Mojo-IOLoop-ReadWriteProcess * action #103999: Include Mojo-IOLoop-ReadWriteProcess in our team wiki for usual processes size:S * action #104004: Subscribe to Mojo-IOLoop-ReadWriteProcess github issues + pull requests size:S * action #104025: Grafana: grenache-1: partitions usage (%) alert * action #104031: bot-ng: Provide manual openQA trigger commands for retrying/retriggering/triggering special tests * action #104781: Getting familiar with oscqam plugin size:M * action #105028: https://progress.opensuse.org/projects/qa/wiki/Wiki is getting big, split out tools team to separate page? * action #105244: retrospective outcome: Come up with best practices for creating pull requests that depend on each other * action #105250: process: daily updates: Improve our feedback time on what we do aka. "Make dailys great again" * action #105612: [tools] Make yaml-ng default for repose * action #105891: retro: Our definition of non-estimated tickets is ambiguous * action #106179: No aggregate maintenance runs scheduled today on osd - dashboard.qem.suse.de down size:S * action #106368: openqa-review: Configurable no-reminder message pattern size:M * coordination #106546: [epic][tools] dashboard.qem.suse.de adoption * action #106547: Setup database for dashboard.qam.suse.de * action #106549: Deployment host for dashboard.qam.suse.de * action #106552: Document deployment process for dashboard.qam.suse.de * action #106907: Exponential backoff for reminders based on previous reminders size:M * action #106909: [openqa-review] reminder comments point to specific openQA test details steps or openQA comments size:M * action #107014: trigger openqa-trigger-bisect-jobs from our automatic investigations whenever the cause is not already known size:M * action #107173: s.qa.suse.de needs to be upgraded to a current OS * action #107227: bot-ng schedule aborted with "ERROR: something wrong with /etc/openqabot/singlearch.yml" 
size:M * action #107578: Upgrade backup.qa to Leap 15.3 * action #107671: No aggregate maintenance runs scheduled today on osd size:M * action #107731: Salt all SUSE QA machines, at least passwords and ssh keys and automatic upgrading size:M * action #107923: qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M * action #108569: [tools] There are groups in Slack, we could have one for tools team, e.g. to be pinged in #eng-testing * action #108869: Missing (re-)schedules of SLE maintenance tests size:M * action #108944: 5 whys follow-up to Missing (re-)schedules of SLE maintenance tests size:M * coordination #109349: [saga][epic] libreQA and OpenOpenQA: Value through speed, flexibility and security with the community, for our customers * action #109488: qem-bot - better logging * action #109512: qem-bot - add vars with GitlabCI job link and qem-dashboard link * action #109623: Allow adding scheduling settings for informal purposes that are not added to openQA jobs * coordination #109641: [epic] qem-bot improvements * action #109701: enable qem-bot comments on IBS again after subscriptions can be personally configured * action #109779: Cannot approve incident due to test report parsing error * coordination #109818: [epic] qa-maintenance/openQAbot improvements * action #109878: bot-ng schedule/approve aborted * action #109977: qem-bot - approve pipeline failed with 403 forbidden size:M * action #110167: Tests for qem-bot * action #110176: [spike solution] [timeboxed:10h] Restart hook script in delayed minion job based on exit code size:M * action #110262: openqa_review: Github action failed with "tox4: command not found" * action #110409: qem-dashboard - remove old openQA jobs when rr_number changes size:M * action #110452: http://jenkins.qa.suse.de/ throws an error with "Faithfully yours, nginx." 
so the jenkins instance behind seems to be problematic * coordination #110575: [epic] Necessary change for qam-oscplugin: change the way we (un-)assign maintenance updates * action #110614: Add SLO based queries to https://os-autoinst.github.io/qa-tools-backlog-assistant/ size:S * coordination #110884: [epic] Properly maintained open source mtui+oscqam * action #111075: Collect code coverage for qem-bot * action #111078: Simple automatic test exercising one of the existing happy path workflows of qem-bot size:M * action #111272: oscqam reports in smelt show text like "See Testreport: >SUSE:Maintenance:24247:272218/log" * action #111338: Open source https://gitlab.suse.de/qa-maintenance/mtui size:M * action #111341: Open source https://gitlab.suse.de/qa-maintenance/qam-oscplugin/ size:M * action #111344: Ensure we know all relevant implications of open sourcing mtui+qam-oscplugin * coordination #111347: [saga][epic] Properly maintained Maintenance QA tooling * action #111407: bot-ng "sync incidents" step fails in gitlab CI, reason unclear, log too big * action #111446: openQA-in-openQA tests fail due to corrupted downloaded rpm auto_review:"Test died: command '.*zypper -n in os-autoinst-distri-opensuse-deps' failed at openqa//tests/install/test_distribution.pm line 1.*":retry * action #111506: qa-tools: qem-bot - Development results leaked to dashboard size:M * action #111710: [qa-tools] [tools] remove usage of *_TEST_TEMPLATE vars in qem-bot and openQA media definition in favor of *_TEST_REPOS size:M * action #111998: Make our SLE related tooling work with upcoming changes to build.suse.de (2FA and ssh key based authentication) size:M * action #112232: [tools] Multiple recurring failures due to zypper failing to download packages temporarily * action #112367: [tools] python-paramiko on Leap/SLE throws exception with ed25519 key size:M * action #112430: [qa-tools] [qem-bot] Incident schedule fails in preparation Incident instance * action #112871: obs_rsync_run 
Minion tasks fail with no error message size:M * action #112898: Minion workers alert triggering on and off size:M * action #113087: [qa-tools][qem-bot] malformed data in smelt incident causes smelt sync fail * action #113345: qem-bot does not ignore Development/Leap job groups as it should size:M * action #113797: Automated alerts and reminders about SLO's for openqatests size:M * action #114415: [timeboxed:10h][spike solution] qem-bot comments on IBS size:S * action #114694: Incident seems to have missing aggregate test results in qem-dashboard but openQA jobs exists size:M * action #114872: [tools] qam plugin throws exception in query for open requests in other groups * action #115103: [tools] osc-plugin-qam reads Incident Priority from IBS but that was replaced by SMELT size:M * action #115469: QA::Maintenance{,::Test} projects on IBS maintained by individual persons size:S * action #115544: [tools][osc-qam] osc qam my or osc qam list ends with KeyError: 'ReviewRequestID' * action #115565: Setup OBS integration for openSUSE/mtui and openSUSE/osc-plugin-qam size:M * action #116539: jenkins.qa.suse.de not reachable and not triggering tests like krypton anymore * action #116545: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M * action #116605: False SLO update comment for a Low ticket size:M * coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones * action #116626: Migration of SUSE QA systems to new security zones - QAM systems * action #116629: Preparation planning for migration of SUSE openQA+QA systems to new security zones size:M * action #116959: Unreliable/unusable audio connections using Jitsi instance on meet.opensuse.org from Android clients size:S * action #117043: Request DHCP+DNS services for new QE network zones, same as already provided for .qam.suse.de and .qa.suse.cz * action #117619: Bot approved update request with failing tests size:M * coordination #117694: 
[epic] Stable and reliable qem-bot * action #117790: osc-plugin-qam: TypeError in qam my subcommand during incident priority fetch size:M * action #118186: Prevent SLO alerts for tickets with open subtasks as they inherit priority and can not be changed directly size:M * action #119161: Approval step of qem-bot says incident has failed job in incidents but it looks empty on the dashboard size:M * action #119176: Automated alerts and reminders about SLO's for openqatests size:M * action #119281: [alert] baremetal-support: Memory usage alert size:M * action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M * action #119446: Conduct the migration of SUSE openQA+QA systems from Nbg SRV2 to new security zones * action #119449: Conduct the migration of SUSE openQA+QA systems from Nbg QA labs to new security zones * action #119551: Move QA labs NUE-2.2.14-B to Frankencampus labs - bare-metal openQA workers size:M * action #119638: Ensure every physical machine within .qam.suse.de has an IPMI+eth L2 address entry in racktables size:M * action #120264: Conduct the migration of SUSE QA systems (non-tools-team maintained) from Nbg SRV1 to new security zones size:M * action #120267: Conduct the migration of openqa-ses aka. 
"storage.qa.suse.de" size:M * action #120327: monitoring data for jenkins.qa.suse.de is empty size:S * action #120468: [sporadic] Failed: os-autoinst/openqa-trigger-from-obs on master / test (7ea68e0) on CircleCI size:M * action #120525: Ensure our usual bugzilla integrated tooling works with the upgraded test instance size:M * action #120540: [timeboxed:10h][research] Find inefficient test implementations and backend use by a "tests per hardware machine ratio" size:S * action #121225: bot-ng - synchronize pipeline fails on GitLab size:S * action #121582: [tools][metrics] Calculate cycle + lead times for SUSE QE Tools continuously size:M * action #121612: [tools][experiment] split dev/infra work management within tools team for better focus size:M * action #121699: bot-ng - approve incidents pipeline fails with 401 unauthorized on GitLab size:S * coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability * action #121729: [timeboxed:10h][research] Find out what libvirt can do to provide access only to a single VM for users/groups * action #121846: openQABot - schedule:openqabot pipeline fails on GitLab size:S * action #121903: [sporadic?] 
CI checks failing in https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions size:M * action #122110: Fix https://build.opensuse.org/package/live_build_log/devel:languages:perl/perl-Crypt-HSXKPasswd/SLE_15/x86_64 size:S * action #122308: Handle invalid openQA job references in qem-dashboard size:M * action #122407: slo-gin commented about broken slo period for immediate ticket too early * action #122668: Flaky network connection from SUSE Nbg Frankencampus for okurz and mkittler size:M * action #122908: [tools] openSUSE conference 2023 contributions size:M * action #123064: bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest * action #123286: Bot and dashboard reference to wrong data and block update approval size:M * action #123289: SUSE Hack Week 22 in the SUSE QE Tools team size:S * action #123367: [tools] drop old testplatform code from teregen, repose and productdefs.pm size:M * action #123508: [tools][teregen] Handle "one instance is already running" better for template generator size:M * action #123523: [tools] osc-plugin-qam throws traceback on Tumbleweed size:M * action #123697: Conduct the migration of SUSE QA systems s390x zVM instances to new security zones size:M * action #123748: [tools] Add support for excluding packages from test flavor in bot config * coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. 
Prg CoLo * action #124221: Repurpose quake.qe.nue2.suse.org (formerly known as cloud4) as employee-workstation replacement size:M * action #124233: [tools] Find more mentors for GSOC and send to organizer ddemaio size:S * coordination #124721: [epic] Ensure proper QE maintainership of Nbg QAM machines * action #124724: Ensure Nbg QAM machines have a current maintainer as "contact person" size:S * action #125144: Give members of SUSE QE Tools team a chance to get familiar with Nbg QAM machines size:M * action #125147: Nuremberg QA workstations and QAM labs cold storage move to Frankencampus size:M * action #125204: Move QA labs NUE-2.2.14-B to Frankencampus labs - non-bare-metal machines size:M * action #125231: Automate deployment of qanet config * action #125234: Decommission obsolete machines in qam.suse.de size:M * action #125243: Define GSOC project proposal next to https://github.com/openSUSE/mentoring/issues/120 size:M * coordination #125363: [epic] Improve collaboration with Eng-Infra * action #125444: Improve collaboration with Eng-Infra - SD ticket template size:M * action #125447: Clarify to Eng-Infra that SD tickets have flaws size:S * action #125450: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS size:M * action #125519: version control PXE stuff on qa-jump * action #126548: [qem-dashboard] Add an API endpoint to flag openQA jobs as missing in openQA size:M * action #126551: [qem-bot] Flag missing openQA jobs with qem-dashboard API size:M * action #126671: [tools] osc-plugin-qam force reject switch * action #127025: [tools][metrics] Improve cycle + lead times in Grafana * action #127763: [teregen] Separate bugs with severity lower than Major in generated template * action #127907: jenkins package (and others) not upgraded on jenkins.qa.suse.de since some time size:M * action #128045: /var on qanet is 100% * action #128390: Move QA labs NUE-2.2.14-B to Frankencampus labs - infrastructure management 
improvements * action #128393: Move QA labs NUE-2.2.14-B to Frankencampus labs - recover openQA staging test setup size:M * action #128399: [bot-ng] Scheduler uses metadata from previous day size:M * action #128498: ARM server for UV squad (was: Requesting a quote for two Ampere Altra Servers to be used for various testing efforts inside the department) size:M * action #128822: processes on qanet slow to execute despite low load, e.g. htop - do we have outdated addresses pointing to wotan where we should use different hosts? * coordination #129280: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters * action #129283: [tools] Help Needed: Active Inventory of Maxtorhof SRV1/SRV2/SRV2X * action #129391: [tools][discuss] Should we participate with in regular SLE maintenance test review with a "reviewer" role or rather support more squad rotation size:M * action #130312: [tools] URL listing TW snapshots (and the changes therein), has stopped working * action #130327: openqa snapshot-changes is broken * action #130796: Use free blades on quake.qe.nue2.suse.org and unreal.qe.nue2.suse.org as openQA OSD bare-metal test machines * coordination #130955: [epic] Migration out of SUSE NUE1 - QE setup in NUE3 * action #131144: Decide about all LSG QE machines in NUE1 size:M * coordination #131525: [epic] Up-to-date and usable LSG QE NUE1 machines * action #131528: Bring backup.qam.suse.de up-to-date size:M * action #132146: Support migration of osd VM to PRG2 - 2023-08-29 size:M * action #132158: Ensure that osd can work without relying on any physical machine in NUE1 size:M * action #132320: Bring styx.qam.suse.de up-to-date * action #132323: Bring arm4.qe.suse.de up-to-date * action #132347: Bring borg.qam.suse.de up-to-date * action #132353: Bring enterprise-nx02.qam.suse.de up-to-date size:M * action #132356: Bring fibonacci.qam.suse.de up-to-date * action #132359: Bring galileo.qam.suse.de up-to-date size:M * action #132362: Bring openqa-service.qe.suse.de 
up-to-date * action #132452: Bring seth+osiris up-to-date * action #132488: gitlab CI shows showing no logs or are getting stuck (was: qem-bot sync aggregates gitlab CI job times out after 2h) size:M * action #132617: Move of selected LSG QE machines NUE1 to PRG2e size:M * action #132620: Move of selected LSG QE machines NUE1 to NUE3 size:M * action #132623: Decommissioning of selected LSG QE machines from NUE1-SRV2 * action #132689: [qem-dashboard] Incorrect behavior for Group Flavor feature size:M * action #133307: mtui: Connection to svn+ssh is not possible or the "inconsistent submission" * action #133454: bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest size:M * action #133457: salt-states-openqa gitlab CI pipeline aborted with error after 2h of execution size:M * action #133583: qem-bot approve incidents failed in gitlab CI, reason unknown size:M * action #133706: Setup of former QAM machines from NUE1-SRV2 in FC Basement * action #133748: Move of openqaworker-arm-1 to FC Basement size:M * action #134111: [tools] MTUI to check for install log duplicates * action #135647: Separate SLOs+SLAs size:M * action #135746: Longer term plan along with less confusion for the team with additional "Next" target version size:M * action #136328: Missing alert emails about some failing gitlab CI pipelines size:M * action #137144: Ensure that we have less or no workstation left clogging our FC Basement space size:M * action #137651: [qem-dashboard] CI tests using Playwright are broken * action #137870: [tools][retro] try out https://play.workadventu.re/ size:S * action #138314: requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://progress.opensuse.org/issues.json?query_id=830 * action #138356: Migration of qam.suse.de to PRG2 size:M * action #138446: Ensure SUSE QE tooling always uses authenticated IBS API access size:M * action #138491: tests fail in retry despite having successfully executed relevant commands such as git clone 
size:M * action #139097: Improve collaboration with Eng-Infra - Firewall management access, potentially also DHCP+DNS - take 2 * action #139112: Ensure OSD openQA PowerPC machine grenache is operational from PRG2 * action #139115: Ensure o3 openQA PowerPC machine qa-power8-3 is operational from PRG2 size:M * action #139130: Migration of openqa-service to PRG2 size:M * action #139199: Ensure OSD openQA PowerPC machine redcurrant is operational from PRG2 size:M * action #152470: openqa-service fetch_openqa_bugs "requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443)" * action #153649: reminders about still open pull requests in github projects we own * action #153670: Move of selected LSG QE machines NUE1 to PRG2e - fozzie size:M * action #153673: Move of selected LSG QE machines NUE1 to PRG2e - orion * action #153682: Move of selected LSG QE machines NUE1 to PRG2e - quinn size:M * action #153688: Move of selected LSG QE machines NUE1 to PRG2e - openqaw9-hyperv * action #153691: Move of selected LSG QE machines NUE1 to PRG2e - openqaw5-xen * action #153694: Move of selected LSG QE machines NUE1 to PRG2e - fibonacci * action #153697: Move of selected LSG QE machines NUE1 to PRG2e - sauron * action #153700: Move of selected LSG QE machines NUE1 to PRG2e - arm4 * action #153703: Move of selected LSG QE machines NUE1 to PRG2e - voyager * action #153706: Move of selected LSG QE machines NUE1 to PRG2 - amd-zen2-gpu-sut1 size:M * action #153709: Move of selected LSG QE machines NUE1 to PRG2e - ada size:M * action #153715: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - whale * action #153718: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M * action #153721: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - legolas * action #153724: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - blackcurrant * action #153727: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - cloudberry size:S * 
action #153730: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - huckleberry * action #153733: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry size:S * action #153742: Move of OSD machine NUE1 to PRG2 - storage.qe.prg2.suse.org * action #153787: Move of selected LSG QE machines NUE1 to PRG2e - openqaworker20 size:M * action #153796: Prepare DHCP/DNS for qe.prg2.suse.org based on former qa.suse.de entries size:M * action #153799: Prepare DHCP/DNS for machines coming to qe.prg2.suse.org based on former qam.suse.de entries size:M * action #153937: [tools] Make the end of each meeting explicit size:S * coordination #154444: Ensure SAP QA machines in PRG2 J06 are usable * action #154447: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - gollum * action #154450: Move of selected LSG QE machines NUE1 to PRG2e - openqaw7-hyperv * action #154453: Move of selected LSG QE machines NUE1 to PRG2e - openqaw8-vmware * action #154498: [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M * action #154699: https://os-autoinst.github.io/qa-tools-backlog-assistant/ not updated since 2023-01-31 size:M * coordination #154756: [epic] Decommission qa-maintenance/openQABot * action #154759: Decommission qa-maintenance/openQABot size:S * action #155179: Participate in alpha-testing of new version of velociraptor-client * action #155458: Seemingly reproducible build failures in devel:languages:perl perl-Mojo-IOLoop-ReadWriteProcess size:S * action #155629: [spike][timeboxed:6h][qem-dashboard] Order blocked incidents by priority to allow reviewers to focus on higher prio incidents first size:S * action #155755: OBS build errors in gitlint * action #155917: [backlogger] Count "Feedback" ticket state for cycle time as well size:S * action #156175: Support development of 
https://github.com/openSUSE/qem-bot/pull/154 size:M * action #156250: Ensure IPv6 works in qe.nue2.suse.org * action #156775: cpanspec should adopt new %patch syntax size:S * action #157204: Sync openQA job removal events to qem-dashboard listening to AMQP events size:M * action #157237: dependabot PRs for the dashboard are not getting approved and merged automatically size:S * action #157522: No ticket reminder comments about SLO's for openqatests size:M * action #157741: Approve/reject SLE maintenance release requests on IBS synchronously listening to AMQP events when testing for one release request as "openQA product build" is finished size:M * action #157753: Bring back automatic recovery for openqaworker-arm-1 size:M * action #157819: Can't login to walter1 and walter2 offline * action #157858: Repeated reminder comments about SLO's for openqatests size:S * action #158488: reconsider tools team estimation meeting times * coordination #158550: [epic][tools] extended metrics as team SLOs * action #159231: Bring back worker class "hmc_ppc64le-4disk" on redcurrant or another machine size:M * action #159306: Fix AAAA records in qe.prg2.suse.org size:S * action #159660: Consider why some picked up tickets take a long time to resolve size:S * action #12506: openqa-scripts on osd/o3 should be deployed automatically (CD) * action #18142: [labs] Create wiki with up to date information (linklist included) * action #19190: make use of ix64ph1014, e.g. for proxymode * action #19238: setup pool devices+mounts+folders with salt(was: ext2 on workers busted) * action #29015: [o3][monitoring] too often alert from nagios about swap space depleted; alarm notification disabled -> increase/disable swap? * action #37644: [tools] osd SSL certificate is only valid for openqa.suse.de, not for openqa.nue.suse.com * action #44078: Implement proper backups for o3 size:M * action #44612: Do we want to update http://tumblesle.qa.suse.de/ or decommision it? 
* action #49694: openqaworker7 lost one NVMe * action #51836: Manage (parts) of s390 kvm instances (formerly s390p7 and s390p8) with salt * action #53471: machine aarch64.o.o often unresponsive and needs power-cycle * action #57233: Properly handle transport errors in ObsRsync plugin * action #57680: o3: optimize fs utilization and resize root disk (was: change /var/log to bind mount to prevent out-of-space) * action #59300: auto_review:"DBus.*: The name org.opensuse.os_autoinst.switch was not provided":retry GRE tunnel settings not applied on initial setup / after reboot * action #59933: Prevent depletion of space on / with extra partition or quota on /home on o3 size:M * action #61994: VNC console corruption on aarch64 * action #63142: Upgrade firmware of ppc9 machine redcurrant * action #64279: [virtualization][OS upgrade] upgrade xen host openqaw5-xen.qa.suse.de * action #64685: openqaworker1 showing NVMe problems "kernel: nvme nvme0: Abort status: 0x0" * action #64700: setup o3 workers openqaworker4 and openqaworker7 for multi-machine tests size:S * action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 * action #64983: Staging machine openqa-staging-1.qa.suse.de does not boot after update to Leap 15.1 * action #65130: Upgrade of firmware(s) for cloudberry (power9 machine) * action #65154: root partition on osd exceeds alert threshold, 90%, after osd deployment -> apply automatic reboots to OSD machines * action #65178: Drop rsync.pl config from salt for osd and o3 * action #65975: Monitoring for "scheduled but not executed" (was: perl-Mojolicious-8.37 broke WS connection for (some?) 
workers) * action #66709: Storage server for OSD and monitoring * action #67804: use non-personal account and key for pushing needles on osd to gitlab.suse.de * action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) * action #68077: alert about too many failed minion jobs but https://openqa.suse.de/minion/jobs?state=failed shows none * action #68095: Migrate osd workers from SuSEfirewall2 to firewalld * action #68161: alert raised about "Minion workers down" after osd-deployment 2020-06-17 * action #68410: repeated alerts about "no data" that are not actionable and recover themselves often enough * action #68785: [monitoring] Setup of QA generic monitoring instance * action #68872: job age max exceeds alarm threshold * coordination #68923: [epic] Use external videoencoder in production auto_review:"External encoder not accepting data" * action #69202: icinga alert "openqaworker3.suse.de/Number of threads" * action #69475: [tools] openQA child task fails to download asset created by parent job * coordination #69478: [epic] Upgrade o3+osd workers+webui to openSUSE Leap 15.2 * action #69523: lessons learned: osd did not come up after reboot 2020-08-02 * action #69577: Handle installation of the new "Storage Server" * action #69610: ipmi management interface of openqaworker-arm-3 is inaccessible * action #69613: osd-pre-deployment checks fail due to invalid certificates for stats.openqa-monitor.qa.suse.de * action #69664: [osd][alert] CPU usage alert: IOwait too high * action #69667: missing monitoring data for vde after partitions where reordered * action #69694: openqa-worker systemd services running in osd which should not be enabled at all and have no tap-device configured auto_review:"backend died:.*tap.*is not connected to bridge.*br1":retry * action #69721: autoinst.qa is pingable and resolves to snipe.qa.suse.de, a VM, whereas there is a physical machine as well * action #69727: reduce heat in NUE-SRV2 * action #70768: obs_rsync_run and 
obs_rsync_update_builds_text Minion tasks fail frequently * action #70834: [alert] Refine I/O time alerts for OSD * action #70843: OSD automatic deployment failed in Sept, 02, 2020 * action #70885: [osd][alert] flaky file system alert: /assets * action #70909: salt CI jobs fail due to (now) missing python3 package * action #70918: gitlab pillar merge requests try to execute CI tests and fail because tests from states are not compatible * action #70939: [alert] ** PROBLEM Service Alert: ariel-opensuse.suse.de/root partition is WARNING ** * action #70966: ipmi management interface of openqaworker-arm-3 is inaccessible * action #70969: openqaworker-arm-2 stuck in system management menu after reboot * action #70975: [alert] too many failed minion jobs * action #70978: automatic reboots on o3 to activate new kernel versions * action #71011: [alert] workers alert * action #71098: openqaworker3 down but no alert was raised * action #71182: OBS package os-autoinst blocked since multiple days on "blocked: downloading 6 dod packages" * action #71191: inform EngInfra automatically if the IPMI interfaces are not accessible * action #71224: [osd][alert] NTP offset alert * action #71332: [alert] failed systemd service on openqaworker6: "display-manager" * action #71575: [osd][alert] limited /assets - idea: ask EngInfra for slow+cheap storage from central server for /assets/fixed only * action #71635: [osd][alert] QA-Power8-4-kvm: CPU Usage alert - alert needs to be refined * action #71893: can not reach openqaworker-arm-2-ipmi.suse.de DNS anymore from a gitlab CI pipeline * action #71908: [osd][alert] CPU usage alert (High IO wait on OSD) * action #71962: [osd-admins] openqaworker-arm-1: MergePoint email alert " Voltage sensor, warning event was asserted" * action #72136: [osd-admins] [Alerting] Workers alert during osd deployment, then "ok" after 1 minute, should not alert * action #72139: openQA services on OSD failed to connect to database * action #73165: [osd] Consolidate 
"expensive+fast" and "cheap+slow" storage after realizing vdc is "cheap+slow" as well * action #73174: [osd][alert] Job age (scheduled) (median) alert * action #73183: Extend vdc (/assets) * action #73189: Upgrade o3 workers to openSUSE Leap 15.2 after openqa-aarch64 already done * action #73228: [labs] Create guide for new machines (from the "customer perspective") * action #73234: ipmi management interface of openqaworker7 is inaccessible ("no matching cipher suite") and not pingable * action #73246: [osd-admins][alert] openqaworker8: Memory usage alert * action #73333: Failed systemd services alert (workers) flaky * action #73342: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry * action #73348: The worker openqaworker-arm-2.suse.de is down, so the OSD deployment fails * action #73360: merging MRs in openqa/salt-states-openqa does not trigger pipeline in master * action #73387: Cleanup of old needles from os-autoinst-needles-opensuse and os-autoinst-needles-sle * action #73399: [osd][alert] Failed systemd services alert (workers) - many alert messages * action #73405: job incompletes with "(?s)openqaworker8.*terminated prematurely.*OpenCV Error: Insufficient memory" * action #73483: ipmitool from openSUSE Leap 15.2 and TW to our openQA workers fails with "no matching cipher suite", works on login.suse.de (SLES15SP1) * action #73501: Bind mounts of fixed assets is racy * action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) * action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others * action #75055: grenache-1 can't connect to webui's over IPv4 only * action #75067: save_needle minion task fails because "Your account has been blocked" * action #75220: 
all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry * action #75235: container image devel:openQA:ci/base is unresolvable with "nothing provides this-is-only-for-build-envs needed by cmake-mini" * action #75238: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2 * action #75241: Upgrade o3 webUI host to openSUSE Leap 15.2 * action #75244: Upgrade osd webUI host to openSUSE Leap 15.2 * action #75259: 100% of powerpc tests incomplete auto_review:"(?s)Running on power8.*qemu-system-ppc64: Requested safe cache capability level not supported by kvm":retry * action #75274: [osd-admins][alert][learning] Failed systemd services alert (workers): os-autoinst-openvswitch.service aborts retries after 60s and is not easily configurable * action #75277: enable "postfix" service again after #73633 resolved * coordination #75400: [epic] Improve documentation about how to manage infrastructure * action #75403: salt "cheat sheet" for common commands to change/fix osd infrastructure * action #75445: unknown dashboards for "linux-fwcx" and "localhost" reappearing on monitor.qa * action #75448: OSD deployment fails because openqaworker-arm-3.suse.de is not connected * action #76774: auto-review/openqa-label-known-issues does not conclude within GitLab CI * action #76783: research how hostnames with systemd work and make them static for all OSD related machines * action #76786: Configure static hostnames with salt for all salt nodes * action #76804: openqaworker-arm-2 was detected as offline but no grafana webhook action was triggered * action #76822: Fix /results over-usage on osd (was: sudden increase in job group results for SLE 15 SP2 Incidents) * action #76828: big job queue for ppc as powerqaworker-qam-1.qa and malbec.arch and qa-power8-5-kvm were not active * action #76876: Find a better (automated) way to inform infra about hanging (arm) workers * action #76903: 
openQA test modules failure statistics collection * action #76924: Upgrade postgresql database version on o3 to default of Leap 15.2, i.e. postgres12 size:M * action #76927: Upgrade postgresql database version on osd to default of Leap 15.2, like on o3 size:M * action #76951: Check if new firmware for kerosene (aka. power8.o.o) exists and remove os-autoinst workarounds again when according machine settings are applied when necessary size:M * action #76987: re-encode some videos from existing results to save space * action #77017: Add "how to do rollback of deployment" to https://gitlab.suse.de/openqa/osd-deployment/ * action #77089: [osd][retrospective] multiple unattended alerts, unattended gitlab CI pipeline fails, all osd aarch64 workers offline * action #77101: fix selection of gitlab CI runners * action #77191: [easy][learning] ** PROBLEM Service Alert: openqaworker12.suse.de/Interface br1 is CRITICAL ** * action #77209: workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service * action #77218: gitlab CI pipeline openqa/grafana-webhook-actions failed with "You have reached your pull rate limit" for docker hub but should not use docker hub at all * action #77827: asg-qe.maintenance reported problems about too aggressive asset cleanup - bumping some settings * action #77836: login to aarch64.o.o fails with ssh keys and password, also not working over IPMI SoL * action #77839: Use external videoencoder in production on one o3 worker * action #77842: Use external videoencoder in production on all o3 machines size:M * action #77845: Use external videoencoder in production on all osd machines size:M * action #77848: No more workarounds in OSD due to inefficient video encoder size:M * action #77887: [tools][openqa] Enable automatic openQA investigation jobs for osd as well * action #77890: [easy] Extend OSD storage space for "results" to make bug investigation and failure archeology easier * action #78010: 
unreliable reboots on openqaworker3, likely due to openqa_nvme_format (was: [alert] PROBLEM Host Alert: openqaworker3.suse.de is DOWN) * action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 * action #78064: failing logrotate on monitor.qa.suse.de due to mariadb/mysql? * action #78127: follow-up to #73633 - lessons learned and suggestions * action #78165: infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry * coordination #78206: [epic] 2020-11-18 nbg power outage aftermath * action #78218: [openQA][worker] Almost all openQA workers become offline * action #78438: openQA webui entry "Assigned worker" shows ip instead of names as formerly - manual cleanup work * action #80106: corrupted worker cache sqlite: Enlarge systemd service kill timeout temporarily * action #80128: openqaworker-arm-2 fails to download from openqa * action #80166: IPMI SOL does not show me anything anymore during bootup until the linux getty login prompt, can anybody confirm?
* action #80178: gitlab.suse.de CI shared runners are unable to resolve addresses within suse.de domain * action #80380: salt deploy often or always fails in test-webui and deploy due to postgres files not found * action #80398: [openQA] increase storage space for baremetal_support VM on qamaster * action #80408: revert longer timeout override for openQA services as we could not see less problems with corrupted worker cache * action #80538: flaky and misleading alerts about "openQA minion workers alert" as well as "Minion Jobs alert" * action #80542: Configure "automatic power-on" after power loss for openqaworker1 (and all others) * action #80544: Ensure that IPMI for powerqaworker-qam works reliably * action #80594: Needles are not pushed from o3 to github repo * action #80656: OSD deployment failed at 2020-12-02 because 'malbec.arch.suse.de' is down * action #80688: Upgrade IO firmware for powerqaworker-qam-1 * action #80734: GitLab pipeline trigger via Grafana fails due to TLS errors * action #80768: All workers in grenache-1 are broken at 2020-12-07 * action #80770: "host up" alerts alerting on Sunday around 03:10Z and all going back to ok 30m later, should be prevented * action #80812: Fix mail sending on o3 size:S * action #80834: [alert] osd reported 502 errors, unresponsive, failing alerts for CPU usage since 2020-12-08 0840Z and minion jobs failed * action #80956: update osd VM root disk VM config and resize root disk on osd * action #81020: QA-Power8-4-kvm start failed since reboot on 2020-12-13 * action #81026: many jobs incomplete with auto_review:"(?s)Running on openqaworker-arm-2.*failed: 521 Connect timeout.*Result: setup failure":retry * action #81046: openqaworker-arm-2.suse.de unreachable * action #81058: [tracker-ticket] Power machines can't find installed OS. Automatic reboots disabled for now * action #81198: [tracker-ticket] openqaworker-arm-{1..3} have network problems (cacheservice, OSD reachability). 
IPv6 disabled for now * action #81220: overdrive2.arch wants to be accepted with salt key on osd, what to do with this machine? * action #81228: many o3 workers not working on jobs as of 2020-12-21 (w4 seems to be ok) * action #81232: salt high state triggers multiple errors * action #81274: consider disabling emergency mode on our machines * action #81884: openqa-webui should automatically restart on config updates * action #87838: *alert* osd: Open database connections by user alert * action #87856: [alert] openqaworker3.suse.de/NRPE is CRITICAL * action #87883: osd infrastructure: services like "telegraf" are not enabled to start immediately on boot * action #87970: tumbleweed container images in gitlab CI fail on salt calls, e.g. "KeyError: 'cmd.run_all'" * action #87979: Requirements for access to the openqa VM * action #88189: Firmware upgrade of qa-power8-4.qa.suse.de * action #88191: openqaworker2 boot ends in emergency shell * action #88225: osd infrastructure: Many failed systemd services on various machines * action #88385: openqaworker3 host up alert is flaky * action #88450: Flaky NTP offset alert * action #88474: All workers on powerqaworker-qam-1 are offline * action #88546: Make use of the new "Storage Server", e.g. complete OSD backup * action #88807: Open vSwitch command 'set_vlan' with arguments 'tap41 13' failed: 'tap41' is not connected to bridge 'br1' at /usr/lib/os-autoinst/backend/qemu.pm line 152. 
* action #88900: openqaworker13 was unreachable * action #88912: Assets cleaned up too early for Tumbleweed aarch64 * action #89047: Failed to commit needles, gitlab account blocked 2021-02-24 * action #89050: OSD deployment blocked since two weeks - salt nodes failing * action #89113: [alert] PROBLEM Service Alert: openqa.suse.de/NTP Time is CRITICAL * action #89275: Flaky Incomplete jobs (not restarted) of last 24h alert and New incompletes alert triggered and back to OK * action #89419: Incomplete jobs after OSD deployment * action #89497: flaky Failed systemd services alert (except openqa.suse.de) * action #89551: NFS mount fails after boot (reproducible on some OSD workers) * action #89815: osd-deployment blocked by openqaworker-arm-3 offline and not recovered automatically * action #89821: alert: PROBLEM Service Alert: openqa.suse.de/fs_/srv is WARNING (flaky, partial recovery with OK messages) * action #89993: OSD deployment rollback failed finding "before" and "osd_deployment_rpm_q" files * action #90161: [Alerting] malbec: Memory usage alert triggered briefly and turned OK within the next minute * action #90170: Service for purging old kernels might run while system management is locked and fail * action #90275: Replacement openQA OSD aarch64 hardware (was: Dedicated non-rpi aarch64 hardware for manual testing) * action #90629: administration of the new "Storage Server" * action #90635: NTP alerts coinciding with reboots of ppc64le host * action #90746: OSD deployment fails at 2021-04-07 because 'storage.qa.suse.de: Minion did not return' * action #90755: aarch64 openQA jobs on osd not properly processed since 2021-04-05 * action #90857: Add redundancy for SAP multi machines tests - Extend openQA worker config to accommodate for upgraded RAM * action #90875: powerqaworker-qam-1 is online but https://monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview shows "No Data" * action #90920: Restore IPMI access to malbec.arch.suse.de * action #90968: [alert] Multiple
flaky incomplete job alerts on Sunday * action #91530: Severe performance problems on malbec * action #91779: Add monitoring for storage.qa.suse.de * action #91803: openqaworker8 failed to reboot: [Alerting] openqaworker8: host up alert on 2021-04-25 * action #92034: Re-enable openqa-investigate options after the black certificate now only shows properly "reviewed" jobs * action #92086: http://monitor.qa should redirect to https://monitor.qa.suse.de * action #92110: Several Job age (scheduled) alerts on Sunday * action #92113: [Alerting] openqaworker-arm-3: NTP offset alert * action #92167: remove workaround for PowerPC workers after boo#1174166 fixed and updates available * action #92176: [alert] openqaworker-arm-3 offline and CI pipeline unable to send email but stating "passed" * action #92185: Our PowerPC worker machines need to be configured for automatic start after a power outage * action #92302: NFS mount var-lib-openqa-share.mount often fails after boot of some workers * action #92338: [Alerting] File systems alert, / on osd * action #92467: Unit has `iscsid.socket` failed on some OSD workers since today's nightly reboot * action #92701: backup of etc/ from both o3 was not working since some days due to OOM on backup.qa.suse.de (was: … and osd not updated anymore since 2019) * action #92770: openqa.opensuse.org down, o3 VM reachable, no failed service * action #92915: OSD deployment fails at 2021-05-21 because ' openqaworker-arm-2.suse.de Minion did not return' * action #92969: Failing service os-autoinst-openvswitch after boot of some workers * action #92978: salt failed to write files on grenache-1.qa.suse.de 'workers.ini': Operation not permitted * action #92996: OSD deployment fails at 2021-05-24 because ' openqaworker-arm-3.suse.de Minion did not return' * action #93050: Proposal: Use openqaworker11 and openqaworker12 as normal workers and only pull out from production when necessary * action #93071: o3 s390x worker machines unreachable * action 
#93195: [Alerting] Failed systemd services alert (except openqa.suse.de) on 2021-05-28, logrotate.service on openqaworker-arm-1 * action #93381: [O3]request to add an IPMI SUT to O3 size:M * action #93612: Several unhandled alerts triggered regarding incompletes and running out of space * action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING * action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt * action #93919: salt gitlab CI pipeline fails applying openqa-webui restart state with "The following requisites were not found: watch: file: /etc/openqa/openqa.ini" * action #93922: grafana dashboard for "approximate result size by job group" fails to render any data with "InfluxDB Error: unsupported mean iterator type: *query.stringInterruptIterator" * action #93961: Add redundancy for SAP multi machines tests - extend RAM on machines * action #93964: salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy." 
* action #94015: proper backup for osd * action #94237: No alert about too many scheduled tests size:S * action #94312: [Alerting] web UI: Too many Minion job failures alert - likely due to openqa-client declared deprecated * action #94318: http://jenkins.qa.suse.de not reachable * action #94438: OSD deployment fails at 2021-06-21 because ' openqaworker (arm-3 and arm-2) Minion did not return' * action #94456: no data from any arm host on https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1 * action #94492: Configure retention/downsampling policy for monitoring data stored within InfluxDB size:M * action #94513: openqaworker-arm-3 not reachable and not recoverable over usual ways * action #94555: backup.qa.suse.de needs to be upgraded to Leap 15.2 (size:S) * action #94576: alert: PROBLEM Service Alert: openqa.suse.de/fs_/results is WARNING * action #94747: broken RPM database on openqaworker-arm-* during osd deployment * action #94919: All arm workers down 2021-06-30 , NUE SRV2 Rack A8 was switched off by EngInfra size:S * action #94940: multiple network related problems, gitlab CI pipelines not working, workers not reachable, proxySCC not reachable * action #94949: Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount * action #94994: Tracebacks in the journal for salt-minion size:M * action #95140: regression: salt state fails on one worker with "Passed invalid arguments to state.highstate: expected str, bytes or os.PathLike object, not list" size:M * action #95167: Bring openqaworker12 into production including multi-machine test support * action #95293: Monitoring alerts on errors in logs on o3 (was: followup to: error on "Next & previous results": ajax error message and no results showing up) size:M * action #95443: Variants of Job age (scheduled) alerts on Grafana on Sunday and Monday size:S * action #95482: openqaworker-arm-3 offline and not automatically recovered due to gitlab CI failures * action #95980: 
"no data" alert for "Incomplete jobs (not restarted)" but there should be no alert for "no data" size:S * action #95983: alert about "minion workers", alert triggered two times and turned green again * action #96089: alert about "minion workers" - make meaning of grafana panel clear size:S * action #96242: [alert] Disk I/O time for /dev/vde (/space-slow) alert 2021-07-28 size:M * action #96269: Define what a "complete OSD backup" should or can include * action #96272: Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" size:M * action #96380: "minion workers" alert shows <1 total minion workers if active == 1 size:M * coordination #96447: [epic] Failed systemd services and job age alerts * action #96551: Persistent records of systemd journal size:S * action #96554: Mitigate on-going disk I/O alerts size:M * action #96618: The machine is unable to boot from PXE in NUE QA network due to TFTP open timeout * action #96710: Error `Can't call method "write" on an undefined value` shows up in worker log leading to incompletes * action #96713: Slow grep in openqa-label-known-issues leads to high CPU usage * action #96719: recover imagetester with broken filesystem/hardware (was: automatic updates on imagetester don't work and it failed to come up after reboot) * action #96789: File systems alert 90.256 assets used size:M * action #96795: CPU Load alert and telegraf going between 41%, 98.5% and 115% CPU * action #96807: Web UI is slow and Apache Response Time alert got triggered * action #96938: openqaworker10+13 are offline, reason unknown, let's fix other problems first size:M * action #96947: Availability of openQA webUI is above 100% size:S * coordination #96974: [epic] Improve/reconsider thresholds for skipping cleanup * action #97043: job queue hitting new record 14k jobs * action #97136: [alert] multiple unhandled alerts about "broken workers" size:M * action #97139: [alert] multiple unhandled alerts about "malbec: Memory 
usage alert" size:M * action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M * action #97364: openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S * action #97382: ARM automatic reboot pipeline does not fail if ipmitool fails size:S * action #97406: gitlab emails to osd-admins@suse.de are not accepted as the ML is not in "To:" field but CC/BCC * action #97415: handle request about machines in SRV1 from SUSE-IT/Accenture to fill a spreadsheet with system properties * action #97418: Pipeline in salt-states-openqa can fail occasionally size:M * action #97502: osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M * action #97574: deployment failed in gitlab job with "ERROR: Job failed: execution took longer than 1h0m0s seconds" * action #97583: [spike] Configure retention/downsampling policy for monitoring data stored within InfluxDB on newer influxdb version * action #97658: many (maybe all) jobs on rebel within o3 run into timeout_exceeded "setup exceeded MAX_SETUP_TIME" size:M * action #97751: replacement setup for o3 s390x openQA workers size:M * action #97862: More openQA worker hardware for OSD size:M * action #97943: Increase number of CPU cores on OSD VM due to high usage size:S * action #98243: salt-states-openqa pipeline/job "test-worker" fails because zypper reports an error size:M * action #98307: Many jobs in o3 fail with timeout_exceeded on openqaworker1 auto_review:"timeout: setup exceeded MAX_SETUP_TIME":retry size:M * action #98499: [alert] web UI: Too many Minion job failures alert size:S * action #98922: Run asset cleanup concurrently to results based on config * action #98979: monitor-post-deployment failed while arm3 was being rebooted by our automatic recovery * action #99045: osd deployment fails due to zypper package resolution problem * action #99117: malbec 🍷️ is not reachable via ssh or ipmi * 
coordination #99183: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui, to openSUSE Leap 15.3 * action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:M * action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M * action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M * action #99198: Upgrade osd webUI host to openSUSE Leap 15.3 size:M * action #99201: Upgrade postgresql database version on o3 to default of Leap 15.3, i.e. postgres14 size:M * action #99204: Upgrade postgresql database version on osd to default of Leap 15.3, like on o3 * action #99240: Upgrade CI container image versions to Leap 15.3 size:M * action #99243: Upgrade qam hosts maintained by us to latest stable, i.e. Leap 15.3 * action #99288: [Alerting] openqaworker-arm-5 and openqaworker-arm-4: host up alert on 2021-09-26 size:M * action #99333: qa-maintenance/openQABot CI job fails after max retries to reach germ159.suse.cz * action #99741: Minion jobs for job hooks failed silently on o3 size:M * action #100712: Investigate what broke git checkouts on o3 * action #100859: investigate how to optimize /srv data utilization on OSD size:S * action #100862: Failed systemd services alert: session-50177.scope failed - but why? 
* action #101006: Provide unique non-dictionary passwords for all our IPMI/HMC interfaces size:S * action #101033: openqaworker13: Too many Minion job failures alert - sqlite failed: database is locked size:M * action #101166: o3 was not auto-updated lately, due to unresolvables in devel:openQA for Leap 15.3 * action #101271: Try Kernel:stable on arm4+arm5 and compare failure rate size:M * action #101373: worker openqa-aarch64 fails on cache * action #101962: update BMC/IPMI firmware on openQA hosts * action #102062: check state of packages fails with test '' = clean * action #102143: o3 ran out of disk space * coordination #102266: [epic] o3 ran out of disk space * action #102284: openQABot pipeline failed with simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0) * action #102575: Prevent false-positive ticket reporting for openqaworker-arm-3 * action #102650: Organize labs move to new building and SRV2 size:M * action #102713: Implement proper backups for o3 - list of installed packages * action #102716: Disable job triggering during the weekly SUSE IT maintenance window * coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service * action #102942: Failed systemd services alert: snapper-cleanup on QA-Power8-4-kvm fails size:M * action #103128: gitlab CI pipelines sporadically fail with "Could not resolve host: gitlab.suse.de", e.g. 
Recovery pipelines for ARM workers might fail during the maintenance window size:M * action #103149: Salt the dehydrated setup * action #103518: opencv update broke os-autoinst (was: Tests on Raspberry Pi 2/3/4 are broken) * action #103524: OW1: performance loss size:M * action #103530: failed systemd services alert - openqaworker-arm-3 - ovsdb-server size:M * action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M * action #103554: o3 s390x worker instances 102+103 down whereas 101+104 are up * action #103575: [virtualization][3rd party hypervisor] Worker openqaw8-vmware.qa.suse.de is not reachable * action #103602: Reply to gitlab@suse.de pipeline email fails silently * action #103683: [tools][sle][x86_64][aarch64][QEMUTPM] install package "swtpm" on x86_64 and aarch64 workers * action #103736: Make aarch64 machine chan-1 and chow up and running after it is broken size:M * action #103848: No permission to view SD tickets filed for broken arm workers * action #103954: Run asset cleanup concurrently to results based on config on o3 as well * action #104085: openQABot pipeline failed with terminating connection due to administrator command size:S * action #104088: bot-ng pipeline consistently fails with KeyError: 'test_issues' * action #104106: [qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S * action #104142: osd-deployment pipeline failed: File ... not found on medium * action #104172: osd service ca-certificates failed with "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: File exists" * action #104217: Ask eng infra why thruk.suse.de stopped working * action #104304: Crosscheck results of https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment on arm-1/2/3 vs.
arm-4/5 to find out if arm-4/5 are "typing stable" size:M * action #104344: Inconsistent systemd default target in OSD infrastructure * action #104347: Lost salt connection from some hosts in OSD infrastructure due to no route to host "salt" * action #104607: Verify fix for https://bugzilla.suse.com/show_bug.cgi?id=1192126 regarding QEMU+OVMF, ensure updated packages on o3 size:M * action #104673: Access to o3 workers is not well-documented and not automated * action #104835: Failed systemd service "systemd-journal-flush" on OSD * action #104869: Packages not building on Leap 15.2 and SLE15-SP3 * action #104965: openqaworker10 restarted into maintenance mode - reason unknown * action #104967: File systems alert repeatedly triggering on and off * action #104970: Add two OSD workers (openqaworker14+openqaworker15) specifically for sap-application testing size:M * action #104992: [virtualization][3rd party hypervisor] vmware test run failed with credential errors * action #105169: Pipeline of openQABot project fails with "urllib.error.HTTPError: HTTP Error 503: Service Unavailable" causing alert/notification * action #105274: Workers constantly logging websocket connection "finished by remote side with code 1006, no reason" * action #105373: Ask to increase OSD /srv so that we can save enough logs+DB * action #105594: Two new machines for OSD and o3, meant for bare-metal virtualization size:M * action #105603: openQABot pipeline failed: "ERROR:root:Something bad happended during reading MR data from SMELT/IBS: Expecting value: line 4 column 1 (char 3)" size:M * action #105618: [Alerting] CPU Load alert size:S * action #105621: [Alerting] Failed systemd services alert * action #105828: 4-7 logreport emails a day cause alert fatigue size:M * action #105960: Dehydrated fails on OSD size:M * action #106017: [ipmi backend]VNC stalled, no update for 5.38 seconds * action #106035: [qe-tools] dehydrated service fails on osd * action #106365: Improve security for OSD worker 
credentials broke Gitlab CI/CD deploy of salt in OSD size:M * action #106538: lessons learned "five whys" for "All OSD PPC64LE workers except malbec appear to have horribly broken cache service" size:S * action #106540: Mitigate/resolve All OSD PPC64LE workers except malbec appear to have horribly broken cache service * action #106543: Conduct rollback steps and check impact for "All OSD PPC64LE workers except malbec appear to have horribly broken cache service" size:M * action #106594: [tools] openqaworker-arm-3 periodically fails os-autoinst-openvswitch service * action #106598: Redcurrant has a broken HDD * action #106607: GitHub review notifications are sent to osd-admins+os-autoinst-obs@suse.de * action #106666: Improve worker startup in our salt states or "openqa-worker-auto-restart repeatedly failing on grenache-1.qa.suse.de" * action #106751: Update machines and passwords in the monitor-o3 repository * action #106753: openqa-worker-auto-restart repeatedly failing on grenache-1.qa.suse.de * action #106771: imagetester missing in action * action #106832: Monitor masked units on our infrastructure * action #106846: Junk messages posted on osd-admins@suse.de * action #106880: Job template name ... 
is already used in job group error logged on o3 size:M * action #106919: openqa_install+publish fails in start_test size:M * action #106925: [timeboxed:10h][research] What are best practices and options in the salt and GitLab community to handle secrets * action #106933: Use PSU capabilities to power cycle openqaworker-arm-[1-3] instead of infra tickets size:M * action #107017: Random asset download (cache service) failures on openqaworker1 * action #107074: error on openqaworker-arm-2 failing osd-deployment size:M * action #107077: gitlab.suse.de/openqa/scripts CI checks fail due to xargs missing, update or decommission * action #107083: SUSE QE Tools team must learn about switch administration and get access * action #107086: Ask for volunteers in SUSE QE Tools that would be able to visit the Nbg server rooms, e.g. as second person accompanying nsinger or any potential new admin * action #107089: Make SUSE QE Tools team aware that we need to support EngInfra due to limited capacity * action #107152: [osd] failing systemd services on "grenache-1": "openqa-reload-worker-auto-restart@10, openqa-reload-worker-auto-restart@21, openqa-reload-worker-auto-restart@22, openqa-reload-worker-auto-restart@23, openqa-reload-worker-auto-restart@25, …" size:M * action #107158: [osd] failing systemd service on "storage": "systemd-udev-settle" size:M * action #107257: [alert][osd] Apache Response Time alert size:M * action #107437: [alert] Recurring "no data" alerts with only few minutes of outages since SUSE Nbg QA labs move size:M * action #107515: [Alerting] web UI: Too many Minion job failures alert size:S * action #107638: OSD deployment failed due to openqaworker-arm-1 with "rpmdb2solv: inconsistent rpm database" despite our repair attempts * action #107875: [alert][osd] Apache Response Time alert size:M * action #107917: Recovery of imagetester via IPMI failed size:M * action #107932: Handling broken RPM databases does not handle certain cases * action #107989: CPU-specific
worker classes * action #108401: Can't call method "id" on an undefined value at V1/Job.pm * action #108533: o3 logreports DBI Exception: DBD::Pg::st execute failed: ERROR: invalid input syntax for type integer * action #108665: openqa_from_containers repeatedly failing at build * action #108668: Failed systemd services alert (except openqa.suse.de) for < 60 min * action #108671: Resilient IPMI recovery of o3 machines in monitor-o3 size:M * action #108740: qa-power8-5-kvm minions alert is heart-broken 💔️ * action #108743: qa-power8-5-kvm minions alert is heart-broken * action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M * action #108896: [ppc64le] auto_review:"(?s)Size of.*differs, expected.*but downloaded.*Download.*failed: 521 Connect timeout":retry * action #108998: openqa_install+publish fails in start_test * action #109028: [openqa][worker][sut] Very severe stability and connectivity issues of openqa workers and suts * action #109052: openqa_install+publish fails in dashboard * action #109055: Broken workers alert * action #109241: Prefer to use domain names rather than IPv4 in salt pillars size:M * action #109253: Add monitoring for SUSE QA network infrastructure size:M * action #109298: salt-minion on grenache-1 is not working preventing OSD deployment * action #109301: openqaworker14 + openqaworker15 sporadically get stuck on boot * action #109494: Restore network connection of arm-4/5 size:M * action #109746: Improve QA related server room management, consistent naming and tagging size:M * action #109969: s390zp19 - out of disk space * action #109971: Service `systemd-journal-flush` timed out on OSD size:M * action #110170: Request for s390x worker class with SIE support * action #110266: [alert] Disk I/O time for sr0 size:S * action #110269: [alert] QA-Power8-4-kvm + 
QA-Power8-5-kvm: Disk I/O time alert size:M * action #110293: Can not ping s.qa.suse.de * action #110296: Machines within .qa.suse.de unavailable (was: Some ipmi workers become offline which affects PublicRC-202204 candidate Build137.1 test run) * action #110301: alert about o3: ** PROBLEM Service Alert: ariel-opensuse.suse.de/swap is CRITICAL ** size:S * action #110494: alert: openqaworker5 host up size:M * action #110521: Improve QA related server room management, network topology and configuration size:M * action #110539: Ask OBS team if they would like to swap ARM workers with us * action #110920: Emails from o3 are rejected by mx2.suse.de for certain sender/recipients size:S * action #111063: Ping monitoring for our s390z mainframes size:S * action #111149: Recover openqaworker-arm-3 * action #111156: No effect of remote PDU controls on openqaworker-arm-4.qa and openqaworker-arm-5.qa, check power connections * action #111159: Check if we can give away our left-over 10G cisco switch * action #111171: Handle installation of new FC switch size:M * action #111440: Avoid OSD deployment monitoring to fail due to WIP dashboard alerts * action #111473: Get replacements for imagetester and openqaworker1 size:M * action #111578: Recover openqaworker-arm-4/5 after "bricking" in #110545 size:M * action #111755: power8 fails to execute jobs successfully, no kvm, but also no sshd auto_review:"(?s)power8.*no kvm-img/qemu-img found":retry * action #111758: o3 jobs exceeding MAX_SETUP_TIME auto_review:"(?s)openqaworker4.*timeout: setup exceeded MAX_SETUP_TIME":retry size:M * action #111863: Upgrade o3 workers to openSUSE Leap 15.4 size:M * action #111866: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 * action #111869: Upgrade o3 webUI host to openSUSE Leap 15.4 size:S * action #111872: Upgrade osd webUI host to openSUSE Leap 15.4 * action #111884: Upgrade qam hosts maintained by us to latest stable, i.e. 
Leap 15.4 size:M * action #111926: osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule * action #111986: Ensure uno.openqanet.opensuse.org is properly used * action #112193: [alert][osd] web UI: Too many Minion job failures alert size:S * action #112196: [alert][sporadic] QA-Power8-4-kvm: Disk I/O time alert size:M * action #112346: [alert] multiple alerts about "Download rate" and "Job age" on OSD 2022-06-12 size:M * action #112553: [osd][amd][zen3][network][sriov] New AMD Zen3 machine on OSD lost its network connection with p3p1 interface * action #112583: [alert] "Job age" on OSD 2022-06-12 * action #112673: Cannot send emails from o3 size:M * coordination #112718: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend * action #112733: Webui Summary dashboard in Grafana is missing I/O panels size:M * action #112781: Connect and switch on autobot for Anton * action #112835: All "developers" in progress QA project hierarchy should be able to change the status of "openqa-force-result" tickets same as for "action" * action #112916: postgresql.conf is invalid after recent salt changes size:M * action #113366: Add three more Prague located OSD workers size:M * action #113477: Get replacements for o3+osd top of rack switch * action #113498: [alert] QA network infrastructure Ping time alert for several hosts * action #113561: failed pipelines for openQABot and bot-ng because of an expired cert * action #113662: Ensure upgrade of vanilla Leap 15.3 os-autoinst can be automatically upgraded to Leap 15.4 size:S * action #113671: [timeboxed][10h] Configure write of I/O panels to be on the negative Y-axis again once we're on grafana 8.4 size:S * coordination #113674: [epic] Configure I/O alerts again for the webui after migrating to the "unified alerting" in grafana size:M * action #113746: monitoring: The grafana "ping time" panel does not list 
all hosts size:S * action #114397: glibc regression causes cron to crash * action #114448: malbec.arch.suse.de is unreachable via IPMI * action #114484: Failed service "transactional-update.service" on openqaworker4 * action #114526: recover openqaworker14 * action #114565: recover qa-power8-4+qa-power8-5 size:M * action #114586: fix openqaworker-arm-1+2+3 recovery pipeline (was: likely stuck in reboot loop) size:M * action #114685: powerqaworker-qam-1 seems to have just gone unresponsive due to unknown reason * action #114697: What are orion and andromeda.o.o * action #114733: openqaworker-arm-3 not consistently reachable * action #114802: Handle "QA network infrastructure Package loss alert" introduced by #113746 size:M * action #114817: salt-states-openqa deployment failed due to SUSE ca issue on openqaworker-arm-2 * action #114908: [tools] https://stats.openqa-monitor.qa.suse.de not responding * action #114914: Container devel/openqa/containers/isotovideo:qemu-x86 is outdated * action #114923: We lost multi-machine capabilities within o3 due to openqaworker1 being replaced * action #114941: repository names inconsistent on o3 workers hence continuous update failed to update as "devel_openQA" as repo name is not found everywhere * action #115094: [tools] test fails in bootloader_start: redcurrant is down size:M * action #115208: failed-systemd-services: logrotate-openqa alerting on and off size:M * action #115226: Use ext4 (instead of ext2) for /var/lib/openqa on qa-power8 workers * action #115418: Setup ow19+20 to be able to run MM tests size:M * action #115484: [alert] OSD deployment failed on 18.08.22 size:M * action #115547: openqaworker20 fails to boot, broken hardware size:M * action #115553: salt-states-openqa pipeline failed: Authentication failed for 'https://gitlab.suse.de/openqa/salt-pillars-openqa/' * action #115580: Reason: abandoned: associated worker openqaworker3:13 re-connected but abandoned the job size:M * action #115733: bot-ng pipeline fails 
because of empty dictionary in data returned by smelt size:M * action #116060: Recover openqaworker-arm-1 size:M * action #116078: Recover o3 worker kerosene formerly known as power8, restore IPMI access size:M * action #116113: salt responses timing out some of the time size:M * action #116344: openqaw9-hyperv.qa.suse.de (flexo.qa.suse.cz) can not be reached size:M * action #116377: openQABot pipeline failing with KeyError: 'project' * action #116437: Recover qa-power8-5 size:M * action #116494: Too many Minion job failures alert because needle-pusher is blocked on GitLab * action #116563: `salt storage\* test.ping` times out * action #116566: salt-states-openqa: Failed pipeline for master * action #116689: Do not rely on statically configured IPv4 addresses for the salt master in /etc/hosts size:S * action #116722: openqa.suse.de is not reachable 2022-09-18, no ping response, postgreSQL OOM and kernel panics size:M * action #116740: [alert] openqaworker14: host up alert * action #116743: [alert] QA-Power8-5-kvm: host up alert * action #116746: [alert] openqaworker9: host up alert * action #116752: [alert] powerqaworker-qam-1: host up alert * action #116758: Help with adding monitoring for the SLE maintenance update queue size:M * action #116782: o3 s390 workers are offline * action #116794: Bring back grenache.qa.suse.de + grenache-1.qa.suse.de * action #116845: salt-states-openqa CI complains about "fatal: Authentication failed for 'https://gitlab.suse.de/openqa/salt-pillars-openqa/'" but no useful hint size:M * action #116848: Ensure kdump is enabled and working on all OSD machines * action #116911: [openQA][needle] Can not commit new needle for test suite on openqa.suse.de * action #117172: Flaky alert about infrastructure packet loss * action #117205: Some boot_from_pxe failed from assigned worker: grenache-1:17 (kermit) and also openqaworker2:17 (quinn) size:M * action #117229: [tools] openqa failing on worker QA-Power8-5-kvm * action #117262: [alert] failed 
systemd service: ca-certificates on openqa.suse.de, "p11-kit: couldn't complete writing of file: /var/lib/ca-certificates/ca-bundle.pem.tmp: Unknown error 17" size:M * coordination #117268: [epic] Handle reduced PowerPC resources * action #117526: [alert] Dehydrated fails again on OSD * action #117580: web interface of qanet02nue.qa.suse.de and qanet03nue.qa.suse.de can not be reached over neither IPv4 nor IPv6 size:M * action #117625: Fix IPMI connection to openqaworker1.suse.de size:M * action #117631: Failed systemd service transactional-update on openqaworker1 - system is no longer reachable after reboot size:M * action #118024: Ensure all PPC workers are upgraded after kernel regression resolved size:M * coordination #118642: [epic] PPC testing capabilities in openqa.opensuse.org * action #118885: No cable connection or link to the 2nd network card in amd-zen3-gpu-sut1-1 * action #119179: [openQA][iso] Fail to mount nfs share openqa.suse.de:/var/lib/openqa/share/factory/iso/fixed * action #119215: [openQA][repos][aarch64][15-SP5[Full Media] 15-SP5 Full media is missing from http://openqa.suse.de/assets/repo/ * action #119290: [alert] Packet loss between worker hosts and other hosts alert * action #119488: The QEM dashboard is empty * action #119521: [alert] jenkins: partitions usage (%) alert * action #119557: [qem-dashboard] Show a link to Smelt if there are no active incidents size:S * action #119767: Failed pipeline for "openqa-worker" in salt-states-openqa size:M * action #119938: openQABot | Failed pipeline for master | 9b1b1857: ERROR: Connection error during reading from IBS: HTTP Error 403: Forbidden size:M * action #120004: [alert] Host powerqaworker-qam-1.qa.suse.de is down size:M * action #120007: [alert] Many systemd alerts triggered on 06.11.22 size:S * action #120025: [openQA][ipmi][worker] Worker host hostname changed and broken networking connection * action #120073: [qem-dashboard] Loading incident... 
indefinitely for non-existing incident * action #120112: worker worker2.oqa.suse.de auto_review:"Error connecting to : Connection timed out":retry size:M * action #120163: Use salt grains instead of manually specifying IPs in "bridge_ip" size:M * action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow * action #120270: Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones size:M * action #120339: QEMU DNS fails to resolve openqa.suse.de via IP address * action #120372: OSD deployment fails presumably when handling the changelog * action #120441: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow * coordination #120522: [epic] Support upgrade SUSE bugzilla instance * action #120564: [qem-dashboard] Assets are not cleaned up on deployment * action #120615: [alert] "Packet loss between worker hosts and other hosts" i.e. between osd and worker12.oqa.suse.de and worker5.oqa.suse.de * action #120648: [alert] qa-power8-4-kvm host up alert * action #120675: openqaworker-arm-1 not bootable via IPMI * action #120744: [alert] QA-Power8-5-kvm: Too many Minion job failures alert size:M * action #120780: [Alerting] InfluxDB not reachable, turned ok after some minutes * action #120783: [Alerting] failed systemd service on worker11, os-autoinst-openvswitch. 
Failed at system boot, turned ok after some hours size:M * action #120807: [alert] openqa.suse.de - worker12.oqa.suse.de 100% packet loss due to outdated AAAA record * action #120813: [alert] Salt states fail to apply on `scriptgen.py` invocation * action #120874: openqa_service VM is down size:M * action #120886: "PXE-E32: TFTP open timeout" on ipmi machines in O3 size:S * action #120921: [alert] Salt states fail to compile with "Rendering SLS 'base:openqa.openvswitch' failed: Jinja error: argument of type 'NoneType' is not iterable" size:M * action #120939: [alert] Pipeline for scheduling incidents runs into timeout size:M * action #120967: [alert] o3 webUI down due to regression from https://github.com/os-autoinst/openQA/pull/4922 * action #120973: [qem-dashboard] 500 internal server errors reported by qem-bot * action #121282: Recover storage.qa.suse.de size:S * action #121594: Extend OSD storage space for "results" to make bug investigation and failure archeology easier - 2022 * action #121771: openqaworker20 has no heartbeat * action #121789: MultiMachine tests lose ability to communicate * action #121816: Cannot access installation media on updates.suse.com - maintenance tests broken size:S * action #122158: [alert] qa-power8-4-kvm host up alert - machine not up, nothing obvious on SoL but IPMI works size:M * action #122302: Support SD-105827 "PowerPC often fails to boot from network with 'error: time out opening'" * action #122575: [alert] jenkins host up alert * action #122653: Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get more clear results size:S * action #122656: Ask SUSE-IT network admins to *not* block this traffic which we need for tests regarding s390x within SUSE network size:M * action #122743: Rotate secrets used in CircleCI as per recommendation * action #122746: Publishing opensuse.openqa.job.done failed: SSL connect attempt failed * action #122842: Configure I/O alerts again for the webui after migrating to the 
"unified alerting" in grafana size:M * action #122845: Migrate our Grafana setup to "unified alerting" * action #122848: Configure grouped alerts in Grafana correctly size:M * action #122983: [alert] openqa/monitor-o3 failing because openqaworker1 is down size:M * action #122998: o3 worker rebel is down; was: inconsistent package database or filesystem corruption size:M * action #123004: Downgrade kernel on o3+osd x86_64 machines as workaround for boo#1206616 size:M * action #123025: o3 worker openqaworker4 is down; boots to emergency shell only * action #123028: A/C broken in TAM lab size:M * action #123082: backup of o3 to storage.qa.suse.de was not conducted by rsnapshot since 2021-12 size:M * action #123151: [Alerting] Failed systemd services alert * action #123232: [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S * action #123382: repurpose openqaworker-arm-3 as baremetal worker * action #123421: [alert] dehydrated is failing on monitor.qa.suse.de * action #123490: o3 logreports DBI Exception: DBD::Pg::st execute failed: ERROR: invalid input syntax for type bigint size:M * action #123493: [Alerting] Failed systemd services alert * action #123825: Ensure proper o3 monitoring after shutdown of thruk/icinga by SUSE-IT Eng-Infra * action #123933: [worker][ipmi][bmc] Some worker can not be reached via BMC * action #123984: [boot][pxe][sut] Machine fozzie can not boot from pxe * action #124119: Conduct the migration of remaining SUSE openQA systems IPMI to new security zones * action #124146: [alert] Incomplete jobs (not restarted) of last 24h * action #124391: test fails in bootloader_start - The command server of powerhmc1 cannot be connected size:M * action #124398: salt gitlab CI jobs fail due to exceeding log length of 500kB size:M * action #124412: [alert] logrotate services failed on openqa-piworker.qa.suse.de and OSD size:M * action #124655: [openQA][infra][pxe] Physical SUT machine can 
not boot from pxe and mismatch hostname * action #124658: [osd-deployment] missing os-autoinst changes in deployment and changelog emails, build failure on ppc64le in OBS size:M * action #124661: [qe-tools] tftp server and directory mount issue on qanet.qa * action #124685: [qe-tools] Make sure Power8 and Power9 machines can be used with *any* usable HMC (was: move to new powerhmc3.arch.suse.de) size:M * action #124715: Failing pipelines because of unreachable machine openqaworker-arm-1 * action #124877: Failing pipelines because of unreachable machine openqaworker-arm-1 * action #125132: [alert] logrotate failed on OSD * action #125207: worker11 host up alert - similar as for worker13 * action #125210: worker13 host up alert - kernel crash size:M * action #125213: Failed systemd services alert due to crash dumps on worker11 and worker13 (except openqa.suse.de) * action #125216: Use qa-power8 for ppc tests in o3 - try one of the suggestions size:M * action #125303: prevent confusing "no data" alerts size:M * action #125306: rebootmgr/D-Bus-related problem on backup.qa.suse.de * action #125468: [alert] [FIRING:1] (Apache Response Time alert J5M8aX04z) then resolved itself so flaky? 
size:M * action #125531: salt-pillar C pipeline runs into 1h timeout * action #125534: Consolidate the installation of openqaw5-xen with SUSE QE Tools maintained machines size:M * action #125642: Manage "unified alerting" via salt size:M * action #125735: [openQA][infra][pxe] Some machines can not boot from pxe due to "TFTP open timeout" * action #125750: In salt-states-openqa support machines requiring ssh password login for root user size:M * action #125765: Make Telegraf errors visible in alert handling * action #125798: Visual differences in GRUB menu on different x86_64 UEFI workers * action #125810: [openqa][infra] Some SUT machines can not upload logs to worker machine size:S * action #125885: worker10 crashed triggering systemd-services alert and host-up alert size:M * action #126212: openqa.suse.de response times very slow. No alert fired size:M * action #126290: Recover thincsus.qe.nue2.suse.org size:M * action #126674: openqa-review pipeline failed to produce a report with no test results * action #126821: [openQA][infra][worker] ppc64 fails to load grub2 completely over tftp/pxe on qanet and PXE load timeout issues size:M * action #126872: bot-ng pipeline(s) fail(s) to pull openSUSE container images * action #126962: Use templating for all provisioned "unified alerts" where the original alerts were part of templated dashboards * action #127049: gitlab runner system failure: Cannot connect to the Docker daemon * action #127052: [alert] Apache Response Time alert followed by DatasourceNoData for min_apache_response size:M * action #127055: [alert] Download rate alert openQA (openqaworker{14,16,17,18}) * action #127097: [alert] Failed systemd services alert * action #127256: missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M * action #127274: [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds threshold size:M * action #127337: Some s390x workers have been failing for all jobs since 11 months ago * action #127754: 
osd nfs-server needed to be restarted but we got no alerts size:M * action #127877: [alert] Mail "DatasourceNoData Salt" triggered after rebooting OSD-VM * action #127961: FC office access for mkittler * action #127982: [alert][grafana] DatasourceNoData e.g. openQA QA-Power8-5-kvm+openqaw5-xen Disk I/O time alert worker * action #127985: [alert][grafana] Handle various problems with slow monitoring https requests and such (was: DatasourceNoData HTTP Response alert) size:M * action #127991: Fix DNS entry or IPv6 connectivity of qe-jumpy.suse.de * action #128030: IPMI of ix64ph1075.qa.suse.de is not accessible size:M * action #128090: [alert] SUSE nbg network outage 2023-04-20 * action #128273: [alert] openqaworker-arm-1+2+ failed to recover, problem in name resolution, network connection? size:M * action #128417: [alert][grafana] openqaw5-xen: partitions usage (%) alert fired and quickly after recovered again size:M * action #128420: [alert][grafana] 100% packet loss from qa-power8-4-kvm, grenache-1 and powerqaworker-qam-1 to s390zp{11,15,17}.suse.de size:M * action #128561: salt managed host being down does not trigger any alert (was: jenkins.qa.suse.de stuck in emergency mode but no alert) size:M * action #128654: [sporadic] Fail to create an ipmi session to worker grenache-1:16 (ix64ph1075) in its vlan * action #128669: gitlab CI jobs fail when using docker executor with "ERROR: Failed to remove network for build" * action #128786: worker instances on rebel (o3 s390x worker) were not running, services disabled, except for rebel:5 * action #128789: [alert] Apache Response Time alert size:M * action #128927: OSD deployment changelog repeatedly mentions old os-autoinst changes, is a worker outdated and unable to install newer packages? 
size:M * action #128942: [alert][grafana][openqa-piworker] NTP offset alert Generic (openqa-piworker ntp_offset_alert_openqa-piworker generic) size:M * action #128945: [alert][grafana] web UI: Too many Minion job failures alert Salt (liA25iB4k) * action #128969: [alert][grafana] Failed systemd services alert (except openqa.suse.de) Salt (Uk02cifVkz) * action #128999: openQA workers salt recipes should ensure that also developer mode works size:M * action #129065: [alert] HTTP Response alert fired, OSD loads slow size:M * action #129127: Trying to login on o3 fails with Forbidden * action #129241: [alert] Many alert notification for "QA network infrastructure Ping time alert openQA" received on 12.05.23 13:10 CEST * action #129244: [alert][grafana] File systems alert for WebUI /results size:M * action #129484: high response times on osd - Move OSD workers to o3 to prevent OSD overload size:M * action #129493: high response times on osd - better nice level for velociraptor * action #129604: evil backtick character in ipmi password * action #130132: jenkins.qa.suse.de seems down * action #130201: openqa_bootstrap: fetchneedles not called during openqa-bootstrap size:S * action #130210: [FIRING:1] Packet loss between worker hosts and other hosts alert Salt (2Z025iB4km) * action #130375: Automatically update jenkins plugins on jenkins.qa.suse.de * action #130633: Better documentation on jenkins.qa.suse.de alerts and recovery * action #130790: [alert] failed systemd alert openqa-staging-2 velociraptor-client size:M * action #130835: salt high state fails after recent merge requests in salt pillars size:M * action #130952: [alert][thursday] gitlab.suse.de CI jobs fail with "error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500" as of 2023-06-15 size:M * action #131021: [O3 repo]Missing openSUSE-Tumbleweed-oss-x86_64-CURRENT directory in /var/lib/openqa/share/factory/repo size:M * action #131096: [alert] Service `ca-certificates` can fail size:M * 
action #131123: qa-jump.qe.nue2.suse.org is not reachable since 2023-06-19 * action #131141: [alert] Packet loss between qa-jump.qe.nue2.suse.org and other hosts * action #131147: Reduce /assets usage on o3 * action #131150: Add alarms for partition usage on o3 size:M * action #131201: [alert][grafana] HTTP Response alert, alerted at 10:46L, back to green 10:56L size:M * action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M * action #131264: OSD deployment fails to install package os-autoinst-distri-opensuse size:M * action #131276: SUSE Summer 2023 - AC failure in NUE1-SRV1 * action #131303: [alert] Packet loss between worker hosts and other hosts (tumblesle.qa.suse.de) * action #131309: [alert] NFS mount can fail due to hostname resolution error size:M * action #131318: User management via salt does not work on backup.qa.suse.de * action #131321: salt-states-openqa | Failed pipeline for master * action #131459: [openQA][infra] OSD ran out of inodes without triggering a notification size:M * coordination #131519: [epic] Additional redundancy for OSD virtualization testing * action #131540: openqa-piworker fails to upgrade many packages. 
vendor change is not enabled as our salt states so far only do that for openQA machines, not generic machines size:M * action #131543: We have machines with both auto-update&auto-upgrade deployed, we should have only one at a time size:M * action #132134: Setup new PRG2 multi-machine openQA worker for o3 size:M * action #132137: Setup new PRG2 openQA worker for osd size:M * action #132143: Migration of o3 VM to PRG2 - 2023-07-19 size:M * action #132218: Conduct lessons learned for "openQA is not accessible" on 2023-07-02 * action #132278: Basic o3 http response alert on zabbix size:M * action #132311: osd-deployment failed at least on qa-power8-5 due to home:tiwai:kernel:5.6 missing * action #132383: FC Basement OSD hosts not reachable since 2023-07-06 01:50 CEST * action #132437: Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems size:M * action #132461: manage tls certificates on o3/ariel directly with dehydrated size:M * action #132470: salt states fail to apply due to glibc error on storage.oqa.suse.de * action #132500: NUE1-SRV2, .qa.suse.de, aarch64 workers offline due to heat-related SRV2 shutdown size:M * action #132512: Handover manual raspberry pi testing size:S * action #132671: Ensure everybody in SUSE QE Tools knows how to access netbox size:M * action #132752: Use proper bot account for notifications in zabbix.suse.de size:M * action #132773: fail to set iPXE boot for a baremetal machine in FC basement * action #132788: [alert][flaky] QA-Power8-5-kvm: QA network infrastructure Ping time alert * action #132812: [alert] openqaw5-xen host up alert + infrastructure ping size:M * action #132815: [alert][flaky][o3] Multiple flaky zabbix alerts related to o3 * action #132818: salt state for worker in CI test does not apply anymore, "ID: /var/lib/openqa, Function: mount.mounted" size:M * action #132860: openqa-piworker is unstable and needs regular power-cycles size:M * action #132893: [alert] failed systemd services 
on jenkins: jenkins-plugins-update, snapper-cleanup * action #132902: Check and document PDU connection of nibali.qe.nue2.suse.org * action #132947: Bring back ada.qe.suse.de and fix it properly * action #133097: cron on OSD (date; fetch_openqa_bugs /etc/openqa/bugfetcher_o3.conf) > /tmp/fetch_openqa_bugs_o3.log failed * action #133127: Frankencampus network broken + GitlabCi failed --> uploading artefacts * action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? * action #133142: 4 baremetal SUTs in FC basement are unreachable * action #133154: osd-deployment failed because unreachable workers * action #133160: Setup a modern UEFI httpboot setup on o3 with dnsmasq size:M * action #133181: Migration of o3 VM to PRG2 - Fix https://openqa.opensuse.org/snapshot-changes/opensuse/Tumbleweed/ * action #133250: gitlab.suse.de unusable "Failed to write to log, write /srv/www/vhosts/gitlab-ce/log/gitlab-shell.log: no space left on device" on remote operations * action #133322: qa-jump.qe.nue2.suse.org is not reachable - take 3 * action #133325: osd http response alerts - bump threshold further up * action #133328: Notification settings API | GitLab: I wonder if we can loop over all and disable/enable size:M * action #133358: Migration of o3 VM to PRG2 - Ensure IPv6 is fully working * action #133364: Migration of o3 VM to PRG2 - Decommission old-ariel in NUE1 as soon as we do not need it anymore * action #133367: Evaluate if we have hardware alternatives for Windows Server 2016+ testing * action #133385: Problem: Interface tun5: Link down alerting and autoresolving shortly size:S * action #133397: HTTP Response alert Salt alerting and autoresolving shortly size:M * action #133403: Login on o3 does not work * action #133469: [alert] Salt states don't apply sometimes on individual workers size:M * action #133490: Migration of o3 VM to PRG2 - Fix o3 bare metal hosts iPXE booting size:M * action #133511: [spike solution][timeboxed:10h] Prevent memory 
over-commits in openQA worker service definitions size:S * action #133553: OSD deployment fails due to unreachable openqaworker1 * action #133580: Be able to bind openQA tests to x86_64 architecture sub-versions * action #133793: salt-pillars-openqa failing to apply within 2h and it is not clear which minion(s) are missing size:M * action #133892: [alert] arm-worker2 (arm-worker2: host up alert openQA host_up_alert_arm-worker2 worker size:M * action #133928: salt-states-openqa | Failed pipeline for master * action #133985: [alert] Backup VM not reachable via FQDN size:M * action #134018: [alert] Multiple alerts with "used > 80%" size:S * action #134042: auto-update on OSD does not install updates due to "Problem: nothing provides 'libwebkit2gtk3 ..." but service does not fail and we do not get an alert size:M * action #134051: Eng-Infra maintained DNS server for .qa.suse.de taking over from qanet size:M * action #134087: Fix ix64ph1075 bare metal openQA test size:M * action #134123: Setup new PRG2 openQA worker for o3 - two new arm workers size:M * action #134132: Bare-metal control openQA worker in NUE2 size:M * action #134243: fozzie not responsive via ipmi * action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry * action #134453: backup.qam.suse.de is Failed according to netbox and not creating backups size:M * action #134489: backup.qa.suse.de does not create backups * action #134519: We were not notified that backup.qa.suse.de did not create backups size:M * action #134522: [alert] Certificate renewal on monitor.qa.suse.de might not be working causing alerts size:M * action #134723: Conduct lessons learned for making new-lines in script_run fatal size:S * action #134735: [alert] openQA piworker openqa-piworker: host up alert * action #134816: [tools] grafana dashboard for `OpenQA Jobs test` partially without any data from 
OSD migration size:M * action #134822: Migration of osd VM to PRG2 - Decommission old-osd in NUE1 as soon as we do not need it anymore size:M * action #134861: https://stats.openqa-monitor.qa.suse.de/ reports "502 Bad Gateway", https://monitor.qa.suse.de works fine * action #134879: reverse DNS resolution PTR for openqa.oqa.prg2.suse.org. yields "3(NXDOMAIN)" for PRG1 workers (NUE1+PRG2 are fine) size:M * action #134900: salt states fail to apply due to "Pillar openqa.oqa.prg2.suse.org.key does not exist" * action #134906: osd-deployment failed due to openqaworker1 showing "No response" in salt size:M * action #134912: Gradually phase out NUE1 based openQA workers size:M * action #134927: OSD throws 503, unresponsive for some minutes size:M * action #134939: [OSD][Worker] openqaworker1:7 is unworkable * action #134948: Ensure IPv6 is working in the OSD setup (since we have workers in PRG2 and the VM has been migrated) size:M * action #135026: Oli shows up with 50% resolved tickets despite having been on hols for two weeks * action #135029: Many unhandled alert messages while users report problems * action #135032: Communication about vacation within the team could be better * action #135047: retro feedback: This week was very exhausting - too much infra work * action #135137: Bring back imagetester size:M * action #135152: Zabbix agent is not available * action #135185: Zabbix reports high swap space usage > 50% * action #135191: Migration of o3 VM to PRG2 - Use direct zabbix connection size:M * action #135206: [tools] GitlabCI telegraf step on salt-states-openqa failed * action #135230: salt pillars pipelines failing due to Temporary failure in name resolution * action #135236: https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=78 shows no data * action #135260: zabbix - o3 High CPU utilization (over 90% for 5m) size:M * action #135329: s390x work demand exceeds available workers * action #135335: [tools] gitlabci salt-pillars-openqa deploy 
failed on imagetester and other hosts size:M * action #135380: A significant number of scheduled jobs with one or two running triggers an alert * action #135404: openqaworker-arm-2.suse.de minion not returning * action #135467: aarch64.openqanet.opensuse.org 100 packet loss * action #135470: Grafana: Average Ping time (ms) alert with unexpanded variable "${tag_url}", which machine is this about? size:M * action #135491: fozzie and quinn unable to access PXE server or iPXE server (TFTP open timeout) * action #135509: monitor.qa.suse.de yields 502 Bad Gateway from nginx/1.21.5 size:M * action #135515: malbec.arch.suse.de not reachable anymore * action #135578: Long job age and jobs not executed for long size:M * action #135632: "Mojo::File::spurt is deprecated in favor of Mojo::File::spew" breaking os-autoinst OBS build and osd-deployment size:M * action #135737: [alert] Munin - network eth errors - opensuse.org :: openqa.opensuse.org size:M * action #135740: [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org - only "label_known_issues" hook scripts size:M * action #135773: [tools] many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers size:M * action #135833: false-positive inode and disk usage alert on windows image * action #135848: Icinga alarm about Postfix mail queue since July 26 * action #135890: [alert][FIRING:1] (Apache Workers alert Salt MW025mB4z) 2023-09-18 00:11 * action #135893: FQDN (long host name) is unavailable for fozzie * action #135941: openqa-piworker offline and no alert * action #136007: Conduct "lessons learned" with Five Why analysis for network protocols failures on multimachine tests on HA/SAP size:S * action #136052: bot-ng in gitlab CI runner + monitor.qe.nue2.suse.org unable to access smelt.suse.de, "No route to host" * action #136115: https://racktables.nue.suse.com/ yields "Database write failed - LDAP caching error" on connection attempts after about 
1m waiting time * action #136121: Repurpose PowerPC hardware in FC Basement - deagol * action #136133: Migrate aarch64.openqanet.opensuse.org to FC Basement size:M * action #136238: test incompletes with auto_review:"qemu-system-.*Address already in use":retry size:M * action #136274: Failing DNS resolution on o3 for hosts like github.com * action #136304: [tools] please provide a machine for testing sle-micro 5.5 (virtualization testing) * action #136325: salt deploy fails due to multiple offline workers in qe.nue2.suse.org+prg2.suse.org * action #136370: systemd service rsnapshot@beta on backup-vm.qe.nue2.suse.org failed due to process conflict * action #136946: https://monitor.qa.suse.de/d/WebuiDb/webui-summary shows multiple panels with "no data" size:M * action #137075: Fail to login to the osd, 'Forbidden' error is returned due to DNS server change within SUSE *and* auto_review:"Bugzilla query failed: Network is unreachable":retry size:M * action #137231: [tools] qem-bot and others can not execute scheduled jobs due to registry.opensuse.org outage * action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec * action #137306: Check unreal6 cabling, SP and system not reachable over network size:M * action #137408: Support move of s390x mainframe(s) to PRG2 - o3 size:M * action #137504: Lots of emails of the form Switched Rack PDU: Outlet .* (openqaworker-arm-.*) .* * action #137519: [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount, kernel modules already removed with old kernel still running size:M * action #137522: [alert] alerts about "host: sushil-linux-tw-kde" that tools team should not be notified about, e.g. 
Inode utilization inside the OSD infrastructure is too high size:M * action #137600: [alert] Packet loss between worker hosts and other hosts size:S * action #137603: [alert] Queue: State (SUSE) - too few jobs executed alert size:S * action #137615: [alert] Failed systemd services alert - s390zl12,s390zl13 - kdump-early, kdump, smartd * action #137744: [alert] gitlab pipeline fails in deploy step for salt-pillars-openqa: Data failed to compile * action #137747: NFS and download.opensuse.org related outage handling * action #137756: Re-enable worker31 for multi-machine tests in production auto_review:"tcpdump.+check.log.+timed out at" * action #137771: Configure o3 ppc64le multi-machine worker size:M * action #137813: [alert] Failed systemd services - qamaster - logrotate fails on /var/log/messages with "/usr/bin/xz: (stdin): Read error: Input/output error" size:S * action #137984: salt "refresh" job full of errors but CI job passes size:M * action #138005: grafana panel "Packet loss between worker hosts and other hosts" shows more than just ping to "other hosts" and hence becomes slow and triggers redundant alerts size:M * action #138038: diesel+petrol missing network, IPMI still reachable * action #138044: Grouped seemingly unrelated alert emails are confusing size:M * action #138320: test fail hook in openqa_webui is failing: script/opencli api jobs - Can't locate Mojo::Base in @INC * action #138350: worker31 and likely more OSD machines get stuck on boot in grub command line * action #138377: bot-ng and openQABot pipelines fail to pull containers from registry.suse.de size:M * action #138515: foobar host up alert size:S * action #138518: unreal6 partition usage alert * action #138527: Zabbix agent on ariel.dmz-prg2.suse.org reported no data for 30m and there is nothing in the journal size:S * action #138536: Alert Worker .* has no heartbeat (900 seconds), restarting (see FAQ for more) on o3 * action #138545: Munin - minion hook failed - opensuse.org :: 
openqa.opensuse.org size:S * action #138551: DNS outage of 2023-10-25, e.g. Cron (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log Max retries exceeded with url size:S * action #138650: partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M * action #138746: [tools] s390x VM randomly fails to open QCOW disk image: Permission denied * action #138773: Trying to login on o3 or osd fails with Forbidden - again * action #139004: Cron (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log - OSError: [Errno 101] Network is unreachable * action #139100: Long OSD ppc64le job queue - Move nue3 power8 machines to nue2 * action #139145: dehydrated on monitor.qe.nue2.suse.org aka. monitor.qa.suse.de fails with "EXPECTED value GOT EOF" size:M * action #139148: [alert] openqa-review repeatedly failed to read job group overview data running into "Read timed out" * action #139271: Repurpose PowerPC hardware in FC Basement - mania Power8 PowerPC size:M * action #139307: openQA for s390x does not work at the moment * action #150815: unable to login over ssh to o3 (gate.opensuse.org:2214) size:M * action #150824: monitor-pre-deploy from osd-deployment is failing with Build status for devel:openQA openSUSE_Leap_15.5 arch x86_64 is not successful * action #150830: Two new ARM servers 2023-11 for openqa.suse.de bare-metal testing size:M * action #150845: openqaworker-arm22 broken due to packages automatically removed size:M * action #150887: [alert] [FIRING:1] s390zl12 (s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic), also s390zl13 size:M * action #150908: o3 "Unable to fetch build results" and "Internal server error" on some pages size:M * action #150920: openqaworker-arm22 is unable to join download.opensuse.org in parallel tests = tap mode size:M * action #150938: [openQA][sut][ipmi] No ipmi sol output with ix64ph1075 size:M * action #150956: o3 cannot send e-mails via 
smtp relay size:M * action #150965: At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches size:M * action #150983: CPU Load and usage alert for openQA workers size:S * action #151013: o3 yielding "502 Bad Gateway" from nginx 2023-11-19, why was the config overwritten? size:M * action #151130: IPv6 for openqa.opensuse.org and open.qa size:S * action #151165: salt-pillars-openqa pipeline fails * action #151231: package loss between o3 machines and download.opensuse.org size:M * action #151390: Brute-force salt osiris so that we enable self-management of VMs for users size:M * action #151396: After osiris is now in salt decide about the fate of seth * action #151588: [potential-regression] Our salt node up check in osd-deployment never fails size:M * action #151597: [alert] osiris-1 (osiris-1: partitions usage (%) alert Generic partitions_usage_alert_osiris-1 generic * action #151696: Evaluate use of https://itpe.io.suse.de/open-platform/docs/ size:M * action #151807: [alert] o3 zabbix: Problem: /var/lib/snapshot-changes: Disk space is critically low (used > 94%) size:M * action #152008: s390zl13 redirecting http to https because of .wget-hsts but we don't install ca-certificates-suse on all salt-controlled machines * action #152092: Handle all package downgrades in OSD infrastructure properly in salt size:M * action #152095: [spike solution][timeboxed:8h] Ping over GRE tunnels and TAP devices and openvswitch outside a VM with differing packet sizes size:S * action #152101: Allow salt to properly configure non-production multi-machine workers size:M * action #152377: [tools] SLE-15-SP2 and SLE-15-SP3 install target medias for x86_64 in the ...../assets/repo are purged * action #152386: [alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 size:S * action #152446: openqaworker-arm21 is broken and produces lots of incomplete jobs * action #152503: [FIRING:1] worker38 (worker38: partitions usage (%) alert openQA 
partitions_usage_alert_worker38 worker) * action #152557: unexpected routing between PRG1/NUE2+PRG2 * action #152578: Many incompletes with "Error connecting to VNC server " size:M * action #152599: [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` * action #152643: https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test showing two entries for x86_64, one with quotes and one without * action #152649: [alert] `rsnapshot@alpha.service` failed on `backup.qa.suse.de` size:M * action #152673: [alert] `systemctl status iscsid.socket` failed on `s390zl12.oqa.prg2.suse.org` size:S * action #152741: [tools] gitlab CI - openqa_review failed with connection timeout on osd * action #152811: ada.qe.suse.de is not responding to salt commands * action #152813: openqaw5-xen.qa.suse.de is not responding to salt commands * action #152827: [tools] cron service updating clamav database failing on OSD + O3 size:S * action #152857: [tools] alert ping between hosts timeout proxy.scc.suse.de * action #152887: Setup of Ampere Altra Q32-17 for bare-metal tests in openQA size:M * action #152941: circleCI job runs into 20m timeout due to slow download from registry.opensuse.org * action #152981: monitoring: Update deprecated angular grafana panels * action #153023: [FIRING:1] (Packet loss between worker hosts and other hosts alert Salt 2Z025iB4km) * action #153325: osd-deployment | Failed pipeline, Digest verification failed for openQA-common size:M * action #153328: jenkins fails in submit-openQA-TW-to-oS_Fctry, Server returned an error: HTTP Error 400: Bad Request size:M * action #153418: http based health check against proxy.scc.suse.de size:M * action #153544: os-autoinst:os-autoinst-openvswitch-test not building on Leap_15.5 x86_64 * action #153925: Support YAM squad to get backlogger running in our salt states (and fix our pipelines again) * action #153958: [alert] s390zl12: Memory usage alert Generic memory_usage_alert_s390zl12 generic * action #153961: Grafana doesn't 
show values for inode utilization for s390zl12 * action #154018: [alert] Failed systemd services alert: backup-vm postfix * action #154177: File systems alert Salt: One of the file systems is too full size:M * action #154345: Incomplete jobs (not restarted) of last 24h alert Salt * action #154426: HTTP Response alert Salt alerting and autoresolving shortly size:M * action #154546: Cron fetch_openqa_bugs refused or timed out trying to fetch individual tickets * action #154549: Certain queries on poo are not accessible * action #154624: Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M * action #154627: [potential-regression] Ensure that our "host up" alert alerts on not host-up conditions size:M * action #154927: [alert] Broken workers alert was firing several hours after weekly reboot * action #154939: Clarify use of quake2 and ensure it does not waste power size:S * action #154942: Clarify use of quake6 and ensure it does not waste power size:S * action #155074: salt-states-pipeline fails trying to install influxdb * action #155080: jenkins is no longer producing GNOME:Next test runs: http://jenkins.qa.suse.de/job/gnome_next-openqa/8670/console * action #155326: [alert] "HTTP Response" alert fired shortly on 2024-02-12 and 2024-03-04 size:M * action #155659: [openQA][infra][sut] Failed to establish connection to ix64ph1075-sp.qe.nue2.suse.org * action #155689: bot-ng pipelines fails to schedule incidents * action #155725: [openQA][infra][sut] Failed to establish connection to fozzie-sp and quinn-sp * action #155737: Salt pillars pipelines fail due to refused connection errors on telegraf * action #155740: Scripts CI pipelines fail due to timeout after many Job state of job ID xyz: scheduled, waiting messages * action #155824: Support IPv6 SLAAC in our infrastructure size:M * action #155848: Firewalld is logging many errors and sometimes restarting on worker29,
possibly related to MM failures size:M * action #155929: Try out rstp_enable=True in openqa/openvswitch.sls size:M * action #156016: [openQA][sle-micro][virtualization] Test slem_virtualization@uefi with Default-encrypted image was not triggered correctly size:M * action #156049: [alert] Scripts CI pipeline failing with No reason for devel:openQA:Leap:15.6/local-npm-registry * action #156064: Handle planned datacenter shutdown PRG1 2024-02-28 size:M * action #156076: bot-ng gitlab CI pipeline fails with "ERROR: Job failed (system failure): Error response from daemon: mkdir /var/lib/docker/overlay2/…/merged: no space left on device (exec.go:78:0s)" * action #156130: Install Intel GPU on one of the servers in FC size:M * action #156226: [bot-ng] Pipeline failed / failed to pulled image / no space left on device * action #156301: [bot-ng] Pipeline failed / KeyError: 'priority' / execution took longer than 4h0m0s seconds * action #156304: [salt-pillars] Pipeline failed / Passed invalid arguments to state.highstate * action #156322: zabbix-proxy.dmz-prg2.suse.org not reachable from ariel.suse-dmz.opensuse.org * action #156331: [gitlab] New pipeline schedules cannot be created, you have exceeded the maximum number of pipeline schedules for your plan * action #156460: Potential FS corruption on osd due to 2 VMs accessing the same disk * action #156481: cron -> (fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log failed / No route to host / openqa.suse.de * action #156514: Cron (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log failed / simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0) / bugzilla_issue.py * action #156517: Can't call method "script" on an undefined value at lib/OpenQA/WebAPI/Controller/Step.pm size:S * action #156532: lessons learned about "Potential FS corruption on osd due to 2 VMs accessing the same disk" size:S * action #156535: Handle unfinished SLE maintenance tests due to FS corruption on OSD 2024-03-01 * action 
#156913: Remove its=off setting in global QEMUMACHINE for worker-arm{1,2} size:M * action #156934: RPi realhw tests fail with # Test died: Error connecting to * action #157081: OSD unresponsive or significantly slow for some minutes 2024-03-12 08:30Z * action #157243: Update HMC with vMF68994 * action #157438: Failed systemd services alert (jenkins-plugins-update, snapper-cleanup) * action #157441: osd-deployment | Failed pipeline for master (qesapworker-prg5.qa.suse.cz) * action #157453: [FIRING:1] host_up (qesapworker-prg5: host up alert openQA qesapworker-prg5 host_up_alert_qesapworker-prg5 worker) * action #157468: Handle internal test machines with compromised root password size:M * action #157528: Remove redundant ASM connections for powerPC machines size:S * action #157615: [alert] osd-deployment failed in post-deploy , telegraf errors size:M * action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 * action #157834: [openQA][ipmi] IPMI backend machines in NUE2 can not be reached auto_review:"Reason: backend died: ipmitool.*Address lookup for.*qe.nue2.suse.org":retry * action #158020: salt-states-openqa pipeline times out * action #158023: salt-states-openqa pipeline invalid arguments to state.highstate on monitor.qe.nue2.suse.org * action #158026: osd-deployment exceeds 2h maximum runtime during package installation * action #158041: grenache needs upgrade to 15.5 * action #158059: OSD unresponsive or significantly slow for some minutes 2024-03-26 13:34Z * action #158104: typing issue on ppc64 worker size:S * action #158113: typing issue on ppc64 worker - make CPU load alert more strict size:M * action #158170: Increase resources for s390x kvm size:M * action #158185: parallel job failed to get the vars from its pair size:S * action #158242: Prevent ssh access to test VMs on svirt hypervisor hosts with firewall size:M * action #158266: openQA jobs on diesel ppc64le fail due to auto_review:"QEMU: This is probably because your SMT 
is enabled." * action #158377: Detect from monitoring data which monitored machines show a too low system usage over time size:M * action #158383: Crosscheck which machines marked as "unused" in racktables are still pingable (as they should not be powered on at all) size:M * action #158404: [qe-core][jeos]test fails in prepare_firstboot * action #158419: osiris-1.qe.nue2.suse.org not responsive over virt-manager and "virsh list" hangs * action #158502: [FIRING:1] host_up (petrol: host up alert openQA petrol host_up_alert_petrol worker) * action #158505: Failed systemd services alert for jenkins-plugins-update size:S * action #158526: Apply the latest firmware+BIOS upgrade for diesel as well size:S * action #158556: Single-value SLI of OSD HTTP response code successful vs. all size:S * action #158559: Single-value SLI of OSD HTTP response time size:S * action #158907: Automated check for machines marked as "unused" in racktables but still pingable (as they should not be powered on at all) size:M * action #158913: Fix salt warning observed on petrol "State for file: /etc/fstab - Neither 'source' nor 'contents' nor 'contents_pillar' nor 'contents_grains' was defined, yet 'replace' was set to 'True'" * action #159048: Setup new Power10 machine for QE LSG in PRG2 (S/N 7882391) * action #159066: network-level firewall preventing direct ssh+vnc access to openQA test VMs size:M * action #159165: Ensure that rsyncd is properly covered in salt states size:S * action #159186: [alert] Systemd-services alert failing due to unit "rsnapshot@alpha" on host "storage" * action #159264: Discard Ampere ARM server packaging if we still have it + reserve thunderx21 machine in FC Basement for afaerber size:S * action #159270: openqaworker-arm-1 is Unreachable size:S * action #159303: [alert] osd-deployment pre-deploy pipeline failed because openqaworker-arm-1.qe.nue2.suse.org was offline size:S * action #159318: openqa-piworker host up alert * action #159396: Repeated HTTP Response alert 
for /tests and unresponsiveness due to potential detrimental impact of pg_dump (was: HTTP Response alert for /tests briefly going up to 15.7s) size:M * action #159402: "SSL expiration" panel on https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary shows problems size:S * action #159555: IPMI access over IPv6 doesn't work on imagetester - try to update BIOS with physical access size:S * action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S * action #159669: Missing openQA data on metrics.opensuse.org since o3 migration to PRG2 * action #159840: Munin - minion hook failed - see openqa-gru service logs for details - opensuse.org :: openqa.opensuse.org * action #159966: [FIRING:1] DatasourceError Salt (000000001 A External http responses) * action #160083: client gets a redirect and downloads an HTML page from microsoft instead of the proper windows .qcow2 image * action #151286: [Tools] Investigate s390x-zVM-vswitch-l2 worker issue: Could not retrieve required variable ZVM_GUEST_SUBNETMASK size:S * openqa-force-result #112844: this is a test ticket, please ignore me!