action #137303

closed

CircleCI t/api/14-plugin_obs_rsync_async.t failure size:M

Added by tinita about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-10-02
Due date: 2023-11-02
% Done: 0%
Estimated time:
Description

Observation

https://app.circleci.com/pipelines/github/os-autoinst/openQA/12251/workflows/82b3402d-8e38-4c56-b9d5-1438b18528b2/jobs/114353

[02:50:04] t/api/14-plugin_obs_rsync_async.t .. 4/?     # Premature connection close

    #   Failed test 'PUT /api/v1/obs_rsync/Proj1/runs'
    #   at t/api/14-plugin_obs_rsync_async.t line 128.

    #   Failed test 'Proj1 just starts as gru should empty queue for now'
    #   at t/api/14-plugin_obs_rsync_async.t line 128.
    #          got: undef
    #     expected: '201'
    # Looks like you failed 2 tests of 22.
[02:50:04] t/api/14-plugin_obs_rsync_async.t .. 5/? 
#   Failed test 'test concurrenctly long running jobs again'
#   at t/api/14-plugin_obs_rsync_async.t line 142.

#   Failed test 'Number of finished jobs'
#   at t/api/14-plugin_obs_rsync_async.t line 147.
#          got: '9'
#     expected: '10'

#   Failed test 'Number of finished jobs'
#   at t/api/14-plugin_obs_rsync_async.t line 154.
#          got: '9'
#     expected: '10'

    #   Failed test 'Job should retry succeed'
    #   at t/api/14-plugin_obs_rsync_async.t line 174.
    #          got: '10'
    #     expected: '11'
    # Looks like you failed 1 test of 6.
[02:50:04] t/api/14-plugin_obs_rsync_async.t .. 12/? 
#   Failed test 'test max retry count'
#   at t/api/14-plugin_obs_rsync_async.t line 184.
# Looks like you failed 4 tests of 12.
[02:50:04] t/api/14-plugin_obs_rsync_async.t .. Dubious, test returned 4 (wstat 1024, 0x400)
Failed 4/12 subtests 

Acceptance Criteria

  • AC1: obs rsync tests pass in all CircleCI and GitHub PR runs

Suggestions

  • Confirm whether this happens reliably or is flaky
  • Was this only in the nightly job, or in a test job of a single dependency PR? Double-check if it appears anywhere else
  • Investigate what could cause the numbers of finished jobs to be off here
  • Check what the "Premature connection close" means
Actions #1

Updated by okurz about 1 year ago

  • Priority changed from Normal to High
  • Target version set to Ready
Actions #2

Updated by livdywan about 1 year ago

  • Subject changed from CircleCI t/api/14-plugin_obs_rsync_async.t failure to CircleCI t/api/14-plugin_obs_rsync_async.t failure size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by livdywan about 1 year ago

  • Description updated (diff)

This hasn't occurred again since the first time. Also, I think there was a mix-up: the failure was not in a nightly job but in a dependency PR.

Actions #4

Updated by tinita about 1 year ago

Actually it was a test on master after a PR.

Actions #5

Updated by tinita about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita

I'm running the unit test 1000 times to see if it fails
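
Repeating a test to estimate how flaky it is can be sketched roughly as below. This is only a minimal sketch under stated assumptions: `count_failures` is a hypothetical helper written for illustration (not an existing script), and it treats any non-zero exit status as a failure.

```shell
#!/bin/sh
# Hypothetical helper for illustration: run a command "runs" times
# and report how many of those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails/$runs runs failed"
}
```

It could be called as `count_failures 20 make test TESTS=t/api/14-plugin_obs_rsync_async.t`, using a small run count first to see how long a single run takes.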

Actions #6

Updated by openqa_review about 1 year ago

  • Due date set to 2023-11-02

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by tinita about 1 year ago

  • Status changed from In Progress to Resolved

So I tried to run it 1000 times.
It ran out of disk space, but I don't know on which path. /tmp has 2GB, /home has 24GB.

% time runs=1000 ~/repos/okurz-scripts/count_fail_ratio make test TESTS=t/api/14-plugin_obs_rsync_async.t EXTRA_PROVE_ARGS="-v" OPENQA_TEST_TIMEOUT_DISABLE=1 KEEP_DB=1

DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back.
 at t/api/14-plugin_obs_rsync_async.t line 0
Can't call method "signal" on an undefined value at t/api/14-plugin_obs_rsync_async.t line 187.
END failed--call queue aborted.
make[1]: *** [Makefile:221: test-unit-and-integration] Error 1
make: *** [Makefile:216: test-with-database] Error 2
failed to run SQL in /home/tinita/repos/openQA/t/api/../../dbicdh/PostgreSQL/deploy/100/001-auto-__VERSION.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exception: DBD::Pg::db do failed: ERROR:  could not extend file "base/16384/547436": No space left on device HINT:  Check free disk space. at inline delegation in DBIx::Class::DeploymentHandler for deploy_method->deploy (attribute declared in /usr/lib/perl5/vendor_perl/5.38.0/DBIx/Class/DeploymentHandler/WithApplicatorDumple.pm at line 51) line 18
 (running line 'CREATE TABLE dbix_class_deploymenthandler_versions ( id serial NOT NULL, version character varying(50) NOT NULL, ddl text, upgrade_sql text, PRIMARY KEY (id), CONSTRAINT dbix_class_deploymenthandler_versions_version UNIQUE (version) )') at /usr/lib/perl5/vendor_perl/5.38.0/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 263.
DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back.
 at t/api/14-plugin_obs_rsync_async.t line 0
DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at t/api/14-plugin_obs_rsync_async.t line 0

But until it got there, it ran 741 successful tests.

I'm closing it.

Actions #8

Updated by okurz about 1 year ago

The space that was depleted here is /dev/shm, where the temporary database is stored, so effectively RAM. You can avoid that by removing the KEEP_DB=1 switch, increasing RAM, or changing the path that is initialized in the Makefile.
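
To see which location is running low during such a long run, the available space can be checked per path. The following is a minimal sketch, not part of openQA: `avail_kib` is a hypothetical helper name, and the list of directories is just an assumption based on the paths mentioned in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: print the available space (in KiB) of the
# filesystem that holds the given path, using POSIX "df -P" output.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report the candidates mentioned here; skip paths that do not exist.
for dir in /dev/shm /tmp "$HOME"; do
    if [ -d "$dir" ]; then
        echo "$dir: $(avail_kib "$dir") KiB available"
    fi
done
```

Running this in a second terminal while the tests loop should show which of the filesystems shrinks over time.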

Actions #9

Updated by livdywan about 1 year ago

okurz wrote in #note-8:

The space that was depleted here is /dev/shm, where the temporary database is stored, so effectively RAM. You can avoid that by removing the KEEP_DB=1 switch, increasing RAM, or changing the path that is initialized in the Makefile.

Perhaps we have an opportunity here to make this more discoverable: https://github.com/os-autoinst/openQA/pull/5339
