Project

General

Profile

Actions

action #165408

closed

Unreviewed issue (Group 24 openQA) test-running in openQA failing with error detecting default remote branch size:S

Added by livdywan 4 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
Due date:
2024-09-06
% Done:

0%

Estimated time:

Description

Observation

https://openqa.opensuse.org/tests/4408286#step/test_running/12
/usr/bin/env: git: Permission denied

From https://openqa.opensuse.org/tests/4408284/logfile?filename=test_running-openqa_services.log.txt

Aug 16 07:05:26 susetest openqa-gru[126234]: [info] Running cmd: env GIT_SSH_COMMAND="ssh -oBatchMode=yes" git ls-remote --symref https://github.com/os-autoinst/os-autoinst-distri-example.git HEAD
Aug 16 07:05:26 susetest openqa-gru[126234]: [55B blob data]
Aug 16 07:05:26 susetest openqa-gru[126234]: [error] cmd returned 32256

From https://openqa.opensuse.org/tests/4408284/logfile?filename=autoinst-log.txt

[2024-08-16T11:12:25.875198Z] [info] [pid:16517] ::: basetest::runtest: # Test died: command 'retry -s 5 -r 3 -- sh -c 'test -f /var/lib/openqa/share/tests/*/.git/config'' failed: the test distribution should be checked out by openQA automatically at /usr/lib/os-autoinst/autotest.pm line 412.

https://openqa.opensuse.org/tests/4408286/logfile?filename=test_running-openqa_minion_results.txt

[
  "Job successfully executed",
  "Error detecting remote default branch name for 'https://github.com/os-autoinst/os-autoinst-distri-example.git': /usr/bin/env: \x{e2}\x{80}\x{98}git\x{e2}\x{80}\x{99}: Permission denied at /usr/share/openqa/script/../lib/OpenQA/Task/Git/Clone.pm line 66.\n",
  "Job successfully executed"
]

Note that this should read ‘git’, it's just not encoded correctly in the error message.

Acceptance criteria

  • AC1: Jobs don't fail because git cannot be executed.

Suggestions

  • Note env GIT_SSH_COMMAND="ssh -oBatchMode=yes" git ls-remote ... - is why "env" is failing here
  • Try and fix the encoding i.e. ` ending up garbled in the JSON
  • This might be caused by apparmor, which is enabled on o3/ in openQA-in-openQA tests
  • Why is this sporadic? Is it a timing issue?
  • Run the test 1000 times
  • Extend the existing logging e.g. to reveal what is causing the permission issues

Files


Related issues 4 (0 open4 closed)

Related to openQA Project (public) - action #163784: [openqa-in-openqa] test fails in test_running on "retry -s 10 -r 6 -e -- sh -c 'test -f /var/lib/openqa/share/tests/*/.git/config", probably due to transient network errors size:SResolvedmkittler2024-07-11

Actions
Related to openQA Project (public) - action #155413: Ensure apparmor is enforced in openQA-in-openQA tests size:MResolvedjbaier_cz

Actions
Related to openQA Project (public) - action #165692: [openQA-in-openQA] test does not fail if state is done but incomplete auto_review:"no candidate needle.*openqa-dashboard.*matched" size:S Resolvedjbaier_cz2024-08-22

Actions
Has duplicate openQA Project (public) - action #165411: test fails in test_running: /usr/bin/env: git: Permission deniedRejected2024-08-16

Actions
Actions #1

Updated by jbaier_cz 4 months ago

  • Description updated (diff)
Actions #2

Updated by tinita 4 months ago

  • Has duplicate action #165411: test fails in test_running: /usr/bin/env: git: Permission denied added
Actions #3

Updated by tinita 4 months ago

  • Description updated (diff)
Actions #4

Updated by tinita 4 months ago

  • Description updated (diff)
Actions #5

Updated by tinita 4 months ago

  • Description updated (diff)
Actions #7

Updated by tinita 4 months ago

Why do we actually run env GIT_SSH_COMMAND="ssh -oBatchMode=yes" git ls-remote ... instead of

local $ENV{GIT_SSH_COMMAND} = 'ssh -oBatchMode=yes';
# then run git ls-remote

I guess this won't solve the problem as I don't know why we get the Permission denied, but the code could look easier.

Actions #8

Updated by livdywan 4 months ago

  • Subject changed from Unreviewed issue (Group 24 openQA) test-runnung in openQA failing with error detecting default remote branch to Unreviewed issue (Group 24 openQA) test-running in openQA failing with error detecting default remote branch
Actions #9

Updated by livdywan 4 months ago

I tried to investigate this but I'm not sure this is reproducible anymore?

tinita wrote in #note-7:

Why do we actually run env GIT_SSH_COMMAND="ssh -oBatchMode=yes" git ls-remote ...

openQA / lib/OpenQA/Task/Git/Clone.pm follows the same pattern as os-autoinst / OpenQA/Isotovideo/Utils.pm, both introduced by @dheidler if I see it correctly. My guess is it just makes reproducing it easier as you can copy the entire command.

Actions #10

Updated by tinita 4 months ago · Edited

livdywan wrote in #note-9:

I tried to investigate this but I'm not sure this is reproducible anymore?

Well, we see that failure sporadically, and now we actually see the error message. That was added as part of #163784

Probably it makes sense to schedule this a hundred times or so, but I would add some debugging around it. This error means that git was found in $PATH but is not executable or readable. At least I can't think of a different cause for this.

edit: or apparmor doesn't allow it, then I think the error would be the same

Actions #11

Updated by tinita 4 months ago

  • Related to action #163784: [openqa-in-openqa] test fails in test_running on "retry -s 10 -r 6 -e -- sh -c 'test -f /var/lib/openqa/share/tests/*/.git/config", probably due to transient network errors size:S added
Actions #12

Updated by ybonatakis 4 months ago

  • Assignee set to ybonatakis
Actions #14

Updated by ybonatakis 4 months ago · Edited

I checked the logs at the time of the https://openqa.opensuse.org/tests/4408286#step/test_running/12 (started 2024-08-16 11:16:20)

Aug 16 11:21:01 ariel systemd[1]: Started Continuously deploys openQA, see https://progress.opensuse.org/issues/105379.
Aug 16 11:21:01 ariel systemd[1]: Started Session c40651 of User geekotest.
Aug 16 11:21:01 ariel systemd[1]: Started Session c40652 of User geekotest.
Aug 16 11:21:01 ariel systemd[1]: Started Session c40653 of User geekotest.
Aug 16 11:21:01 ariel cron[19389]: pam_unix(crond:session): session opened for user geekotest by (uid=0)
Aug 16 11:21:01 ariel cron[19387]: pam_unix(crond:session): session opened for user geekotest by (uid=0)
Aug 16 11:21:01 ariel cron[19388]: pam_unix(crond:session): session opened for user geekotest by (uid=0)
Aug 16 11:21:01 ariel openqa-continuous-update[19449]: devel:openQA looks bad for Leap 15.5 (x86_64):
Aug 16 11:21:01 ariel openqa-continuous-update[19449]: <resultlist state="7031733d2b1c68ca877c892b5908a0eb">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:   <result project="devel:openQA" repository="15.5" arch="x86_64" code="building" state="building" dirty="true">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="isotovideo-os-autoinst-distri-opensuse" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="isotovideo-qemu-kvm" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="isotovideo-qemu-kvm-jq" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="isotovideo-qemu-x86" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="isotovideo-qemu-x86-jq" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA" code="building">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:       <details>building on i02-ch2c:8</details>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     </status>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA-devel-container" code="disabled"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA-single-instance-codespaces-container" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA-single-instance-container" code="disabled"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA:openQA-client-test" code="unresolvable">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:       <details>nothing provides openQA-client = 4.6.1723806881.14d72391, (got version 4.6.1723805254.25dc7a3f-lp155.7062.1)</details>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     </status>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA:openQA-devel-test" code="unresolvable">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:       <details>nothing provides openQA-devel = 4.6.1723806881.14d72391, (got version 4.6.1723805254.25dc7a3f-lp155.7062.1)</details>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     </status>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA:openQA-test" code="unresolvable">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:       <details>nothing provides openQA = 4.6.1723806881.14d72391, (got version 4.6.1723805254.25dc7a3f-lp155.7062.1)</details>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     </status>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA:openQA-worker-test" code="unresolvable">
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:       <details>nothing provides openQA-worker = 4.6.1723806881.14d72391, (got version 4.6.1723805254.25dc7a3f-lp155.7062.1)</details>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     </status>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA_container_client" code="disabled"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA_container_webUI" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA_container_webUI_loadbalancer" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA_container_worker" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openQA_container_worker_os_autoinst_distri_opensuse" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="opencv" code="disabled"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="openqa_bugfetcher" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst-distri-opensuse-deps" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst-scripts" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst-scripts-container" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst:os-autoinst-devel-test" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst:os-autoinst-openvswitch-test" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst:os-autoinst-test" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="os-autoinst_dev" code="excluded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="python-openqa_client" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="qem-bot-dev" code="disabled"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:     <status package="xterm-console" code="succeeded"/>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]:   </result>
Aug 16 11:21:01 ariel openqa-continuous-update[19449]: </resultlist>
Aug 16 11:21:01 ariel systemd[1]: openqa-continuous-update.service: Deactivated successfully.
Aug 16 11:21:01 ariel CRON[19389]: pam_unix(crond:session): session closed for user geekotest
Actions #15

Updated by livdywan 4 months ago

  • Status changed from New to In Progress
Actions #16

Updated by livdywan 4 months ago

  • Subject changed from Unreviewed issue (Group 24 openQA) test-running in openQA failing with error detecting default remote branch to Unreviewed issue (Group 24 openQA) test-running in openQA failing with error detecting default remote branch size:S
  • Description updated (diff)
Actions #17

Updated by tinita 4 months ago

  • Related to action #155413: Ensure apparmor is enforced in openQA-in-openQA tests size:M added
Actions #18

Updated by ybonatakis 4 months ago

logs from 2024-08-16 are lost in openqaworker26.
But there is a recent failing job https://openqa.opensuse.org/tests/4422622#step/test_running/9

I found again a similar "looks bad" error

Aug 22 10:39:34 openqaworker22 systemd[1]: openqa-continuous-update.service: Deactivated successfully.
Aug 22 10:45:29 openqaworker22 systemd[1]: Started Continuously deploys openQA, see https://progress.opensuse.org/issues/105379.
Aug 22 10:45:29 openqaworker22 openqa-continuous-update[4137]: devel:openQA looks bad for Leap 15.5 (x86_64):
Aug 22 10:45:29 openqaworker22 openqa-continuous-update[4137]: <resultlist state="b47e07375eb3a2402e03653ced95fb5d">

but I didnt associate the time with the test time. Now is 2:50 but the worker is 12:50.
The error appears twice in the last two/three hours

checked openqa-worker-cacheservice and openqa-worker-cacheservice-minion but I am not sure there is something wrong here.

Aug 22 11:03:52 openqaworker22 systemd[1]: openqa-worker-cacheservice-minion.service: Main process exited, code=exited, status=72/OSFILE
Aug 22 11:03:52 openqaworker22 systemd[1]: openqa-worker-cacheservice-minion.service: Failed with result 'exit-code'.
Actions #20

Updated by tinita 4 months ago

How about uploading /var/log/audit/audit.log additionally so we can check the theory that it is apparmor?

Actions #21

Updated by jbaier_cz 4 months ago

This IS AppArmor issue which will be fixed in a while by https://github.com/os-autoinst/openQA/pull/5883. I need to double check I collected all the paths.

Actions #22

Updated by jbaier_cz 4 months ago

  • Related to action #165692: [openQA-in-openQA] test does not fail if state is done but incomplete auto_review:"no candidate needle.*openqa-dashboard.*matched" size:S added
Actions #23

Updated by jbaier_cz 4 months ago

Now the failure should be not sporadic anymore as in https://openqa.opensuse.org/tests/4423883#step/test_running/8 :) The fix is already merged so as soon as the fixed openQA package is installable from our repositories this should go away.

Actions #24

Updated by jbaier_cz 4 months ago

Because I am blind and introduced a typo during rebase, we need another fix: https://github.com/os-autoinst/openQA/pull/5885

Actions #25

Updated by openqa_review 4 months ago

  • Due date set to 2024-09-06

Setting due date based on mean cycle time of SUSE QE Tools

Actions #26

Updated by tinita 4 months ago

  • Status changed from In Progress to Resolved

I would say this is fixed as part of #155413

Actions

Also available in: Atom PDF