action #166721

open

openQA Project (public) - coordination #58184: [saga][epic][use case] full version control awareness within openQA

openQA Project (public) - coordination #152847: [epic] version control awareness within openQA for test distributions

[alert] Waves of emails due to kex_exchange_identification: Connection closed by remote host errors size:S

Added by livdywan 3 months ago. Updated 15 days ago.

Status: Workable
Priority: Low
Assignee: -
Category: Regressions/Crashes
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

Many emails arrive with the subjects "Cron <geekotest@ariel> git -C /opt/os-autoinst-scripts pull --quiet --rebase origin master" and "Cron <geekotest@ariel> env updateall=1 force=1 /usr/share/openqa/script/fetchneedles", containing:

kex_exchange_identification: Connection closed by remote host
Connection closed by 140.82.121.4 port 22
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Suggestions

  • Replace such cron jobs with systemd timers
    • Add the timer definition to the git repo
    • Copy the service/timer to avoid it being changed if the git repo is rolled back
  • Rely on #164898 to deal with fetchneedles
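
As a rough illustration of the first suggestion, the cron entry for os-autoinst-scripts could be expressed as a service/timer pair along these lines. This is only a sketch: the unit name is taken from the systemctl query later in this ticket, the user from the cron mail subject, and the schedule (OnCalendar=hourly), the delay, the git path and the write_units helper are assumptions, not what is deployed or what the pull request contains.

```shell
#!/bin/sh
# Sketch: write a service/timer pair replacing the cron entry
#   git -C /opt/os-autoinst-scripts pull --quiet --rebase origin master
# write_units is a hypothetical helper; schedule, delay and git path are assumed.
write_units() {
    dir=$1   # e.g. /etc/systemd/system on the real host
    mkdir -p "$dir" || return 1
    cat > "$dir/os-autoinst-scripts-update-git.service" <<'EOF'
[Unit]
Description=Update /opt/os-autoinst-scripts from git

[Service]
Type=oneshot
User=geekotest
ExecStart=/usr/bin/git -C /opt/os-autoinst-scripts pull --quiet --rebase origin master
EOF
    cat > "$dir/os-autoinst-scripts-update-git.timer" <<'EOF'
[Unit]
Description=Periodically update /opt/os-autoinst-scripts

[Timer]
OnCalendar=hourly
RandomizedDelaySec=5min
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Writing copies into a systemd directory, rather than linking to files inside the checkout, matches the point about the units not changing if the git repo is rolled back. After something like write_units /etc/systemd/system, the timer would still need systemctl daemon-reload and systemctl enable --now os-autoinst-scripts-update-git.timer.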

Related issues 3 (0 open, 3 closed)

Related to openQA Project (public) - action #164898: Replace fetchneedles with a minion job for the regular update of git repos size:M (Resolved, tinita)
Related to openQA Project (public) - action #166772: openqa-label-known-issues overrides size:S (Resolved, tinita, 2024-09-13)
Copied from openQA Infrastructure (public) - action #166433: [alert] Waves of emails due to manual changes in /opt/openqa-trigger-from-obs size:S (Resolved, livdywan)

Actions #1

Updated by livdywan 3 months ago

  • Copied from action #166433: [alert] Waves of emails due to manual changes in /opt/openqa-trigger-from-obs size:S added
Actions #2

Updated by livdywan 3 months ago

  • Description updated (diff)
Actions #3

Updated by livdywan 3 months ago

  • Related to action #164898: Replace fetchneedles with a minion job for the regular update of git repos size:M added
Actions #4

Updated by livdywan 3 months ago

  • Status changed from New to In Progress
  • Assignee set to livdywan
  • Replace such cron jobs with systemd timers
    • Add the timer definition to the git repo
    • Copy the service/timer to avoid it being changed if the git repo is rolled back

I'll propose something like in #166433 for os-autoinst-scripts and then block on #164898.

Actions #5

Updated by livdywan 3 months ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by livdywan 3 months ago

  • Status changed from Feedback to Blocked

Blocking on #164898

Actions #7

Updated by livdywan 3 months ago

livdywan wrote in #note-6:

Blocking on #164898

As this is tracked and in progress, I modified the cron job to env updateall=1 force=1 /usr/share/openqa/script/fetchneedles || true so we don't see unactionable emails about temporary connectivity issues. The kex_exchange_identification: Connection closed by remote host errors are sporadic.

Actions #8

Updated by okurz 3 months ago

cron sends an email whenever a job produces any output, so I replaced the || true with > /dev/null 2>&1. It would be better, though, to exclude only the very specific problematic lines, and only when they are covered by retries. That probably needs to be done within fetchneedles.
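
One way to picture "covered by retries" is a small wrapper that discards the output of failed attempts once a later attempt succeeds. The sketch below is a simpler variant than filtering specific lines: it drops all output of failed attempts rather than only the kex_exchange_identification messages. quiet_retry is a hypothetical helper, not part of fetchneedles or os-autoinst-scripts.

```shell
#!/bin/sh
# quiet_retry ATTEMPTS DELAY COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Each attempt's output is captured, overwriting the previous attempt's.
# On success only the successful attempt's output is printed, so earlier
# transient errors never reach cron. If every attempt fails, the last
# attempt's output goes to stderr and its exit status is returned.
quiet_retry() {
    attempts=$1
    delay=$2
    shift 2
    log=$(mktemp) || return 1
    i=1
    while :; do
        if "$@" > "$log" 2>&1; then rc=0; else rc=$?; fi
        if [ "$rc" -eq 0 ]; then
            cat "$log"
            rm -f "$log"
            return 0
        fi
        [ "$i" -ge "$attempts" ] && break
        i=$((i + 1))
        sleep "$delay"
    done
    cat "$log" >&2
    rm -f "$log"
    return "$rc"
}
```

Saved as a script, it could be called from the cron entry as, for example, quiet_retry 3 60 env updateall=1 force=1 /usr/share/openqa/script/fetchneedles (the attempt count and delay are arbitrary). Unlike > /dev/null 2>&1, a persistent failure would still produce an email.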

Actions #9

Updated by livdywan 3 months ago

okurz wrote in #note-8:

cron sends an email whenever a job produces any output, so I replaced the || true with > /dev/null 2>&1. It would be better, though, to exclude only the very specific problematic lines, and only when they are covered by retries. That probably needs to be done within fetchneedles.

Thanks! Yes, that will be handled in the blocker. This is just temporary.

Actions #10

Updated by livdywan 3 months ago

Still pending on #164898 which I expect we'll address next week.

Actions #11

Updated by livdywan 3 months ago

livdywan wrote in #note-10:

Still pending on #164898 which I expect we'll address next week.

I'm still periodically checking via sudo -u geekotest env updateall=1 force=1 /usr/share/openqa/script/fetchneedles that needles are being fetched.

Actions #12

Updated by ybonatakis 3 months ago · Edited

#164898 is almost done. We don't expect any progress today due to the German holiday.

Actions #13

Updated by ybonatakis 2 months ago · Edited

#164898 has been deployed.
The last email with this subject was from 13/09. Let's wait to see what Liv finds when she is back.

Actions #14

Updated by livdywan 2 months ago

  • Priority changed from High to Low

See #164898#note-47 and further comments. I assume we're still waiting here. And maybe we can make this Low, since this isn't about major breakage but temporary issues on the remote end.

Actions #15

Updated by tinita about 1 month ago · Edited

livdywan wrote in #note-5:

https://github.com/os-autoinst/scripts/pull/346

I found out today that the scripts repo on o3 hadn't been updated since September 12. /etc/cron.d/os-autoinst-scripts-update-git had been removed around that time.
I found no mention in progress of removing /etc/cron.d/os-autoinst-scripts-update-git, so I enabled it again. But I assume you removed the file as part of this ticket, right?
Now I found this pull request. Apparently that service never got installed on o3:

% systemctl status os-autoinst-scripts-update-git.service
Unit os-autoinst-scripts-update-git.service could not be found.

I'm currently working on the scripts repo, so please coordinate with me if you want to enable the cronjob/timer.

Actions #16

Updated by okurz about 1 month ago

  • Category set to Regressions/Crashes
  • Status changed from Blocked to Feedback

Blocker resolved. Also, let's explicitly answer the open points raised by tinita.

Actions #17

Updated by tinita about 1 month ago

  • Related to action #166772: openqa-label-known-issues overrides size:S added
Actions #18

Updated by okurz 23 days ago

  • Subject changed from [alert] Waves of emails due to kex_exchange_identification: Connection closed by remote host errors to [alert] Waves of emails due to kex_exchange_identification: Connection closed by remote host errors size:S
  • Description updated (diff)
Actions #19

Updated by livdywan 21 days ago

  • Status changed from Feedback to Blocked

tinita wrote in #note-15:

livdywan wrote in #note-5:

https://github.com/os-autoinst/scripts/pull/346

I found out today that the scripts repo on o3 hadn't been updated since September 12. /etc/cron.d/os-autoinst-scripts-update-git had been removed around that time.
I found no mention in progress of removing /etc/cron.d/os-autoinst-scripts-update-git, so I enabled it again. But I assume you removed the file as part of this ticket, right?
Now I found this pull request. Apparently that service never got installed on o3:

% systemctl status os-autoinst-scripts-update-git.service
Unit os-autoinst-scripts-update-git.service could not be found.

I'm currently working on the scripts repo, so please coordinate with me if you want to enable the cronjob/timer.

So I checked with @tinita. The git_auto_update feature is not enabled yet and it seems we ended up without a ticket for that. Filed #170464 now.

Actions #20

Updated by okurz 21 days ago

  • Parent task set to #152847
Actions #21

Updated by livdywan 21 days ago

  • Status changed from Blocked to Workable

So I checked with @tinita. The git_auto_update feature is not enabled yet and it seems we ended up without a ticket for that. Filed #170464 now.

Apparently I misunderstood. The feature is enabled, meaning this ticket is no longer blocked (see #170464#note-9).

Actions #22

Updated by tinita 15 days ago

  • Assignee deleted (livdywan)