action #128822

closed

processes on qanet slow to execute despite low load, e.g. htop - do we have outdated addresses pointing to wotan where we should use different hosts?

Added by okurz over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2023-05-05
Due date:
% Done:

0%

Estimated time:

Description

Observation

nicksinger and I logged into qanet and ran "htop" after okurz observed a slow ssh login when trying to work on #128654. The time is also off by 5m, but first things first. Load, CPU usage, memory usage and I/O were all low, but we found that the slowness is related to NFS and identified ypbind as the culprit. After stopping the ypbind service the system was snappy again. However, grep -R 10.160. /etc/ revealed some mentions of 10.160.0.1. Do we need to use a newer address in more places?
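The address search above can be made a little stricter. This is a minimal sketch, assuming GNU grep; the function name list_legacy_refs and its defaults are illustrative and not part of the original investigation. With -F the prefix is matched as a fixed string, so the dots in 10.160. do not match arbitrary characters the way they do in a plain regular expression.

```shell
# Hypothetical helper: list files below a directory that mention an address prefix.
# -R recurse, -l print file names only, -s hide errors for unreadable files,
# -F treat the pattern as a fixed string (dots are literal).
list_legacy_refs() {
    dir="${1:-/etc}"
    prefix="${2:-10.160.}"
    grep -RlsF -- "$prefix" "$dir" | sort
}
```

Called as list_legacy_refs /etc 10.160. it prints one matching file per line, which makes it easier to see which configs still reference the old network.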

Actions #1

Updated by okurz over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to okurz

We have picked the yp.conf config from wotan for qanet:

ypserver midgard2.suse.de
ypserver amor.suse.de

from https://suse.slack.com/archives/C029APBKLGK/p1683220624711789?thread_ts=1683218111.323909&cid=C029APBKLGK

Then we installed chrony, copied over /etc/chrony.d/{pool,suse}.conf from qamaster.qa.suse.de and ran:

systemctl disable --now ntpd
systemctl enable --now chronyd

journalctl -u chronyd says "chronyd[24255]: System clock was stepped by 260.861843 seconds", time looks ok again.
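To check how far the clock was actually corrected, the step size can be pulled out of the chronyd log line quoted above. This is a sketch only; the helper name step_seconds is hypothetical, and it assumes the message format stays as shown in the journal excerpt.

```shell
# Hypothetical helper: print the step size in seconds from chronyd log lines on stdin.
step_seconds() {
    sed -n 's/.*System clock was stepped by \(-\{0,1\}[0-9.]*\) seconds.*/\1/p'
}

# In practice the input would come from: journalctl -u chronyd
echo 'chronyd[24255]: System clock was stepped by 260.861843 seconds' | step_seconds
# prints 260.861843
```

260.86 seconds is about 4 minutes 21 seconds, which is in line with the roughly 5 minute offset noted in the description.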

Actions #2

Updated by okurz over 1 year ago

  • Due date set to 2023-05-19
  • Status changed from In Progress to Feedback

Checked whether any of our salt-controlled machines have an outdated ypserver entry:

okurz@openqa:~> sudo salt --no-color \* cmd.run 'grep 10.160 /etc/yp.conf'
worker9.oqa.suse.de:
    domain suse.de server 10.160.0.1
worker2.oqa.suse.de:
    domain suse.de server 10.160.0.1
powerqaworker-qam-1.qa.suse.de:
    domain suse.de server 10.160.0.1
malbec.arch.suse.de:
worker5.oqa.suse.de:
    domain suse.de server 10.160.0.1
openqaw5-xen.qa.suse.de:
    domain suse.de server 10.160.0.1
QA-Power8-4-kvm.qa.suse.de:
    domain suse.de server 10.160.0.1
storage.oqa.suse.de:
    domain suse.de server 10.160.0.1
baremetal-support:
    domain suse.de server 10.160.0.1
openqaworker1.qe.nue2.suse.org:
    domain suse.de server 10.160.0.1
qamasternue.qa.suse.de:
worker11.oqa.suse.de:
    domain suse.de server 10.160.0.1
worker6.oqa.suse.de:
    domain suse.de server 10.160.0.1
worker10.oqa.suse.de:
    domain suse.de server 10.160.0.1
openqaworker18.qa.suse.cz:
openqaworker16.qa.suse.cz:
worker13.oqa.suse.de:
    domain suse.de server 10.160.0.1
QA-Power8-5-kvm.qa.suse.de:
    domain suse.de server 10.160.0.1
worker12.oqa.suse.de:
    domain suse.de server 10.160.0.1
worker3.oqa.suse.de:
    domain suse.de server 10.160.0.1
grenache-1.qa.suse.de:
    domain suse.de server 10.160.0.1
worker8.oqa.suse.de:
    domain suse.de server 10.160.0.1
openqaworker14.qa.suse.cz:
openqa-monitor.qa.suse.de:
    domain suse.de server 10.160.0.1
openqaworker17.qa.suse.cz:
backup.qa.suse.de:
    domain suse.de server 10.160.0.150
    domain suse.de server 10.160.0.1
schort-server:
    domain suse.de server 10.160.0.1
tumblesle:
    domain suse.de server 10.160.0.1
jenkins.qa.suse.de:
    domain suse.de server 10.160.0.150
    domain suse.de server 10.160.0.1
openqa-piworker.qa.suse.de:
    domain suse.de server 10.160.0.1
openqaworker-arm-2.suse.de:
    domain suse.de server 10.160.0.1
openqaworker-arm-1.suse.de:
    domain suse.de server 10.160.0.1
openqa.suse.de:
openqaworker-arm-3.suse.de:
    domain suse.de server 10.160.0.1
ERROR: Minions returned with non-zero exit code
okurz@openqa:~> sudo salt --no-color \* cmd.run 'grep ypserver /etc/yp.conf'
storage.oqa.suse.de:
worker3.oqa.suse.de:
openqaworker17.qa.suse.cz:
openqaworker18.qa.suse.cz:
openqaworker1.qe.nue2.suse.org:
openqaworker16.qa.suse.cz:
worker5.oqa.suse.de:
worker2.oqa.suse.de:
openqaw5-xen.qa.suse.de:
openqaworker14.qa.suse.cz:
worker6.oqa.suse.de:
powerqaworker-qam-1.qa.suse.de:
QA-Power8-5-kvm.qa.suse.de:
qamasternue.qa.suse.de:
jenkins.qa.suse.de:
baremetal-support:
openqa-monitor.qa.suse.de:
QA-Power8-4-kvm.qa.suse.de:
tumblesle:
schort-server:
worker11.oqa.suse.de:
malbec.arch.suse.de:
worker12.oqa.suse.de:
worker10.oqa.suse.de:
backup.qa.suse.de:
worker13.oqa.suse.de:
openqa.suse.de:
grenache-1.qa.suse.de:
openqa-piworker.qa.suse.de:
openqaworker-arm-1.suse.de:
worker9.oqa.suse.de:
openqaworker-arm-2.suse.de:
worker8.oqa.suse.de:
openqaworker-arm-3.suse.de:
ERROR: Minions returned with non-zero exit code

This looks good, so we are not changing anything there. In the dhcpd config we found some IPv4 addresses specified and compared that to what we have on walter1.qe.nue2.suse.org, where the setting is "option nis-servers wotan.suse.de,amor.suse.de;". We changed ours to "option nis-servers wotan.suse.de,amor.suse.de,midgard2.suse.de;", then committed and pushed the change.
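The salt output above is long, so a short filter helps to see which minions actually returned a match. This is a minimal sketch, assuming the default text output of salt cmd.run as shown above (minion name followed by a colon, result lines indented); the function name minions_with_output is made up for illustration. The trailing "ERROR: Minions returned with non-zero exit code" line is most likely just grep exiting with status 1 on minions where nothing matched, and the filter ignores it.

```shell
# Hypothetical helper: read `salt ... cmd.run` text output on stdin and print
# every minion for which the command produced at least one output line.
minions_with_output() {
    awk '
        # Minion header: starts in column 1 and ends with a colon.
        /^[^[:space:]].*:$/ { host = substr($0, 1, length($0) - 1); next }
        # First indented, non-empty line after a header: report the minion once.
        /^[[:space:]]+[^[:space:]]/ && host != "" { print host; host = "" }
    '
}
```

For example, sudo salt --no-color \* cmd.run 'grep 10.160 /etc/yp.conf' | minions_with_output would list only the hosts that still carry a 10.160 entry.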

Actions #3

Updated by okurz over 1 year ago

  • Due date deleted (2023-05-19)
  • Status changed from Feedback to Resolved

No problems regarding ypserv reported, qanet is still snappy, we are good.
