action #39743: [o3][tools] o3 unusable, often responds with 504 Gateway Time-out - openQA Project (public) - openSUSE Project Management Tool

#1

Updated by okurz almost 7 years ago

/var/log/openqa reveals that a worker tries to connect with a mismatching timestamp.

okurz@ariel:~> sudo salt '*' cmd.run 'date'
power8.openqanet.opensuse.org:
    Wed Aug 15 04:44:44 UTC 2018
openqaworker4.openqanet.opensuse.org:
    Wed Aug 15 06:44:44 CEST 2018
openqaworker1.openqanet.opensuse.org:
    Wed Aug 15 06:44:20 CEST 2018
imagetester.openqanet.opensuse.org:
    Wed Aug 15 04:44:44 UTC 2018
openqa-aarch64:
    Wed Aug 15 06:44:44 CEST 2018

-> The time on openqaworker1 is not in sync: it is 24 seconds behind the other hosts. The /etc/ntp.conf files differ between the hosts.
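The comparison above mixes UTC and CEST timestamps, so the skew is easier to see once every value is normalized to UTC. The following is a minimal sketch of that arithmetic, not a tool used in this ticket: the function names and the `TZ_OFFSETS` table are hypothetical, and only the two zone abbreviations that appear in the output are handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the zone abbreviations seen in the output above are mapped.
TZ_OFFSETS = {"UTC": timedelta(hours=0), "CEST": timedelta(hours=2)}

SAMPLE = """\
power8.openqanet.opensuse.org:
    Wed Aug 15 04:44:44 UTC 2018
openqaworker4.openqanet.opensuse.org:
    Wed Aug 15 06:44:44 CEST 2018
openqaworker1.openqanet.opensuse.org:
    Wed Aug 15 06:44:20 CEST 2018
imagetester.openqanet.opensuse.org:
    Wed Aug 15 04:44:44 UTC 2018
openqa-aarch64:
    Wed Aug 15 06:44:44 CEST 2018
"""


def parse_date_line(line):
    """Convert `date` output like 'Wed Aug 15 06:44:20 CEST 2018' to an aware UTC datetime."""
    parts = line.split()
    tz_abbrev = parts[4]
    # Drop the zone abbreviation before parsing; strptime cannot resolve it portably.
    naive = datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")
    return (naive - TZ_OFFSETS[tz_abbrev]).replace(tzinfo=timezone.utc)


def parse_salt_output(text):
    """Map each host name to its reported time, given 'host:' lines followed by an indented value."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace() and raw.rstrip().endswith(":"):
            current = raw.rstrip()[:-1]
        elif current is not None:
            hosts[current] = parse_date_line(raw.strip())
            current = None
    return hosts


def clock_skew(hosts):
    """Return each host's offset in seconds relative to the median reported time."""
    ordered = sorted(hosts.values())
    reference = ordered[len(ordered) // 2]
    return {host: (t - reference).total_seconds() for host, t in hosts.items()}


if __name__ == "__main__":
    for host, seconds in clock_skew(parse_salt_output(SAMPLE)).items():
        print(f"{host}: {seconds:+.0f}s")
```

Since the hosts do not answer the salt call at exactly the same instant, a difference of a second or two would be noise; the 24 seconds on openqaworker1 is well beyond that.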

progress.infra.opensuse.org recently also had a time mismatch, and ntp1.i.o.o was inactive. This was fixed by tampakrap. The relevant part of the configuration on progress.i.o.o is:

server ntp1.infra.opensuse.org iburst
server ntp2.infra.opensuse.org iburst
server ntp3.infra.opensuse.org iburst
restrict ntp1.infra.opensuse.org
restrict ntp2.infra.opensuse.org
restrict ntp3.infra.opensuse.org

So I have now configured this on openqaworker1 as well and brought its time in sync. That does not seem to be the cause, though: the webui is still unresponsive.

Connections with a timestamp mismatch are still reported for the IPv4 addresses of power8, openqaworker1 and openqaworker4. Is something else going on? Maybe the worker services just need to be restarted now?

I stopped the worker instances on power8 and openqaworker1. This seems to have helped: https://openqa.opensuse.org is responsive again. Retriggered the latest incomplete openSUSE Tumbleweed and Leap tests.

Restarted the worker instances on openqaworker4 as well, which seems to have caused the webui to become unresponsive again. Stopped all of them and restarted only openqa-worker@{1..2}; will monitor for now.
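For the monitoring step, a small probe that separates a 504 from a healthy answer can help correlate worker restarts with webui outages. This is only a sketch under assumptions: `probe` and `is_responsive` are hypothetical helper names, and the `opener` parameter exists solely so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status code for `url`, or None if no HTTP response arrived."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server (or the proxy in front of it) answered with an error status, e.g. 504.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, socket timeout and similar.
        return None


def is_responsive(status):
    """Treat any answer below 500 as 'the webui is responding'."""
    return status is not None and status < 500
```

Calling `probe("https://openqa.opensuse.org")` in a loop with a sleep between iterations, and logging the time whenever `is_responsive` flips, would give a rough timeline to compare against the worker restarts.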

Root Cause

Lessons Learned

Updated by szarate almost 7 years ago

Updated by okurz almost 7 years ago

Updated by okurz almost 7 years ago

Updated by okurz almost 7 years ago

Updated by szarate over 6 years ago

O3 stability issues and downtime post-mortem summary

Root cause analysis

Solution

Reasoning for deploying

Consequences after deploying

Lessons Learned

Updated by RBrownSUSE over 6 years ago

Updated by sebchlad over 6 years ago

Updated by coolo over 6 years ago

Updated by okurz 6 months ago