tickets #54977: kubernetes access broken again - openSUSE admin - openSUSE Project Management Tool

tickets #54977

closed

kubernetes access broken again

Added by jberry over 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: High
Assignee: kbabioch
Category: Core services and virtual infrastructure
Target version:
Start date: 2019-08-01
Due date: 2020-02-29
% Done: 100%
Estimated time:

Description

A repeat of #51059.

$ kubectl api-versions
error: couldn't get available api versions from server: Get https://caasp-master.infra.opensuse.org:6443/api?timeout=32s: failed to refresh token: oauth2: cannot fetch token: 500 Internal Server Error
Response: {"error":"server_error"}

I assume there are related issues, as the current deployment is not entirely working.
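
The "failed to refresh token: oauth2: cannot fetch token: 500 Internal Server Error" part of the error comes from the OAuth2 token refresh that kubectl performs against the cluster's identity provider (Dex, in a CaaSP cluster). A 500 means the request reached the provider and the provider itself failed; a rejected refresh token would normally come back as a 4xx instead. The sketch below only illustrates that exchange. It assumes a standard refresh_token grant as described in RFC 6749, section 6, with the client credentials sent in the form body; the endpoint URL, client ID, secret and function names are hypothetical placeholders, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(token_endpoint: str, refresh_token: str,
                          client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth2 refresh_token grant (RFC 6749, section 6)."""
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        # Client credentials in the form body; RFC 6749 also allows HTTP Basic auth.
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def classify_token_status(status: int) -> str:
    """Rough triage of the HTTP status returned by the token endpoint."""
    if status == 200:
        return "ok"
    if 400 <= status < 500:
        return "client-side: refresh token or client credentials rejected"
    if status >= 500:
        return "server-side: the identity provider itself is failing"
    return "unexpected status"


# Hypothetical values for illustration only; not taken from this cluster.
req = build_refresh_request("https://dex.example.invalid/token",
                            "r1", "kubectl", "s3cret")
print(req.get_method(), req.full_url)
print(classify_token_status(500))
```

Read this way, the reported 500 points at Dex (or whatever backs it) rather than at the local kubeconfig.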

Updated by jberry over 5 years ago

Private changed from Yes to No

Updated by kbabioch over 5 years ago

Assignee set to kbabioch

Hi jberry,

Last time, if I remember correctly, the problem was caused by disks running full. I'm currently logged in on caasp-admin.infra.opensuse.org, but so far the cluster status looks fine:

caasp-admin:~ # kubectl get nodes --all-namespaces
NAME            STATUS    ROLES     AGE       VERSION
caasp-master1   Ready     master    1y        v1.10.11
caasp-master2   Ready     master    1y        v1.10.11
caasp-master3   Ready     master    1y        v1.10.11
caasp-worker1   Ready     <none>    1y        v1.10.11
caasp-worker2   Ready     <none>    1y        v1.10.11
caasp-worker3   Ready     <none>    1y        v1.10.11
caasp-worker4   Ready     <none>    1y        v1.10.11
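
All seven nodes report Ready, so the breakage is not visible at node level. When this check has to be repeated, it can be scripted. A minimal sketch, assuming the default (non-wide) kubectl get nodes table layout; the sample is abbreviated from the output above, and the function name is hypothetical. For anything long-lived, kubectl get nodes -o json is sturdier than parsing columns.

```python
from typing import List

# Abbreviated from the output above; kubectl prints "<none>" for an empty ROLES column.
SAMPLE = """\
NAME            STATUS    ROLES     AGE       VERSION
caasp-master1   Ready     master    1y        v1.10.11
caasp-worker1   Ready     <none>    1y        v1.10.11
"""


def not_ready_nodes(output: str) -> List[str]:
    """Return the names of nodes whose STATUS is not exactly "Ready".

    Note: a cordoned node shows "Ready,SchedulingDisabled" and is reported here too.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    header, rows = lines[0].split(), lines[1:]
    status_col = header.index("STATUS")
    return [row.split()[0] for row in rows
            if row.split()[status_col] != "Ready"]


print(not_ready_nodes(SAMPLE))  # []
```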

caasp-admin:~ # kubectl cluster-info
Kubernetes master is running at https://api.infra.caasp.local:6443
Dex is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/dex:dex/proxy
KubeDNS is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Tiller is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/tiller:tiller/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

What does not work is the zypper update:

caasp-admin:~ # zypper dup
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Refreshing service 'SUSE_CaaS_Platform_3.0_x86_64'.
Permission to access 'https://updates.suse.com/SUSE/Updates/SUSE-CAASP/3.0/x86_64/update/repodata/repomd.xml?LONG_STRING' denied.

Unfortunately, I'm not too familiar with setting up clusters like this. I can only try to get in touch with the CaaSP team next week.

Updated by jberry over 5 years ago

I still get the same error locally, although I can access the k8s dashboard via the token I had used previously.

Updated by kbabioch over 5 years ago

At least we identified that the Velum container is not running on caasp-admin.infra.opensuse.org:

Aug 29 14:01:21 caasp-admin hyperkube[2276]: I0829 14:01:21.293893 2276 kuberuntime_manager.go:757] checking backoff for container "velum-mariadb" in pod "velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)"
Aug 29 14:01:21 caasp-admin hyperkube[2276]: I0829 14:01:21.294762 2276 kuberuntime_manager.go:767] Back-off 20s restarting failed container=velum-mariadb pod=velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)
Aug 29 14:01:21 caasp-admin hyperkube[2276]: E0829 14:01:21.294826 2276 pod_workers.go:186] Error syncing pod 129e6359555d04e433543a00b8dde025 ("velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)"), skipping: failed to "StartContainer" for "velum-mariadb" with CrashLoopBackOff: "Back-off 20s restarting failed container=velum-mariadb pod=velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)"
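
The kubelet restarts a crashing container only after an exponentially growing back-off delay (10s, 20s, 40s and so on, capped at five minutes according to the Kubernetes documentation); the "Back-off 20s" lines show it doing exactly that for velum-mariadb. To find every crash-looping container in a longer journal excerpt, the error lines can be filtered. A minimal sketch, assuming the exact message format shown above (it differs between Kubernetes versions); the regex and function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the pod_workers.go error the kubelet (here: hyperkube) logs
# when it holds back a container that is in CrashLoopBackOff.
CRASHLOOP_RE = re.compile(
    r'failed to "StartContainer" for "(?P<container>[^"]+)" with CrashLoopBackOff: '
    r'"Back-off (?P<backoff>\S+) restarting failed container=\S+ pod=(?P<pod>[^(]+)\('
)


def crashlooping(log_lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (container, pod_namespace, backoff) for every CrashLoopBackOff error line."""
    hits = []
    for line in log_lines:
        m = CRASHLOOP_RE.search(line)
        if m:
            hits.append((m["container"], m["pod"], m["backoff"]))
    return hits


LINE = ('E0829 14:01:21.294826 2276 pod_workers.go:186] Error syncing pod '
        '129e6359555d04e433543a00b8dde025 ("velum-private-127.0.0.1_default('
        '129e6359555d04e433543a00b8dde025)"), skipping: failed to "StartContainer" '
        'for "velum-mariadb" with CrashLoopBackOff: "Back-off 20s restarting failed '
        'container=velum-mariadb pod=velum-private-127.0.0.1_default('
        '129e6359555d04e433543a00b8dde025)"')
print(crashlooping([LINE]))
```

The pod field keeps the kubelet's "<pod>_<namespace>" form, so velum-private-127.0.0.1_default is the velum-private static pod in the default namespace.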

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7aa8ec77b682 3e53833cc4c0 "entrypoint.sh /usr/…" 7 seconds ago Up 7 seconds k8s_velum-mariadb_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_17012
c3f338023753 3e53833cc4c0 "entrypoint.sh /usr/…" About a minute ago Exited (1) 49 seconds ago k8s_velum-mariadb_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_17011
a253da85d9e3 5b825d06ba53 "sh -c 'umask 377; i…" 3 minutes ago Exited (0) 3 minutes ago k8s_mariadb-secrets_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_5
487b68141214 3e53833cc4c0 "/setup-mysql.sh" 3 minutes ago Up 3 minutes k8s_mariadb-user-secrets_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_5
add95a4a0709 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_5
6fe4b866c69d sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_5
a175ddede17f da3d576af7a2 "bash /usr/local/bin…" 3 minutes ago Up 3 minutes k8s_haproxy_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_1
b927a8206174 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_1
6cbce7a6ed72 da3d576af7a2 "bash /usr/local/bin…" 7 weeks ago Exited (137) 5 minutes ago k8s_haproxy_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_0
ca7831ce1807 sles12/pause:1.0.0 "/usr/share/suse-doc…" 7 weeks ago Exited (0) 5 minutes ago k8s_POD_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_0
f5033071c4a7 5b825d06ba53 "entrypoint.sh bundl…" 7 weeks ago Exited (1) 7 weeks ago k8s_velum-event-processor_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_1169
18d6bc9bc0fe d0c49ae0ad52 "/usr/local/bin/entr…" 3 months ago Exited (0) 7 weeks ago k8s_openldap_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_4
6cd685172a71 5b825d06ba53 "entrypoint.sh bundl…" 3 months ago Exited (1) 7 weeks ago k8s_velum-api_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
0655fdad4944 5b825d06ba53 "entrypoint.sh bin/r…" 3 months ago Exited (137) 7 weeks ago k8s_velum-dashboard_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
bc51e2895142 b66c8f309faf "salt-minion.sh" 3 months ago Exited (137) 7 weeks ago k8s_salt-minion-ca_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
93005d51f344 2e0c00c42883 "salt-api" 3 months ago Exited (0) 7 weeks ago k8s_salt-api_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
99d03bcb6806 22d737cbcffb "entrypoint.sh salt-…" 3 months ago Exited (0) 7 weeks ago k8s_salt-master_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
201350bde254 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 months ago Exited (0) 7 weeks ago k8s_POD_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
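
The docker ps output also encodes the restart history in the container names: the kubelet's Docker integration (dockershim) names containers k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>, so the trailing 17012 on the running velum-mariadb container is its attempt counter, meaning it had been recreated roughly 17,000 times. A minimal sketch for decoding such names, assuming that naming scheme (it applies to Docker-based nodes of this Kubernetes era, not to CRI-O or containerd); the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class DockershimName(NamedTuple):
    container: str   # "POD" for the pause (sandbox) container
    pod: str
    namespace: str
    pod_uid: str
    attempt: int     # restart attempt counter, starting at 0


def parse_container_name(name: str) -> Optional[DockershimName]:
    """Split a Docker container name created by the kubelet's dockershim.

    Layout: k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>.
    Pod, container and namespace names cannot contain "_", so splitting on it is safe.
    """
    parts = name.split("_")
    if len(parts) != 6 or parts[0] != "k8s":
        return None  # not a kubelet-managed container
    _, container, pod, namespace, uid, attempt = parts
    return DockershimName(container, pod, namespace, uid, int(attempt))


n = parse_container_name(
    "k8s_velum-mariadb_velum-private-127.0.0.1_default_"
    "129e6359555d04e433543a00b8dde025_17012")
print(n.container, n.attempt)  # velum-mariadb 17012
```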

Updated by lrupp about 5 years ago

Category set to Core services and virtual infrastructure

Updated by lrupp about 5 years ago

Status changed from New to Feedback

What is the status here? The priority is high, yet nothing has happened for 5 months?

Besides the problem mentioned above, the CaaSP machines are running the unsupported Leap 42.3 and need an upgrade anyway. Is someone taking care of this?

Otherwise it might be a good idea to free up resources and shut the whole cluster down...

Updated by lrupp almost 5 years ago

Due date set to 2020-02-29

Last chance, as it seems there is no interest in the cluster any more:

I will shut down all machines in the cluster at the end of February 2020. All services running on this cluster will no longer be available after 2020-02-29.

Lars

Updated by lrupp almost 5 years ago

Status changed from Feedback to Closed
% Done changed from 0 to 100

All CaaSP machines are now destroyed (virsh destroy); their virtual machine configuration and storage are still there. If nobody speaks up, I will destroy these last bits as well at the end of March 2020.

Closing this ticket here. RIP CaaSP 3.
