tickets #54977 (closed): kubernetes access broken again

Added by jberry over 4 years ago. Updated about 4 years ago.

Status:          Closed
Priority:        High
Assignee:
Category:        Core services and virtual infrastructure
Target version:  -
Start date:      2019-08-01
Due date:        2020-02-29
% Done:          100%
Estimated time:

Description

A repeat of #51059.

$ kubectl api-versions
error: couldn't get available api versions from server: Get https://caasp-master.infra.opensuse.org:6443/api?timeout=32s: failed to refresh token: oauth2: cannot fetch token: 500 Internal Server Error
Response: {"error":"server_error"}

I assume there are some related issues, as the current deployment is not entirely working.
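
Not from the ticket, just a debugging sketch: the 500 comes from the OIDC token refresh, which on CaaSP is presumably handled by Dex, so looking at the Dex pod and the local OIDC client settings might be a first step. <dex-pod> below is a placeholder for whatever pod name turns up:

caasp-admin:~ # kubectl -n kube-system get pods | grep -i dex    # find the Dex pod(s)
caasp-admin:~ # kubectl -n kube-system logs <dex-pod>            # <dex-pod>: placeholder; look for the cause of the server_error
$ kubectl config view --minify                                   # check the oidc auth-provider settings of the current user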

Actions #1

Updated by jberry over 4 years ago

  • Private changed from Yes to No
Actions #2

Updated by kbabioch over 4 years ago

  • Assignee set to kbabioch

Hi jberry,

Last time, if I remember correctly, the problem was related to disks running full. I'm currently on caasp-admin.infra.opensuse.org, but so far the status of the cluster looks fine:


caasp-admin:~ # kubectl get nodes --all-namespaces
NAME            STATUS    ROLES     AGE    VERSION
caasp-master1   Ready     master    1y     v1.10.11
caasp-master2   Ready     master    1y     v1.10.11
caasp-master3   Ready     master    1y     v1.10.11
caasp-worker1   Ready               1y     v1.10.11
caasp-worker2   Ready               1y     v1.10.11
caasp-worker3   Ready               1y     v1.10.11
caasp-worker4   Ready               1y     v1.10.11


caasp-admin:~ # kubectl cluster-info
Kubernetes master is running at https://api.infra.caasp.local:6443
Dex is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/dex:dex/proxy
KubeDNS is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Tiller is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/tiller:tiller/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
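
Since #51059 was apparently caused by full disks, a quick check like the following could rule that out again. This is only a sketch; it assumes root ssh access from the admin node to the masters:

caasp-admin:~ # df -h    # admin node itself
caasp-admin:~ # for n in caasp-master1 caasp-master2 caasp-master3; do ssh root@$n 'hostname; df -h'; done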

What does not work is the zypper update:

caasp-admin:~ # zypper dup
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Refreshing service 'SUSE_CaaS_Platform_3.0_x86_64'.
Permission to access 'https://updates.suse.com/SUSE/Updates/SUSE-CAASP/3.0/x86_64/update/repodata/repomd.xml?LONG_STRING' denied.
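
Sketch only, not from the ticket: a "Permission ... denied" on updates.suse.com usually points at the registration/subscription rather than at the cluster itself, so checking the registration state might help:

caasp-admin:~ # SUSEConnect --status-text    # show registration state of the installed products
caasp-admin:~ # zypper ls                    # list services (SUSE_CaaS_Platform_3.0_x86_64) and their status
caasp-admin:~ # zypper refs && zypper ref    # re-refresh services and repositories afterwards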

Unfortunately I'm not too familiar with setting up clusters like this. I can only try to get in touch with the CaaSP team next week.

Actions #3

Updated by jberry over 4 years ago

I still get the same error locally, although I can access the k8s dashboard via the token I had used previously.
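
A possible workaround sketch, not something from the ticket: if the dashboard token is still accepted by the API server, it can be put into a separate kubeconfig user to bypass the failing OIDC refresh. The token value below is a placeholder:

$ kubectl config set-credentials dashboard-token --token="<dashboard-token>"    # <dashboard-token>: placeholder
$ kubectl --user=dashboard-token get nodes                                      # uses the static token instead of the OIDC auth-provider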

Actions #4

Updated by kbabioch over 4 years ago

At least we identified that the Velum container is not running on caasp-admin.infra.opensuse.org:

Aug 29 14:01:21 caasp-admin hyperkube[2276]: I0829 14:01:21.293893 2276 kuberuntime_manager.go:757] checking backoff for container "velum-mariadb" in pod "velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)"
Aug 29 14:01:21 caasp-admin hyperkube[2276]: I0829 14:01:21.294762 2276 kuberuntime_manager.go:767] Back-off 20s restarting failed container=velum-mariadb pod=velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)
Aug 29 14:01:21 caasp-admin hyperkube[2276]: E0829 14:01:21.294826 2276 pod_workers.go:186] Error syncing pod 129e6359555d04e433543a00b8dde025 ("velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)"), skipping: failed to "StartContainer" for "velum-mariadb" with CrashLoopBackOff: "Back-off 20s restarting failed container=velum-mariadb pod=velum-private-127.0.0.1_default(129e6359555d04e433543a00b8dde025)"


CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7aa8ec77b682 3e53833cc4c0 "entrypoint.sh /usr/…" 7 seconds ago Up 7 seconds k8s_velum-mariadb_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_17012
c3f338023753 3e53833cc4c0 "entrypoint.sh /usr/…" About a minute ago Exited (1) 49 seconds ago k8s_velum-mariadb_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_17011
a253da85d9e3 5b825d06ba53 "sh -c 'umask 377; i…" 3 minutes ago Exited (0) 3 minutes ago k8s_mariadb-secrets_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_5
487b68141214 3e53833cc4c0 "/setup-mysql.sh" 3 minutes ago Up 3 minutes k8s_mariadb-user-secrets_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_5
add95a4a0709 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_5
6fe4b866c69d sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_velum-private-127.0.0.1_default_129e6359555d04e433543a00b8dde025_5
a175ddede17f da3d576af7a2 "bash /usr/local/bin…" 3 minutes ago Up 3 minutes k8s_haproxy_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_1
b927a8206174 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_1
6cbce7a6ed72 da3d576af7a2 "bash /usr/local/bin…" 7 weeks ago Exited (137) 5 minutes ago k8s_haproxy_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_0
ca7831ce1807 sles12/pause:1.0.0 "/usr/share/suse-doc…" 7 weeks ago Exited (0) 5 minutes ago k8s_POD_haproxy-127.0.0.1_kube-system_40582128b581e009d37995bbc5e9bdf2_0
f5033071c4a7 5b825d06ba53 "entrypoint.sh bundl…" 7 weeks ago Exited (1) 7 weeks ago k8s_velum-event-processor_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_1169
18d6bc9bc0fe d0c49ae0ad52 "/usr/local/bin/entr…" 3 months ago Exited (0) 7 weeks ago k8s_openldap_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_4
6cd685172a71 5b825d06ba53 "entrypoint.sh bundl…" 3 months ago Exited (1) 7 weeks ago k8s_velum-api_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
0655fdad4944 5b825d06ba53 "entrypoint.sh bin/r…" 3 months ago Exited (137) 7 weeks ago k8s_velum-dashboard_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
bc51e2895142 b66c8f309faf "salt-minion.sh" 3 months ago Exited (137) 7 weeks ago k8s_salt-minion-ca_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
93005d51f344 2e0c00c42883 "salt-api" 3 months ago Exited (0) 7 weeks ago k8s_salt-api_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
99d03bcb6806 22d737cbcffb "entrypoint.sh salt-…" 3 months ago Exited (0) 7 weeks ago k8s_salt-master_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
201350bde254 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 months ago Exited (0) 7 weeks ago k8s_POD_velum-public-127.0.0.1_default_a391db3c1de70a74b3e4c8032d9c08aa_3
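
Not part of the original comment: a next step here could be to read the logs of the last exited velum-mariadb container, using the container ID from the listing above:

caasp-admin:~ # docker logs --tail 50 c3f338023753                                           # last exited velum-mariadb container
caasp-admin:~ # docker inspect --format '{{.State.ExitCode}} {{.State.Error}}' c3f338023753  # exit code and error, if any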

Actions #5

Updated by lrupp over 4 years ago

  • Category set to Core services and virtual infrastructure
Actions #6

Updated by lrupp over 4 years ago

  • Status changed from New to Feedback

What is the status here? The priority is high, yet nothing has happened for 5 months.

Besides the problem mentioned above: the caasp machines are running the now-unsupported Leap 42.3 and need an upgrade anyway. Is someone taking care of this?

Otherwise it might be a good idea to free up resources and shut the whole cluster down...

Actions #7

Updated by lrupp about 4 years ago

  • Due date set to 2020-02-29

Last chance, as it seems there is no interest in the cluster any more:

I will shut down all machines in the cluster at the end of February 2020. All services running on this cluster will no longer be available after 2020-02-29.

Lars

Actions #8

Updated by lrupp about 4 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 0 to 100

All CaaSP machines are now destroyed (virsh destroy); their virtual machine configurations and storage are still there. If nobody speaks up, I will destroy these last bits as well at the end of March 2020.

Closing this ticket here. RIP CaaSP 3.
