tickets #54977

kubernetes access broken again

Added by jberry 7 months ago. Updated 18 days ago.

Status: Feedback
Start date: 01/08/2019
Priority: High
Due date: 29/02/2020
Assignee: kbabioch
% Done:


Category: General infrastructure
Target version: -
Duration: 152


A repeat of #51059.

$ kubectl api-versions
error: couldn't get available api versions from server: Get failed to refresh token: oauth2: cannot fetch token: 500 Internal Server Error
Response: {"error":"server_error"}

I assume there are some related issues, as the current deployment is not entirely working.


#1 Updated by jberry 7 months ago

  • Private changed from Yes to No

#2 Updated by kbabioch 7 months ago

  • Assignee set to kbabioch

Hi jberry,

Last time, if I remember correctly, the problem was related to disks running full. I'm currently on it, but so far the status of the cluster looks fine:

caasp-admin:~ # kubectl get nodes --all-namespaces
caasp-master1 Ready master 1y v1.10.11
caasp-master2 Ready master 1y v1.10.11
caasp-master3 Ready master 1y v1.10.11
caasp-worker1 Ready 1y v1.10.11
caasp-worker2 Ready 1y v1.10.11
caasp-worker3 Ready 1y v1.10.11
caasp-worker4 Ready 1y v1.10.11

caasp-admin:~ # kubectl cluster-info
Kubernetes master is running at https://api.infra.caasp.local:6443
Dex is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/dex:dex/proxy
KubeDNS is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Tiller is running at https://api.infra.caasp.local:6443/api/v1/namespaces/kube-system/services/tiller:tiller/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
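Since the previous incident (#51059) was traced to full disks, a quick disk check might still be worthwhile. The helper below is a sketch (not something run in this ticket) that flags filesystems above a usage threshold from `df -P` output:

```shell
# Sketch, not from the ticket: flag filesystems at or above a usage
# threshold, reading POSIX `df -P` output on stdin (usage is column 5).
flag_full() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 { u = $5; sub(/%/, "", u); if (u + 0 >= t) print $6, $5 }'
}
```

This could be run against each node from the listing above, e.g. `ssh caasp-worker1 df -P | flag_full 90` (ssh access to the nodes by those names is an assumption).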

What does not work is the zypper update:

caasp-admin:~ # zypper dup
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Refreshing service 'SUSE_CaaS_Platform_3.0_x86_64'.
Permission to access '' denied.

Unfortunately I'm not too familiar with setting up clusters like this. I can only try to get in touch with the CaaSP team next week.

#3 Updated by jberry 7 months ago

I still get the same error locally, although I can access the k8s dashboard via the token I had used previously.
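One way to tell a normally-expired cached token apart from a broken refresh path (a sketch, not something done in this ticket) is to decode the payload of the locally cached OIDC id-token and look at its `exp` claim:

```shell
# Sketch (not from the ticket): print the payload of a JWT (for example
# the OIDC id-token cached in the local kubeconfig) piped in on stdin.
jwt_payload() {
  local p
  # take the middle part of header.payload.signature, base64url -> base64
  p=$(cut -d. -f2 | tr '_-' '/+')
  # restore the stripped base64 padding
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p="$p="; done
  printf '%s' "$p" | base64 -d
}
```

Piping in the id-token from the kubeconfig (the exact location depends on the kubeconfig layout) and finding an `exp` timestamp in the past would suggest ordinary expiry, while the dashboard working with the old token plus the 500 from Dex points at the refresh endpoint itself.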

#4 Updated by kbabioch 6 months ago

At least we identified that the Velum container is not running on the admin node:

Aug 29 14:01:21 caasp-admin hyperkube[2276]: I0829 14:01:21.293893 2276 kuberuntime_manager.go:757] checking backoff for container "velum-mariadb" in pod "velum-private-"
Aug 29 14:01:21 caasp-admin hyperkube[2276]: I0829 14:01:21.294762 2276 kuberuntime_manager.go:767] Back-off 20s restarting failed container=velum-mariadb pod=velum-private-
Aug 29 14:01:21 caasp-admin hyperkube[2276]: E0829 14:01:21.294826 2276 pod_workers.go:186] Error syncing pod 129e6359555d04e433543a00b8dde025 ("velum-private-"), skipping: failed to "StartContainer" for "velum-mariadb" with CrashLoopBackOff: "Back-off 20s restarting failed container=velum-mariadb pod=velum-private-"

7aa8ec77b682 3e53833cc4c0 " /usr/…" 7 seconds ago Up 7 seconds k8s_velum-mariadb_velum-private-
c3f338023753 3e53833cc4c0 " /usr/…" About a minute ago Exited (1) 49 seconds ago k8s_velum-mariadb_velum-private-
a253da85d9e3 5b825d06ba53 "sh -c 'umask 377; i…" 3 minutes ago Exited (0) 3 minutes ago k8s_mariadb-secrets_velum-private-
487b68141214 3e53833cc4c0 "/" 3 minutes ago Up 3 minutes k8s_mariadb-user-secrets_velum-public-
add95a4a0709 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_velum-public-
6fe4b866c69d sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_velum-private-
a175ddede17f da3d576af7a2 "bash /usr/local/bin…" 3 minutes ago Up 3 minutes k8s_haproxy_haproxy-
b927a8206174 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 minutes ago Up 3 minutes k8s_POD_haproxy-
6cbce7a6ed72 da3d576af7a2 "bash /usr/local/bin…" 7 weeks ago Exited (137) 5 minutes ago k8s_haproxy_haproxy-
ca7831ce1807 sles12/pause:1.0.0 "/usr/share/suse-doc…" 7 weeks ago Exited (0) 5 minutes ago k8s_POD_haproxy-
f5033071c4a7 5b825d06ba53 " bundl…" 7 weeks ago Exited (1) 7 weeks ago k8s_velum-event-processor_velum-public-
18d6bc9bc0fe d0c49ae0ad52 "/usr/local/bin/entr…" 3 months ago Exited (0) 7 weeks ago k8s_openldap_velum-public-
6cd685172a71 5b825d06ba53 " bundl…" 3 months ago Exited (1) 7 weeks ago k8s_velum-api_velum-public-
0655fdad4944 5b825d06ba53 " bin/r…" 3 months ago Exited (137) 7 weeks ago k8s_velum-dashboard_velum-public-
bc51e2895142 b66c8f309faf "" 3 months ago Exited (137) 7 weeks ago k8s_salt-minion-ca_velum-public-
93005d51f344 2e0c00c42883 "salt-api" 3 months ago Exited (0) 7 weeks ago k8s_salt-api_velum-public-
99d03bcb6806 22d737cbcffb " salt-…" 3 months ago Exited (0) 7 weeks ago k8s_salt-master_velum-public-
201350bde254 sles12/pause:1.0.0 "/usr/share/suse-doc…" 3 months ago Exited (0) 7 weeks ago k8s_POD_velum-public-
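As a next debugging step (a sketch, not something run in this ticket), the ids of containers that exited with a non-zero status could be filtered out of a `docker ps -a` listing like the one above, so their logs can be pulled with `docker logs`:

```shell
# Sketch (not from the ticket): filter the ids of containers that exited
# with a non-zero status out of `docker ps -a` output on stdin.
failed_containers() {
  grep -E 'Exited \([1-9][0-9]*\)' | awk '{ print $1 }'
}

# Intended use on the admin node (docker access assumed):
#   docker ps -a | failed_containers | while read -r id; do
#     docker logs --tail 50 "$id"
#   done
```

For the crash-looping velum-mariadb container specifically, the MariaDB log lines from the last exited instance would likely show why it keeps dying (the earlier incident suggests checking for a full disk first).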

#5 Updated by lrupp about 1 month ago

  • Category set to General infrastructure

#6 Updated by lrupp about 1 month ago

  • Status changed from New to Feedback

What is the status here? The priority is high, yet nothing has happened for 5 months?

Besides the mentioned problem: the CaaSP machines are running unsupported Leap 42.3 and need an upgrade anyway. Is someone taking care of this?

Otherwise it might be a good idea to free up resources and shut the whole cluster down...

#7 Updated by lrupp 18 days ago

  • Due date set to 29/02/2020

Last chance, as it seems there is no interest in the cluster any more:

I will shut down all machines in the cluster at the end of February 2020. No services running on this cluster will be available after 2020-02-29.

