action #68785
[monitoring] Setup of QA generic monitoring instance
Status: closed
Updated by okurz over 4 years ago
What I did:
- Connected with virt-manager to qsf-cluster.qa.suse.de
- Configured a new machine with virt-install as a network installation loading from download.opensuse.org/tumbleweed/repo/oss/ (see the virt-install sketch after the profile below)
- 4 cores, 8GB RAM, name "stats", description "stats (Maintainer: okurz@suse.de)", 40GB new storage, kernel option autoyast=https://w3.suse.de/~okurz/ay.xml
- Content of ay.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE profile>
<profile xmlns="http://www.suse.com/1.0/yast2ns" xmlns:config="http://www.suse.com/1.0/configns">
  <general>
    <mode>
      <confirm config:type="boolean">false</confirm>
    </mode>
  </general>
  <bootloader>
    <global>
      <timeout config:type="integer">0</timeout>
    </global>
  </bootloader>
  <networking>
    <keep_install_network config:type="boolean">true</keep_install_network>
  </networking>
  <software>
    <install_recommended config:type="boolean">true</install_recommended>
    <products config:type="list">
      <product>openSUSE</product>
    </products>
    <packages config:type="list">
      <package>openssh</package>
      <package>sudo</package>
    </packages>
  </software>
  <user_defaults>
    <expire/>
    <group>100</group>
    <groups/>
    <home>/home</home>
    <inactive>-1</inactive>
    <no_groups config:type="boolean">true</no_groups>
    <shell>/bin/bash</shell>
    <skel>/etc/skel</skel>
    <umask>022</umask>
  </user_defaults>
  <users config:type="list">
    <user>
      <username>root</username>
      <user_password>$6$OHtabasWX3LK$dzWQazasWNgjg8h5afcT9ZtQltxDpkiDYZFzMOdg2f2frJ7euW10b4kHVvABPx8KxN4BbChgqja.tiZJ63ks41</user_password>
      <encrypted config:type="boolean">true</encrypted>
    </user>
    <user>
      <authorized_keys config:type="list">
        <authorized_key>ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILAtWUGdPW5LO1rMqVULy0VWKJ4ba+y2uglpi3gaZvuB okurz@linux-28d6.suse</authorized_key>
      </authorized_keys>
      <encrypted config:type="boolean">true</encrypted>
      <fullname>Oliver Kurz</fullname>
      <gid>100</gid>
      <home>/home/okurz</home>
      <home_btrfs_subvolume config:type="boolean">false</home_btrfs_subvolume>
      <shell>/bin/bash</shell>
      <uid>1000</uid>
      <user_password>$6$OHtabasWX3LK$dzWQazasWNgjg8h5afcT9ZtQltxDpkiDYZFzMOdg2f2frJ7euW10b4kHVvABPx8KxN4BbChgqja.tiZJ63ks41</user_password>
      <username>okurz</username>
    </user>
  </users>
  <groups config:type="list">
    <group>
      <groupname>wheel</groupname>
      <userlist>okurz</userlist>
    </group>
  </groups>
  <partitioning config:type="list">
    <drive>
      <initialize config:type="boolean">true</initialize>
      <partitions config:type="list">
        <partition>
          <mount>/</mount>
          <size>max</size>
          <filesystem config:type="symbol">btrfs</filesystem>
        </partition>
        <partition>
          <mount>swap</mount>
          <size>auto</size>
        </partition>
      </partitions>
    </drive>
  </partitioning>
  <ntp-client>
    <ntp_policy>auto</ntp_policy>
    <ntp_servers config:type="list">
      <ntp_server>
        <address>2.opensuse.pool.ntp.org</address>
        <iburst config:type="boolean">true</iburst>
        <offline config:type="boolean">false</offline>
      </ntp_server>
    </ntp_servers>
    <ntp_sync>manual</ntp_sync>
  </ntp-client>
  <scripts>
    <post-scripts config:type="list">
      <script>
        <filename>setup.sh</filename>
        <interpreter>shell</interpreter>
        <debug config:type="boolean">true</debug>
        <source><![CDATA[
echo '%wheel ALL=(ALL) NOPASSWD: ALL' >>/etc/sudoers
echo '0 3 * * 0 root zypper -n dup --replacefiles --auto-agree-with-licenses --force-resolution --download-in-advance' >> /etc/cron.d/auto-update
systemctl enable --now sshd
zypper -n ar -f http://download.opensuse.org/tumbleweed/repo/non-oss/ repo-non-oss
zypper -n ar -f http://download.opensuse.org/tumbleweed/repo/oss/ repo-oss
zypper -n ar -f http://download.opensuse.org/update/tumbleweed/ repo-update
curl -sfL https://get.k3s.io | sh -
]]></source>
      </script>
    </post-scripts>
  </scripts>
  <firewall>
    <enable_firewall config:type="boolean">true</enable_firewall>
    <start_firewall config:type="boolean">true</start_firewall>
    <FW_CONFIGURATIONS_EXT>sshd</FW_CONFIGURATIONS_EXT>
  </firewall>
  <ssh_import>
    <import config:type="boolean">true</import>
    <device>/dev/vda2</device>
  </ssh_import>
  <timezone>
    <hwclock>UTC</hwclock>
    <timezone>Europe/Berlin</timezone>
  </timezone>
</profile>
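For reproducibility, a virt-install call along the following lines should create an equivalent VM. This is only a sketch: the connection URI, storage defaults and os-variant are assumptions on my side; only the sizes, name, description and the autoyast kernel option are taken from the steps above.

virt-install --connect qemu+ssh://root@qsf-cluster.qa.suse.de/system \
  --name stats \
  --metadata description="stats (Maintainer: okurz@suse.de)" \
  --vcpus 4 --memory 8192 \
  --disk size=40 \
  --location http://download.opensuse.org/tumbleweed/repo/oss/ \
  --extra-args 'autoyast=https://w3.suse.de/~okurz/ay.xml' \
  --os-variant opensusetumbleweed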
Machine is accessible now but without a nice DNS name yet. For now it's 1c036.qa.suse.de (stats.qa.suse.de is already used by "snipe-vm", purpose unknown). k8s can be used:
k3s check-config
k3s kubectl get node
k3s kubectl create deployment hello-node --image=k8s.gcr.io/echoserver:1.4
k3s kubectl get deployments
k3s kubectl get pods
k3s kubectl get events
k3s kubectl config view
k3s kubectl expose deployment hello-node --type=LoadBalancer --port=8080
k3s kubectl get services
curl http://localhost:8080
k3s kubectl delete service hello-node
k3s kubectl delete deployment hello-node
Set up helm on the same machine with curl -s https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
as per https://helm.sh/docs/intro/install/ and configured cluster access as per https://rancher.com/docs/k3s/latest/en/cluster-access/ with
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl get pods --all-namespaces
helm ls --all-namespaces
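The export only lives in the current shell. A minimal way to keep kubectl and helm working in future sessions (my addition, not part of the original setup) is to either persist the variable or copy the kubeconfig to the default location:

echo 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml' >> ~/.bashrc
# or: copy the k3s kubeconfig to where kubectl/helm look by default
mkdir -p ~/.kube && cp /etc/rancher/k3s/k3s.yaml ~/.kube/config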
Then installed grafana:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release --set admin.password=susetesting bitnami/grafana
Also https://kubeapps.com/ looks fancy. Installed it following https://github.com/kubeapps/kubeapps/blob/master/docs/user/getting-started.md as well as Rancher following https://rancher.com/docs/rancher/v2.x/en/installation/k8s-install/helm-rancher/ 'cause why not :)
I failed to manually set up the right parameters to make the grafana instance accessible from outside, so I did
helm install mygrafana --set admin.password=susetesting --set service.type=LoadBalancer bitnami/grafana
which makes grafana available on http://1c036.qa.suse.de:3000
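To check how the service is actually exposed one can look at the service object. The service name below is an assumption derived from the release name, and my understanding is that the service load balancer bundled with k3s (klipper-lb) is what binds port 3000 directly on the node:

k3s kubectl get svc
k3s kubectl describe svc mygrafana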
I followed https://randy-stad.gitlab.io/posts/2020-01-29-k3s-traefik-dashboard/ to get a nice dashboard for traefik, but so far I did not manage to provide nicer DNS names. A proper DNS name for the host itself needs to come first.
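Once a DNS name exists, a nicer URL without the extra port could go through the traefik ingress controller that k3s bundles. This is only a sketch under assumptions: the hostname, the service name/port and the apiVersion (which depends on the Kubernetes version) would all need to be checked:

k3s kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: grafana
spec:
  rules:
  - host: monitor.qa.suse.de
    http:
      paths:
      - backend:
          serviceName: mygrafana
          servicePort: 3000
EOF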
Updated by okurz about 4 years ago
- Due date set to 2020-07-24
- Status changed from In Progress to Feedback
Waiting until after the next QA SLE metrics workshop to decide how to go on.
Updated by okurz about 4 years ago
- Due date deleted (2020-07-24)
We had the QA SLE metrics workshop but szarate+jorauch could not show anything on grafana instances yet. We briefly discussed the question which instance to use.
@szarate, @jorauch (added as watchers) I recommend we use both the personal instances (that you set up) as well as http://1c036.qa.suse.de:3000/ for experimentation and https://stats.openqa-monitor.qa.suse.de/ as our main production instance where we can ensure proper provisioning using https://gitlab.suse.de/openqa/salt-states-openqa/-/tree/master/openqa/monitoring . We can also configure a CNAME entry for the host to be less openqa-centric. WDYT?
EDIT: Added the CNAME proposal in https://gitlab.suse.de/qa-sle/qanet-configs/-/merge_requests/12
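For reference, such a CNAME is a one-line addition to the zone; the record and target names here are assumptions of mine, the actual change is the one proposed in the merge request above:

; generic monitoring alias pointing to the existing host record
monitor          IN  CNAME  1c036.qa.suse.de.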
Updated by jorauch about 4 years ago
- Related to action #68758: [functional][qe-core] create grafana instance for mean cycle and lead times added
Updated by okurz about 4 years ago
- Status changed from Feedback to Workable
- Assignee deleted (okurz)
DNS name merged in https://gitlab.suse.de/qa-sle/qanet-configs/-/merge_requests/12 but I think the grafana instance (or nginx?) does not answer on the new hostname yet.
Updated by okurz about 4 years ago
- Subject changed from Setup of QA generic monitoring instance to [monitoring] Setup of QA generic monitoring instance
Updated by okurz about 4 years ago
Certificate problems were resolved in #69613. https://monitor.qa.suse.de also works now but so far http://monitor.qa.suse.de does not redirect. That should be done next.
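Assuming nginx is what answers in front of grafana here (the comment above is itself not sure), the missing redirect would be a small additional server block along these lines:

# redirect plain HTTP to HTTPS for the monitoring host
server {
    listen 80;
    server_name monitor.qa.suse.de;
    return 301 https://$host$request_uri;
}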
Updated by okurz about 4 years ago
Task from the last metrics workshop meeting: include a description on the home dashboard. I briefly looked up how the home dashboard can be changed. I found a way to change the content of the main text window but could not save it. Also it seems as if the home dashboard can not come from provisioning directly. We could also include a link to the qa metrics internal wiki page.
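If a provisioned home dashboard is wanted later after all, my understanding is that sufficiently new grafana versions can point the home dashboard to a provisioned JSON file via grafana.ini; treat the option and the path here as assumptions to verify against the installed version:

# /etc/grafana/grafana.ini (excerpt)
[dashboards]
# dashboard JSON used as the home dashboard for all users of the instance
default_home_dashboard_path = /etc/grafana/provisioning/dashboards/home.json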
Updated by okurz about 4 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Trying to include the following HTML on the home dashboard:
<div class="text-center dashboard-header">
<span>Home Dashboard for SUSE QA</span>
<p>
Monitoring, Alerting, Trending for SUSE QA. Mainly used by the team <a href="https://progress.opensuse.org/projects/qa/wiki/Wiki#QA-tools-Team-description">SUSE QA Tools</a>.
</p>
<p>
Find the overall status of the openqa.suse.de (OSD) infrastructure on <a href="https://stats.openqa-monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview?orgId=1">OSD status overview</a>
</p>
<p>
Please find more information on the <a href="https://confluence.suse.com/display/qasle/QA+Metrics">QA Metrics</a> page.
</p>
</div>
but I realized I can do markdown as well. The header isn't that fancy but still I prefer markdown:
# Home Dashboard for SUSE QA
Monitoring, Alerting, Trending for SUSE QA. Mainly used by the team [SUSE QA Tools](https://progress.opensuse.org/projects/qa/wiki/Wiki#QA-tools-Team-description).
Find the overall status of the openqa.suse.de (OSD) infrastructure on [OSD status overview](https://stats.openqa-monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview?orgId=1).
Please find more information on the [QA Metrics](https://confluence.suse.com/display/qasle/QA+Metrics) page.
For this I saved a copy of the home dashboard as "Home Copy" in the preferences and changed that. I can also save that dashboard, however it does not seem to be used as the first entry point.
Updated by okurz about 4 years ago
- Status changed from In Progress to Resolved
I needed to "star" the home dashboard and then I could select "Home" as the new dashboard for the organisation in https://stats.openqa-monitor.qa.suse.de/org . Updated the text to be smaller because previously it would only show the first three lines or so. This should suffice for now.
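For later provisioning the same org preference can presumably also be set over the grafana HTTP API instead of clicking through the UI; the dashboard id and credentials below are placeholders:

# find the numeric id of the starred "Home" dashboard, then set it as the org default
curl -s -u admin:PASSWORD https://stats.openqa-monitor.qa.suse.de/api/search?query=Home
curl -s -u admin:PASSWORD -X PUT -H 'Content-Type: application/json' \
  -d '{"homeDashboardId": 123}' https://stats.openqa-monitor.qa.suse.de/api/org/preferences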
As there is also a complete backup of the grafana database with https://gitlab.suse.de/qa-sle/backup-server-salt/-/blob/master/rsnapshot/rsnapshot.conf#L30 I consider the ticket resolved.