openSUSE Project Management Tool: Issues (https://progress.opensuse.org/, 2019-06-12T13:48:56Z)
openQA Tests - coordination #52949 (Resolved): [qe-core][RPi3][epic] SLES on RaspberryPi (https://progress.opensuse.org/issues/52949, 2019-06-12T13:48:56Z, thehejik &lt;thehejik@suse.com&gt;)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>We introduced openQA tests for RPi3 (using general aarch64 QEMU workers) on the SLE15-SP1 product; these use only the textmode "initialization" process based on the jeos-firstboot services (as the other JeOS flavors do).</p>
<p>But now we have started with SLE12-SP5, which uses the original SLE12 method with the graphical X11 yast2-firstboot service, and of course the JeOS tests then fail in openQA; see <a href="https://openqa.suse.de/tests/2972221" class="external">https://openqa.suse.de/tests/2972221</a></p>
<p>So basically we need to extend the JeOS openQA tests to cover both graphical and textmode image initialization methods for upcoming products such as SLE12-SP5, SLE15-SP2 and later.</p>
<p>There is a small chance we will switch all upcoming SLE12 images to the textmode jeos-firstboot, but that is still not clear, so please stay tuned. afaerber said on 12.06.2019: "I raised the topic for the PRD but am not aware of a clarification yet."</p>
<p>Until now, tests were performed manually on milestone candidates, following the instructions from <a href="https://bugzilla.suse.com/tr_show_plan.cgi?plan_id=6328" class="external">https://bugzilla.suse.com/tr_show_plan.cgi?plan_id=6328</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Automatic testing for SLES on Raspberry Pi is covered for Tumbleweed and SLE15-SP3</li>
<li><strong>AC2:</strong> Sub-Tasks are resolved</li>
</ul>
<a name="Further-information"></a>
<h2 >Further information<a href="#Further-information" class="wiki-anchor">¶</a></h2>
<ul>
<li>The disk image is generated reusing JeOS setup in OBS, so the behavior is similar: <a href="https://openqa.opensuse.org/tests/1382875" class="external">opensuse-Tumbleweed-JeOS-for-kvm-and-xen-x86_64-Build20200901-jeos@64bit_virtio</a></li>
<li>Tumbleweed on RPi: <a href="https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Tumbleweed&build=20200825&groupid=3&flavor=JeOS-for-RPi" class="external">https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Tumbleweed&build=20200825&groupid=3&flavor=JeOS-for-RPi</a></li>
<li>There was a previous attempt to automate testing for SLE: <a href="https://openqa.suse.de/tests/overview?arch=&machine=aarch64&modules=firstrun&groupid=162&distri=sle&build=4.131&version=12-SP5" class="external">https://openqa.suse.de/tests/overview?arch=&machine=aarch64&modules=firstrun&groupid=162&distri=sle&build=4.131&version=12-SP5</a></li>
</ul>
openQA Tests - action #37946 (Resolved): [sle][migration][sle15sp3] detect "Unknown" architecture... (https://progress.opensuse.org/issues/37946, 2018-06-27T15:12:40Z, thehejik &lt;thehejik@suse.com&gt;)
<p>I already filed a bug for the issue; the problem is that openQA/reviewers did not detect it, so please add at least a softfail.</p>
<p><a href="https://bugzilla.suse.com/show_bug.cgi?id=1099325" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1099325</a></p>
<p>The issue is visible at <a href="https://openqa.suse.de/tests/1773768#step/upgrade_select/1" class="external">https://openqa.suse.de/tests/1773768#step/upgrade_select/1</a></p>
openQA Tests - action #36403 (Rejected): [caasp] test for udp flannel backend (https://progress.opensuse.org/issues/36403, 2018-05-22T12:55:31Z, thehejik &lt;thehejik@suse.com&gt;)
<p>14:39:26 - abonini: mkravec, thehejik do we have openqa tests for deploying caasp v3 with udp flannel backend?<br>
14:39:55 - mkravec: abonini: no, we use defaults for this setting<br>
14:40:17 - abonini: should we add one?<br>
14:40:36 - abonini: udp flannel backend is 100% supported not tech preview<br>
14:40:44 - abonini: talked with flavio<br>
14:43:30 - mkravec: abonini: we can add test for it<br>
14:43:58 - thehejik: abonini: creating ticket for "udp flannel backend" test<br>
14:45:38 - abonini: thehejik, mkravec cool<br>
14:47:11 - abonini: just a fyi, you need to change the port to 8285 also, it's documented in the ui in network overlay settings<br>
14:50:54 - thehejik: abonini: thanks, noted</p>
<p>I assume that configuration of this is available under "Overlay network settings" in velum before bootstrapping.</p>
<p>We should incorporate this test for only one flavor in openQA; let's say DVD will use the defaults and VMX will use the UDP flannel backend.</p>
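<p>For reference, a hypothetical flannel net-conf fragment for the UDP backend: the network range is a placeholder, and the port is the one abonini mentioned above (8285, flannel's default UDP port). In CaaSP v3 this would presumably be driven through the Velum "Overlay network settings" UI rather than edited by hand.</p>

```json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "udp",
    "Port": 8285
  }
}
```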
openQA Project - action #35895 (Resolved): [os-autoinst-openvswitch][theory] problem with assigin... (https://progress.opensuse.org/issues/35895, 2018-05-04T10:18:22Z, thehejik &lt;thehejik@suse.com&gt;)
<p>Anton noticed an issue where a support_server failed due to network problems: the support server most probably leased 10.0.2.22 from a different support server in the same VLAN.</p>
<p><a href="https://openqa.suse.de/tests/1667461" class="external">https://openqa.suse.de/tests/1667461</a> failed support_server oqw3 tap4 3.5. 20:33(+2 on worker)<br>
<a href="https://openqa.suse.de/tests/1667530" class="external">https://openqa.suse.de/tests/1667530</a> parallel_failed master oqw7 tap12 3.5. 20:33<br>
<a href="https://openqa.suse.de/tests/1667523" class="external">https://openqa.suse.de/tests/1667523</a> parallel_failed slave oqw6 tap0 3.5. 20:33</p>
<p>So I did a check on oqw{3,7,6} and found a potential issue: on oqw7 we have more than one tap device assigned to VLAN tag=50.</p>
<p>Anton has added OVS_DEBUG=1 to his testsuites so with another occurrence of this issue we can debug it in autoinst-log.txt (and see if some foreign taps are assigned to the same VLAN).</p>
<p>Note to self: determine the VLAN tag and the test start time, then check the journal: <code>sudo journalctl -u os-autoinst-openvswitch | grep -e "tag.*50"</code></p>
<pre><code>thehejik@openqaworker7:~&gt; sudo journalctl -u os-autoinst-openvswitch | grep -e "tag.*50"
kvě 03 22:29:50 openqaworker7 ovs-vsctl[16033]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap12 tag=50
kvě 03 22:29:51 openqaworker7 ovs-vsctl[16058]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap13 tag=50
kvě 03 22:33:18 openqaworker7 ovs-vsctl[16899]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap12 tag=50
kvě 03 22:51:39 openqaworker7 ovs-vsctl[24378]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap9 tag=50
kvě 03 22:52:32 openqaworker7 ovs-vsctl[24896]: ovs|00001|vsctl|INFO|Called as ovs-vsctl remove port tap9 tag 50
</code></pre>
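<p>A quick way to spot taps still tagged on a VLAN, sketched against the journal excerpt above (in production the input would come from <code>journalctl -u os-autoinst-openvswitch</code>; the awk field positions assume the exact log format quoted here):</p>

```shell
# Sample input: the "set/remove port ... tag" journal lines quoted above.
log='ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap12 tag=50
ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap13 tag=50
ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap12 tag=50
ovs|00001|vsctl|INFO|Called as ovs-vsctl set port tap9 tag=50
ovs|00001|vsctl|INFO|Called as ovs-vsctl remove port tap9 tag 50'
# Track set/remove events per tap; taps never removed are still tagged and
# could collide with a newly scheduled job on the same VLAN.
tagged=$(printf '%s\n' "$log" | awk '
  /set port .* tag=50/    { tap[$(NF-1)]++ }
  /remove port .* tag 50/ { tap[$(NF-2)] = 0 }
  END { for (t in tap) if (tap[t] > 0) print t }' | sort)
printf '%s\n' "$tagged"
```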
<p>A related problem may be that VLAN tags are probably not removed when a test fails.</p>
openQA Tests - action #34813 (Resolved): [microos] tests for caasp toolchain module (https://progress.opensuse.org/issues/34813, 2018-04-12T13:18:49Z, thehejik &lt;thehejik@suse.com&gt;)
<p>According to PRD document for CaaSP 3.0 we should test Toolchain module/add-on in CaaSP 3.0 for kernel debugging purposes.</p>
<ul>
<li>The module will be distributed for end users as a new SCC channel for CaaSP 3 over SUSEConnect (online SCC repo only, no DVD)</li>
<li>Thorsten wants to test installing of KMP kernel modules (like Nvidia) and little debugging with strace, gdb, (tcpdump)</li>
<li>URI of the module in IBS is SUSE:Products:SUSE-CaaSP-Toolchain:3:x86_64</li>
</ul>
<p>Trello card from SCC team <a href="https://trello.com/c/LL8lkF4E/91-caasp3-toolchain-module" class="external">https://trello.com/c/LL8lkF4E/91-caasp3-toolchain-module</a> (card in current sprint so it should be ready soon)</p>
<p>The card was marked as done today, 18.4.2018.</p>
<p>We have discussed the workflow with aherzig:</p>
<ul>
<li>user will register CaaSP by using <code>SUSEConnect -r</code></li>
<li>after registering, the free Toolchain extension should be available in the <code>SUSEConnect -l</code> output:</li>
</ul>
<pre><code>AVAILABLE EXTENSIONS AND MODULES

FREE EXTENSIONS

SUSE CaaS Plattform Toolchain 3.0 x86_64
Install with: SUSEConnect -p caasp-toolchain/3.0/x86_64

MORE INFORMATION

You can find more information about available modules here:
https://www.suse.com/products/server/features/modules.html
</code></pre>
<ul>
<li>register the Toolchain module by calling <code>SUSEConnect -p caasp-toolchain/3.0/x86_64</code></li>
<li>install tcpdump|gcc|gdb|strace</li>
</ul>
openQA Tests - action #33700 (Resolved): [slenkins][qam] tcpd test fails in 2_tcpdmatch - hostnam... (https://progress.opensuse.org/issues/33700, 2018-03-23T08:22:42Z, thehejik &lt;thehejik@suse.com&gt;)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-12-SP3-Server-DVD-Updates-x86_64-slenkins-twopence-tcpd-control@64bit fails in<br>
<a href="https://openqa.suse.de/tests/1566506#step/2_tcpdmatch/1" class="external">slenkins_control</a></p>
<pre><code>coolo: thehejik: https://openqa.suse.de/tests/1566506#step/2_tcpdmatch/1 - this looks like a problem with the openvswitch network. it started 2 weeks ago - but is not consistent. can you throw theories at the problem please? :)
thehejik: coolo: yes, vsvecova already reported, maybe it has something to do with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4537 and mkravec told me that we shouldn't set fqdn hostname by hostnamectl but just hostname without domain so we need to investigate
coolo: thehejik: checking the salt commits - we did the openvswitch config 9 days before the problem started
thehejik: coolo: hopefully its not openvswitch related this time
coolo: thehejik: as our DNS setup was fixed, we should just revert these hacks
mkravec: coolo: I will do it
</code></pre>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/1566219" class="external">20180323-1</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/1565811" class="external">20180321-3</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?test=slenkins-twopence-tcpd-control&flavor=Server-DVD-Updates&machine=64bit&arch=x86_64&version=12-SP3&distri=sle" class="external">latest</a></p>
openQA Infrastructure - action #33253 (Resolved): [salt] add support for multiple multi-host work... (https://progress.opensuse.org/issues/33253, 2018-03-14T10:02:33Z, thehejik &lt;thehejik@suse.com&gt;)
<p>Currently we support only one multi-host WORKER_CLASS="tap", but we need to create a separate multi-host cluster for WORKER_CLASS="caasp_x86_64", ideally isolated from the "tap" cluster. Later we will probably need even more, e.g. for aarch64.</p>
<p>Changes should be incorporated into <a href="https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/openvswitch.sls" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/openvswitch.sls</a></p>
openQA Tests - action #32338 (Resolved): [aarch64]Prepare support_server image based on SLE12SP3 ... (https://progress.opensuse.org/issues/32338, 2018-02-27T08:47:49Z, thehejik &lt;thehejik@suse.com&gt;)
<p>Yesterday we enabled the multimachine worker setup for aarch64 (openqaworker-arm-{1..3}; openqaworker-arm-2 already has the "tap" WORKER_CLASS configured). It would be nice to have a support_server image for aarch64 based on SLE12SP3.</p>
openQA Infrastructure - action #32314 (Resolved): [salt] make GRE tunnels salt-states compatible ... (https://progress.opensuse.org/issues/32314, 2018-02-26T16:22:57Z, thehejik &lt;thehejik@suse.com&gt;)
<p>The problem is that in the past every worker instance (an instance of a worker on a worker host) had its own WORKER_CLASS defined in the form:</p>
<pre><code>1:
  WORKER_CLASS: something
2:
  WORKER_CLASS: something_else
</code></pre>
<p>but now we have global settings (aarch64) valid for every worker instance (the amount is defined by numofworkers: 20):</p>
<pre><code>numofworkers: 20
global:
  WORKER_CLASS: qemu_aarch64,qemu_aarch64_slow_worker
</code></pre>
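<p>For illustration, a hypothetical combined pillar under the new scheme, mixing a global WORKER_CLASS with a per-instance override; the key names follow the snippets above, but the override itself is an assumption:</p>

```yaml
numofworkers: 20
global:
  WORKER_CLASS: qemu_aarch64,qemu_aarch64_slow_worker
1:
  WORKER_CLASS: qemu_aarch64,tap
```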
<p>Btw we should also restart wickedd service when the initial mm ovs setup is done to get it working (see poo#32296)</p>
openQA Tests - action #32107 (Resolved): [sle][functional][workable] let support_server wait for ... (https://progress.opensuse.org/issues/32107, 2018-02-21T13:14:42Z, thehejik &lt;thehejik@suse.com&gt;)
<p>Currently the support_server VM and its DHCP service do not wait for the initial bootup of the remote VM once its installation is done, so the remote VM ends up without any IPv4 address.</p>
<p>Fix already done <a href="https://github.com/thehejik/os-autoinst-distri-opensuse/commit/e22f8303936ba81bd3e3bf038cb15c9d5976b63f" class="external">https://github.com/thehejik/os-autoinst-distri-opensuse/commit/e22f8303936ba81bd3e3bf038cb15c9d5976b63f</a></p>
<p>Before: <a href="https://openqa.suse.de/tests/1487230#step/remote_target/3" class="external">https://openqa.suse.de/tests/1487230#step/remote_target/3</a> - note there is no IPv4 address for the eth0 adapter.<br>
After: <a href="http://dhcp195.suse.cz/tests/594#step/remote_target/3" class="external">http://dhcp195.suse.cz/tests/594#step/remote_target/3</a></p>
openQA Project - action #20920 (Resolved): [tools] Try out if we can connect more than 2 ovs brid... (https://progress.opensuse.org/issues/20920, 2017-07-28T14:41:32Z, thehejik &lt;thehejik@suse.com&gt;)
<p>Coolo wants to spread the load of slenkins/autoyast tests across all multi-machine workers, because w3 is pretty overloaded while w8+w9 (now dedicated to CaaSP) are idle.</p>
<p>w8 and w9 are already connected over GRE, but we need to try out whether we can connect w8+w9 with w3 at the same time. Then we can use the same WORKER_CLASS=tap on those workers.</p>
openQA Project - action #20002 (Resolved): [tools] openqa sometimes doesn't update job_dependenci... (https://progress.opensuse.org/issues/20002, 2017-06-22T14:28:02Z, thehejik &lt;thehejik@suse.com&gt;)
<p>For multi-machine jobs (CaaSP and slenkins), openQA from time to time doesn't schedule all child jobs when their parent CaaSP-controller or slenkins-*-control job is triggered.<br>
It seems that not all child jobs are running because <strong>entries for those jobs are missing in the job_dependencies SQL table</strong>.</p>
<p>Example of a broken CaaSP-controller job: <a href="https://openqa.suse.de/tests/1016423" class="external">https://openqa.suse.de/tests/1016423</a> (in this case the admin node is missing, so the whole test failed)</p>
<pre><code># select count(child_job_id) from job_dependencies where parent_job_id=1016423;
 count
-------
    22
(1 row)
</code></pre>
<p>If you examine a successful CaaSP-controller job (e.g. id=1015418), you should get count=25 (1x controller, 1x admin, 1x master, 22x workers).</p>
<p>I'm not able to reproduce the issue on demand, but it sometimes occurs both on my local openQA instance using SQLite and on o.s.d using PostgreSQL. A broken job dependency can be worked around by posting the ISO again.</p>
<p>Maybe it has something to do with the scheduler just skipping some DB insert queries.</p>
<p>I'm sorry for being so brief, but I really don't know more.</p>
openQA Tests - action #19942 (Resolved): [slenkins] [openqa] all slenkins tests failing - new sup... (https://progress.opensuse.org/issues/19942, 2017-06-20T14:33:50Z, thehejik &lt;thehejik@suse.com&gt;)
<p>The directories <a href="http://download.suse.de/install/SLP/SLE-12-SP1*" class="external">http://download.suse.de/install/SLP/SLE-12-SP1*</a> disappeared, so all slenkins tests will soon fail ... <a href="https://openqa.suse.de/tests/1011420#step/slenkins_control/7" class="external">https://openqa.suse.de/tests/1011420#step/slenkins_control/7</a></p>
<p>We need to create a new support_server image based on SP2, using repos from o.s.d <a href="https://openqa.suse.de/assets/repo/fixed/" class="external">https://openqa.suse.de/assets/repo/fixed/</a></p>
<p>Maybe this will help <a href="https://github.com/os-autoinst/openQA/blob/master/docs/WritingTests.asciidoc#support-server-based-tests" class="external">https://github.com/os-autoinst/openQA/blob/master/docs/WritingTests.asciidoc#support-server-based-tests</a></p>
openQA Project - action #19806 (Resolved): eth0 address of one node is sometime in use by other t... (https://progress.opensuse.org/issues/19806, 2017-06-13T14:15:45Z, thehejik &lt;thehejik@suse.com&gt;)
<p><a href="https://openqa.suse.de/tests/998491#step/setup/23" class="external">https://openqa.suse.de/tests/998491#step/setup/23</a></p>
<p>In the screenshot above you can see that we have a correct ifcfg-eth0 file with all the needed entries, but eth0 did not get an IP after calling rcnetwork restart. Maybe we can check in a loop and restart the network until we have an IP.</p>
<p>Maybe we can also replace rcnetwork restart with "wicked ifdown eth0 &amp;&amp; wicked ifup eth0" or with some systemctl call.</p>
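<p>A sketch of that retry idea, combining the check loop with the wicked ifdown/ifup variant; the attempt limit and the sleep interval are assumptions:</p>

```shell
# Keep restarting the interface until it holds an IPv4 address; give up after
# five attempts. Returns 0 as soon as an inet address shows up.
wait_for_ip() {  # $1 = interface name, e.g. eth0
  for _ in 1 2 3 4 5; do
    if ip -4 addr show dev "$1" 2>/dev/null | grep -q 'inet '; then
      return 0
    fi
    wicked ifdown "$1" && wicked ifup "$1"
    sleep 5
  done
  return 1
}
```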
<p>The function responsible for this is mm_network::configure_static_ip.</p>
<p>Maybe it is just an SP2 product bug.</p>
openQA Project - action #19704 (Resolved): [tools][caasp] rise number of workers for CaaSP dedica... (https://progress.opensuse.org/issues/19704, 2017-06-09T13:59:52Z, thehejik &lt;thehejik@suse.com&gt;)
<p>Currently we have 16 workers on openqaworker8 and openqaworker9; if they are dedicated only to CaaSP, we can raise the number of workers.</p>
<p>The only limitation was that any test running on those hosts can allocate about 40GB (633GB total / 40 = ~16 workers). But CaaSP does not allocate that much space for its images (internally it is also 40GB, but the image is mostly zeros, so the qcow2 file is much smaller).</p>
<p>So theoretically we have 256GB of RAM, 633GB of total disk space and 16 physical CPU cores with HT.</p>
<p>Currently the pool dir/mount, which also contains the cache dir, occupies about 54GB, so we have about 633-54=579GB of free space for pool+cache needs.</p>
<ul>
<li>the maximum occupied size of a CaaSP qcow2 image is about 10GB ... we can have 579/10 = ~57 workers</li>
<li>256GB RAM - 10GB i/o cache - 8GB for the host itself = ~238GB RAM available / 8GB (currently only 4GB) per CaaSP instance = ~30 workers</li>
<li>we have 32 virtual cores = 32 workers</li>
</ul>
<p>Taking the minimum of the values above, we could run up to 30 workers on each host, but IMO it would be better to start with, say, 24 workers on each.</p>
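<p>The sizing above as shell arithmetic; all inputs are the figures quoted in the text (note that integer division rounds the ~30 RAM-bound figure down to 29):</p>

```shell
disk_free=$((633 - 54))           # GB left for pool + cache
by_disk=$((disk_free / 10))       # ~10 GB max per CaaSP qcow2 image
by_ram=$(( (256 - 10 - 8) / 8 ))  # minus i/o cache and host reserve, 8 GB each
by_cpu=32                         # virtual cores, one worker per core
min=$by_disk
[ "$by_ram" -lt "$min" ] && min=$by_ram
[ "$by_cpu" -lt "$min" ] && min=$by_cpu
echo "workers per host: $min"
```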