openSUSE Project Management Tool https://progress.opensuse.org/
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=583828 2022-12-10T06:18:44Z dimstar dimstar@opensuse.org
<ul></ul><p>ow19 and ow20 must be the affected workers.</p>
<p>Tests that ended up on ow4 passed.</p>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=583840 2022-12-10T09:34:09Z okurz okurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed" href="/issues/115418">action #115418</a>: Setup ow19+20 to be able to run MM tests size:M</i> added</li></ul>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=583843 2022-12-10T09:34:32Z okurz okurz@suse.com
<ul><li><strong>Project</strong> changed from <i>openQA Tests</i> to <i>openQA Infrastructure</i></li><li><strong>Category</strong> deleted (<del><i>Bugs in existing tests</i></del>)</li><li><strong>Target version</strong> set to <i>Ready</i></li></ul>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=583996 2022-12-12T09:42:03Z livdywan liv.dywan@suse.com
<ul></ul><p>Covered briefly in the daily. We'll see whether Fabian can look into it, since he set this up last week; that is pending a response in "factory". If that doesn't happen, I'm prepared to look into it myself and see what I can figure out.</p>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=584053 2022-12-12T12:08:58Z livdywan liv.dywan@suse.com
<ul><li><strong>Tags</strong> set to <i>infra</i></li></ul><p>Brought up in the infra daily. I assume we consider this infra.</p>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=584617 2022-12-13T15:59:08Z favogt fvogt@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>There was a test running on ow19 and ow20, with the VMs able to ping each other in both directions and each VM being able to ping their host through 10.0.2.2.<br>
The VM on ow20 was able to reach the outside (beyond the worker), but the VM on ow19 was not.<br>
Using tcpdump, it was visible that the ICMP echo requests went from the tap device to br0 with the correct IP rewriting (by OVS), but did not end up on eth0.<br>
It turns out that both <code>net.ipv4.conf.eth0.forwarding</code> and <code>net.ipv4.conf.br1.forwarding</code> were set to 0. Changing them to <code>1</code> again with <code>sysctl -w</code> restored networking completely.</p>
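<p>To spot this state without tcpdump, a minimal sketch along these lines could report the per-interface flags. It assumes the flags are readable under <code>/proc/sys/net/ipv4/conf</code>; <code>check_forwarding</code> and <code>CONF_ROOT</code> are illustrative names, not part of openQA.</p>

```shell
# Sketch: report the IPv4 forwarding flag of each given interface.
# CONF_ROOT and check_forwarding are illustrative names (assumptions).
CONF_ROOT="${CONF_ROOT:-/proc/sys/net/ipv4/conf}"

check_forwarding() {
    rc=0
    for iface in "$@"; do
        file="$CONF_ROOT/$iface/forwarding"
        if [ ! -r "$file" ]; then
            echo "$iface: no forwarding entry found"
            rc=1
            continue
        fi
        value=$(cat "$file")
        if [ "$value" = "1" ]; then
            echo "$iface: forwarding enabled"
        else
            echo "$iface: forwarding DISABLED (value $value)"
            rc=1
        fi
    done
    # Non-zero if any interface is missing or has forwarding off.
    return "$rc"
}

# Example: check_forwarding eth0 br1
```

<p>Called as <code>check_forwarding eth0 br1</code>, it returns non-zero if any listed interface has forwarding off, so it could serve as a quick pre-flight check on a worker.</p>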
<p>I'm not sure what the cause is, but:</p>
<pre><code>openqaworker19:~ # cat /etc/sysctl.d/70-yast.conf
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
net.ipv6.conf.all.disable_ipv6 = 0
</code></pre>
<p>I deleted that file now. Maybe it's fixed, let's see.</p>
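<p>Assuming a sysctl.d drop-in like the one above is what turned forwarding off, a rough sketch for finding such entries could look like this. <code>find_forwarding_off</code> is a made-up helper name, and the default directories are the usual sysctl.d locations, which may differ per system.</p>

```shell
# Sketch: list sysctl config lines that set a forwarding key to 0.
# find_forwarding_off is an illustrative helper name (assumption).
find_forwarding_off() {
    # Directories can be passed as arguments; otherwise use common defaults.
    [ "$#" -gt 0 ] || set -- /etc/sysctl.d /run/sysctl.d /usr/lib/sysctl.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Matches e.g. "net.ipv4.ip_forward = 0" and
        # "net.ipv6.conf.all.forwarding = 0"; prints file:line:content.
        grep -HnE '^[[:space:]]*net\.ipv[46]\.[A-Za-z0-9_./-]*forward[a-z_]*[[:space:]]*=[[:space:]]*0[[:space:]]*$' \
            "$dir"/*.conf 2>/dev/null
    done
    return 0
}

# Example: find_forwarding_off /etc/sysctl.d
```

<p>On the file shown above this would flag the <code>ip_forward</code> and <code>forwarding</code> lines but not <code>disable_ipv6</code>.</p>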
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=585988 2022-12-19T11:37:21Z favogt fvogt@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><blockquote>
<p>[12:36] DimStar: Did <a href="https://progress.opensuse.org/issues/121789" class="external">https://progress.opensuse.org/issues/121789</a> happen again?<br>
[12:36] fvogt: don't think I'd seen that popping up the last few days</p>
</blockquote>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=585991 2022-12-19T11:37:27Z favogt fvogt@suse.com
<ul><li><strong>Assignee</strong> set to <i>favogt</i></li></ul>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=585997 2022-12-19T11:54:02Z okurz okurz@suse.com
<ul></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/20030">@favogt</a> great that you could fix it. I am just afraid that the next time problems like this come up we will be in a similar situation. Do you have an idea what could be added to the documentation, or even better to our software, to clearly indicate what the problems are before tests start failing?</p>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=586144 2022-12-19T15:53:53Z favogt fvogt@suse.com
<ul></ul><p>okurz wrote:</p>
<blockquote>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/20030">@favogt</a> great that you could fix it. I am just afraid that the next time problems like this come up we will be in a similar situation. Do you have an idea what could be added to the documentation, or even better to our software, to clearly indicate what the problems are before tests start failing?</p>
</blockquote>
<p>Not sure. The networking setup is fairly complex and I don't really understand all parts either. There's already a section about OVS debugging in the documentation which is somewhat helpful: <a href="http://open.qa/docs/#_debugging_open_vswitch_configuration" class="external">http://open.qa/docs/#_debugging_open_vswitch_configuration</a></p>
<p>What I did was run tcpdump on every interface along the path to figure out where it goes wrong.</p>
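<p>As a sketch of that approach, the helper below only prints one tcpdump command per interface so that each can be started in its own terminal. <code>icmp_trace_commands</code> is an illustrative name, and <code>tap0</code> is a placeholder for the VM's actual tap device.</p>

```shell
# Sketch: print one tcpdump command per interface along the packet path.
# icmp_trace_commands is an illustrative name (assumption); the interface
# names passed in must match what the worker actually uses.
icmp_trace_commands() {
    count=$1
    shift
    for iface in "$@"; do
        # -n: no name resolution, -i: interface, -c: stop after COUNT packets
        echo "tcpdump -n -i $iface -c $count icmp"
    done
}

# Example (tap0 is a placeholder for the VM's tap device):
#   icmp_trace_commands 5 tap0 br0 eth0
```

<p>Running the printed commands side by side while the VM pings an outside address shows at which hop the echo requests stop appearing.</p>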
<p>Some more complete documentation on how MM networking with OVS works would be helpful not only for troubleshooting I'd say. The main missing part is how OVS is configured (IP rewriting) and how it plays together with VLANs, GRE tunnels and masquerading.</p>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=586297 2022-12-20T08:24:43Z okurz okurz@suse.com
<ul></ul><p>favogt wrote:</p>
<blockquote>
<p>okurz wrote:</p>
<blockquote>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/20030">@favogt</a> great that you could fix it. I am just afraid that the next time problems like this come up we will be in a similar situation. Do you have an idea what could be added to the documentation, or even better to our software, to clearly indicate what the problems are before tests start failing?</p>
</blockquote>
<p>Not sure. The networking setup is fairly complex and I don't really understand all parts either. There's already a section about OVS debugging in the documentation which is somewhat helpful: <a href="http://open.qa/docs/#_debugging_open_vswitch_configuration" class="external">http://open.qa/docs/#_debugging_open_vswitch_configuration</a></p>
<p>What I did was run tcpdump on every interface along the path to figure out where it goes wrong.</p>
</blockquote>
<p>ok, thx</p>
<blockquote>
<p>Some more complete documentation on how MM networking with OVS works would be helpful not only for troubleshooting I'd say. The main missing part is how OVS is configured (IP rewriting) and how it plays together with VLANs, GRE tunnels and masquerading.</p>
</blockquote>
<p>yeah, I just don't think there is anyone else right now that feels more confident to write that up than you are :)</p>
openQA Infrastructure - action #121789: MultiMachine tests lose ability to communicate https://progress.opensuse.org/issues/121789?journal_id=586861 2022-12-21T12:29:01Z okurz okurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-1 priority-4 priority-default" href="/issues/122299">action #122299</a>: openQA worker should fail with explicit error message if multi-machine test is triggered but requirements are not fulfilled</i> added</li></ul>