action #16346
closed[sles][functional][tools][s390x] Test timeouts on s390 after 2 hours
0%
Description
Tests timout for some reason on s390
Logs:
https://openqa.suse.de/tests?match=%3A127401.java-1_8_0-openjdk.4082
https://openqa.suse.de/tests/748476/file/autoinst-log.txt
Seems to hit s390 on SP3 too.
Updated by oholecek about 7 years ago
2hrs is the default max limit job can run. Use MAX_JOB_TIME variable to change this limit.
Updated by okurz about 7 years ago
that comment makes no sense. These scenarios are not supposed to run for 2h, check previous ones.
vpelcak, please use https://progress.opensuse.org/projects/openqav3/wiki#defects - could have gave us more context to understand that previous ones still worked :-)
Updated by oholecek about 7 years ago
It does make sense if reacted just on the topic of the issue.
Anyway, the log seems interesting - at the beginning there is
running on openqaworker2:5 (Linux 4.7.5-2.g02c4d35-default #1 SMP PREEMPT Mon Sep 26 08:11:45 UTC 2016 (02c4d35) x86_64)
Timestamp 8:11:45 UTC
Throughout the logs it is all UTC-1
07:48:45.8652 30050 Rsync: Synchronization of needles directory took 0.12 seconds
07:48:46.0459 30050 usingenv DESKTOP=gnome
07:48:46.0460 30050 usingenv DISTRI=sle
07:48:46.0460 30050 usingenv NOAUTOLOGIN=1
...
07:58:23.8844 30302 >>> testapi::_check_backend_response: match=current-pattern-selected,on-category timed out after 0
07:58:23.9174 Debug: /var/lib/openqa/share/tests/sle/tests/installation/select_patterns.pm:38 called testapi::wait_screen_change
07:58:23.9175 30302 <<< testapi::wait_screen_change(timeout=10)
07:58:23.9258 Debug: /var/lib/openqa/share/tests/sle/tests/installation/select_patterns.pm:37 called testapi::send_key
And then suddenly something like UTC+1, but minutes are off too.
07:58:23.9258 Debug: /var/lib/openqa/share/tests/sle/tests/installation/select_patterns.pm:37 called testapi::send_key
07:58:23.9259 30302 <<< testapi::send_key(key='down')
09:48:52.5455 30050 signalhandler got TERM - loop 1
09:48:52.5458 30050 awaiting death of commands process
09:48:52.5495 30050 commands process exited: 30156
09:48:52.5496 30050 awaiting death of testpid 30302
09:48:52.5552 30050 test process exited: 30302
09:48:52.5552 30050 isotovideo failed
09:48:52.5569 30050 killing backend process 30354
09:48:52.5570 30354 backend got TERM
Seems some strange time changes. Is it known for s390 workers to behave like this?
Updated by coolo about 7 years ago
don't get fooled - the time is not jumping. The jobs are really so long on the worker
Updated by coolo about 7 years ago
Jan 31 12:48:53 openqaworker2 worker[26121]: [WARN] job 748496 spent more time than MAX_JOB_TIME
Jan 31 12:48:53 openqaworker2 worker[26121]: can't open /var/lib/openqa/pool/5/testresults/result-select_patterns.json: No such file or directory at /usr/share/openqa/script/../lib/Ope
Jan 31 12:48:54 openqaworker2 worker[26121]: [INFO] cleaning up 00748496-sle-12-SP2-Server-DVD-Incidents-s390x-Build:127315.libseccomp.4020-qam-minimal@s390x-zVM-hsi-l3
Jan 31 12:48:54 openqaworker2 worker[26121]: [INFO] got job 748507: 00748507-sle-12-SP2-Server-DVD-Incidents-s390x-Build:127321.libica2.3947-qam-minimal@s390x-zVM-hsi-l3
Jan 31 12:48:54 openqaworker2 worker[26121]: [INFO] 28386: WORKING 748507
Jan 31 14:03:54 openqaworker2 worker[26121]: [ERROR] 502 response: Proxy Error (remaining tries: 2)
Jan 31 14:48:54 openqaworker2 worker[26121]: [WARN] max job time exceeded, aborting 00748507-sle-12-SP2-Server-DVD-Incidents-s390x-Build:127321.libica2.3947-qam-minimal@s390x-zVM-hsi-l
Jan 31 14:48:54 openqaworker2 worker[26121]: killed 28386
Jan 31 14:48:54 openqaworker2 worker[26121]: [WARN] job 748507 spent more time than MAX_JOB_TIME
Updated by vpelcak about 7 years ago
okurz wrote:
vpelcak, please use https://progress.opensuse.org/projects/openqav3/wiki#defects - could have gave us more context to understand that previous ones still worked :-)
For QAM minimal tests are always supposed to pass. Either the minimal tests pass all green (or soft fail with the known issues not relevant to the update - like PPC failing to reboot before the timeout when testing GRUB) or we raise the issue. So, it always used to work if we start to report problems.
But sure, I can mention that in the future.
Updated by vpelcak about 7 years ago
https://openqa.suse.de/tests/752541#live - install test froze
I noticed that tests reach 59% of the progress (before the select_pattern test) and then they timeout.
There are no logs shown in https://openqa.suse.de/tests/752541#downloads
Updated by okurz about 7 years ago
Updated by okurz about 7 years ago
Also, other backends seem to tell when the job has been cancelled because of timeout. does not seem to happen on s390x, please check backend.
Updated by mgriessmeier about 7 years ago
nvm - I see, this was already in the logs posted from coolo some time ago...
anyone has an idea how I can debug this further?
What I found just now, after the job dies because of MAX_JOB_TIME exceeded, in the worker log:
openqaworker2 worker[12208]: can't open /var/lib/openqa/pool/3/testresults/result-select_patterns.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 735.
Updated by coolo about 7 years ago
So this has nothing to do with send_key - it's just the next call into the backend, while the backend is stuck on reading VNC. It stalls there
fcntl(18, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0
read(18, "\0\0\0\3\1@\0\200\0\300\0@\0\0\0\0)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 8192) = 8192
fcntl(18, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(18, F_SETFL, O_RDWR) = 0
read(18, ")e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$\20\242!\4!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$\1@\0\300\2\300\0@\0\0\0\0\20\242!\4!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e\20\242!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e\20\242!\4!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, ")e)e)e)e)e)e)e)e)e)e)e)e)e)e\0\0\1\300\1@\0\200\0\0\0\0)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e\20\242!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "\347<\347\\\357}\357}\326\272\347<\337\33\357}\357}s\316!$!$\245T\357}\357}\336\373\347\\\357}\255\325\31\3cL\357}\347\\\347\\\337\33\347\\\357}\336\373!\204\204\20\3068\357}\357}\265\326s\316\31\3\2767\357}\316\271\255u\337\33\357}\245t!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$\20\242!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "\337\0331\306!$!$\214\221\214\221!$1\306\336\373s\316)\305J\311\316\271[+!$R\212\347\\\265\366!$!$!$!$\265\366\347\\R\212!$\265\326cl!$!$!$!$!$!$!$!$\357}1\306\31#J\251\316\271cL!$!$!$!$!$!$!$!$!$!$!$!$!$!$\2454s\316!$!$!$!$R\312\2767!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$\357}1\306!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$\20\202\0\0\0\0\0\0\0\0\0\0\0 \10a\0 \0\0\0\0\0\0\0\0\20\242!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$\20\242!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e\20\242!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$\2040\377\377\367\276\367\276\367\276\326\2321\307)\246)\246)\246c,\367\236\367\276\367\276\367\276\367\276\367\276\367\276"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$\20\242!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 8192) = 8192
read(18, "\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314\34\314"..., 8192) = 8192
read(18, "\24\212\33\350!\305!\305!\305!\305!\305!\305!\305\32\206%M\33\350!\305\32%\34)!\305\32\4\35,%M\32%!\305!\305!\305!\305!\305!\305!\305\34*\34\212!\305!\305\34j\34\212!\305!\305\34J%M%M\34\353!\305!\305\34)\35\f!\305!\305\34\t%M%M\23\310!\305\31\244%M\33H!\305\32F%M%M\34\t!\305!\305\35\f%M\34\353!\305!\305\34)\34\313!\305!\305!\305!\305!\305!\305!\305!\305\35,\32E!\305\33H%M%M\33H!\305\32\5%M%M\32F!\305\33H%M%M%M%M%M%M%M%M%M%M%M"..., 8192) = 8192
read(18, "!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$cM\275\367\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\347<\316\232\316y!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$!$"..., 8192) = 8192
read(18,
Updated by coolo about 7 years ago
14:32:50.3431 16432 NOR 4
14:32:50.3432 16432 UP 320,128 320 x 64 0
14:32:50.3433 16432 Reading 40960
14:32:50.3441 16432 UP 320,192 704 x 64 0
14:32:50.3442 16432 Reading 90112
14:32:50.3457 16432 UP 320,256 640 x 64 0
14:32:50.3458 16432 Reading 81920
14:32:50.3471 16432 UP 0,576 320 x 128 0
14:32:50.3472 16432 Reading 81920
And the last read stops after the 9th 8192 - trying to get another 8192. But as this 8192 won't get full, it's blocking until the end. I guess we need to use sysread here instead :(
Updated by coolo about 7 years ago
I switched to sysread, but it still blocks
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
write(3, "15:41:09.9934 26239 Reading 8192"..., 34) = 34
read(18, ")e)e)e)e)e)e)e)e)e)e)e)e)e)e)e)e"..., 81920) = 81880
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
write(3, "15:41:09.9936 26239 Read 81880\n", 31) = 31
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
write(3, "15:41:09.9938 26239 Reading 40\n", 31) = 31
read(18,
Updated by coolo about 7 years ago
- Status changed from New to In Progress
- Assignee set to coolo
I have no idea where the 40 bytes could get lost :(
Updated by coolo about 7 years ago
I tried to move the jobs to openqaworker6: same problem, even as the network config is different.
I tried tcpdump on the network interface: wireshark reports the missing bytes just as openqa does.
I tried to reproduce with zkvm and tcpdump on the hypervisor: unfortunately it's not reproducible enough.
I reenabled ZLRE: works flawless again. So I will just do that :(
Updated by coolo about 7 years ago
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/os-autoinst/pull/719 - I deployed the ZLRE reenabling on w2 and there was no single job incomplete since then. The missing bytes are still curious
Updated by mgriessmeier about 7 years ago
- Status changed from Resolved to Feedback
happens again - looks like the same issue...
https://openqa.suse.de/tests/790138
Updated by coolo about 7 years ago
- Assignee changed from coolo to mgriessmeier
the change I did was merely a workaround - so now find the real problem in the network setup. Good luck
Updated by mgriessmeier about 7 years ago
- Subject changed from Test timeouts on s390 after 2 hours to [sles][functional][tools][s390x] Test timeouts on s390 after 2 hours
- Priority changed from Urgent to Normal
change name and lower priority, since didn't occur for quite a time
Updated by mgriessmeier about 7 years ago
- Status changed from Feedback to Resolved
not seen any longer - probably some hiccup
Reopen if it occurs again
Updated by mgriessmeier about 7 years ago
- Status changed from Resolved to Feedback
happened again here:
https://openqa.suse.de/tests/846138
probably similiar to
https://progress.opensuse.org/issues/17872
Updated by mgriessmeier almost 7 years ago
- Status changed from Feedback to Resolved
closing again -> not happening anymore