action #107017
Random asset download (cache service) failures on openqaworker1 (closed)
Added by favogt almost 3 years ago. Updated almost 3 years ago.
Description
Observation
On openqaworker1, several jobs ended up incomplete due to a setup failure, e.g.:
https://openqa.opensuse.org/tests/2191909
[2022-02-16T21:47:58.251560+01:00] [debug] Found HDD_1, caching opensuse-Tumbleweed-x86_64-20220215-textmode@64bit.qcow2
[2022-02-16T21:47:59.691676+01:00] [info] Downloading opensuse-Tumbleweed-x86_64-20220215-textmode@64bit.qcow2, request #8258 sent to Cache Service
[2022-02-16T22:44:08.642013+01:00] [info] +++ worker notes +++
[2022-02-16T22:44:08.652197+01:00] [info] End time: 2022-02-16 21:44:08
[2022-02-16T22:44:08.652264+01:00] [info] Result: timeout
https://openqa.opensuse.org/tests/2193698
[2022-02-17T13:37:16.233412+01:00] [debug] Found ISO, caching openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso
[2022-02-17T13:37:40.012614+01:00] [info] Downloading openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso, request #9191 sent to Cache Service
[2022-02-17T14:33:00.965395+01:00] [info] +++ worker notes +++
[2022-02-17T14:33:00.965569+01:00] [info] End time: 2022-02-17 13:33:00
[2022-02-17T14:33:00.965644+01:00] [info] Result: timeout
[2022-02-17T14:33:00.974060+01:00] [info] Uploading autoinst-log.txt
Most likely this started when the worker count and cache sizes were increased after the NVMe deployment.
The Minion web interface shows some errors for the GNOME-Live download:
---
args:
- 2193699
- iso
- openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso
- http://openqa1-opensuse
attempts: 1
children: []
created: 2022-02-17T12:36:17Z
delayed: 2022-02-17T12:36:17Z
expires: ~
finished: 2022-02-17T12:38:45Z
id: 9188
lax: 0
notes:
  lock: openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso.http://openqa1-opensuse
parents: []
priority: 0
queue: default
result: 'Job terminated unexpectedly (exit code: 11, signal: 0)'
retried: ~
retries: 0
started: 2022-02-17T12:36:43Z
state: failed
task: cache_asset
time: 2022-02-17T13:45:42Z
worker: 18
---
args:
- 2193696
- iso
- openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso
- http://openqa1-opensuse
attempts: 1
children: []
created: 2022-02-17T12:36:56Z
delayed: 2022-02-17T12:36:56Z
expires: ~
finished: 2022-02-17T12:41:18Z
id: 9190
lax: 0
notes:
  lock: openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso.http://openqa1-opensuse
parents: []
priority: 0
queue: default
result: 'Job terminated unexpectedly (exit code: 2, signal: 0)'
retried: ~
retries: 0
started: 2022-02-17T12:38:11Z
state: failed
task: cache_asset
time: 2022-02-17T13:45:42Z
worker: 18
Might this be some miscellaneous network issue? I couldn't find any further information.
Updated by favogt almost 3 years ago
- Copied from action #103524: OW1: performance loss size:M added
Updated by kraih almost 3 years ago
One of those jobs exited with a segfault; that is definitely not expected behaviour and cannot be caused by Perl code (it must have been an XS module misbehaving). Hopefully it's just a one-time occurrence; otherwise we might have to check whether any Perl-related packages have been updated recently.
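Checking for recently updated Perl-related packages could be done along these lines (a sketch for an RPM-based system; the grep pattern is only an assumption about package naming):
# list recently installed/updated packages, newest first, filtered for Perl
rpm -qa --last | grep -i perl | head -n 20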
Updated by kraih almost 3 years ago
Looked up the jobs in question in the database:
8258|[2191909,"hdd","opensuse-Tumbleweed-x86_64-20220215-textmode@64bit.qcow2","http:\/\/openqa1-opensuse"]|2022-02-16 20:47:58|2022-02-16 20:47:58|2022-02-16 20:48:50|0|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"||0|2022-02-16 20:47:59|failed|cache_asset|10|default|1|[]|{"lock":"opensuse-Tumbleweed-x86_64-20220215-textmode@64bit.qcow2.http:\/\/openqa1-opensuse"}||0
9191|[2193698,"iso","openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso","http:\/\/openqa1-opensuse"]|2022-02-17 12:37:16|2022-02-17 12:37:16|2022-02-17 12:41:18|0|"Job terminated unexpectedly (exit code: 11, signal: 0)"||0|2022-02-17 12:38:15|failed|cache_asset|18|default|1|[]|{"lock":"openSUSE-Leap-15.3-GNOME-Live-x86_64-Build9.374-Media.iso.http:\/\/openqa1-opensuse"}||0
It does look like the machine is having issues, and it's not the network.
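For reference, a query along these lines should reproduce that lookup (a minimal sketch; the database path and the minion_jobs table name are assumptions based on a default openQA cache service setup):
# inspect the failed cache jobs directly in the Minion SQLite database
sqlite3 /var/lib/openqa/cache/cache.sqlite \
  "SELECT id, args, created, started, finished, result FROM minion_jobs WHERE id IN (8258, 9191);"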
Updated by kraih almost 3 years ago
Might be SQLite related, maybe a storage issue? There are quite a few more similar cases:
5728|2022-02-14 21:32:56|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
6945|2022-02-16 00:43:19|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
6988|2022-02-16 01:12:39|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8234|2022-02-16 20:21:07|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8272|2022-02-16 21:18:22|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8289|2022-02-16 21:30:35|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
8291|2022-02-16 21:30:22|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8293|2022-02-16 21:35:39|failed|"Worker went away"
8428|2022-02-16 22:51:41|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
8665|2022-02-17 01:02:37|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
8789|2022-02-17 02:30:15|failed|"Job terminated unexpectedly (exit code: 0, signal: 15)"
9343|2022-02-17 14:37:14|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
3272|2022-02-13 02:30:13|failed|"Job terminated unexpectedly (exit code: 0, signal: 15)"
5444|2022-02-14 19:09:07|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
5445|2022-02-14 19:09:10|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
5446|2022-02-14 19:09:10|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
6319|2022-02-15 18:46:35|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
6557|2022-02-15 21:17:52|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
6572|2022-02-15 21:27:09|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
6953|2022-02-16 00:49:09|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
7026|2022-02-16 01:37:05|failed|"Job terminated unexpectedly (exit code: 11, signal: 0)"
7435|2022-02-16 05:51:25|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
7835|2022-02-16 14:45:41|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
7941|2022-02-16 16:06:13|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
8243|2022-02-16 20:29:20|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8247|2022-02-16 20:35:51|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8258|2022-02-16 20:48:50|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8371|2022-02-16 22:06:09|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8388|2022-02-16 22:11:38|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8405|2022-02-16 22:24:23|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8413|2022-02-16 22:32:29|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
8420|2022-02-16 22:37:58|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8421|2022-02-16 22:39:30|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8425|2022-02-16 22:45:08|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8507|2022-02-16 23:34:24|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
8658|2022-02-17 00:47:46|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8667|2022-02-17 01:04:11|failed|"Job terminated unexpectedly (exit code: 11, signal: 0)"
8668|2022-02-17 01:04:15|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8670|2022-02-17 01:07:35|failed|"Job terminated unexpectedly (exit code: 11, signal: 0)"
8671|2022-02-17 01:07:16|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8676|2022-02-17 01:13:16|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8679|2022-02-17 01:19:36|failed|"Job terminated unexpectedly (exit code: 11, signal: 0)"
8681|2022-02-17 01:20:20|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
8783|2022-02-17 02:30:02|failed|"Job terminated unexpectedly (exit code: 0, signal: 15)"
8794|2022-02-17 02:30:15|failed|"Job terminated unexpectedly (exit code: 0, signal: 15)"
8840|2022-02-17 03:19:54|failed|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"
9188|2022-02-17 12:38:45|failed|"Job terminated unexpectedly (exit code: 11, signal: 0)"
9189|2022-02-17 12:41:12|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
9190|2022-02-17 12:41:18|failed|"Job terminated unexpectedly (exit code: 2, signal: 0)"
9191|2022-02-17 12:41:18|failed|"Job terminated unexpectedly (exit code: 11, signal: 0)"
The cases of "Job terminated unexpectedly" without a signal are especially concerning.
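A grouped query could summarize the failure modes (again a sketch, under the same schema assumptions as above):
# count failed cache service jobs per error message
sqlite3 /var/lib/openqa/cache/cache.sqlite \
  "SELECT result, COUNT(*) FROM minion_jobs WHERE state = 'failed' GROUP BY result ORDER BY COUNT(*) DESC;"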
Updated by favogt almost 3 years ago
kraih wrote:
Might be SQLite related, maybe a storage issue? There are quite a few more similar cases:
...
The cases of "Job terminated unexpectedly" without a signal are especially concerning.
Indeed. A coredump or even just a backtrace would already help... I don't see any SEGVs in dmesg though.
If it's caused by sqlite, then the procedure from https://progress.opensuse.org/issues/67000#note-31 might work here. Or just deleting the database altogether.
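Deleting and recreating the cache database could look like this (a sketch; the service names and database path follow a typical openQA worker setup and should be verified first):
# stop the cache service and its Minion workers, drop the database, restart
systemctl stop openqa-worker-cacheservice-minion openqa-worker-cacheservice
rm -f /var/lib/openqa/cache/cache.sqlite*
systemctl start openqa-worker-cacheservice openqa-worker-cacheservice-minion
The glob also removes any stale -wal/-shm journal files next to the database.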
Updated by kraih almost 3 years ago
favogt wrote:
If it's caused by sqlite, then the procedure from https://progress.opensuse.org/issues/67000#note-31 might work here. Or just deleting the database altogether.
You might be right; file corruption would be an explanation, and deleting the SQLite file could help.
Updated by kraih almost 3 years ago
I've recreated the cache database; let's see what happens.
Updated by kraih almost 3 years ago
Something is still very wrong; I can't even check the journal mode because the database is constantly locked:
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
sqlite> PRAGMA journal_mode;
Error: database is locked
That would suggest very slow disk I/O.
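Retrying with a busy timeout might at least get past short-lived locks (a sketch; the database path is an assumption):
# wait up to 5 seconds for locks instead of failing immediately
sqlite3 -cmd '.timeout 5000' /var/lib/openqa/cache/cache.sqlite 'PRAGMA journal_mode;'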
Updated by nicksinger almost 3 years ago
Just to clarify: we're talking about data residing below /var/lib/openqa, right?
We have this data and the root fs both on a RAID 1, so I'd expect more widespread damage if the hardware were broken. But we have to check. At first glance, everything on the system looks (almost) fine:
openqaworker1:~ # mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Wed Feb 9 16:22:20 2022
Raid Level : raid0
Array Size : 1390651392 (1326.23 GiB 1424.03 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Wed Feb 9 16:22:20 2022
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : original
Chunk Size : 512K
Consistency Policy : none
Name : openqaworker1:openqa (local to host openqaworker1)
UUID : 4030b549:812c9e8a:22ad7ec4:d4478a72
Events : 0
    Number   Major   Minor   RaidDevice   State
       0       259       1        0       active sync   /dev/nvme0n1
       1       259       0        1       active sync   /dev/nvme1n1
openqaworker1:~ # smartctl -a /dev/nvme0n1
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-150300.59.49-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPE2ME400G4
Serial Number: CVMD4352001W400FGN
Firmware Version: 8DV10151
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 0
Number of Namespaces: 1
Namespace 1 Size/Capacity: 400.088.457.216 [400 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Feb 17 17:24:31 2022 CET
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x0006): Wr_Unc DS_Mngmt
Maximum Data Transfer Size: 32 Pages
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 25.00W - - 0 0 0 0 0 0
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 512 8 2
2 - 512 16 2
3 - 4096 0 0
4 - 4096 8 0
5 - 4096 64 0
6 - 4096 128 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 29 Celsius
Available Spare: 99%
Available Spare Threshold: 10%
Percentage Used: 40%
Data Units Read: 1.312.831.523 [672 TB]
Data Units Written: 2.954.476.718 [1,51 PB]
Host Read Commands: 8.937.663.770
Host Write Commands: 15.006.645.683
Controller Busy Time: 12.752
Power Cycles: 103
Power On Hours: 56.986
Unsafe Shutdowns: 22
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Error Information (NVMe Log 0x01, max 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1 13 0x0000 0x4502 0x000 0 1 -
openqaworker1:~ # smartctl -a /dev/nvme1n1
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-150300.59.49-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPEKNW010T8
Serial Number: BTNH117605HA1P0B
Firmware Version: 004C
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1.024.209.543.168 [1,02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Feb 17 17:24:33 2022 CET
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 77 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 4.00W - - 0 0 0 0 0 0
1 + 3.00W - - 1 1 1 1 0 0
2 + 2.20W - - 2 2 2 2 0 0
3 - 0.0300W - - 3 3 3 3 5000 5000
4 - 0.0040W - - 4 4 4 4 5000 9000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 53 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 12.041.103 [6,16 TB]
Data Units Written: 28.202.638 [14,4 TB]
Host Read Commands: 82.636.540
Host Write Commands: 137.849.821
Controller Busy Time: 4.807
Power Cycles: 1
Power On Hours: 226
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
/dev/nvme0n1 has one error logged, so it might be worth investigating further.
Updated by nicksinger almost 3 years ago
openqaworker1:~ # nvme error-log /dev/nvme0n1
Error Log Entries for device:nvme0n1 entries:64
.................
Entry[ 0]
.................
error_count : 1
sqid : 13
cmdid : 0
status_field : 0x4502(READ_ERROR: The read data could not be recovered from the media)
parm_err_loc : 0
lba : 0
nsid : 0x1
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Updated by kraih almost 3 years ago
I've masked all 20 workers and restarted the machine. Tried to run hdparm --direct -t -T /dev/md127, but even after 10 minutes it didn't produce any output, and the process wouldn't react to a kill -9.
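Masking the worker units could be done along these lines (a sketch; the openqa-worker@ unit name and the instance range are assumptions about this host's setup):
# mask and stop all 20 worker instances so no new jobs get picked up
for i in $(seq 1 20); do systemctl mask --now "openqa-worker@$i.service"; done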
Updated by kraih almost 3 years ago
After the reboot I'm seeing some results that look OK:
$ hdparm --direct -t -T /dev/md127
/dev/md127:
Timing O_DIRECT cached reads: 1582 MB in 2.00 seconds = 790.47 MB/sec
Timing O_DIRECT disk reads: 4572 MB in 3.00 seconds = 1523.69 MB/sec
I'll bring the first two workers online again to collect some data.
Updated by favogt almost 3 years ago
That means the issue from https://progress.opensuse.org/issues/103524 is back, or still there, even after the replacement of nvme1n1.
The issue is that performance tanks at a random point during use, and flushes (triggered by e.g. QEMU and hdparm) take several minutes.
Read access is also impacted:
openqaworker1:~ # dd if=/dev/nvme1n1 bs=1M count=1000 status=progress iflag=direct of=/dev/null
1033895936 bytes (1.0 GB, 986 MiB) copied, 37 s, 27.9 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 37.5085 s, 28.0 MB/s
In contrast, /dev/nvme0n1:
openqaworker1:~ # dd if=/dev/nvme0n1 bs=1M count=1000 status=progress iflag=direct of=/dev/null
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.877994 s, 1.2 GB/s
I tried the latest kernel from Kernel:stable:Backport to see whether it makes a difference, but it was still slow after a reboot.
However, it was resolved instantly after running fstrim:
openqaworker1:~ # fstrim -v /var/lib/openqa
/var/lib/openqa: 1.2 TiB (1312675467264 bytes) trimmed
openqaworker1:~ # dd if=/dev/nvme1n1 bs=1M count=1000 status=progress iflag=direct of=/dev/null
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.743051 s, 1.4 GB/s
A daily fstrim.timer might help; I'll enable that (see the sketch below).
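Enabling the timer could look like this (a sketch; fstrim.timer runs weekly by default, so the drop-in below is an assumed way to switch it to daily):
# enable the systemd trim timer (weekly by default)
systemctl enable --now fstrim.timer
# override the schedule to daily via a drop-in
mkdir -p /etc/systemd/system/fstrim.timer.d
printf '[Timer]\nOnCalendar=\nOnCalendar=daily\n' > /etc/systemd/system/fstrim.timer.d/daily.conf
systemctl daemon-reload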
Updated by favogt almost 3 years ago
- Is duplicate of action #103524: OW1: performance loss size:M added
Updated by favogt almost 3 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
So it's just https://progress.opensuse.org/issues/103524 again, with slightly different symptoms on top.
Updated by kraih almost 3 years ago
- Status changed from Closed to New
- % Done changed from 100 to 0
The SQLite issues are also back this morning:
461|[2195408,"other","Tumbleweed.x86_64-1.0-libvirt-Snapshot20220216.vagrant.libvirt.box","http:\/\/openqa1-opensuse"]|2022-02-18 09:41:14|2022-02-18 09:41:14|2022-02-18 09:41:53|0|"DBD::SQLite::db do failed: database is locked at \/usr\/share\/openqa\/script\/..\/lib\/OpenQA\/CacheService.pm line 77.\n"||0|2022-02-18 09:41:17|failed|cache_asset|4|default|1|[]|{"lock":"Tumbleweed.x86_64-1.0-libvirt-Snapshot20220216.vagrant.libvirt.box.http:\/\/openqa1-opensuse"}||0
Disk I/O gets so slow that SQLite can't keep up with journal commits anymore. So I can confirm that this is not a cache-service-specific issue.
Updated by kraih almost 3 years ago
- Status changed from New to Closed
- Assignee set to kraih