action #160685

open

[qe-core] End to end testing of Databases using NFS

Added by szarate about 1 month ago. Updated 4 days ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
Start date: 2024-05-21
Due date:
% Done: 0%
Estimated time:
Difficulty:


Related issues 2 (1 open, 1 closed)

Related to openQA Tests - action #115196: [qe-core] Prepare for ALP - Schedule Databases testsuite for ALP | Rejected | dvenkatachala | 2023-05-25 | 2023-05-25
Related to openQA Tests - coordination #109572: [qe-core][epic] MariaDB Galera Testing | New | szarate

Actions #1

Updated by szarate about 1 month ago

  • Related to action #115196: [qe-core] Prepare for ALP - Schedule Databases testsuite for ALP added
Actions #2

Updated by acarvajal about 1 month ago

We need a reproducer for this, and so far I am not sure we have one.

Out of the customers affected and listed in the ticket, the one that caught my eye was Walgreens with their 9-node HANA Scale Out database.

HANA Scale Out is a HANA configuration in which the database is spread across many nodes: some tables are held in the memory of one node and others in the memory of other nodes, but any given table always resides on exactly one node. File system backing is used mainly for database logs (redo and System Replication logs, for example) and to store the data when the database is shut down. Usually, in an N-node Scale Out setup, N-1 nodes hold the actual data, while the remaining node works as a hot spare in case any of the other nodes goes down. For this to work, every node needs access to the DB files of the other nodes. AFAIU there are many ways to share the files between nodes, including NFS. For example, this Scale Out guide from Amazon has instructions for doing so using their NFS solution: https://docs.aws.amazon.com/sap/latest/sap-hana/fsx-host-scaleout.html

I'm guessing Walgreens' 9-node installation was probably a set of 3+1 HANA Scale Out installations with system replication (8 nodes) and a majority maker (9th node), but I could be mistaken. It could also have been an 8+1 HANA Scale Out installation without system replication.
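To make the node arithmetic behind those two guesses explicit, here is a small sketch. It assumes that system replication means exactly two sites of identical size and that a majority maker adds a single extra node; the function name and parameters are made up for illustration.

```python
def total_nodes(workers: int, standby: int = 1,
                replicated: bool = False, majority_maker: bool = False) -> int:
    """Count the hosts in a Scale Out landscape (illustrative only)."""
    per_site = workers + standby          # e.g. 3 workers + 1 hot spare
    sites = 2 if replicated else 1        # assumption: replication = two equal sites
    return per_site * sites + (1 if majority_maker else 0)


# The two layouts guessed at above both add up to nine nodes.
layout_a = total_nodes(3, replicated=True, majority_maker=True)  # 3+1, twice, plus one
layout_b = total_nodes(8)                                        # 8+1, single site
```

Both layouts give the same node count, so the count alone does not tell us which one the customer runs.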

In any case, it's a complicated scenario just to test for NFS regressions.

I wonder whether the following would be enough to certify it's working: install a single-machine HANA (no Scale Out, no Scale Up) with the file systems (/hana/data, /hana/log, /hana/shared and /usr/sap//home) mounted over NFS, update the system to the faulty kernel version, and stop and start the database several times. This assumes the faulty kernel goes onto the HANA node, as I'm not sure whether it was on the NFS server side (I'm guessing Walgreens' installation, which is in the cloud, uses a cloud service for NFS instead of a box with SLES, but again, not sure).
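A rough sketch of how the single-machine check could be scripted, in case it helps the discussion. It is not an existing openQA test. The helper names are hypothetical, only the three /hana paths are checked by default, and the command runner is injected so the logic can be tried without a HANA system. The `HDB stop` / `HDB start` commands in the usage note are an assumption about how the instance would be controlled.

```python
from typing import Callable, Dict, Iterable, List

# File systems that are expected to be NFS-backed in the proposed setup.
HANA_PATHS = ("/hana/data", "/hana/log", "/hana/shared")


def nfs_backed(mounts_text: str,
               paths: Iterable[str] = HANA_PATHS) -> Dict[str, bool]:
    """Map each path to True if /proc/mounts-style text shows it on nfs/nfs4."""
    fstype_by_mountpoint: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts columns: device, mount point, fs type, options, ...
            fstype_by_mountpoint[fields[1]] = fields[2]
    return {p: fstype_by_mountpoint.get(p) in ("nfs", "nfs4") for p in paths}


def restart_cycles(run: Callable[[List[str]], int],
                   stop_cmd: List[str], start_cmd: List[str],
                   cycles: int = 5) -> int:
    """Stop and start the database up to `cycles` times.

    `run` executes a command and returns its exit code. The loop stops at
    the first non-zero exit code and returns the number of complete
    stop/start cycles that succeeded.
    """
    completed = 0
    for _ in range(cycles):
        if run(stop_cmd) != 0 or run(start_cmd) != 0:
            break
        completed += 1
    return completed
```

On a real system, `mounts_text` would come from reading /proc/mounts, and `run` could be something like `lambda cmd: subprocess.run(cmd).returncode` with `["HDB", "stop"]` and `["HDB", "start"]` executed as the instance's admin user (again, an assumption). A result lower than `cycles` after booting the suspect kernel would be the signal we are looking for.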

Does anybody have access to the faulty kernel? If so, I can prepare a 12-SP5 system as described above with NFS backing and we can do a quick test to see if it reproduces the issue.

Actions #3

Updated by szarate about 1 month ago

Adding #109572, which is about Galera. Among the major database players, cluster configurations could prove an interesting component to check in this same context.

Actions #4

Updated by szarate about 1 month ago

Actions #5

Updated by slo-gin 4 days ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
