action #119431
coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #92854: [epic] limit overload of openQA webUI by heavy requests
action #97190: Limit size of initial requests everywhere, e.g. /, /tests, etc., over webUI and API
Inform users e.g. in the webUI if not all results are returned size:M
0%
Description
Acceptance criteria
- AC1: When data on query or API routes is limited the user is informed that not all results are returned
Suggestions
- Wait for outcome of #114421
- Research industry best-practice for the API design/ UX (use of HTTP headers, additional information in the JSON data...) e.g. GitHub API
Updated by okurz about 2 years ago
- Copied from action #114421: Add a limit where it makes sense after we have it for /tests, query configurable size:M added
Updated by robert.richardson about 2 years ago
- Status changed from New to Blocked
- Assignee set to robert.richardson
Updated by robert.richardson about 2 years ago
- Status changed from Blocked to Feedback
PR for https://progress.opensuse.org/issues/114421 was merged
Updated by livdywan about 2 years ago
- Subject changed from Inform users e.g. in the webUI if not all results are returned to Inform users e.g. in the webUI if not all results are returned size:M
- Description updated (diff)
- Status changed from Feedback to Workable
- Assignee deleted (robert.richardson)
Updated by robert.richardson about 2 years ago
- Assignee set to robert.richardson
Both GitHub and GitLab avoid listing full result sets through the use of pagination.
GitHub (API):
GitHub's API uses two pagination methods: page-based pagination and cursor-based pagination. If the link header includes `page`, then the operation uses page-based pagination. If the link header includes `before` and `after`, then the operation uses cursor-based pagination.
Traversing with pagination - GitHub Docs
- If GitHub takes more than 10 seconds to process an API request, GitHub will terminate the request and you will receive a timeout response.
Page-/Offset-based pagination
The link header for page-based pagination will tell you information about the `prev`, `next`, `first`, or `last` pages. If you did not request a specific page, then the response will default to the first page, and information about the first and previous pages will be omitted.
- The default number of elements differs per route; for some routes you can pass a `per_page` parameter (max: 100)
Cursor-based pagination
Cursor pagination uses the terms `before` and `after` to navigate through pages. `rel="next"` and `rel="prev"` mark the cursor points in the data set and provide a reference for traveling to the page before and after the current page.
- Unlike page-based pagination, the results will not return the last page number in the response.
- Because cursor-based pagination creates a reference point in the data set, it cannot calculate the total number of results.
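As a rough illustration of the page-based scheme above, a client discovers the next page by parsing the `link` response header. This is a minimal sketch with a hand-written parser and an invented header value, not a real GitHub response:

```python
# Minimal sketch of parsing a GitHub-style "link" response header to find
# related pages. The header value below is illustrative, not a real response.
import re

def parse_link_header(value):
    """Return a dict mapping rel ("next", "prev", ...) to its URL."""
    links = {}
    for part in value.split(","):
        match = re.search(r'<([^>]+)>;\s*rel="([^"]+)"', part)
        if match:
            url, rel = match.groups()
            links[rel] = url
    return links

header = ('<https://api.github.com/repositories/1/issues?page=2>; rel="next", '
          '<https://api.github.com/repositories/1/issues?page=5>; rel="last"')
links = parse_link_header(header)
# A client keeps following links["next"] until the header no longer
# contains a rel="next" entry.
```

The same parsing works for cursor-based links; only the query parameters differ (`before`/`after` instead of `page`).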
GitLab (API):
GitLab uses offset pagination as the default for all routes; for some big, resource-heavy routes there is optional keyset pagination.
API Docs | GitLab
GitLab application limits | GitLab
Page-/Offset-based pagination: (flexible parameters, runtime increases when accessing pages with high numbers)
Link headers are returned with each response. They have `rel` set to `prev`, `next`, `first`, or `last` and contain the relevant URL.
The response also includes headers such as `next-page`, `page`, `per-page`, `prev-page`, `request-id`, `runtime`, `total`, and `total-pages`.
For large result sets GitLab limits the maximum requested offset into the set of results (50000); this pagination limit can be deactivated on self-managed installations. For these "limited" routes GitLab alternatively offers keyset-based pagination (see below).
For performance reasons, if a query returns more than 10,000 records, GitLab doesn't return the following headers: `x-total`, `x-total-pages`, `rel="last"` link.
Keyset-based pagination: (runtime is independent of the size of the collection, less flexible)
Requests always include `order_by` and `sort`; optional: `per_page`
Link headers link to the next page with a filter or id that excludes already-retrieved records from the next result set (e.g. `id_after`)
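The keyset idea can be sketched against a throwaway SQLite table. The table, page size, and `fetch_page` helper are invented for illustration; the key point is the `WHERE id > ?` filter (the equivalent of GitLab's `id_after`), which, with an index on `id`, keeps each page query cheap no matter how deep into the collection the client is:

```python
# Keyset pagination sketch: skip already-retrieved rows by id instead of
# using OFFSET. Table contents and page size are made up for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO jobs (id, name) VALUES (?, ?)",
               [(i, f"job-{i}") for i in range(1, 11)])

def fetch_page(last_id, per_page=4):
    """Return the next page of rows with id greater than last_id."""
    return db.execute(
        "SELECT id, name FROM jobs WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, per_page)).fetchall()

page1 = fetch_page(0)             # first page: ids 1..4
page2 = fetch_page(page1[-1][0])  # next page:  ids 5..8
```

An empty page signals the end of the collection, which matches the trade-off noted above: cheap traversal, but no total count and no jumping to an arbitrary page.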
Updated by okurz about 2 years ago
- Priority changed from Low to High
This is actually getting more important due to the "regressions": users cannot get all results anymore, e.g. see https://suse.slack.com/archives/C02CANHLANP/p1668584467225009
Updated by robert.richardson about 2 years ago
- Status changed from Workable to In Progress
Updated by okurz about 2 years ago
- Status changed from In Progress to Workable
- Assignee deleted (robert.richardson)
- Priority changed from High to Low
Actually, priority can be low again with #120315#note-18.
As discussed with him, unassigning Robert.
Updated by kraih about 2 years ago
I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL `COUNT(*) OVER()` feature for this, but including it in DBIx::Class queries would be tricky.
Edit: This is not true for all queries. While going over the various limits, I noticed that many are simple enough that we could detect more results simply by requesting one more than the limit (and then removing it manually from the result set). Pagination is still the proper solution, though.
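Both techniques mentioned in the comment above can be sketched against a throwaway SQLite table (the table and limit are invented; PostgreSQL accepts the same `COUNT(*) OVER ()` syntax):

```python
# (a) The limit+1 trick: fetch one row more than the limit; if we get it,
#     more results exist and the response was truncated.
# (b) A COUNT(*) OVER () window: every returned row carries the untruncated
#     total, because window functions are evaluated before LIMIT is applied.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO results (id) VALUES (?)",
               [(i,) for i in range(1, 8)])  # 7 rows total

limit = 5

# (a) request limit + 1 rows, then drop the probe row before returning
rows = db.execute("SELECT id FROM results ORDER BY id LIMIT ?",
                  (limit + 1,)).fetchall()
truncated = len(rows) > limit
rows = rows[:limit]

# (b) each row comes back with the total count of matching rows
rows_with_total = db.execute(
    "SELECT id, COUNT(*) OVER () AS total FROM results ORDER BY id LIMIT ?",
    (limit,)).fetchall()
```

The limit+1 variant only answers "is there more?", which is enough for AC1; the window-function variant also yields the exact total, at the cost of touching every query.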
Updated by kraih about 2 years ago
- Assignee set to kraih
Assigning it to myself to see if I can dig up some best practices.
Updated by okurz about 2 years ago
kraih wrote:
I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL `COUNT(*) OVER()` feature for this, but including it in DBIx::Class queries would be tricky.
Well, the first would be a small improvement which you can introduce as a first PR, of course. But it's a logical consequence that one would be interested to know if there are more results or not, so we need the second step anyway.
Updated by kraih about 2 years ago
Well, the first would be a small improvement which you can introduce as a first PR, of course. But it's a logical consequence that one would be interested to know if there are more results or not, so we need the second step anyway.
In that case it would probably be easier and more user friendly to just do pagination. Still a larger task though, that should probably be an epic with subtasks for each API endpoint that needs pagination.
Updated by livdywan about 2 years ago
kraih wrote:
I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL `COUNT(*) OVER()` feature for this, but including it in DBIx::Class queries would be tricky.
AC1 says "When data on query or API routes is limited the user is informed that not all results are returned". So no. The goal is to let the user know that data is missing.
Updated by kraih about 2 years ago
cdywan wrote:
AC1 says "When data on query or API routes is limited the user is informed that not all results are returned". So no. The goal is to let the user know that data is missing.
What the AC says is not in question. The ticket has been estimated wrongly if that is really the goal. So I'm asking if we would be ok with a simplified solution that fits the ticket scope.
Updated by okurz about 2 years ago
Hm, then I suggest not continuing with that approach and rather doing proper pagination.
Updated by kraih about 2 years ago
okurz wrote:
Hm, then I suggest not continuing with that approach and rather doing proper pagination.
That would also be my preference.
Updated by kraih about 2 years ago
Not sure what we were thinking when we estimated this ticket. It is quite unclear which API endpoints even need pagination/limit information. All the linked tickets only have partial data. My starting point would probably be the limit settings, and then a search for the individual limits. That is a lot of endpoints, with very different query implementations.
So let's propose some next steps:
- This ticket gets rejected, and a new epic is created for API pagination.
- Then a linked ticket is created for a select API endpoint to work out the specifics of the pagination API (headers etc.), which can then be applied to other API endpoints as well. The obvious first choice would be `GET /api/v1/assets` from #120315.
- Another ticket to make a list of all API endpoints that need pagination and to create more linked tickets.
- Maybe as a follow-up, some additions to the client to take advantage of `Link: next=...` headers.
Tests and the actual API (headers/query params) should be quite similar between the API endpoints; what complicates this is all the different database queries.
Updated by okurz about 2 years ago
kraih wrote:
So let's propose some next steps:
- This ticket gets rejected, and a new epic is created for API pagination.
- Then a linked ticket is created for a select API endpoint to work out the specifics of the pagination API (headers etc.).
Good idea, but please in the opposite order ;)
We also have ticket #119428 about pagination, so maybe just add or reference your suggestions there.
Updated by kraih about 2 years ago
okurz wrote:
Good idea, but please in the opposite order ;)
We also have ticket #119428 about pagination, so maybe just add or reference your suggestions there.
Sounds good.
Updated by kraih about 2 years ago
Btw. I think `/api/v1/job_settings/jobs` with the `job_settings_max_recent_jobs` limit will be the toughest case here. It might be impossible to add pagination or to show that the limit was triggered.
Updated by okurz about 2 years ago
That's correct, but here I think we can take a slightly different approach. It wouldn't be "pagination" but rather shifting the search window. We limit by job id, right? So we should offer to look deeper into the past on request.
Updated by kraih about 2 years ago
okurz wrote:
That's correct, but here I think we can take a slightly different approach. It wouldn't be "pagination" but rather shifting the search window. We limit by job id, right? So we should offer to look deeper into the past on request.
We can move the window, but we cannot tell the user if there are more results to be found, which doesn't really help with the problem you ran into earlier where you only got 2 of 257 results. All you could do in that case is try more windows of 20000 job ids, again and again. Of course it's better than nothing.
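The "shifting search window" idea discussed above can be sketched with a throwaway SQLite table. The `WINDOW` size, table contents, and `search_window` helper are all invented for illustration; the point is that the query is bounded by a range of job ids, and "look deeper into the past" just means re-running it with a lower upper bound:

```python
# Sketch of searching within a sliding window of job ids instead of
# paginating. Data and window size are made up for illustration.
import sqlite3

WINDOW = 20000

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, result TEXT)")
db.executemany("INSERT INTO jobs VALUES (?, ?)",
               [(i, "failed" if i % 9000 == 0 else "passed")
                for i in range(1, 50001)])

def search_window(max_id, result="failed"):
    """Search only jobs with ids in (max_id - WINDOW, max_id]."""
    return db.execute(
        "SELECT id FROM jobs WHERE id > ? AND id <= ? AND result = ?",
        (max_id - WINDOW, max_id, result)).fetchall()

latest = db.execute("SELECT MAX(id) FROM jobs").fetchone()[0]
hits = search_window(latest)            # newest window
older = search_window(latest - WINDOW)  # "look deeper into the past"
```

As noted above, this cannot tell the user whether older windows still contain matches; the client just keeps shifting the window down until it gives up.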
Updated by kraih about 2 years ago
- Related to action #120841: Add pagination for GET /api/v1/assets size:M added
Updated by kraih about 2 years ago
- Status changed from Workable to Blocked
Blocked by #119428, which has all the subtasks for API endpoints that still need pagination support.
Updated by kraih about 2 years ago
- Related to action #119428: Ensure users can get all the data for limited queries, e.g. with pagination added
Updated by okurz over 1 year ago
- Status changed from Blocked to Workable
#119428 might not be completely done, but I think all related queries about "results" are covered. I think you/we can continue. WDYT?