Project

General

Profile

action #119431

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #92854: [epic] limit overload of openQA webUI by heavy requests

action #97190: Limit size of initial requests everywhere, e.g. /, /tests, etc., over webUI and API

Inform users e.g. in the webUI if not all results are returned size:M

Added by okurz 3 months ago. Updated about 2 months ago.

Status:
Blocked
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2022-10-26
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Acceptance criteria

  • AC1: When data on query or API routes is limited the user is informed that not all results are returned

Suggestions

  • Wait for outcome of #114421
  • Research industry best-practice for the API design/ UX (use of HTTP headers, additional information in the JSON data...) e.g. GitHub API

Related issues

Related to openQA Project - action #120841: Add pagination for GET /api/v1/assets size:MResolved2022-11-22

Related to openQA Project - action #119428: Ensure users can get all the data for limited queries, e.g. with paginationBlocked2022-11-22

Copied from openQA Project - action #114421: Add a limit where it makes sense after we have it for /tests, query configurable size:MResolved

History

#1 Updated by okurz 3 months ago

  • Copied from action #114421: Add a limit where it makes sense after we have it for /tests, query configurable size:M added

#2 Updated by cdywan 3 months ago

  • Description updated (diff)

#3 Updated by robert.richardson 3 months ago

  • Status changed from New to Blocked
  • Assignee set to robert.richardson

#4 Updated by tinita 3 months ago

  • Description updated (diff)

#5 Updated by robert.richardson 3 months ago

  • Status changed from Blocked to Feedback

#6 Updated by cdywan 3 months ago

  • Subject changed from Inform users e.g. in the webUI if not all results are returned to Inform users e.g. in the webUI if not all results are returned size:M
  • Description updated (diff)
  • Status changed from Feedback to Workable
  • Assignee deleted (robert.richardson)

#7 Updated by robert.richardson 3 months ago

  • Assignee set to robert.richardson

Both github and gitlab avoid listing full result sets through the use of pagination

GitHub (API):

GitHub's API uses two pagination methods: page-based pagination and cursor-based pagination. If the link header includes page, then the operation uses page-based pagination. If the link header includes before and after, then the operation uses cursor-based pagination.
Traversing with pagination - GitHub Docs

  • If GitHub takes more than 10 seconds to process an API request, GitHub will terminate the request and you will receive a timeout response.

Page-/Offset-based pagination
The link header for page-based pagination will tell you information about the prevnextfirst, or lastpages. If you did not request a specific page, then the response will default to the first page and information about the first and previous pages will be omitted.

  • differing default amount of elements per route, for some routes you can pass a per_page parameter (max: 100)

Cursor based pagination
Cursor pagination uses terms before and after in order to navigate through pages. rel="next" and rel="prev" this mark the cursor point in the data set and provides a reference for traveling to the page before and after the current page.

  • Unlike page-based pagination, the results will not return the last page number in the response.
  • Because cursor based pagination creates a reference point in the data set, it cannot calculate the total number of results.

GitLab (API):

Gitlab uses offset pagination for all routes as default, for some big, ressource-heavy routes there is optional keyset pagination.
API Docs | GitLab
GitLab application limits | GitLab

  • Page-/Offset-Based Pagination: (flexible parameters, runtime increases when accessing pages in high numbers)
    Link Headers are returned with each response. They have rel set to prevnextfirst, or last and contain the relevant URL.
    also includes e.g next-page, page, per-page, prev-page, request-id, runtime, total, total-pages

  • For large Resultsets Gilab limits the maximum requested offset into the set of results (50000), this pagination limit can be deactivated in self managed installation. For these “limited” Routes Gitlab alternatively offers keyset-based pagination (see below)
    For performance reasons, if a query returns more than 10,000 records, GitLab doesn’t return the following headers:x-total, x-total-pages, rel="last" link

  • Keyset-based pagination: (runtime is independent of the size of the collection, less flexible)
    Request always includes order_by, sort, optional: per_page
    Link Headers link to next page with filter or id that excludes already-retrieved records from the next ResultSet (e.g. id_after )

#8 Updated by okurz 3 months ago

  • Priority changed from Low to High

This is actually getting more important due to the "regressions" that users can not get all results anymore, e.g. see https://suse.slack.com/archives/C02CANHLANP/p1668584467225009

#9 Updated by robert.richardson 3 months ago

  • Status changed from Workable to In Progress

#10 Updated by okurz 3 months ago

  • Status changed from In Progress to Workable
  • Assignee deleted (robert.richardson)
  • Priority changed from High to Low

Actually priority can be low again with #120315#note-18

As discussed with him unassigning Robert

#11 Updated by kraih 3 months ago

I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL COUNT(*) OVER() feature for this, but including it in DBIx::Class queries would be tricky.

Edit: This is not true for all queries. While going over the various limits, i noticed that many are simple enough that we could detect more results simply by requesting one more than the limit (and then removing it manually from the resultset). Pagination is still the proper solution though.

#12 Updated by kraih 3 months ago

  • Assignee set to kraih

Assigning it to myself to see if i can dig up some best practices.

#13 Updated by okurz 3 months ago

kraih wrote:

I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL COUNT(*) OVER() feature for this, but including it in DBIx::Class queries would be tricky.

Well, the first would be a small improvement which you can introduce as a first PR of course. but it's a logical consequence that one would be interested to know if there would be more results or not so we need the second step anyway.

#14 Updated by kraih 3 months ago

Well, the first would be a small improvement which you can introduce as a first PR of course. but it's a logical consequence that one would be interested to know if there would be more results or not so we need the second step anyway.

In that case it would probably be easier and more user friendly to just do pagination. Still a larger task though, that should probably be an epic with subtasks for each API endpoint that needs pagination.

#15 Updated by cdywan 3 months ago

kraih wrote:

I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL COUNT(*) OVER() feature for this, but including it in DBIx::Class queries would be tricky.

AC1 says When data on query or API routes is limited the user is informed that not all results are returned. So no. The goal is to let the user know that data is missing.

#16 Updated by kraih 3 months ago

cdywan wrote:

AC1 says When data on query or API routes is limited the user is informed that not all results are returned. So no. The goal is to let the user know that data is missing.

What the AC says is not in question. The ticket has been estimated wrong if that is really the goal. So i'm asking if we would be ok with a simplified solution that fits the ticket scope.

#17 Updated by okurz 3 months ago

Hm, then I suggest not continue with that approach and rather do proper pagination

#18 Updated by kraih 3 months ago

okurz wrote:

Hm, then I suggest not continue with that approach and rather do proper pagination

That would also be my preference.

#19 Updated by kraih 3 months ago

Not sure what we were thinking when we estimated this ticket. It is quite unclear which API endpoints even need pagination/limit information. All the linked tickets only have partial data. My starting point would probably be the limit settings, and then a search for the individual limits. That is a lot of endpoints, with very different query implementations.

So lets propose some next steps:

  • This ticket is getting rejected, and a new epic created for API pagination.
  • Then a linked ticket is created for a select API endpoint to work out the specifics of the pagination API (headers etc.). Which can then be applied to other API endpoints as well. The obvious first choice would be GET /api/v1/assets from #120315.
  • Another ticket to make a list of all API endpoints that need pagination and to create more linked tickets.
  • Maybe as a followup some additions to the client to take advantage of Link: next=... headers.

Tests and actual API (headers/query params) should be quite similar between the API endpoints, what complicates this is all the different database queries.

#20 Updated by kraih 3 months ago

  • Status changed from Workable to In Progress

#21 Updated by okurz 3 months ago

kraih wrote:

So lets propose some next steps:

  • This ticket is getting rejected, and a new epic created for API pagination.
  • Then a linked ticket is created for a select API endpoint to work out the specifics of the pagination API (headers etc.).

Good idea but please in opposite order ;)

We also have the ticket
#119428 about pagination so maybe just add or reference your suggestions there

#22 Updated by kraih 3 months ago

okurz wrote:

Good idea but please in opposite order ;)

We also have the ticket
#119428 about pagination so maybe just add or reference your suggestions there

Sounds good.

#23 Updated by kraih 3 months ago

Btw. I think /api/v1/job_settings/jobs with the job_settings_max_recent_jobs limit will be the toughest case here. It might be impossible to add pagination or to show the limit triggered.

#24 Updated by okurz 3 months ago

That's correct but here I think we can go a slightly different approach. It wouldn't be "pagination" but rather shifting the search window. We limit by job id, right? So we should offer to look deeper into the past on request

#25 Updated by kraih 3 months ago

okurz wrote:

That's correct but here I think we can go a slightly different approach. It wouldn't be "pagination" but rather shifting the search window. We limit by job id, right? So we should offer to look deeper into the past on request

We can move the window, but we cannot tell the user if there are more results to be found. Which doesn't really help with the problem you ran into earlier where you only got 2 of 257 results. All you could do in that case is try more windows of 20000 job ids, again and again. Of course it's better than nothing.

#26 Updated by kraih 3 months ago

  • Related to action #120841: Add pagination for GET /api/v1/assets size:M added

#27 Updated by kraih 3 months ago

  • Status changed from In Progress to Blocked

Blocked by #120841.

#28 Updated by okurz about 2 months ago

  • Status changed from Blocked to Workable

blocker resolved.

#29 Updated by kraih about 2 months ago

  • Status changed from Workable to Blocked

Blocked by #119428. Which has all the subtasks for API endpoints that still need pagination support.

#30 Updated by kraih about 2 months ago

  • Related to action #119428: Ensure users can get all the data for limited queries, e.g. with pagination added

Also available in: Atom PDF