action #119431
coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #92854: [epic] limit overload of openQA webUI by heavy requests
action #97190: Limit size of initial requests everywhere, e.g. /, /tests, etc., over webUI and API
Inform users e.g. in the webUI if not all results are returned size:M
0%
Description
Acceptance criteria
- AC1: When data on query or API routes is limited the user is informed that not all results are returned
Suggestions
- Wait for outcome of #114421
- Research industry best-practice for the API design/ UX (use of HTTP headers, additional information in the JSON data...) e.g. GitHub API
Updated by okurz about 2 years ago
- Copied from action #114421: Add a limit where it makes sense after we have it for /tests, query configurable size:M added
Updated by robert.richardson about 2 years ago
- Status changed from New to Blocked
- Assignee set to robert.richardson
Updated by robert.richardson about 2 years ago
- Status changed from Blocked to Feedback
PR for https://progress.opensuse.org/issues/114421 was merged
Updated by livdywan about 2 years ago
- Subject changed from Inform users e.g. in the webUI if not all results are returned to Inform users e.g. in the webUI if not all results are returned size:M
- Description updated (diff)
- Status changed from Feedback to Workable
- Assignee deleted (robert.richardson)
Updated by robert.richardson about 2 years ago
- Assignee set to robert.richardson
Both GitHub and GitLab avoid listing full result sets through the use of pagination.
GitHub (API):
GitHub's API uses two pagination methods: page-based pagination and cursor-based pagination. If the link header includes `page`, then the operation uses page-based pagination. If the link header includes `before` and `after`, then the operation uses cursor-based pagination.
Traversing with pagination - GitHub Docs
- If GitHub takes more than 10 seconds to process an API request, GitHub will terminate the request and you will receive a timeout response.
Page-/Offset-based pagination
The link header for page-based pagination will tell you information about the `prev`, `next`, `first`, or `last` pages. If you did not request a specific page, then the response will default to the first page, and information about the first and previous pages will be omitted.
- The default number of elements differs per route; for some routes you can pass a `per_page` parameter (max: 100)
Cursor-based pagination
Cursor pagination uses the terms `before` and `after` to navigate through pages. `rel="next"` and `rel="prev"` mark the cursor points in the data set and provide a reference for traveling to the page before and after the current page.
- Unlike page-based pagination, the results will not return the last page number in the response.
- Because cursor-based pagination creates a reference point in the data set, it cannot calculate the total number of results.
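As a rough illustration of the page-based scheme above, a client discovers the next page by parsing the `link` response header. This is a minimal sketch with a hand-written parser and an invented header value, not a real GitHub response:

```python
# Minimal sketch of parsing a GitHub-style "link" response header to find
# related pages. The header value below is illustrative, not a real response.
import re

def parse_link_header(value):
    """Return a dict mapping rel ("next", "prev", ...) to its URL."""
    links = {}
    for part in value.split(","):
        match = re.search(r'<([^>]+)>;\s*rel="([^"]+)"', part)
        if match:
            url, rel = match.groups()
            links[rel] = url
    return links

header = ('<https://api.github.com/repositories/1/issues?page=2>; rel="next", '
          '<https://api.github.com/repositories/1/issues?page=5>; rel="last"')
links = parse_link_header(header)
# A client keeps following links["next"] until the header no longer
# contains a rel="next" entry.
```

The same parsing works for cursor-based links; only the query parameters differ (`before`/`after` instead of `page`).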
GitLab (API):
GitLab uses offset pagination as the default for all routes; for some big, resource-heavy routes there is optional keyset pagination.
API Docs | GitLab
GitLab application limits | GitLab
Page-/Offset-based pagination: (flexible parameters, runtime increases when accessing pages with high numbers)
Link headers are returned with each response. They have `rel` set to `prev`, `next`, `first`, or `last` and contain the relevant URL.
The response also includes headers such as `next-page`, `page`, `per-page`, `prev-page`, `request-id`, `runtime`, `total`, and `total-pages`.
For large result sets GitLab limits the maximum requested offset into the set of results (50000); this pagination limit can be deactivated on self-managed installations. For these "limited" routes GitLab alternatively offers keyset-based pagination (see below).
For performance reasons, if a query returns more than 10,000 records, GitLab doesn't return the following headers: `x-total`, `x-total-pages`, `rel="last"` link.
Keyset-based pagination: (runtime is independent of the size of the collection, less flexible)
Requests always include `order_by` and `sort`; optional: `per_page`
Link headers link to the next page with a filter or id that excludes already-retrieved records from the next result set (e.g. `id_after`)
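The keyset idea can be sketched against a throwaway SQLite table. The table, page size, and `fetch_page` helper are invented for illustration; the key point is the `WHERE id > ?` filter (the equivalent of GitLab's `id_after`), which, with an index on `id`, keeps each page query cheap no matter how deep into the collection the client is:

```python
# Keyset pagination sketch: skip already-retrieved rows by id instead of
# using OFFSET. Table contents and page size are made up for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO jobs (id, name) VALUES (?, ?)",
               [(i, f"job-{i}") for i in range(1, 11)])

def fetch_page(last_id, per_page=4):
    """Return the next page of rows with id greater than last_id."""
    return db.execute(
        "SELECT id, name FROM jobs WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, per_page)).fetchall()

page1 = fetch_page(0)             # first page: ids 1..4
page2 = fetch_page(page1[-1][0])  # next page:  ids 5..8
```

An empty page signals the end of the collection, which matches the trade-off noted above: cheap traversal, but no total count and no jumping to an arbitrary page.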
Updated by okurz about 2 years ago
- Priority changed from Low to High
This is actually getting more important due to the "regressions": users cannot get all results anymore, e.g. see https://suse.slack.com/archives/C02CANHLANP/p1668584467225009
Updated by robert.richardson about 2 years ago
- Status changed from Workable to In Progress
Updated by okurz about 2 years ago
- Status changed from In Progress to Workable
- Assignee deleted (robert.richardson)
- Priority changed from High to Low
Actually, priority can be low again with #120315#note-18.
As discussed with him, unassigning Robert.
Updated by kraih about 2 years ago
I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL `COUNT(*) OVER()` feature for this, but including it in DBIx::Class queries would be tricky.
Edit: This is not true for all queries. While going over the various limits, I noticed that many are simple enough that we could detect more results simply by requesting one more than the limit (and then removing it manually from the result set). Pagination is still the proper solution, though.
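Both techniques mentioned in the comment above can be sketched against a throwaway SQLite table (the table and limit are invented; PostgreSQL accepts the same `COUNT(*) OVER ()` syntax):

```python
# (a) The limit+1 trick: fetch one row more than the limit; if we get it,
#     more results exist and the response was truncated.
# (b) A COUNT(*) OVER () window: every returned row carries the untruncated
#     total, because window functions are evaluated before LIMIT is applied.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO results (id) VALUES (?)",
               [(i,) for i in range(1, 8)])  # 7 rows total

limit = 5

# (a) request limit + 1 rows, then drop the probe row before returning
rows = db.execute("SELECT id FROM results ORDER BY id LIMIT ?",
                  (limit + 1,)).fetchall()
truncated = len(rows) > limit
rows = rows[:limit]

# (b) each row comes back with the total count of matching rows
rows_with_total = db.execute(
    "SELECT id, COUNT(*) OVER () AS total FROM results ORDER BY id LIMIT ?",
    (limit,)).fetchall()
```

The limit+1 variant only answers "is there more?", which is enough for AC1; the window-function variant also yields the exact total, at the cost of touching every query.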
Updated by kraih about 2 years ago
- Assignee set to kraih
Assigning it to myself to see if I can dig up some best practices.
Updated by okurz about 2 years ago
kraih wrote:
I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL `COUNT(*) OVER()` feature for this, but including it in DBIx::Class queries would be tricky.
Well, the first would be a small improvement which you can introduce as a first PR, of course. But it's a logical consequence that one would be interested to know if there are more results or not, so we need the second step anyway.
Updated by kraih about 2 years ago
Well, the first would be a small improvement which you can introduce as a first PR, of course. But it's a logical consequence that one would be interested to know if there are more results or not, so we need the second step anyway.
In that case it would probably be easier and more user friendly to just do pagination. Still a larger task though, that should probably be an epic with subtasks for each API endpoint that needs pagination.
Updated by livdywan about 2 years ago
kraih wrote:
I think one important question here needs to be answered before this can move forward. Are we ok with simply informing the user that the maximum number of results is returned? Or do we need to know if there really are more results? Because the latter is really hard and will require most SQL queries to be modified in significant ways. I tend to use the PostgreSQL `COUNT(*) OVER()` feature for this, but including it in DBIx::Class queries would be tricky.
AC1 says "When data on query or API routes is limited the user is informed that not all results are returned". So no. The goal is to let the user know that data is missing.
Updated by kraih about 2 years ago
cdywan wrote:
AC1 says "When data on query or API routes is limited the user is informed that not all results are returned". So no. The goal is to let the user know that data is missing.
What the AC says is not in question. The ticket has been estimated wrongly if that is really the goal. So I'm asking if we would be ok with a simplified solution that fits the ticket scope.
Updated by okurz about 2 years ago
Hm, then I suggest not continuing with that approach and rather doing proper pagination.
Updated by kraih about 2 years ago
okurz wrote:
Hm, then I suggest not continuing with that approach and rather doing proper pagination.
That would also be my preference.
Updated by kraih about 2 years ago
Not sure what we were thinking when we estimated this ticket. It is quite unclear which API endpoints even need pagination/limit information. All the linked tickets only have partial data. My starting point would probably be the limit settings, and then a search for the individual limits. That is a lot of endpoints, with very different query implementations.
So let's propose some next steps:
- This ticket gets rejected, and a new epic is created for API pagination.
- Then a linked ticket is created for a select API endpoint to work out the specifics of the pagination API (headers etc.), which can then be applied to other API endpoints as well. The obvious first choice would be `GET /api/v1/assets` from #120315.
- Another ticket to make a list of all API endpoints that need pagination and to create more linked tickets.
- Maybe as a follow-up, some additions to the client to take advantage of `Link: next=...` headers.
Tests and the actual API (headers/query params) should be quite similar between the API endpoints; what complicates this is all the different database queries.
Updated by okurz about 2 years ago
kraih wrote:
So let's propose some next steps:
- This ticket gets rejected, and a new epic is created for API pagination.
- Then a linked ticket is created for a select API endpoint to work out the specifics of the pagination API (headers etc.).
Good idea, but please in the opposite order ;)
We also have ticket #119428 about pagination, so maybe just add or reference your suggestions there.
Updated by kraih about 2 years ago
okurz wrote:
Good idea, but please in the opposite order ;)
We also have ticket #119428 about pagination, so maybe just add or reference your suggestions there.
Sounds good.
Updated by kraih about 2 years ago
Btw. I think `/api/v1/job_settings/jobs` with the `job_settings_max_recent_jobs` limit will be the toughest case here. It might be impossible to add pagination or to show that the limit was triggered.
Updated by okurz about 2 years ago
That's correct, but here I think we can take a slightly different approach. It wouldn't be "pagination" but rather shifting the search window. We limit by job id, right? So we should offer to look deeper into the past on request.
Updated by kraih about 2 years ago
okurz wrote:
That's correct, but here I think we can take a slightly different approach. It wouldn't be "pagination" but rather shifting the search window. We limit by job id, right? So we should offer to look deeper into the past on request.
We can move the window, but we cannot tell the user if there are more results to be found, which doesn't really help with the problem you ran into earlier where you only got 2 of 257 results. All you could do in that case is try more windows of 20000 job ids, again and again. Of course it's better than nothing.
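The "shifting search window" idea discussed above can be sketched with a throwaway SQLite table. The `WINDOW` size, table contents, and `search_window` helper are all invented for illustration; the point is that the query is bounded by a range of job ids, and "look deeper into the past" just means re-running it with a lower upper bound:

```python
# Sketch of searching within a sliding window of job ids instead of
# paginating. Data and window size are made up for illustration.
import sqlite3

WINDOW = 20000

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, result TEXT)")
db.executemany("INSERT INTO jobs VALUES (?, ?)",
               [(i, "failed" if i % 9000 == 0 else "passed")
                for i in range(1, 50001)])

def search_window(max_id, result="failed"):
    """Search only jobs with ids in (max_id - WINDOW, max_id]."""
    return db.execute(
        "SELECT id FROM jobs WHERE id > ? AND id <= ? AND result = ?",
        (max_id - WINDOW, max_id, result)).fetchall()

latest = db.execute("SELECT MAX(id) FROM jobs").fetchone()[0]
hits = search_window(latest)            # newest window
older = search_window(latest - WINDOW)  # "look deeper into the past"
```

As noted above, this cannot tell the user whether older windows still contain matches; the client just keeps shifting the window down until it gives up.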
Updated by kraih about 2 years ago
- Related to action #120841: Add pagination for GET /api/v1/assets size:M added
Updated by kraih about 2 years ago
- Status changed from Workable to Blocked
Blocked by #119428, which has all the subtasks for API endpoints that still need pagination support.
Updated by kraih about 2 years ago
- Related to action #119428: Ensure users can get all the data for limited queries, e.g. with pagination added
Updated by okurz over 1 year ago
- Status changed from Blocked to Workable
#119428 might not be completely done, but I think all related queries about "results" are covered. I think you/we can continue. WDYT?