Commit d118865

handle the pagination Link headers for _list requests
The patchwork REST API defaults to sending a maximum of 30 items for API requests which return a list. If the API request has more than 30 items, then pwclient will only ever see the first page, confusing users who expect to see the entire set of items.

To handle this, the API includes 'Link' headers in the response which indicate whether there is more data and, if so, what URL the data is available at. Add a method to extract the page number query parameter from the Link header URL if it exists. Use this to recursively call _list() until there is no next page left to obtain.

Implement the page number extraction as a new static method which deals with the complexity of analyzing the Link headers. We check only the first Link header, and only the first 'rel="next"' portion of it. Split the discovered URL using urllib.parse.urlparse to find the query section and locate the page=<value> component. This approach avoids adding a new dependency (such as the requests library) and should work for valid Link headers provided by patchwork. If at any point we fail to find the expected data, link parsing stops and we return the set of items found so far.

This should fix pwclient to properly handle multiple pages of data. It will likely cause some invocations of pwclient to slow down, as they must now query every available page; this is preferable to not returning the available data. A future improvement would be to extend -n and -N to work with pages so that we avoid downloading unnecessary data.

Signed-off-by: Jacob Keller <[email protected]>
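The Link-header walk described above can be sketched as a standalone function. This is a hedged illustration, not pwclient's actual code: the name `get_next_page` and the example host are hypothetical, and it assumes headers arrive as (name, value) tuples as the commit message describes.

```python
import urllib.parse

def get_next_page(headers):
    """Extract the next page number from a 'Link' header, or return None.

    `headers` is an iterable of (name, value) tuples. Only the first Link
    header and the first comma-separated link ending in rel="next" are
    considered, mirroring the approach described in the commit message.
    """
    link_header = next((value for name, value in headers if name == 'Link'), None)
    if link_header is None:
        return None

    rel = '; rel="next"'
    # Find the first link whose relation is "next" and drop the rel suffix.
    url = next((link.strip()[:-len(rel)]
                for link in link_header.split(',')
                if link.strip().endswith(rel)), None)
    if url is None or not (url.startswith('<') and url.endswith('>')):
        return None

    # Strip the <> wrapper, then pull the page=<value> query parameter.
    query = urllib.parse.urlparse(url[1:-1]).query
    page = next((p for p in query.split('&') if p.startswith('page=')), None)
    return int(page[5:]) if page is not None else None

# Example with a hypothetical patchwork host advertising a next page:
headers = [('Link', '<https://patchwork.example.com/api/patches/?page=2>; rel="next"')]
print(get_next_page(headers))  # → 2
```

Any malformed or missing piece (no Link header, no rel="next", no page= parameter) falls through to None, which is what lets the caller stop paginating cleanly.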
1 parent 4d2f914

File tree

1 file changed: pwclient/api.py (+39 −2 lines)
@@ -581,6 +581,28 @@ def _detail(
         data, _ = self._get(url)
         return json.loads(data)
 
+    @staticmethod
+    def _get_next_page(headers):
+        link_header = next((data for header, data in headers if header == 'Link'), None)
+        if link_header is None:
+            return None
+
+        rel = '; rel="next"'
+
+        url = next((l[:-len(rel)] for l in link_header.split(',') if l.endswith(rel)), None)
+        if url is None:
+            return None
+
+        if not (url.startswith('<') and url.endswith('>')):
+            return None
+
+        parsed_link = urllib.parse.urlparse(url[1:-1])
+        page = next((x for x in parsed_link.query.split('&') if x.startswith('page=')), None)
+        if page is None:
+            return None
+
+        return int(page[5:])
+
     def _list(
         self,
         resource_type,
@@ -594,8 +616,23 @@ def _list(
             url = f'{url}{resource_id}/{subresource_type}/'
         if params:
             url = f'{url}?{urllib.parse.urlencode(params)}'
-        data, _ = self._get(url)
-        return json.loads(data)
+        data, headers = self._get(url)
+
+        items = json.loads(data)
+
+        page_nr = self._get_next_page(headers)
+        if page_nr is None:
+            return items
+
+        if params is None:
+            params = {}
+        params['page'] = page_nr
+
+        items += self._list(resource_type, params,
+                            resource_id=resource_id,
+                            subresource_type=subresource_type)
+
+        return items
 
     # project