Hacker News
Show HN: R.py, a small subset of Python Requests (github.com/gabrielsroka)
3 points by gabrielsroka on Aug 12, 2024 | hide | past | favorite | 2 comments
I revisited the work I started in 2022 on writing a subset of Requests using only the standard library: https://github.com/gabrielsroka/r

Back then, I tried urllib.request (from the standard library, not urllib3), but it lacks what Requests/urllib3 have -- connection pooling and keep-alive [or that's where I thought the magic was] -- so my code ran much slower.

It turns out that urllib.request uses http.client under the hood, but it closes the connection after each request. By using http.client directly, I can keep the connection open [that's where the real magic is]. Now my code runs as fast as Requests/urllib3, but in 5 lines of code instead of 4,000-15,000+.

Moral of the story: RTFM over and over and over again.

  """Fetch users from the Okta API and paginate."""
  
  import http.client
  import json
  import re
  import urllib.parse
  
  # Set these:
  host = 'domain.okta.com'
  token = 'xxx'
  url = '/api/v1/users?' + urllib.parse.urlencode({'filter': 'profile.lastName eq "Doe"'})
  
  headers = {'authorization': 'SSWS ' + token}
  conn = http.client.HTTPSConnection(host)
  while url:
      conn.request('GET', url, headers=headers)
      res = conn.getresponse()
      for user in json.load(res):
          print(user['id'])
      # get_all() returns None when there is no 'link' header, so guard with `or []`.
      links = [link for link in res.headers.get_all('link') or [] if 'rel="next"' in link]
      url = re.search('<https://[^/]+(.+)>', links[0]).group(1) if links else None
https://docs.python.org/3/library/urllib.request.html

https://docs.python.org/3/library/http.client.html
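
A minimal, self-contained sketch of the difference described above: urllib.request opens a fresh connection per request, while a single http.client connection can be reused across requests. It spins up a throwaway local HTTP server so it doesn't hit any real host; the server, handler, and endpoint are made up for illustration:

```python
import http.client
import http.server
import threading
import urllib.request

# Tiny local server so the sketch is self-contained (hypothetical endpoint).
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # HTTP/1.1 enables keep-alive
    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# urllib.request: a new TCP connection for every call.
for _ in range(3):
    with urllib.request.urlopen(f'http://127.0.0.1:{port}/') as res:
        assert res.read() == b'ok'

# http.client: one connection, reused for every call. Each response must be
# fully read before the next request, or the connection gets out of sync.
conn = http.client.HTTPConnection('127.0.0.1', port)
for _ in range(3):
    conn.request('GET', '/')
    res = conn.getresponse()
    assert res.read() == b'ok'
conn.close()
server.shutdown()
```

Over TLS the gap is larger, since urllib.request also repeats the TLS handshake on every request.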



IIRC, somewhere in the Python mailing list archives there's an email about whether to add the HTTP redirect handling, and then SSL support, to urllib or urllib2, or to create urllib2 or httplib.

How does performance compare to HTTPX? Does it support HTTP/1.1 request pipelining? Does it support HTTP/2 or HTTP/3?


Clickable links

https://github.com/gabrielsroka/r

https://docs.python.org/3/library/urllib.request.html

https://docs.python.org/3/library/http.client.html

A more complete example:

    import http.client
    import urllib.parse
    import json as _json
    import re
    import time

    # Set these:
    host = 'example.okta.com'
    token = '...'

    conn = http.client.HTTPSConnection(host)
    headers = {'authorization': 'SSWS ' + token}

    def main():
        res = get('/api/v1/users/me')
        me = res.json
        user_id = me['id']
        print(me['id'])

        start = time.time()

        for user in get_objects('/api/v1/users', filter='profile.lastName eq "Doe"', limit=2):
            print(user['id'])

        end = time.time()

        print(f'{end - start:5.1f} sec (http.client)')

        # Update a user.
        # res = post('/api/v1/users/' + user_id, {'profile': {'title': 'admin'}})
        # me = res.json
        # print(me['profile']['title'], res.headers['x-rate-limit-remaining'])

    def rh(method, url, json=None):
        # Send one request on the shared connection and attach the parsed
        # JSON body to the response object as `res.json`.
        _headers = headers.copy()
        if json:
            body = _json.dumps(json, separators=(',', ':')).encode()
            _headers['Content-Type'] = 'application/json'
        else:
            body = None
        conn.request(method, url, body, _headers)
        res = conn.getresponse()
        if res.status != 204:  # 204 No Content has no body to parse
            res.json = _json.load(res)
        return res

    def get_objects(url, **fields):
        if fields: url += '?' + urllib.parse.urlencode(fields)
        while url:
            res = get(url)
            for o in res.json:
                yield o
            # get_all() returns None when there is no 'link' header, so guard with `or []`.
            links = [link for link in res.headers.get_all('link') or [] if 'rel="next"' in link]
            url = re.search('<https://[^/]+(.+)>', links[0]).group(1) if links else None

    def get(url):
        return rh('GET', url)

    def post(url, json=None):
        return rh('POST', url, json)

    main()
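
For reference, the pagination step parses Okta's RFC 8288 style `link` response header and keeps only the path and query, so the next request can reuse the same connection and host. A standalone sketch of that step, with a made-up sample header value:

```python
import re

# Hypothetical Okta-style pagination header (RFC 8288 web linking).
link_header = '<https://domain.okta.com/api/v1/users?after=0ucx&limit=200>; rel="next"'

# Keep only links marked rel="next", then strip the scheme and host so the
# remainder can be passed straight back to conn.request() on the same host.
links = [link for link in [link_header] if 'rel="next"' in link]
url = re.search('<https://[^/]+(.+)>', links[0]).group(1) if links else None
print(url)  # → /api/v1/users?after=0ucx&limit=200
```

When no `rel="next"` link is present, `url` becomes None and the `while url:` loop ends.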



