ET Phone Home
For these examples we of course need an API, so I made one with rate-limiting (I'm curious how httpolars compares to a traditional optimised Python thread/process pool API fetching routine).
I spun up 2 GET endpoints on localhost [i.e. my PC] with FastAPI:
- /noop, which returns its input unmodified (a "no-op"): {"value": "x"} → {"value": "x"}
- /factorial, which gives the factorial of the input: {"number": 3} → {"number": 3, "factorial": 6}
from math import factorial
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
@app.get("/noop")
@limiter.limit("4/2 seconds")
async def read_noop(request: Request, value: str | None = None):
    return {"value": value}
@app.get("/factorial")
@limiter.limit("50/minute")
async def read_factorial(request: Request, number: int | None = None):
    return {"number": number, "factorial": factorial(number)}
def run_app():
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)
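To make the "4/2 seconds" limit on /noop concrete, here is a rough standard-library sketch of what such a limit means: at most 4 accepted calls in any 2-second window. The SlidingWindowLimiter class is hypothetical and written only for illustration; it is not how slowapi implements its limits, and the window strategy slowapi actually uses may differ.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical illustration only; not slowapi's implementation."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of accepted calls

    def allow(self, now: float) -> bool:
        # Forget accepted calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False  # a real server would reject the request here


noop_limit = SlidingWindowLimiter(max_calls=4, period=2.0)  # "4/2 seconds"
burst = [noop_limit.allow(now=0.0) for _ in range(5)]  # five calls at once
after_wait = noop_limit.allow(now=2.0)  # one more call, two seconds later
```

In this sketch the fifth call of the burst is refused, and a call made two seconds later is accepted again. Passing the clock in as `now` keeps the example deterministic.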
Set up: extracting a JSONPath to a Polars Series
Polars has functionality to work with JSON as a dtype:
- json_decode(): Parse string values as JSON (takes a dtype or infers it)
- json_path_match(): Extract the first match of JSON string with the provided JSONPath expression.
We're interested in the latter: we'll pass a JSONPath to specify a field/sub-field of the response
to put in a Polars Series, to make a new column in the DataFrame.
Note that it throws errors if invalid JSON strings are encountered, and all return values will be cast to
String regardless of the original value.
The following helper function jsonpath:
- takes response, the name of the Polars column we've put our HTTP response body (JSON string) in, and wraps it in pl.col() if it's not already a Polars expression.
- reads the string as JSON and accesses the path JSONPath (will always give a string value)
import polars as pl
def jsonpath(response: str | pl.Expr, path: str):
    """Accept either the response `Expr` or reference by its column name."""
    response = pl.col(response) if isinstance(response, str) else response
    return response.str.json_path_match(f"$.{path}")
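For intuition about what a "$.field" lookup does to a single response body, here is a standard-library approximation. The jsonpath_plain function is a hypothetical stand-in, not part of Polars or httpolars, and returning None for a missing key is a choice made for this sketch rather than a statement about Polars' behaviour.

```python
import json
from typing import Optional


def jsonpath_plain(body: str, path: str) -> Optional[str]:
    """Hypothetical stdlib stand-in for the helper, for one JSON string."""
    value = json.loads(body)  # raises an error on invalid JSON
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # no match (a choice made for this sketch)
        value = value[key]
    # Strings come back as-is; anything else is re-serialised to text,
    # mirroring the "always a string" behaviour described above.
    return value if isinstance(value, str) else json.dumps(value)


body = '{"number": 3, "factorial": 6}'
extracted = jsonpath_plain(body, "factorial")  # a string, not an int
as_int = int(extracted)  # the caller converts it back when needed
```

The extracted value is the text "6" rather than the integer 6, which is why a conversion step is needed whenever a numeric column is wanted.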
Demo 1: doing nothing
Let's call the /noop endpoint which will respond with our input,
and let's give the letters x, y, z as the input, in 3 separate calls.
url = "http://localhost:8000/noop"
df = pl.DataFrame({"value": ["x", "y", "z"]})
Now let's make a Polars expression (an Expr), just like when we call pl.col() on a column name.
import httpolars as httpl
response = httpl.api_call("value", endpoint=url)
Nothing got sent over the internet yet
No requests are executed by calling httpl.api_call(); it only constructs the Polars expression on the input column pl.col("value").
If we print its repr we see that:
<Expr ['col("value")./home/louis/dev/h…'] at 0x7F9D56B716D0>
This is a statement of intent:
- to pass a value column (not yet defined, only named)...
- ...to the endpoint url (defined, more specifically in the Rust extension as the ApiCallKwargs struct's endpoint field).
That response variable denotes the column where we'll get back the response body, as a Polars
string column (holding a JSON string). Next we put it through our helper function jsonpath:
value = jsonpath(response, "value")
So now we've got a string dtype scalar column, still named value (the name is the same because
Polars just sees this as a transform on the column named value).
Let's look at the data briefly:
>>> df
shape: (3, 1)
┌───────┐
│ value │
│ ---   │
│ str   │
╞═══════╡
│ x     │
│ y     │
│ z     │
└───────┘
>>> df.with_columns(response)
shape: (3, 1)
┌───────────────┐
│ value         │
│ ---           │
│ str           │
╞═══════════════╡
│ {"value":"x"} │
│ {"value":"y"} │
│ {"value":"z"} │
└───────────────┘
>>> df.with_columns(jsonpath(response, "value"))
shape: (3, 1)
┌───────┐
│ value │
│ ---   │
│ str   │
╞═══════╡
│ x     │
│ y     │
│ z     │
└───────┘
Neat! We've successfully done absolutely nothing. Give yourself a pat on the back.
When we stick the resulting Series back on the DataFrame, nothing changes (we simply overwrite the DataFrame column with identical data).
Demo 2: counting permutations
Moving onto our endpoint that computes the factorial of our input,
let's say we're interested in knowing how many arrangements there are of a set of "number" items
(the number of permutations of N things is N!, N factorial):
url = "http://localhost:8000/factorial"
df = pl.DataFrame({"number": [1, 2, 3]})
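As a quick sanity check on the claim that N things have N! permutations, the counts for these three inputs can be brute-forced with the standard library. This is independent of the API and only illustrates the arithmetic; the variable names are made up for this sketch.

```python
from itertools import permutations
from math import factorial

# Count the arrangements of n distinct items by enumerating them all.
counts = {n: len(list(permutations(range(n)))) for n in (1, 2, 3)}

# Each brute-force count should equal n!
matches = all(count == factorial(n) for n, count in counts.items())
```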
This time let's take the response (the JSON string) and 'store it' by renaming it to "response".
This way with_columns will put it alongside rather than overwriting the "number" column.
response = httpl.api_call("number", endpoint=url).alias("response")
Likewise let's also extract the "number" field in the response and rename its column to "supplied", so we can check we got back the same integer we put in.
in_ = jsonpath("response", "number").str.to_integer().alias("supplied")
And to keep organised, let's rename the value in the "factorial" key of the response to "permutations" (so our code is clear on what the value is needed for):
out = jsonpath("response", "factorial").str.to_integer().alias("permutations")
Then let's run the HTTP calls by adding that response expression as a column, extracting the two
fields from it, and then dropping the "response" column (which we refer to here simply by its name):
result = df.with_columns(response).with_columns([in_, out]).drop("response")
The result is correct:
┌────────┬──────────┬──────────────┐
│ number ┆ supplied ┆ permutations │
│ ---    ┆ ---      ┆ ---          │
│ i64    ┆ i64      ┆ i64          │
╞════════╪══════════╪══════════════╡
│ 1      ┆ 1        ┆ 1            │
│ 2      ┆ 2        ┆ 2            │
│ 3      ┆ 3        ┆ 6            │
└────────┴──────────┴──────────────┘
and our FastAPI server recorded just a single GET request for each data point, as expected:
INFO: 127.0.0.1:59410 - "GET /factorial?number=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:59418 - "GET /factorial?number=2 HTTP/1.1" 200 OK
INFO: 127.0.0.1:59430 - "GET /factorial?number=3 HTTP/1.1" 200 OK
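The log shows each value arriving as a query parameter named after its column. For the thread-pool comparison mentioned at the start, a baseline might look roughly like the sketch below, using only the standard library. The names build_url, fetch_body and fetch_column are hypothetical, and the sketch does no retrying or backing off when the rate limit is hit.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(endpoint: str, name: str, value) -> str:
    """Put one column value in the query string, e.g. ?number=3."""
    return f"{endpoint}?{urlencode({name: value})}"


def fetch_body(url: str) -> str:
    """Blocking GET that returns the response body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fetch_column(endpoint, name, values, fetch=fetch_body, max_workers=4):
    """Fetch one response per value, preserving the input order."""
    urls = [build_url(endpoint, name, v) for v in values]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


# Building the URLs needs no server; they match the paths in the log above.
urls = [build_url("http://localhost:8000/factorial", "number", n) for n in (1, 2, 3)]
```

With the FastAPI server running, fetch_column("http://localhost:8000/factorial", "number", [1, 2, 3]) would return the three JSON strings in order. The fetch parameter exists so the routine can be exercised without a network.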