Accelerating psycopg2


Back to an old task: psycopg2, the most popular Postgres connector for Python, could gain a huge performance boost if it were switched to binary mode.


The Postgres backend (TCP) protocol is interfaced through "libpq", the client library the database ships as its API. Psycopg2 is a Python wrapper around that C library, also conforming to the database-agnostic "DBAPI" specification.

All of these "talk" SQL as text, like most similar databases do. In SQL, we mix statements and data in a single string:

SELECT 'foo', 'bar';

'foo' and 'bar' may be data supplied by users: untrusted, binary, or simply long enough to make formatting them into the string expensive.
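As a quick illustration (assuming `cr` is an open psycopg2 cursor), psycopg2's real `cursor.mogrify()` method shows the exact string that classic text mode would ship via PQexec(), with the data escaped and spliced in:

    # Classic text mode: data is escaped and embedded into the SQL text,
    # and the whole string is then sent through PQexec().
    print(cr.mogrify("SELECT %s, %s;", ("foo", "it's")))
    # -> roughly: SELECT 'foo', 'it''s';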

`libpq` supports a few ways to send that query:

  • PQexec(), which accepts a single string (with the parameters already escaped and embedded)
  • PQexecParams(), which takes a list of parameters to substitute for "$1, $2, ..." placeholders

In the second case, we don't need to embed data in the SQL string, which eliminates the need to escape untrusted input and also saves us the trouble of building that potentially large string.

More importantly, `PQexecParams()` optionally supports /binary/ transfer of those parameters, which promises much cheaper serialization in terms of CPU usage: we don't need to turn everything into strings, render numbers as decimal digits, escape non-ASCII values, etc.
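As a rough sketch of the difference (plain Python, not the driver's C path), compare the text and binary forms of a single integer; the 4-byte network-order value is what Postgres expects for an `int4` parameter in binary mode:

    import struct

    value = 123456789
    text_form = str(value)                   # '123456789': 9 characters to format and re-parse
    binary_form = struct.pack('!i', value)   # 4 bytes, network byte order (the int4 binary format)
    print(len(text_form), len(binary_form))  # 9 4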

For a start, we need a scenario to test against:


    # 'cr' is an open psycopg2 cursor; 'options.num_reps' comes from the
    # script's command-line options (the number of repetitions, N).
    ran = []
    for r in range(1, 10):
        ran.extend([r, 0.4 * r, u'αβγ' + str(r)])
    qry = 'SELECT ' + ', '.join(['%s' for x in ran]) + ' FROM generate_series(1,1000);'
    for i in range(0, int(options.num_reps)):
        cr.execute(qry, ran)
        res = cr.fetchall()

That is, we construct a set of "input parameters" (an integer, a float and a Unicode string per loop iteration), put them in a SQL query, and use `generate_series` to have Postgres answer with 1000 rows of that data. Then repeat the query and fetch N times.

Let's run (and time) the script in "compatible" mode, which goes through PQexec(), with N=1000:

got 1000000 x (1, Decimal('0.4'), '\xce\xb1\xce\xb2\xce\xb31', 2, Decimal('0.8')) ...
real 2m53.053s
user 2m42.180s
sys 0m0.767s

Then, activate the "binary cursor", which uses PQexecParams(): parameters are sent in binary from client to server, but tuples are still requested back from the server in text format:

Using a binary cursor. Bintuples: False
got 1000000 x (1, 0.4, '\xce\xb1\xce\xb2\xce\xb31', 2, 0.8) ...
real 0m20.102s
user 0m8.237s
sys 0m0.363s

Huge gain: 162.180 sec of user CPU down to 8.237 sec!
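For reference, the run above was driven by something along these lines; the `binary` flag and the `bintuples` attribute are names from my experimental branch, not stock psycopg2, so treat this strictly as a sketch:

    # Hypothetical API of the experimental branch (not stock psycopg2).
    cr = conn.cursor(binary=True)   # send parameters via PQexecParams(), in binary format
    cr.bintuples = False            # but still ask the server for text-format result tuples
    cr.execute(qry, ran)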

Then, also activate "binary tuples", which asks the server for binary response data:

Using a binary cursor. Bintuples: True
got 1000000 x (1, 0.4, '\xce\xb1\xce\xb2\xce\xb31', 2, 0.8) ...
real 0m19.928s
user 0m8.180s
sys 0m0.367s

Not much difference. Repeat the experiment a few times:

Using a binary cursor. Bintuples: True
got 1000000 x (1, 0.4, '\xce\xb1\xce\xb2\xce\xb31', 2, 0.8) ...

real 0m20.893s
user 0m8.573s
sys 0m0.300s

But then I realized that a bug in the "bintuples" code would only ask for the 1st query in binary, and all subsequent ones in text mode. Fixed that, ran again:

Using a binary cursor. Bintuples: True
trying with: [1, 0.4, u'\u03b1\u03b2\u03b31', 2, 0.8]
real 0m34.722s
user 0m28.137s
sys 0m0.283s

Oops! Something must be wrong here: we added about 20 sec of CPU processing, which makes no sense. Given that we only try to decode two integers and a string (decimals are /not/ implemented yet), there must be something fishy in the implementation.

Valgrind to the rescue. It can profile our library, i.e. annotate the time spent in each C-level function of a compiled binary (unlike "gprof", which would require recompiling everything, including Python itself), and help diagnose the source of the CPU strain.

Indeed, in my first take on decoding the fetched data, I had preferred an implicit `unicode` conversion rather than just passing the stream of 8-bit characters to the caller. This proved to be expensive, and it also broke the comparison with the old mode (which was not converting at all).
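A quick back-of-the-envelope check (plain Python, not the C code path, so only indicative) shows how an implicit UTF-8 decode compares to simply handing the bytes through:

    import timeit

    raw = b'\xce\xb1\xce\xb2\xce\xb31' * 10   # some UTF-8 encoded text
    # decoding every field to unicode vs. just copying the raw bytes along
    print(timeit.timeit(lambda: raw.decode('utf-8'), number=1000000))
    print(timeit.timeit(lambda: raw[:], number=1000000))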

Reverted to plain "str" output, ran again:

Using a binary cursor. Bintuples: True
trying with: [1, 0.4, u'\u03b1\u03b2\u03b31', 2, 0.8]
got 1000000 x (1, 0.1, '\xce\xb1\xce\xb2\xce\xb31', 2, 0.1) ...

real 0m15.193s
user 0m7.223s
sys 0m0.403s

Now, this result supports the initial theory: we save about 1 sec of client-side decoding and, apparently, another 4 sec of server-side encoding effort.

Conclusion

Although these tests are rough and very few data types are implemented so far, the results are in line with the theory. We can save a significant amount of CPU time by using `PQexecParams()`, mainly in the query direction. The response direction is less affected, because the 'libpq' API already delivers each row and each tuple in its own, efficiently allocated container. Note that in borderline cases, where machines are stressed and memory starts to swap, less copying of variables may also mean lower disk I/O.

Confident in these results, we can continue implementing more data types, which may bring even further savings due to their nature: datetimes will be handled in "native" binary form, and array and composite types will need less parsing.

Q/A:

1. Will my query run 10x faster?

No, it won't. Postgres will need as much time querying and crunching your data as before. We only save /most of/ the time we would spend /pushing/ that data through the backend-frontend connection.

2. Is it safer now?

Yes. Although I trust libpq's "PQescapeXX()" functions not to miss any case, transferring the data through a separate channel eliminates SQL injection in principle.
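To illustrate (the `users` table and `user_input` variable are made up for this example): data that travels as a parameter never becomes part of the SQL text, whether it goes out escaped or binary, while string formatting splices it straight in:

    # Dangerous: user data is spliced directly into the SQL text.
    cr.execute("SELECT * FROM users WHERE name = '%s'" % user_input)

    # Safe: the data travels as a parameter (escaped in text mode,
    # shipped out-of-line in binary mode), never as SQL code.
    cr.execute("SELECT * FROM users WHERE name = %s", (user_input,))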

3. Can you just "switch on" the feature and boost all existing code?

Sadly, no. Going binary is not API-compatible with the previous behaviour: in binary mode, we need to explicitly define the data types of the parameters. Tricks like "INSERT INTO logs(log_date) VALUES('2014-10-20')" will no longer work, because "2014-10-20" will be hard-typed as a string instead of automatically cast to a date. Our application code will need to adapt to that.
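A sketch of what that adaptation looks like in practice (hypothetical code, reusing the `logs` table from the example above): either pass a real `date` object, or spell the cast out in the SQL itself:

    import datetime

    # Pass a proper date object, so the parameter is typed as a date...
    cr.execute("INSERT INTO logs(log_date) VALUES (%s)",
               (datetime.date(2014, 10, 20),))

    # ...or keep the string but make the cast explicit in the SQL.
    cr.execute("INSERT INTO logs(log_date) VALUES (%s::date)",
               ('2014-10-20',))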

4. Is it production ready?

Not yet. We need to implement enough data types, and write C converters for all of them, before binary mode can replace text mode in any existing application.

5. Is it worth it, then?

IMHO, yes. Big data is just around the corner. Our applications are getting bigger and more sophisticated every week. Our (stored) data increases every day. Postgres is improving in every aspect, now challenging NoSQL high-performance DBs. Having a sub-optimal Python connector is no longer acceptable.
