gRPC
20/08/2020
A project I was working on required that the client send ~300,000 ints to a server. This RPC was invoked 25 times with different values, which meant the network cost quickly became a bottleneck for a resource-constrained client. It was infeasible to expect the client to incur a 300K * 25 * 4 bytes ≈ 28.6 MB cost.
The key property that I ended up exploiting was that we weren't concerned about the ordering; we just wanted this set of ints to go through. At a high level, if we sort and delta encode the ints, the resulting diff elements will be smaller than the original elements. Our bane, having to send ~300K ints, became a boon: the more elements you have to pack within 2^32, the smaller the difference between consecutive elements, which allows more compact packing.
Concretely, assuming a uniform distribution in the original list, each diff element will be 2^32 / 300K ≈ 14316 on average, which fits in 14 bits. Of course, we can't use 14 bits as a hard upper bound to fit our elements, and will probably need to build a packer which can exploit the expected lower bit cost per element but still encode elements that require more bits than average.
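To make the delta encoding step concrete, here's a minimal sketch (the helper names and the uniform test data are my own, not from the project):

```python
import random

def delta_encode(values):
    """Sort, then replace each element with its gap from the previous one."""
    ordered = sorted(values)
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]

def delta_decode(deltas):
    """Rebuild the sorted set by running a cumulative sum over the gaps."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

data = random.sample(range(2**32), 300_000)  # ~300K distinct uint32s
deltas = delta_encode(data)
assert delta_decode(deltas) == sorted(data)
print(sum(deltas) / len(deltas))  # gaps average around 2^32 / 300K ≈ 14316
```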
Preliminary results showed that we probably don't need to handwrite this tricky piece of code ourselves: simple gRPC with gzip enabled gave us the benefits we were looking for.
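For reference, if the client is written in Python, enabling gzip with grpcio is a one-liner at channel creation. The service, messages, and generated modules below are placeholders, not the actual proto from the project:

```python
import grpc

import numbers_pb2        # hypothetical generated module
import numbers_pb2_grpc   # hypothetical generated module

# Compress every request sent over this channel with gzip.
channel = grpc.insecure_channel(
    "localhost:50051",
    compression=grpc.Compression.Gzip,
)
stub = numbers_pb2_grpc.NumbersStub(channel)
stub.UploadInts(numbers_pb2.IntList(values=[3, 7, 12]))  # stand-in payload
```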
Running a quick-and-dirty experiment lets us ascertain that this result isn't a fluke. Since there is no clear way to approximate a realistic workload for all systems, I tested with two common scenarios: the list of 300K ints is sampled from either a uniform distribution or a normal distribution. Across these two scenarios, I evaluated the upload cost with and without delta encoding.
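A rough way to reproduce the flavour of this experiment without a server is to gzip fixed-width packed ints as a proxy for the upload cost; the sketch below does just that (it's only a proxy, not the actual gRPC measurement):

```python
import gzip
import random
import struct

def delta_encode(values):
    ordered = sorted(values)
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]

def gzipped_size(ints):
    # Pack as raw 4-byte little-endian uints, then gzip: a stand-in for the
    # gzip'ed payload going over the wire.
    return len(gzip.compress(struct.pack(f"<{len(ints)}I", *ints)))

N = 300_000
uniform = random.sample(range(2**32), N)
normal = [min(2**32 - 1, max(0, int(random.gauss(2**31, 2**29)))) for _ in range(N)]

for name, data in (("uniform", uniform), ("normal", normal)):
    print(name, gzipped_size(data), gzipped_size(delta_encode(data)))
```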
TL;DR: If you want to send a set of integers (or a list of small-valued integers) over gRPC, you can easily save bandwidth by delta encoding your dataset, with negligible compute overhead.
Pretty late. My suggestion: if you're sending more than 10K ints, the benefits quickly become apparent.
Across the trials of the previous experiment, the system config remained constant; specifically, the client's requests were always gzip'ed in flight. The improvement could be due to either gRPC or gzip, and it'd be better to reduce the number of variables in our experiment. Since it's easy to toggle compression usage, I tested the data cost with and without compression.
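In grpcio this toggle is a per-call keyword, so the two configurations differ by a single argument (same placeholder stub and messages as in the earlier sketch):

```python
import grpc

import numbers_pb2        # hypothetical generated module, as before
import numbers_pb2_grpc   # hypothetical generated module, as before

channel = grpc.insecure_channel("localhost:50051")  # no channel-level default
stub = numbers_pb2_grpc.NumbersStub(channel)
request = numbers_pb2.IntList(values=[3, 7, 12])    # stand-in payload

# Same RPC twice, once per compression setting, to isolate gzip's contribution.
stub.UploadInts(request, compression=grpc.Compression.NoCompression)
stub.UploadInts(request, compression=grpc.Compression.Gzip)
```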
The gap between an uncompressed raw stream and an uncompressed delta-encoded stream makes it clear that while gzip is doing useful work, it's not the one giving us the bulk of the savings. What we're searching for now is what magic gRPC, or rather protobuf, employs.
A quick search yields the answer: Protobuf encodes ints as varints, which are serialized in a variable number of bytes. It uses LEB128, which has an overhead of 1 bit per byte. Smaller-valued ints take fewer bytes, so a delta-encoded set occupies less space — this is exactly what we were looking for.
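In other words, each varint byte carries 7 payload bits plus a continuation bit, so a non-negative value costs about ceil(bit_length / 7) bytes. A tiny length calculator (my own approximation, not protobuf's API) shows why small deltas win:

```python
def varint_len(value: int) -> int:
    """Bytes needed to encode a non-negative int as a base-128 varint."""
    return max(1, -(-value.bit_length() // 7))  # ceil(bit_length / 7), minimum 1 byte

print(varint_len(14316))      # 2 bytes: a typical delta from the earlier estimate
print(varint_len(2**31))      # 5 bytes: a large raw uint32
print(varint_len(2**28 - 1))  # 4 bytes: the largest value that still fits in 4
print(varint_len(2**28))      # 5 bytes: past 2^28 you pay the extra byte
```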
We do stand on the shoulders of giants.
/fin
Aside: On the flip side, that means it takes 5 bytes to serialize any int of 2^(32-4) = 2^28 or more. It'd be good to keep in mind to use fixed32 instead of int32 if your dataset contains predominantly large-valued ints (no, I won't run experiments for that).