pybind11 With Sklearn Pipelines - Introduction
I’ve used scikit-learn pipelines extensively in adtech over the past few years.
Adtech typically demands sub-20ms latency, which requires a careful balance between network calls to caches and local processing. Doing work locally, such as transforming features for user attributes (location, interests), avoids the network round trip altogether.
For fast local processing, options include Cython, NumPy, and pybind11.
pybind11 lets scikit-learn pipelines call C++ code, which can speed up branch-heavy and loop-heavy operations. Stack-allocated C++ arrays can also be significantly faster than their Python equivalents.
Our goal is to convert:
class PythonPlusOne:
    """A pure Python transformer that adds 1 to specified keys in a dictionary.

    Input format: Simple dictionary like {"a": 1, "b": 2}
    Output format: Same dictionary with specified keys incremented by 1
    """

    def __init__(self, columns):
        """Initialize with list of column names (keys) to transform."""
        self.columns = columns

    def fit(self, X, y=None):
        """Fit method (no-op for this transformer)."""
        return self

    def transform(self, X):
        """Transform the input dictionary by adding 1 to specified columns."""
        if not isinstance(X, dict):
            raise TypeError("Input must be a dictionary")
        result = X.copy()
        for col in self.columns:
            if col in result:
                result[col] = result[col] + 1.0
            else:
                raise KeyError(f"Column '{col}' not found in input dictionary")
        return result

    def fit_transform(self, X, y=None):
        """Convenience method to fit and transform in one step."""
        return self.fit(X, y).transform(X)
into:
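#include <string>
#include <unordered_map>

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // converts Python dict <-> std::unordered_map

namespace py = pybind11;

// Thin scikit-learn-style wrapper. AddOneToKeysImpl (not shown here) supplies
// the constructor and the actual transform logic.
class AddOneToKeys : private AddOneToKeysImpl {
public:
    using AddOneToKeysImpl::AddOneToKeysImpl;

    // No-op fit; returns *this so calls can be chained as in scikit-learn.
    AddOneToKeys& fit(const py::object& X, const py::object& y = py::none()) {
        return *this;
    }

    std::unordered_map<std::string, double> transform(const std::unordered_map<std::string, double>& X) {
        return AddOneToKeysImpl::transform(X);
    }

    std::unordered_map<std::string, double> fit_transform(const std::unordered_map<std::string, double>& X,
                                                          const py::object& y = py::none()) {
        return transform(X);
    }
};

// Helper that adds `scalar` to every value and returns a new map.
inline std::unordered_map<std::string, double> add_scalar_to_dict(
    const std::unordered_map<std::string, double>& input,
    double scalar) {
    std::unordered_map<std::string, double> result;
    for (const auto& [key, value] : input) {
        result[key] = value + scalar;
    }
    return result;
}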
This series collects my notes on using pybind11 with scikit-learn pipelines.
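When porting a transformer like this, it can help to pin down the behaviour both versions must share before comparing speed. The sketch below is one way to do that, not the approach used in this series: `assert_same_transform` is a hypothetical helper, and `ComprehensionPlusOne` is only a pure-Python stand-in for the compiled class, which a real check would import from the built extension module instead.

```python
import math


class LoopPlusOne:
    """Reference implementation: same behaviour as PythonPlusOne."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        result = X.copy()
        for col in self.columns:
            result[col] = result[col] + 1.0  # raises KeyError if col is missing
        return result

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class ComprehensionPlusOne(LoopPlusOne):
    """Stand-in for the compiled transformer (assumption: same interface)."""

    def transform(self, X):
        missing = [col for col in self.columns if col not in X]
        if missing:
            raise KeyError(missing[0])
        return {k: v + 1.0 if k in self.columns else v for k, v in X.items()}


def assert_same_transform(reference, candidate, sample):
    """Check that two transformers agree on one input dictionary."""
    before = dict(sample)
    expected = reference.fit(sample).transform(sample)
    actual = dict(candidate.fit(sample).transform(sample))

    assert expected.keys() == actual.keys()
    for key in expected:
        assert math.isclose(expected[key], actual[key]), key

    # fit_transform must match fit followed by transform.
    assert dict(candidate.fit_transform(sample)) == actual
    # Neither implementation may mutate its input.
    assert sample == before
    return actual


sample = {"a": 5, "b": 10}
result = assert_same_transform(LoopPlusOne(["a"]), ComprehensionPlusOne(["a"]), sample)
print(result)
```

Because the helper only relies on `fit`, `transform`, and `fit_transform`, the same check should apply to any object exposing those three methods, whether it is written in Python or bound from C++.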
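A quick overview of performance:
- AddOneToKeys: A transformer that adds 1.0 to specified keys in a dictionary. For example, if you have {"a": 5, "b": 10} and transform keys ["a"], you get {"a": 6, "b": 10}. For an operation this trivial, the pybind11 version is 3x to 5x slower than pure Python, as the table below shows.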
Dict Size | Python | Pure C++ | Pybind11 | Python vs C++ | Pybind11 vs Python |
---|---|---|---|---|---|
10 | 0.5 μs | 0.5 μs | 1.5 μs | Same speed | 3.0x slower |
100 | 3.6 μs | 4.0 μs | 13.5 μs | 1.1x faster | 3.8x slower |
1000 | 36.9 μs | 43.6 μs | 145.5 μs | 1.2x faster | 3.9x slower |
10000 | 449.7 μs | 676.1 μs | 2253.6 μs | 1.5x faster | 5.0x slower |
- RollingStatistics: Computes rolling mean, standard deviation, and z-score for each key in a dictionary using a sliding window over sorted values. This involves sorting, windowing, and statistical calculations.
- IterativeComputation: Performs 1000 iterations of complex mathematical transformations (sin, cos, sqrt, log, exp) on each dictionary value. This is a CPU-intensive operation designed to test computational performance.
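To make the RollingStatistics description more concrete, here is a pure-Python sketch of one possible reading of it. The details are assumptions rather than the benchmarked implementation: the function name `rolling_statistics`, the trailing window, the population standard deviation, and the z-score of 0.0 when the window has no spread are all illustrative choices.

```python
from statistics import fmean, pstdev


def rolling_statistics(data, window=5):
    """Rolling mean, std, and z-score per key over the values in sorted order.

    Assumption: the window is trailing, so it holds the current value and up
    to `window - 1` smaller values that precede it in sorted order.
    """
    ordered = sorted(data.items(), key=lambda item: item[1])
    values = [value for _, value in ordered]

    stats = {}
    for i, (key, value) in enumerate(ordered):
        chunk = values[max(0, i - window + 1): i + 1]
        mean = fmean(chunk)
        std = pstdev(chunk)  # population standard deviation (assumption)
        zscore = (value - mean) / std if std > 0 else 0.0
        stats[key] = {"mean": mean, "std": std, "zscore": zscore}
    return stats


stats = rolling_statistics({"a": 1.0, "b": 2.0, "c": 3.0}, window=2)
for key, row in stats.items():
    print(key, row)
```

The sort plus the per-key window loop is the kind of branching and looping work that tends to be cheaper in compiled code than in the Python interpreter.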
RollingStatistics Performance (window=5):
Dict Size | Python | Pure C++ | Pybind11 | Pybind11 Speedup | Pure C++ Speedup |
---|---|---|---|---|---|
10 | 14.6 μs | 2.8 μs | 4.7 μs | 3.1x | 5.2x |
100 | 146.2 μs | 33.4 μs | 55.7 μs | 2.6x | 4.4x |
1000 | 1537 μs | 451 μs | 751.7 μs | 2.0x | 3.4x |
IterativeComputation Performance (1000 iterations):
Dict Size | Python | Pure C++ | Pybind11 | Pybind11 Speedup | Pure C++ Speedup |
---|---|---|---|---|---|
10 | 158 μs | 22 μs | 36.4 μs | 4.3x | 7.2x |
100 | 1575 μs | 222 μs | 370.2 μs | 4.3x | 7.1x |
1000 | 15620 μs | 2364 μs | 3939.5 μs | 4.0x | 6.6x |
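Per-call timings like the ones in these tables can be collected with the standard `timeit` module. The harness below is a sketch under assumptions, not the benchmark that produced the numbers above: `add_one`, `time_per_call_us`, and `speedup` are hypothetical names, and it assumes every key in the dictionary is transformed.

```python
import timeit


def add_one(data, columns):
    """Pure-Python baseline: add 1.0 to the given keys of a copied dict."""
    result = data.copy()
    for col in columns:
        result[col] = result[col] + 1.0
    return result


def time_per_call_us(func, *args, number=200, repeat=3):
    """Return the best-of-`repeat` time per call in microseconds."""
    timer = timeit.Timer(lambda: func(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


def speedup(baseline_us, candidate_us):
    """Speedup of the candidate relative to the baseline (baseline / candidate)."""
    return baseline_us / candidate_us


for size in (10, 100, 1000):
    data = {f"key_{i}": float(i) for i in range(size)}
    columns = list(data)  # assumption: all keys are transformed
    elapsed = time_per_call_us(add_one, data, columns)
    print(f"{size:>5} keys: {elapsed:8.1f} us per call")
```

The speedup columns are consistent with the Python time divided by the other implementation's time. For the first RollingStatistics row, `speedup(14.6, 4.7)` rounds to the 3.1x shown for pybind11.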
The series covers four scenarios:
┌─────────────────────────────────────────────────────────────────┐
│                  Which Pattern Should You Use?                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Start Here: What is your main application language?            │
│                                                                 │
│  Python-based Application          C++-based Application        │
│             │                                │                  │
│             ▼                                ▼                  │
│     Need Performance?                Need Python libs?          │
│       Yes /   \ No                     Yes /   \ No             │
│         /       \                        /       \              │
│       ▼           ▼                    ▼           ▼            │
│   Pattern B   Pattern A            Pattern C   Pattern D        │
│   (Py→C++)     (Py→Py)             (C++→Py)    (C++→C++)        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
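The same decision tree can be written as a small function. This is only an illustrative encoding of the diagram; the name `choose_pattern` and its parameters are hypothetical.

```python
def choose_pattern(main_language, needs_performance=False, needs_python_libs=False):
    """Map the two questions in the diagram to one of the four patterns."""
    language = main_language.strip().lower()
    if language == "python":
        # Python application: only reach for C++ when performance demands it.
        return "B (Py→C++)" if needs_performance else "A (Py→Py)"
    if language in ("c++", "cpp"):
        # C++ application: only embed Python when its libraries are needed.
        return "C (C++→Py)" if needs_python_libs else "D (C++→C++)"
    raise ValueError(f"Unsupported main language: {main_language!r}")


print(choose_pattern("Python", needs_performance=True))
print(choose_pattern("C++", needs_python_libs=False))
```

Only one of the two flags matters for a given language, which mirrors the diagram: each branch asks a single question.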