pybind11 With Sklearn Pipelines - Core Requirements
Core requirements for building a Python binding:
- Copyable
Original:               Copy:
┌──────────────┐        ┌──────────────┐
│ Copyable     │        │ Copyable     │
│ data: 0x1000 │        │ data: 0x2000 │  Different!
│ size: 5      │        │ size: 5      │
└──────┬───────┘        └──────┬───────┘
       │                       │
       ▼                       ▼
  [1,2,3,4,5]             [1,2,3,4,5]
   at 0x1000               at 0x2000
- Each object owns its own copy of the data, so modifying or destroying one cannot affect the other.
A demonstration class inside CLion:
#include <iostream>

class Copyable {
    int* value;
public:
    Copyable(int v) : value(new int(v)) {
        std::cout << "Constructor: Copyable(" << v << ") at " << this
                  << ", value at " << value << std::endl;
    }
    // Copy constructor - MUST do deep copy
    Copyable(const Copyable& other) : value(new int(*other.value)) {
        std::cout << "Copy constructor: at " << this << " from " << &other
                  << ", new value at " << value << " = " << *value << std::endl;
    }
    // Copy assignment - MUST do deep copy
    Copyable& operator=(const Copyable& other) {
        std::cout << "Copy assignment: to " << this << " from " << &other << std::endl;
        if (this != &other) {
            *value = *other.value; // Or: delete value; value = new int(*other.value);
        }
        return *this;
    }
    // Destructor - MUST clean up
    ~Copyable() {
        std::cout << "Destructor: at " << this << ", deleting value at " << value << std::endl;
        delete value;
    }
    int get() const { return *value; }
    void set(int v) { *value = v; }
    void print() const {
        std::cout << "Copyable at " << this << ": value = " << *value
                  << " (stored at " << value << ")" << std::endl;
    }
};
int main() {
    std::cout << "1. Create object:" << std::endl;
    Copyable s1(42);
    s1.print();
    std::cout << "\n2. Copy construction:" << std::endl;
    Copyable s2(s1);
    s1.print();
    s2.print();
    std::cout << "\n3. Modify copy (should not affect original):" << std::endl;
    s2.set(100);
    s1.print();
    s2.print();
    std::cout << "\n4. Copy assignment:" << std::endl;
    Copyable s3(0);
    s3 = s1;
    s3.print();
    return 0;
}
Running this code prints the following (addresses differ between runs; the destructor messages printed at exit are omitted):
1. Create object:
Constructor: Copyable(42) at 0x7ffc9f231a08, value at 0x62c889ccb6c0
Copyable at 0x7ffc9f231a08: value = 42 (stored at 0x62c889ccb6c0)
2. Copy construction:
Copy constructor: at 0x7ffc9f231a10 from 0x7ffc9f231a08, new value at 0x62c889ccb6e0 = 42
Copyable at 0x7ffc9f231a08: value = 42 (stored at 0x62c889ccb6c0)
Copyable at 0x7ffc9f231a10: value = 42 (stored at 0x62c889ccb6e0)
3. Modify copy (should not affect original):
Copyable at 0x7ffc9f231a08: value = 42 (stored at 0x62c889ccb6c0)
Copyable at 0x7ffc9f231a10: value = 100 (stored at 0x62c889ccb6e0)
4. Copy assignment:
Constructor: Copyable(0) at 0x7ffc9f231a18, value at 0x62c889ccb700
Copy assignment: to 0x7ffc9f231a18 from 0x7ffc9f231a08
Copyable at 0x7ffc9f231a18: value = 42 (stored at 0x62c889ccb700)
The object s1 holds the value 42 at address 0x62c889ccb6c0. The copy s2 stores its value at a different address, 0x62c889ccb6e0. Modifying s2 therefore leaves the value stored in s1 unchanged.
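The deep-copy behavior shown above does not always have to be written by hand. As a minimal sketch, assuming the data can live in a std::vector (CopyableBuffer and copy_is_independent are hypothetical names, not part of the original example), the compiler-generated copy operations already give each object its own buffer, as in the diagram:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class for illustration; not part of the original example.
// std::vector owns its buffer, so the compiler-generated copy constructor
// and copy assignment already perform a deep copy.
class CopyableBuffer {
    std::vector<int> data_;
public:
    explicit CopyableBuffer(std::vector<int> values) : data_(std::move(values)) {}

    std::size_t size() const { return data_.size(); }
    int get(std::size_t i) const { return data_.at(i); }
    void set(std::size_t i, int v) { data_.at(i) = v; }
    const int* address() const { return data_.data(); }
};

// Returns true when the copy owns a separate buffer and a change to the
// copy leaves the original untouched.
inline bool copy_is_independent() {
    CopyableBuffer original(std::vector<int>{1, 2, 3, 4, 5});
    CopyableBuffer copy = original;   // deep copy provided by std::vector
    assert(original.size() == copy.size());
    copy.set(0, 100);
    return original.get(0) == 1
        && copy.get(0) == 100
        && original.address() != copy.address();
}
```

Calling copy_is_independent() from a main function is enough to try it. The hand-written version in the Copyable class is still useful for seeing each step, while this form leaves less room for mistakes in the copy operations.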
- Movable
Move Operation Visualization:
Before Move:                   After Move:
┌────────────────┐             ┌────────────────┐
│ Source         │             │ Source         │
│ buffer: 0x1000 │  ────────>  │ buffer: nullptr│  Invalidated!
│ size: 5        │             │ size: 0        │
│ resource: ✓    │             │ resource: ✗    │
└────────────────┘             └────────────────┘
        ↓ Ownership transferred
┌────────────────┐             ┌────────────────┐
│ Destination    │             │ Destination    │
│ buffer: nullptr│  ────────>  │ buffer: 0x1000 │  Now owns!
│ size: 0        │             │ size: 5        │
│ resource: ✗    │             │ resource: ✓    │
└────────────────┘             └────────────────┘
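Resources such as file descriptors must have exactly one owner, so they are typically held through a move-only handle such as std::unique_ptr.
When the object is moved, the unique_ptr must be transferred correctly to the new object.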
#include <iostream>
#include <memory>
#include <utility>

class MoveOnly {
    std::unique_ptr<int> value; // unique_ptr is inherently move-only
public:
    // Constructor
    MoveOnly(int v) : value(std::make_unique<int>(v)) {
        std::cout << "Constructor: MoveOnly(" << v << ") at " << this
                  << ", value at " << value.get() << std::endl;
    }
    // Move constructor - TRANSFER ownership
    MoveOnly(MoveOnly&& other) noexcept : value(std::move(other.value)) {
        std::cout << "Move constructor: at " << this << " from " << &other
                  << ", took value at " << value.get() << std::endl;
    }
    // Move assignment - TRANSFER ownership
    MoveOnly& operator=(MoveOnly&& other) noexcept {
        std::cout << "Move assignment: to " << this << " from " << &other << std::endl;
        if (this != &other) {
            value = std::move(other.value);
        }
        return *this;
    }
    // DELETE copy operations - Cannot copy!
    MoveOnly(const MoveOnly&) = delete;
    MoveOnly& operator=(const MoveOnly&) = delete;
    // Destructor
    ~MoveOnly() {
        std::cout << "Destructor: at " << this << ", value "
                  << (value ? "exists" : "moved-from") << std::endl;
    }
    // Getters
    int get() const {
        return value ? *value : -1;
    }
    bool hasValue() const {
        return value != nullptr;
    }
    void print() const {
        std::cout << "MoveOnly at " << this << ": ";
        if (value) {
            std::cout << "value = " << *value << " (at " << value.get() << ")";
        } else {
            std::cout << "empty (moved-from)";
        }
        std::cout << std::endl;
    }
};
int main() {
    std::cout << "1. Create object:" << std::endl;
    MoveOnly m1(42);
    m1.print();
    std::cout << "\n2. Move construction:" << std::endl;
    MoveOnly m2(std::move(m1));
    std::cout << "Original after move:" << std::endl;
    m1.print(); // m1 is now empty
    std::cout << "New object:" << std::endl;
    m2.print();
    std::cout << "\n3. Move assignment:" << std::endl;
    MoveOnly m3(100);
    m3 = std::move(m2);
    std::cout << "Source after move:" << std::endl;
    m2.print(); // m2 is now empty
    std::cout << "Target after move:" << std::endl;
    m3.print();
    return 0;
}
Running this prints the following (the destructor messages printed at exit are omitted):
1. Create object:
Constructor: MoveOnly(42) at 0x7ffcf0455b98, value at 0x5ff4c11026c0
MoveOnly at 0x7ffcf0455b98: value = 42 (at 0x5ff4c11026c0)
2. Move construction:
Move constructor: at 0x7ffcf0455ba0 from 0x7ffcf0455b98, took value at 0x5ff4c11026c0
Original after move:
MoveOnly at 0x7ffcf0455b98: empty (moved-from)
New object:
MoveOnly at 0x7ffcf0455ba0: value = 42 (at 0x5ff4c11026c0)
3. Move assignment:
Constructor: MoveOnly(100) at 0x7ffcf0455ba8, value at 0x5ff4c11026e0
Move assignment: to 0x7ffcf0455ba8 from 0x7ffcf0455ba0
Source after move:
MoveOnly at 0x7ffcf0455ba0: empty (moved-from)
Target after move:
MoveOnly at 0x7ffcf0455ba8: value = 42 (at 0x5ff4c11026c0)
After either a move construction or a move assignment, the source object is left empty and the destination object owns the value.
The value 42 stays at 0x5ff4c11026c0 throughout: a move transfers the pointer and never copies the underlying data.
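The file-descriptor case mentioned earlier can be sketched the same way. This is only an illustration under stated assumptions: FakeDescriptor stands in for a real OS resource, and DescriptorCloser, DescriptorOwner, closeCount, and moved_resource_closed_once are hypothetical names. The point is that a unique_ptr with a custom deleter releases a moved resource exactly once, and that the class needs no hand-written move operations:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Stand-in for an OS resource; all names here are hypothetical.
struct FakeDescriptor {
    int fd;
};

// Counts how many times a descriptor was "closed" so the sketch can
// verify that a moved resource is released exactly once.
inline int& closeCount() {
    static int count = 0;
    return count;
}

struct DescriptorCloser {
    void operator()(FakeDescriptor* d) const noexcept {
        ++closeCount();   // a real deleter would close d->fd here
        delete d;
    }
};

// Move operations are generated from the unique_ptr member;
// the copy operations are implicitly deleted.
class DescriptorOwner {
    std::unique_ptr<FakeDescriptor, DescriptorCloser> handle_;
public:
    explicit DescriptorOwner(int fd) : handle_(new FakeDescriptor{fd}) {}
    bool hasValue() const noexcept { return handle_ != nullptr; }
    int fd() const { return handle_ ? handle_->fd : -1; }
};

static_assert(!std::is_copy_constructible<DescriptorOwner>::value,
              "copying must be rejected");
static_assert(std::is_nothrow_move_constructible<DescriptorOwner>::value,
              "moving must be available and noexcept");

// Returns true when moving and then destroying both objects
// closes the resource exactly once.
inline bool moved_resource_closed_once() {
    const int before = closeCount();
    {
        DescriptorOwner source(3);
        DescriptorOwner destination(std::move(source));
        assert(!source.hasValue());      // source is now empty
        assert(destination.fd() == 3);   // destination owns the resource
    }   // both destroyed here; only destination still held the resource
    return closeCount() == before + 1;
}
```

If the class were copyable instead, two objects could end up closing the same descriptor, which is the problem the move-only design avoids.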
- Neither
A class that is neither copyable nor movable can only be passed by reference or by pointer, and that is how it has to be handled in pybind11.
#include <iostream>
#include <mutex>
#include <memory>
class NonCopyable {
    static int instance_count;
    const int id;
    mutable std::mutex mutex_; // Mutex is non-copyable/non-movable
    int value;
public:
    // Constructor
    NonCopyable(int v) : id(++instance_count), value(v) {
        std::cout << "Constructor: NonCopyable #" << id << " with value " << v
                  << " at " << this << std::endl;
    }
    // DELETE all copy and move operations
    NonCopyable(const NonCopyable&) = delete;
    NonCopyable& operator=(const NonCopyable&) = delete;
    NonCopyable(NonCopyable&&) = delete;
    NonCopyable& operator=(NonCopyable&&) = delete;
    // Destructor
    ~NonCopyable() {
        std::cout << "Destructor: NonCopyable #" << id << " at " << this << std::endl;
    }
    // Thread-safe getter
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value;
    }
    // Thread-safe setter
    void set(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        value = v;
        std::cout << "NonCopyable #" << id << " value changed to " << v << std::endl;
    }
    int getId() const { return id; }
    void print() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::cout << "NonCopyable #" << id << " at " << this
                  << ": value = " << value << std::endl;
    }
};
int NonCopyable::instance_count = 0;
// Can only pass by reference or pointer
void useByReference(NonCopyable& nc) {
    std::cout << "\nInside useByReference:" << std::endl;
    nc.print();
    nc.set(999);
}
void useByPointer(NonCopyable* nc) {
    std::cout << "\nInside useByPointer:" << std::endl;
    if (nc) {
        nc->print();
        nc->set(777);
    }
}
// Factory function returns via pointer
std::unique_ptr<NonCopyable> createNonCopyable(int value) {
    std::cout << "\nInside createNonCopyable:" << std::endl;
    return std::make_unique<NonCopyable>(value);
}
int main() {
    std::cout << "1. Create objects:" << std::endl;
    NonCopyable nc1(42);
    NonCopyable nc2(100);
    nc1.print();
    nc2.print();
    // These would cause compilation errors:
    // NonCopyable nc3 = nc1;            // Error: copy constructor deleted
    // NonCopyable nc4(nc1);             // Error: copy constructor deleted
    // NonCopyable nc5(std::move(nc1));  // Error: move constructor deleted
    // nc2 = nc1;                        // Error: copy assignment deleted
    // nc2 = std::move(nc1);             // Error: move assignment deleted
    std::cout << "\n2. Pass by reference:" << std::endl;
    useByReference(nc1);
    std::cout << "After function:" << std::endl;
    nc1.print();
    std::cout << "\n3. Pass by pointer:" << std::endl;
    useByPointer(&nc2);
    std::cout << "After function:" << std::endl;
    nc2.print();
    std::cout << "\n4. Create via factory (heap allocation):" << std::endl;
    auto nc3 = createNonCopyable(200);
    nc3->print();
    std::cout << "\n5. Array of non-copyable objects:" << std::endl;
    // std::vector<NonCopyable> cannot be filled with push_back/emplace_back
    // (both need a movable type), but an array of owning pointers works
    std::unique_ptr<NonCopyable> array[3];
    for (int i = 0; i < 3; ++i) {
        array[i] = std::make_unique<NonCopyable>(i * 10);
        array[i]->print();
    }
    std::cout << "\n6. Cleanup:" << std::endl;
    return 0;
}
Output:
1. Create objects:
Constructor: NonCopyable #1 with value 42 at 0x7ffd823a5b60
Constructor: NonCopyable #2 with value 100 at 0x7ffd823a5ba0
NonCopyable #1 at 0x7ffd823a5b60: value = 42
NonCopyable #2 at 0x7ffd823a5ba0: value = 100
2. Pass by reference:
Inside useByReference:
NonCopyable #1 at 0x7ffd823a5b60: value = 42
NonCopyable #1 value changed to 999
After function:
NonCopyable #1 at 0x7ffd823a5b60: value = 999
3. Pass by pointer:
Inside useByPointer:
NonCopyable #2 at 0x7ffd823a5ba0: value = 100
NonCopyable #2 value changed to 777
After function:
NonCopyable #2 at 0x7ffd823a5ba0: value = 777
4. Create via factory (heap allocation):
Inside createNonCopyable:
Constructor: NonCopyable #3 with value 200 at 0x645b3ee3a6c0
NonCopyable #3 at 0x645b3ee3a6c0: value = 200
5. Array of non-copyable objects:
Constructor: NonCopyable #4 with value 0 at 0x645b3ee3a700
NonCopyable #4 at 0x645b3ee3a700: value = 0
Constructor: NonCopyable #5 with value 10 at 0x645b3ee3a740
NonCopyable #5 at 0x645b3ee3a740: value = 10
Constructor: NonCopyable #6 with value 20 at 0x645b3ee3a780
NonCopyable #6 at 0x645b3ee3a780: value = 20
6. Cleanup:
Destructor: NonCopyable #6 at 0x645b3ee3a780
Destructor: NonCopyable #5 at 0x645b3ee3a740
Destructor: NonCopyable #4 at 0x645b3ee3a700
Destructor: NonCopyable #3 at 0x645b3ee3a6c0
Destructor: NonCopyable #2 at 0x7ffd823a5ba0
Destructor: NonCopyable #1 at 0x7ffd823a5b60
With pybind11, non-copyable objects must always be handled through references or pointers. This example shows that when we pass by reference:
- The memory address stays the same (0x7ffd823a5b60), proving no copy was made.
- Changes inside the function affect the original object (the value changes from 42 to 999).
- The object remains at the same memory location throughout its lifetime.
This is crucial for pybind11 because Python objects behave similarly: they’re always passed by reference, never copied unless explicitly requested. Non-copyable C++ classes naturally fit this model.
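Before writing bindings, it can help to check at compile time which of the three categories (copyable, movable, neither) a class falls into. The following is a minimal sketch using the standard type traits; Pinned, ownershipCategory, and checkCategories are hypothetical names introduced for illustration, with Pinned mirroring the deleted operations of NonCopyable:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <type_traits>

// Hypothetical type mirroring NonCopyable: copy and move are both deleted.
struct Pinned {
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    Pinned& operator=(Pinned&&) = delete;
};

// Copyability is tested first because a copyable type also reports
// itself as move-constructible (its copy constructor accepts rvalues).
template <typename T>
std::string ownershipCategory() {
    if (std::is_copy_constructible<T>::value) return "copyable";
    if (std::is_move_constructible<T>::value) return "move-only";
    return "neither";
}

// Runs the classification on one representative type per category.
inline void checkCategories() {
    assert(ownershipCategory<std::string>() == "copyable");
    assert(ownershipCategory<std::unique_ptr<int>>() == "move-only");
    assert(ownershipCategory<std::mutex>() == "neither");
    assert(ownershipCategory<Pinned>() == "neither");
}
```

Applied to the classes in this article, the same helper would be expected to report "copyable" for Copyable, "move-only" for MoveOnly, and "neither" for NonCopyable.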
A full example with the bindings:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
// The NonCopyable example above; including the .cpp file also pulls in its
// main(), which is simply unused inside the extension module
#include "non_copyable.cpp"
namespace py = pybind11;
PYBIND11_MODULE(non_copyable_module, m) {
    m.doc() = "pybind11 example module for non-copyable objects";
    py::class_<NonCopyable>(m, "NonCopyable")
        .def(py::init<int>())
        .def("get", &NonCopyable::get)
        .def("set", &NonCopyable::set)
        .def("getId", &NonCopyable::getId)
        .def("print", &NonCopyable::print);
    m.def("useByReference", &useByReference, py::arg("nc"));
    m.def("useByPointer", &useByPointer, py::arg("nc"));
    m.def("createNonCopyable", &createNonCopyable);
}
A build script:
#!/usr/bin/env python3
import subprocess
import sys
import sysconfig

import pybind11


def build():
    python_include = sysconfig.get_paths()['include']
    # Get linker flags
    ldflags = sysconfig.get_config_var('LDSHARED')
    if ldflags:
        # Extract just the flags, not the compiler command
        ldflags_list = ldflags.split()[1:]
    else:
        ldflags_list = []
    compile_cmd = [
        'c++',
        '-O3',
        '-Wall',
        '-shared',
        '-std=c++14',
        '-fPIC',
        f'-I{pybind11.get_include()}',
        f'-I{python_include}',
        'non_copyable_bindings.cpp',
        '-o',
        'non_copyable_module.so'
    ] + ldflags_list
    print("Building non_copyable module...")
    print(" ".join(compile_cmd))
    result = subprocess.run(compile_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        print("Build successful!")
    else:
        print("Build failed!")
        print(result.stderr)
        sys.exit(1)


if __name__ == "__main__":
    build()
A test script:
#!/usr/bin/env python3
import non_copyable_module as ncm


def main():
    # This creates the C++ object and Python holds a pointer to it
    print("1. Creating NonCopyable object:")
    obj = ncm.NonCopyable(42)
    obj.print()
    print("\n2. Getting value:")
    print(f"Value from getter: {obj.get()}")
    print("\n3. Setting new value:")
    obj.set(100)
    obj.print()
    # This passes a reference to the C++ function
    print("\n4. Passing to C++ function by reference:")
    ncm.useByReference(obj)  # No copying!
    print("After function call:")
    obj.print()
    print("\n5. Passing to C++ function by pointer:")
    ncm.useByPointer(obj)
    print("After function call:")
    obj.print()
    print("\n6. Creating object via factory function:")
    obj2 = ncm.createNonCopyable(200)
    obj2.print()
    print("\n7. Multiple objects demonstration:")
    obj3 = ncm.NonCopyable(300)
    obj4 = ncm.NonCopyable(400)
    print(f"Object IDs: {obj.getId()}, {obj2.getId()}, {obj3.getId()}, {obj4.getId()}")
    # Python cannot copy these objects
    # This would fail if we tried: obj_copy = copy.copy(obj)
    print("\n8. Deleting objects explicitly:")
    print("Deleting obj3...")
    del obj3  # C++ destructor called here
    print("Deleting obj4...")
    del obj4
    # Python's reference counting manages the C++ object's lifetime
    print("\n9. Script ending, remaining objects will be destroyed...")
    # obj and obj2 will be destroyed when script ends


if __name__ == "__main__":
    main()
The full example would generate:
1. Creating NonCopyable object:
Constructor: NonCopyable #1 with value 42 at 0x337ffc00
NonCopyable #1 at 0x337ffc00: value = 42
2. Getting value:
Value from getter: 42
3. Setting new value:
NonCopyable #1 value changed to 100
NonCopyable #1 at 0x337ffc00: value = 100
4. Passing to C++ function by reference:
Inside useByReference:
NonCopyable #1 at 0x337ffc00: value = 100
NonCopyable #1 value changed to 999
After function call:
NonCopyable #1 at 0x337ffc00: value = 999
5. Passing to C++ function by pointer:
Inside useByPointer:
NonCopyable #1 at 0x337ffc00: value = 999
NonCopyable #1 value changed to 777
After function call:
NonCopyable #1 at 0x337ffc00: value = 777
6. Creating object via factory function:
Inside createNonCopyable:
Constructor: NonCopyable #2 with value 200 at 0x3380a430
NonCopyable #2 at 0x3380a430: value = 200
7. Multiple objects demonstration:
Constructor: NonCopyable #3 with value 300 at 0x3380a490
Constructor: NonCopyable #4 with value 400 at 0x337f8c10
Object IDs: 1, 2, 3, 4
8. Deleting objects explicitly:
Deleting obj3...
Destructor: NonCopyable #3 at 0x3380a490
Deleting obj4...
Destructor: NonCopyable #4 at 0x337f8c10
9. Script ending, remaining objects will be destroyed...
Destructor: NonCopyable #1 at 0x337ffc00
Destructor: NonCopyable #2 at 0x3380a430
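The destructor timing in this output follows from reference counting: the C++ object is destroyed when the last Python reference to it goes away. As a rough analogy only (this is not how pybind11 is implemented, and Tracked and destroyed_only_after_last_reference are hypothetical names), std::shared_ptr shows the same rule in plain C++:

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that records when it is destroyed.
struct Tracked {
    bool* destroyed;
    explicit Tracked(bool* flag) : destroyed(flag) {}
    ~Tracked() { *destroyed = true; }
};

// Models two names bound to one object: the destructor runs only
// when the last reference is dropped.
inline bool destroyed_only_after_last_reference() {
    bool destroyed = false;
    auto obj = std::make_shared<Tracked>(&destroyed);
    auto alias = obj;                  // like `alias = obj` in Python
    assert(obj.use_count() == 2);
    obj.reset();                       // like `del obj`
    const bool aliveWhileAliased = !destroyed;
    alias.reset();                     // like `del alias`
    return aliveWhileAliased && destroyed;
}
```

In the test script each object has a single name, so each del drops the only reference and the destructor message appears immediately.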