Proposal:
Motivation
The pickle fallback path in _PyObject_GetXIData() does unnecessary work on successful transfers of types that are not in the XIData registry, creating and discarding a NotShareableError that user code never sees.
This is a follow-up to gh-148072 / GH-148125, which cached pickle.dumps and pickle.loads per interpreter. That change removed module lookup overhead, but there is still avoidable cost in the fallback path itself.
This extra exception work adds measurable overhead to cross-interpreter transfers that rely on pickle fallback.
Approach
Check the XIData registry first with lookup_getdata() before calling _get_xidata().
- If the type is registered, keep using the existing path unchanged.
- If the type is not registered, skip _get_xidata() and go directly to the function/pickle fallback chain.
On total failure, raise the same NotShareableError as the existing code path, so external behavior stays the same.
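The control flow described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the C implementation: `REGISTRY`, `stats`, `make_not_shareable`, `pickle_fallback`, `get_xidata_old` and `get_xidata_new` are hypothetical names, the function fallback step is left out, and `NotShareableError` here is a local stand-in class. The counter shows where the discarded exception is created in the old flow and avoided in the new one.

```python
import pickle


class NotShareableError(Exception):
    """Local stand-in for the interpreter's NotShareableError."""


# Hypothetical model of the XIData registry: type -> converter.
REGISTRY = {
    int: lambda obj: ("int", obj),
    str: lambda obj: ("str", obj),
}

# Counts how many NotShareableError instances get constructed.
stats = {"created": 0}


def make_not_shareable(obj):
    stats["created"] += 1
    return NotShareableError(f"{type(obj).__name__} is not shareable")


def lookup_getdata(obj):
    """Return the registered converter for type(obj), or None."""
    return REGISTRY.get(type(obj))


def get_xidata(obj):
    """Registry-only path: raises if the type is not registered."""
    getdata = lookup_getdata(obj)
    if getdata is None:
        raise make_not_shareable(obj)
    return getdata(obj)


def pickle_fallback(obj):
    # Simplification: the real chain also tries a function fallback.
    return ("pickle", pickle.dumps(obj))


def get_xidata_old(obj):
    """Old flow: try the registry path, swallow the error, then fall back."""
    try:
        return get_xidata(obj)
    except NotShareableError:
        pass  # exception was created only to be discarded
    try:
        return pickle_fallback(obj)
    except Exception:
        raise make_not_shareable(obj) from None


def get_xidata_new(obj):
    """Proposed flow: consult the registry first, skip the failing call."""
    if lookup_getdata(obj) is not None:
        return get_xidata(obj)  # registered types: existing path unchanged
    try:
        return pickle_fallback(obj)
    except Exception:
        # Total failure still surfaces as NotShareableError.
        raise make_not_shareable(obj) from None
```

In this model, transferring a list through `get_xidata_old` constructs one exception that is never seen by the caller, while `get_xidata_new` constructs none. Both raise `NotShareableError` for an object that cannot be pickled.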
Benchmark results
This removes unnecessary exception creation on a hot fallback path and gives a measurable speedup for transfers that rely on pickle fallback.
In local benchmarks on Apple M2, combined with GH-148125, this improved round-trip transfer times for mutable objects such as list and dict, especially for larger payloads.
Large-payload benchmark results (500 iterations, Apple M2)
Baseline is upstream/main after GH-148125 (pickle cache). "After" includes this change.
| Data | Before | After | Speedup |
|---|---|---|---|
| list [1 x 100KB] | 1.19ms | 0.12ms | 9.9x |
| list [1 x 1MB] | 10.26ms | 0.20ms | 51x |
| list [10 x 100KB] | 9.99ms | 0.10ms | 100x |
| dict {10 x 100KB} | 10.41ms | 0.26ms | 40x |
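The benchmark script itself is not included here. As a rough sketch of how per-transfer times like the ones above could be collected, the harness below times a pluggable `transfer` callable over 500 iterations. Everything in it is an assumption: the payload shapes are one reading of the row labels (for example, `[1 x 100KB]` as a list holding one 100 KB `bytes` object), and `pickle_round_trip` is only a same-interpreter stand-in that would need to be replaced by a real cross-interpreter round trip to reproduce the measurements.

```python
import pickle
import time


def time_round_trip(payload, transfer, iterations=500):
    """Return the mean wall-clock seconds per call of transfer(payload)."""
    start = time.perf_counter()
    for _ in range(iterations):
        transfer(payload)
    return (time.perf_counter() - start) / iterations


def pickle_round_trip(obj):
    # Stand-in only: serializes and deserializes in the same interpreter.
    return pickle.loads(pickle.dumps(obj))


# Assumed payload shapes, guessed from the row labels in the table.
PAYLOADS = {
    "list [1 x 100KB]": [b"x" * 100_000],
    "list [1 x 1MB]": [b"x" * 1_000_000],
    "list [10 x 100KB]": [b"x" * 100_000 for _ in range(10)],
    "dict {10 x 100KB}": {i: b"x" * 100_000 for i in range(10)},
}


def report(transfer=pickle_round_trip, iterations=500):
    """Print one line per payload with the mean time in milliseconds."""
    for label, payload in PAYLOADS.items():
        mean_s = time_round_trip(payload, transfer, iterations)
        print(f"{label:<20} {mean_s * 1000:.2f}ms")


if __name__ == "__main__":
    report()
```

Running the same harness on a build before and after the change, with `transfer` swapped for an actual cross-interpreter send and receive, would give the two columns to compare.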
Related
Validation
Local testing passed:
./python -m test test_interpreters
./python -m test test_concurrent_futures
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
Linked PRs