Branchless Binary Search
Perform binary search with conditional moves or arithmetic updates instead of unpredictable branches.
17 notes
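As a rough illustration of the arithmetic-update variant, the sketch below shrinks the search window by a fixed amount each iteration and advances the base pointer by multiplying the step by the result of a comparison (0 or 1). The function name `lower_bound_branchless` and the `int` element type are assumptions made for this example, not taken from any particular library. Whether the compiler emits a conditional move or a multiply for the update depends on the compiler and target, so inspect the generated assembly before relying on it.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the index of the first element >= key in the
 * sorted array a[0..n), or n if every element is smaller than key.
 */
size_t lower_bound_branchless(const int *a, size_t n, int key)
{
    const int *base = a;
    size_t len = n;

    /* The loop count depends only on n, so its exit branch predicts well. */
    while (len > 1) {
        size_t half = len / 2;
        /* The comparison yields 0 or 1; scale the step by it rather than
         * branching on the data-dependent result. */
        base += (size_t)(base[half - 1] < key) * half;
        len -= half;
    }

    /* One candidate remains (none when n == 0, which skips the read). */
    return (size_t)(base - a) + (size_t)(n > 0 && *base < key);
}
```

For the sorted array `{1, 3, 5, 7}`, searching for 5 returns index 2 and searching for 8 returns 4, the array length. The only data-dependent decision is folded into pointer arithmetic, which is the property the technique is after.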
CPython 3.13 copy-and-patch JIT: template JIT design, trace selection, and the roadmap toward full JIT compilation.
pyperformance benchmark suite, microbenchmark pitfalls, timer resolution, and interpreter warm-up effects.
Using perf, Instruments, and py-spy to profile CPython; reading perf maps generated by the JIT.
Object allocation locality, arena page placement, freelists for common types, and cache-line–aware design.
Type version tags, LOAD_ATTR inline cache hits, and the attribute specialization guards for slots and descriptors.
Compact dict memory layout, hash collision probing strategy, split-table sharing, and lookup specialization.
PEP 590 vectorcall protocol, Py_TPFLAGS_HAVE_VECTORCALL, and stack-based argument passing for low-overhead calls.
CALL_PY_EXACT_ARGS, CALL_BUILTIN_FAST, and the fast-path conditions that bypass the generic call machinery.
Adaptive counter logic, specialization guards, and how CPython 3.11+ rewrites opcodes to LOAD_ATTR_SLOT and friends.
Inline cache entries stored as CACHE instructions after each specializable instruction in the bytecode array, and their layout per opcode family.
Computed goto dispatch table, the switch fallback, and how opcode prediction reduces branch mispredictions.
Choose and implement hash tables that perform reliably under mixed key types, uneven access patterns, and adversarial input.
Design hash table layouts that minimize cache misses and align memory access patterns with hardware behavior.
Achieve predictable worst-case bounds for hash-based structures rather than relying solely on average-case expectations.
Understand how memory hierarchy effects cause hash table performance to deviate from asymptotic expectations.
Control hash table performance by monitoring the load factor and growing the bucket array before collisions accumulate.