This part covers interpreter dispatch, inline caches, the specializing adaptive interpreter, function call fast paths, vectorcall, dictionary and attribute access performance, memory locality, profiling, and benchmarking.
| Chapter | Title |
|---|---|
| 72 | Interpreter Dispatch |
| 73 | Inline Caches |
| 74 | Specializing Adaptive Interpreter |
| 75 | Function Call Fast Paths |
| 76 | Vectorcall |
| 77 | Dictionary Performance |
| 78 | Attribute Access Performance |
| 79 | Memory Locality |
| 80 | Profiling CPython |
| 81 | Benchmarking CPython |
72. Interpreter Dispatch: The computed-goto dispatch table, the switch-statement fallback, and how opcode prediction reduces branch mispredictions.
73. Inline Caches: Inline cache entries stored as CACHE instructions appended after each specializable instruction in the bytecode array, and their layout per opcode family.
74. Specializing Adaptive Interpreter: Adaptive counter logic, specialization guards, and how CPython 3.11+ rewrites opcodes in place to LOAD_ATTR_SLOT and friends.
75. Function Call Fast Paths: CALL_PY_EXACT_ARGS, CALL_BUILTIN_FAST, and the fast-path conditions that bypass the generic call machinery.
76. Vectorcall: The PEP 590 vectorcall protocol, Py_TPFLAGS_HAVE_VECTORCALL, and stack-based argument passing for low-overhead calls.
77. Dictionary Performance: Compact dict memory layout, hash collision probing strategy, split-table sharing, and lookup specialization.
78. Attribute Access Performance: Type version tags, LOAD_ATTR inline cache hits, and the attribute specialization guards for slots and descriptors.
79. Memory Locality: Object allocation locality, arena page placement, freelists for common types, and cache-line-aware design.
80. Profiling CPython: Using perf, Instruments, and py-spy to profile CPython; reading perf maps generated by the JIT.
81. Benchmarking CPython: The pyperformance benchmark suite, microbenchmark pitfalls, timer resolution, and interpreter warm-up effects.
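Chapter 74's specialization can be observed from pure Python. A minimal sketch, assuming CPython 3.11+ (the `add` function is just an illustration): run a function enough times for the adaptive counters to trigger, then disassemble with `adaptive=True` to see the quickened opcodes.

```python
import dis
import sys


def add(a, b):
    return a + b


# Run the function enough times for the adaptive interpreter's counters
# to trigger specialization of its bytecode in place.
for _ in range(1000):
    add(1, 2)

# On CPython 3.11+, adaptive=True shows the specialized (quickened) opcodes,
# e.g. BINARY_OP_ADD_INT in place of the generic BINARY_OP.
if sys.version_info >= (3, 11):
    dis.dis(add, adaptive=True)
```

On older interpreters the guarded branch simply prints nothing; the exact specialized opcode names vary between 3.11, 3.12, and 3.13.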
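Two of Chapter 77's mechanisms are visible from Python as well. A small sketch (the `Point` class is a made-up example): insertion-order iteration falls out of the compact layout's dense entry array, and split-table key sharing makes a per-instance `__dict__` often smaller than a regular dict with the same keys.

```python
import sys

# The compact dict stores a sparse index table plus a dense entry array,
# which is why iteration order is insertion order (a language guarantee
# since 3.7).
d = {}
for key in ("alpha", "beta", "gamma"):
    d[key] = len(key)
assert list(d) == ["alpha", "beta", "gamma"]


# Split-table sharing: instances of the same class can share one key table,
# so a per-instance __dict__ is often smaller than a regular dict holding
# the same keys. Exact sizes vary across CPython versions.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(sys.getsizeof(p.__dict__), sys.getsizeof({"x": 1, "y": 2}))
```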
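For Chapter 81's microbenchmark pitfalls, the standard-library timeit module already encodes two mitigations: batching many iterations per run so timer resolution stops dominating, and repeating runs so warm-up and scheduler noise can be discarded by taking the minimum. A minimal sketch with an arbitrary statement:

```python
import timeit

# number batches iterations per timing run (amortizing timer resolution);
# repeat gives several independent runs, of which the minimum is the
# conventional best-case estimate.
runs = timeit.repeat("sum(range(100))", repeat=5, number=10_000)
best = min(runs)
print(f"best of {len(runs)} runs: {best / 10_000 * 1e6:.3f} µs per call")
```

For serious work the pyperformance suite (built on pyperf) adds calibration, warm-up runs, and statistical reporting on top of this pattern.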