# gopy hash secret spec

# 1607. Hash secret bootstrap (`hash/secret.go`)

CPython's `Python/bootstrap_hash.c` does two things:

1. Provides the `_PyOS_URandom*` family that fills a buffer with OS
   entropy (Windows BCryptGenRandom, Linux getrandom, /dev/urandom,
   etc.).
2. Initializes `_Py_HashSecret`, the per-process secret used by
   SipHash to randomize hash output. The initialization respects
   the `PYTHONHASHSEED` environment variable: a positive integer
   gives a deterministic LCG-derived secret; `0` zeroes the secret
   (so hashes are not randomized, which is useful for debugging);
   the literal string `"random"`, or an unset variable, uses OS
   entropy.

In v0.1 we port only the secret-init portion. The hashing functions
themselves (SipHash-1-3, FNV, x86_aes acceleration) land in v0.4
together with `pyhash.c`; until then, no code reads the secret.

## What we drop and why

- `_PyOS_URandom`, `_PyOS_URandomNonblock`. Go's `crypto/rand.Reader`
  performs the same job and works on every supported OS (Windows,
  macOS, Linux, all the BSDs), so we use it directly. The non-blocking
  variant is unnecessary in practice: on Linux, Go's reader calls
  `getrandom` without `GRND_NONBLOCK`, which blocks only until the
  kernel entropy pool is first initialized early in boot and never
  afterwards.
- The Linux/macOS fallback paths to `/dev/urandom`. `crypto/rand`
  takes care of that fallback.
- `dev_urandom_close` (called from `_Py_HashRandomization_Fini`).
  No file descriptor to close.
- `_Py_HashSecret_Initialized` global flag. We use a `sync.Once`.
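For illustration, a minimal sketch of what replaces `_PyOS_URandom`: the whole secret is filled with a single `crypto/rand.Read` call. The names `fillFromOS` and `secretSize` are hypothetical and local to this sketch; they are not part of the gopy API.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// secretSize mirrors the 24-byte _Py_HashSecret_t (sketch-local name).
const secretSize = 24

// fillFromOS fills buf with OS entropy, standing in for _PyOS_URandom.
// crypto/rand selects the platform source itself (for example getrandom
// on Linux), so there is no file descriptor to open or close.
func fillFromOS(buf []byte) error {
	// Since Go 1.24 rand.Read is documented never to return an error;
	// the check keeps the sketch correct on older toolchains.
	if _, err := rand.Read(buf); err != nil {
		return fmt.Errorf("hash: reading OS entropy: %w", err)
	}
	return nil
}
```

Because the reader owns any underlying descriptor, nothing here corresponds to `dev_urandom_close`.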

## What we port

Our port of the `lcg_urandom` LCG must be a numerical match for the C
implementation, producing byte-identical output for a given seed:
CPython tests that run with a fixed `PYTHONHASHSEED` rely on it. We
port it verbatim:

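```c
#include <stddef.h>

static void
lcg_urandom(unsigned int x0, unsigned char *buffer, size_t size)
{
    unsigned int x = x0;
    for (size_t i = 0; i < size; i++) {
        /* unsigned arithmetic wraps modulo 2**32 */
        x = x * 214013 + 2531011;
        buffer[i] = (x >> 16) & 0xff;
    }
}
```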
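A Go version of the loop above, as a sketch (the name `lcgURandom` is our own). Declaring the state as `uint32` mirrors C's 32-bit `unsigned int`: Go defines the same wrap-around on overflow for unsigned integers, and converting to `byte` keeps the low 8 bits, which is what `& 0xff` does in C.

```go
package main

// lcgURandom ports CPython's lcg_urandom (Python/bootstrap_hash.c).
// x is uint32 so x*214013 + 2531011 wraps modulo 2**32, as in C.
func lcgURandom(x0 uint32, buf []byte) {
	x := x0
	for i := range buf {
		x = x*214013 + 2531011
		buf[i] = byte(x >> 16) // low 8 bits of x>>16, same as & 0xff
	}
}
```

For seed 1 the first two output bytes are 41 and 35 (`0x29`, `0x23`); for seed 0 they are 38 and 39.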

## Go API

```go
package hash

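// SecretSize is the size of the hash secret. Matches
// sizeof(_Py_HashSecret_t) in CPython 3.14: 24 bytes. The first 16
// hold the SipHash key (k0, k1), which FNV reads as its prefix and
// suffix; the last 8 hold the djbx33a suffix and the expat hash salt.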
const SecretSize = 24

// Secret is the byte vector consumed by SipHash and FNV in v0.4.
// Until then it is filled but not read.
var Secret [SecretSize]byte

// SecretMode classifies how Init was resolved.
type SecretMode int

const (
    SecretRandom SecretMode = iota // OS entropy
    SecretZeroed                   // PYTHONHASHSEED=0
    SecretSeeded                   // PYTHONHASHSEED=<positive int>
)

// Init seeds Secret. cfg is the resolved configuration; if nil, the
// PYTHONHASHSEED environment variable is consulted. Init is safe to
// call from multiple goroutines; only the first call performs work.
//
// Returns the resolved mode, and an error if PYTHONHASHSEED is set
// to a value that is not "random", "0", or a decimal integer in
// [1, 4294967295].
func Init(cfg *Config) (SecretMode, error)

// Config mirrors the relevant fields of PyConfig consumed by
// _Py_HashRandomization_Init. The full PyConfig lives in
// initconfig (v0.7).
type Config struct {
    UseHashSeed bool   // true if hash_seed is set explicitly
    HashSeed    uint32 // value when UseHashSeed is true
}

// Reset is exposed for tests. It clears the once-flag so a follow-up
// Init runs again. Production code should not call it.
func Reset()
```
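A minimal sketch of how `Init` could meet this contract, assuming a package-level `sync.Once` that caches the first call's result. It uses `package main` so it compiles alone, and `initSecret`, `configFromEnv`, and `lcgURandom` are hypothetical helpers. `configFromEnv` only approximates the rules in the PYTHONHASHSEED parsing section, and its error text is illustrative.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"strconv"
	"sync"
)

const SecretSize = 24

var Secret [SecretSize]byte

type SecretMode int

const (
	SecretRandom SecretMode = iota
	SecretZeroed
	SecretSeeded
)

type Config struct {
	UseHashSeed bool
	HashSeed    uint32
}

// Unexported state guarding one-time initialization (sketch only).
var (
	once     sync.Once
	initMode SecretMode
	initErr  error
)

// Init resolves the secret once; later calls return the first call's
// mode and error and leave Secret untouched.
func Init(cfg *Config) (SecretMode, error) {
	once.Do(func() { initMode, initErr = initSecret(cfg) })
	return initMode, initErr
}

// Reset re-arms Init for tests. Not safe to call concurrently with Init.
func Reset() {
	once = sync.Once{}
	Secret = [SecretSize]byte{}
	initMode, initErr = SecretRandom, nil
}

// initSecret follows the three branches of _Py_HashRandomization_Init.
func initSecret(cfg *Config) (SecretMode, error) {
	if cfg == nil {
		c, err := configFromEnv()
		if err != nil {
			return SecretRandom, err // callers should treat this as fatal
		}
		cfg = &c
	}
	switch {
	case cfg.UseHashSeed && cfg.HashSeed == 0:
		Secret = [SecretSize]byte{}
		return SecretZeroed, nil
	case cfg.UseHashSeed:
		lcgURandom(cfg.HashSeed, Secret[:])
		return SecretSeeded, nil
	default:
		if _, err := rand.Read(Secret[:]); err != nil {
			return SecretRandom, fmt.Errorf("hash: reading OS entropy: %w", err)
		}
		return SecretRandom, nil
	}
}

// configFromEnv is a hypothetical, minimal PYTHONHASHSEED reader.
func configFromEnv() (Config, error) {
	s := os.Getenv("PYTHONHASHSEED")
	if s == "" || s == "random" {
		return Config{}, nil
	}
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return Config{}, fmt.Errorf("hash: invalid PYTHONHASHSEED %q", s)
	}
	return Config{UseHashSeed: true, HashSeed: uint32(n)}, nil
}

// lcgURandom ports CPython's lcg_urandom; uint32 wraps modulo 2**32.
func lcgURandom(x0 uint32, buf []byte) {
	x := x0
	for i := range buf {
		x = x*214013 + 2531011
		buf[i] = byte(x >> 16)
	}
}
```

Caching the result inside the `sync.Once` is what makes a second `Init` with a different config a no-op. `Reset` swaps in a fresh `sync.Once`, which is only safe while no other goroutine is inside `Init`.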

## PYTHONHASHSEED parsing

Parsing follows CPython's `config_init_hash_seed` in `initconfig.c`.
That file is ported in v0.7; until it lands we parse `PYTHONHASHSEED`
here so that `cmd/gopy` can seed the secret at process start. The
parser accepts:

| Input                     | Resolved                                 |
|---------------------------|------------------------------------------|
| unset / `""`              | random (OS entropy)                      |
| `"random"`                | random                                   |
| `"0"`                     | zeroed secret                            |
| decimal `1`..`4294967295` | seeded with that value via `lcg_urandom` |
| anything else             | error                                    |
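The table maps onto a small pure function. This is a sketch under stated assumptions: `parseHashSeed` is a hypothetical name, it receives the raw string from `os.Getenv` (which returns `""` for an unset variable, so both cases in the first row arrive as `""`), and it repeats the `SecretMode` and `Config` declarations so it compiles on its own. `strconv.ParseUint(value, 10, 32)` enforces the decimal and upper-bound rules: it rejects signs, hex prefixes, and anything above 4294967295.

```go
package main

import (
	"fmt"
	"strconv"
)

type SecretMode int

const (
	SecretRandom SecretMode = iota
	SecretZeroed
	SecretSeeded
)

type Config struct {
	UseHashSeed bool
	HashSeed    uint32
}

// parseHashSeed (hypothetical) applies the PYTHONHASHSEED table to the
// raw environment value.
func parseHashSeed(value string) (Config, SecretMode, error) {
	switch value {
	case "", "random":
		return Config{}, SecretRandom, nil
	}
	// Base 10, 32-bit: rejects "+1", "-1", "0x10", and values > 4294967295.
	n, err := strconv.ParseUint(value, 10, 32)
	if err != nil {
		return Config{}, SecretRandom, fmt.Errorf(
			"PYTHONHASHSEED must be \"random\" or an integer in [0, 4294967295], got %q", value)
	}
	cfg := Config{UseHashSeed: true, HashSeed: uint32(n)}
	if n == 0 {
		return cfg, SecretZeroed, nil
	}
	return cfg, SecretSeeded, nil
}
```

Matching `"random"` before calling `ParseUint` keeps that match case-sensitive: `"Random"` falls through to the integer parse and is rejected.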

## Tests

`hash/secret_test.go`:

- `Init` with `cfg.UseHashSeed=false` produces a non-zero `Secret`
  (probabilistic; the chance that crypto/rand returns 24 zero bytes
  is 2^-192).
- `Init` with `UseHashSeed=true, HashSeed=0` produces a zeroed
  `Secret`.
- `Init` with `UseHashSeed=true` and a fixed nonzero `HashSeed`
  produces deterministic output that matches a hand-computed LCG run.
- The LCG matches a precomputed reference for seed `0xdeadbeef`. The
  reference bytes were generated once by running the C implementation
  and are pasted into the test as a literal, with a comment recording
  their C origin.
- `Init` is idempotent: a second call with a different config does
  not change `Secret`.

## Cross-runtime parity

Once the hash port (1640 in the next spec batch) lands in v0.4, we
will add a `compat/hash` test that hashes a small corpus of strings
under `PYTHONHASHSEED=0` and asserts byte-equality with CPython's
output. Until then, secret parity rests on the LCG port matching the
C reference bytes.

