# vanity
vanity is an executable model for auditing optimisation failures caused by
specialisation, inlining, and representation lowering.
the project describes a small typed source calculus, lowers it into an explicit target IR, and checks whether source and target observations remain related across pass boundaries. the interesting cases are not parser or codegen bugs but failures of the abstraction theorem that an optimiser implicitly assumes when it rewrites polymorphic code.
## the idea
the source language keeps representation abstract. it has functions, type
abstraction and application, products, sums, recursive types, recursive
bindings and an explicit `tick`. the effect is intentionally small, just enough to make dead-code-elimination (DCE) rewrites observable.
the target IR makes layout explicit. values can be boxed or unboxed, products and sums have concrete representations, functions carry a boxed or unboxed calling convention, and worker wrapper lowering is represented as a term rather than hidden in the compiler.
the audit asks a few simple questions at each boundary:
- did the source and target terminate in the same class?
- did the effect trace remain equivalent?
- did representation lowering preserve the selected relation?
- did the target IR validate its arity and shape constraints?
- did the pass emit the proof obligation needed to justify the rewrite?
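the five questions above can be collected into a per-boundary verdict. a minimal sketch, assuming a flat record of boolean answers (the names here are illustrative, not the project's actual API):

```ocaml
(* one audit verdict per pass boundary; every field answers one of the
   questions above *)
type verdict = {
  same_termination : bool;   (* source and target ended in the same class *)
  trace_equivalent : bool;   (* effect traces matched *)
  relation_preserved : bool; (* the selected relation held across lowering *)
  target_well_formed : bool; (* arity and shape constraints validated *)
  obligations_emitted : bool; (* the pass justified its rewrite *)
}

(* a boundary passes the audit only if every answer is yes *)
let passes v =
  v.same_termination && v.trace_equivalent && v.relation_preserved
  && v.target_well_formed && v.obligations_emitted
```

a single `false` field is enough to flag the boundary, which is what makes the unsafe profiles below show up as small, localised deltas.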
## pipeline
the pipeline is controlled by optimisation flags:
```ocaml
type optimisation_flags = {
  specialise : bool;
  inline : bool;
  repr_lower : bool;
  unsafe_repr_eq : bool;
  unsafe_strict_unroll : bool;
  unsafe_effect_drop : bool;
}
```
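for concreteness, a safe profile enables the standard passes and no unsafe rewrites, while an unsafe profile flips exactly one refuting flag. a sketch (the record type is repeated so the snippet is self-contained; the profile names are assumptions):

```ocaml
type optimisation_flags = {
  specialise : bool;
  inline : bool;
  repr_lower : bool;
  unsafe_repr_eq : bool;
  unsafe_strict_unroll : bool;
  unsafe_effect_drop : bool;
}

(* safe: all standard passes on, all unsafe rewrites off *)
let safe = {
  specialise = true; inline = true; repr_lower = true;
  unsafe_repr_eq = false; unsafe_strict_unroll = false;
  unsafe_effect_drop = false;
}

(* unsafe: identical except for the single flag under audit *)
let unsafe_repr = { safe with unsafe_repr_eq = true }
```

flipping one flag at a time keeps each failure attributable to a single rewrite.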
compilation records each intermediate term:
```text
source -> specialise -> inline -> effect rewrite -> repr lower -> target IR
```
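recording intermediates can be sketched as a fold over named passes, assuming each pass maps terms to terms (the real term type and pass set are richer than this):

```ocaml
(* stand-in for the intermediate representation *)
type term = string

(* run the passes in order, logging every intermediate term by pass name *)
let run_pipeline (passes : (string * (term -> term)) list) (src : term)
    : term * (string * term) list =
  let final, log =
    List.fold_left
      (fun (t, log) (name, pass) ->
        let t' = pass t in
        (t', (name, t') :: log))
      (src, [ ("source", src) ])
      passes
  in
  (final, List.rev log)
```

keeping every intermediate term is what lets the audit compare observations at each boundary instead of only end to end.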
each pass may also emit obligations:
- preserve the source relation at a rewrite site
- account for exposed representation specific tests
- prove that strictness did not shift termination
- preserve effect observations
- justify worker wrapper argument and result correspondence
safe profiles leave those obligations discharged, or merely assumed where the
model has no full proof engine. unsafe profiles deliberately refute them, so the failure appears as a small source-to-target delta.
## relations
the comparison engine supports three relations:
- compare source observations without representation lowering
- relate boxed source values to unboxed target values
- restrict comparison to ground integer and boolean values
these relations are intentionally small. the point is to make the failure mode visible, not to model a compiler front end.
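the three relations can be sketched as a variant, with a checker for the ground case only (value types and constructor names here are illustrative):

```ocaml
type relation =
  | Baseline      (* source observations, no representation lowering *)
  | Boxed_unboxed (* relate boxed source values to unboxed target values *)
  | Ground        (* restrict to ground integer and boolean values *)

type value = VInt of int | VBool of bool | VPair of value * value

(* the ground relation compares ints and bools and refuses everything
   structured *)
let ground_related s t =
  match (s, t) with
  | VInt a, VInt b -> a = b
  | VBool a, VBool b -> a = b
  | _ -> false
```

restricting to ground values sidesteps representation questions entirely, which is why it is the coarsest but most robust of the three.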
## corpus
the checked corpus contains witnesses for both preserved and broken rewrites.
### safe polymorphic instantiation
this case instantiates a polymorphic constant function at `bool`:
```ocaml
forall a. a -> a -> bool
```
specialisation alone should preserve the baseline relation. no
representation specific primitive is introduced, so the free theorem remains
intact.
### representation exposure
this case instantiates the same polymorphic shape at `int` and then permits an unsafe inline profile to rewrite the abstract branch into `eq_int`.
the source term is parametric. the target term is no longer parametric because it can observe integer representation.
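the break can be illustrated directly in OCaml: a parametric branch of type `'a -> 'a -> bool` cannot inspect its arguments, which is exactly the free theorem the unsafe rewrite gives up. `eq_int` is modelled here by integer equality; both names are stand-ins:

```ocaml
(* parametric: must behave uniformly, so it can only be constant in
   its arguments *)
let parametric_branch : 'a -> 'a -> bool = fun _ _ -> true

(* rewritten: observes integer representation, breaking uniformity *)
let rewritten_branch (x : int) (y : int) : bool = x = y
```

the two branches agree on equal arguments but disagree on unequal ones, which is precisely the observable source-target delta the audit reports.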
### strictness shift
this case places a divergent recursive computation behind a `roll` boundary:
```ocaml
roll mu a. int loop
```
a safe lowering keeps the payload latent; an unsafe unroll path forces it early and changes the termination class.
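a sketch using OCaml's `lazy` as an analogue of the `roll` boundary: the divergent payload is harmless while suspended, and forcing it, as an unsafe strict unroll would, loops forever:

```ocaml
(* a computation that never terminates *)
let rec loop () : int = loop ()

(* safe: building the suspension does not run the payload *)
let guarded : int Lazy.t = lazy (loop ())

(* observing the term without forcing the payload still terminates;
   an unsafe strict unroll corresponds to Lazy.force guarded *)
let observe (_ : int Lazy.t) : int = 1
```

the analogy is only partial (the calculus uses `roll`, not laziness), but it shows how strictness alone can move a term between termination classes.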
### effect shift
this case uses an effectful dead binding:
```ocaml
let _ = tick "alloc" 0 in 1
```
the result value is independent of the binding but the observation is not.
dropping the binding is only valid if the eliminated term is effect-free.
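a minimal model of the `tick` effect as an append-only trace makes this concrete; the `trace` cell and `tick` function are assumptions standing in for the calculus effect:

```ocaml
(* global effect trace *)
let trace : (string * int) list ref = ref []

(* tick appends an observation and returns a dummy value *)
let tick label payload =
  trace := !trace @ [ (label, payload) ];
  0

(* the dead binding leaves the value unchanged but extends the trace *)
let with_binding () = let _ = tick "alloc" 0 in 1
let without_binding () = 1
```

both functions return `1`, so a value-only comparison misses the difference; only the trace comparison catches the invalid DCE.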
### worker wrapper
this case lowers a boxed pair function through an unboxed worker:
```ocaml
int * bool -> int * bool
```
the wrapper preserves the boxed source interface while the worker receives
`RInt` and `RBool`. the target validator checks that worker arity and result shape still match the representation relation.
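the split can be sketched in OCaml by passing the `RInt` and `RBool` components as separate arguments to the worker while the wrapper keeps the boxed pair interface; the body is an arbitrary stand-in:

```ocaml
(* worker: unboxed calling convention, one argument per component *)
let worker (i : int) (b : bool) : int * bool = (i + 1, not b)

(* wrapper: preserves the boxed source interface, unpacks, delegates *)
let wrapper (p : int * bool) : int * bool =
  let i, b = p in
  worker i b
```

the validator's job corresponds to checking that `worker` takes exactly the components `wrapper` unpacks, and that its result can be reboxed to the source type.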
## notes
- the model is call by value
- target validation is structural and representation aware
- generated campaigns sample typed source terms and shrink failing specimens
- unsafe profiles are supposed to fail; they are witnesses, not broken tests
- this is a semantic audit harness and not a complete optimiser