Pointer provenance through int casts and alias analysis soundness
This post looks at pointer provenance in the presence of pointer-to-integer casts and shows how seemingly benign rewrites can violate alias analysis soundness. The focus is on mid-level IR, where optimisations depend on assumptions about object identity that are stronger than raw bit equality.
Consider the following:
int *p = malloc(sizeof(int));
uintptr_t i = (uintptr_t)p;
uintptr_t j = i + 0;
int *q = (int*)j;
*q = 1;
At the level of bits, p and q are identical, and a naive interpretation treats them as aliases. Modern optimisers don’t rely on bit equality alone. They track provenance, which encodes the allocation a pointer originates from and constrains the set of valid accesses.
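To make the bit-level claim concrete, here is a minimal sketch that wraps the snippet above in a function. The names same_bits and round_trip_store are hypothetical helpers introduced for illustration, and the code assumes a mainstream platform where an int * survives the round trip through uintptr_t with an unchanged representation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the two pointers have the same object representation. */
static int same_bits(const void *p, const void *q) {
    return memcmp(&p, &q, sizeof p) == 0;
}

/* Performs the round trip from the text and returns the value read back
   through the original pointer p, or -1 if the allocation fails. */
int round_trip_store(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    *p = 0;
    uintptr_t i = (uintptr_t)p;
    uintptr_t j = i + 0;        /* a no-op on the address */
    int *q = (int *)j;
    assert(same_bits(p, q));    /* identical at the level of bits */
    *q = 1;
    int seen = *p;              /* correct only if q is treated as aliasing p */
    free(p);
    return seen;
}
```

A caller that gets 1 back has observed the store through q via p. The provenance question is whether the optimiser is entitled to conclude anything else.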
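In LLVM IR, the integer produced by a pointer-to-integer cast (ptrtoint) carries no provenance. It’s an unstructured value with no attachment to an allocation. When the integer is cast back to a pointer (inttoptr), the result isn’t guaranteed to carry the original provenance. Under a model in which the cast simply discards provenance, optimisations would be permitted to assume that the new pointer doesn’t alias the original object. LLVM’s documented aliasing rules are more conservative: a pointer formed by inttoptr is based on every pointer value that contributed to the integer, so alias analysis has to assume it may alias the original unless it can prove otherwise.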
This creates a tension. Real programs use such round trips for address arithmetic and pointer tagging. If provenance is treated as lost, the optimiser can conclude non-aliasing where aliasing actually exists. If provenance is preserved through integers, many standard optimisations become ineffective.
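Pointer tagging is a typical case of such a round trip. The sketch below is illustrative only: TAG_MASK, tag_ptr, get_tag, untag_ptr and tagged_store_demo are hypothetical names, and the code assumes the pointed-to object is aligned so that its two low address bits are zero.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)  /* hypothetical: two low bits hold a tag */

/* Packs a small tag into the low bits of a suitably aligned pointer. */
uintptr_t tag_ptr(int *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);  /* alignment assumption */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Extracts the tag stored in the low bits. */
unsigned get_tag(uintptr_t tagged) {
    return (unsigned)(tagged & TAG_MASK);
}

/* Clears the tag bits and rebuilds the pointer from the integer. */
int *untag_ptr(uintptr_t tagged) {
    return (int *)(tagged & ~TAG_MASK);
}

/* Stores through the rebuilt pointer, then reads the original object. */
int tagged_store_demo(void) {
    _Alignas(8) int x = 0;           /* C11: force the low bits to be free */
    uintptr_t t = tag_ptr(&x, 2u);
    *untag_ptr(t) = 7;               /* must be seen as a store to x */
    return x * 10 + (int)get_tag(t);
}
```

The pointer returned by untag_ptr exists only as the result of integer arithmetic, so any model that drops provenance at the cast has nothing left to connect it back to x.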
Alias analysis depends on stable invariants: pointers derived from distinct allocations don’t alias, and provenance is preserved along def-use chains that don’t pass through opaque operations. The integer cast is such an operation. Once a pointer flows into an integer, the def-use chain no longer carries structured information about its origin.
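These invariants can be sketched as a toy model. This is not how any real compiler represents pointers. AbstractPtr, PROV_UNKNOWN, may_alias and through_integer are made-up names, and the model ignores offsets within an allocation entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a pointer as an alias analysis might see it. */
enum { PROV_UNKNOWN = 0 };  /* origin lost, e.g. after an integer cast */

typedef struct {
    int alloc_id;  /* id of the originating allocation, or PROV_UNKNOWN */
} AbstractPtr;

/* Conservative query: answer "no alias" only when both origins are
   known and distinct. */
bool may_alias(AbstractPtr a, AbstractPtr b) {
    if (a.alloc_id == PROV_UNKNOWN || b.alloc_id == PROV_UNKNOWN)
        return true;
    return a.alloc_id == b.alloc_id;
}

/* Models the round trip through an integer: the origin doesn't survive. */
AbstractPtr through_integer(AbstractPtr p) {
    (void)p;
    return (AbstractPtr){ PROV_UNKNOWN };
}

/* Small self-check of the two invariants. */
void alias_model_demo(void) {
    AbstractPtr a = { 1 }, b = { 2 };
    assert(!may_alias(a, b));                  /* distinct allocations */
    assert(may_alias(a, through_integer(b)));  /* unknown origin: stay conservative */
}
```

The sound choice in this model is to answer "may alias" whenever the origin is unknown. The unsound shortcut would be to treat an unknown origin as a fresh allocation that aliases nothing.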
A trivial pattern that exposes the issue is:
int *p = malloc(sizeof(int));
uintptr_t i = (uintptr_t)p;
free(p);
int *q = (int*)i;
*q = 42;
If the optimiser assumes that the round trip through uintptr_t destroys provenance, it can treat q as unrelated to the allocation returned by malloc. That can justify reordering or eliminating the store, since no live object is observed to be modified. At runtime the store clearly targets the same address, which now refers to freed memory. The transformation relies on a provenance model that doesn’t match the execution.
Even without deallocation, similar effects arise when memory operations are moved across each other. Passes such as global value numbering and loop-invariant code motion depend on alias analysis results. If a pointer produced by a cast is wrongly assumed not to alias another access, a load can be hoisted above a store to the same location, or a store can be delayed past another access to it.
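A small function makes the hazard visible. The sketch below uses hypothetical names (load_store_load, hoisting_demo) and shows the source-level behaviour that any transformation has to preserve.

```c
#include <assert.h>
#include <stdint.h>

/* Reads *p, stores through a pointer rebuilt from an integer, then reads
   *p again. */
int load_store_load(int *p, uintptr_t addr) {
    int first = *p;
    int *q = (int *)addr;  /* origin of q isn't visible on this def-use chain */
    *q = first + 1;
    int second = *p;       /* must be reloaded if q may alias p */
    return second;
}

/* Calls the function with an integer that holds the address of x. */
int hoisting_demo(void) {
    int x = 41;
    int seen = load_store_load(&x, (uintptr_t)&x);
    assert(seen == x);     /* the store through q was observed via p */
    return seen;
}
```

If an analysis decided that q can’t alias p, replacing second with first would look like a legal instance of value numbering, and the function would return 41 instead of 42.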
One approach is to treat specific integer operations as provenance-preserving when they are provably no-ops on the address, such as adding zero. This requires reasoning about integer expressions and reattaching provenance on the cast back to a pointer. Another approach is to make provenance explicit in the IR, using intrinsics or metadata to indicate when it’s preserved or intentionally dropped.
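The first approach can be sketched as a recogniser over a tiny expression tree. Everything here is hypothetical (the Expr type, its kinds, and address_source), and it handles only addition of a constant zero. A real implementation would need a much richer notion of "no-op on the address".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical integer expression kinds for the sketch. */
typedef enum { EXPR_PTRTOINT, EXPR_CONST, EXPR_ADD, EXPR_OTHER } ExprKind;

typedef struct Expr {
    ExprKind kind;
    uintptr_t value;               /* used by EXPR_CONST */
    const struct Expr *lhs, *rhs;  /* used by EXPR_ADD */
} Expr;

static bool is_zero(const Expr *e) {
    return e->kind == EXPR_CONST && e->value == 0;
}

/* Returns the pointer-to-integer cast whose address e provably equals,
   or NULL when nothing can be proved and provenance must not be reattached. */
const Expr *address_source(const Expr *e) {
    if (e->kind == EXPR_PTRTOINT) return e;
    if (e->kind == EXPR_ADD) {
        if (is_zero(e->rhs)) return address_source(e->lhs);
        if (is_zero(e->lhs)) return address_source(e->rhs);
    }
    return NULL;
}

/* Small self-check: i + 0 is traced back to the cast, i + 1 is not. */
void address_source_demo(void) {
    Expr cast = { EXPR_PTRTOINT, 0, NULL, NULL };
    Expr zero = { EXPR_CONST, 0, NULL, NULL };
    Expr one  = { EXPR_CONST, 1, NULL, NULL };
    Expr add0 = { EXPR_ADD, 0, &cast, &zero };
    Expr add1 = { EXPR_ADD, 0, &cast, &one };
    assert(address_source(&add0) == &cast);
    assert(address_source(&add1) == NULL);
}
```

Returning NULL for anything unrecognised keeps the sketch conservative: provenance is reattached only when the integer is provably the unchanged address.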
In practice, implementations mix these. Analyses are conservative around complex integer manipulation and permissive for simple patterns. The difficulty is keeping passes consistent with one another: if one pass assumes that provenance is preserved and another assumes that it is lost, their composition can introduce unsound transformations.
The main requirement is that alias analysis remains sound with respect to the chosen provenance model. Integer casts mark a boundary where reasoning based on bits alone is insufficient. Without a coherent treatment of provenance, optimisations that appear local can change the observable behaviour of the program.