# Integrals

## Prefer 32-bit ints to other sizes

• 32-bits is the sweet spot as 64-bit ALU can handle two calculations
• 8, 16-bit computations convert to 32-bits so don’t use smaller
• Use small ints in arrays

## Prefer unsigned to signed

• Unsigned is faster
• Except when converting to floating point

# Most numbers are small

• If you find optimisations that work with small numbers, use them

# Floating point

• Double and single precision equivalent speed
• 80-bit extended only slightly slower
• But don’t mix them (because conversions)
• Ints to float point cheap
• Floating point to any integral is expensive

# Strength reduction

Use minimum strength operations when optimising as the stronger ones are more costly.

## Speed hierarchy

operation cycles
Comparisons <1
(u)int add, subtract, bitops, shift (2 x 32-bits) 1
(u)int32 multiply, floating point multiply TBC
Floating point division, remainder TBC
(u)int division, remainder TBC

### Why is it cheaper to do floating point division than integer?

Format of floating point: mantissa and exponent, already friendly to division.

Division in integers is unnatural. Division is a search!

# Minimise indirect writes

• Writes through a pointer: indirection
• Hard to speculate, hard to enregister (put in a register)
• Whenever you write it’s really a read and a write
• Because you transfer a 64-byte cache line: CPU has to read the cache line, update the word and write it back; if you can keep all values for a calculation in one CL all the better (even inject fake writes!)
• The more dirty cache you have, the more writes you need to do

## Improvements

• Fewer indirect writes
• Regular access patterns

# Checkpoint

• You can’t improve what you can’t measure (you can’t measure what you don’t measure :))
• Always choose good baselines
• Optimise code judiciously for today’s dynamics

# The forgotten parallelism

• Reservation station
• Instruction-level parallelism on a single core

## How do us ALUs? – ILP

• Pipelining
• Superscaler execution
• Out-of-order execution
• Register renaming
• Speculative execution
• … and many more

better instruction level parallelism = fewer data dependencies

# Part 2

• Measure
• Use good baselines
• Reduce strength
• Reduce dependencies
• Knowing math is knowing optimisation

Unsigned integer overflow has well-defined behaviour; signed integer overflow is undefined.

# Part 3

• Incorporate conditionals into expressions rather than using ifs: + (size & 1)
• Always use infinite loops (except most times)

## 1st order conclusions

• Code that wants to be fast is left-leaning (goes to the left of the page: no ifs, decision points etc)
• Don’t mix hot and cold code, break asap
• “Can’t write heavily nested code in 80 characters”

## 2nd order conclusions

• Tension with generic programming: “generic programming is why we can’t have nice things”

### Timeline

• 1990: OOP
• 2000: GP (generic programming)
• 2020: Design by introspection (inspect and customise everything, everywhere)

You can’t acheive the best sort with generic programming.