--- tocpdeth: 3 --- # Content addressable hashing summary redun uses hashing to create deterministic ids for many records in the redun data model. The design of this hashing scheme has several properties: - Every hash is unique across all records. - While a cryptographically secure hash function can be assumed to not produce hash collisions, we can still have collisions on pre-image data if we are not careful. For example, a CallNode could have the same hash as a file that contains a pickle of a CallNode, if the hashing schema is not carefully designed. - To achieve this property, every pre-image is prefixed with a record type, such as 'Task', 'CallNode', etc. - It is also easy to have pre-image collisions if arguments to a hash are not well escaped (e.g. if tab is used as an argument delimiter, you need to always escape tabs in string arguments). We solve this systematically by defining a consistent way of efficiently serializing structures using bittorrent encoding format (bencode). - When one parental record refers to a child record, the child record's hash is included in the parent's hash. - This forms a [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) that gives unique hashes to nodes in the resulting DAG. - Hashes are represented by hex strings instead of bytes to aid easy debugging and display. - If greater hashing efficiency is desired, hashes could be kept in their binary byte representation. - Hashes are truncated to 40 hex bytes for easier display. - If the reduced hash space is a concern this truncation could be removed.