In this post I want to talk a little about my current project, Attaca.
I’m currently working on Attaca, a fast, distributed, and resilient version control system for extremely large files and repositories, designed for scientists working on projects containing terabytes or petabytes of data. (Warning: not currently in any sort of working condition as of 8/29/17!) There are a few key components to this design; the first is the git data structure, with one special modification. The second is a distributed hash table (DHT) which is used to store the objects of the graph data structure, and the third is a technique called “hashsplitting” or “hash chunking”. Resilience is handled by the distributed object storage system, and version control itself by the git-like data structure.
(N.B. The word “fuck” appears multiple times in this post. I recommend that the reader temporarily not consider “fuck” as profanity, as it isn’t used that way here.)