I hope that by now you have an idea of what Bitcoin and Ethereum are and what they can do. But what about the technology that supports them?
Blockchain technology is a transparent, verifiable system that can improve the way we exchange value and assets, administer contracts, and distribute data. The technology is a shared, secure ledger of transactions distributed among a network of computers, rather than resting with a single provider. Businesses are using blockchain as a common data layer to enable a new class of applications. Business processes and data can now be shared across multiple systems, which reduces loss, decreases the risk of fraud, and can open up new revenue streams.
When fully automated, blockchain can enforce consistency in execution, assist with dispute resolution, increase accountability, and deliver end-to-end transparency that can inform better business decisions.
At the moment, decentralized blockchain applications have only a few options for storing data:
- Peer-to-peer file systems, like IPFS
- Decentralized cloud file storage, like Storj, Ethereum Swarm, etc.
- Distributed databases, like Apache Cassandra, RethinkDB, etc.
- Storing everything in the blockchain itself (not ideal)
Let’s consider the above in detail:
Storing all the data directly on the blockchain:
Saving everything on the blockchain is the most straightforward answer, and most decentralized applications currently work exactly this way. However, this approach has significant disadvantages.
- First, transactions on the blockchain are slow to confirm. The blockchain is fast enough for money transfers (a confirmation takes about a minute), but it is far too slow for the data flow of a rich application, which may need to process thousands of transactions per second.
- Second, the data is immutable. Immutability is a strength of blockchain because it makes the ledger highly robust, but it is a weakness for data storage. For example, a user may want to edit their profile or replace their photo. However, all the previous data will remain on the blockchain permanently, visible to anyone.
- Third, immutability leads to one more shortcoming: capacity. If all applications stored their data on the blockchain, its size would grow rapidly, exceeding the capacity of commonly available hard drives. Full nodes could then require specialized hardware, which may result in dangerous centralization of the blockchain. That’s why keeping data only on the blockchain is not a good option for a rich decentralized application.
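The immutability and capacity drawbacks above can be illustrated with a minimal Python sketch. It is a toy hash-linked ledger, not the data structure of any real blockchain; the class name `AppendOnlyLedger` and the record fields are assumptions made for illustration. "Editing" a profile can only append a new record, so the old photo stays in the history and the ledger keeps growing.

```python
import hashlib
import json


class AppendOnlyLedger:
    """Toy hash-linked ledger (hypothetical): records can be appended, never edited."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _digest(prev_hash, data):
        # Hash the previous block's hash together with this block's data.
        payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, data):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        block = {"prev": prev_hash, "data": data, "hash": self._digest(prev_hash, data)}
        self.blocks.append(block)
        return block

    def is_valid(self):
        # Recompute every hash; any in-place edit breaks the chain.
        prev_hash = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev_hash:
                return False
            if block["hash"] != self._digest(prev_hash, block["data"]):
                return False
            prev_hash = block["hash"]
        return True

    def history(self, user):
        # Every version ever written for this user is still readable.
        return [b["data"] for b in self.blocks if b["data"].get("user") == user]


ledger = AppendOnlyLedger()
ledger.append({"user": "alice", "photo": "old.png"})
ledger.append({"user": "alice", "photo": "new.png"})  # an "edit" is a new record
```

Calling `ledger.history("alice")` returns both records, and changing the data of an earlier block in place makes `ledger.is_valid()` return `False`.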
Peer-to-peer file systems, such as the InterPlanetary File System (IPFS):
IPFS lets client computers share files and unites them in a global file system. The technology builds on ideas from the BitTorrent protocol and on a distributed hash table (DHT). It has several advantages. It is peer to peer: to share anything, you first put it on your own computer, and it is downloaded only if someone needs it. It is also content addressable: a file's address is derived from its content, so nobody can substitute forged content at a given address.
Popular files can be downloaded very quickly thanks to the BitTorrent-style exchange. However, IPFS also has some disadvantages. You must stay online if you want to share your data, at least until someone else becomes interested and downloads it from you. It serves only static files, which cannot be modified or removed once uploaded. Moreover, you cannot search these files by their content.
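Content addressing, mentioned above, is easy to sketch in Python. The example below is a simplified in-memory illustration, not the IPFS API: the `ContentStore` class is hypothetical, and a plain SHA-256 hex digest stands in for the more elaborate content identifiers that IPFS actually uses.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical): the address is the hash of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The address is derived from the bytes, not chosen by the uploader.
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._blobs[address]
        # Any reader can re-hash the bytes and detect a forgery.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("content does not match its address")
        return content


store = ContentStore()
address = store.put(b"hello, world")
same_address = store.put(b"hello, world")     # identical content, identical address
other_address = store.put(b"hello, world!")   # changed content, different address
```

This also shows why such files are static: "modifying" a file produces different bytes and therefore a different address, while the old address keeps pointing at the old content.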
Decentralized cloud file storage:
There are also decentralized cloud file storage services that lift some of the limitations of IPFS. From the user’s point of view, they are just cloud storage, like Dropbox or pCloud. The difference is that the content is hosted on the computers of users who rent out their hard drive space, rather than in data centers.
There are plenty of such projects nowadays, for example Sia, Storj, and Ethereum Swarm. You no longer need to stay online to share your files: just upload the data, and it is available in the cloud. These services are highly reliable, fast enough, and have enormous capacity. Still, they serve static data only, offer no content search, and, since they are built on rented hardware, they are not free.
Distributed databases:
Since we need to store structured data and want advanced query capabilities, we may look at distributed NoSQL databases. Why NoSQL? Because strictly transactional SQL databases cannot be genuinely distributed due to the restrictions of the CAP theorem: when the network partitions, a distributed database must sacrifice either consistency or availability. Many NoSQL databases choose availability over consistency, replacing it with so-called “eventual consistency”, where all the database nodes in the network become consistent sometime later.
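Eventual consistency can be sketched with two toy replicas in Python. This is an illustration under stated assumptions, not how any particular database works: the `Replica` class is hypothetical, and conflicts are resolved with a simple last-write-wins rule based on caller-supplied timestamps, which is only one of several strategies real systems use.

```python
class Replica:
    """Toy replica (hypothetical) that resolves conflicts by last-write-wins."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Background replication: merge everything the other replica has seen.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a = Replica("a")
b = Replica("b")

# Both replicas stay available and accept writes independently.
a.write("name", "Alice", timestamp=1)
b.write("name", "Alicia", timestamp=2)
stale_read = a.read("name")  # replica a has not yet seen the newer write

# Later the replicas exchange their state and converge.
a.sync_from(b)
b.sync_from(a)
```

Before the sync the two replicas answer the same query differently; after it both return the newer value. That temporary disagreement is the price paid for availability.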
There are many mature implementations of such databases, for example MongoDB, Apache Cassandra, and RethinkDB. They are outstanding: fast, scalable, fault tolerant, and equipped with rich query languages. Still, they have a fatal drawback for our application: they are not Byzantine-proof. All the nodes of the cluster fully trust each other, so any malicious node can destroy the whole database.
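The trust problem can be made concrete with a small Python sketch. It is a deliberately simplified model, not the replication protocol of MongoDB, Cassandra, or RethinkDB: the `TrustedCluster` class and its methods are hypothetical. The only check is cluster membership, so every member's operations, including destructive ones, are applied to all replicas.

```python
class TrustedCluster:
    """Toy replicated key-value store (hypothetical) whose nodes fully trust each other."""

    def __init__(self, node_names):
        self.replicas = {name: {} for name in node_names}

    def _check_member(self, sender):
        # Membership is the only check; the operation itself is never questioned.
        if sender not in self.replicas:
            raise PermissionError(f"{sender} is not a cluster member")

    def put(self, sender, key, value):
        self._check_member(sender)
        for replica in self.replicas.values():
            replica[key] = value

    def drop_all(self, sender):
        self._check_member(sender)
        for replica in self.replicas.values():
            replica.clear()


cluster = TrustedCluster(["node1", "node2", "node3"])
cluster.put("node1", "profile:alice", {"photo": "new.png"})
records_before = len(cluster.replicas["node2"])

# A single malicious member wipes every replica.
cluster.drop_all("node3")
records_after = [len(replica) for replica in cluster.replicas.values()]
```

A Byzantine fault tolerant design would instead require the honest nodes to validate and agree on each operation before applying it.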
There is another project, BigchainDB, that claims to solve the data storage and transaction speed problems. It is also a blockchain, but with enormous data capacity and quick transactions.
BigchainDB is built on top of a RethinkDB cluster, one of the NoSQL databases mentioned above, which is why it shows such high throughput.
All the BigchainDB nodes (denoted BDB on the slide) are connected to the cluster and have full write access to the database. Here lies the problem: BigchainDB as a whole is not Byzantine-proof! Any malicious BDB node can destroy the RethinkDB cluster. The BigchainDB team is aware of this problem and promises to solve it sometime in the future. However, this design is the cornerstone of the architecture, and changing it may not be possible. Even so, BigchainDB may be suitable for a private blockchain.