When to use a blockchain or a traditional database?

Blockchains are a particularly interesting topic right now in financial technology, with a wide variety of proposed applications. But there is also a significant amount of noise and misdirected interest in a technology that, while revolutionary for some applications, is not suited to every use case.

What is a blockchain?

A blockchain is effectively a small variation on the distributed database technology and algorithms that have existed for 25 years. It is a very simple database: an append-only, immutable store that can be written to by a collection of agents who sign their transactions and confirm other agents' transactions through a distributed consensus protocol based on cryptographic hashing. There are many implementations of these ideas, but in some form they all share these common features.
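To make the "append-only store linked by cryptographic hashes" idea concrete, here is a minimal single-process sketch. It is an illustration under stated assumptions, not how any particular blockchain is implemented: the names Ledger, block_hash and sign are hypothetical, HMAC stands in for a real public-key signature (the Python standard library has no asymmetric signing), and the distributed consensus step is left out entirely.

```python
import hashlib
import hmac
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def sign(agent_key, payload):
    # Assumption: HMAC stands in for a real digital signature scheme.
    return hmac.new(agent_key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def block_hash(index, prev_hash, payload, signature):
    # Hash a canonical JSON encoding of the block's contents.
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash,
         "payload": payload, "signature": signature},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only list of blocks, each committing to the one before it."""

    def __init__(self):
        self.blocks = []

    def append(self, payload, agent_key):
        index = len(self.blocks)
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        signature = sign(agent_key, payload)
        self.blocks.append({
            "index": index,
            "prev_hash": prev_hash,
            "payload": payload,
            "signature": signature,
            "hash": block_hash(index, prev_hash, payload, signature),
        })

    def verify(self):
        # Recompute every hash and check each link to the previous block.
        prev_hash = GENESIS_HASH
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            expected = block_hash(block["index"], block["prev_hash"],
                                  block["payload"], block["signature"])
            if block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


ledger = Ledger()
ledger.append("alice pays bob 5", b"alice-secret")
ledger.append("bob pays carol 2", b"bob-secret")
print(ledger.verify())  # True: the chain is intact

ledger.blocks[0]["payload"] = "alice pays bob 500"  # rewrite history
print(ledger.verify())  # False: the stored hash no longer matches
```

Because each block's hash covers the previous block's hash, changing any historical entry invalidates that entry and would force every later hash to be recomputed, which is what makes tampering detectable by anyone holding a copy.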

The blockchain is a specialized use case rather than a complete divergence from traditional database technology. In many cases, blockchain technology is not appropriate. Whether it is appropriate can usually best be answered by asking: What do I need that my traditional database is not giving me? That question breaks down into several smaller ones:

  1. Do I need a database in the first place?
  2. Does my application depend on extreme fault-tolerance?
  3. Does my application depend on shared writes from parties with potentially misaligned interests?
  4. Over what time horizon do I need writes and reads to be consistent?
  5. What groups of parties (agents) need to be responsible for consensus?
  6. Is a trusted third party needed to audit transactions?
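The questions above can be read as a rough decision procedure. The sketch below loosely encodes the guidance given in the sections that follow; the Requirements fields, the recommend function, the order of the checks and the returned labels are all assumptions made for illustration, and the audit question (6) is deliberately left out because it does not reduce to a yes/no rule.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # All field names are hypothetical; they paraphrase questions 1-5.
    needs_database: bool
    metadata_may_be_shared: bool
    writers_have_misaligned_interests: bool
    speed_over_consistency: bool
    validators_known_and_trusted: bool


def recommend(req):
    # Question 1: data still fits in a spreadsheet and manual processes.
    if not req.needs_database:
        return "no database needed yet"
    # Question 2: full replication exposes sender/receiver metadata.
    if not req.metadata_may_be_shared:
        return "centralized database"
    # Question 4: throughput matters more than replicated consistency.
    if req.speed_over_consistency:
        return "traditional database"
    # Question 3: without misaligned writers there is little to gain.
    if not req.writers_have_misaligned_interests:
        return "traditional database"
    # Question 5: who takes part in consensus.
    if req.validators_known_and_trusted:
        return "private blockchain"
    return "public blockchain"


print(recommend(Requirements(
    needs_database=True,
    metadata_may_be_shared=True,
    writers_have_misaligned_interests=True,
    speed_over_consistency=False,
    validators_known_and_trusted=True,
)))  # private blockchain
```

A real assessment would weigh these factors rather than short-circuit on the first one, so treat the function as a checklist in code form rather than a decision tool.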

Do I need a database in the first place?

First and foremost, we must ask the question: do I need a database in the first place? Typically, a traditional database is warranted when there is a large amount of data that does not fit in memory (or in an Excel file) and that data must be queried and manipulated by automated business processes. Often, the size and complexity of the data have reached a point where manual processes and human labor cannot keep it internally consistent (accurately up to date) across all entities who need to read from it.

A fairly large number of companies have not yet moved many core data processes into traditional databases at all. Moving to a blockchain storage and processing system is a much more drastic migration. Turning an Excel file into a traditional relational database is the best first step toward integrating modern database technology.

Blockchain is not a silver bullet that will immediately transform a company run on an overgrown Excel spreadsheet into one running its business on a scalable, globally consistent database.

Does my application depend on extreme fault-tolerance?

In particular, does the state of all transactions over the data need to be replicated across every client, queryable throughout history, and independently verifiable for its consistency? Certain blockchain implementations excel at this kind of extreme fault-tolerance, but they do so at a very high cost in data movement and write times.

The primary tradeoff of a massively replicated datastore is that each entity involved in the confirmation process must have a full copy of every transaction. For some applications (financial, healthcare, privacy-sensitive, etc.) this is a non-starter. While data can be encrypted on-chain, the metadata about the specific sender and receiver of information is still stored permanently and visible to all entities. If sharing this metadata is infeasible or illegal, blockchain may not be a suitable solution. Instead, a centralized store managed by a single legally accountable entity is the better choice.
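The metadata point is easy to underestimate, so here is a small sketch of what any holder of a full copy could compute without decrypting anything. The record layout, the party names and the communication_graph function are hypothetical, and the payload bytes are arbitrary placeholders standing in for ciphertext.

```python
from collections import Counter

# Hypothetical on-chain records: payloads are opaque, metadata is in the clear.
records = [
    {"sender": "clinic-a", "receiver": "insurer-x", "payload": bytes.fromhex("8f1c02aa")},
    {"sender": "clinic-a", "receiver": "insurer-x", "payload": bytes.fromhex("17d3e9b0")},
    {"sender": "clinic-b", "receiver": "insurer-x", "payload": bytes.fromhex("a04477c1")},
    {"sender": "clinic-a", "receiver": "lab-q", "payload": bytes.fromhex("5be2910d")},
]


def communication_graph(records):
    # Count how often each (sender, receiver) pair appears.
    # The encrypted payload is never read.
    return Counter((r["sender"], r["receiver"]) for r in records)


for (sender, receiver), count in communication_graph(records).most_common():
    print(sender, "->", receiver, count)
```

Even with every payload unreadable, the output reveals who transacts with whom and how often, which in regulated settings may itself be information that cannot be shared.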

Does my application depend on shared writes from parties with potentially misaligned interests?

The killer application of many blockchain datastores is that potentially misaligned entities can exchange information without any single party being able to disrupt or manipulate the exchange. Different systems achieve this through different means, and those systems have different tradeoffs. Yet systems like Bitcoin have demonstrated extreme resilience to attacks while routinely allowing large networks of self-interested parties to safely and securely exchange value and data.

Over what time horizon do I need writes and reads to be consistent?

Different points in the design space of distributed databases make different compromises between the throughput and the consistency of the read and write protocols that interface with the underlying storage engine. Some systems are eventually consistent, with strong probabilistic bounds on the time it takes the system to converge on consensus. Distributed databases of this kind favor data consistency over speed. If your requirements are about speed and throughput rather than data consistency and reliability, a traditional database is a better fit for you.
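One well-known instance of such a probabilistic bound comes from the Bitcoin whitepaper (section 11), which estimates the chance that an attacker controlling a fraction q of the hashing power ever overtakes the honest chain once a transaction is z blocks deep. The sketch below is a Python port of that calculation; the function name is my own, and the result applies to Bitcoin-style proof-of-work only, not to distributed databases in general.

```python
import math


def attacker_success_probability(q, z):
    """Chance that an attacker with hash-power share q catches up from z blocks behind."""
    if not 0.0 <= q < 0.5:
        raise ValueError("q must be in [0, 0.5): a majority attacker always succeeds")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q                 # honest share of hashing power
    lam = z * (q / p)           # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        # Poisson probability that the attacker has mined exactly k blocks so far.
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        # Subtract the cases where the attacker fails to close the remaining gap.
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


for z in (0, 1, 2, 4, 6):
    print(z, attacker_success_probability(0.1, z))
```

The probability falls off roughly exponentially in z, which is why waiting for more confirmations buys more certainty at the cost of latency. That is the throughput-versus-consistency trade described above, expressed as a number.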

What groups of agents need to be responsible for consensus, and are transactions public or private?

In other words, which parties need to be responsible for deciding the correctness of data reads and writes? And how private do the transactions need to be? If the answer is a small number of trusted private parties, then a private blockchain is recommended. If the answer is a large number of agents whom I do not need to know individually (public) but whose collective output I still need to trust, then a public blockchain is a better fit.

The reasoning behind this is related to the number of parties, the degree of privacy of the transactions, and how far those parties can be trusted. Private blockchains require an authority that decides who may read from and write to the single source of truth. Public blockchains tend to allow anyone who interacts with them to participate in consensus, so some participants may be compromised actors trying to pass incorrect information to the blockchain. Even so, parties that do not trust one another can still trust the shared record, because the consensus protocol filters out incorrect writes at scale.
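How consensus filters out incorrect writes can be shown with a toy vote. This is a deliberately simplified sketch, not any real protocol: the is_valid rule, the committee_accepts function and the "more than two thirds" quorum are assumptions chosen for illustration, and compromised validators are modeled in the worst case, always voting against the truth.

```python
def is_valid(balance, amount):
    # Hypothetical validity rule: a payment must be positive and covered by the balance.
    return 0 < amount <= balance


def committee_accepts(balance, amount, n_honest, n_compromised):
    truth = is_valid(balance, amount)
    # Honest validators vote according to the rule.
    votes = [truth] * n_honest
    # Worst case: every compromised validator votes the opposite way.
    votes += [not truth] * n_compromised
    # Accept only if strictly more than two thirds vote yes (integer arithmetic).
    return 3 * sum(votes) > 2 * len(votes)


# 8 honest, 2 compromised: the valid write passes, the overdraft is rejected.
print(committee_accepts(100, 50, n_honest=8, n_compromised=2))   # True
print(committee_accepts(100, 150, n_honest=8, n_compromised=2))  # False

# 2 honest, 8 compromised: the overdraft is wrongly accepted.
print(committee_accepts(100, 150, n_honest=2, n_compromised=8))  # True
```

The last call shows the limit of the argument: consensus only filters bad writes while honest participants hold the required supermajority, which is why the makeup of the validator set matters so much when choosing between private and public designs.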

Is a trusted third party needed to audit transactions?

There are a number of institutions and parties that exist in the world today to double-check, audit, and reconcile data. Regulatory bodies are one example. When this checking happens after the transaction has taken place, the case for using a blockchain is stronger. When the third parties are internal, the term disintermediation comes to mind: blockchain smart contracts could remove the need for those parties to exist at all. For now, though, that case is very much a “do not use blockchain”.