Online transaction processing (OLTP) is the real-time data processing behind ATM withdrawals, credit card payments, ticketing and reservation systems, online shopping, and electronic commerce in general. Online transaction processing systems are purpose-built to handle large numbers of transactions by large numbers of concurrent users.
OLTP databases provide the back end or storage layer for electronic commerce, and in fact for most modern computer applications. While OLTP databases have traditionally been SQL relational databases, it’s also possible to use some NoSQL databases for the same purposes. Most of our discussion below will be in terms of SQL relational databases.
OLTP vs. OLAP
OLTP databases typically handle a large number of small, quick transactions from many users. The transactions involve changing the database in ways guaranteed to be consistent, using CRUD (create, read, update, delete) operations inside the transaction. While OLTP databases sometimes also support analytical queries, that functionality is often performed in separate OLAP (online analytical processing) databases or data warehouses. OLTP databases are optimized for collecting and modifying data. OLAP databases are optimized for analysis.
What is CRUD?
CRUD (create, read, update, and delete) is the basic set of database operations. In a SQL database, INSERT statements perform record creation, SELECT statements read records, UPDATE statements update records, and DELETE statements delete records. These statements comprise DML (data manipulation language). SQL databases also support DDL (data definition language) to define databases, tables, indexes, views, and other database objects.
What is a database transaction?
A database transaction in a SQL database is a wrapper for a sequence of SQL statements with two possible endpoints: COMMIT or ROLLBACK the batch. For example, a bank transfer involves withdrawing an amount from one account and depositing the same amount to a different account. If both operations succeed, then the transaction commits. If either operation fails, then the transaction — which includes both operations — rolls back to the state before the transaction began, so that the total amount of money in the two accounts is constant.
What are ACID database properties?
Database transactions should exhibit all four ACID properties: atomicity, consistency, isolation, and durability. Atomicity is guaranteed by transaction commits and rollbacks, as described above. The whole transaction is treated as a single, atomic operation.
Consistency is the end product of proper transaction implementation: the total amount of money in the accounts involved in the transfer remains constant. Isolation means that other transactions cannot detect any intermediate states of a transaction. Durability means that once a transaction has committed, the new values are not undone, even if the system fails.
ACID properties are easier to guarantee in a centralized database. They are harder to guarantee in a clustered or distributed database.
For example, some distributed databases only claim eventual consistency, which allows them to say that a transaction has committed before all database nodes have finished writing. That speeds up distributed transactions, but requires later transactions that expect consistency either to wait for all the writes to complete, or to read from the original location of the transaction.
Distributed databases that guarantee strong consistency may have higher transaction latencies, but are much less likely to cause application bugs than eventually consistent databases, for example when a remote read completes before a previous transaction finishes writing in all locations.
What is transaction latency?
Latency refers both to the response time of the database and to the end-to-end response time of the application. Transaction latency is the time from the beginning of the transaction until the transaction commits.
Database schemas for OLTP
In order to support high transaction rates, database schemas for OLTP databases typically involve small row sizes and minimal indexes. Historically, that meant making sure the database schema was in third normal form.
What is third normal form?
Third normal form (3NF), defined in 1971 by Edgar F. Codd, is a set of requirements for database schemas to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It basically says that any given table only contains fields that are attributes of the primary key.
If you have a table of patients with a primary key that is the patient number, its fields must be about the patient, not the hospital, not the doctor, and not the insurer, although the table can contain references (foreign keys) to other tables about those things. Bill Kent’s clever summary of 3NF is “[every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key, so help me Codd.”
Can NoSQL databases function as OLTP?
While we have mostly discussed relational databases with strong consistency, there are some NoSQL databases that are designed for OLTP. If you are in the position of needing or wanting a NoSQL database for transaction processing, you need to restrict yourself to NoSQL databases with ACID properties. Avoid databases that are limited to eventual consistency for OLTP, especially for financial applications. Do check with your auditors before committing to a database for financial transaction processing.
Measuring OLTP performance
Early in the history of relational databases every vendor promoted a different transaction processing performance benchmark that had been tweaked for its own product. The Transaction Processing Performance Council was formed to create and audit vendor-neutral benchmarks. TPC Benchmark C (TPC-C) is a widely used OLTP benchmark. There are other public database benchmarks which may apply to your case; you can also create your own, but honest benchmarks that reflect real-world usage are surprisingly tricky to write and run.
Overall, OLTP databases should simply do their job, which is to record transactions quickly and durably. For analysis, consider setting up a separate data lake or data warehouse and an ETL or ELT process to populate the analysis database from the OLTP database. OLTP is one thing; OLAP is another.