write-amplification-btree-vs-lsm
You write 4 KiB to your database. Your SSD reports writing 20 KiB.
Where did the extra 16 KiB come from?
That is called write amplification, and it silently affects performance, SSD lifespan, and database efficiency.
One major factor behind this is the type of storage engine your database uses: B-Tree or LSM Tree.
What is Write Amplification?
Write amplification is the ratio between:
- Data actually written to the SSD
- Data originally written by the application/database
Example:
Application writes: 10 MB
SSD actually writes: 30 MB
Write Amplification Factor = 30 MB / 10 MB = 3
Formula:
                             Data Written to SSD
Write Amplification Factor = --------------------
                             Data Written by User
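As a quick check of the arithmetic, here is a minimal Python sketch of the formula. The function name write_amplification_factor is illustrative and not part of any library. The inputs are the 10 MB / 30 MB example and the 4 KiB / 20 KiB case from the introduction.

```python
def write_amplification_factor(bytes_written_to_ssd, bytes_written_by_user):
    """Ratio of physical bytes written to logical bytes requested."""
    if bytes_written_by_user <= 0:
        raise ValueError("bytes_written_by_user must be positive")
    return bytes_written_to_ssd / bytes_written_by_user


MB = 1_000_000
KIB = 1024

# Example from the text: the application writes 10 MB, the SSD writes 30 MB.
waf_example = write_amplification_factor(30 * MB, 10 * MB)

# Opening scenario: a 4 KiB write shows up as 20 KiB on the SSD.
waf_intro = write_amplification_factor(20 * KIB, 4 * KIB)

print(waf_example)  # 3.0
print(waf_intro)    # 5.0
```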
Lower write amplification is generally better because it means:
- Better SSD lifespan
- Less disk bandwidth usage
- Better write throughput
Why SSDs Write More Than You Asked
The SSD is not magically creating extra user data. The extra writes come from the internal work required to safely update flash memory.
Flash memory cannot be overwritten in place: a page has to be erased before it can be written again, and erasing works on whole blocks of many pages. So even if your application changes only a few bytes, the SSD often needs to:
- Read an entire flash page/block
- Modify the changed bytes in memory
- Write the updated page to a new location
- Mark the old page as invalid
Later, the SSD's garbage collector reclaims those invalid pages, which can mean copying the still-valid pages out of a block before erasing it.
So even a tiny write can become much larger internally.
Simple Example
Application writes:
+----------+
| 10 bytes |
+----------+
SSD internally:
+--------------------+
| Read 4 KiB page    |
+--------------------+
          ↓
+--------------------+
| Modify 10 bytes    |
+--------------------+
          ↓
+--------------------+
| Write NEW 4 KiB    |
+--------------------+
          ↓
Old page marked invalid
This is one of the major sources of write amplification in storage systems.
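The read, modify, write, invalidate sequence above can be sketched as a toy model. This is an illustration only: ToyFlash and its fields are hypothetical names, the 4 KiB page size is taken from the diagram, and real SSD firmware (mapping tables, erase blocks, garbage collection) is far more involved.

```python
PAGE_SIZE = 4096  # assumed flash page size in bytes (4 KiB, as in the diagram)


class ToyFlash:
    """Toy model: a page is never overwritten; each update lands on a new page."""

    def __init__(self):
        self.pages = {}      # physical page number -> page contents
        self.mapping = {}    # logical page number -> physical page number
        self.invalid = set() # physical pages waiting for garbage collection
        self.next_free = 0
        self.physical_bytes_written = 0

    def write(self, logical_page, offset, data):
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write must fit inside one page")
        old = self.mapping.get(logical_page)
        # 1. Read the whole existing page (or start from an empty one).
        buf = bytearray(self.pages[old]) if old is not None else bytearray(PAGE_SIZE)
        # 2. Modify only the changed bytes in memory.
        buf[offset:offset + len(data)] = data
        # 3. Write the full page to a new physical location.
        new = self.next_free
        self.next_free += 1
        self.pages[new] = bytes(buf)
        self.physical_bytes_written += PAGE_SIZE
        self.mapping[logical_page] = new
        # 4. Mark the old physical page as invalid.
        if old is not None:
            self.invalid.add(old)


flash = ToyFlash()
flash.write(logical_page=0, offset=0, data=b"x" * 10)    # first 10-byte write
flash.write(logical_page=0, offset=100, data=b"y" * 10)  # 10-byte update

logical_bytes = 20
print(flash.physical_bytes_written)                  # 8192
print(flash.physical_bytes_written / logical_bytes)  # 409.6
print(flash.invalid)                                 # {0}
```

In this model, two 10-byte writes cost two full 4 KiB page writes and leave one stale page behind for the garbage collector.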
Two Layers of Write Amplification
There are actually two layers of write amplification:
- SSD-level amplification
- Database-level amplification
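The two layers multiply: if the database writes 3x what the application asked for, and the SSD writes 2x what the database asked for, the end-to-end amplification is 6x.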
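A small sketch of how the two layers stack, assuming each layer's factor is measured against the layer directly above it. The byte counts are made-up illustrative numbers, not measurements.

```python
def layer_waf(bytes_out, bytes_in):
    """Amplification of one layer: bytes it emits per byte it receives."""
    return bytes_out / bytes_in


# Hypothetical byte counts for one workload (illustrative only).
app_bytes = 1_000    # what the application asked the database to write
host_bytes = 4_000   # what the database sent to the SSD
nand_bytes = 10_000  # what the SSD wrote to flash internally

database_waf = layer_waf(host_bytes, app_bytes)  # database-level amplification
ssd_waf = layer_waf(nand_bytes, host_bytes)      # SSD-level amplification
total_waf = database_waf * ssd_waf               # end-to-end amplification

print(database_waf, ssd_waf, total_waf)  # 4.0 2.5 10.0
```

The product equals the flash bytes divided by the application bytes, so a moderate factor at each layer can still add up to a large end-to-end number.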
How B-Trees Can Increase Write Amplification
Many traditional databases use B-Trees.
In a simplified view, B-Trees organize data into pages/nodes. When a record changes, the corresponding page must also change.
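Suppose a record in a leaf node gets updated. The database writes back the whole page, not just the changed bytes, and the SSD in turn cannot overwrite that page in place, so it rewrites it at a new location internally.
Sometimes the update also propagates upward, for example when a page splits, or in copy-on-write designs where rewriting a child changes the pointer stored in its parent: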
- Leaf node changes
- Parent node updates
- Possibly grandparent/root updates
            [ROOT]
               |
       +-------+-------+
       |               |
    [NODE]          [NODE]
       |
  [LEAF PAGE]
       ↑
small update here
Possible rewrites:
- leaf page
- parent node
- root node
This means a very small logical database write can turn into many physical SSD writes.
Since B-Trees often update pages in-place, they can generate a lot of random write activity.
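To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes an 8 KiB database page and a 128-byte record, both illustrative values, and the function name btree_update_cost is hypothetical. It counts only whole-page rewrites and ignores logging, caching, and SSD-level effects.

```python
PAGE_SIZE = 8192  # assumed database page size in bytes; real engines vary


def btree_update_cost(record_bytes, pages_rewritten, page_size=PAGE_SIZE):
    """Return (physical bytes written, amplification) for one logical update."""
    if record_bytes <= 0 or pages_rewritten <= 0:
        raise ValueError("record_bytes and pages_rewritten must be positive")
    physical_bytes = pages_rewritten * page_size
    return physical_bytes, physical_bytes / record_bytes


# Only the leaf page is rewritten.
leaf_only = btree_update_cost(record_bytes=128, pages_rewritten=1)

# The change propagates: leaf page, parent node, and root node are rewritten.
leaf_to_root = btree_update_cost(record_bytes=128, pages_rewritten=3)

print(leaf_only)     # (8192, 64.0)
print(leaf_to_root)  # (24576, 192.0)
```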
Why LSM Trees Exist
This is one reason many modern databases use LSM Trees (Log-Structured Merge Trees).
Instead of constantly modifying pages in-place, LSM Trees mostly write data sequentially by appending new records.
B-Tree:
update page → rewrite page → random writes
LSM Tree:
append new data → sequential writes
Sequential writes are usually much friendlier to SSDs and reduce random rewrite pressure.
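The difference in write patterns can be sketched with a simplified model. It assumes 64 small records fit in one page, that the in-place engine flushes a page on every update (a worst case with no caching or batching), and that the append-only engine buffers records and writes a page only when it is full. All names and numbers are illustrative, and the later rewriting of appended data during compaction is not modeled.

```python
import math

RECORDS_PER_PAGE = 64  # assumed: 64 small records fit in one page


def in_place_page_writes(keys):
    """One page write per update, at the page that owns the key."""
    return [key // RECORDS_PER_PAGE for key in keys]


def append_page_writes(keys):
    """Updates are buffered; full pages are written one after another."""
    pages_needed = math.ceil(len(keys) / RECORDS_PER_PAGE)
    return list(range(pages_needed))


def is_sequential(pages):
    """True if every write lands on the page right after the previous one."""
    return all(b - a == 1 for a, b in zip(pages, pages[1:]))


# 1000 updates to keys scattered across a 100,000-key space.
keys = [(i * 7919) % 100_000 for i in range(1000)]

in_place = in_place_page_writes(keys)
appended = append_page_writes(keys)

print(len(in_place), is_sequential(in_place))  # 1000 False
print(len(appended), is_sequential(appended))  # 16 True
```

Under these assumptions, the same 1000 updates cost 1000 scattered page writes in place, versus 16 consecutive page writes when appended.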
This is why LSM-based systems are often very good for:
- Heavy write workloads
- High ingestion systems
- Logging and analytics
- Large-scale distributed databases
The Tradeoff
Nothing comes for free.
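While LSM Trees improve write performance, they usually sacrifice some read efficiency. They are also not free of write amplification themselves: compaction, which merges SSTables in the background, rewrites data that was already written once.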
Reads may become slower because data can exist across:
- Memory tables
- Multiple SSTables
- Different compaction levels
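A toy sketch of why reads can touch several places. ToyLSM is a hypothetical, heavily simplified model: plain dicts stand in for the memtable and SSTables, there is no compaction, and the indexes and filters that real engines use to cut lookup cost are left out.

```python
class ToyLSM:
    """Toy LSM tree: one in-memory table plus immutable flushed tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # newest first; each table is never modified
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable as a sorted table and start a new one.
            self.sstables.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Return (value, number of tables probed)."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for table in self.sstables:  # newest to oldest, so newest value wins
            probes += 1
            if key in table:
                return table[key], probes
        return None, probes


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable full -> flushed as the oldest SSTable
db.put("c", 3)
db.put("d", 4)   # flushed as the newest SSTable
db.put("a", 10)  # newer value for "a" stays in the memtable

print(db.get("a"))        # (10, 1)
print(db.get("c"))        # (3, 2)
print(db.get("b"))        # (2, 3)
print(db.get("missing"))  # (None, 3)
```

The older a key's latest value is, the more tables a lookup has to check, and a missing key is the worst case because every table is probed.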
B-Trees, on the other hand, are often excellent for reads because a lookup follows a single path from the root down to one leaf page.
B-Tree:
✔ Better read throughput
✘ More random writes
LSM Tree:
✔ Better write throughput
✘ More complex/slower reads
Conclusion
Write amplification is not just an SSD problem. Database design can make it significantly better or significantly worse.
B-Trees and LSM Trees represent two very different approaches to handling storage:
- B-Trees optimize reads
- LSM Trees optimize writes
Sometimes the real bottleneck is not your database algorithm but the physics of the flash storage underneath it.