Recovery For Database
May 26, 2025
How to do recovery for database?
What do we mean by recovery for database? Txns have 2 properties: durability and atomicity.
First, how to do durability? Easy way is writing every dirty pages of this txn then commit, very straightforward ahuh?
Then, how to do atomicity? Easy way is do not write any dirty pages, if rollback, then just drop this page and fetch it again from disk.
These approaches have big performance issue, requires random I/O and big buffer pool. Worse is they are conflict, so it is impossible to have both two. For example, what if two txns modify the same page, one is committing and other one is rolling back?
The first one is force/non-force, the second one is steal/non-steal. Looks like force and non-steal is perfect, but not realistic.
What if force and steal? Problem is when corrupting, rolling back changes are already in disk. Good, then we need to rollback changes. How? keep a undo log, having everything that need to be undo (like what this page lookk like before). Question: do we need to flush ‘ROLLBACK’ log before tells the user it’s already rolled back?
Similar with non-force and non-steal? Now what if corrupt but some committed pages are still in memory? Redo it. We can keep a log of changes need to be done. If corrupted, processing all the redo log.
Here is comes non-force and steal. We keep both undo and redo log, as write ahead log, containing both , what the page was and what the page will be(Not exactly the page, but the thing that be changed in this page). So everything is good now, If database corrupts, we process WAL from the beginning, first redo then undo. (Why not first undo then redo?) Then the database will be in the same state as corrupted before.
Problem of WAL? too much processing. Database need to process from the beginning of the log, what if the database had been running for years? Most of the processing (particularly maybe the first 99% logs)is useless because they are already in disk for committed, and rollbacked ones are cancelled much time ago.
Will be helpful if we know where to start. If we give every log record a sequence number(log sequence number, LSN), it will be good if we know the exactly LSN where we are sure that before this one everything is good. Hard to know with WAL. Why? because txns are always changing pages, we do not know in which LSN all dirty pages are written.
What about creating one ourselves? In a specific time, stop accepting txns, flushing all dirty pages, rolling back all. Then write CHECKPOINT. Then we can truncate the log, next time start from here.
Works but big performance issue.
Why we do not know when is THE LSN to be the checkpoint? because there are activate txns making dirty pages and trying to flush them all the time. It would be solved if we can flush buffer pool. In this LSN, start to flush buffer pool, set checkpoint start. when finish the flushing, set chectpoint finish. If the database corrupts, we can load the flushed buffer pool to memory. Because buffer pool and disk represent all the state of the database, then it’s perfect. (Question: actually we do not need the checkpoint finish do we).
Unfortunataly, it is impractical, because buffer pool is so large. Some database has 100+ GB, requires tens of seconds to flush even with SSD, this will impact the performance of ongoing txns.
But do we really need to flush all the pages in buffer? Clean pages, no need cuz they are the same as in disk. We only care about dirty ones. Ok to flush all dirty ones? better than the whole buffer pool, but actually no need to do that. Because we already have the LOGS! we can only maintain which pages are dirty. But how do we rebuild the dirty pages? from logs. If we know which page is dirty and the first LSN to modidy this page, we can rebuild this page right? Fetch it from disk, process the log from that LSN.
Here comes dirty page table. With pageID (as it is) and recLSN (recovery LSN, which is the first LSN that changes this log). This helps to rebuild all the dirty pages, any with later commits log, we can make sure committed txns have their dirty pages in the disk.
But what about aborted txns? Can we undo them? While we can go through all the logs, find aborted one. But what about the canceled one(aka half executed one).