Tabulation and Memoization
Tabulation and memoization are two tactics that can be used to implement DP algorithms.
Tabulation is a bottom-up approach. It starts by solving the lowest-level subproblem. That solution then lets us solve the next subproblem, and so on, until every subproblem is solved, which yields the solution to the original problem. We save time whenever a subproblem needs the answer to one that has already been solved, and thus has had its value tabulated.
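As a minimal sketch (a standard Fibonacci example, not drawn from the discussion above), bottom-up tabulation in Python might look like:

```python
def fib_tab(n):
    """Bottom-up (tabulated) Fibonacci: solve subproblems in order."""
    if n < 2:
        return n
    table = [0] * (n + 1)  # table[i] will hold the i-th Fibonacci number
    table[1] = 1
    for i in range(2, n + 1):
        # Each entry depends only on entries that are already tabulated.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_tab(10))  # → 55
```

Note that the loop visits subproblems in dependency order: `table[i]` is never read before it is written.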
Memoization, on the other hand, is a top-down approach. It starts with the highest-level subproblems (the ones closest to the original problem), and recursively calls the next subproblem, and the next. We save time when a subproblem A recurses into a subproblem B that has already been called. Since B and all subproblems below it are memoized, we can avoid repeating the entire recursion tree generated by B, saving a lot of computation.
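The same Fibonacci computation, sketched top-down with a memoization cache (again an illustrative example, not the only way to write it):

```python
def fib_memo(n, cache=None):
    """Top-down (memoized) Fibonacci: recurse, caching each answer."""
    if cache is None:
        cache = {}
    if n in cache:
        # This subproblem was already solved; skip its whole recursion tree.
        return cache[n]
    result = n if n < 2 else fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    cache[n] = result
    return result

print(fib_memo(10))  # → 55
```

Here the cache lookup is exactly the time savings described above: a repeated call returns immediately instead of re-expanding its subtree.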
What's the Difference?
Both tabulation and memoization store the answers to subproblems as they are solved. They both operate on the same tradeoff: sacrifice space for time savings by caching answers to solved problems. However, they differ subtly in the way that they use these stored values. Since tabulation is bottom up, every subproblem must be answered to generate the final answer, so the order in which we choose to solve subproblems is important. We also end up solving every single subproblem, even ones that aren't needed for the final solution. In memoization, finding the optimal solution doesn’t necessarily require us to recurse fully through all subproblems, since there is no strict order in which subproblems need to be solved. If we can be certain that a subproblem will not contribute to the final solution, we don't need to look at it, nor its subsequent recursion tree.
Another way to understand DP is to consider it as building up a directed acyclic graph (DAG) of subproblems, looking up the stored values of nodes as needed. If we topologically sort this DAG, oriented from left to right, tabulation builds up the DAG from the leftmost node in the sorting, steadily answering subproblems and adding nodes to the right. Since it starts at the left, tabulation has to build up the entire DAG, being careful to choose the correct next node in the dependency order. Memoization, on the other hand, builds up the DAG recursively, starting from the right side. If we find that a node cannot contribute to the final solution, we do not have to continue examining its neighbors to the left.
To summarize, the major differences between tabulation and memoization are:
- tabulation has to look through the entire search space; memoization does not
- tabulation requires careful ordering of the subproblems; memoization doesn’t care much about the order of recursive calls
It is often easier to implement DP solutions with memoization. If we define the subproblems recursively by creating a recurrence relation, then it is straightforward to translate the relation into code.
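For instance, a recurrence for counting monotone lattice paths (a hypothetical example chosen for illustration), paths(r, c) = paths(r − 1, c) + paths(r, c − 1), translates almost line-for-line into memoized Python, here using the standard-library `functools.lru_cache` as the memo table:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def paths(r, c):
    """Number of right/down paths through an r-by-c grid of steps."""
    if r == 0 or c == 0:
        return 1                       # base case of the recurrence
    return paths(r - 1, c) + paths(r, c - 1)  # the recurrence itself

print(paths(2, 2))  # → 6
```

The decorator handles the caching, so the code is literally the recurrence relation plus its base case.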
Of course, problems can be formulated with tabulation as well. However, this requires more thought and a conscious ordering of the subproblems, so it demands subtler design.
In terms of storage, tabulation can have a smaller footprint than memoization. If we order our subproblems carefully, certain subproblems are reused early on and then never again, so we only need to cache each answer for a limited amount of time. For example, when generating the Fibonacci sequence from the bottom up, we first add 1 and 1 together to get 2. Then, we add 1 and 2 together to get 3. And so on. Any numbers calculated before that point can be discarded, since we no longer need them to continue generating the sequence. Thus, we create a “sliding window” that lets us cache just two numbers at any given time. This is an extreme example, but it illustrates the importance of ordering your subproblems wisely to minimize space usage in tabulation.
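The sliding-window idea can be sketched like this (illustrative Python, keeping only the last two Fibonacci numbers instead of a full table):

```python
def fib_window(n):
    """Tabulated Fibonacci storing only a two-number sliding window."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr  # slide the window one step right
    return prev

print(fib_window(10))  # → 55
```

Space usage drops from O(n) for the full table to O(1), with no change to the answers.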
Memoization, on the other hand, does not give us the ability to minimize our storage. Since the calls to subproblems are recursive, we cannot make any smart or definite decisions on how to order the subproblems. Instead, we have to cache all answers to subproblems, in case another recursive call will need them in the future.
Memoization on very complex problems can be problematic because of the overhead that comes with recursion: each recursive call adds a frame to the function call stack, and if the recursion is deep enough, it can overflow the stack.
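A rough illustration of this failure mode (assuming CPython, whose recursion limit defaults to about 1000 frames): even a memoized function overflows if a single chain of first-time calls is deeper than the limit.

```python
import sys

def fib_deep(n, cache=None):
    """Memoized Fibonacci; recursion depth still grows linearly in n."""
    if cache is None:
        cache = {}
    if n in cache:
        return cache[n]
    result = n if n < 2 else fib_deep(n - 1, cache) + fib_deep(n - 2, cache)
    cache[n] = result
    return result

print(sys.getrecursionlimit())  # CPython's default is 1000

try:
    fib_deep(100000)            # needs ~100000 nested frames on first call
except RecursionError:
    print("overflowed the call stack")
```

An equivalent bottom-up loop would compute the same value with constant stack depth.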
A way to speed up memoization is to parallelize the recursive calls, while maintaining a global storage for memoized answers to subproblems. Since the ordering of subproblems in memoization does not matter (unlike tabulation), this is viable.
Tabulation is often faster than memoization, because it is iterative and solving subproblems incurs no function-call overhead. However, it has to go through the entire search space, which means there is no easy way to optimize the runtime.
In theory, every tabulation problem can be solved with memoization, and vice-versa.
Tabulation is a good method if you know that you need to calculate the answers to all subproblems. It is also useful when there is a logical ordering of the subproblems. For example, in the CheckersDP example, we used memoization to solve the problem. However, since we know that we will eventually explore all squares on the checkerboard, tabulation would have been a viable and potentially more efficient way to solve this problem.
Memoization is a good approach if there is no easily apparent ordering of subproblems, and if the entire search space does not need to be explored.
Here is a real-world application of DP to auto-correcting text.