Learning algorithmic techniques: dynamic programming

November 10, 2019

Tags: programming, Julia, optimization

Three nice techniques

In the past months, I’ve found myself really appreciating some nice techniques for constructing algorithms. These are probably quite familiar to those with a computer science background, but are new to me. There are three in particular that I have in mind; I’ll just highlight the first two here, and then discuss the third in detail.

The first such technique I learned about was solving the traveling salesman problem via mixed-integer programming, using lazy constraints. In essence, you encode a solution to the traveling salesman problem (a tour) as a permutation (of the cities to visit), which you represent with a binary matrix. Then you minimize the cost of the tour over all permutation matrices. The difficulty is that this can yield “tours” that don’t aren’t full cycles: a permutation of \((1,2,3,4)\) could be \(1\) goes to \(2\), \(2\) goes to \(1\), \(3\) goes to \(4\), and \(4\) goes to \(3\). That’s actually just two transpositions instead of a single cycle of all the cities. So instead, one needs to rule out all the permutations that aren’t cycles. However, there doesn’t seem to be a good way to do that; one could impose exponentially many linear constraints, one for each non-full-cycle, for example, but that would be very inefficient (if you can even fit them all into memory). But what you can do is solve the problem without such constraints, and then just add constraints corresponding to any non-full-cycles that you obtain in the solution. Then you re-solve the problem. That way, you only need to impose the constraints that actually matter (which usually is far fewer than the total number of constraints). I thought this was really cool! I learned about it from the blog post MIP - Travelling Salesman by Ole Kröger. To explore it for myself, I wrote a Julia package TravelingSalesmanExact.jl to use this technique for solving the problem (using JuMP.jl). In fact, shortly after coming across this approach to the traveling salesman problem, I realized that I could use the same technique to find good upper bounds on an optimization problem that came up in my research (that project is still in progress, so I won’t go into details here).

The second algorithmic technique is that of using outer approximations iteratively to solve mixed-integer non-linear programming problems. Essentially, one alternates between mixed-integer linear solves, and continuous non-linear (though hopefully convex) solves. Each solution provides some information for the next problem; the mixed-integer solve gives a new set of integer values to use in the continuous problem, and the solution to the continuous problem generates another linear constraint to use in the mixed-integer problem (kind of like the lazy constraints in the TSP algorithm above). I learned about this from the paper Outer Approximation With Conic Certificates For Mixed-Integer Convex Problems which describes using the technique to solve mixed-integer SDPs, and corresponds to the Julia package Pajarito.jl. It’s also described well in the review paper Mixed-integer nonlinear optimization.

The third technique is that of dynamic programming. There are lots of explanations of dynamic programming out there, but the one that helped me understand was the following. Consider the task of computing the \(n\)th Fibonacci number, defined by \(f(n) = f(n-1) + f(n-2)\) for \(n >2\), and \(f(1) = f(2) = 1\).

With a naive recursive implementation, we have the following tree of function calls:

f(6)f(5)f(4)f(4)f(3)f(3)f(2) = 1f(3)f(2) = 1f(2) = 1f(1) = 1f(2) = 1f(1) = 1f(2) = 1f(1) = 1

We can see the exponential complexity of this algorithm: to compute \(f(n+1)\) instead of \(f(n)\), we add a whole subtree of calculations. However, instead we can memoize the results, that is, cache them so we don’t need to recompute them. For example, after we compute \(f(4)\) the first time, we don’t need to compute it again. This results in the following call tree:

f(6)f(5)f(4)f(4)f(3)f(3)f(2) = 1f(2) = 1f(1) = 1

We can see that the algorithm has been linearized. To compute \(f(n+1)\), we don’t need to compute an extra whole subtree, and instead just do one more step. This kind of self-similarity of the problem allows us to reuse the work.

Using dynamic programming to solve a covering problem

In the following, I’ll use Julia’s notation for ranges, by which I mean an interval of integers. If a and b are integers, then a:b means the set \(\{a, a+1, a+2, \ldots, b\}\).

Our task is to partially covering a range of integers from a collection of subranges. More specifically, given a range r and a collection of ranges C, we aim to find the smallest subcollection of C that covers at least K elements of r.

This problem was posed by Jakob Nissen in the random channel of the Julia slack, and in CoverRanges.jl I’ve implemented a dynamic programming algorithm made by Mathieu Tanneau, which is partly based on the solution to the case when K = length(r) given here: https://stackoverflow.com/a/294419.

In the following, I’ll show a couple examples of the problem and the solution using CoverRanges.jl, and then go through the actual code of the algorithm.

A few examples

Let’s consider julia r = 1:6, C = [1:1, 2:5, 6:6, 1:3, 4:6] and K = 6. This is just the case of needing an 100% cover of the range. A naive greedy algorithm might take the longest range 2:5. But then to cover 1 and 6, two more ranges would be needed. Instead, the optimum is to take the two ranges of length 3, namely 1:3 and 4:6.

julia> using CoverRanges

julia> r = 1:6

julia> C = [1:1, 2:5, 6:6, 1:3, 4:6];

julia> C[solve_min_covering(r, 6, C)]
2-element Array{UnitRange{Int64},1}:

We can see that’s exactly what we get. Let’s make it more interesting. We’ll tile two copies of C next to each other:

julia> append!(C, [r .+ 6 for r in C])
10-element Array{UnitRange{Int64},1}:

and let’s take r=1:12. Now, if K=8, one can just take the two largest ranges:

julia> r = 1:12

julia> C[solve_min_covering(r, 8, C)]
2-element Array{UnitRange{Int64},1}:

But for K = 10, we’ll take the two length-3 ranges to cover the first six, then the length 4 range to complete the partial cover:

julia> C[solve_min_covering(r, 10, C)]
3-element Array{UnitRange{Int64},1}:

whereas for K=11, the best we can do is cover all 12 by four sets of length-3 ranges:

julia> C[solve_min_covering(r, 11, C)]
4-element Array{UnitRange{Int64},1}:

CoverRanges.jl also provides an AccessDict object, which is a dictionary that keeps track of when its elements are retrieved or set. Note that only enough methods are implemented for it to be used as a memoization cache in the package; for other uses, more methods would likely be needed (e.g. getindex and setindex!). This can help provide more detailed information about how the algorithm works.

julia> using CoverRanges: LeftToCover, PartialCover

julia> cache = AccessDict{LeftToCover{Int}, PartialCover{Int}}()
AccessDict{LeftToCover{Int64},PartialCover{Int64},Int64} with 0 entries

julia> solve_min_covering(r, 11, C; cache = cache)
4-element Array{Int64,1}:

julia> total_gets(cache)

julia> total_sets(cache)

The algorithm in code

The rest of this section was actually generated from the source code itself, using Literate.jl to transform it into the version below.

The idea of the algorithm is that we can simply start from the left-most element of the range r that we wish to cover, and decide whether or not to include it in our partial cover. If we do want to include it, the optimal range to use to cover it is the one which extends the furthest to the right (out of all the ranges which include the element). Then it remains to solve a smaller instance of the problem. Likewise, if we don’t include the element, we again are left with a smaller instance of the problem. In other words, we aim to solve the dynamic programming equation

min_number_to_cover(a:b, K) = min(1 + min_number_to_cover(f:b, K - (f-a+1)), min_number_to_cover((a+1):b, K))

where min_number_to_cover(r, K) is the minimum number of ranges needed to cover at least K elements of r and f is the endpoint furthest to the right of any range in C that includes a.

Why is this the right recursive equation for this problem? Let us say we reach node a. We can either

  • include node a in our set of nodes covered, in which case we choose the range that includes a and stretches as far as possible to the right; in that case, we have to solve the problem on the remaining set of nodes (this corresponds to min_number_to_cover(f:b, K - (f-a+1))).
  • Otherwise, we skip node a and need to solve the problem with the remaining nodes a+1:b, still needing to cover K nodes (this corresponds to min_number_to_cover( (a+1):b, K))).

Let’s begin the implementation. First, we need an object to represent an impossible task, which is never chosen in a minimum and which is invariant under adding +1. Julia’s Inf does this for us, but Inf is a floating point number, and we’d prefer to stick to integers (with the exception of this infinity).

struct ComparisonInf end
const CompInf = ComparisonInf()
Base.isless(::ComparisonInf, ::ComparisonInf) = false

for T in (Int8, Int16, Int32, Int64, Int128)
    @eval Base.isless(::ComparisonInf, ::$T) = false
    @eval Base.isless(::$T, ::ComparisonInf) = true
    @eval Base.:+(::$T, c::ComparisonInf) = c
    @eval Base.:+(c::ComparisonInf, ::$T) = c

In the above list, we could add other types if needed, such as SaferIntegers.jl numeric types to ensure our algorithm does not overflow.

Next, we define an struct to hold an instance or subinstance of our problem.

struct LeftToCover{T}
    "Range to partially cover"
    "Number of nodes we are required to cover"

And likewise, one to hold a solution (or part of one).

struct PartialCover{T}
    "Minimum number needed to partially cover the range"
    m::Union{T, ComparisonInf}
    "Index of latest range added"
    "Key to point towards next range in the partial cover"

Next, to find the range with the furthest right endpoint including a given node, we just write a simple loop.

    find_furthest_range(a, C) -> index, right_endpoint

Given a node `a`, finds the range `r` in `C` such that `r` contains `a` and has
the largest right endpoint.
function find_furthest_range(a, C)
    # This could be optimized more by sorting `C` first and terminating early
    best_f_so_far = typemin(a) # right endpoint
    best_i_so_far = nothing # index
    for (i, r) in enumerate(C)
        a ∈ r || continue
        if last(r) > best_f_so_far
            best_f_so_far = last(r)
            best_i_so_far = i
    return best_i_so_far, best_f_so_far

Now comes the heart of the algorithm. We compute

min_number_to_cover(a:b, K) = min(1 + min_number_to_cover(f:b, K - (f-a+1)), min_number_to_cover((a+1):b, K))

recursively with the help of a memoization cache, which maps from (sub)instances of the problem to solutions.

function min_covering(key::LeftToCover{T}, furthest_ranges, cache::AbstractDict{LeftToCover{T}, PartialCover{T}}) where {T}
    get!(cache, key) do
        r = key.r;  K = key.K
        K <= 0 && return PartialCover{T}(zero(T), 0, key) # Nothing left to cover
        K > length(r) && return PartialCover{T}(CompInf, 0, key) # Impossible task
        a, b = first(r), last(r)
        i_a, f_a = furthest_ranges[a] # index and right endpoint of optimal range including `a`

        # Solve the subproblem in which we skip `a`
        skip_cover = min_covering(LeftToCover{T}(a+1:b, K), furthest_ranges, cache)
        i_a === nothing && return skip_cover # cannot cover `a` means we must skip it

        # Solve the subproblem in which we include `a` in our partial cover
        add_key = LeftToCover{T}( (f_a+1) : b, K - (f_a - a + 1))
        add_cover = min_covering(add_key, furthest_ranges, cache)

        # Decide whether or not to include `a` and return the better result
        if add_cover.m + 1 <= skip_cover.m
            return PartialCover{T}(add_cover.m + T(1), i_a, add_key)
            return skip_cover

Note that in the case that we choose to skip a, we return skip_cover, which is the solution for the subproblem that skips a. But when we choose to include a in the partial cover, we make a new PartialCover object instead of returning add_cover. That’s because if we include a by using the range that ends at f_a, we really are adding a range to the partial cover represented by add_cover, so we make a new PartialCover object, one whose value of m is one larger (corresponding to the extra range we added), which holds the index i_a of the range that we added, and which holds add_key, which we can use to recover add_cover from cache later. In all, this lets us hold the sequence of ranges used in the final optimal cover efficiently (as a linked list).

Lastly, we wrap min_covering in a helper function that constructs furthest_ranges, the memoization cache, and unwinds the linked list that holds the solution in order to recover the indices of the ranges included in the minimal partial cover we have found.

    solve_min_covering(r::UnitRange{T}, K, C; [cache]) -> Union{Nothing, Vector{Int}}

Returns either `nothing` or a vector of indices of `C` corresponding to a
minimal set of ranges which covers at least `K` elements of `r`. Returns
`nothing` if and only if the task is impossible.

* `r`: range to partially cover
* `K`: number of elements of `r` that must be
* `C`: vector of ranges to use in constructing the partial cover
* `cache`: optionally provide an `AbstractDict{LeftToCover{T}, PartialCover{T}}`
which is used as a memoization cache

!!! warning
    Overflow is not checked.
function solve_min_covering(r::UnitRange{T}, K, C; cache =  Dict{LeftToCover{T},
PartialCover{T}}()) where {T}
    # `furthest_ranges` is all we need from `C`
    furthest_ranges = Dict(a => find_furthest_range(a, C) for a in r)

    # Recursive function to compute the covering
    result = min_covering(LeftToCover{T}(r, K), furthest_ranges, cache)

    k = result.m # minimum number of ranges needed
    k == CompInf && return nothing

    # We store the list of ranges essentially as a linked list.
    # To recover the optimal list of ranges, we traverse through
    # the cache, at each step getting the `key` for the next element in the cache.
    range_inds = Int[]
    sizehint!(range_inds, k)
    while k > 1
        push!(range_inds, result.i)
        k = result.m
        result = cache[result.key]
    return range_inds