Three nice techniques
In the past months, I’ve found myself really appreciating some nice techniques for constructing algorithms. These are probably quite familiar to those with a computer science background, but are new to me. There are three in particular that I have in mind; I’ll just highlight the first two here, and then discuss the third in detail.
The first such technique I learned about was solving the traveling salesman problem via mixed-integer programming, using lazy constraints. In essence, you encode a solution to the traveling salesman problem (a tour) as a permutation (of the cities to visit), which you represent with a binary matrix. Then you minimize the cost of the tour over all permutation matrices. The difficulty is that this can yield “tours” that aren’t full cycles: a permutation of \((1,2,3,4)\) could be \(1\) goes to \(2\), \(2\) goes to \(1\), \(3\) goes to \(4\), and \(4\) goes to \(3\). That’s actually just two transpositions instead of a single cycle through all the cities. So instead, one needs to rule out all the permutations that aren’t full cycles. However, there doesn’t seem to be a good way to do that directly; one could impose exponentially many linear constraints, one for each non-full-cycle, for example, but that would be very inefficient (if you could even fit them all into memory). But what you can do is solve the problem without such constraints, and then just add constraints corresponding to any non-full-cycles that you obtain in the solution. Then you re-solve the problem. That way, you only need to impose the constraints that actually matter (which is usually far fewer than the total number of constraints). I thought this was really cool! I learned about it from the blog post MIP - Travelling Salesman by Ole Kröger. To explore it for myself, I wrote a Julia package, TravelingSalesmanExact.jl, which uses this technique to solve the problem (using JuMP.jl). In fact, shortly after coming across this approach to the traveling salesman problem, I realized that I could use the same technique to find good upper bounds on an optimization problem that came up in my research (that project is still in progress, so I won’t go into details here).
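To make the idea concrete, here is a rough sketch of the lazy-constraint loop in JuMP. This is only a sketch, not the TravelingSalesmanExact.jl implementation; it assumes the GLPK solver is available, and the distance matrix `d` and the helper `subtours` below are made up for illustration.

using JuMP, GLPK

d = [0 2 9 10; 1 0 6 4; 15 7 0 8; 6 3 12 0]  # made-up distance matrix
n = size(d, 1)

model = Model(GLPK.Optimizer)
@variable(model, x[1:n, 1:n], Bin)  # x[i, j] == 1 if the tour goes from city i to city j
@constraint(model, [i in 1:n], sum(x[i, j] for j in 1:n if j != i) == 1)  # leave each city once
@constraint(model, [j in 1:n], sum(x[i, j] for i in 1:n if i != j) == 1)  # enter each city once
@constraint(model, [i in 1:n], x[i, i] == 0)
@objective(model, Min, sum(d[i, j] * x[i, j] for i in 1:n, j in 1:n))

# Decompose the permutation encoded by `x_val` into its cycles ("subtours")
function subtours(x_val)
    n = size(x_val, 1)
    unvisited = Set(1:n)
    tours = Vector{Int}[]
    while !isempty(unvisited)
        i = first(unvisited)
        tour = Int[]
        while i in unvisited
            push!(tour, i)
            delete!(unvisited, i)
            i = findfirst(j -> x_val[i, j] > 0.5, 1:n)  # follow the successor of city i
        end
        push!(tours, tour)
    end
    return tours
end

while true
    optimize!(model)
    tours = subtours(value.(x))
    length(tours) == 1 && break  # a single cycle through all the cities: we're done
    # Lazily rule out each short cycle we found by adding one linear constraint per subtour
    for S in tours
        @constraint(model, sum(x[i, j] for i in S, j in S) <= length(S) - 1)
    end
end
# After the loop, value.(x) encodes an optimal tour.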
The second algorithmic technique is that of using outer approximations iteratively to solve mixed-integer non-linear programming problems. Essentially, one alternates between mixed-integer linear solves and continuous non-linear (though hopefully convex) solves. Each solution provides some information for the next problem: the mixed-integer solve gives a new set of integer values to use in the continuous problem, and the solution to the continuous problem generates another linear constraint to use in the mixed-integer problem (kind of like the lazy constraints in the TSP algorithm above). I learned about this from the paper Outer Approximation With Conic Certificates For Mixed-Integer Convex Problems, which describes using the technique to solve mixed-integer SDPs and corresponds to the Julia package Pajarito.jl. It’s also described well in the review paper Mixed-integer nonlinear optimization.
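To give a flavor of how linear cuts can stand in for a non-linear objective, here is a toy, one-dimensional sketch: minimizing a convex function over the integers by alternating mixed-integer linear solves with tangent cuts. This is not how Pajarito works (there is no genuine continuous subproblem here); the function `f`, the bounds, and the name `outer_approximation_toy` are made up, and GLPK is assumed to be available.

using JuMP, GLPK

# Minimize f(x) = (x - 2.7)^2 over integers 0 <= x <= 5 using tangent cuts
function outer_approximation_toy()
    f(x) = (x - 2.7)^2
    df(x) = 2 * (x - 2.7)
    model = Model(GLPK.Optimizer)
    @variable(model, 0 <= x <= 5, Int)
    @variable(model, t >= 0)  # epigraph variable standing in for f(x)
    @objective(model, Min, t)
    x_sol = 0.0
    for _ in 1:20
        # "Continuous" step: evaluate f at the current point and add a linear under-estimator of f
        @constraint(model, t >= f(x_sol) + df(x_sol) * (x - x_sol))
        # Mixed-integer step: minimize the current piecewise-linear outer approximation
        optimize!(model)
        x_sol = value(x)
        # Stop once the outer approximation is tight at the solution
        f(x_sol) - value(t) <= 1e-6 && break
    end
    return x_sol  # approximately 3.0, the best integer point
end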
The third technique is that of dynamic programming. There are lots of explanations of dynamic programming out there, but the one that helped me understand was the following. Consider the task of computing the \(n\)th Fibonacci number, defined by \(f(n) = f(n-1) + f(n-2)\) for \(n > 2\), and \(f(1) = f(2) = 1\).
With a naive recursive implementation, we have the following tree of function calls:
We can see the exponential complexity of this algorithm: to compute \(f(n+1)\) instead of \(f(n)\), we add a whole subtree of calculations. However, instead we can memoize the results, that is, cache them so we don’t need to recompute them. For example, after we compute \(f(4)\) the first time, we don’t need to compute it again. This results in the following call tree:
We can see that the algorithm has been linearized. To compute \(f(n+1)\), we don’t need to compute an extra whole subtree, and instead just do one more step. This kind of self-similarity of the problem allows us to reuse the work.
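For concreteness, the two versions might look something like the following in Julia (the names `fib` and `fib_memo` are just for illustration):

# Naive recursion: the call tree grows exponentially with n
fib(n) = n <= 2 ? 1 : fib(n - 1) + fib(n - 2)

# Memoized version: each value is computed only once and then reused from the cache
const fib_cache = Dict(1 => 1, 2 => 1)
fib_memo(n) = get!(fib_cache, n) do
    fib_memo(n - 1) + fib_memo(n - 2)
end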
Using dynamic programming to solve a covering problem
In the following, I’ll use Julia’s notation for ranges, by which I mean an interval of integers. If `a` and `b` are integers, then `a:b` means the set \(\{a, a+1, a+2, \ldots, b\}\).
Our task is to partially cover a range of integers using a collection of subranges. More specifically, given a range `r` and a collection of ranges `C`, we aim to find the smallest subcollection of `C` that covers at least `K` elements of `r`.
This problem was posed by Jakob Nissen in the random channel of the Julia slack, and in CoverRanges.jl I’ve implemented a dynamic programming algorithm due to Mathieu Tanneau, which is partly based on the solution to the case `K = length(r)` given here: https://stackoverflow.com/a/294419. In the following, I’ll show a couple of examples of the problem and the solution using CoverRanges.jl, and then go through the actual code of the algorithm.
A few examples
Let’s consider `r = 1:6`, `C = [1:1, 2:5, 6:6, 1:3, 4:6]`, and `K = 6`. This is just the case of needing a 100% cover of the range. A naive greedy algorithm might take the longest range, `2:5`. But then to cover `1` and `6`, two more ranges would be needed. Instead, the optimum is to take the two ranges of length 3, namely `1:3` and `4:6`.
julia> using CoverRanges
julia> r = 1:6
1:6
julia> C = [1:1, 2:5, 6:6, 1:3, 4:6];
julia> C[solve_min_covering(r, 6, C)]
2-element Array{UnitRange{Int64},1}:
1:3
4:6
We can see that’s exactly what we get. Let’s make it more interesting. We’ll tile two copies of `C` next to each other:
julia> append!(C, [r .+ 6 for r in C])
10-element Array{UnitRange{Int64},1}:
1:1
2:5
6:6
1:3
4:6
7:7
8:11
12:12
7:9
10:12
and let’s take `r = 1:12`. Now, if `K = 8`, one can just take the two largest ranges:
julia> r = 1:12
1:12
julia> C[solve_min_covering(r, 8, C)]
2-element Array{UnitRange{Int64},1}:
2:5
8:11
But for `K = 10`, we’ll take the two length-3 ranges to cover the first six, then the length-4 range to complete the partial cover:
julia> C[solve_min_covering(r, 10, C)]
3-element Array{UnitRange{Int64},1}:
1:3
4:6
8:11
whereas for `K = 11`, the best we can do is cover all 12 with four length-3 ranges:
julia> C[solve_min_covering(r, 11, C)]
4-element Array{UnitRange{Int64},1}:
1:3
4:6
7:9
10:12
CoverRanges.jl also provides an `AccessDict` object, which is a dictionary that keeps track of when its elements are retrieved or set; this can help provide more detailed information about how the algorithm works. Note that only enough methods are implemented for it to be used as a memoization cache in the package; for other uses, more methods would likely be needed (e.g. `getindex` and `setindex!`).
julia> using CoverRanges: LeftToCover, PartialCover
julia> cache = AccessDict{LeftToCover{Int}, PartialCover{Int}}()
AccessDict{LeftToCover{Int64},PartialCover{Int64},Int64} with 0 entries
julia> solve_min_covering(r, 11, C; cache = cache)
4-element Array{Int64,1}:
4
5
9
10
julia> total_gets(cache)
7
julia> total_sets(cache)
22
The algorithm in code
The rest of this section was actually generated from the source code itself, using Literate.jl to transform it into the version below.
The idea of the algorithm is that we can simply start from the left-most element of the range `r` that we wish to cover, and decide whether or not to include it in our partial cover. If we do want to include it, the optimal range to use to cover it is the one which extends the furthest to the right (out of all the ranges which include the element). Then it remains to solve a smaller instance of the problem. Likewise, if we don’t include the element, we are again left with a smaller instance of the problem. In other words, we aim to solve the dynamic programming equation
min_number_to_cover(a:b, K) = min(1 + min_number_to_cover((f+1):b, K - (f-a+1)), min_number_to_cover((a+1):b, K))
where `min_number_to_cover(r, K)` is the minimum number of ranges needed to cover at least `K` elements of `r`, and `f` is the right endpoint furthest to the right of any range in `C` that includes `a`.
Why is this the right recursive equation for this problem? Let us say we reach node `a`. We can either
- include node `a` in our set of nodes covered, in which case we choose the range that includes `a` and stretches as far as possible to the right; in that case, we have to solve the problem on the remaining set of nodes (this corresponds to `min_number_to_cover((f+1):b, K - (f-a+1))`), or
- skip node `a` and solve the problem on the remaining nodes `(a+1):b`, still needing to cover `K` nodes (this corresponds to `min_number_to_cover((a+1):b, K)`).
Let’s begin the implementation.
First, we need an object to represent an impossible task, one which is never chosen in a minimum and which is invariant under adding `+1`. Julia’s `Inf` does this for us, but `Inf` is a floating point number, and we’d prefer to stick to integers (with the exception of this infinity).
struct ComparisonInf end
const CompInf = ComparisonInf()
Base.isless(::ComparisonInf, ::ComparisonInf) = false
for T in (Int8, Int16, Int32, Int64, Int128)
@eval Base.isless(::ComparisonInf, ::$T) = false
@eval Base.isless(::$T, ::ComparisonInf) = true
@eval Base.:+(::$T, c::ComparisonInf) = c
@eval Base.:+(c::ComparisonInf, ::$T) = c
end
In the above list, we could add other types if needed, such as SaferIntegers.jl numeric types to ensure our algorithm does not overflow.
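With these definitions, `CompInf` loses every minimum against an integer and absorbs addition:

julia> min(CompInf, 3)
3
julia> CompInf + 1 === CompInf
true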
Next, we define a struct to hold an instance or subinstance of our problem.
struct LeftToCover{T}
"Range to partially cover"
r::UnitRange{T}
"Number of nodes we are required to cover"
K::T
end
And likewise, one to hold a solution (or part of one).
struct PartialCover{T}
"Minimum number needed to partially cover the range"
m::Union{T, ComparisonInf}
"Index of latest range added"
i::Int
"Key to point towards next range in the partial cover"
key::LeftToCover{T}
end
Next, to find the range with the furthest right endpoint including a given node, we just write a simple loop.
"""
find_furthest_range(a, C) -> index, right_endpoint
Given a node `a`, finds the range `r` in `C` such that `r` contains `a` and has
the largest right endpoint. Returns `(nothing, typemin(a))` if no range in `C` contains `a`.
"""
function find_furthest_range(a, C)
# This could be optimized more by sorting `C` first and terminating early
best_f_so_far = typemin(a) # right endpoint
best_i_so_far = nothing # index
for (i, r) in enumerate(C)
a ∈ r || continue
if last(r) > best_f_so_far
best_f_so_far = last(r)
best_i_so_far = i
end
end
return best_i_so_far, best_f_so_far
end
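For example, with the collection `C = [1:1, 2:5, 6:6, 1:3, 4:6]` from the first example, the range containing `2` that reaches furthest to the right is `2:5`, at index 2:

julia> find_furthest_range(2, [1:1, 2:5, 6:6, 1:3, 4:6])
(2, 5)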
Now comes the heart of the algorithm. We compute
min_number_to_cover(a:b, K) = min(1 + min_number_to_cover((f+1):b, K - (f-a+1)), min_number_to_cover((a+1):b, K))
recursively with the help of a memoization cache, which maps from (sub)instances of the problem to solutions.
function min_covering(key::LeftToCover{T}, furthest_ranges, cache::AbstractDict{LeftToCover{T}, PartialCover{T}}) where {T}
get!(cache, key) do
r = key.r; K = key.K
K <= 0 && return PartialCover{T}(zero(T), 0, key) # Nothing left to cover
K > length(r) && return PartialCover{T}(CompInf, 0, key) # Impossible task
a, b = first(r), last(r)
i_a, f_a = furthest_ranges[a] # index and right endpoint of optimal range including `a`
# Solve the subproblem in which we skip `a`
skip_cover = min_covering(LeftToCover{T}(a+1:b, K), furthest_ranges, cache)
i_a === nothing && return skip_cover # cannot cover `a` means we must skip it
# Solve the subproblem in which we include `a` in our partial cover
add_key = LeftToCover{T}( (f_a+1) : b, K - (f_a - a + 1))
add_cover = min_covering(add_key, furthest_ranges, cache)
# Decide whether or not to include `a` and return the better result
if add_cover.m + 1 <= skip_cover.m
return PartialCover{T}(add_cover.m + T(1), i_a, add_key)
else
return skip_cover
end
end
end
Note that in the case that we choose to skip `a`, we return `skip_cover`, which is the solution for the subproblem that skips `a`. But when we choose to include `a` in the partial cover, we make a new `PartialCover` object instead of returning `add_cover`. That’s because if we include `a` by using the range that ends at `f_a`, we really are adding a range to the partial cover represented by `add_cover`, so we make a new `PartialCover` object: one whose value of `m` is one larger (corresponding to the extra range we added), which holds the index `i_a` of the range that we added, and which holds `add_key`, which we can use to recover `add_cover` from `cache` later. In all, this lets us hold the sequence of ranges used in the final optimal cover efficiently (as a linked list).
Lastly, we wrap `min_covering` in a helper function that constructs `furthest_ranges` and the memoization cache, and unwinds the linked list that holds the solution in order to recover the indices of the ranges included in the minimal partial cover we have found.
"""
solve_min_covering(r::UnitRange{T}, K, C; [cache]) -> Union{Nothing, Vector{Int}}
Returns either `nothing` or a vector of indices of `C` corresponding to a
minimal set of ranges which covers at least `K` elements of `r`. Returns
`nothing` if and only if the task is impossible.
* `r`: range to partially cover
* `K`: number of elements of `r` that must be
covered
* `C`: vector of ranges to use in constructing the partial cover
* `cache`: optionally provide an `AbstractDict{LeftToCover{T}, PartialCover{T}}`
which is used as a memoization cache
!!! warning
Overflow is not checked.
"""
function solve_min_covering(r::UnitRange{T}, K, C; cache = Dict{LeftToCover{T},
PartialCover{T}}()) where {T}
# `furthest_ranges` is all we need from `C`
furthest_ranges = Dict(a => find_furthest_range(a, C) for a in r)
# Recursive function to compute the covering
result = min_covering(LeftToCover{T}(r, K), furthest_ranges, cache)
k = result.m # minimum number of ranges needed
k == CompInf && return nothing
# We store the list of ranges essentially as a linked list.
# To recover the optimal list of ranges, we traverse through
# the cache, at each step getting the `key` for the next element in the cache.
range_inds = Int[]
sizehint!(range_inds, k)
while k > 0
push!(range_inds, result.i)
result = cache[result.key]
k = result.m
end
return range_inds
end
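As a quick check, on the very first example from above we recover the indices of `1:3` and `4:6`:

julia> solve_min_covering(1:6, 6, [1:1, 2:5, 6:6, 1:3, 4:6])
2-element Array{Int64,1}:
4
5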