What the OP is describing is, except for the coding, some now classic applied math, i.e., <i>operations research</i>. That applied math got used much less often than one might have expected because (A) the data gathering was too much pain, expense, and botheration, (B) there was too much software to write, and writing it was too clumsy and expensive, (C) the computing was too slow and too expensive, and (D) even when one got a good solution ready for production, the production situation commonly changed so fast that the work of (A)-(C) could not be revised fast enough to keep up. And, of course, the custom, one-shot software was vulnerable to bugs. Net, in practice, a lot of projects failed. Mostly, the successful projects needed big bucks and some unusually insightful sponsorship high in a big organization.<p>But now (A)-(D) are no longer so difficult. This should be the beginning of a new <i>Golden Age</i> for such work.<p>Sure, since the results of such optimization can look darned <i>smart</i>, smarter than the average human, some might call the work <i>artificial intelligence</i>. Really, though, the work is mostly just some classic applied math now enabled in practice by the progress in computer hardware and software.<p>The OP is
a nice introduction to vehicle routing via integer linear programming (ILP) set partitioning. For <i>linear programming</i> and the case of ILP, I'll give a quick view below. But, for now, let's just dig in:<p>Here is an explanation of the <i>secret</i> approach, technique, trick that can work for vehicle routing and many other problems: The real problems can have some just awful non-linear cost functions and some absurdly tricky constraints, e.g., from labor contract work rules, equipment maintenance schedules, something that can't be done near point A near lunch, etc., and can even involve some random events, even responding to them in real time (<i>dynamically</i>). Yet we can still have a good shot at getting a least cost solution or nearly so. The "nearly so" part can still mean saving a lot of money not available otherwise. When there is randomness, we try to get least expected cost.<p>So, the trick is to do the work in two steps.<p>Call the first step <i>evaluation</i> and the second, <i>optimization</i>.<p>From 50,000 feet up, all the tricky, non-linear, goofy stuff gets handled by essentially enumeration in the first step, leaving some relatively simple data for the optimization in the second step.<p>In practice, this first step typically needs a lot of data on, say, the streets of a city and requires writing some software unique to the specific problem.
The second step, the optimization, may require deriving some math and/or writing some unique software, but the hope is that the step can be done just by routine application of some existing optimization software.<p>The OP mentions the now famous optimization software Gurobi, from R. Bixby and maybe some people from Georgia Tech, e.g., George Nemhauser and Ellis Johnson (long at IBM Research and behind IBM's Optimization Subroutine Library (OSL) and its application to crew scheduling at American Airlines).<p>First Step.<p>Suppose you are in Chicago and have 20,000 packages to deliver and 300 trucks. Okay, which trucks deliver which packages so as to make all the deliveries on time, not overload any trucks, and minimize the cost of driving the trucks?
You do have, for each package, the GPS coordinates and street address. And you have a lot of data on the streets, where and when traffic is heavy during the day, etc.<p>Okay, let's make some obvious, likely doable progress: Those 20,000 packages maybe have only 15,000 unique addresses. So, for each address, <i>bundle</i> all the packages that go to that address. Then regard the problem as visiting 15,000 addresses instead of delivering 20,000 packages.
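<p>For the bundling, here is a minimal sketch in Python (the Package record and its field names are mine, just for illustration, not from the OP):

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class Package:
        package_id: str
        address: str   # street address, assumed already cleaned/standardized
        lat: float     # GPS coordinates
        lon: float

    def bundle_by_address(packages):
        """Group the 20,000 packages by their 15,000 or so unique addresses."""
        bundles = defaultdict(list)
        for p in packages:
            bundles[p.address].append(p)
        return bundles   # address -> list of packages for one stop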
<p>So, you write some software to <i>enumerate</i>. The enumeration results in a collection of candidate routes, stops, and packages to be delivered, each for a single truck. For each of those candidates, you adjust the order in which the stops are made to minimize cost -- so here we get some <i>early</i>, first-cut, simple <i>optimization</i>. You keep only those candidates that get the packages delivered on time, meet the other criteria, etc. You may have 1 million candidate single truck routes. For each of the candidates, you find the (expected) operating cost.<p>So, suppose you have n = 1 million candidate single truck routes.<p>Also you have m = 15,000 addresses to visit.<p>So, you have a table with m = 15,000 rows and n = 1 million columns. Each column is for some one candidate route. Each row is for some one address. In each column there is a 1 in the row of each address that candidate route visits and a 0 otherwise. One more row at the top of the table gives, for each column, the operating cost of that candidate route.<p>So, you have a table of 0's and 1's with m = 15,000 rows and n = 1 million columns, and you have a row with 1 million costs, one cost for each column.<p>Again, you have 300 trucks. So, you want to pick, from the n columns, some <= 300 columns so that all the m addresses get served and the total cost of the columns selected is minimized. That is, if we add the selected columns as column <i>vectors</i>, then we get a vector of all 1's.
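<p>To make the table concrete, here is a sketch that builds the 0-1 table and the cost row (the route representation is an assumption of mine; also, at m = 15,000 by n = 1 million a dense array is too big, so in practice you would use a sparse matrix, e.g., scipy.sparse):

    import numpy as np

    # Hypothetical shapes: addresses is the list of m unique addresses;
    # routes is a list of (addresses_visited, operating_cost) pairs,
    # one pair per candidate single truck route.
    def build_table(addresses, routes):
        m, n = len(addresses), len(routes)
        row = {addr: i for i, addr in enumerate(addresses)}
        A = np.zeros((m, n), dtype=np.int8)   # the table of 0's and 1's
        c = np.zeros(n)                       # the cost row
        for j, (visited, cost) in enumerate(routes):
            c[j] = cost
            for addr in visited:
                A[row[addr], j] = 1           # candidate route j visits this address
        return A, c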
<p>Second Step.<p>Well, consider variables x_i for i = 1 to n = 1 million. We want x_i = 1 if we use the route in column i and 0 otherwise. Let the cost of the route in column i be c_i. We want the total cost (TeX notation)<p>z(x) = sum_{i = 1}^n x_i c_i<p>to be minimized. So, right, we take the big table of m = 15,000 rows and n = 1 million columns and call it the m x n matrix A = [a_{ij}]. We let the m x 1 column vector b have all 1's. We regard x as n x 1 where row j = 1 to n holds x_j.<p>Then we get the linear program<p>minimize z(x)<p>subject to<p>Ax = b<p>x >= 0<p>So, this is a case of <i>linear programming</i>.<p>Except in our problem we have one more <i>constraint</i> -- each x_i is 0 or 1 -- and with that our problem is 0-1 integer linear programming. (The limit of 300 trucks is one more linear constraint, sum_{i = 1}^n x_i <= 300, and fits right in.)
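<p>With the table A and the cost row c in hand, the hope above is that the second step is a routine call to existing software. A minimal sketch via SciPy's interface to the HiGHS solver (scipy.optimize.milp, SciPy 1.9+); the tiny A and c here are made-up stand-ins for the real million-column table:

    import numpy as np
    from scipy.optimize import Bounds, LinearConstraint, milp

    # Tiny stand-in data: m = 4 addresses, n = 5 candidate routes.
    A = np.array([[1, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 1, 1],
                  [0, 1, 1, 0, 0]])
    c = np.array([3.0, 4.0, 2.5, 3.5, 5.0])   # operating cost of each route
    n = A.shape[1]

    cover = LinearConstraint(A, lb=1, ub=1)                   # Ax = b, b all 1's
    trucks = LinearConstraint(np.ones((1, n)), lb=0, ub=300)  # at most 300 trucks

    res = milp(c=c,
               constraints=[cover, trucks],
               integrality=np.ones(n),   # each x_i integer ...
               bounds=Bounds(0, 1))      # ... and in [0, 1], so 0 or 1
    print(res.x, res.fun)                # here: routes 3 and 4 win, total cost 6.0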
<p>Linear Programming.<p>In linear programming with n variables, with the real numbers R, we work in the n-dimensional vector space R^n. The<p>Ax = b<p>x >= 0<p>are the <i>constraints</i>, and the set of all x that satisfies them is the <i>feasible region</i> F, a subset of R^n.<p>In R^n, a <i>closed half space</i> is a plane (a <i>hyperplane</i>) and everything on some one side of it.<p>Then F can be regarded as an intersection of finitely many closed half spaces. So, F has flat sides, straight edges, and some sharp points (<i>extreme points</i>).<p>Well, if there is an optimal solution, then there is an optimal solution at at least one of those extreme points.
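<p>The extreme point fact is easy to watch in a case small enough to check by hand. A sketch with a made-up 2-variable LP (not the truck problem), finding every vertex of F and taking the cheapest:

    import itertools
    import numpy as np

    # A made-up little LP: minimize c.x subject to G x <= h, in R^2.
    c = np.array([-1.0, -2.0])    # minimize -x - 2y
    G = np.array([[1.0, 1.0],     # x + y <= 4
                  [1.0, 0.0],     # x <= 3
                  [-1.0, 0.0],    # -x <= 0, i.e., x >= 0
                  [0.0, -1.0]])   # -y <= 0, i.e., y >= 0
    h = np.array([4.0, 3.0, 0.0, 0.0])

    # Each extreme point of F lies where two constraint lines meet.
    best = None
    for i, j in itertools.combinations(range(len(G)), 2):
        M = G[[i, j]]
        if abs(np.linalg.det(M)) < 1e-12:
            continue                           # parallel lines, no vertex
        v = np.linalg.solve(M, h[[i, j]])
        if np.all(G @ v <= h + 1e-9):          # keep only feasible vertices
            print("extreme point", v, "cost", c @ v)
            if best is None or c @ v < best[1]:
                best = (v, c @ v)
    print("optimum at extreme point", best[0], "cost", best[1])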
<p>So, the famous Dantzig simplex algorithm looks for an optimal solution in iterations where each iteration starts at an extreme point, moves along an edge, and stops at the next extreme point. That's the geometric view; the algebraic view is a tweak of the standard Gauss elimination algorithm.
<p>Linear programming, the simplex algorithm, and other algorithms have a huge collection of nice properties, including some surprisingly good performance both in practice and in theory.
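<p>To see an LP solve go by, here is the <i>LP relaxation</i> of the little set-partitioning instance above, i.e., the same problem with the 0-or-1 requirement dropped (scipy.optimize.linprog; recent SciPy versions use HiGHS rather than classic simplex by default):

    import numpy as np
    from scipy.optimize import linprog

    # Same tiny A and c, but the 0-or-1 requirement on x is dropped.
    A = np.array([[1, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 1, 1],
                  [0, 1, 1, 0, 0]])
    c = np.array([3.0, 4.0, 2.5, 3.5, 5.0])

    res = linprog(c, A_eq=A, b_eq=np.ones(4), bounds=(0, 1))
    print(res.x, res.fun)   # this instance happens to come out integral anyway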
<p>But asking for each x_i to be an integer is, in principle and usually in practice, a huge difference and gives us an NP-complete problem -- at one time, this was a huge, bitter surprise.
<p>Warning: At one time, the field of the applied math of optimization in operations research sometimes had an attitude, that is, placed a quasi-religious importance on <i>optimal</i> solutions and was contemptuous of any solution even 10 cents short of optimal. Well, that attitude was costly for all concerned. Instead of all the concentration on saving the last 10 cents, consider saving the first $1 million. Commonly in practice, we can get close to optimality, may be able to show that we are within 1% of optimality, so close that the rest wouldn't even buy a nice dinner, and still see no way, short of two more weeks of computer time, to get a provably optimal solution.<p>So, concentrate on the big, fat doughnut, not the hole.
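<p>Most ILP solvers let you set exactly this trade as a stopping rule, a relative <i>MIP gap</i>. A sketch, again with scipy.optimize.milp and the little instance from above (the options dict is passed through to HiGHS):

    import numpy as np
    from scipy.optimize import Bounds, LinearConstraint, milp

    # Stop as soon as the best solution found is provably within 1% of optimal.
    A = np.array([[1, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 1, 1],
                  [0, 1, 1, 0, 0]])
    c = np.array([3.0, 4.0, 2.5, 3.5, 5.0])
    res = milp(c=c,
               constraints=LinearConstraint(A, lb=1, ub=1),
               integrality=np.ones(A.shape[1]),
               bounds=Bounds(0, 1),
               options={"mip_rel_gap": 0.01})   # accept within 1% of optimal
    print(res.x, res.fun)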
<p>How to solve ILP problems is a huge subject -- e.g., one can start with George Nemhauser -- but a major fraction of the techniques exploit some surprisingly nice properties of the simplex algorithm. Right, likely the best known approach is the tree search technique of <i>branch and bound</i>.
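<p>To show the shape of that tree search, here is a toy branch and bound for 0-1 ILP, with bounds from the LP relaxation via scipy.optimize.linprog (real solvers add presolve, cutting planes, heuristics, and much more):

    import numpy as np
    from scipy.optimize import linprog

    def branch_and_bound(c, A, b):
        """0-1 ILP, minimize c.x with A x = b, by LP-relaxation branch and bound."""
        best_x, best_cost = None, np.inf
        # Each node fixes some variables; represent a node as (lo, hi) bound vectors.
        stack = [(np.zeros(len(c)), np.ones(len(c)))]
        while stack:
            lo, hi = stack.pop()
            res = linprog(c, A_eq=A, b_eq=b, bounds=list(zip(lo, hi)))
            if not res.success or res.fun >= best_cost:
                continue                      # infeasible node, or the bound prunes it
            frac = np.abs(res.x - np.round(res.x))
            j = int(np.argmax(frac))          # most fractional variable
            if frac[j] < 1e-6:                # relaxation already integral: new best
                best_x, best_cost = np.round(res.x), res.fun
                continue
            for v in (0.0, 1.0):              # branch: try x_j = 0 and x_j = 1
                lo2, hi2 = lo.copy(), hi.copy()
                lo2[j] = hi2[j] = v
                stack.append((lo2, hi2))
        return best_x, best_cost

    # The little set-partitioning instance from above.
    A = np.array([[1, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 1, 1],
                  [0, 1, 1, 0, 0]])
    c = np.array([3.0, 4.0, 2.5, 3.5, 5.0])
    print(branch_and_bound(c, A, np.ones(4)))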