
Mixed-Integer Programming (MIP) Explained: Python Optimization Guide

UPS’s ORION routing system saves the company roughly 100 million miles of driving every single year, cuts fuel consumption by 10 million gallons, and eliminates around 100,000 metric tons of CO2 emissions. It is not powered by a mysterious neural network or a secret reinforcement-learning trick. At its core, ORION is a gigantic Mixed-Integer Program (MIP) — a mathematical optimization model with yes/no decisions, integer counts, and linear relationships — solved to near-optimality day after day. Airlines like American and Delta use the same kind of math to schedule crews across tens of thousands of flights, saving hundreds of millions of dollars each year. Amazon’s same-day delivery network is essentially one enormous MIP being re-solved every few minutes.

Mixed-Integer Programming is probably the most valuable piece of applied mathematics that most software engineers have never written a line of code for. If you have ever faced a problem that has the flavor of “pick which things to do, how many, and in what order to minimize cost or maximize profit,” you have almost certainly faced a MIP without knowing it. This guide will show you what MIP is, how to formulate problems in it, how the solvers work under the hood, and how to write real Python code that runs today.

The Big Idea Behind MIP

Imagine you run a small delivery business and you are deciding which of five warehouses to open and which customers to serve from each. Opening a warehouse is a yes/no decision. The number of trucks you buy is an integer. The volume of product you ship each day can be a continuous number. Your cost depends on all of this in a mostly linear way: fixed costs for opening, variable costs for shipping. You want to minimize total cost subject to meeting customer demand. Congratulations — you have just described a Mixed-Integer Linear Program.

A MIP is an optimization problem where some variables must take integer (or binary 0/1) values, others can be continuous, the objective is linear, and the constraints are linear. The “mixed” refers to that combination of integer and continuous variables. When every variable is continuous, you have a Linear Program (LP) — solvable in polynomial time by the simplex or interior-point methods. When every variable is integer, you have a pure Integer Program (IP). In practice, most real problems are MIPs because real business decisions mix discrete choices with continuous quantities.

LP vs IP vs MIP: What Actually Changes

The theoretical jump from LP to MIP is enormous. LP is polynomial-time solvable; MIP is NP-hard. That means as problems grow, solution time can explode. But in practice, modern MIP solvers routinely handle problems with millions of variables because the structure of real problems is usually much friendlier than the worst case.

| Aspect | LP | IP (pure integer) | MIP |
| --- | --- | --- | --- |
| Variable types | All continuous | All integer/binary | Mix of continuous and integer |
| Complexity | Polynomial (P) | NP-hard | NP-hard |
| Typical solvable size | Millions of variables | Thousands to millions | Thousands to millions |
| Algorithm | Simplex / interior point | Branch and cut | Branch and cut |
| Use case | Resource allocation, blending | Pure combinatorial | Most real business problems |
| Example | Refinery product mix | TSP, graph coloring | Facility location, scheduling |

Why “Just Round the LP Solution” Fails

A tempting shortcut: solve the LP relaxation (pretending integer variables are continuous), then round to the nearest integer. This is almost always wrong, and it can be spectacularly wrong. Consider a simple example: maximize x + y subject to x + y ≤ 1.5 with x, y ∈ {0, 1}. The LP relaxation says x = 0.5, y = 1.0 for an objective of 1.5. Round naively and you might get (1, 1) — infeasible — or (0, 1) for an objective of 1, or (1, 0) for 1. The true MIP optimum is 1. Now imagine a constraint like “x + y + z + … ≤ 1” for opening one warehouse out of 100: rounding the LP fractional solution gives nonsense.
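You can verify this tiny example in plain Python, no solver required:

```python
from itertools import product

def feasible(x, y):
    return x + y <= 1.5

lp_point = (0.5, 1.0)                        # fractional LP optimum, objective 1.5
rounded = tuple(round(v) for v in lp_point)  # Python's round gives (0, 1) here;
                                             # (1, 1) is just as "near" and infeasible
# True MIP optimum by enumerating the four binary points.
best = max(x + y for x, y in product((0, 1), repeat=2) if feasible(x, y))
print("rounded:", rounded, "feasible:", feasible(*rounded), "true optimum:", best)
```

Enumeration works here only because there are four candidate points; the point is that no rounding rule recovers the optimum in general.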

The gap between the LP relaxation’s optimal value and the true MIP optimal value is called the integrality gap. A formulation with a small integrality gap is called a “tight” or “strong” formulation. Much of the art of MIP modeling is about making this gap as small as possible without exploding the problem size.

[Figure: MIP geometry — the LP feasible polygon with its fractional LP optimum at a corner (x=4.2, y=6.8, obj=11.0) versus the MIP optimum at the integer point (x=4, y=6, obj=10.0), an integrality gap of 10%. Key insight: rounding the LP optimum (4.2, 6.8) does NOT give the MIP optimum; the best integer point may lie deep inside the polygon, not on its boundary. Tighter formulations shrink the LP polygon toward the integer hull, giving faster solves.]

When MIP Shines and When It Doesn’t

MIP is the right tool when your problem has a clear discrete structure, a mostly linear cost model, and you value a provable guarantee of optimality (or a bounded gap). Classic MIP sweet spots include assignment (which workers do which jobs), scheduling (which tasks on which machines in which order), routing (vehicle paths through customers), facility location (where to put depots), network design (which links to build), capacity planning (how much to invest), and portfolio optimization with discrete constraints (cardinality limits, round-lot purchases).

MIP is not the right tool when your problem is entirely continuous (just use LP or QP), when the cost function is wildly nonlinear and can’t be reasonably linearized (consider nonlinear solvers or genetic algorithms), when you have no clear discrete structure to exploit, or when you need answers in milliseconds on problems a solver would need minutes for. Real-time control, for example, often uses a heuristic or learned policy — sometimes trained by solving many MIPs offline.

Key Takeaway: MIP gives you a provable optimum (or a proven gap) for problems with discrete decisions. It scales shockingly far in practice thanks to decades of algorithmic engineering, but it pays off most when your problem genuinely has that yes/no, integer-count structure.

Formulating a MIP Step by Step

Formulating a MIP is half art, half engineering. You define decision variables, write an objective, and encode business rules as linear constraints. The same problem can be modeled in many ways, and the differences matter enormously for solve time.

Decision Variables

MIPs have three common variable types:

  • Continuous (e.g., liters of fuel, dollars invested): any real number in a range.
  • Integer (e.g., number of trucks, number of workers): non-negative integers.
  • Binary (e.g., open warehouse yes/no, buy stock yes/no): 0 or 1. Binaries are by far the most common in modeling because they encode logical choices.

Objective Function

The objective is a linear combination of the decision variables: for example, minimize total cost = sum of (fixed cost × open_i) + sum of (unit cost × shipment_ij). Keeping the objective linear is a soft rule; many “nonlinear” costs can be linearized by introducing auxiliary variables and constraints.
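One classic linearization: |x| splits into x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0, and you minimize x⁺ + x⁻. A quick numeric check (plain Python, not solver code) confirms the minimum over feasible splits is exactly |x|:

```python
def abs_via_split(x, steps=100):
    """Minimize x_plus + x_minus over feasible splits x = x_plus - x_minus."""
    best = float("inf")
    for k in range(steps + 1):
        t = 0.1 * k                          # extra slack shared by both parts
        x_plus = max(x, 0.0) + t
        x_minus = max(-x, 0.0) + t
        assert abs((x_plus - x_minus) - x) < 1e-9   # every split is feasible
        best = min(best, x_plus + x_minus)   # objective = |x| + 2t, minimized at t = 0
    return best

for x in (-3.5, -1.0, 0.0, 2.25):
    print(x, abs_via_split(x))               # matches abs(x) in every case
```

In a real model the minimization over the split is done by the solver; this loop just scans the slack t to show why the trick is valid.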

Linear Constraints and Logical Constraints

Constraints are ≤, ≥, or = relations between linear expressions. The power comes from using binary variables to encode logic:

  • At most k: ∑i xi ≤ k
  • At least k: ∑i xi ≥ k
  • Exactly one: ∑i xi = 1 (assignment)
  • Implication (if x = 1 then y = 1): y ≥ x
  • Mutual exclusion (x and y cannot both be 1): x + y ≤ 1
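Encodings like these can be verified exhaustively over all binary assignments, a cheap habit worth keeping whenever you derive a new one:

```python
from itertools import product

# Check each encoding against the logic it claims to capture, over every
# binary assignment of (x, y).
for x, y in product((0, 1), repeat=2):
    assert (y >= x) == ((x == 0) or (y == 1))         # implication: x=1 => y=1
    assert (x + y <= 1) == (not (x == 1 and y == 1))  # mutual exclusion
print("all encodings verified on all binary assignments")
```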

The Big-M Method for If-Then Logic

One of the oldest and most abused tricks in MIP is the “Big-M” method. Suppose you want to express: “if binary y = 0, then continuous x must be 0; if y = 1, x can go up to its natural upper bound.” You write:

x ≤ M * y     # where M is a "big enough" number

If y = 0, the constraint forces x ≤ 0, so x = 0. If y = 1, x ≤ M, which is effectively no upper bound. Simple. But Big-M is dangerous: choosing M too large weakens the LP relaxation (increases the integrality gap) and introduces numerical instability. Modern solvers like Gurobi and CPLEX support indicator constraints (y = 1 ⇒ x ≤ c) natively, which are both tighter and safer.

Caution: A common bug is setting M = 1e9 “just to be safe.” This wrecks numerical stability and makes your LP relaxation useless. Pick the smallest M that is still a valid upper bound on the quantity involved.
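A back-of-the-envelope illustration of why: in the LP relaxation, y only needs to reach x / M, so the fixed cost the relaxation "sees" for using x vanishes as M grows (illustrative numbers):

```python
# LP relaxation of  x <= M * y  with a fixed cost f charged via f * y:
# the relaxation only needs y = x / M, so the fixed cost the LP "sees"
# for using x shrinks toward zero as M grows.
f = 1000.0    # fixed cost of activating y (illustrative)
x = 50.0      # amount actually used; suppose the true upper bound on x is 60
for M in (60.0, 1e3, 1e6, 1e9):
    y_lp = x / M                         # smallest y the relaxed constraint allows
    print(f"M={M:>8.0e}  y_lp={y_lp:.2e}  fixed cost seen by LP: {f * y_lp:.5f}")
```

With the tight M = 60 the relaxation still charges most of the fixed cost; with M = 1e9 it charges essentially nothing, so branch and bound must do all the work.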

Worked Example: The 0/1 Knapsack

You have a bag with capacity W and n items, each with weight wi and value vi. Pick a subset of items that maximizes total value without exceeding capacity.

Variables: xi ∈ {0, 1} = 1 if item i is chosen.

Objective: maximize ∑i vi xi

Constraints: ∑i wi xi ≤ W

That’s it. Two lines of math, which we’ll see below translate to maybe five lines of Python.

Worked Example: Uncapacitated Facility Location

You have m candidate warehouse sites and n customers. Opening warehouse i costs fi. Serving customer j from warehouse i costs cij. Each customer must be served by exactly one open warehouse.

Variables:

  • yi ∈ {0, 1} = 1 if warehouse i is open.
  • xij ∈ [0, 1] = fraction of customer j's demand served from i (often also binary in assignment form).

Objective: minimize ∑i fi yi + ∑i,j cij xij

Constraints:

  • ∑i xij = 1 for all j (each customer served fully)
  • xij ≤ yi for all i, j (can only ship from an open warehouse)

Notice the last constraint. The naive Big-M version would be ∑j xij ≤ M · yi — a single aggregated constraint per warehouse. Instead, the disaggregated xij ≤ yi gives one constraint per customer-warehouse pair. More constraints, but a dramatically tighter LP relaxation and far faster solves. This is a canonical example of why formulation matters.
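For tiny instances you can sanity-check any facility-location formulation against brute-force enumeration — exponential in the number of sites, so useful only for validating a model, not for production (data below is illustrative):

```python
from itertools import combinations

fixed = {'W1': 10.0, 'W2': 12.0}                 # illustrative fixed opening costs
cost = {('W1', 'C1'): 2.0, ('W1', 'C2'): 9.0,    # illustrative serving costs
        ('W2', 'C1'): 8.0, ('W2', 'C2'): 3.0}

def brute_force_ufl(fixed, cost):
    """Enumerate open sets; serve each customer from its cheapest open site."""
    sites = sorted(fixed)
    customers = sorted({c for _, c in cost})
    best = (float("inf"), None)
    for r in range(1, len(sites) + 1):
        for open_set in combinations(sites, r):
            total = sum(fixed[w] for w in open_set)
            total += sum(min(cost[w, c] for w in open_set) for c in customers)
            best = min(best, (total, open_set))
    return best

print(brute_force_ufl(fixed, cost))   # -> (21.0, ('W1',)): open W1 only
```

The MIP reproduces exactly this optimum, but scales to hundreds of sites where enumeration cannot.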

How MIP Solvers Actually Work

Understanding the inside of a MIP solver is not just academic curiosity. It changes how you model, how you interpret solver logs, and why tiny-looking reformulations can swing solve time by 100x.

Branch and Bound

The core algorithm is branch and bound. Start by solving the LP relaxation (drop integrality requirements). If the LP solution happens to be integer, you are done. Otherwise, pick a fractional variable — say x = 2.7 — and create two subproblems: one with x ≤ 2, one with x ≥ 3. Solve each LP relaxation. Recurse. The tree of subproblems grows, but entire branches can be pruned by three rules:

  • Infeasibility: the LP of a subproblem has no feasible solution.
  • Bound dominance: the LP bound of a subproblem is worse than the best integer solution found so far (the incumbent). No solution in this branch can beat the incumbent.
  • Integer feasibility: the LP solution of a subproblem is already integer — update the incumbent if better.
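All three pruning rules fit in a few dozen lines for a knapsack, using the fractional-knapsack value as the LP-style bound. This is a teaching sketch, not a competitive solver — real branch and cut adds presolve, cuts, and heuristics on top of exactly this skeleton:

```python
def lp_bound(items, cap):
    """Fractional-knapsack value of the remaining items: the LP-style bound."""
    total = 0.0
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if cap <= 0:
            break
        take = min(w, cap)
        total += v * take / w
        cap -= take
    return total

def branch_and_bound(items, cap):
    best = 0
    def visit(i, cap_left, val):
        nonlocal best
        if cap_left < 0:
            return                                   # prune: infeasible
        if val + lp_bound(items[i:], cap_left) <= best:
            return                                   # prune: bound dominance
        if i == len(items):
            best = max(best, val)                    # integer feasible: new incumbent
            return
        w, v = items[i]
        visit(i + 1, cap_left - w, val + v)          # branch: take item i
        visit(i + 1, cap_left, val)                  # branch: skip item i
    visit(0, cap, 0)
    return best

# Same data as the PuLP knapsack example later in this guide: (weight, value), cap 10.
items = [(2, 3), (3, 4), (4, 5), (5, 8), (9, 10)]
print(branch_and_bound(items, 10))   # -> 15
```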

[Figure: a branch-and-bound tree. The root LP relaxation is fractional (x=2.7, y=3.4, obj=18.2). Branching on x ≤ 2 / x ≥ 3 and then on y produces nodes that are integer feasible (x=2, y=3, obj=16 becomes the incumbent), infeasible and pruned, bound-dominated and pruned (bounds 15.5 and 15.8 below the incumbent), and finally the optimum x=2, y=4, obj=17.]

Cutting Planes

Pure branch and bound can blow up quickly. The breakthrough that made modern MIP practical was cutting planes: additional linear inequalities added to the LP relaxation that are valid for all integer solutions but cut off the fractional LP optimum. Classical Gomory cuts, derived from the simplex tableau, were the first systematic cuts. Modern solvers apply dozens of families — mixed-integer rounding cuts, flow cover cuts, knapsack cover cuts, clique cuts, lift-and-project cuts, and many more. Combine cuts with branching and you get branch and cut, the dominant paradigm since the 1990s.

Heuristics Inside the Solver

A good upper bound (in a minimization) lets the solver prune aggressively. Modern solvers include sophisticated primal heuristics: the feasibility pump rounds the LP solution and projects back toward feasibility; RINS (Relaxation Induced Neighborhood Search) fixes variables that agree between the LP relaxation and the incumbent, then solves a smaller MIP in the remaining space; local branching defines a Hamming-distance neighborhood around the incumbent. These routinely find feasible solutions within seconds on problems that pure branch and bound might struggle with.

Presolve: The Secret Sauce

Before any branching, solvers run presolve — a suite of transformations that tighten bounds, eliminate redundant constraints, fix variables, detect implied integralities, and detect special structures like set covering or packing. On real-world models, presolve often shrinks problems by 30–70% before the first LP is solved. If Gurobi appears to solve your million-variable MIP instantly, presolve is usually why.
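To get a feel for one presolve transformation, here is a toy bound-tightening pass — an illustration only; real presolve applies dozens of such reductions in concert:

```python
import math

def tighten_upper_bounds(coeffs, rhs, upper):
    """Presolve-style bound tightening for  sum(a_i * x_i) <= rhs  with
    a_i > 0 and x_i >= 0 integer: no variable can exceed floor(rhs / a_i)."""
    return {name: min(upper[name], math.floor(rhs / a))
            for name, a in coeffs.items()}

# 7x + 3y <= 20 with declared bounds far too loose:
print(tighten_upper_bounds({"x": 7, "y": 3}, 20, {"x": 100, "y": 100}))
# -> {'x': 2, 'y': 6}
```

Tighter bounds mean smaller Big-M values, stronger LP relaxations, and fewer branch-and-bound nodes downstream.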

Warm Starts and Incumbents

If you have a feasible solution from a heuristic, a previous solve, or a human expert, feed it to the solver as a MIP start. The solver immediately has an incumbent for pruning, and the search focuses on proving optimality or improving from there. This single practice can turn a one-hour solve into a one-minute solve.

Python Implementation: Full Working Examples

We will use PuLP for simple examples and Pyomo for more advanced ones. Both are open-source, both switch between solvers easily. Install with pip install pulp pyomo. PuLP ships with the CBC solver by default.

Example 1: 0/1 Knapsack

from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary, value

items = ['A', 'B', 'C', 'D', 'E']
weights = {'A': 2, 'B': 3, 'C': 4, 'D': 5, 'E': 9}
values  = {'A': 3, 'B': 4, 'C': 5, 'D': 8, 'E': 10}
capacity = 10

prob = LpProblem("Knapsack", LpMaximize)
x = LpVariable.dicts("item", items, cat=LpBinary)

# Objective: maximize total value
prob += lpSum(values[i] * x[i] for i in items)

# Constraint: total weight ≤ capacity
prob += lpSum(weights[i] * x[i] for i in items) <= capacity

prob.solve()

print(f"Status: {prob.status}")
print(f"Total value: {value(prob.objective)}")
for i in items:
    if x[i].value() > 0.5:
        print(f"  Take {i} (w={weights[i]}, v={values[i]})")

Running this prints items A, B, and D with total value 15 and weight 10. CBC handles it in milliseconds.

Example 2: TSP with MTZ Subtour Elimination

The Traveling Salesman Problem is the classic routing benchmark. The subtle challenge in a MIP formulation is to forbid subtours — disconnected loops. The Miller-Tucker-Zemlin (MTZ) formulation adds auxiliary order variables ui and the constraint ui − uj + n · xij ≤ n − 1 for all i ≠ j with i, j ≥ 1 (node 0 is the fixed start). MTZ is weaker than the exponential family of subtour elimination constraints, but it fits in a compact formulation.

from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, LpInteger
import math, random

random.seed(42)
n = 8
coords = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(n)]
d = [[math.hypot(coords[i][0]-coords[j][0], coords[i][1]-coords[j][1])
      for j in range(n)] for i in range(n)]

prob = LpProblem("TSP", LpMinimize)
x = [[LpVariable(f"x_{i}_{j}", cat=LpBinary) if i != j else None
      for j in range(n)] for i in range(n)]
u = [LpVariable(f"u_{i}", lowBound=0, upBound=n-1, cat=LpInteger) for i in range(n)]

# Objective: total distance
prob += lpSum(d[i][j] * x[i][j] for i in range(n) for j in range(n) if i != j)

# Each node entered and left exactly once
for i in range(n):
    prob += lpSum(x[i][j] for j in range(n) if j != i) == 1
    prob += lpSum(x[j][i] for j in range(n) if j != i) == 1

# MTZ subtour elimination (fix u[0] = 0)
prob += u[0] == 0
for i in range(1, n):
    for j in range(1, n):
        if i != j:
            prob += u[i] - u[j] + n * x[i][j] <= n - 1

prob.solve()
tour = [0]
cur = 0
for _ in range(n - 1):
    for j in range(n):
        if j != cur and x[cur][j].value() > 0.5:
            tour.append(j)
            cur = j
            break
print("Tour:", tour, "length:", prob.objective.value())

For 8 cities this is a toy; for 50–100 cities MTZ plus a good solver still works. Beyond that, practitioners use lazy subtour elimination callbacks — adding cuts only when violated — which scales to thousands of cities.
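The detection half of such a lazy callback is easy to sketch in plain Python; the cut-adding half depends on your solver's callback API:

```python
def find_subtours(successor):
    """Split a successor map (node -> next node) into its cycles."""
    unvisited = set(successor)
    cycles = []
    while unvisited:
        node = next(iter(unvisited))
        cycle = []
        while node in unvisited:
            unvisited.remove(node)
            cycle.append(node)
            node = successor[node]
        cycles.append(cycle)
    return cycles

# An integer "solution" made of two disconnected loops: 0->1->2->0 and 3->4->5->3.
succ = {0: 1, 1: 2, 2: 0, 3: 4, 4: 5, 5: 3}
print(find_subtours(succ))   # two subtours: each would trigger a violated cut
```

Whenever more than one cycle comes back, the callback adds a constraint forbidding each small cycle and the solver resumes with the tightened model.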

Example 3: Production Scheduling with Setup Times

We have 3 machines and 6 jobs. Each job must run on one machine. Each machine has a processing time per job and a setup time per (predecessor, job) pair. Minimize makespan (time when the last machine finishes).

from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, LpContinuous

jobs = list(range(6))
machines = list(range(3))
proc = {(j, m): 5 + ((j + m) % 4) for j in jobs for m in machines}
setup = {(i, j): 1 + ((i * 3 + j) % 3) for i in jobs for j in jobs if i != j}
BIG_M = sum(proc.values())

prob = LpProblem("SchedWithSetup", LpMinimize)

y = {(j, m): LpVariable(f"y_{j}_{m}", cat=LpBinary)
     for j in jobs for m in machines}          # job assignment
s = {j: LpVariable(f"s_{j}", lowBound=0, cat=LpContinuous) for j in jobs}  # start time
# z[i,j,m] = 1 if i precedes j on machine m
z = {(i, j, m): LpVariable(f"z_{i}_{j}_{m}", cat=LpBinary)
     for i in jobs for j in jobs if i != j for m in machines}
C_max = LpVariable("Cmax", lowBound=0, cat=LpContinuous)

# Each job on exactly one machine
for j in jobs:
    prob += lpSum(y[j, m] for m in machines) == 1

# Completion time ≤ makespan
for j in jobs:
    prob += s[j] + lpSum(proc[j, m] * y[j, m] for m in machines) <= C_max

# Disjunctive: if i and j both on machine m, one before the other
for i in jobs:
    for j in jobs:
        if i >= j:
            continue
        for m in machines:
            prob += z[i, j, m] + z[j, i, m] >= y[i, m] + y[j, m] - 1
            prob += s[j] >= s[i] + proc[i, m] + setup[i, j] - BIG_M * (1 - z[i, j, m])
            prob += s[i] >= s[j] + proc[j, m] + setup[j, i] - BIG_M * (1 - z[j, i, m])

prob += C_max                                   # minimize makespan
prob.solve()

print("Makespan:", C_max.value())
for m in machines:
    assigned = sorted([j for j in jobs if y[j, m].value() > 0.5],
                      key=lambda j: s[j].value())
    print(f"Machine {m}: " +
          " -> ".join(f"J{j}(s={s[j].value():.1f})" for j in assigned))

This is a miniature of real job-shop scheduling. Notice the Big-M disjunctive constraints — exactly the place where indicator constraints in Gurobi/CPLEX would be cleaner. On 6 jobs, CBC solves it in under a second; on 50 jobs it starts to struggle, and a commercial solver becomes valuable.

Example 4: Multi-Period Facility Location

from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, LpContinuous

warehouses = ['W1', 'W2', 'W3', 'W4']
customers  = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
periods    = [1, 2, 3]

fixed_cost  = {'W1': 1000, 'W2': 1500, 'W3': 1200, 'W4': 900}
capacity    = {'W1': 80,   'W2': 120,  'W3': 100,  'W4': 70}
# deterministic toy data (hash() is randomized between Python runs)
demand      = {(c, t): 15 + (int(c[1]) * t) % 10 for c in customers for t in periods}
ship_cost   = {(w, c): 2 + (int(w[1]) + int(c[1])) % 7 for w in warehouses for c in customers}

prob = LpProblem("MultiPeriodFL", LpMinimize)

y = {(w, t): LpVariable(f"y_{w}_{t}", cat=LpBinary)
     for w in warehouses for t in periods}      # open warehouse w at time t
x = {(w, c, t): LpVariable(f"x_{w}_{c}_{t}", lowBound=0, cat=LpContinuous)
     for w in warehouses for c in customers for t in periods}

# Objective
prob += (lpSum(fixed_cost[w] * y[w, t] for w in warehouses for t in periods)
         + lpSum(ship_cost[w, c] * x[w, c, t]
                 for w in warehouses for c in customers for t in periods))

# Demand satisfaction
for c in customers:
    for t in periods:
        prob += lpSum(x[w, c, t] for w in warehouses) >= demand[c, t]

# Capacity & open-only-then-ship
for w in warehouses:
    for t in periods:
        prob += lpSum(x[w, c, t] for c in customers) <= capacity[w] * y[w, t]

# Commitment: once open, stay open (y non-decreasing)
for w in warehouses:
    for t in periods[:-1]:
        prob += y[w, t + 1] >= y[w, t]

prob.solve()
print("Total cost:", prob.objective.value())
for t in periods:
    opens = [w for w in warehouses if y[w, t].value() > 0.5]
    print(f"Period {t}: open => {opens}")

This pattern — binary open/close decisions, continuous flows, demand and capacity constraints, time-coupling — is the skeleton of countless supply-chain models, including Amazon’s and Walmart’s. At enterprise scale you’d add multi-echelon structure, stochastic demand, and thousands of SKUs, but the mathematical shape is the same.

Tip: For recurring jobs — like a nightly re-solve of a supply-chain model — orchestrate the pipeline with Apache Airflow so data ingestion, MIP solve, and result publishing are all versioned and retryable.

Solvers Compared: Open Source vs Commercial

Your solver choice can change your solve time by two orders of magnitude. Here is the lay of the land as of 2026.

| Solver | License | Speed (relative) | Best for |
| --- | --- | --- | --- |
| CBC | Open source (EPL) | 1x | Default in PuLP, small/medium problems |
| GLPK | Open source (GPL) | 0.7x | Teaching, tiny problems |
| HiGHS | Open source (MIT) | 3–5x | Modern OSS default, fast LP |
| SCIP | Open source (Apache 2.0 since v8) | 5–10x | Research, mixed constraint/integer |
| Gurobi | Commercial (free academic) | 30–100x | Industrial gold standard |
| CPLEX | Commercial (free academic) | 25–80x | IBM ecosystem, enterprise |
| FICO Xpress | Commercial | 20–80x | Finance, large models |

The 10–100x advantage of commercial solvers over CBC is real. It comes from decades of cutting-plane engineering, better presolve, parallel branch and bound, and tuned heuristics. If you are solving MIPs for a living, a Gurobi or CPLEX license pays for itself in the first serious project. For academics, both vendors offer free licenses; researchers have no excuse not to try them.

If you prefer solver-agnostic code, use Pyomo (SolverFactory('gurobi'), SolverFactory('cbc'), SolverFactory('highs')) or python-mip. PuLP also supports multiple backends but with a thinner abstraction.

Real-World Applications

The abstract math only feels exciting when you see where it shows up. Below are domains where MIP runs the world.

[Figure: ten domains running on a MIP engine — airline crew scheduling (AA, Delta: $100M+/yr savings), vehicle routing (UPS ORION: 100M miles/yr saved), facility location (supply chains, warehousing), manufacturing (job shop, lot sizing), sports league scheduling (MLB, NBA; CMU research), healthcare rostering (nurse/doctor scheduling), portfolio optimization (cardinality, round-lot), telecom network design (capacity and routing), energy grid unit commitment (PJM, ERCOT day-ahead), retail assortment (inventory + shelf space).]

Airline Crew Scheduling

Every major airline solves two massive MIPs daily: crew pairing (sequences of flights that form a round trip) and crew rostering (assigning pairings to specific pilots/flight attendants with rest, qualification, and base constraints). Sabre, American, Delta, and United collectively attribute hundreds of millions of dollars in annual savings to optimization. The models have millions of variables and rely heavily on column generation — a decomposition where new columns (pairings) are priced in on demand rather than enumerated upfront.

UPS ORION

ORION (On-Road Integrated Optimization and Navigation) re-optimizes delivery routes for 55,000+ drivers. It fuses MIP with heuristics because true VRP with time windows at that scale is brutal. The reported savings: 100M miles/year, 10M gallons fuel, 100K tonnes CO2, $300–400M/year. Few software projects can claim that kind of impact.

Energy Grid Unit Commitment

Regional transmission operators like PJM (serving 65M people across the US East) solve unit commitment MIPs to decide which generators to start/stop and at what output for every hour of the next day. Binary variables capture on/off, integer variables capture startup sequences, continuous variables capture MW output. A single solve handles thousands of units with ramp, minimum up/down, and reserve constraints, and it runs in under 20 minutes. Electricity market clearing prices literally emerge from the dual variables of these MIPs.

Healthcare Staff Scheduling

The nurse rostering problem is legendary in the OR literature. Every hospital has idiosyncratic rules: max consecutive nights, minimum rest, skill mix per shift, fairness, preferences. MIP is the workhorse, often combined with constraint programming for the pure feasibility parts.

Sports League Scheduling

Carnegie Mellon researchers have built MLB and NBA schedules for years using MIP. Constraints include travel distance, venue availability, TV windows, traditional rivalries, and competitive balance. Sports scheduling is a beloved test bed because the constraints are crisp and the benefits (TV revenue, fan experience) are tangible.

Portfolio Optimization with Discrete Constraints

Pure mean-variance portfolio optimization is a QP — no integers. Real portfolios, however, often demand cardinality constraints (“hold at most 40 names”) and round-lot constraints (“buy shares in multiples of 100”). These require binaries and integers, turning the problem into a MIQP. LP/QP alone cannot model them; you need MIP.

Others Worth Naming

Telecom network design (backbone capacity, protection routing), manufacturing job-shop scheduling (plus the related lot-sizing and assembly-line balancing), retail assortment and inventory optimization, chip-design floorplanning, railway crew and rolling-stock scheduling, waste collection routing, and even protein design and kidney-exchange matching. The last one is quietly heroic: kidney-exchange programs in the US and UK use MIP to match donor-recipient pairs in cycles and chains, saving lives every week.

| Domain | Typical vars | Typical constraints | Typical solve |
| --- | --- | --- | --- |
| Airline crew rostering | 1M–10M | 100K–1M | Hours (column gen) |
| Unit commitment | 100K–500K | 500K–2M | 10–20 minutes |
| Multi-echelon supply chain | 50K–500K | 50K–500K | Minutes |
| Job shop scheduling | 10K–100K | 50K–500K | Seconds to minutes |
| Portfolio with cardinality | 1K–10K | 1K–20K | Seconds |
| Nurse rostering | 10K–50K | 20K–100K | Minutes |

Practical Tips and Common Pitfalls

Experience with MIPs is mostly pattern recognition. Here is what practitioners learn the hard way.

Prefer Tight Formulations Over Compact Ones

When in doubt, write more constraints if it tightens the LP relaxation. The facility-location example earlier — using xij ≤ yi (O(mn) constraints) instead of ∑j xij ≤ M · yi (O(m) constraints) — is the canonical lesson. The disaggregated form looks bloated but solves 10–100x faster.

Choose Big-M Carefully, or Don’t Use It

Always pick the smallest valid M. If the quantity is a time, M might be the makespan upper bound (sum of all processing times). If it’s a flow, M is the capacity. In Gurobi, CPLEX, and newer versions of SCIP, use indicator constraints (model.addGenConstrIndicator in gurobipy). They are numerically safer and often tighter.

Set MIP Gap and Time Limits

In business, proving the last 0.1% of optimality is rarely worth 10 hours of compute. Set a MIP gap tolerance (e.g., 1–5%) and a time limit. Most solvers will return the best feasible solution found with a verified bound when either condition hits.

# In PuLP with CBC
import pulp
solver = pulp.PULP_CBC_CMD(timeLimit=300, gapRel=0.02, msg=True)
prob.solve(solver)

# In Pyomo with Gurobi
from pyomo.environ import SolverFactory
opt = SolverFactory('gurobi')
opt.options['TimeLimit'] = 300
opt.options['MIPGap'] = 0.02
opt.solve(model, tee=True)

Warm Start From a Heuristic

Get any feasible solution first — a greedy assignment, a previous day’s plan, a quick metaheuristic — and pass it as a MIP start. Incumbent-driven pruning is the single largest speedup you can get for free.
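To see incumbent-driven pruning in numbers without any solver, here is a toy knapsack branch and bound run cold versus warm-started with a hypothetical incumbent of value 12, as if a heuristic or yesterday's plan had produced it (all data illustrative):

```python
def solve(items, cap, incumbent=0):
    """Toy knapsack branch and bound; returns (best value, nodes visited)."""
    best, nodes = incumbent, 0
    def bound(i, cap_left):              # fractional-knapsack LP-style bound
        total = 0.0
        for w, v in sorted(items[i:], key=lambda it: it[1] / it[0], reverse=True):
            if cap_left <= 0:
                break
            take = min(w, cap_left)
            total += v * take / w
            cap_left -= take
        return total
    def visit(i, cap_left, val):
        nonlocal best, nodes
        nodes += 1
        if cap_left < 0 or val + bound(i, cap_left) <= best:
            return                       # pruned: infeasible or bound-dominated
        if i == len(items):
            best = max(best, val)
            return
        w, v = items[i]
        visit(i + 1, cap_left - w, val + v)
        visit(i + 1, cap_left, val)
    visit(0, cap, 0)
    return best, nodes

items = [(9, 10), (4, 5), (3, 4), (2, 3), (5, 8)]   # (weight, value), capacity 10
cold = solve(items, 10)                  # no warm start
warm = solve(items, 10, incumbent=12)    # hypothetical heuristic value of 12
print("cold:", cold, "warm:", warm)      # same optimum, fewer nodes when warm
```

Both runs reach the same optimum; the warm one visits fewer nodes because the incumbent prunes bound-dominated subtrees from the very first branch, which is exactly the effect a MIP start gives a production solver.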

Decomposition for Huge Problems

When a monolithic MIP gets too big, decompose. Benders decomposition splits into a master problem (discrete decisions) and subproblems (continuous given the discrete choice), iterating with cuts. Dantzig-Wolfe decomposition and column generation handle problems with natural block structure (airline pairings, cutting stock). Lagrangian relaxation relaxes coupling constraints with penalty multipliers. Modern solvers automate some of this, but for the truly big problems you still hand-decompose.

Read the Solver Log

Solver logs tell a story: initial LP bound, first primal solution, rate of gap closure, cuts applied, node count, parallel thread usage. If the gap is stuck after 80% of your time limit, you likely need a tighter formulation or a better heuristic, not a bigger machine.

Caution: Do not mix units. Mixing variables in the range [0, 1] with coefficients in the range [0, 1e7] gives the solver numerical nightmares. Scale everything into reasonable ranges (1e-3 to 1e3 ideally). Bad scaling is the single most common cause of “Gurobi says infeasible but I’m sure it’s feasible.”

MIP vs Alternatives: GA, CP, RL

MIP is powerful but not universal. Knowing when to reach for something else is the mark of a seasoned modeler. See our companion post on Genetic Algorithms for the black-box counterpart.

MIP vs Genetic Algorithms

GA is a metaheuristic: it evolves a population of candidate solutions using selection, crossover, and mutation. It handles black-box fitness functions, arbitrary nonlinearity, and doesn’t require explicit constraints. But it gives no optimality guarantee. Use GA when your objective or constraints are wildly nonlinear, when evaluating a candidate is a simulation, or when you cannot write a linear formulation. Use MIP when you can and you want a provable optimum (or bounded gap).

MIP vs Constraint Programming

Constraint Programming (CP) excels at pure feasibility and scheduling problems with complex logical structure (e.g., disjunctive scheduling with hundreds of global constraints like AllDifferent or Cumulative). CP doesn’t need linearity and handles logic elegantly. MIP wins when the objective is a linear cost and you benefit from strong LP-based bounds. Some hybrid solvers (like Google OR-Tools CP-SAT) blur the line beautifully.

MIP vs Reinforcement Learning

RL learns a policy that maps state to action, typically for sequential decision problems under uncertainty. MIP solves a single deterministic instance to optimality. They attack different problems. You might use MIP to solve tomorrow’s nominal plan and an RL policy to react to disruptions in real time, trained offline on thousands of perturbed MIP solutions.

| Criterion | MIP | GA | CP | RL |
| --- | --- | --- | --- | --- |
| Optimality guarantee | Yes (bounded gap) | No | Yes | No |
| Needs linear structure | Yes | No | No | No |
| Best on pure discrete logic | Good | OK | Excellent | Poor |
| Best on continuous + discrete | Excellent | OK | Weak | OK |
| Real-time decisions (ms) | Rarely | Maybe | Sometimes | Yes |
| Requires training data | No | No | No | Yes |
| Handles uncertainty natively | No (needs stochastic MIP) | No | No | Yes |

MIP composes well with other methods. Demand forecasts from time-series models feed MIP inputs. Solutions are stored in specialized databases — see our time-series database comparison. And when models are deployed to production systems that also run classifiers like one-class SVMs for anomaly detection, or graph models like Graph Attention Networks for relational features, MIP ties the optimization layer together. Clean engineering practice matters here: write solver code with good clean-code principles and version it properly with Git best practices.

Frequently Asked Questions

When does MIP vs LP actually matter?

The moment you have a decision that is inherently yes/no or integer — opening a facility, assigning a worker, buying a discrete number of machines — LP alone cannot model it correctly. Rounding LP solutions is almost never safe. If all your decisions are continuous quantities like liters, dollars, or percentages, LP is fine and vastly faster. If any are binary or integer, you need MIP.

Should I use Gurobi or stick with CBC?

Start with CBC (free, ships with PuLP) to prototype. If your problem solves in seconds and you are not under time pressure, CBC is plenty. If you see solve times creeping into minutes or hours on problems that matter to the business, a Gurobi or CPLEX license typically pays for itself many times over. Academic users get both for free. HiGHS is a modern OSS middle ground that has closed a lot of the gap for many problem classes.

How big a MIP can solvers handle?

Modern solvers routinely handle millions of variables and constraints on ordinary servers. What matters more is the structure: highly symmetric or badly formulated problems with 10,000 variables can be harder than well-formulated problems with 1,000,000. Airline crew problems with billions of potential columns are solved daily via column generation. Rule of thumb: if presolve shrinks your model by 50%+, you are likely fine; if it doesn’t, expect pain.

MIP vs Genetic Algorithm — which should I use?

If you can write linear constraints and a linear objective, MIP gives a provable optimum and typically solves faster than a well-tuned GA on the same problem. If your objective requires a black-box simulator, has wild nonlinearities, or changes shape frequently, GA or other metaheuristics are a better fit. They can also be combined: use a GA to quickly find a good feasible solution and feed it as a MIP start.

Can MIP solve scheduling problems with thousands of tasks?

Yes, but usually with decomposition. Pure monolithic MIPs on 10,000+ tasks with intricate constraints tend to be impractical. Practitioners decompose by day, by machine group, or by crew. Hybrid approaches — MIP for the macro assignment, CP or local search for the detailed sequencing — are common. Google OR-Tools CP-SAT also handles very large scheduling with embedded SAT technology that sometimes outperforms MIP on pure scheduling problems.

Tip: Many teams find that the single biggest win comes not from a better solver, but from hiring one engineer who can reformulate a weak MIP into a strong one. Formulation skill still beats brute force in 2026.
