Linear Programming

Lecturers: David Karger and Erik Demaine
Scribe: Grant Wang

October 1, 2003

Motivation

Linear programming is both an important tool in the construction of algorithms, and an important tool in proofs. Many results we have seen in this class can be cast in the framework of linear programming. For instance, the fact that the value of the maximum-flow of a graph is equal to the value of the minimum cut is purely a result of strong duality. In the next two weeks, we will go over some basics of linear programming, some algorithms for solving linear programs, and how to use linear programming to design approximation algorithms for NP-hard problems.

Introduction

A linear program is a set of linear inequalities and an objective function together. Each linear inequality takes the form: \[a_1 x_1 + a_2 x_2 + \dots a_n x_n \{\leq,=,\geq \} b.\] We may have multiple linear inequalities (say \(m\)). The objective function takes the form: \[c_1 x_1 + c_2 x_2 + \dots c_n x_n.\] The goal is to find an assignment of values to \(x_1, x_2 \dots x_n \in \mathbb{R}\) such that the objective function is maximized over all assignments to \(x_1, x_2 \dots x_n \in \mathbb{R}\) that obey each of the inequalities.

Now, imagine we had an algorithm for solving linear programs, i.e. an algorithm that given the inequalities and an objective function as input, outputs \(x_1, x_2 \dots x_n \in \mathbb{R}\) that maximizes the objective function over all assignments obeying the inequalities. To get an idea of the algorithmic power of such an algorithm, consider the following formulation of the minimum-cost circulation problem on a graph \(G=(V,E)\).

\[\begin{aligned} \textrm{minimize } & \sum_{v,w \in E} c_{vw} f_{vw} &\\ \textrm{subject to } & f_{vw} \leq u_{vw} & \forall v,w \in E\\ & \sum_{w} f_{vw} = 0 & \forall v \in V\\ & f_{vw} = -f_{wv} & \forall (v,w) \in E\\\end{aligned}\]

Here, the variables are \(f_{vw}\) which represents the flow on an edge \((v,w)\). The objective function says that we are minimizing the cost of the flow, and the linear inequalities state that the flow is no more than the capacity, flow is conserved at every node, and that positive flow going one way is negative flow going the other way across an edge. Given an algorithm for solving linear programming, we can instantly solve the minimum-cost circulation problem.

Definitions

We will represent the variables by \(x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n\). Then we have the following definitions:

Definition: \(x \in \mathbb{R}^n\) is feasible for an LP if it satisfies all the constraints.

Definition: \(x \in \mathbb{R}^n\) is optimal if it is feasible and optimizes the objective function over feasible \(x\).

Definition: An LP is feasible if there exists a feasible vector \(x\) for it.

Definition: An LP is unbounded if there exists a feasible vector \(x\) with arbitrarily good objective value.

The following lemma characterizes any LP:

Lemma: Every LP is either infeasible, has an optimal solution, or is unbounded.

The proof follows by the compactness of \(\mathbb{R}^n\), and because polytopes are closed sets.

We generally write linear programs in two forms: canonical form or standard form.

Definition: An LP is in canonical form if it has the form: \[\begin{aligned} \textrm{minimize } & & c^T x\\ \textrm{subject to} & & Ax \geq b \end{aligned}\] where \(c \in \mathbb{R}^n\) is the objective function, and each row of \(A\) is a constraint, e.g. \(a_i^T x \geq b_i\).

Can we always convert to canonical form? Consider an arbitrary LP; the following rules convert it to canonical form:

If it’s a maximization problem, negate \(c\) and minimize.
If a constraint is \(\leq\), negate it and convert it to \(\geq\).
If a constraint is \(=\), add two constraints with \(\leq\) and \(\geq\).

We also have standard form. The choice of these names leaves a bit to be desired...

Definition: An LP is in standard form if it has the form: \[\begin{aligned} \textrm{minimize } & & c^T x\\ \textrm{subject to} & & Ax = b\\ && x \geq 0 \end{aligned}\]

We can also convert any LP to standard form. If there is any inequality, we can add a slack variable. That is, if the inequality is: \(a^T x \leq b\) we can convert this to \(a^T x + s = b\). Note that the slack variables \(s\) are included in the vector of new variables \(x'\) of the new LP. To ensure that \(x' \geq 0\), we can do the following: for every variable \(x_i\) in the old LP which are unrestricted in sign, replace it with \(x^+ - x^-\), then, \(x_i\) can take on any value even when \(x^+, x^- \geq 0\), and include \(x^+\), \(x^-\) in the vector of new new variables \(x'\) of the new LP. Note that this will usually make the optimum solution in the new LP not unique. If \(x^+=3\), \(x^-=0\) is an optimum solution, then so is \(x^+=4\),\(x^-=1\). The reason for two forms is that canonical form is often useful for theory, while standard form is useful in practice.

Geometry

Since we are looking for a vector \(x \in \mathbb{R}^n\) that obeys certain linear constraints that maximizes some linear objective function, we might expect that the geometry of an LP would give us some intuition. Indeed, this is the case:

Any equality of the form \(a^T x = b\) is a hyperplane.
Any inequality of the form \(a^T x \leq b\) is a halfspace.
For the standard form of \(Ax = b, x\geq 0\), the intersection of hyperplanes is an affine subspace.
For the canonical form of \(Ax = b, x \geq 0\), the intersection of halfspaces is a polyhedron.

Thus, the cast of optimizing a linear objective function over all vectors satisfying linear inequalities is the problem of finding a point in a polyhedron that is furthest in the direction specified by the objective function.

The issue of convexity also comes up in LP’s. Recall that a set \(K \subseteq \mathbb{R}^n\) is convex if for all \(x,y \in K\) and for all \(\lambda \in [0,1]\), \(\lambda x + (1-\lambda) y \in K\). That is, any point on the line segment between \(x\) and \(y\) is in \(K\).

Note that a halfspace is convex: if \(a^T x \geq b\) and \(a^T x' \geq b\), then we have: \(\lambda a^T x \geq \lambda b\) and \((1-\lambda)a^T x' \geq (1-\lambda)b\). Adding up these two inequalities gives us the desired result. Since an intersection of convex sets is convex, polyhedra are convex.

Separating hyperplanes

Given a point \(x\), we can show that \(x\) is in the polyhedra \(P\) defined by the inequalities by showing that \(x\) satisfies each inequality. But if \(x \notin P\), can we show something similar? It turns out we can, and this is the notion of separating hyperplanes: for any convex set \(P\) and a point \(z \notin P\), there exists \(a \in \mathbb{R}^n, b \in \mathbb{R}\) such that:

\[\begin{aligned} a^T x \leq b & \forall x \in P\\ a^T z > b\end{aligned}\]

We prove that such a pair \(a,b\) exist.

Proof. Let \(y\) be the closest point in \(P\) to \(z\), and let \(a\) be the line from \(y\) to \(z\). Then \(a^T y < a^T z\). We argue that for all \(x \in P\), \(a^T x \leq b\), where \(b = a^T y\). Suppose by way of contradiction that this is not the case, i.e. there is some \(x \in P\) such that \(a^T x > b\). Since any point on the line segment from \(x\) to \(y\) is in \(P\), we can show by taking the derivative of the function \[f(\lambda) = ||\lambda x + (1-\lambda) y - z||^2\] with respect to \(\lambda\) that the first derivative is negative at \(0\). This contradicts the fact that we took \(y\) to be the minimum distance point in \(P\) to \(z\), and we have our desired contradiction. ◻

Structure of optima

A polyhedra has uncountably infinite points – so where do the optima lie? Let us start with a few definitions:

Definition: A point \(p \in P\) is a vertex if it is uniquely optimal for some objective function \(c\).

Definition: A point \(p \in P\) is an extreme point if it is not a convex combination of two other points in \(P\).

We show here that every vertex is an extreme point; we will show the other direction in a future lecture.

Proof. Suppose \(p\) is uniquely optimal for the objective function \(c\), but that \(p\) is not an extreme point, i.e. it is equal to \(p = \lambda x + (1-\lambda)y\) for some \(x,y \in P\).

But since \(p\) is between \(x,y\), it follows that \(c^T p\) is between \(c^T x\) and \(c^T y\), i.e:

\[c^T p = c^T (\lambda x + (1-\lambda) y) = \lambda c^T x + (1-\lambda) c^T y\]

But since \(p\) is optimal, it must be that \(c^T x, c^T y \geq c^T p\), and furthermore, since \(p\) lies on the line between \(x,y\), \(c^T p = \lambda c^T x + (1-\lambda) c^T y \leq max(c^T x, c^T y)\), hence it is either the case that \(c^T x = c^T p\) or \(c^T y = c^T p\), which contradicts uniqueness. ◻

We end with a few other definitions, which we will pursue in future lectures.

Definition: A constraint \(a^T x \leq b\) or \(a^T x = b\) is tight if \(a^T x = b\).

Definition: A vector \(x \in \mathbb{R}^n\) is basic for an LP if

all equality constraints are tight
\(n\) linearly independent constraints are tight

Another way of thinking about this is that if \(x\) is at an intersection of the boundaries of \(n\) linearly independent constraints, then \(x\) is the unique intersection. This is the intuition behind a basic solution.