Duality
Last Time
Introduced the definition of linear programs (LPs).
Standard form LP: \(\min c^Tx\), s.t., \(Ax=b, x\geq 0\).
Analyzed the structure of optimal solutions of such LP.
Proved that there is always a basic feasible solution (BFS) and its bit complexity is not too large.
\(\Rightarrow\) The decision problem \(LP-Dec\): for a given \(v\), does there exists \(x\) s.t. \(c^Tx\leq v, Ax=b, x\geq 0\), is in \(NP\). (If such \(x\) exists, then the optimal BFS \(x^*\) is a succinct certificate that can be checked in polynomial time.)
How about \(LP-Dec\in co-NP\)? Is there always a succinct certificate of infeasibility of an LP?
Duality
Checking feasibility of a given solution is easy. Motivating question: How to certify optimality of a given solution?
Key notion: Duality - strongest result in the theory of LP.
Enable one to give a (succinct) proof of optimality.
Recovers a lot of classic results: max-flow min-cut theorem, prices for min-cost flow, existence of a Nash equilibrium in every two-player zero-sum game, lots other stuff.
Want: Find a lower bound on \(v^*:=\min \set{c^Tx \mid Ax=b, x \ge 0}\).
Standard approach: try adding up linear combinations of existing equations.
\(\Rightarrow\) Try multiplying each constraint \(A_i x = b_i\) by some \(y_i\) and adding all of them together.
\(\Rightarrow\) We will always have that \(yAx=yb\).
If find \(y\) s.t. \(y^TA=c^T\), then \(y^Tb=y^TAx=c^Tx\). So, every feasible solution \(x\) has the same value \(v^*\)!
But, in general, cannot hope that such \(y\) exists - \(c\) might not be in the span of \(A^T\).
Instead, let us focus on a looser goal: find \(y\) s.t. \(y^TA \le c^T\).
\(\Rightarrow\) We can lower bound \(v^*\) by noting that \(y^T b= y^TAx \le c^Tx\), for any feasible \(x\). (The inequality holds as \(x\geq 0\).)
How about getting the best lower bound to be gotten this way?
Write an LP for that!
Dual LP: \(\max b^Ty\) s.t. \(A^Ty\leq c\). (Note that the dual LP is defined in a completely syntactic way - no thinking involved!)
Let \(w^*:=\max \set{b^Ty \mid A^Ty \le c}\).
We just have shown weak duality: \(w^*\leq v^*\).
Important observation (and a sanity check): The dual of the dual LP gives the primal LP.
Let the primal LP be \(\min c^Tx\) s.t. \(Ax=b, x\ge 0\).
Its dual LP is \(\max b^Ty\) s.t. \(A^Ty\leq c\).
Let us transform the latter LP into a standard form: \[\max b^Ty \text{ s.t. } A^Ty \le c\] \(\Rightarrow\) \[- \min -b^Ty \text{ s.t. } A^Ty + I s = c, s\geq 0\] \(\Rightarrow\) \[- \min (-b)^T(y^+-y^-) \text{ s.t. } A^Ty^+- A^Ty^-+ I s = c, s, y^+, y^-\geq 0\] \(\Rightarrow\) In matrix form, this LP dual becomes \[\begin{aligned} -\min \left( \begin{array}{ccc} -b&b&0 \end{array} \right) \left( \begin{array}{c} y^+\\y^-\\s \end{array} \right) \text{ s.t. } &&\left(\begin{array}{ccc} A^T&-A^T&I \end{array} \right) \left(\begin{array}{c} y^+\\ y^-\\ s \end{array} \right) = c\\ &&y^+,y^-,s \ge 0\\\end{aligned}\]
The dual of the above LP is then: \[-\max c^T z \text{ s.t. } \left(\begin{array}{ccc} A^T&-A^T&I \end{array} \right)^T z \le \left( \begin{array}{ccc} -b&b&0 \end{array} \right)^T\\ \] \(\Rightarrow\) \[\min -c^T z \text{ s.t. } A(-z)=b, z\leq 0\] \(\Rightarrow\) Setting \(x=-z\), we get \[\min c^T x \text{ s.t. } Ax=b, x\geq 0,\] as desired.
A simple corollary of weak duality and the above observation: If \(P\) is the primal LP and \(D\) is the dual LP then either
Both \(P\) and \(D\) are feasible and bounded.
if \(P\) is feasible, \(D\) either infeasible or bounded.
If \(P\) is feasible and unbounded (i.e., \(v^*=-\infty\)), \(D\) infeasible.
(Similarly, if \(D\) feasible and unbounded (i.e., \(w^*=\infty\)), \(P\) is infeasible.)
In fact, together with strong duality (see below), this implies that we can have only four possibilities: both \(P\) and \(D\) bounded and feasible, one of them unbounded and one of them infeasible,, both of them infeasible.
Strong Duality
Weak duality described below is just a tip of an iceberg.
Strong duality (key theorem of LP): If
\(P\) and \(D\) feasible then \(v^*=w^*\).
“Proof” by picture:
Consider a “dual” problem \(\min b^Ty\) s.t. \(A^Ty \ge c\), which we obtain by flipping the sign of \(y\) in our above dual problem formulation.
Suppose \(b\) points straight up.
Imagine that \(y\) is the position of a ball that falls down subject to gravity and the constraints correspond to “floors” that limit how far the ball can fall.
\(\Rightarrow\) Minimizing \(b^T y\) subject to our constraints corresponds here exactly to minimizing the height of the ball given the limitations posed by the floor.
\(\Rightarrow\) Let \(y^*\) be the minimum height position the ball settles in (according to physics). This is also the optimal solution for our dual LP.
Why does the ball stay in position \(y^*\)?
The forces exerted on the ball by the floors are canceling the “gravity” \(-b\).
\(\Rightarrow\) Force exerted by the floor \(i\) has to be normal (i.e., perpendicular) to the surface of that floor. That is, it is equal to \(A_i x_i\), for some scaling factor \(x_i\), where \(A_i\) is the \(i\)-th column of \(A\).
Equilibrium condition: \(\sum_i A_i x_i = b\). That is, \(Ax=b\).
Also, the force exerted by the floors can be only pushing the ball away. So, \(x_i\geq 0\), for each \(i\).
\(\Rightarrow\) \(x\geq 0\).
In other words, \(x\) is feasible (but not necessarily optimal) for the “primal” LP \(\min c^x\) s.t. \(Ax=b, x\geq 0\).
Furthermore, physics tells us that the floors that do not touch the ball, i.e., all floors \(i\) with \(A_i^T y^{*} >c_i\), do not exert any force. So, \(x_i=0\) for them.
\(\Rightarrow\) Compact way of writing that: \((c_i-A_i^Ty^{*})x_i=0\), for all \(i\). (This is complementary slackness condition!)
Thus we have that \(c^T x = \sum_i (A_i^T y^{*}) x_i = x^T A^T y^{*} = (Ax)^Ty^{*}=b^Ty^{*}\).
So, by weak duality, \(x\) is indeed optimal and \(v^*=c^Tx=b^Ty^*=w^*\).
Formal proof:
Consider the optimum solution \(y^*\) to the \(\min b^Ty\) s.t. \(A^Ty \ge c\). (The sign of \(y\) is still flipped.)
Let us choose a subset \(S\) of constraints that are tight, i.e., \(A_i^Ty^*=c\), and linearly independent. (Recall that \(A_i\) is the \(i\)-th column of \(A\).)
Note that \(|S|\) is at most \(m\), where \(m\) is the dimension of \(y^*\).
Let \(A_S\) and \(c_S\) be the matrix \(A\) and the vector \(c\) after dropping the rows/entries corresponding to the constraints not in \(S\).
\(\Rightarrow\) The matrix \(A_S^T\) is of full column rank and \(A_S^T y^*=c_S\).
Also, we still have that \(b^Ty^*=w^*=\min b^Ty\) s.t. \(A_S^Ty\geq c_S\).
Thus, if we prove strong duality wrt to \(A_S\), i.e., that there exists \(x_S^*\) s.t. \(A_Sx_S^*=b\), \(x_S^*\geq 0\), and \(c_S^T x_S^*=b^T y^*=w^*\), then we recover strong duality for the original LP by just considering \(x^*\) being \(x_S^*\) with all the coordinates not in \(S\) padded with zeros. Specifically, we will have that \(c^Tx^*=c_S^Tx_S^*=w^*\), \(Ax^*=A_Sx_S^*=b\) and \(x^*\geq 0\), as desired.
So, let us drop the reference to \(S\) and assume from now on wlog that \(A^T\) is of full column rank and \(A^Ty^*=c\).
Claim: There exists \(x^*\) such that \(Ax^*=b\).
Suppose not? Then, by the “duality” for linear equalities we considered earlier, tells us there exists a vector \(z\) with \(A^Tz=0\) but \(b^Tz\ne 0\).
Wlog assume \(b^Tz<0\). (Else, we just take \(-z\).)
Consider a solution \(y'=y^*+z\).
It is feasible since \(A^Ty'=A^Ty^*+A^Tz=A^Ty^*\).
Also, we have that \(b^Ty'=b^T(y^*+z)=b^Ty^*+b^Tz<b^Ty^*=w^*\). Solution \(y'\) is strictly better than the optimum. Contradiction!
(This formalizes our idea that if the forces are not in balance, the ball will fall further down.)
Claim: We have that \(b^Ty^*=c^Tx^*\).
We know that \(Ax^*=b\) and \(A^Ty^*=c\).
So \(b^Ty^*=(Ax^*)^Ty^*=(A^Ty^*)^Tx^*=c^Tx^*\).
Claim: It is the case that \(x^* \ge 0\).
(Formalizes the intuition that barriers have to push ball not pull it.)
Suppose not, i.e., some \(x_i^* < 0\).
Let \(c'=c+e_i\), where \(e_i\) is a vector with all-zeros except a \(1\) at coordinate \(i\). (This corresponds to tightening the \(i\)-th constraint, i.e., moving it “upwards”.)
Consider solution to \(A^T y'=c'\). (Such a solution has to exist as \(A^T\) has a full column rank.)
Clearly, \(c' \ge c\), and thus \(A^Ty'=c'\geq c\).
\(\Rightarrow\) \(y'\) is feasible for the original constraints \(A^Ty\ge c\).
The value of the objective is \(b^Ty'=(Ax^*)^Ty'=(A^Ty')^Tx^*=(c')^Tx^*\).
We assumed \(x_i^* < 0\), and increased \(c_i\).
So \(b^Ty'=(c')^Tx^* < c^Tx^* =b^Ty^*\).
Again, got a feasible solution \(y'\) with a better value than optimum. Contradiction!
Intuition: \(x_i^*\) is telling us how much the value of the optimum will change if we “tighten” \(i\)-th constraint.
\(\Rightarrow\) Making constraint tighter should increase opt (as it corresponds to a minimization problem).
\(\Rightarrow\) \(x_i^*\) should be thus non-negative.
Interesting corollary (of strong duality): For LPs, finding a solution that is merely feasible is algorithmically equivalent to finding a solution that is optimal.
Given an LP optimizer, we can find a feasible solution by optimizing with an arbitrary cost vector.
Given an LP feasible solution finding algorithm, can compute the optimal solution by combining primal and dual. Specifically, suffices to find a feasible solution to the following set of LP constraints: \[\{ Ax=b, x\geq 0, c^Tx=b^Ty, A^Ty\leq c\},\] with \(x\) and \(y\) being the variables.
Rules for Taking Duals
So far we defined a dual LP only when the primal LP was in standard form.
Since every LP can be converted to standard form, this was sufficient.
Still, having to go through the standard form to figure out the dual is a bit painful.
Thus, below, we provide a general rules for taking a dual of an arbitrary LP.
Consider a primal LP to be as follows: \[\begin{aligned} v^* &= &\min c_1x_1+c_2x_2+c_3x_3\\ A_{11}x_1+A_{12}x_2+A_{13}x_3 &= &b_1\\ A_{21}x_1+A_{22}x_2+A_{23}x_3 &\ge &b_2\\ A_{31}x_1+A_{32}x_2+A_{33}x_3 &\le &b_3\\ x_1 &\ge &0\\ x_2 & \le &0\\ x_3 &&UIS,\end{aligned}\] Here, we split the cost vector \(c\) and the variables \(x\) into the parts corresponding to different sign constraints. (UIS emphasizes the fact that the variables are unrestricted in sign.) Also, every submatrix of \(A\) corresponds to a different combination of the sign of the variable and the constraint.
The corresponding dual LP is: \[\begin{aligned} w&=&\max y_1b_1+y_2b_2+y_3b_3 \\ A_{11}^Ty_1+A_{21}^Ty_2+A_{31}^Ty_3 &\le &c_1\\ A_{12}^Ty_1+A_{22}^Ty_2+A_{32}^Ty_3 &\ge &c_2\\ A_{13}^Ty_1+A_{23}^Ty_2+A_{33}^Ty_3 &= &c_3\\ y_1 &&UIS\\ y_2 &\ge &0\\ y_3 &\le &0\end{aligned}\]
In general, variable corresponds to constraint (and vice versa):
PRIMAL minimize maximize DUAL \(\ge b_i\) \(\ge 0\) constraints \(\le b_i\) \(\le 0\) variables \(= b_i\) UIS \(\ge 0\) \(\le c_j\) variables \(\le 0\) \(\ge c_j\) constraints UIS \(= c_j\)
Some observations:
Adding primal constraints creates a new dual variable: more dual flexibility.
The tighter the primal constraint, the looser the dual variable (and vice versa). In particular, equality primal constraints introduces a dual variable with an unrestricted sign.
If constraint is in “natural” direction opposing the minimization/maximization, dual variable is positive (and vice versa).
Insights on Specific Problems via LP Duality
The derivation of the LP dual is completely automatic/”brain-dead”.
Still, understanding what problem the dual LP captures can bring a lot of non-trivial insight into the problem corresponding to the primal LP.
This is the real power of LP duality!
Nonetheless: Knowing the actual optimal dual solution may be useless for finding the primal optimum solution!
\(\Rightarrow\) More formally: if your algorithm runs in time \(T\) to find primal optimal solution given an optimal dual solution, it can be adapted to run in time \(O(T)\) and find the primal optimal solution without having to know the optimal dual solution!
\(\Rightarrow\) (See the problem set.)
Let us take a look at the strong duality “in action” on a couple of example problems we studied before.
Shortest Paths Problem
Let us formulate shortest paths as a “dual” (i.e., maximization) LP: \[\begin{aligned} w^*&=&\max d_t-d_s\\ \text{s.t.} && \\ d_j-d_i &\le &c_{e} \quad \quad\quad\quad\quad \text{ $\forall_{e=(i,j)}$}\end{aligned}\]
The constraint matrix \(A\) has one row per each edge and one column per each vertex.
The solution \(d\) corresponds to embedding of vertices on the real line.
\(\Rightarrow\) All the constraints and the cost vector are “shift-invariant”, i.e., shifting each \(d_i\) by the same value does not change anything.
\(\Rightarrow\) We can assume wlog \(d_s=0\) and then \(d_i\) is just the distance of vertex \(i\) from vertex \(s\) in this line embedding.
Each constraints is just a triangle inequality imposed by edge \(e\).
Optimal solution sets \(d_t\) to be the shortest \(s\)-\(t\) path distance with all the constraints corresponding to the edges on this path being tight.
What is the dual LP?
One variable \(y_e\) per each edge constraint. As these are upperbounding constraints, the variables \(y_e\) are non-negative (see the correspondence table above - note that our dual LP here is “primal” from the point of view of the table as our primal LP corresponds to maximization).
One constraint per each variable \(d_i\). Since variables were unbounded, the constraint is an equality constraints. (Again, see the table above.)
End result: \[\begin{aligned} v^*&=&\min \sum_e c_e y_e\\ \text{ s.t.} & &\\ \sum_{j} \left(y_{ji}-y_{ij}\right) & = & {\begin{cases} -1 & \text{if $i=s$}\\ 1 & \text{if $i=t$}\\ 0 & \text{otherwise} \end{cases}}\\ y &\ge &0\\\end{aligned}\]
So, the dual of the shortest path problem is the task of sending one unit of flow from \(s\) to \(t\) while minimizing the cost (and having no capacity constraints).
This makes a lot of sense and might seem not too profound but the key point is: we got this via purely syntactic manipulations!
Maximum Flow Problem
Given our maximum flow instance, let us change it into a circulation problem by adding an edge from \(t\) to \(s\) and make it have infinite capacity, i.e., \(u_{ts}=\infty\).
Our goal then becomes to find a circulation that preserves all capacity constraints and maximizes the flow on the added “back edge” from \(t\) to \(s\).
Our primal LP becomes: \[\begin{aligned} w^*& = &\max x_{ts}\\ \text{ s.t.} & &\\ \sum_w x_{vw}-x_{wv} &= &0 \quad \quad\quad\quad\quad \text{ $\forall_{v}$}\\ x_{e} &\le &u_{e} \quad \quad\quad\quad\quad \text{$\forall_{e}$}\\\ x &\ge &0,\end{aligned}\] with each variable \(x_e\) encoding the flow on edge \(e\).
In our dual problem we have two types of variables: \(z_v\) corresponding to each flow conservation constraint; and \(y_e\) corresponding to each capacity constraint.
One constraint per each variable \(x_e\).
The resulting dual LP is: \[\begin{aligned} v^*&=&\min \sum_{e} u_e y_{e}\\ \text{ s.t.} & &\\ z_v-z_w+y_{vw} &\ge &0 \quad \quad\quad\quad\quad \text{ $\forall_{(v,w)\neq (t,s)}$}\\ z_t-z_s+y_{ts} &\ge &1\\ y &\ge &0\end{aligned}\]
Think of \(y_{e}\) as “lengths”
Note: Can wlog assume that \(y_{ts}=0\) since otherwise dual infinite due to the fact that \(u_{ts}=\infty\).
So, we actually need to have that \(z_t-z_s \ge 1\).
Observe: Variables \(z_v\) can be viewed as “distances” and the corresponding constraints are just triangle inequality constraints for them wrt lengths given by \(y\)s.
Since the occurrences of \(z\)s are shift-invariant can assume wlog that \(z_s=0\) and thus \(z_v\) is again the distance of \(v\) from \(s\) in the line embedding given by \(z\).
So, our goal is to choose edge lengths \(y\) so as the sink \(t\) is at distance of at least \(1\) from \(s\) wrt lengths \(y\) while minimizing the total volume of the edges.
Side note: This gives good justification for shortest augmenting path argument and blocking flows. This is not unusual: many primal-dual algorithms leverage dual solution to indicate the best way to optimize the primal solution.
How to solve our optimization task?
One approach: find the minimum \(s\)-\(t\) cut \(S^*\), assign lengths \(1\) to all the edges leaving it and \(0\) to all the remaining ones. \(\Rightarrow\) Our volume will be equal to the capacity of this cut \(u(S^*)\). So, by weak duality, we get \(w^*\leq u(S^*)\), i.e., that the maximum flow value \(w^*\) is at most the capacity of the minimum \(s\)-\(t\) cut!
Would like use strong duality to also get the max flow min cut theorem, but to this end need to argue that \(v^*=w^*\geq u(S^*)\), i.e., the dual “volume” problem does not have a better solution that the one given by the minimum \(s\)-\(t\) cut.
Complementary Slackness
Duality leads to another idea: complementary slackness:
Given feasible solutions \(x\) and \(y\), we define their duality gap as \[c^Tx-b^Ty.\]
Duality gap measures “how far off” our dual objective is from the primal objective. That is, how far from being optimal the solutions \(x\) and \(y\) are.
Note: By weak duality we know that duality gap is always non-negative. Also, by strong duality, we know that when \(x\) and \(y\) is optimal the gap is zero.
Let us rewrite our dual LP by changing inequalities into equalities by introducing additional variables: \[\max b^T y \text{ s.t. } A^Ty+s=c, s\geq 0.\]
The variables \(s\) are called slack variables.
Observe that \[c^Tx-b^Ty = (A^Ty+s)^Tx-b^Ty = y^TAx+s^Tx-y^Tb=s^Tx.\]
So, \(x\) and \(y\) are optimal iff \(c^Tx-b^Ty=s^Tx=\sum_i s_i x_i\) is equal to \(0\).
Observe that \(s,y\geq 0\). So, \(\sum_i s_i x_i=0\) iff \(s_ix_i=0\) for each \(i\).
\(\Rightarrow\) Complementary slackness condition: Feasible primal and dual solutions \(x\) and \(y\) are optimal iff \(x_is_i=0\), for each \(i\).
Note that \(x_is_i=0\) implies that at most one of these variables can be non-zero.
\(\Rightarrow\) Either a primal variable \(x_i\) is zero or its corresponding dual constraint is tight (i.e., \(s_i=0\), which means that \(A_i^Ty=c_i\)).
Apply CS to Max-flow Duality
To do that, consider the optimal dual solution \(y^*\), \(z^*\).
Define a cut \(S=\{v \mid z_v^* < 1\}\).
\(\Rightarrow\) Need to have that \(s\in S\) and \(t\notin S\). So, \(S\) is an \(s\)-\(t\) cut.
We now use complementary slackness:
If \((v,w)\) leaves \(S\), then \(y_{vw}^* \ge z_w^*-z_v^* > 0\)
\(\Rightarrow\) the corresponding primal constraints \(x_{vw}^* \leq u_{vw}\) corresponding to the dual variable \(y_{vw}^*\) has to be tight, i.e., \(x_{vw}^*=u_{vw}\) for any primal optimal solution \(x^*\).
If \((v,w)\) enters \(S\), then \(z_v^*>z_w^*\). Also, we know \(y_{vw}^*\ge 0\)
\(\Rightarrow\) \(z_v^*+y_{vw}^*>z_w^*\) i.e. the dual constraint corresponding to the primal variable \(x_{vw}^*\) is not tight.
\(\Rightarrow\) \(x_{vw}^*=0\).
So, in other words, in the optimal solution \(x^*\) all edges leaving \(S\) are saturated, all edges entering \(S\) are empty.
\(\Rightarrow\) The value of the flow \(w^*\) is equal to the capacity \(u(S)\) of \(S\), which is at least the capacity \(u(S^*)\).
We indeed recovered the max-flow min-cut theorem!
Minimum Cost Circulation Problem
Can adapt our maximum flow formulation. Just need to add costs on edges (and make it be minimization instead of maximization problem).
The resulting primal LP: \[\begin{aligned} v^*&=&\min \sum_e c_{e} x_{e}\\ \text{ s.t.} & &\\ \sum_w x_{vw}-x_{wv} &= &0 \quad \quad\quad\quad\quad \text{ $\forall_{v}$}\\ x_{e} &\le &u_{e} \quad \quad\quad\quad\quad \text{$\forall_{e}$}\\\ x &\ge &0,\end{aligned}\]
The dual LP almost the same as before, except our right-side constraints changed (as our primal cost vector changed) and changing the primal problem to be a minimization one flipped the sign constraints on variables \(y\).
The resulting dual LP: \[\begin{aligned} w^*&=&\max \sum_{e} u_e y_{e}\\ \text{ s.t.} & &\\ z_v-z_w+y_{vw} &\le & c_{vw} \quad \quad\quad\quad\quad \text{ $\forall_{(v,w)}$}\\ y &\le &0\end{aligned}\]
Changing variables \(p_v=-z_v\) and rearranging the constraints gives us: \[\begin{aligned} w^*&=&\max \sum_{e} u_e y_{e}\\ \text{ s.t.} & &\\ y_{vw} &\le & c_{vw} +p_v -p_w \quad \quad\quad\quad\quad \text{ $\forall_{(v,w)}$}\\ y &\le &0\end{aligned}\]
But \(c_{vw} +p_v -p_w\) is just the reduced cost \(c^{(p)}_{vw}\) of the edge \((v,w)\)!
Note: Optimum \(\le 0\) since \(y=0\) is a feasible solution. (Put differently: zero circulation is feasible.)
Note: \(y\le 0\) says the objective function is the sum of the negative parts of the reduced costs. (Positive ones get truncated to \(0\).)
Let \(x^*\) and \(y^*\) (and prices \(p^*\)) be the primal and dual optimal solutions.
Invoking complementary slackness tells us that:
If \(x_{vw}^*<u_{vw}\) then the corresponding dual variable \(y_{vw}^*\) is equal to \(0\).
\(\Rightarrow\) \(c_{vw}^{(p^*)}\ge 0\).
If \(c_{vw}^{(p^*)}<0\) then \(x_{vw}^*=u_{vw}\).
This means that all edges with negative reduced cost have to be saturated!
On the other hand, if \(c_{vw}^{(p^*)} > 0\)
\(\Rightarrow\) the constraint on \(y_{vw}^*\) is slack (as \(y^*\leq 0\)).
\(\Rightarrow\) The corresponding variable \(x_{vw}^*=0\).
So, all the edges with positive reduced cost need to carry no flow!