Choice of rho vector in ADMM

Hey Folks,

I have a question regarding the ADMM parameter rho.

In the OSQP paper (https://web.stanford.edu/~boyd/papers/pdf/osqp.pdf), section 5.2 Parameter Selection, ‘Choosing rho’, it is stated that constraints with equal lower and upper bounds (i.e. equality constraints) are assigned a penalty of 10³ * rho, while all other constraints keep the nominal rho.

My current assumption would be:

Given that some rows are equality constraints (determined by comparing upper and lower bounds), we know a priori a subset of constraints that will be active at the optimal solution. Assuming infinity would be the ideal factor for those constraints, why don’t we assign those rows a factor higher than 10³ * rho?
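Concretely, what I have in mind is something like the following (a minimal numpy sketch, not solver code; `rho_nominal` and `eq_factor` are names I made up for illustration):

```python
import numpy as np

# Bounds of the constraint set l <= A x <= u; rows with l == u are equalities.
l = np.array([1.0, -np.inf, 2.0])
u = np.array([1.0, 5.0, 2.0])

rho_nominal = 0.1  # nominal ADMM penalty parameter
eq_factor = 1e3    # boost factor for equality rows (the value in question)

is_eq = (l == u)  # a priori known active constraints
rho_vec = np.where(is_eq, eq_factor * rho_nominal, rho_nominal)
# equality rows get 100.0, inequality rows keep 0.1
```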

With those thoughts in mind, two questions arise:

  1. How is the factor of 10³ derived? I ran some experiments, and a higher value, e.g. 10⁶ (closer to infinity, which would be ‘ideal’), improved the runtime greatly. Unless I am missing something here (e.g. recovery from bad intermediate solutions, …), this may hold for general-purpose use as well. FYI: I have not tried it on any kind of benchmark problem yet.

  2. If we detect constraints to be equalities and hence set their rho to an ‘active’ value (10³ * rho), what is the reasoning behind still updating them? Does the maximal ratio between rho_max and rho_min affect stability?

Thanks in advance for your answer.

When developing the solver we tried many different combinations of parameters on quite a large benchmark set of problems derived from various applications, with many millions of solves in total.

The factor of 10^3 was just what seemed to give the best performance overall across all problem types while remaining a nice round number. It is just a heuristic; there is no derivation behind it other than the observation that ‘infinite’ values would be best in principle.

If you choose a value larger than 10^3 then you start to get numerical conditioning problems when factoring the KKT matrix. That is also the reason that we keep the value as a multiplier on the current solver rho value, rather than fixing a constant value for the equalities.
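As a rough illustration of the conditioning effect (this just builds the KKT matrix from the paper, [P + sigma*I, A'; A, -diag(1/rho)], for a small random problem; it is not the solver’s internal code, and the numbers are only indicative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 5
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)          # positive definite cost matrix
A = rng.standard_normal((m, n))  # constraint matrix
sigma = 1e-6                     # regularization term (OSQP default)

def kkt_cond(rho_vec):
    # KKT matrix that gets factored: [[P + sigma*I, A'], [A, -diag(1/rho)]]
    K = np.block([[P + sigma * np.eye(n), A.T],
                  [A, -np.diag(1.0 / rho_vec)]])
    return np.linalg.cond(K)

rho = 0.1
for factor in (1e0, 1e3, 1e6, 1e9):
    rho_vec = np.full(m, rho)
    rho_vec[:2] *= factor  # pretend the first two rows are equalities
    print(f"factor {factor:g}: cond(K) = {kkt_cond(rho_vec):.2e}")
```

The condition number of the factored matrix grows with the factor, which is why a larger multiplier may help on one problem while degrading robustness across a whole benchmark set.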

Thanks a lot for the quick & detailed response.
It clarifies my thoughts on that topic.

The origin of the idea is related to the scaling of the problem to be solved.

I have already read some threads in this forum and on GitHub saying that the solver is quite sensitive to scaling (which is inherent to ADMM, so that’s fine).

When you found that 10³ was numerically stable, did you also find some preferred manual scaling to apply before handing the problem to OSQP and its automatic scaling?

In my case the OSQP scaling is helping a bit, but I suspect that some clever, more problem-specific manual step could improve on it.

You are perhaps conflating two different scalings within the solver code:

  1. There is a scaling of the parameter rho, so that we have the possibility of assigning a different rho to each constraint. The solver then uses one value (the nominal one) for all inequalities, and another for all equalities (nominal * 1e3).

  2. There is an overall scaling of the problem data itself, which is done by applying a modified matrix equilibration method to the matrix [P A'; A 0] (I think our method also accounts for the size of the linear term in the cost, but I don’t remember the details). This can make a really big improvement for some problems; a bare-bones sketch of this kind of equilibration follows below.
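For reference, here is a sketch of Ruiz-style equilibration on the stacked matrix; the actual routine in OSQP differs in the details (e.g. the cost scaling), so treat this only as an outline of the idea:

```python
import numpy as np

def ruiz_equilibrate(K, iters=15):
    """Scale K <- D K D so that the row/column inf-norms approach 1.

    K is assumed symmetric (e.g. the stacked matrix [[P, A'], [A, 0]]),
    so the same diagonal D is applied on both sides.
    """
    d = np.ones(K.shape[0])
    for _ in range(iters):
        norms = np.maximum(np.abs(K).max(axis=1), 1e-10)  # row inf-norms
        s = 1.0 / np.sqrt(norms)                          # Ruiz update
        K = K * s[:, None] * s[None, :]                   # K <- diag(s) K diag(s)
        d *= s
    return K, d

# Example: a badly scaled problem.
P = np.diag([1e4, 1e-2])
A = np.array([[1e3, 0.0], [0.0, 1e-3]])
K = np.block([[P, A.T], [A, np.zeros((2, 2))]])
K_scaled, d = ruiz_equilibrate(K)
print(np.abs(K_scaled).max(axis=1))  # row norms now close to 1
```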

I assume that you are referring to the second case, although I don’t know of any manual method that improves either case in any kind of consistent way. Certainly it might be possible for some specific application though. If that’s what you want to do then I think it would be best to just pre-scale the problem before passing it to the solver.

Note that the data scaling procedure ignores the linear terms in your constraints and (partially) the cost. You might have some success in manual scaling if those terms are very large or small relative to their matrix counterparts. In some cases, e.g. if you are using an exact penalty function method, you might also be able to make an informed guess at the size of the multipliers and use that as a warm start point.
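As a sketch of what that could look like with the Python interface (the problem data is a placeholder, and the multiplier guess is purely illustrative, not derived from anything):

```python
import numpy as np
import scipy.sparse as sparse
import osqp

# Placeholder QP: minimize 0.5 x'Px + q'x  subject to  l <= Ax <= u.
P = sparse.csc_matrix(np.diag([4.0, 2.0]))
q = np.array([1.0, 1.0])
A = sparse.csc_matrix([[1.0, 1.0], [1.0, 0.0]])
l = np.array([1.0, 0.0])
u = np.array([1.0, 0.7])  # first row has l == u, i.e. an equality

prob = osqp.OSQP()
prob.setup(P, q, A, l, u, verbose=False)

# If the application (e.g. an exact penalty weight) suggests the rough
# size of the multipliers, warm start them along with the primal iterate.
y_guess = np.array([10.0, 0.0])  # illustrative guess only
prob.warm_start(x=np.zeros(2), y=y_guess)

res = prob.solve()
print(res.info.status, res.info.iter)
```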


Thanks for making that point clear. I was indeed mixing the two up a bit. I am going to look into which approaches fit my problem best.