So I guess this post starts my blog. Welcome! 🙂
Over the past couple of days I’ve been doing a few experiments with the Eigen linear algebra library, because the learning part of my supervised descent library was very slow: inverting a 2000×2000 matrix took a few minutes, while Matlab did it in under a second. I ended up acquiring a bit of knowledge about Eigen and running a few experiments that I thought might be worth sharing.
Initially, I was using the FullPivLU decomposition from Eigen, and checked whether the matrix I inverted was actually invertible. If you’re not so familiar with Eigen: their documentation provides a very nice overview table of all the available linear algebra decompositions here. My code looked roughly like this:
```cpp
#include <Eigen/Dense>
#include <opencv2/core/core.hpp>
#include <iostream>

using namespace Eigen; // to keep the example concise

cv::Mat AtA = A.t() * A;
// Map OpenCV's memory directly; ptr<float>() gives the typed pointer
Map<Matrix<float, Dynamic, Dynamic, RowMajor>> AtA_Eigen(AtA.ptr<float>(), AtA.rows, AtA.cols);
FullPivLU<Matrix<float, Dynamic, Dynamic, RowMajor>> luOfAtA(AtA_Eigen);
if (!luOfAtA.isInvertible()) std::cout << "Not invertible." << std::endl;
Matrix<float, Dynamic, Dynamic, RowMajor> AtAInv_Eigen = luOfAtA.inverse();
// Wrap the Eigen result in a cv::Mat header; again, no data is copied
cv::Mat AtAInv(static_cast<int>(AtAInv_Eigen.rows()), static_cast<int>(AtAInv_Eigen.cols()), CV_32FC1, AtAInv_Eigen.data());
cv::Mat x = AtAInv * A.t() * b;
```
I’m pretty much using OpenCV everywhere in my code, but OpenCV is horrendously slow at inverting matrices larger than 500×500, so I dropped it in favour of Eigen. For people not familiar with the code: I map the memory allocated by OpenCV with an `Eigen::Map`, invert with Eigen, and map the data back to an OpenCV `cv::Mat`. `Eigen::RowMajor` is needed because OpenCV stores its data in row-major layout in memory, whereas Eigen’s default is column-major. No copying of data is involved here. I solve the normal equations Aᵀ * A * x = Aᵀ * b, thus calculating Aᵀ * A, inverting it, and then calculating x.
Now let’s analyse what’s not good about the code above. Profiling showed hotspots on the first and last lines, which I didn’t expect, so I ran a few measurements:
All times are in seconds. The left two result columns are for A: 2000×2000, b: 2000×40; the right two for A: 2000×4400, b: 4400×6.

| | Aᵀ * A | x = AtAInv * Aᵀ * b | Aᵀ * A | x = AtAInv * Aᵀ * b |
|---|---|---|---|---|
| OpenCV | 6.6 | 6.4 | 14.5 | 13.8 |
| Eigen | 0.8 | 0.8 | 1.6 | 1.7 |
The numbers doubled again for both libraries with a 3500×3500 matrix. Basically, even for matrix-matrix multiplications, OpenCV is horribly slow.
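For completeness, here’s roughly how such a timing can be set up. This is a minimal sketch rather than my actual benchmark harness, using the first test case from the table:

```cpp
#include <Eigen/Dense>
#include <opencv2/core/core.hpp>
#include <chrono>
#include <iostream>

int main() {
    // First test case from the table: A is 2000×2000
    cv::Mat A_cv(2000, 2000, CV_32FC1);
    cv::randu(A_cv, cv::Scalar(0.f), cv::Scalar(1.f));
    Eigen::MatrixXf A_eig = Eigen::MatrixXf::Random(2000, 2000);

    auto t0 = std::chrono::steady_clock::now();
    cv::Mat AtA_cv = A_cv.t() * A_cv;                     // OpenCV matrix product
    auto t1 = std::chrono::steady_clock::now();
    Eigen::MatrixXf AtA_eig = A_eig.transpose() * A_eig;  // Eigen matrix product
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double> d_cv = t1 - t0, d_eig = t2 - t1;
    std::cout << "OpenCV: " << d_cv.count() << " s, Eigen: " << d_eig.count() << " s\n";
}
```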
Next, as I was still wondering what took Eigen so long to calculate the actual inverse, I wanted to figure out whether the `Eigen::Map` causes a slowdown, or possibly the row-major memory layout. I ended up benchmarking a few of the decomposition algorithms as well, which yielded a few interesting conclusions. First, the results (times in seconds):
| | A: 2048×558, b: 2048×6 | A: 2048×2048, b: 2048×40 | A: 3500×3500, b: 3500×40 |
|---|---|---|---|
| MatrixXf & LDLT::solve() | 0.55 | 0.57 | 4.4 |
| MatrixXf & PartialPivLU::inverse() | 1.4 | 1.4 | 5.9 |
| MatrixXf & PartialPivLU::solve() | 0.47 | 0.46 | 1.8 |
| MatrixXf & ColPivHouseholderQR::inverse() | 6.7 | 10.8 | 59.0 |
| Col-major Map & PartialPivLU::inverse() | 1.4 | 13.8 | 5.8 |
| Row-major Map & PartialPivLU::inverse() | 1.3 | 12.9 | 5.6 |
| Row-major Map & PartialPivLU::solve() | 0.52 | 0.56 | 1.8 |
| Col-major Map & ColPivHouseholderQR::inverse() | 6.7 | 11.4 | 54.8 |
| Row-major Map & ColPivHouseholderQR::inverse() | 6.5 | 9.7 | 44.0 |
| Row-major Map & FullPivLU::inverse() | 40.0 | 38.4 | 142.5 |
| Row-major Map & FullPivLU::solve() | 35.5 | | |
| jacobiSvd(ComputeThinU \| ComputeThinV)::solve() | 993.8 | | |
First, we can see that using an `Eigen::Map` to custom data doesn’t incur a speed penalty: it’s just as fast as the `Eigen::MatrixXf` counterpart, so it’s essentially free, which is good news. Second, the fact that the data is in row-major layout doesn’t seem to be an issue; on the contrary, Eigen consistently ran about 10 percent faster when the data was in row-major memory order. This somewhat contradicts the Eigen documentation, which says Eigen works best and is most tested in its default, column-major layout. Third, there is a huge difference between the slowest decomposition (FullPivLU) and the fastest (PartialPivLU). My initial choice of FullPivLU might not have been the best. Note that I tested only a subset of all the available decompositions (for a complete list, see the link above).
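To make the Map variants from the table concrete, here’s a minimal sketch of a column-major and a row-major `Eigen::Map` over the same external buffer. The function and variable names are mine, not from the benchmark code:

```cpp
#include <Eigen/Dense>
using namespace Eigen;

// 'data' is assumed to point to n*n floats owned elsewhere, e.g. by a cv::Mat.
// Note that the two views interpret the same buffer differently (each sees the
// transpose of the other); they are shown here only to compare the two layouts.
void invert_via_maps(float* data, int n) {
    Map<MatrixXf> A_col(data, n, n);                                  // column-major (Eigen default)
    Map<Matrix<float, Dynamic, Dynamic, RowMajor>> A_row(data, n, n); // row-major (OpenCV layout)

    // Either view can be handed to a decomposition like any other matrix:
    MatrixXf inv_col = PartialPivLU<MatrixXf>(A_col).inverse();
    Matrix<float, Dynamic, Dynamic, RowMajor> inv_row =
        PartialPivLU<Matrix<float, Dynamic, Dynamic, RowMajor>>(A_row).inverse();
}
```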
The fourth conclusion requires a bit of explanation first. You can see in the table that I sometimes used `inverse()` and sometimes `solve(MatrixXf b)`. To solve Ax = b via the inverse, we would compute A⁻¹ and then calculate x = A⁻¹ * b. However, inverting the matrix might not be the best choice: if we want to solve a linear system Ax = b and we already have a concrete b, we can solve directly for that b, and this often turns out to be an easier problem than inverting the matrix. Eigen provides the `solve` method to do exactly this, and it is indeed a lot faster. Note that it is even faster than the table suggests, because with `solve()` we don’t need to add the time to calculate A⁻¹ * b (or A⁻¹ * Aᵀ * b in our case), whereas when using `inverse()`, that time needs to be added.
However, it’s unfortunately not always possible to use `solve()`: a lot of the decomposition algorithms have not (yet?) implemented the case where b is a matrix.
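To make the difference concrete, here’s a sketch of the two variants on the normal-equations matrix; the names are illustrative:

```cpp
#include <Eigen/Dense>
using namespace Eigen;

// AtA: n×n, Atb: n×k. Both variants compute x such that AtA * x = Atb.
MatrixXf solve_two_ways(const MatrixXf& AtA, const MatrixXf& Atb) {
    PartialPivLU<MatrixXf> lu(AtA);

    // Variant 1: form the inverse explicitly, then multiply. Slower, and the
    // time for the multiplication comes on top of the decomposition itself:
    MatrixXf x_inv = lu.inverse() * Atb;

    // Variant 2: solve directly for the given right-hand side (faster):
    MatrixXf x_solve = lu.solve(Atb);

    return x_solve;
}
```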
One important thing to note is that the computationally cheaper algorithms come at a steep price: they impose requirements on the matrix, for example that it is invertible or positive (semi-)definite. The more expensive decompositions like FullPivLU, on the other hand, are rank-revealing. Often, the more expensive algorithms are also more precise or more numerically stable.
There are actually a few more things to consider, which are outside the scope of this blog post. One reason why I solve the normal equations and not the original system in the first place is that the original system most likely has no exact solution anyway, and I want the least-squares solution. Instead of calculating the pseudo-inverse using an SVD of the original problem (which is very slow), one can solve the normal equations with a QR, LU or even LDLT decomposition. Plus, in practice, a regularisation term is often added to AᵀA even in that case.
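A minimal sketch of that regularised variant follows; the value of `lambda` is just a placeholder. Adding λI also makes AᵀA + λI positive definite, which is what justifies the cheap LDLT path:

```cpp
#include <Eigen/Dense>
using namespace Eigen;

// Least-squares solution of A x = b via regularised normal equations:
// (AᵀA + lambda·I) x = Aᵀb. With lambda > 0 the system matrix is
// positive definite, so LDLT is applicable.
MatrixXf solve_normal_equations(const MatrixXf& A, const MatrixXf& b,
                                float lambda = 1e-3f) {  // placeholder value
    const auto n = A.cols();
    MatrixXf AtA = A.transpose() * A + lambda * MatrixXf::Identity(n, n);
    return AtA.ldlt().solve(A.transpose() * b);
}
```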
Technicalities: I used Eigen 3.2.2 and OpenCV 2.4.8, and the experiments were run on a Core i7-4700MQ with turbo boost and hyper-threading enabled. I compiled with VS2013, 64 bit, in Release mode, with the default flags from CMake. I briefly ran with `/openmp` and some more optimisations turned on, and in some cases I could see Eigen use more than one core, but I turned that off for these experiments. It’s worth noting that I ran every experiment only a couple of times and recorded the numbers by hand, so take the numbers only as estimates (but they were pretty consistent).
To wrap it up, my lessons learned were:

- OpenCV is very slow for large matrices, even for multiplications
- Prefer `solve()` over `inverse()`, because it does the job we actually want to do (solving the linear system, not inverting a matrix)
- A right-hand side b in matrix form is not supported by most of Eigen’s `solve()` methods, which leaves us with few decomposition options
- Checking for invertibility is computationally very expensive, because it requires a full LU decomposition (or similar). A potential workaround is to use a cheap algorithm and then check the solution with `isApprox()`, as sketched below
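Here’s a minimal sketch of that last workaround; the names and the tolerance are illustrative:

```cpp
#include <Eigen/Dense>
using namespace Eigen;

// Solve AtA * x = Atb with the cheap PartialPivLU (which assumes the matrix
// is invertible), then verify the solution instead of checking invertibility
// up front with an expensive rank-revealing decomposition.
bool solve_and_check(const MatrixXf& AtA, const MatrixXf& Atb, MatrixXf& x) {
    x = PartialPivLU<MatrixXf>(AtA).solve(Atb);
    return (AtA * x).isApprox(Atb, 1e-4f);  // plug x back in and compare
}
```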
For the interested reader, the newest version of the code above is on my GitHub.