ARPACK in C#

Iterative Eigensolvers

Recently I found myself implementing a spectral clustering algorithm for one of the projects I’m working on. Spectral clustering itself is a vast topic, but the general approach is to calculate the weights between all objects you wish to cluster, produce a Laplacian matrix based on this, then find the k smallest eigenvalues and associated eigenvectors of the Laplacian matrix. Clustering is completed by grouping the rows of the eigenvectors into k clusters, and the result maps directly back to the required clusters in the original dataset.

Spectral clustering is extremely powerful, but it can be computationally time-consuming. In particular clustering into k groups requires finding the k smallest eigenvalues/vectors of a very large matrix, a problem for which there are very few solutions in C#. What is required is an interative eigensolver, one that computes the first eigenvectors alone, and will halt when its task is completed. The commonly used matrix libraries in C#, for example Accord.Math and Mathnet.Numerics, do not offer such a routine.

A popular library for this task is ARPACK, which implements the Implicitly Restarted Arnoldi Method. ARPACK is actually used at the back end of Matlab’s eigs method. Using ARPACK instead of one of the above libraries, you could hope to calculate the first few eigenvalues of a large matrix orders of magnitude faster than the above libraries, which would calculate all eigenvalues.

Using ARPACK in C# has a few notable problems, firstly it was coded in Fortran77, so cannot be easily included in your code, secondly the design of the library is complex, with a reverse communication protocol that requires you to perform additional vector matrix tasks throughout the procedure; it is not simply a case of running the function and reading the results.

Compiling ARPACK

ARPACK can’t be compiled into a managed assembly, instead you must compile into a non-managed Windows DLL, then reference this using PInvoke. There are numerous articles on compiling ARPACK into Windows, though few on the 64bit version. The resources are out there though, so I won’t dwell on this here. I will, however, provide a direct link to the x64 library since this is hard to produce! You can download it here, obviously I’m providing no support for this library.

Using ARPACK

Importing the Functions
Using the ARPACK DLL from within C# causes a definite headache. Depending on whether you are handling symmetric or non-symmetric matrices, and whether you are handling complex numbers, you will want to use a specific pair of functions. These are:

DSAUPD, DSEUPD Symmetric Real
DNAUPD, DNEUPD Non-symmetric Real
ZNAUPD, ZNEUPD Non-symmetric Complex

The non-normalised laplacian is symmetric, but the normalised laplacians that are often used in Spectral Clustering are not. I used DNAUPD and DNEUPD, you’ll want to select whichever pair is appropriate for your problem. Begin by importing them into your C# program:

/* DNAUPD */

[DllImport("arpack_x64.dll", EntryPoint = "dnaupd_", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe void DNAUPD(
int* ido, byte* bmat, int* n, byte* which, int* nev, double* tol, double* resid, int* ncv,
double* v, int* ldv, int* iparam, int* ipntr, double* workd, double* workl, int* lworkl,
int* info);

/* DNEUPD */
[DllImport("arpack_x64.dll", EntryPoint = "dneupd_", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe void DNEUPD(
int* rvec, byte* all, int* select, double* dr, double* di, double* z, int* ldz,
double* sigmar, double* sigmai, double* workev, byte* bmat, int* n, byte* which,
int* nev, double* tol, double* resid, int* ncv, double* v, int* ldv, int* iparam,
int* ipntr, double* workd, double* workl, int* lworkl, int* ierr);

Note: PInvoke is really good at marshalling between managed and unmanaged code, so good in fact that these two functions often run successfully even if the input parameters are wrong! Be sure to be extremely careful about the order of input parameters, the sizes of arrays, and so on. Also, while I’ve used native pointers here, and will need to used the fixed keyword to pin the data before using it, you can also use the ref keyword instead of a pointer, and simply rely on automatic marshalling.

Using the AUPD Functions
Given an matrix M, for which we want the smallest eigenvectors (ARPACK can also find the largest), how do we call the functions to achieve this? One of the more elegant things about the ARPACK library is its reverse communication interface. You would normally expect to convert our matrix into some compatible data structure, then pass it as a parameter to the relevant function. ARPACK works differently, in fact we don’t have to provide the matrix at all, which means we are free to use whatever format we wish to describe it. Essentially, ARPACK will work effectively whether your matrix is an Accord.Math 2D array, a MathNet.Numerics DenseMatrix, or any other structure. The library is used by repeated calls to the *AUPD function, followed by a single call to the *EUPD function.

During the repeated calls to *AUPD, the ARPACK library will pass back temporary work vectors to your program, and request that certain operations be performed upon them. For example, ARPACK may pass back a vector X, and request that you return M*X. X will be a sub-section of the workd array. This reverse communication is essentially how ARPACK uses the values in your matrix, without actually holding a copy of the matrix itself. The returned ido pointer specifies which operation is required, and the different operations are detailed on the respective pages for SNAUPD, DNAUPD and ZNAUPD.

In my case I was using DNAUPD in shift-invert mode, looking for the smallest eigenvectors. I begin by initialising the required parameters (not all are shown, the entire code can be found at the bottom of the article):

int ido = 0; // reverse communication parameter

byte bmat = 0x47; // "G" RHS matrix is identity matrix, i.e. standard eigen problem
byte[] which = new byte[] { 0x4C, 0x4D }; // "LM" in this mode find closest to sigma, which is close to zero

// Various work arrays, sizes are detailed in the ARPACK documentation
int ncv = Math.Min(Math.Max(2 * nev + 1, 20), n);
double[] v = new double[n * ncv];// Basis vectors
int ldv = n; // Length of each basis vector
int[] ipntr = new int[14]; // Internal pointers
double[] workd = new double[3 * n]; // Workspace
int lworkl = 3 * ncv * (ncv + 2); // Length of workspace workl
double[] workl = new double[lworkl]; // Workspace
int info = 0; // Generate random starting residuals
int ierr = 0; // DNEUPD error flag

int[] iparam = new int[11]; // Parameter array, different for DS/DN/ZN AUPD
iparam[0] = 1; // Specifies the shift strategy (1->exact)
iparam[2] = 300; // Maximum iterations
iparam[6] = 3; // Shift-invert mode.

Once the necessary data structures have been set up, if you’re using native pointers then you’ll need to pin your data like this:

fixed (byte* _which = which)

fixed (double* _resid = resid)
...
fixed (double* _workl = workl)
{
}

Then you’re ready to go! Loop while ido isn’t 99, which signals either a successful or unsuccessful end:

do

{
DNAUPD(
&ido, &bmat, &n, _which, &nev, &tol, _resid, &ncv,
_v, &ldv, _iparam, _ipntr, _workd, _workl, &lworkl, &info);

// IDO specifies the action to take between calls to DNAUPD.
switch (ido)
{
// Handle operations
}
} while (ido != 99);

if (info < 0) throw new Exception("DNAUPD error: " + info);[/code]</pre>
<p>When handling the operations, you'll need to implement each of the tasks specified in the documentation. For DNAUPD in shift-invert mode, there are 3 main possibilities (-1, 1 and 2):</p>
<pre>[code type="csharp"]switch (ido)
{
case -1:
// Copy workd(ipntr(0)) to workVec
Array.Copy(workd, ipntr[0] - 1, workVec, 0, n);

// Solve
solvedV = solver.Invoke(workVec);

// Copy result to workd(ipntr(1))
Array.Copy(solvedV, 0, workd, ipntr[1] - 1, n);
break;
case 1:
// Copy workd(ipntr(0)) to workVec
Array.Copy(workd, ipntr[0] - 1, workVec, 0, n);

// Solve
solvedV = solver.Invoke(workVec);

// Copy result to workd(ipntr(1))
Array.Copy(solvedV, 0, workd, ipntr[1] - 1, n);
break;
case 2:
// Copy X to Y
Array.Copy(workd, ipntr[0] - 1, workd, ipntr[1] - 1, n);
break;
case 99:
break;
default:
throw new InvalidOperationException("Unexpected IDO value");
}

Note: ARPACK is indexed from 1, unlike C#, so you need to -1 or +1 as appropriate when copying to and from ARPACK work arrays.

In this case, solver is a delegate I used to provide access to a linear solver that I could swap out easily. This function takes a vector and returns another vector. I used Accord.Math for this function:

// Pre-computed before any calls to ARPACK.

double[,] M = ...; // The matrix for which eigenvalues are required

// Calculate A - Sigma * B, where B is the identity matrix, Sigma is a double = 2.2E-16
double[,] AsB = M.Subtract(Sigma.Multiply(Accord.Math.Matrix.Identity(size)));

// Create solver
luDecomp = new Accord.Math.Decompositions.LuDecomposition(AsB);

// Wrap solve in delegate
SolverDelegate solver = delegate(double[] input) { return luDecomp.Solve(input); }

Using the EUPD Functions
Once iterations have completed, you must make a single call to *EUPD to finish the process. This will also let you specify that the function should return the eigenvectors if you need them.

byte all = 0x41; // 'A'
int rvec = 1; // Return eigenvectors
int[] select = new int[ncv];
double[] dr = new double[nev + 1];
double[] di = new double[nev + 1];
double[] z = new double[n * (nev + 1)];
int ldz = n;
double sigmai = 0.0; // Imaginary component of sigma, which we don't need for DNAUPD mode 3.
double[] workev = new double[3 * ncv];

fixed (int* _select = select)
fixed (double* _dr = dr)
fixed (double* _di = di)
fixed (double* _z = z)
fixed (double* _workev = workev)
{
    DNEUPD(&rvec, &all, _select, _dr, _di, _z, &ldz,
        &sigma, &sigmai, _workev, &bmat, &n, _which, &nev,
        &tol, _resid, &ncv, _v, &ldv, _iparam, _ipntr, _workd, _workl, &lworkl, &ierr);
}

if (ierr != 0) throw new Exception("DNEUPD error: " + ierr);

// Copy output vectors to an array if necessary
evals = new double[nev];
evecs = new double[nev][];
for (int i = 0; i < nev; i++)
{
    evecs[i] = new double[n];
}

for (int i = 0; i < nev; i++) evals[i] = dr[i];
{
    if (evecs != null)
    {
        for (int i = 0; i < nev; i++)
        {
            Array.Copy(v, i * n, evecs[i], 0, n);
        }
    }
}[/code]

Hopefully this is some help for people who are struggling getting ARPACK to run, it took me a fair while, so if this makes someone's life easier then it was worth documenting! Plus, I'll no doubt forget what I did at some point, and can come back here. The full class that solves for the smallest k eigenvectors can be downloaded here, wrapping everything up in a class makes it easy to add into a project and then (hopefully) forget about.

Leave a Reply

Your email address will not be published. Required fields are marked *