Large-scale neural data analysis with Spark
by Richard Alex Hofer (rhofer) and Owen Kahn (okahn)
Our goal for this project is to implement a distributed matrix inverse on Spark.
Note: this was not our original project.
Our writeup is available here
Our progress report is available here
Matrices are important. Many frameworks built on top of Spark provide distributed matrices and a few operations on them, but none of them appear to offer a matrix inverse, even though one can be very useful. This is likely a sign that the operation is very difficult to implement, and we probably should have paid more attention to that warning.
Matrix inverse does not distribute easily: computing any part of the inverse generally requires access to many parts of the original matrix. This communication makes the operation difficult to implement on distributed matrices, and it does not lend itself well to Spark primitives.
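To illustrate the communication pattern, here is a minimal single-machine sketch (not our Spark implementation) of blockwise inversion via the Schur complement, a standard way to decompose an inverse into block operations. Note how every output block depends on several blocks of the input, which is exactly what makes the operation communication-heavy when the blocks live on different workers.

```python
import numpy as np

def block_inverse(M, k):
    """Invert M via the Schur complement of its leading k-by-k block.

    Illustrative sketch: in a distributed setting each of the block
    products below could run on a different group of workers, but every
    output block still needs data from several parts of M -- the
    communication that makes the operation hard to express in Spark.
    """
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    Ainv = np.linalg.inv(A)           # requires the A block to be invertible
    S = D - C @ Ainv @ B              # Schur complement of A in M
    Sinv = np.linalg.inv(S)           # requires S to be invertible
    top_left = Ainv + Ainv @ B @ Sinv @ C @ Ainv
    top_right = -Ainv @ B @ Sinv
    bot_left = -Sinv @ C @ Ainv
    return np.block([[top_left, top_right], [bot_left, Sinv]])
```

The two sub-inverses (`A` and `S`) can themselves be computed the same way, which is how blockwise schemes recurse down to tiles small enough to invert locally.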
We will test on Amazon EC2 because Thunder is easy to deploy there and it lets us try many different cluster configurations.
A correct implementation of distributed matrix inverse that is faster than using a single machine (which would be forced to read and write from disk).
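The single-machine comparison point can be sketched as follows. This is a hypothetical in-memory baseline (the function name and sizes are our own, not from the project code); for a matrix too large for RAM, the same call would have to stream blocks from disk, which is the regime where a distributed inverse could win.

```python
import time
import numpy as np

def single_machine_baseline(n, seed=0):
    """Time a dense n-by-n inverse on one machine (hypothetical baseline).

    For matrices that fit in RAM this is hard to beat; a distributed
    inverse only pays off once the matrix must spill to disk.
    """
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n)) + n * np.eye(n)  # well conditioned
    start = time.perf_counter()
    Minv = np.linalg.inv(M)
    elapsed = time.perf_counter() - start
    return M, Minv, elapsed
```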
Ideally we would like to be competitive with the implementation provided by the paper we are working from.
As discussed in Resources, we will use Amazon EC2.
April 25
May 2
May 3
May 6
May 8
May 10
We have a correct implementation of inverse, but it falls apart when run on a larger cluster due to unidentified overhead internal to Spark. We are attempting to figure out why this occurs, but are having difficulty since we cannot find anyone doing a similar type of computation.