DP flops on the cheap

2015-06-09 at 11:44 am

I'm not the only one with piles of leftover hardware from cryptocoin mining days. Usually this is some AM3+ board with a couple of x16 PCIe slots, the cheapest cpu and a bunch of AMD (ex ATI) gpus. It is hard to give new and interesting life to such hardware, but with sufficiently motivating hobbies, there's a way.

Recently I came across an interesting project called DroneCFD. It's a bunch of python that does everything that needs to be done in order to put a 3D geometry of something that pretends to fly into a virtual wind tunnel and create some nice pictures with OpenFOAM and Paraview. Which is something that I wanted to do for some time now, since my hobby is fast developing into a proper sport (we already have world cup and world championship is within 6 years) and we'll want to have the best possible models.

So we need OpenFOAM. What's that? It's something called CFD, computational fluid dynamics. Actually it's a library of many different solvers for various fluid simulations, from laminar, turbulent, supersonic, recently it started to spread into multiphysics as well (heat, EM). It's completely opensource and popular in academic circles. There are many commercial tools out there that cover the same problem areas, but their cost is completely out of scope for hobbyists. If you want to explore CFD further, there are a couple of courses on youtube that can give you an insight into the math involved.

The gist of the problem is this: in CFD you describe your problem with complex sets of differential equations that you need to solve for each time step. When this is translated into numerical algorithms, you end up with large sparse matrices that you shuffle in and out of main memory. Since they're usually too big to fit into cpu caches, the memory bandwidth becomes your limiting factor. So if you want to run your simulations fast, you're looking for a system with highest possible memory bandwidth. And here is where our gpus come into play.

I've set up a system from my leftovers, got a second hand FX8320 cpu for it and cashed out for 32GB of cl9 1866MHz memory. Then I set up the whole optimized environment with the help of excellent EasyBuild framework. It consists of latest GNU GCC 5.2.0 compiler, AMD ACML 5.3.1 math libraries, OpenMPI 1.6.5 and OpenFOAM 2.3.1. With this setup I ran the simulation with example geometry, included with DroneCFD and it needed ExecutionTime = 36820.82s, ClockTime = 38151s to perform 3000 time steps of simulation.

Then I noticed that AMD receltny released clMath libraries on github. They implement some commonly used math routines in OpenCL 1.2 and 2.0, which means that you can run them on cpu, gpu or any other device that implements OpenCL. One nice thing about these libraries is that at least clBLAS includes a handy client and a bit of python scripts that enable one to do some benchmarking of the hardware. And that's exactly what I did.

First, I ran some tests to demonstrate cpu vs gpu difference. I used 7970 gpu here with OpenCL 1.2 version of the clMath libs. This is what I got:

cpu vs tahiti

X axis presents matrix size, Y is Gflops measured by the library. Here I performed single and double precision general matrix-matrix multiplication on cpu and on gpu. Y scale should almost be logarithmic to see cpu performance in more detail ;) There are a lot of interesting things worth more detailed discussions on this graph, but it serves a purpose I wanted to demonstrate - gpus are many many times faster than cpus for this kind of work. No wonder scientific communities are jumping on them like crazy.

Since 7970 is not the only kind of gpu I have lying around, I replaced it with R9-290, rebuilt clMath libs with OpenCL 2.0 and rerun the tests:

Couple of things to note here. R9 290 is based on a new, different architecture (called Hawaii) than 7970 (Tahiti). While the older architecture has about half the performance at double precision compared to single precision (which makes sense, as dp numbers take twice as much space in memory as sp numbers), newer architecture fails to reach the dp performance of the older one for most of the explored range. If single precision is good enough for your problem, then newer equals better. But with most of the engineering problems demanding double precision math, it turns out that previous generation of gpus offers more. 

There's one limiting factor with these gaming gpus: they have relatively small amout of memory. While r9 290 has 4GB, 7970 has only 3GB and these are both small if you want to run some decent numeric simulation. There are two ways to grow beyond that: first is to cash out for "professional" gpu products with up to 32GB or memory and then, if even that is not enough, distribute your simulation across many gpus and many systems with MPI. But that is beyond our hobby again.

There are two things I want to do for next step: first I want to run OpenFOAM linked with these clMath libraries and measure any improvement. I assume that copying data to and from gpu for each time step will kill any performance bennefits that gpu can offer. But I want to have something to compare to, as I discovered a company that ported exactly the solvers I'm interested in to run fully on gpu, only doing the copy at the beginning and at the end of simulation. Also they offered affordable prices for their work for us hobbyists so stay tuned for part 2 :)

Redundancy is not everything

2010-05-10 at 8:08 pm

Years ago when I was still maintaining highly available server clusters and thinking how to improve them, I learned quickly that redundancy of the servers by itself only brings you complications. The key to a meaningful redundant server setup are the sensory methods that monitor health of each server and the logic that acts upon those health states. One of the lessons I learned was that when you monitor some parameter via different methods and you get different outputs, it's usually the method that's at fault, be it either a timing issue or some simple text parsing (everyone loves to play with float numbers in bash</sarcasm>) error.

Now I just read an excellent report from  the Dutch Safety Board about a crash of Turkish 737-800 near Amsterdam Schiphol Airport last year. I was particularly interested in this accident because I know aircrafts usually carry two radio altimeters and I wondered what chain of events triggered a wrong reading from a single one that lead the plane to crash. Let me present my own view of this report and some thoughts that I got about the state of aviation software in general.


How you do things?

2010-24-02 at 1:10 pm

This blog post often comes very handy in various discussions. In today's small busnisess with startup mentality, the question of how to do things is often ignored and all focus is put on getting things done. Since my job is to make things repeatable, with predictable result, I must focus mostly on how, which creates all kinds of interesting situations.

And how do you see yourself?

Learning from mistakes

2010-16-02 at 1:49 pm

I was baking this post in the back of my head for about half a year. Two events finally forced to convert it into writing - google admitting a breakin and my discovery of a particular FAA site.

I remember myself as a kid, playing with legos and wanting to build thing with them that legos physically werent capable of. Even back then I wanted to do more with my toys than what they were meant to handle. I carried this over to computers, which soon lead me to linux & BSD world of free unices, where one has all the chance to use the computer in a way he sees fit instead of bowing your head and using it in a way someone else (like Microsoft and more recently Google) intended. This path surely involves lots of experimentations, some failures and some successes, lots of thinking, analyzing, more or less correct presumptions, causes and effects, etc. In general, I developed my own sistematic approach to a problem. Each new solution always consists of past expiriences and here is where my gripe lies.


Software "design"

2008-17-08 at 01:17 am

It all started with me having too much free time at my workplace.

So I started researching about homebrew autopilots for model planes, that led me to software that runs avionics onboard the real planes, which ultimately led me to programming practices used to write such software. These two blog posts say most of the things I wanted to write here.

At the end, one thing is clear: designing and writing software must stop being something of artistic value and creative in a sense of expression (which often leads to "broken by design" state) and has to become something that is formally provable to be correct (see also Correctness by construction (1,2)). A good first candidate for this are mission critical systems on which lives depend on - such as airplane avionics. It's good to see that tools for this exist and are freely available. In the light of recent DNS vulnerability, which is there by design, I wonder what would happen if DNS was designed and implemented with such methods.

It was also pleasantly suprised for me to find out that I use some of the methods described in my own work. That's probably why I have free time in the first place ;)

How I see modern Air Traffic Control

2008-17-02 at 01:24 am

A few years ago I started expanding and overlapping my hobbies. Obvious overlap between being a system administrator (and integrator and architect) and flying is Air traffic control or ATC. I joined the virtual flying community and observed the procedures there and visited Slovenia Control and Ljubljana airport with them. I'm also reading AeroSafety world online magazine and am following aviation-safety.net and flightglobal rss feeds. If it is worth anything, I've also seen all AirCrash Investigation tv documentaries and IVTV dvds. I also asked our ATC for their SOP documents but was understandably denied the request as they're internal documents.

The most pressing problem of ATC today is how to increase the capacity of airways to safely carry more planes (=passengers) from point A to point B. Based on the little information I have available, I put together a sysadmin's view on how I think it should be done.

  • 1