DP flops on the cheap

2015-06-09 at 11:44 am

I'm not the only one with piles of leftover hardware from cryptocoin mining days. Usually this is some AM3+ board with a couple of x16 PCIe slots, the cheapest cpu and a bunch of AMD (ex ATI) gpus. It is hard to give new and interesting life to such hardware, but with sufficiently motivating hobbies, there's a way.

Recently I came across an interesting project called DroneCFD. It's a bunch of python that does everything that needs to be done in order to put a 3D geometry of something that pretends to fly into a virtual wind tunnel and create some nice pictures with OpenFOAM and Paraview. Which is something that I wanted to do for some time now, since my hobby is fast developing into a proper sport (we already have a world cup, and a world championship is within 6 years) and we'll want to have the best possible models.

So we need OpenFOAM. What's that? It's something called CFD, computational fluid dynamics. Actually it's a library of many different solvers for various fluid simulations, from laminar and turbulent to supersonic flows, and recently it started to spread into multiphysics as well (heat, EM). It's completely opensource and popular in academic circles. There are many commercial tools out there that cover the same problem areas, but their cost is completely out of scope for hobbyists. If you want to explore CFD further, there are a couple of courses on youtube that can give you an insight into the math involved.

The gist of the problem is this: in CFD you describe your problem with complex sets of differential equations that you need to solve for each time step. When this is translated into numerical algorithms, you end up with large sparse matrices that you shuffle in and out of main memory. Since they're usually too big to fit into cpu caches, the memory bandwidth becomes your limiting factor. So if you want to run your simulations fast, you're looking for a system with highest possible memory bandwidth. And here is where our gpus come into play.
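To put a rough number on that, here is a back-of-the-envelope sketch in python. It assumes the matrix is square and stored in the common CSR format with 8-byte values and 4-byte indices, and that every byte is read from main memory exactly once per multiplication. The function names and the example sizes are mine, purely for illustration; they do not come from OpenFOAM.

```python
def spmv_csr_traffic(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Rough flop and byte counts for one y = A @ x with A in CSR format."""
    flops = 2 * nnz                                   # one multiply and one add per nonzero
    matrix_bytes = nnz * (value_bytes + index_bytes)  # values plus column indices
    matrix_bytes += (n_rows + 1) * index_bytes        # row pointer array
    vector_bytes = 2 * n_rows * value_bytes           # read x once, write y once
    return flops, matrix_bytes + vector_bytes


def bandwidth_bound_gflops(n_rows, nnz, bandwidth_gb_s):
    """Upper bound on Gflops if memory bandwidth is the only limit."""
    flops, nbytes = spmv_csr_traffic(n_rows, nnz)
    return bandwidth_gb_s * flops / nbytes
```

With a made-up mesh of one million cells and seven nonzeros per row, `bandwidth_bound_gflops(1_000_000, 7_000_000, 30.0)` comes out at about 4 Gflops for an assumed 30 GB/s of memory bandwidth, no matter how fast the cores are. The only way to raise that bound in this simple model is more bandwidth.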

I've set up a system from my leftovers, got a second-hand FX8320 cpu for it and cashed out for 32GB of cl9 1866MHz memory. Then I set up the whole optimized environment with the help of the excellent EasyBuild framework. It consists of the latest GNU GCC 5.2.0 compiler, AMD ACML 5.3.1 math libraries, OpenMPI 1.6.5 and OpenFOAM 2.3.1. With this setup I ran the simulation with the example geometry included with DroneCFD, and it needed ExecutionTime = 36820.82s, ClockTime = 38151s to perform 3000 time steps of simulation.

Then I noticed that AMD recently released clMath libraries on github. They implement some commonly used math routines in OpenCL 1.2 and 2.0, which means that you can run them on cpu, gpu or any other device that implements OpenCL. One nice thing about these libraries is that at least clBLAS includes a handy client and a few python scripts that enable one to do some benchmarking of the hardware. And that's exactly what I did.

First, I ran some tests to demonstrate cpu vs gpu difference. I used 7970 gpu here with OpenCL 1.2 version of the clMath libs. This is what I got:

cpu vs tahiti

X axis presents matrix size, Y is Gflops measured by the library. Here I performed single and double precision general matrix-matrix multiplication on cpu and on gpu. Y scale should almost be logarithmic to see cpu performance in more detail ;) There are a lot of interesting things worth more detailed discussions on this graph, but it serves a purpose I wanted to demonstrate - gpus are many many times faster than cpus for this kind of work. No wonder scientific communities are jumping on them like crazy.
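For reference, this is how a Gflops figure for matrix-matrix multiplication is usually derived. The sketch below uses the standard operation count of 2·m·n·k for multiplying an m×k matrix by a k×n matrix. That the clBLAS client counts flops exactly this way is my assumption, and the function names are made up for illustration.

```python
def gemm_flops(m, n, k):
    """Floating point operations for C = A @ B, A is m x k, B is k x n."""
    return 2 * m * n * k  # one multiply and one add per inner-product term


def gflops(m, n, k, seconds):
    """Achieved Gflops for one multiplication that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e9
```

So a 1000x1000 multiplication that finishes in half a second, `gflops(1000, 1000, 1000, 0.5)`, counts as 4 Gflops.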

Since 7970 is not the only kind of gpu I have lying around, I replaced it with an R9 290, rebuilt the clMath libs with OpenCL 2.0 and reran the tests:

A couple of things to note here. R9 290 is based on a newer, different architecture (called Hawaii) than 7970 (Tahiti). While the older architecture has about half the performance at double precision compared to single precision (which makes sense, as dp numbers take twice as much space in memory as sp numbers), the newer architecture fails to reach the dp performance of the older one for most of the explored range. If single precision is good enough for your problem, then newer equals better. But with most engineering problems demanding double precision math, it turns out that the previous generation of gpus offers more.

There's one limiting factor with these gaming gpus: they have a relatively small amount of memory. While r9 290 has 4GB, 7970 has only 3GB and these are both small if you want to run some decent numeric simulation. There are two ways to grow beyond that: first is to cash out for "professional" gpu products with up to 32GB of memory and then, if even that is not enough, distribute your simulation across many gpus and many systems with MPI. But that is beyond our hobby again.
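To get a feeling for how small that is, here is a quick sketch that computes the largest square matrix-matrix multiplication that fits on a card. It assumes three dense matrices (A, B and C) resident at once, treats the advertised GB as GiB, and ignores whatever memory the driver keeps for itself. The function name is made up for illustration.

```python
import math


def max_square_gemm_size(memory_bytes, element_bytes=8, n_matrices=3):
    """Largest n such that n_matrices dense n x n matrices fit in memory."""
    return math.isqrt(memory_bytes // (element_bytes * n_matrices))


GIB = 1024 ** 3
size_7970_dp = max_square_gemm_size(3 * GIB)                   # double precision
size_290_dp = max_square_gemm_size(4 * GIB)                    # double precision
size_7970_sp = max_square_gemm_size(3 * GIB, element_bytes=4)  # single precision
```

Under these assumptions the 3GB card tops out at about 11585x11585 in double precision and the 4GB card at about 13377x13377, so the extra gigabyte buys surprisingly little.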

There are two things I want to do for the next step: first I want to run OpenFOAM linked with these clMath libraries and measure any improvement. I assume that copying data to and from gpu for each time step will kill any performance benefits that gpu can offer. But I want to have something to compare to, as I discovered a company that ported exactly the solvers I'm interested in to run fully on gpu, only doing the copy at the beginning and at the end of simulation. Also they offered affordable prices for their work for us hobbyists so stay tuned for part 2 :)
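My assumption about the copying can be written down as a toy model. This is only a sketch: it assumes that copies and computation do not overlap and that the link delivers a constant bandwidth. All names and numbers here are hypothetical, none of them are measurements.

```python
def step_time(compute_s, bytes_copied, link_gb_s):
    """Seconds per time step: gpu compute plus host<->device copies."""
    return compute_s + bytes_copied / (link_gb_s * 1e9)


def speedup(cpu_step_s, gpu_compute_s, bytes_copied, link_gb_s):
    """Speedup over the cpu for one time step, copies included."""
    return cpu_step_s / step_time(gpu_compute_s, bytes_copied, link_gb_s)
```

If a step takes 3 s on the cpu and 0.5 s on the gpu, the gpu is 6 times faster as long as the data stays on the card. Shuffle 2 GB over an 8 GB/s link on every step and `speedup(3.0, 0.5, 2e9, 8.0)` drops to 4. The faster the gpu gets, the more the copy dominates.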

Spirals

2014-22-12 at 4:20 pm

With my curiosity and passion for innovation I often discover patterns around me that appear as pendulum motion, from one extreme to the other, back and forth. Plotted on a time line, they may appear as a sine curve, but looking from an evolutionary point of view, a spiral is a much better representation. Sometimes those patterns even gravitate towards a common endpoint, such as a center of a spiral.

One such pattern is now in motion and starting to gravitate towards a center.

(more)

HPC world

2012-18-10 at 2:32 pm

Swimming in the HPC waters for the past two years, I got some sense of how wet and cold the water is.

(more)

Redundancy is not everything

2010-05-10 at 8:08 pm

Years ago when I was still maintaining highly available server clusters and thinking how to improve them, I learned quickly that redundancy of the servers by itself only brings you complications. The key to a meaningful redundant server setup is the set of sensory methods that monitor the health of each server and the logic that acts upon those health states. One of the lessons I learned was that when you monitor some parameter via different methods and you get different outputs, it's usually the method that's at fault, be it either a timing issue or some simple text parsing (everyone loves to play with float numbers in bash</sarcasm>) error.

Now I just read an excellent report from the Dutch Safety Board about a crash of Turkish 737-800 near Amsterdam Schiphol Airport last year. I was particularly interested in this accident because I know aircraft usually carry two radio altimeters and I wondered what chain of events triggered a wrong reading from a single one that led the plane to crash. Let me present my own view of this report and some thoughts that I got about the state of aviation software in general.

(more)

How you do things?

2010-24-02 at 1:10 pm

This blog post often comes in very handy in various discussions. In today's small businesses with startup mentality, the question of how to do things is often ignored and all focus is put on getting things done. Since my job is to make things repeatable, with predictable results, I must focus mostly on how, which creates all kinds of interesting situations.

And how do you see yourself?

Learning from mistakes

2010-16-02 at 1:49 pm

I was baking this post in the back of my head for about half a year. Two events finally forced me to convert it into writing - google admitting a break-in and my discovery of a particular FAA site.

I remember myself as a kid, playing with legos and wanting to build things with them that legos physically weren't capable of. Even back then I wanted to do more with my toys than what they were meant to handle. I carried this over to computers, which soon led me to the linux & BSD world of free unices, where one has every chance to use the computer in a way he sees fit instead of bowing his head and using it in a way someone else (like Microsoft and more recently Google) intended. This path surely involves lots of experimentation, some failures and some successes, lots of thinking, analyzing, more or less correct presumptions, causes and effects, etc. In general, I developed my own systematic approach to a problem. Each new solution always consists of past experiences and here is where my gripe lies.

(more)

Server vs. server

2010-09-01 at 9:46 pm

Some people like strong cars, I like powerful computers. Since I got on the internet in 1994 or so, this reads powerful servers. My last workstation oriented powerful computer was Pentium 200 in 1996 and it soon landed in the role of a server, with 4x 25GB drives in 1999. Btw, it still works perfectly ;)

Since a home server is mostly limited by budget, one looks at the price/performance factor. And since I have all the storage management (raid/lvm) experience, I spent some time putting together a data-reliable machine that serves the usual music/movies entertainment stuff, runs a samba share for a few windoze boxen to share their data and a few NFS and AoE exports for my desktop. Since my desktop is diskless (actually has no moving parts), it is very dependent on the latency and io throughput the server offers it.

For a few years now this role was handled by a HP tc2120 "server" I got cheaply second-hand. Pentium 4, 64bit pci bus, looks like a decent IO oriented pc. I hung a pair of ide disks on onboard ide, a few more on addon ide cards and a bunch of sata disks on LSI 1068. All disks were in mirrored pairs (same size, different manufacturer) and joined together in one volume group. Out of that I carved various logical volumes for various tasks. One such volume was used for torrent downloads.

And how was I satisfied with it? It was barely useful. Every time software mirrors were doing a resync, latencies grew beyond anything I would consider usable. Every time a torrent was active, the whole system lagged. IO to one volume killed IO on others. I tried many things, like decreasing raid resync speed (helped only a little and slowed down resync time to a week or so), changing schedulers (no observable change), readahead settings (marginally better on read-mostly nfs mounts). At the end I concluded that it must be the interactions of all the storage layers that create artificial relations among different disks that are standing in my way and greatly decreasing the overall system performance.

Because I'm moving soon to a different house, I decided on setting up a less powerful server here and taking this "nice" HP box with me, configured differently, of course. I dug up an old slot1 Pentium 3 pc, put a desktop gigabit nic and 3ware 6800 in it and stuffed it with disks. There's only one mirror now and 3ware takes care of it, all other disks are jbod, standalone, no lvm. Each disk has its own scheduler and readahead settings that depend on how it is used. Experience: same throughput as HP "server", much better latencies, much better parallelism. Surprisingly this 12-year-old pc is capable of sustained 50MB/s over ethernet to different disks, a number that I've only seen in bursts on HP "server". Oh, and it also draws about half as much power as Pentium 4. And it serves you this page you're reading now.
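Per-disk scheduler and readahead settings on linux live under `/sys/block/<device>/queue/`. A minimal sketch of how such a setup could be kept in one place follows. The device names, roles and values in the table are made-up examples and not the settings of the actual server, and the helper names are hypothetical.

```python
# Hypothetical per-disk tuning table: devices, roles and values are
# illustrative assumptions, not the real server's configuration.
DISK_PROFILES = {
    "sda": {"scheduler": "deadline", "read_ahead_kb": 128},  # latency-sensitive exports
    "sdb": {"scheduler": "cfq", "read_ahead_kb": 4096},      # large sequential media files
}


def sysfs_writes(profiles):
    """Return (path, value) pairs for the block-layer sysfs knobs."""
    writes = []
    for dev, settings in sorted(profiles.items()):
        base = f"/sys/block/{dev}/queue"
        writes.append((f"{base}/scheduler", settings["scheduler"]))
        writes.append((f"{base}/read_ahead_kb", str(settings["read_ahead_kb"])))
    return writes


def apply(writes, dry_run=True):
    """Print the equivalent shell commands, or write the values (needs root)."""
    for path, value in writes:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as f:
                f.write(value)
```

Calling `apply(sysfs_writes(DISK_PROFILES))` only prints what would be written, which is a safe way to check the table before running it for real at boot.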

So much about "new and improved" technology. Complex setups are rarely better than simple setups. Cheap, fast, reliable: pick any two, you can't have all three. What more can I say? ;)

Software "design"

2008-17-08 at 01:17 am

It all started with me having too much free time at my workplace.

So I started researching homebrew autopilots for model planes, which led me to software that runs avionics onboard the real planes, which ultimately led me to programming practices used to write such software. These two blog posts say most of the things I wanted to write here.

At the end, one thing is clear: designing and writing software must stop being something of artistic value and creative in a sense of expression (which often leads to "broken by design" state) and has to become something that is formally provable to be correct (see also Correctness by construction (1,2)). A good first candidate for this are mission critical systems on which lives depend - such as airplane avionics. It's good to see that tools for this exist and are freely available. In the light of the recent DNS vulnerability, which is there by design, I wonder what would happen if DNS was designed and implemented with such methods.

I was also pleasantly surprised to find out that I use some of the methods described in my own work. That's probably why I have free time in the first place ;)

How I see modern Air Traffic Control

2008-17-02 at 01:24 am

A few years ago I started expanding and overlapping my hobbies. Obvious overlap between being a system administrator (and integrator and architect) and flying is Air traffic control or ATC. I joined the virtual flying community and observed the procedures there and visited Slovenia Control and Ljubljana airport with them. I'm also reading AeroSafety world online magazine and am following aviation-safety.net and flightglobal rss feeds. If it is worth anything, I've also seen all AirCrash Investigation tv documentaries and IVTV dvds. I also asked our ATC for their SOP documents but was understandably denied the request as they're internal documents.

The most pressing problem of ATC today is how to increase the capacity of airways to safely carry more planes (=passengers) from point A to point B. Based on the little information I have available, I put together a sysadmin's view on how I think it should be done.

(more)

My thoughts on antispam

2007-02-09 at 01:49 am

Everyone knows spam today. There are estimates that it makes up more than 90% of all email traffic. Judging by my personal spam collection, spam is on constant increase.

I have always filtered spam out of my inbox by hand. I played with almost all antispam technologies out there today and haven't found any to be perfect. They're either too hardware resource intensive (long list of daily updated regex rules) or too users' time intensive (manual training of statistical filter). Two technologies that made me think some more were greylisting and one-time email addresses.

The only way to stop spam is to increase the cost to spam, which basically means to decrease its efficiency. I came up with a concept that does this extremely well, while offering the end user a perfectly transparent and easy-to-get-rid-of-spam mail experience.

(more)

Job satisfaction

2007-11-06 at 2:16 pm

Not long ago I had an interesting discussion with my coworker about our jobs. As a sysadmin I often get requests from people that know me to help them configure their server, set them up a web page, mail account, etc., for a small fee. Or from some small company that does not have the expertise to maintain their own server. But I gladly pass all such requests to the coworker, because I can't stand the feeling that any of those people would have the right to bother me at $random time of day because of some irrelevant problem, like spam getting through or somesuch. He offers them his price and then they decide if they want him to fix their systems or not.

On the other hand, if someone asks me to teach him how to do all the system administration and is willing to learn, I'm more than glad to help. I don't even ask for a price, I just say if he feels like paying me, he can pay whatever he feels is the right price for the knowledge I gave him. But just the satisfaction of teaching someone is more than enough for me. I guess I got this after my father :)

Sad state of linux filesystems

2006-08-12 at 01:33 am

Faced with a need for a reliable, well performing file system, a linux admin today is in a bit of trouble. Ext2 is still in the kernel just for academic purposes, ext3 is "good enough" for the majority but uselessly slow for what I want from it, reiser3 is in "don't touch if it's not b0rken" mode but suffers from BKL use which limits its scalability, xfs and jfs are nowhere near a reliable state and reiser4 is "almost there" but due to Hans' arrest, one wonders if it will ever be finished.

The more I think, the more I see there are only two useful options for me: use Veritas VXFS, which recently became free for "smaller" installs or dump linux altogether and go with ZFS in Solaris 10. Or maybe Nexenta.

One wonders if so much choice really does make sense ... because these days it looks like FS knowhow is too much spread out and not focused on making just one, but really good FS ... As much as I despise RedHat for sticking with just ext3 in their RHEL line, at the end it would seem that was the right longterm choice.

Now if only chunkfs would become a reality sooner ...

Rants ...

2006-11-09 at 11:42 am

Recently my iBook was stolen (straight from my office, no less) and I took a look around if there's any notebook that I could replace it with. I'm looking for a light and small enough notebook to carry around that can run on its batteries for the longest possible time. I think I'd also like to have the ugliest notebook around (so no one would be tempted to steal it) and if it is personalised and unique in some obvious way, even better. Panasonic toughbook comes pretty close to that, but it is waaay too expensive.

But what I wanted to tell you is this. I checked web sites of all major notebook manufacturers and was shocked at how bad they were. I know what I want to buy, so I want the web site to let me input my requirements and then show me the products that match my requirements best. None were even close to that. Shame.

Then, because I know how to translate my requirements to specific hardware configurations, I expected those web sites to select their models by the hardware. Again, no luck. All they do is offer you a list of their models (name + number) and then you have to dig through the small print to get the hw specs. But I don't care about names and model numbers! I want a notebook with such and such configuration and I want a choice based on that.

Thinking over this again, I see a large business opportunity here :)

And there's more ...

(more)

Job offer ...

2006-07-06 at 12:38 am

What do you do when you get an invitation to a selection process for a job from the most desired company to work for? I was very surprised when I got an email from them and also a bit confused as to how they found me and why they invited me. It was clear to me from the start that my chances were slim, but I decided to try it anyway. Just to see how it feels to be questioned by the people who really know what they do. Instead of a long description of what all I went through, this episode of anime Windy tales, appropriately titled "Audition Chronicles", shows it best. And yes, my story ended in much the same way as in this anime.

Can't sleep ...

2006-24-02 at 03:44 am

In nights like this, when my biorhythm is synced to some place on the other side of our planet and my mind is wandering around, I usually take a nice long night walk in an attempt to follow it.

(more)

Eye candy vs. usefulness ...

2005-17-11 at 11:57 pm

Yesterday we had an interesting talk at our weekly meeting of a local web community, regarding flash vs. standard html pages. I won't comment on the talk here, but rather mention some thoughts that started floating in my head as a result of the talk.

(more)

Sunny week

2005-17-11 at 10:33 pm

This week has been full of Sun all over the news. They officially released their Niagara cpu as an Ultrasparc T1, released first Solaris builds with their new ZFS filesystem and on top of everything, offered their ide/compiler/tools Studio 11 for free. Looks like Sun is rising back from the ashes ...

(more)

Crows

2005-19-10 at 12:08 pm

Something very interesting has just happened. My workplace is located 5min walking distance from the train station and I usually cross all the rails on my way. There's also a lot of electrical cables laid above the rails and my path leads me near the point where they all come to the ground.

Today, I found a dead crow there. A beautiful grey and black bird, probably young. I guess it got electrocuted on the wires there. I decided to have a closer look at it and while I was turning the bird around, the whole flock gathered out of nowhere and started circling above me and yelling at me. They must have been very aware of the loss of one of their members, as they are very social animals. And smart too. Two of them even followed me all the way to the building while constantly yelling at me.

(more)

My "server room" (or junkyard, scrap heap, ... :)

2005-06-09 at 12:16 am

It looks like this:

Junk


Click on comments to see what all this is ...

(more)

Dreams

2005-03-06 at 12:49 pm

Dreams are good. And very useful. Today when I didn't feel like getting up, I stayed in the floating state of not sleeping anymore and not being completely awake either and I suddenly realized I have a critical bug in one of my critical scripts. I saw it floating in front of me, a screenshot of a vi terminal with cursor blinking at the line where the bug was ... amazing. I need more dreams like that :)

Cyrus HA

2005-10-03 at 03:21 am

There's more and more discussion about Cyrus high availability setups on the cyrus mailing list. A recent thread started with a familiar question of how to achieve the best possible availability and drifted to a discussion about implementing an active-active application level redundancy into cyrus itself. Since I love to play with disks, raids, volumes & co, I posted my raid-based sysadmin view on how I would try to implement it.
