Jure's corner » Jure's corner

How airplanes fly

2018-11-19T11:31:00-00:00

By now I'm sure you've all seen that elementary school show: airfoil, airstream, Bernouli, pressure difference, lift. Boom, magic gone. However, there's much more to it than that.

Airfoil in airstream picture is a sideview of an airplane's wing. Have you ever asked yourself how does that lift look if you look at an airplane from the front or the rear? How is the lift distributed over the wing spanwise? How it should be distributed?

One very smart german engineer figured it all out back about 100 years ago. His name was Ludwig Prandtl and his work forms the basis of every aeronautical engineer's knowledge today. Prandtl's result was that elliptical lift distribution is the best for a given wing span and this was taken as a fact.

Until now.

During my childhood years flying my first model airplanes I was explained all about elliptic lift distribution, even before I fully understood what an ellipse is. Even then I remember being uneasy about it, but mostly due to "these people are way smarter than me, they know what they're doing". And I tried to live with that attitude.

Then some things happened that bought all those toughts and doubts back:

This is my model at a competition back in 2016 in Hungary, after a successful 10min flight. It got hit by another airplane about 3 minutes into the flight at about 250m in a crowded thermal. Another plane hit it on the aft fuselage, which pushed it into a flat spin at such a force that one of the wing tips simply flew out. (It was later recovered without damage.) Model recovered after four or five spins and after initial shock I discovered it is suprisingly controllable and flies more or less as before. It turned out that fuselage was badly cracked and whole tail was flapping around about 20 degrees in all directions and it was that that gave me most of the control issues. Roll control was fine.

It was one of those things that you would never try on purpose ... flying without a third of your wing. Just by looking at it, it can't fly like this, right? It shoul spin out of control, having much more lift on one side than on the other ... obvious, right?

Well, my successfull flight says otherwise.

And then the image of elliptic lift distribuion comes back again ... and one starts to wonder whether it is really there ... and if is that a good thing or not.

DP flops on the cheap

2016-12-18T23:16:00-00:00

I'm not the only one with piles of leftover hardware from cryptocoin mining days. Usually this is some AM3+ board with a couple of x16 PCIe slots, the cheapest cpu and a bunch of AMD (ex ATI) gpus. It is hard to give new and interesting life to such hardware, but with sufficiently motivating hobbies, there's a way.

Recently I came across an interesting project called DroneCFD. It's a bunch of python that does everything that needs to be done in order to put a 3D geometry of something that pretends to fly into a virtual wind tunnel and create some nice pictures with OpenFOAM and Paraview. Which is something that I wanted to do for some time now, since my hobby is fast developing into a proper sport (we already have world cup and world championship is within 6 years) and we'll want to have the best possible models.

So we need OpenFOAM. What's that? It's something called CFD, computational fluid dynamics. Actually it's a library of many different solvers for various fluid simulations, from laminar, turbulent, supersonic, recently it started to spread into multiphysics as well (heat, EM). It's completely opensource and popular in academic circles. There are many commercial tools out there that cover the same problem areas, but their cost is completely out of scope for hobbyists. If you want to explore CFD further, there are a couple of courses on youtube that can give you an insight into the math involved.

The gist of the problem is this: in CFD you describe your problem with complex sets of differential equations that you need to solve for each time step. When this is translated into numerical algorithms, you end up with large sparse matrices that you shuffle in and out of main memory. Since they're usually too big to fit into cpu caches, the memory bandwidth becomes your limiting factor. So if you want to run your simulations fast, you're looking for a system with highest possible memory bandwidth. And here is where our gpus come into play.

I've set up a system from my leftovers, got a second hand FX8320 cpu for it and cashed out for 32GB of cl9 1866MHz memory. Then I set up the whole optimized environment with the help of excellent EasyBuild framework. It consists of latest GNU GCC 5.2.0 compiler, AMD ACML 5.3.1 math libraries, OpenMPI 1.6.5 and OpenFOAM 2.3.1. With this setup I ran the simulation with example geometry, included with DroneCFD and it needed ExecutionTime = 36820.82s, ClockTime = 38151s to perform 3000 time steps of simulation.

Then I noticed that AMD receltny released clMath libraries on github. They implement some commonly used math routines in OpenCL 1.2 and 2.0, which means that you can run them on cpu, gpu or any other device that implements OpenCL. One nice thing about these libraries is that at least clBLAS includes a handy client and a bit of python scripts that enable one to do some benchmarking of the hardware. And that's exactly what I did.

First, I ran some tests to demonstrate cpu vs gpu difference. I used 7970 gpu here with OpenCL 1.2 version of the clMath libs. This is what I got:

X axis presents matrix size, Y is Gflops measured by the library. Here I performed single and double precision general matrix-matrix multiplication on cpu and on gpu. Y scale should almost be logarithmic to see cpu performance in more detail ;) There are a lot of interesting things worth more detailed discussions on this graph, but it serves a purpose I wanted to demonstrate - gpus are many many times faster than cpus for this kind of work. No wonder scientific communities are jumping on them like crazy.

Since 7970 is not the only kind of gpu I have lying around, I replaced it with R9-290, rebuilt clMath libs with OpenCL 2.0 and rerun the tests:

Couple of things to note here. R9 290 is based on a new, different architecture (called Hawaii) than 7970 (Tahiti). While the older architecture has about half the performance at double precision compared to single precision (which makes sense, as dp numbers take twice as much space in memory as sp numbers), newer architecture fails to reach the dp performance of the older one for most of the explored range. If single precision is good enough for your problem, then newer equals better. But with most of the engineering problems demanding double precision math, it turns out that previous generation of gpus offers more.

There's one limiting factor with these gaming gpus: they have relatively small amout of memory. While r9 290 has 4GB, 7970 has only 3GB and these are both small if you want to run some decent numeric simulation. There are two ways to grow beyond that: first is to cash out for "professional" gpu products with up to 32GB or memory and then, if even that is not enough, distribute your simulation across many gpus and many systems with MPI. But that is beyond our hobby again.

There are two things I want to do for next step: first I want to run OpenFOAM linked with these clMath libraries and measure any improvement. I assume that copying data to and from gpu for each time step will kill any performance bennefits that gpu can offer. But I want to have something to compare to, as I discovered a company that ported exactly the solvers I'm interested in to run fully on gpu, only doing the copy at the beginning and at the end of simulation. Also they offered affordable prices for their work for us hobbyists so stay tuned for part 2 :)

Spirals

2014-12-22T23:31:00-00:00

With my curiosity and passion for innovation I often discover patterns around me that appear as pendulum motion, from one extreme to the other, back and forth. Plotted on a time line, they may appear as a sine curve, but looking from evolutionary point of view, a spiral is a much better representations. Sometimes those patterns even gravitate towards a common endpoint, such as a center of a spiral.

One such pattern is now in motion and starting to gravitate towards a center.

As you probably know, I'm beginning my fifth year of swimming in HPC waters. Sometimes it feels more like drowning, since some aspects of HPC are positively archaic. Mode of operation is one of them. HPC is usually one big static resource in front of which people with stacks of punch cards are queuing up and waiting for their turn. Despite the fact that the cluster is composed of many individual machines, people are taught to look at it and use it as one single machine.

Yet some other aspects are bleeding edge developments and those are really exciting. These days it appears that even HPC people are becoming aware of their archaic points and are actively looking for other developments in the neighboring areas, such as clouds. Enter the "HPC cloud" arena.

There have already been some well publicized examples where HPC style problems were successfully solved on what is today known as "cloud" in a satisfactory manner. Unfortunately they're very few. Cloud it seems was designed to fit a different role and majority of HPC jobs do not fit well to the cloud model.

Lets take a look at how cloud evolved to the current level. Back in the day when web was young, scaling meant buying the largest machine you could buy. When even the largest machines became too small for some of the top sites of the dotcom bubble era, people realized that just throwing money at the hardware would not solve all their problems. So smart people started to think radically different and implemented software solutions for horizontal scaling. Which brought the need to have some number of equal machines configured to play a specific role, with their number varying based on the requirements of the moment, such as request rate. Developers were told to deploy each component to a specific server.

With commodity virtualization solutions this became relatively easily to achieve. Today you deploy your web app stack on a cloud, specify some min and max number of instances, configure some elastic load balancer and off you go.

While this is suitable to large majority of web presence, some of the top players figured out that this is still not good enough for them. They were forced to tear their web app stack apart and rethink every piece of it. What they came up with is something that resembles a traditional HPC to a suprising level.

What motivated me to write all this is that I recently discovered something called Mesos. They claim their product is a "datacenter operating system", but I'd wait a bit before putting out such bold claims. From my HPC perspective it is just a resource management and queuing system done right.

As a HPC operator one of my largest complaints was that traditional HPC assumes a lot: all compute nodes are supposed to offer the same software environment, all jobs expect to run on bare metal without any hypervisors etc, the user interface to queuing and resource management is relatively rigid and inflexible, tailored to manual interaction via command line. Jobs have a hard limit of time they're allowed to run and queue doesn't care if they finish successfully or not. Because it's already enough work to keep one cluster and all of its components up and running, there were almost none experiments on how to adapt it to more general use. So it happened that utilization of majority of smaller clusters was way bellow commercially acceptable levels, which translated to no commercial interest of creating and offering HPC as a service. There were some attempts, but as far as I know, they are all limping along with support from state funds in one form or another. So what these HPC operators are thinking about very hard is how to enable their hardware to run existing cloud workloads, with added benefits they can provide, such as fast storage and interconnects.

On the other side large scale web app providers are tearing apart their app stacks, splitting them into frontend tasks with realtime requirements and backend tasks with batch oriented data processing. These batch tasks are so large that efficient use of infrastructure makes large financial gains on the operating costs and therefore a better position on the market. Which is motivation enough that these people are doing it.

Mesos is one such example of what they have come up with. The whole thing looks suprisingly like a modern HPC scheduler. Mesos itself provides just resource offers and then it is up to the user to implement a resource manager on top of it. Some of them already exist and cover most common use cases, such as Chronos (think of it as global cron) and Marathon (for longer living jobs). Twitter people implemented their own, called Aurora, that covers their own needs. A "job" in Mesos can be anything from a single unix command to a kvm instance. They ported MPICH and Torque to run on top of Mesos, so you can set it up in a way that is very familiar to a user of a traditional HPC. All these jobs execute in some form of container, with docker being the most popular option these days. This enables one to prepare suitable environments for each job up front and stash them away neatly in a docker repository.

What does software like Mesos makes possible for us to do? A single hardware infrastructure, capable of running both HPC style MPI jobs, cloud VMs, big data analytics, web app stacks and everything else that might appear in the future. Infrastructure that can again be addressed as one system. Something that has been until now more of a dream than a reality. I might be too excited about this right now, but I see this as an enabling technology for the era of utility computing.

And this is where the spirals join at the center.

Now off to find a sponsor ...

HPC world

2014-08-27T11:25:00-00:00

Swimming in the HPC waters for the past two years, I got some sense of how wet and cold the water is.

First, there's the problem. You have a chance to build a real model or even the real thing and test its behaviour in the real world. Or you have a chance to run a digital model in a numerical simulation, running on some decent hardware. Which is better? No straight answer. Numerical simulations tend to be cheaper on a big scale, but they are just simulations and you would do well to test how they align with reality every now and then.

Then there's the code that runs the simulation. Usually written by people who understand the nature of the problem and know how to describe it in mathematical terms, but they're not expert programmers. They're happy when they got the numbers that look reasonable. They are far away from understanding, much less knowing what goes on inside the machine and how to improve their code to better utilize the machine.

Then there is hardware. CPUs designed by people who don't have enough connection with programmers from the real world, bought by people who make decisions based on easy understandable numbers fed to them by marketing guys. Networking, again sold by marketers with benchmarks to show their particular strengths, which has little connection to the application behaviour in the real world. Whole machines, each with their own details of implementation, requiring different tricks to utilize it best.

In June I spent few days at the international supercomputing conference in Hamburg. It's a very interesting show, but heavily split to two different groups. One sells exciting hardware and talks to people eager to test it, the other has to produce results with the hardware they have and are figuring out how exactly to at least partially utilize that hardware. I'm talking about programmers.

You see, developing faster and faster hardware (while not easy by any means) is likely the easiest part of the whole story. But to fully utilize the hardware, the code needs adaptations. Who should take care of that? Both hw and sw guys point to compiler guys. But the compiler guys whine that they don't get enough info from hw guys about what their hw can do and from sw guys about what are their code intentions in a particular code block. Also it is not always possible for the compiler to to the best thing, so programmer has to do some things in assembler tuned for the particular cpu. But that is unrealistic, so all cpu vendors offer optimized libraries with math routines already optimized for their cpus. They also offer their own optimized compilers, which sometime manage to demonstrate their value, but what they mostly do is complicate life because the flags used to turn on all the optimizations are nonstandard and one has to fish them out of the manual.

So what is a standard? One way to do things. But the thing is that in HPC world everyone does things in their own way, for mostly the same reasons. One reason I already identified is fear: in the fast moving sw world of HPC, one group of people wants to have control over their own environment, so what they do is they write their own environment. Is this the way to go forward? No, because this brings additional fragmenation to an already fragmented ecosystem. What is needed is more communication, more discussion, more working together. 80% of problems people are solving are common to all of us, so it is ridicoulus that eveyone is implementing their own solutions for common problems.

And I haven't even touched topics like accelerators and gigabytes of 30+ years old fortran code out there.

Redundancy is not everything

2010-12-24T02:39:00-00:00

Years ago when I was still maintaining highly available server clusters and thinking how to improve them, I learned quickly that redundancy of the servers by itself only brings you complications. The key to a meaningful redundant server setup are the sensory methods that monitor health of each server and the logic that acts upon those health states. One of the lessons I learned was that when you monitor some parameter via different methods and you get different outputs, it's usually the method that's at fault, be it either a timing issue or some simple text parsing (everyone loves to play with float numbers in bash) error.

Now I just read an excellent report from the Dutch Safety Board about a crash of Turkish 737-800 near Amsterdam Schiphol Airport last year. I was particularly interested in this accident because I know aircrafts usually carry two radio altimeters and I wondered what chain of events triggered a wrong reading from a single one that lead the plane to crash. Let me present my own view of this report and some thoughts that I got about the state of aviation software in general.

Lets begin with details:

Report states that readings from the radio altimeter such were present on the accident airplane were recreated in the lab by direct coupling of the transmitter and receiver antenna. If this was indeed the reason for the wrong reading, then this means there is a serious oversight in the design of placement of these antennas or their grounding. FAIL
Radio altimeter is calibrated in a way to display height of 0 when aircraft touches the ground on landing and to display height between -2 and -6 feet when the aircraft is parked at the gate. Therefore, reading of -8 should be treated as erroneous. But we read that readings from -20 to 2500 feet are acceptable as correct. This seems to me as a too generous boundary buffer applied at a wrong place, since it caused a known defect scenario to fall within the "correct" measurement range. FAIL
Report states that radio altimeter computer has three operating modes: normal, where measurements are accepted, "fail warn", when altimeter detects internal error, stops providing measurements and warns about its condition and a "non computed data", when measurements are out of design specs and errors are silently ignored. So we have a condition where it is possible for altimeter to pass on data that is bogus but falls in the "correct" range and there is nothing above it that would verify its readings. FAIL
Since radio altimeter measures height above ground (QFE in aviation terms), any logic that acts upon its input should figure out that any negative values in combination with air speed above stall speed are nonsense and disregard them. But apparently auto throttle blindly trusts everything a single radio altimeter is serving it, thus creating a single point of failure. FAIL
Even though there are two radio altimeters on board an aircraft, there does not seem to be any checking of the values they provide to the system on the system level. It seems this is left to the end user of the system, the pilot. Report states that captain's display showed -8 and that copilots display showed proper height, but it seems like this has gone unnoticed. Why is this not done at lower levels is beyond my imagination. FAIL
In digital environments such as computers, everything (and I mean everything) happens because of a reason. Report states that aircraft provided numerous unusual and unexpected warnings (gear warning while still high, speed brakes warning, captain's roll and pitch bars disappearing), but none were perceived by the crew as a problem. My interpretation of this is based on average user observation; users treat their PCs as persons and "you know, each person can have a bad day. It will behave better next time." Well, wrong. It's a machine, working as instructed by its program. If it does unexpected things, either program is wrong, input to the program is wrong or something is really wrong with the situation you're in. If the machine cannot communicate this fact to its user, this means we have a design problem with the user interface. Also, there's a remark in the report about pilots considering radio altimeters to be one of the least reliable instruments that frequently fail. Therefore, they're used to ignoring anything that has radio altimeter at the source. It turned out that their understanding of the possible consequences is lacking. FAIL
Report states that after this accident the software company behind the autopilot started installing improved versions, where autopilot actually compares radio altimeter data with the pressure altimeter. Too late, but let's give them thumbs up for this.

I could expand on each point here, but let me just warn about two.

There's a saying in the IT world about "garbage in, garbage out". If you feed useless data to the system, you'll get useless results on output. I am noticing an increasingly common and worrisome trend of this in aviation industry. Each sensor that provide input data has more than two states (working, not working) and this fact is often overlooked. In the case of turkish accident, we have a radio altimeter feeding incorrect but perfectly valid measurement to the rest of the avionics. In the case of this 777 same happened with accelerometers; one of them was not totally dead but not totally alive either. It fed its garbage to autopilot which caused the plane to start jumping around mid flight. Similar thing happened on this A330 with inertial reference system: garbage in, garbage out, people were hitting the cabin ceiling. Same thing seems to have happened at the AF447 accident, but this time on the airspeed indicator, which apparently created a confusion in the cockpit that seems to have lead to an inflight brakeup. What's the solution for this? Simple, gather data for the same parameter from different sources. Airbus even submitted a patent for such system for air speed. None of the instruments should be considered as absolute authority. And this is the other thing I want to warn about - redundancy exists to be used. Avionics software should compare readings from all available sources for a single parameter, not treat "left" and "right" systems as two separate entities. Two 757 have crashed (both available for viewing in NG Air Crash Investigations series) because of clogged pitostatic system - none needed to crash. There was both GPS and inertial systems available that could be used to provide information about the basic aircraft parameters, but the system wasn't designed with this in mind.

Learning about the software design and engineering principles behind aviation software, I have a feeling that we'll see more dead bodies caused by this two issues. I'd be happy if someone can prove me wrong.

How you do things?

2012-04-18T14:23:00-00:00

This blog post often comes very handy in various discussions. In today's small busnisess with startup mentality, the question of how to do things is often ignored and all focus is put on getting things done. Since my job is to make things repeatable, with predictable result, I must focus mostly on how, which creates all kinds of interesting situations.

And how do you see yourself?

Learning from mistakes

2017-03-10T09:06:00-00:00

I was baking this post in the back of my head for about half a year. Two events finally forced to convert it into writing - google admitting a breakin and my discovery of a particular FAA site.

I remember myself as a kid, playing with legos and wanting to build thing with them that legos physically werent capable of. Even back then I wanted to do more with my toys than what they were meant to handle. I carried this over to computers, which soon lead me to linux & BSD world of free unices, where one has all the chance to use the computer in a way he sees fit instead of bowing your head and using it in a way someone else (like Microsoft and more recently Google) intended. This path surely involves lots of experimentations, some failures and some successes, lots of thinking, analyzing, more or less correct presumptions, causes and effects, etc. In general, I developed my own sistematic approach to a problem. Each new solution always consists of past expiriences and here is where my gripe lies.

Expiriences in every area of human knowledge tend to be collected in a form of a knowledge that is collected through a process of learning. This knowledge is then passed on through the process of teaching. This should also include IT scene and it does, but from my experience teachings of IT knowledge include only very rudimentary and basic stuff. All common real world experiences and lessons are missing. So when time comes and a team has to design something reliable in for them new and unexplored area, the result ends with a not-good-enoug products in the best case or a disaster in the worst case.

Why is it so?

Usually such team has no access to previous art in the same area, even if it actually exists. They cannot learn from mistakes others already made and are bound to repeat them. This leads to duplicate wasted resources and combined with "time to market" pressure, inferior products that only cause pain and suffering, are usually optimized for only one or two use cases and are compatible only with some subset of poorly implemented standards. Want some examples? Just look around you and see if there's any gizmo there that does everything it is advertised to do and does it well.

Does it have to be that way?

No. It can be much, much better.

One example of an industry that got it right is aviation industry. Every incident that happens is logged, examined, thoroughly analyzed, conclusions are drawn and recommendations are made what to do in order to avoid similar events in the future. The whole process is somewhat public, but what's most important is that end result is nicely cataloged and available for everyone to learn from. Nobody wants new airplanes to carry the same design mistakes as the older ones, right?

IT industry, on the other side, sweeps almost every negative experience under the rug. Writings about breakins, product development that went wrong, products that failed to deliver are rare. Analyses of what went wrong and how it was taken care of even more so. Most of the stories are shared unofficially, over the beer, among people that know each other and so almost never reach the wide audience they really should be intended for.

It would be nice if we could get a nonprofit organization that would stand behind the formal process of collecting, organizing and presenting such knowledge from the IT scene. Afterall, world should be about collaboration, not competition.

Server vs. server

2010-05-07T14:45:00-00:00

Some people like strong cars, I like powerful computers. Since I got on the internet in 1994 or so, this reads powerful servers. My last workstation oriented powerful computer was Pentium 200 in 1996 and it soon landed in the role of a server, with 4x 25GB drives in 1999. Btw, it still works perfectly ;)

Since home server is mostly limited by budget, one looks at price/performance factor. And since I have all the storage management (raid/lvm) expirience, I spent some time putting together a data-reliable machine, that serves the usual music/movies entertainment stuf, runs a samba share for few windoze boxen to share their data and a few NFS and AoE exports for my desktop. Since my desktop is diskless (actualy has no moving parts), it is very dependant on latency and io throughput the server offers it.

For a few years now this role was handled by a HP tc2120 "server" I got cheaply second-hand. Pentium 4, 64bit pci bus, looks like a decent IO oriented pc. I hang a pair of ide disks on onboard ide, a few more on addon ide cards and a bunch of sata disks on LSI 1068. All disks were in mirrored pairs (same size, different manufacturer) and joined together in one volume group. Out of that I carved various logical volumes for various tasks. One such volume was used for torrent downloads.

And how was I satisfied with it? It was barely useful. Every time software mirrors were doing a resync, latencies grow beyond anything I would consider useable. Every time a torrent was active, the whole system lagged. IO to one volume killed IO on others. I tried many things, like decreasing raid resync speed (helped only a little and slowed down resync time to a week or so), changing schedulers (no observable change), readahead settings (marginally better on read-mostly nfs mounts). At the end I concluded that it must be the interactions of all the storage layers that create artificial relations among diffrerent disks that are standing in my way and greatly decreasing the overall system performance.

Because I'm moving soon to a different house, I decided on setting up a less powerful server here and taking this "nice" HP box with me, configured differently, of course. I dug up an old slot1 Pentium 3 pc, put desktop gigabit nic and 3ware 6800 in it and stuffed it with disks. There's only one mirror now and 3ware takes care of it, all other disks are jbod, standalone, no lvm. Each disk has its own scheduler and readahead settings that depend on how it is used. Expirience: same throughput as HP "server", much better latencies, much better paralelism. Suprisingly this 12 years old pc is capable of sustained 50MB/s over ethernet to different disks, a number that I've only seen in bursts on HP "server". Oh, and it also draws about half as much power as Pentium 4. And it serves you this page you're reading now.

So much about "new and improved" technology. Complex setups are rarely better than simple setups. Cheap, fast, reliable: pick any two, you can't have all three. What more can I say? ;)

Software "design"

2012-08-22T10:28:00-00:00

It all started with me having too much free time at my workplace.

So I started researching about homebrew autopilots for model planes, that led me to software that runs avionics onboard the real planes, which ultimately led me to programming practices used to write such software. These two blog posts say most of the things I wanted to write here.

At the end, one thing is clear: designing and writing software must stop being something of artistic value and creative in a sense of expression (which often leads to "broken by design" state) and has to become something that is formally provable to be correct (see also Correctness by construction (1,2)). A good first candidate for this are mission critical systems on which lives depend on - such as airplane avionics. It's good to see that tools for this exist and are freely available. In the light of recent DNS vulnerability, which is there by design, I wonder what would happen if DNS was designed and implemented with such methods.

It was also pleasantly suprised for me to find out that I use some of the methods described in my own work. That's probably why I have free time in the first place ;)

How I see modern Air Traffic Control

2010-01-17T01:41:00-00:00

A few years ago I started expanding and overlapping my hobbies. Obvious overlap between being a system administrator (and integrator and architect) and flying is Air traffic control or ATC. I joined the virtual flying community and observed the procedures there and visited Slovenia Control and Ljubljana airport with them. I'm also reading AeroSafety world online magazine and am following aviation-safety.net and flightglobal rss feeds. If it is worth anything, I've also seen all AirCrash Investigation tv documentaries and IVTV dvds. I also asked our ATC for their SOP documents but was understandably denied the request as they're internal documents.

The most pressing problem of ATC today is how to increase the capacity of airways to safely carry more planes (=passengers) from point A to point B. Based on the little information I have available, I put together a sysadmin's view on how I think it should be done.

While one obvious answer to that is to build larger planes like A380, this cannot satisfy all needs as not everyone will be flying around in those large planes. We need something else.

What is the main job of ATC today? ATC business is to provide separation between planes to allow safe flying. Of course this separation is just the opposite of squeezing as many planes as possible into the same air way between two points, so the workload and the stress of ATC folks increases. So the question is how to increase the density of airplanes in the sky while decrease the load of the controllers?

Spacing between the planes is largely determined by the delay by which the controller gets the status updates on his screen and the speed he is able to communicate his instructions to the pilot in relation to the speed airplanes fly at, plus all the necessary safety margins. Radars spin only so fast, atc radar picture is composed of information from many radars and it needs a few consequtive updates to calculate airplane vectors ... then processing also takes some time ... At the end it looked painfully slow and sufficiently outdated to me that I thought "this could and should be done better".

I'd say give the freedom to the planes. Don't tie them strictly to corridors from waypoint x to waypoint y. Let them fly as straight as possible from their origin to their destination.

Oh my! The chaos!

Yes. There can be a lot of order in chaos :)

What I would like to see is better integration of TCAS and ADS-B with FMC. Both already "hear" properly equipped airplanes from over 100 miles away, why is then FMC not told about these planes? Next step, when FMC is aware of them, planes should be able to automatically negotiate each their own optimal flight path to prevent not only colisions, but close encounters as well.

When I read about TCAS on wikipedia, I was very suprised to see that TCAS by itself is not aware of the current performance envelope of the airplane nor the terrain over which the plane flies. Thus it is theoretically possible for TCAS to issue a RA of climb when the plane is not capable of climb or to issue a RA of descend when the ground is dangerously close. This clearly shows that there is no proper interaction with TCAS and other airplane systems. In sysadmin speak we say that the unit is not efficiently integrated into the system. I find this interesting, as there is so much time and energy spent in crew coordination procedures, how the PF and PNF share their workload and cross check each other ... and then the instruments are not treated in the same way and are allowed to work one over the other. This is something that clearly needs to be taken care of, the sooner the better.

The other interesting thing I read about TCAS on wikipedia is that it is basically limited to about 1MBit/s of bandwith to communicate with other planes and that this limit is already a problem around some of the bussiest airports in the world. Whoever designed it that way, I can only hope they left enough flexibility in the packet format to upgrade it painlessly to higer bandwith.

ADS-B is the next interesting piece of equipment that is being introduced into planes. It extends mode-s info with "state vector" so that other airplanes "hear" not only where a certain ADS-B equipped plane is, but also where it is pointed at and how fast it is moving in that direction. This improves situational awarness a lot, but in order to decrease ATC load as I see it, it should also broadcast the plane's intentions of where it wants to be in lets say the next 5, 10 and 30 minutes. That way other airplanes would be able to identify them as potential threat. When two airplanes would find out that they are going to cross, they would enter negotiations on how to avoid eachother while taking into account everybody else, aircraft performance (how they're able to perform the avoid maneuvre), weather and ground situation (in which direction they can avoid). All that fully automatically and long before the TCAS would even issue a traffic warning.

When all this becomes mandatory and the density of the metal (and composites;) flying over our heads increase, there will come a situation where two planes will figure out that there is no way to perform avoidance without crossing someone else flight path. That will create a whole storm of negotiations, which can hit time or bandwith bottlenecks just by the nature of complexity of these algorithms. By then I expect swarm algorithms, which are currently being studied in the fields of micro robotics and crowd modeling, will be understood well enough to be ready for implementation into this system. Basically someone needs to say "I will perform such avoidance maneuvre and all others must adopt to me" in a way that is the most efficient for the whole group.

Testing of such system would be very easy to perform in the virtual skies. Design a funcional model of such "next gen TCAS", put them on each virtual plane and then slowly increase the density of flights over a certain area and observe the results. It should be pretty interesting experiment :)

The final question is always "how to implement this?" I'm sorry to say that, but ATC folks will be sweating for some more years. Even if this is mandated by some regulatory agency (which afaik does not exist because of the democracy and free market b*****it), it is going to take years to adopt to the new ways. But the sooner the whole system starts to reorganize, the better for all.

Comments?

Update: after a chat with a few pilots and folks from ATC they explained that systems for this are in development for at least 15 years now and that the problem is not technical but political in nature. Basically, who's to blame if things go wrong. Bah.

My thoughts on antispam

2010-01-10T01:27:00-00:00

Everyone knows spam today. There are estimations that it makes more than 90% of all email traffic. Judging by my personal spam collection, spam is on constant increase.

I have always filtered spam out of my inbox by hand. I played with almost all antispam technologies out there today and haven't found any to be perfect. They're either too hardware resource intensive (long list of daily updated regex rules) or too users' time intensive (manual training of statistical filter). Two tecnologies that made me think some more were greylisting and one-time email addresses.

The only way to stop spam is to increase the cost to spam, which basically means to decrease its efficiency. I came up with a concept that does this extremely well, while offering end user perfectly transparent and easy-to-get-rid-of-spam mail expirience.

First of all, this concept cannot be implemented everywhere immediately, because MUA has to be able to make modifications to MTA's alias table in a reliable and trustworthy way. The most obvious places to implement would be webmails and I hope the authors of various mail clients will pick it up too. Also, all mail should be treated as spam by default unless it's delivered to a know alias. This means that a maibox protected with this method is useless for giving out on business cards and such :) Effectively it's a "you cannot write me unless I write you first" mailbox.

Here's how it works:

Imagine having an user at example.com with user@example.com mailbox. This user writes a mail to johndoe@company.com. User's MUA notices that user didn't have any previous communication with johndoe, generates a unique string and replaces user's from address with this uniquestring@example.com in user's mail. It also places this uniquestring in MTA alias table, pointing it to user+johndoe_company_com@example.com mailbox. Finally, it creates a (imap or local pop) folder johndoe_company_com in user's mailbox.
When John Doe replies to user's mail, the reply goes to uniquestring@example.com and ends in this folder. Then, when user replies to it, MUA notices that user has already written to johndoe and uses the same uniquestring as from address.

Then when spyware on johndoe's windows machine picks up user's mail address, it picks up uniquestring@example.com. What happens is that all spam starts coming to a folder created for johndoe. User knows two things: that johndoe's computer is infected by spyware (and can tell johndoe about it) and that he can simply get rid of the spam by telling its MUA to forget this alias for johndoe. All the following spam will be dropped either into user's inbox (or even better, trash by default) or bounced by MTA (unknown user), depending on the whole config.

MUA also has to offer an option to generate such uniquestring (and alias table entry and folder) without writing out a mail, when user wants a contact email to put in a web form, for example. Or to start a mail dialogue with a friend who already has such system implemented.

Some thought should be given to how uniquestring should look like. We don't want spammers to sort them out easily from other email. We want them to mix with other legitimate looking mail addresses and clobber spammers' lists, making them slowly less and less useful. Also they should not offer any method of figuring out users root mailbox addres (or posting to that address should be disabled either by MTA or mail storage agent).

I would be very interested in any shortcomings you can spot in this concept. I plan to roll it out (all manual) on one of my mailboxes, just to see how it works in practice.

Job satisfaction

2014-11-24T21:06:00-00:00

Not long ago I had an interesting discussion with my coworker about our jobs. As a sysadmin I often get requests from people that know me to help them configure their server, set them up a web page, mail account, etc., for a small fee. Or from some small company that does not have the expetrise to maintain their own server. But I gladly pass all such requests to the coworker, because I can's stand the feeling that any of those people would have the right to bother me at $random time of day because of some irrelevant problem, like spam getting through or somesuch. He offers them his price and then they deceide if they want him to fix their systems or not.

On the other hand, if someone ask me to teach him how to do all the system administration and is willing to learn, I'm more than glad to help. I don't even ask for a price, I just say if he feels like paying me, he can pay whatever he feels is the right price for the knowledge I gave him. But just the satisfaction of teaching someone is more than enough for me. I guess I got this after my father :)

Sad state of linux filesystems

2010-01-10T01:26:00-00:00

Faced with a need for a reliable, well performing file system, a linux admin today is in a bit of a trouble. Ext2 is still in the kernel just for academic purposes, ext3 is "good enough" for majority but uselessly slow for what I want from it, reiser3 is in "don't tuch if it's not b0rken" mode but suffers from BKL use which limit its scalability, xfs and jfs are nowhere near a reliable state and reiser4 is "almost there" but due to Hans arrest, one wonders if it will ever be finished.

The more I think, the more I see there are only two usefull options for me: use Veritas VXFS, which recently became free for "smaller" installs or dump linux altogether and go with ZFS in Solaris 10. Or maybe Nexenta.

One wonders if so much choice really does make sense ... because these days it looks like FS knowhow is too much spread out and not focused on making just one, but really good FS ... As much as I despise RedHat for sticking with just ext3 in their RHEL line, at the end it would seem that was the right longterm choice.

Now if only chunkfs would become a reallity sooner ...

Rants ...

2010-01-10T01:26:00-00:00

Recently my iBook was stolen (straight from my office, no less) and I took a look around if there's any notebook that I could replace it with. I'm looking for a light and small enough notebook to carry around that can run on its batteries for the longest possible time. I think I'd also like to have the ugliest notebook around (so no one would be tempted to steal it) and if it is in persoalised and unique in some obvious way, even better. Panasonic toughbook comes pretty close to that, but it is waaay too expensive.
But what I wanted to tell you is this. I checked web sites of all major notebook manufacturers and were shocked at how bad they were. I know what I want to buy, so I want the web site to let me input my requirements and then show me the products that match my requiremets best. None were even close to that. Shame.
Then, because I know how to translate my requirements to specific hardware configurations. I expected those web sites to select their models by the hardware. Again, no luck. All they do is offer you a list of their models (name + number) and then you have to dig the small print to get the hw specs. But I don't care about names and model numbers! I want a notebook with such and such configuration and I want a choice based on that.

Thinking over this again, I see a large business opportunity here :)

And there's more ...

Since Ubuntu caused such a buzz lately, I deceided to use it on my new primary home desktop. And so far it is a general dissapointment. Ok, I expected to hack a kernel to get it running on iRam, but I also expected that would be the most complex tweak I'll have to make. iRam is the most exotic piece of hardware in this machine, everything else is pretty common these days. And guess what:
sound doesn't work properly. It also doesn't shut down properly if I have any nfs shares mounted and doesn't even boot properly if /usr is on non-local file system. It also managed to mix up different versions of nvidia-glx and nvidia kernel module. There are numerous other annoyances that I don't remember right now, but make the whole Ubuntu expirience anoying. Especially when
you look at the kind of people Ubuntu is targeted at - I can't expect my mom to figure out such problems, much less to fix them. She would simply conclude that her pc is broken.
And that is what Ubuntu is - broken.
It will remain broken for as long as they'll be putting out such releases targeted for average folks ... Me on the other hand will go on promoting Debian, because it doesn't promise anything, forces you to learn about the system you're using and because you actually need to spend a week or two untill it all works as you want.

Job offer ...

2010-01-09T05:07:00-00:00

What do you do when you get an invitation to a selection process for a job from the most desired company to work for? I was very suprised when I got an email from them and also a bit confused as how they found me and why they invited me. It was clear to me from the start that my chances were slim, but I deceided to try it anyway. Just to see how it feels to be questioned by the people who really know things they do. Instead of a long description of what all I went through, this episode of anime Windy tales, apropriately titled "Audition Chronicles", show it best. And yes, my story ended in much the same way as in this anime.

Can't sleep ...

2010-01-10T01:24:00-00:00

In nights like this, when my biorythm is synced to some place on the other side of our planet and my mind is wandering around, I usually take a nice long night walk in an attempt to follow it.

I go say hello to the crows, continue through the sleeping town to the river, follow it upstream, past the little hill with a church, turn west trough the fields and follow the rails back home. Or I go to the fields past the pharmaceutical factory, across the road to the little patch of trees and back. Or I drive to the nearby airfield and walk by its only runway. Either way it's at least one hour of walking.

So I walk in the muddy fields, trying to follow my mind ... which jumps around like a wild animal ... one moment it's all high and jumpy, full of joy about my recent achievement at my workplace, then next moment it falls down in the mud and drags behind my feet because it got some unpleasant association from that memory ... then notices a rabbit and runs across the fiel ... a rabbit, in this dirty polluted field! Then it stops suddnely and comes back, whining about pollution, ugly people and all bad that's going on around the world ... And then it's back on my shoulder, dreaming about what's still left on this planet to enjoy ...

See, it's not easy to be me. A coworker once said that I'm both a radical technocrat and a romantic hippy ... maybe, but that's just me and my mind.

Eye candy vs. usefulness ...

2010-01-10T01:49:00-00:00

Yesterday we had an interesting talk at our weekly meetings of a local web community, regarding flash vs. standard html pages. I won't comment the talk here, but rather mention some thoughts that started floating in my head as a result of the talk.

Like, how I treat visually nice software vs. usefull software.

For example ... in the longstanding Qt vs GTK argument, I was always on the Qt side. That's why I was quite suprised when I installed qcad, it also installed Qt libarry as a dependency. Actually I didn't have Qt on my deskop system ... And no, I'm not using gnome either :) I'm an xfce fan. Xfce is GTK2 ... and wow, all the apps I use every day, are GTK2. Ok, Opera is Qt, but I use the statically compiled version. Why is that so?

Frankly I don't care about desktop consistency, unified look and feel, eye candy and all that stuff. I just want my app to do its basic job and do it fast. That means to be efficient and without all the bloat. I've tried to use kmail, it was buggy as hell and imap never really worked right. I tried to use evolution, it ate half the memory in my box, was really slow and I was using maybe 5% of its functionallity. So every time I came back to sylpheed, which I've been using primarily since version 0.3. Back then, it was even less than a megabyte. And so was Opera in its earliest versions ... the only browser that could fit on a floppy :)

How does this apply to the web pages?

Back in the early days of internet, it was mostly hypertext with an occasional picture here and there. It was meant to deliver information in its most efficient form known to man, written word. Now, if you look at the average site which job is to deliver information, for example any news portal, you (well, at least I) have quite a lot of problems locating a piece of news that is not a top story at the moment ... and usually I give up before I can locate it. That's why many web designers roll their eyes when I try to convince them that no design is the best way to offer information. That person that came up with the knob in Opera that turns off all the css stuff must be a real genius.

Also, another thing regarding user interfaces ... recently my job forced me to think about that too ... This day, stuff like Flickr is regarded as top and functional design, yet when I'm confronted with it, I feel like grandma waiting for instructions from her grandson what to click. On the other hand, stuff like wakaba and kareha that runs many *chan sites enables me to take in the whole functionallitty with one glance and immediately understand what this is, what it is for and how to use it. I'm sure many web designers would just turn away because it's so ugly, but as I said, I don't care. It works and it works damn well.

It is not unusuall that I see more and more where some article is linked from sites like slashdot, there soon appears a link to a 'printer friendly' version in the comments. Why? Because, what is understood as 'printer friendly', is actually more 'eyes friendly' too. Huh, what a discovery. One can more easily focus on the reading without all the blinking colorfull junk around it. Afterall, it's the information that matters and not the distracting coloful junk around it.

Sunny week

2010-01-10T01:22:00-00:00

This week has been full of Sun all over the news. They officialy released their Niagara cpu as an Ultrasparc T1, released first Solaris builds with their new ZFS filesystem and of top of everything, offered their ide/compiler/tools Studio 11 for free. Looks like Sun is rising back from the ashes ...

Phoenix style :)

So what does all that mean? Right now it's kinda difficult to gather all the information scattered around many sun blogs and other sites ... Right now there are also no thirdparty benchmakrs to actually validate the value of all these new toys ... Because thats what they are to us serverroom geeks, exciting new toys. For example, this Niagara thing. It's a cpu that no one with their right mind would like to have in a desktop machine, but that just about everyone would have to like to have it in their web tier servers. Possibly also in the app tier. Because the thing is designed to run many jobs in parralel, each of them fast enough, not breaking any speed records for a single job, but very likely for the amount of work done in the unit of time.
Then, there's the ZFS. Smart folks at Sun have done an interesting job with it. While the whole industry is built on the disk-as-block-device + volume manager of some sort + filesystem paradigma, they were able to stand back and look at the whole thing from the entirely new perspective. Gone are the blocks, welcome to the objects. Gone are the fscks, welcome to the transactions and always consistent data. Gone is the predefined size of the filesystems, welcome to the dynamically resizing world. And so is gone the need for a volume manager. Other exciting blinking lights and knobs include snapshots and clones (or what is known as "flash copy" in IBM terminology), full checksums of all data, compression and encryption.
The last thing, Studio 11, is what I know the least about. Go ask developers, if they'd like to have optimized compilers for their machines and some tools to analyze performance bottlenecks ... I guess the answer is yes. Actually the compiler is the reason I wanted to get some SGI mips hardware, because I heard that when something compiles cleanly with that compiler, then it's really well written. Ofcourse, the common assumption was that their compiler is just b0rken :)

Now that reminds me I'm still missing some ultrasparc machines from my collection ...

Crows

2010-01-09T05:07:00-00:00

Something very interesting has just happened. My workplace is located 5min walking distance from the train station and I usualy cross all the rails on my way. There's also a lot of electrical cables laid above the rails and my path leads me near the point where they all come to the ground.
Today, I found a dead crow there. A beautiful grey and black bird, probably young. I guess it got electrocuted on the wires there. I deceided to have a closer look at it and while I was turning the bird around, the whole flock gathered out of nowhere and started circling above me and yelling at me. They must have been very aware of the loss of one of their members, as they are very social animals. And smart too. Two of them even followed me all the way to the building while constantly yelling at me.

I kinda feel sorry for them ... but I've been through all that too.

I've always been very aware of the larger birds around me, but it's only recently that I started paying more attention to the crows. 15 years ago, they were nowhere to be seen in the cities, but now, they're all over the place. I even have a small roost of a few hundred birds in the trees behind my house. Now, after my Haibane expirience, I'm noticing them everywhere. I'd really like to understand them someday ...

My "server room" (or junkyard, scrap heap, ... :)

2013-03-06T14:28:00-00:00

It looks like this:

Click on comments to see what all this is ...

1. Apple G4, borrowed. Mac OS X just doesn't touch me. Too graphical.
2. SGI Indigo2. Really Nice Machine. Its smaller brother, Indy, is living in the livingroom above as a graphical terminal (21" screen, baby!)
3. Old Umax scanner. SCSI :)
4. Drawers, full of pc hardware. You can find everything from RLL/MFM controller and disk to gigabit NICs, from hercules (monochrome!) cards to radeons 7500 (currently my most modern gfx card).
5. Stacks of sparcstations, spare parts.
6. Micro VAX 4000-200. Equipped with 32mb memory and two gigabyte disks I think. Has ethernet, but has no SCSI. Nice for heating up the room :) I also have a matching VT320 terminal (green! :)
7. Current test rig, PII 350MHz on 440BX board. Very worthy, because it has one big passive heatsink (=silent). Here it's stuffed with disks, two of them are already in the click-of-death phase.
8. THE Keyboard, made by IBM. You konw, the one you can use instead of hammer.
9. Many things below there. On the left is the disk enclosure for my mail spool, which is attached to the sparc below (gateway & private web server). Below is another sparc, used for boring stuff like dhcp, dns & tftp. Below of it all is a nice big APC UPS.
10. My first color monitor. Still works :) Now mainly used as handy console. 14" = portable :)
11. Main file & mail server. 3ware 6800 takes care of the storage, which sits in front of it :)
12. Some IBM fibre disk array, temporary here on test.
13. CPU drawer of an AlphaServer 4100. Dual 450w PSUs just for 4 CPUs ... this baby is even warmer than the VAX.
14. 300m drum of cat5 cable - you never know when you need to wire your house. On top of it sits a Cisco 1600 - the smallest "proper" router, IOS and all that.
15. More history - including fully loaded 386 board (387 fpu, 32mb ram :), some 486 boards and two p200 boards.
16. Similiar disk enclosure as left of 9, but this one is wide scsi. For Alpha.
17. Cardbox full of old PSUs (the connectors, man!), vms manuals and smaller parts.

If you're interested in all this mess, I can talk more about any item you see here. And maybe some more ... Just ask :)

Dreams

2014-11-24T21:06:00-00:00

Dreams are good. And very usefull. Today when I didn't feel like getting up, I stayed in the floating state of not sleeping anymore and not being completely awake either and it suddenly realized I have a critical bug in one of my critical scripts. I've seen it floating in front of me, a screenshot of a vi terminal with cursor blinking at the line where the bug was ... amazing. I need more dreams like that :)

Cyrus HA

2010-01-10T01:19:00-00:00

There's more and more discussion about Cyrus high availability setups on the cyrus mailing list. Recent thread started with a familiar question of how to achieve best possible availability and drifted to the discussion about implementing an active-active application level redundancy into the cyrus itself. Since i love to play with disks, raids, volumes & co, I posted my raid-based sysadmin view on how would i try to implement it