or at least my view of it
Having swum in the HPC waters for the past two years, I've gotten some sense of just how wet and cold the water is.
First, there's the problem. You can build a physical model, or even the real thing, and test its behaviour in the real world. Or you can run a digital model in a numerical simulation on some decent hardware. Which is better? There's no straight answer. Numerical simulations tend to be cheaper at scale, but they are just simulations, and you would do well to check how they align with reality every now and then.
Then there's the code that runs the simulation. It's usually written by people who understand the nature of the problem and know how to describe it in mathematical terms, but who are not expert programmers. They're happy when they get numbers that look reasonable. They are far from understanding, much less controlling, what goes on inside the machine, or how to improve their code to utilize it better.
Then there is the hardware. CPUs designed by people without enough connection to programmers in the real world, bought by people who make decisions based on easily understandable numbers fed to them by marketing guys. Networking, again sold by marketers with benchmarks that show off its particular strengths but bear little connection to real-world application behaviour. Whole machines, each with its own implementation details, requiring different tricks to be utilized well.
In June I spent a few days at the International Supercomputing Conference in Hamburg. It's a very interesting show, but heavily split into two different groups. One sells exciting hardware and talks to people eager to test it; the other has to produce results with the hardware they have and is figuring out how to utilize at least part of it. I'm talking about programmers.
You see, developing ever-faster hardware (while not easy by any means) is probably the easiest part of the whole story. But to fully utilize that hardware, the code needs adaptations. Who should take care of that? Both the hardware and software guys point to the compiler guys. But the compiler guys complain that they don't get enough information from the hardware guys about what the hardware can do, or from the software guys about the intent of a particular code block. And it's not always possible for the compiler to do the best thing, so the programmer has to write some things in assembler tuned for a particular CPU. That is unrealistic, so all CPU vendors offer libraries with math routines already optimized for their CPUs. They also offer their own optimizing compilers, which sometimes manage to demonstrate their value, but mostly they complicate life, because the flags that turn on all the optimizations are nonstandard and have to be fished out of the manual.
So what is a standard? One way to do things. But in the HPC world everyone does things their own way, for mostly the same reasons. One reason I've already identified is fear: in the fast-moving software world of HPC, each group wants control over its own environment, so it writes its own. Is this the way forward? No, because it brings additional fragmentation to an already fragmented ecosystem. What is needed is more communication, more discussion, more working together. 80% of the problems people are solving are common to all of us, so it is ridiculous that everyone is implementing their own solutions to common problems.
And I haven't even touched topics like accelerators, or the gigabytes of 30-plus-year-old Fortran code out there.