[mdlug] Parallella: A Supercomputer For Everyone by Adapteva — Kickstarter

Michael Mol mikemol at gmail.com
Fri Oct 12 23:02:08 EDT 2012


On Fri, Oct 12, 2012 at 6:11 PM, Aaron Kulkis <akulkis00 at gmail.com> wrote:
> Michael Mol wrote:
>>
>> On Fri, Oct 12, 2012 at 4:01 PM, Garry Stahl <tesral at wowway.com> wrote:
>>>
>>> On 10/12/2012 01:54 AM, Aaron Kulkis wrote:
>>>
>>>>> If you're paying attention to the things necessary to squeeze every
>>>>> last {int|fl}op of performance from a given modern CPU, you'll find
>>>>> that you need to keep several different variants of your code handy to
>>>>> account for differences in execution speed or stall behavior for
>>>>> individual instructions across different brands of CPUs--even
>>>>> different models within the same brand!
>>>
>>> Amiga programmers did exactly that. Programs like ImageFX would have
>>> executables for 68k 020, 030, and 040 processors; depending on what you
>>> had in your Amiga, you picked the right one to install. Installers would
>>> ask you what your hardware setup was.
>>
>> So, three images, written and tuned in assembly. Sounds like a lot of
>> work went into it.
>>
>> That matrix would be far, far more complex today:
>>
>> (Some of these questions become irrelevant if more recent technologies
>> are available; the earlier tech could be assumed present, or could be
>> presumed obsolete.)
>> Is floating point available in hardware?
>> Is MMX available?
>> Is 3DNow! available?
>> Is SSE available? (obviated if a later version is available, or if
>> 64-bit is available)
>> Is SSE2 available?
>> Is SSE3 available?
>> Is SSE4 available?
>> Is SSE4.1 available?
>> Is AVX available?
>> Is AVX2 available?
>> Is the machine 32-bit or 64-bit?
>>
>> And that's before you get into the per-instruction cycle cost
>> (we're talking about high-efficiency, count-every-cycle code, right?),
>> where that cost differs from brand to brand, model to model, or even
>> stepping to stepping!
>>
>> Also, remember what I said about earlier technologies being assumed
>> present if later technologies are present? With Intel (the brand, not
>> necessarily the architecture) processors, features appear and
>> disappear depending on the target market.
>>
>> That's a *ton* of variance to pay attention to.
>>
>
> The GCC compiler can custom-tune for all of that.

That's why I run Gentoo with the local equivalent of -march=native.
But this whole thing comes from a context where every cycle counts.
You're not going to use a compiler for that; you're going to hand-write
assembler, as ffmpeg and libavcodec both do. If you want the compiler
to do it, you're going to be writing in a compiled language, not an
assembled one.
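That said, newer GCC can at least automate the dispatch half of that
matrix. A rough sketch (this assumes GCC 6 or later on x86 with glibc;
the function and its loop are made up for illustration):

    #include <stddef.h>

    /* target_clones makes GCC compile the function once per listed
       target and insert a resolver that picks a variant via CPUID at
       program load -- one binary, several tuned code paths. */
    __attribute__((target_clones("default", "sse4.1", "avx2")))
    void scale(float *restrict dst, const float *restrict src,
               float k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

But that only helps where the compiler's code generation is good
enough to begin with, which brings me to the next point.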

And, honestly? Compilers are *absolutely terrible* at recognizing
plausible vectorization (the stuff where MMX, 3DNow!, SSE*, and
AVX* are useful) and optimizing code to use it. Even moderately
complex cases require a deep code graph and an expensive best-path
search--meaning finding the optimal solution falls into the same
category as the traveling salesman problem: NP-hard. So in the cases
where it's _possible_ to find the optimal solution without changing
the side effects (this is a key term, by the way, with specific
meaning in language design) of the original code, nobody wants to wait
that long on the compiler!
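To make that concrete, here's the classic case (nothing exotic, just
a running sum):

    #include <stddef.h>

    /* Vectorizing this loop means reordering the additions, and
       floating-point addition isn't associative -- the rounding
       changes. Since the compiler must preserve the program's
       observable behavior, GCC and Clang leave it scalar at -O2/-O3
       unless you explicitly allow reassociation with -ffast-math
       or -fassociative-math. */
    float sum(const float *v, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += v[i];   /* each iteration depends on the last */
        return acc;
    }

The hand-written SSE/AVX version keeps four or eight partial sums in
one register and adds them together at the end--trivially correct to
a human, but a transformation the compiler isn't allowed to make on
its own.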

It only takes one or two vague program behaviors to throw a monkey
wrench into this process, too; let's say you've got a region of
memory mmapped for IPC or device access purposes. In such a case,
you're probably going to mark pointers into that space with the
'volatile' keyword--and you've just busted the compiler's ability to
optimize the code that's working with that region of memory. Now let's
say you're playing fast and loose with types in a language like C.
You, the programmer, might know what's going on, but the compiler
doesn't; your aliasing of types again breaks its ability to optimize.
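In code, the two traps look something like this (the register address
is made up for the example):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical MMIO register, e.g. a pointer obtained via mmap(). */
    #define DEVICE_REG ((volatile uint32_t *)0x4000)

    uint32_t poll_twice(void)
    {
        /* volatile: both reads must really happen, in order; the
           compiler may not cache, merge, hoist, or vectorize them. */
        uint32_t a = *DEVICE_REG;
        uint32_t b = *DEVICE_REG;
        return a ^ b;
    }

    uint32_t float_bits(float f)
    {
        /* The fast-and-loose spelling, *(uint32_t *)&f, violates
           strict aliasing, and a type-based optimizer may assume a
           float and a uint32_t never overlap. memcpy is the
           well-defined way, and compilers fold it to a single move
           anyway. */
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }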

Stuff like this is why people who write video processing code
hand-write their assembler...but that takes a lot of time (which is
the scarcest resource any of us have, and so it's expensive), and
becomes a maintenance burden as it needs to be updated. Case in point:
When id Software released the source code to the Quake game engine,
the Quakeforge developers ripped out the hand-optimized assembler and
replaced it with C code--the hand-optimized assembler was great for a
486, but it was a hassle to maintain and wasn't nearly as useful when
most people were running Pentium IIs and Pentium IIIs.

Now take that maintenance burden and apply it to an _entire_ modern
system? That's a nightmare! Admittedly, there are people who've done
it[1], but I doubt their browser's JS implementation is as fast as
Firefox's or Chrome's, and I fully expect their system to have more
bugs.

[1] http://menuetos.net/

(Incidentally, all of you who are complaining about things running
slower than they ought to, given how much faster the hardware is,
should try running that. If it seems OK, try making it your primary
OS. If that works for you, great...but let us know if/when you have to
switch back to Linux, Windows or Mac, and why.)




-- 
:wq

