[mdlug] Parallella: A Supercomputer For Everyone by Adapteva — Kickstarter

Adam Tauno Williams awilliam at whitemice.org
Fri Oct 12 19:36:00 EDT 2012


On Fri, 2012-10-12 at 18:11 -0400, Aaron Kulkis wrote:
> Michael Mol wrote:
> > On Fri, Oct 12, 2012 at 4:01 PM, Garry Stahl <tesral at wowway.com> wrote:
> >> On 10/12/2012 01:54 AM, Aaron Kulkis wrote:
> >>>> If you're paying attention to the things necessary to squeeze every
> >>>> last {int|fl}op of performance from a given modern CPU, you'll find
> >>>> that you need to keep several different variants of your code handy to
> >>>> account for differences in execution speed or stall behavior for
> >>>> individual instructions across different brands of CPUs--even
> >>>> different models within the same brand!
> >> Amiga programmers did exactly that.  Programs like ImageFX would ship
> >> executables for the 68k 020, 030 and 040 processors; depending on what
> >> you had in your Amiga, you picked the right one to install.  Installers
> >> would ask you what your hardware setup was.
> > So, three images, written and tuned in assembly. Sounds like a lot of
> > work went into it.
> > That matrix would be far, far more complex today:
> > (Some of these questions become irrelevant if more recent technologies
> > are available; the earlier tech could be assumed present, or could be
> > presumed obsolete.)
> > Is floating point available in hardware?
> > Is mmx available?
> > Is 3dnow! available?
> > Is sse available? (obviated if a later version is available, or if
> > 64-bit is available)
> > Is sse2 available?
> > Is sse3 available?
> > Is sse4 available?
> > Is sse4.1 available?
> > Is avx available?
> > Is avx2 available?
> > Is the machine 32-bit or 64-bit?
> > And that's before you get into questions of individual CPU expense
> > (we're talking about high-efficiency, count-every-cycle code, right?)
> > for each instruction, where the CPU expense will differ from brand to
> > brand, model to model or even stepping to stepping!
> > Also, remember what I said about earlier technologies being assumed
> > present if later technologies are present? With Intel (the brand, not
> > necessarily the architecture) processors, features appear and
> > disappear depending on the target market.
> > That's a *ton* of variance to pay attention to.
> GCC compiler can custom-tune for all of that.

*Some* of that, but it still requires a separate build for each
combination of features you want to support.  And glibc can toggle
certain optimizations in and out.  But whenever you 'generalize'
optimizations you make them less optimal.  A machine is also rarely
doing just one thing, and other processes may reduce the effectiveness
of some optimizations, so it is harder to play hardware tricks in a
preemptive environment than in a cooperative one.
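
To put a rough number on the "builds for n-number of combinations" point,
here is a small Python sketch that counts build variants for the yes/no
questions in the quoted list.  The feature names and the "newer implies
older" rules are my own assumptions for illustration; as noted in the
quoted text, real product lines do not always honor such a chain, so treat
the constrained count as a lower bound rather than a real figure.

```python
from itertools import product

# Yes/no questions taken from the quoted list (names are illustrative).
FEATURES = ["fpu", "mmx", "3dnow", "sse", "sse2", "sse3",
            "sse4", "sse4.1", "avx", "avx2", "64bit"]

# Assumed (newer, older) pairs: having the newer feature implies the older
# one.  This is a simplification; real CPUs can break the chain.
IMPLIES = [("sse2", "sse"), ("sse3", "sse2"), ("sse4.1", "sse3"),
           ("avx", "sse4.1"), ("avx2", "avx"), ("64bit", "sse2")]


def is_consistent(enabled):
    """True if every enabled feature also has the features it implies."""
    return all(older in enabled
               for newer, older in IMPLIES if newer in enabled)


def count_variants():
    """Return (naive count, count that respects the implication rules)."""
    naive = 2 ** len(FEATURES)
    consistent = 0
    for bits in product([False, True], repeat=len(FEATURES)):
        enabled = {name for name, on in zip(FEATURES, bits) if on}
        if is_consistent(enabled):
            consistent += 1
    return naive, consistent


if __name__ == "__main__":
    naive, consistent = count_variants()
    print(f"naive builds: {naive}, consistent builds: {consistent}")
```

Under these assumptions the naive matrix is 2048 builds and the
constrained one is 192, and that is before per-model instruction timing
enters the picture.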

BTW, recent glibc releases have [re]introduced optimizations for several
SSE levels/extensions.  Many Bugzilla entries were filed on the way to
using these "optimizations".
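
The general idea behind that kind of selection, picking one of several
implementations based on what the CPU reports, can be sketched in Python.
This is only an analogy: glibc does its selection in C inside the library,
not like this.  The function names and the variant table below are
hypothetical, and the parser assumes the Linux x86 /proc/cpuinfo layout
with a "flags" line.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 or non-Linux)


# Hypothetical variants; in real code each would be tuned differently.
def copy_avx2(data):
    return bytes(data)


def copy_sse2(data):
    return bytes(data)


def copy_generic(data):
    return bytes(data)


# Ordered from most to least specialized.
VARIANTS = [("avx2", copy_avx2), ("sse2", copy_sse2)]


def select_copy(flags):
    """Pick the most specialized variant whose required flag is present."""
    for flag, impl in VARIANTS:
        if flag in flags:
            return impl
    return copy_generic


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_flags(f.read())
    except OSError:
        flags = set()
    print("selected:", select_copy(flags).__name__)
```

The selection happens once and the chosen function is reused, which keeps
the per-call cost low but also means every variant has to be built, shipped
and tested, hence the bug reports.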



