[mdlug] Parallella: A Supercomputer For Everyone by Adapteva — Kickstarter

Thu Oct 11 16:18:39 EDT 2012

Adam Tauno Williams wrote:
> On Wed, 2012-10-10 at 17:48 -0400, Aaron Kulkis wrote:
>> Jonathan Billings wrote:
>>> On Wed, Oct 10, 2012 at 12:34:16PM -0700, Art Dries wrote:
>>>> I had a bioinformatics project a year ago that could have used this.
>>>> (genetic analysis scales very well)
>>>> Anyone into high-end crypto could use a pair for an end-to-end VPN.
>>>> (isn't math fun?)
>>>> The specs, API, and code will all be OPEN, so the possibilities are endless.
>>>> For $100, this would make an excellent (insert idea here)
>>> True, but I am trying to dispel any notion that this thing will
>>> instantly turn your laptop into a 45GHz system.  The extra cores on
>>> the system are specialized, and you need to write code specifically to
>>> use them.
>> One simple solution:
>> Guest OS running in the ARM-space, running its own set of processes
>> compiled (or cross-compiled) to ARM
>> Or even simpler... do the whole distro compiled for ARM, and then
>> you get rid of the non-uniform hardware problem.
>> If you gave me a 3 GHz quadcore, and gave me the option of trading
>> that for a 48-core multi-CPU ARM machine running at only 1 GHz,
>> yes, sure, single-threaded apps will take longer to complete---MAYBE...
>
> This has already been tried by several vendors, Sun had the T-series which
> was highly parallel.   You don't see these boxes everywhere in large
> part because the theoretical advantage never really paid off.  On
> something like a database server where you have lots of threads and/or
> workers these boxes were supposed to be awesome.  They weren't.
>
> Software really does have to be tweaked to get the bang-for-the-buck
> from this type of setup.   And you are still going to run up against
> other bottlenecks - primarily I/O and network.  Now you just have 100
> concurrent processes making demands of those subsystems rather than 10.
> End result is that it just doesn't go that much faster.
>

The problem is that these were typically sold as "this is your
database server"...and if the database code isn't tweaked to
the nth degree to utilize all of the cores, then yes, it's not
very cost efficient, because a lot of the cores are sitting
around idle.

On the other hand, with a raft of processes, all of which are
completely independent of each other, without any synchronization
requirements causing processes to go to sleep until a sibling
process catches up, these problems go away.
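
One rough way to put numbers on the difference between these two
cases is Amdahl's law.  Nobody in this thread has cited it by name,
so treat it as an illustrative model, not a measurement.  The sketch
below assumes a workload where a fraction p of the work spreads
perfectly over n cores and the rest runs serially; the function name
and the sample values (4 and 48 cores, p = 0.5, 0.9, 1.0) are made up
for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of the work that can run on
    all cores at once; the remaining (1 - p) runs on a single core.
    This ignores I/O, memory bandwidth, and scheduling overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: a half-serial database, a well-tuned
    # one, and a raft of fully independent processes (p = 1.0).
    for p in (0.5, 0.9, 1.0):
        for n in (4, 48):
            print(f"p={p:.2f} cores={n:2d} speedup={amdahl_speedup(p, n):.2f}")
```

Under this model a workload that is half serial cannot even double
its speed no matter how many cores it gets, while fully independent
processes (p = 1.0) scale with the core count.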

Using massively multi-core computers as general compute
servers works very well.  I've seen this with Sequent
(bought out by IBM about 10-12 years ago for their
NUMA - Non-Uniform Memory Architecture) with their machines
made of large arrays (32 ~ 128) of off-the-shelf Intel
x86 CPUs going all the way back to the 80386 days.

They were very cost effective, compared to other makers'
super-minis.  Load averages, even with dozens of students
logged in, mostly running X11 graphics terminals, rarely
rose to 2.

In contrast, at the same time, I was also doing school
work on another machine... a Gould NP-1 (when Gould
Electronics had a High Performance Unix Division), and
it had a few CPUs (4) using emitter-coupled logic, and
running at higher speed...but at the expense of much
much higher power consumption.  ECL is fast, but every
gate is ALWAYS "on"... at that time, 0 was +2.5 V and
1 was -2.5 V... and all gates put out both inverted and
non-inverted outputs (so every gate is an AND/NAND or
OR/NOR gate)... the designer decides if he wants inverted
or non-inverted output from a gate, and connects to that
as appropriate.

>> Depending on machine load, that single-threaded app might STILL
>> get more clock-cycles (since CPU contention goes way down due to
>> the plethora of CPUs available),
>
> Nah.  CPUs like the i-family are very smart and internally concurrent.
> Maybe if you have a cluster of really good CPUs, but a shoebox of ARMs
> is going to get its butt kicked.
>
>> and therefore complete faster
>> than on a low-core, high clock-rate CPU.
>> I have a 2006-era dual core... it's constantly getting bogged down
>> whenever I want to do a couple CPU intensive things simultaneously.
>
> A 2006 vintage machine [assuming it wasn't state-of-the-art at the time]
> is going to hit lots of bottlenecks.  Most notably the front-side-bus
> and the speed of the RAM.
>
>> I don't need parallelized software...what I need are parallel cores.
>
> An i7 can give you eight, and that is a lot.  A dual i7 can give you 16.
>

Can they do it at 5W of power consumption?

> My laptop has an i7-2670QM CPU and I peg three cores @ 100% and it
> remains very responsive.  If I swamp the hard-drive it turns into a
> sled.
>

And it's probably sucking down 150~200 W during those moments.
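
To make the power argument concrete, here is a back-of-the-envelope
sketch.  It uses only the figures thrown around in this thread (5 W
for the small board, my 150~200 W guess for the laptop under load);
none of these are measured values, and the function names are made up
for illustration.  It also ignores idle power, disks, and displays.

```python
def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy used by a machine drawing power_watts for seconds."""
    return power_watts * seconds


def break_even_slowdown(fast_watts: float, slow_watts: float) -> float:
    """How many times longer the low-power machine may take on a job
    before it has used as much energy as the high-power machine."""
    if fast_watts <= 0 or slow_watts <= 0:
        raise ValueError("power figures must be positive")
    return fast_watts / slow_watts


if __name__ == "__main__":
    # Guessed laptop draw under load versus the 5 W figure above.
    for laptop_watts in (150.0, 200.0):
        factor = break_even_slowdown(laptop_watts, 5.0)
        print(f"{laptop_watts:.0f} W vs 5 W: break-even slowdown {factor:.0f}x")
```

In other words, if those guesses are anywhere near right, the 5 W
machine could take 30 to 40 times as long on a job and still come out
even on energy.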