Clock Speeds, Power Efficiency and why the Eurogamer rumour makes perfect sense. (Long Analysis)

Hello folks, a few disclaimers first!

I am an analyst by trade, though I do not work in the technology industry. Like many of you, I am just an eager Nintendo fan who cannot wait for this console. I have an analytical mindset, however, and cannot help but break things down, so I thought I would share this with you.

Firstly, my assumptions are based on my own prediction that the Nintendo Switch, though it may be using the Maxwell Tegra X1 architecture, will be manufactured on a 16nm FinFET+ or even a 14nm process node, like the GP107 chips Nvidia is making for the recent GTX 1050 series of cards. This makes sense to me, as all Nvidia manufacturing is on these two nodes currently and the efficiency benefits of these nodes versus 20nm or 28nm are enough for Nintendo to take notice. Remember, Nintendo likes low power draw consoles!

So, I need a base specification to go off to do my comparisons, so we will use a theoretical 16nm Tegra X1 device. Let’s say... a Shield Android TV with a Tegra Parker chip! We know the Tegra X1 in the current Shield TV is pretty much running all out. Though Nvidia says it’s a 10 Watt chip, that’s a lie; tests indicate it actually draws closer to 20 Watts under load, so this will be our baseline, only on a loaded Tegra Parker chip. So what do we have?

256 CUDA cores @ 1.5 GHz – 750 GFLOPS FP32
8 ARM cores in big.LITTLE, variable clock speed
20 Watt power draw

So with our baseline set, let’s take a look at what happens when we underclock a CPU and GPU.

If the rumour is accurate, the Switch will be using four ARM Cortex A57 cores clocked at 1020 MHz. I wonder why they chose to clock them so low, and how that affects power draw? Well, I have an article that shows how much that underclocking affects power draw. (The chip can run at 1.9 GHz.)

As you can see from the attached image, the relationship between CPU clock speed and power draw is not a linear one: on a four core chip, reducing the clock speed by half results in 1/4 of the power draw. Interesting, and now we know how much power the Switch CPU will consume at peak load: 1.83 Watts. It is also interesting to note that the gap between different core counts is pretty consistent, indicating a linear relationship between core count and power draw at a given clock speed.
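For anyone who wants to play with the numbers, here is a minimal Python sketch of the two rules of thumb read off that graph: power scaling with the square of the clock, and linearly with core count. The function names and the 8 W / 2000 MHz example part are hypothetical and only there for illustration; this is a rough model, not a measurement.

```python
def scale_power_by_clock(base_power_w: float, base_clock_mhz: float,
                         new_clock_mhz: float) -> float:
    """Estimate power at a new clock, ASSUMING power scales with clock squared."""
    ratio = new_clock_mhz / base_clock_mhz
    return base_power_w * ratio ** 2


def scale_power_by_cores(base_power_w: float, base_cores: int,
                         new_cores: int) -> float:
    """Estimate power for a different core count, ASSUMING linear scaling."""
    return base_power_w * new_cores / base_cores


# Hypothetical part: 8 W at 2000 MHz, then run at half the clock.
half_clock_w = scale_power_by_clock(8.0, 2000, 1000)
print(half_clock_w)  # 2.0, i.e. a quarter of the original draw

# Same hypothetical part with half the cores at that clock.
print(scale_power_by_cores(half_clock_w, 4, 2))  # 1.0
```

Halving the clock gives a quarter of the power, and halving the cores at a fixed clock gives half, which is all the rest of this post relies on.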

But does the same apply to GPUs?

I found this blog regarding mobile GPUs, written specifically from a developer’s point of view.

This section in particular is interesting.

"Power consumption increases roughly quadratically with clock frequency – for example, raising the frequency from 600MHz to 700MHz on Exynos 5433 increases power consumption by 42%."

Along with

"GPU workloads are embarrassingly parallel, so it is easy to double the performance of the hardware if you are allowed to increase the power and chip area — just place two identical GPUs in the package! In the same way, you can get much improved power efficiency by using two GPUs and running them with halved frequency."

So it does indeed seem that the relationship between clock speed and efficiency scales in the same way for GPUs as it does for CPUs.
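As a quick sanity check on those two quotes, here is a small Python sketch. It assumes a pure square law between clock and power, which is a simplification of "roughly quadratically", and the function names are my own.

```python
def quadratic_power_ratio(old_mhz: float, new_mhz: float) -> float:
    """Power ratio predicted by a pure square law (an assumption)."""
    return (new_mhz / old_mhz) ** 2


def wide_and_slow(gpu_count: int, clock_fraction: float) -> tuple:
    """Return (relative throughput, relative power) versus one GPU at full clock.

    Assumes throughput scales linearly with both GPU count and clock,
    and power scales linearly with GPU count and with clock squared.
    """
    throughput = gpu_count * clock_fraction
    power = gpu_count * clock_fraction ** 2
    return throughput, power


# 600 MHz -> 700 MHz, as in the Exynos 5433 example from the quote.
increase = quadratic_power_ratio(600, 700) - 1
print(f"{increase:.0%}")  # 36%

# Two GPUs at half the clock, as in the second quote.
print(wide_and_slow(2, 0.5))  # (1.0, 0.5)
```

A pure square law predicts about 36% for the Exynos example, a little under the quoted 42%, so the real curve is if anything slightly steeper. The second call shows the "two GPUs at half frequency" trick: the same throughput for half the power under these assumptions.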

But we can't just cram a GTX 1080 sized GPU into a tablet form factor; that would be far too big, right?

Well, let’s have a look at how big a Tegra X1 SoC is.

The answer is not much bigger than a fingernail. That contains the GPU and the 8 CPU cores. Now, we cannot use a linear approach to increasing the GPU cores in this chip because we would end up with a chip with 24 CPU cores or something ridiculous. ARM cores are, however, incredibly small, less than 1mm in size, so forgive me for making a few wild assumptions.

So if we keep 4 ARM Cortex A57 cores and increase the CUDA core count to 768, we increase the chip size by a factor of about 2.5. Can this still fit on a tablet sized PCB? Looking at the size of the Tegra chip, yes it can; worst case scenario, let’s go for 640 cores at that size. So now for some power draw estimates based on the earlier example!

Now let’s cram 768 CUDA cores into our fantasy Parker based Shield TV. We know from the graph that the relationship between core count and power draw is linear. So with three times the cores, let’s say our Android powered monster is drawing 3 × 20 = 60 Watts, minus the 4 Watts from not having an additional 8 ARM cores added.

768 CUDA cores, 1.5 GHz, 56 Watts power draw. In that confined space our set top box has now caught fire and the house has burned down, thanks Nvidia. But this thing is now a 2.3 TFLOPS monster that you can play Angry Birds on! Ah crap, we don't need that, so let’s cut the clock speed in half. Based on the earlier analysis, our power draw should drop to 1/4 of what it was originally.

56/4 = 14 Watts peak power draw, at 1.15 TFLOPS. Holy crap! But this is too much for a handheld. Oh wait, the clock speed in portable mode is supposed to be 40% of the docked clock speed.

So let’s refer back to our earlier graph.

We can see there are some diminishing returns below 1 GHz, where the relationship is almost linear, so... let’s chop the power draw down to 40% when the clock drops to 40% of its maximum.

14 * 0.4 = 5.6W

Now, we know the power draw of the ARM cores is not going to scale with the GPU clock; it will stay fixed at 1.83W. But even if we add that back in, we get 5.6 + 1.83 = 7.43W for the entire SoC.
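The whole chain of estimates can be written out as a short Python sketch. Every constant is one of the guesses made in this post (the 20 W baseline, the 4 W deduction, the 1.83 W CPU figure, the quarter-power and 40% rules), and the function name is purely illustrative, so treat it as a back-of-envelope calculator rather than a real power model.

```python
BASELINE_POWER_W = 20.0      # loaded 256-core baseline box, as assumed in the post
BASELINE_CUDA_CORES = 256
REMOVED_CPU_POWER_W = 4.0    # deduction for the extra ARM cores we did not add
SWITCH_CPU_POWER_W = 1.83    # four A57 cores at 1020 MHz, read off the CPU graph
PORTABLE_FRACTION = 0.4      # portable GPU clock as a fraction of docked


def estimate_soc_power(cuda_cores: int) -> tuple:
    """Return (full clock W, docked W, portable W) under the post's assumptions."""
    scale = cuda_cores / BASELINE_CUDA_CORES              # linear in core count
    full_clock_w = BASELINE_POWER_W * scale - REMOVED_CPU_POWER_W
    docked_w = full_clock_w / 4                           # half clock -> quarter power
    # Below roughly 1 GHz the post treats GPU power as linear in clock,
    # then adds the fixed CPU draw back in.
    portable_w = docked_w * PORTABLE_FRACTION + SWITCH_CPU_POWER_W
    return full_clock_w, docked_w, portable_w


full_w, docked_w, portable_w = estimate_soc_power(768)
print(f"{full_w:.1f} W full clock, {docked_w:.1f} W docked, {portable_w:.2f} W portable")
```

For 768 cores this reproduces the 56 W, 14 W and 7.43 W figures above, and the same function can be called with any other core count.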

This would give us:

1.18 TFLOPS docked, 472 GFLOPS portable.

If we were to assume 640 CUDA cores instead, we need to multiply our magic Parker Shield TV power draw by 2.5 and deduct the 4 Watts for the non-existent ARM cores, so:

(20W * 2.5) - 4W = 46W

46/4 = 11.5W SoC docked

(11.5 * 0.4) + 1.83 = 6.4W portable

983 GFLOPS docked, 393 GFLOPS portable.
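The GFLOPS figures come from the usual cores × 2 × clock rule, which can be sketched in Python as below. Note the assumption: the clocks used here (768 MHz docked, 307.2 MHz portable) are the GPU clocks from the Eurogamer report, which are not spelled out elsewhere in this post, and the 2 operations per core per cycle counts one fused multiply-add as two FP32 operations.

```python
FLOPS_PER_CORE_PER_CYCLE = 2   # one fused multiply-add counted as two FP32 ops

# ASSUMPTION: GPU clocks taken from the Eurogamer report.
DOCKED_CLOCK_MHZ = 768.0
PORTABLE_CLOCK_MHZ = 307.2


def fp32_gflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS."""
    return cuda_cores * FLOPS_PER_CORE_PER_CYCLE * clock_mhz / 1000.0


for cores in (768, 640):
    docked = fp32_gflops(cores, DOCKED_CLOCK_MHZ)
    portable = fp32_gflops(cores, PORTABLE_CLOCK_MHZ)
    print(f"{cores} cores: {docked:.0f} GFLOPS docked, {portable:.0f} GFLOPS portable")
```

These are theoretical peaks only. The docked clock of 768 MHz is slightly above half of 1.5 GHz, which is why the 768 core figure comes out at 1.18 TFLOPS rather than the 1.15 TFLOPS from simply halving the clock.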

Punching just a bit below the Xbox One, just like this latest Unreal Engine rumour suggests. Thing is, is this doable in what is effectively a slightly thicker 7 inch tablet?

Heat is a concern, but power consumption and heat generated are directly related, and a larger chip running slower has a greater surface area to dissipate that heat. I estimate that the Switch will have a passive heatsink on the SoC and a couple of small fans located near the vents at the back to push air over the surface of the heatsink, and that this will be enough to cool the unit given its theoretical power consumption. These fans will operate at low RPM in portable mode and then increase their speed when docked.

On to power draw overall. In our worst case scenario we have 7.43W in portable mode for the SoC, which is going to be the largest power draw. A screen typically uses 1 Watt on average, and I honestly have no idea about the rest.

The only thing I could find about power consumption for a tablet is this Pixel C review, which says the thing draws 14W under load but the battery still lasts up to 5 hours.

Now, I know the Pixel is much larger, but it is also thinner. So again, I imagine that a device that is a bit thicker and draws a bit less power could manage a similar battery life.

You may ask why the Xbox One and PS4 consume so much power, generate so much heat and take up much more room. It is simply because x86 CPUs (AMD in particular) draw a lot of power and get very hot. A 1.8 GHz x86 CPU will draw 40W, whereas an ARM Cortex A57 will draw 7.4W at the same clock speed. AMD GPUs also draw a lot of power and get very hot when compared to Nvidia's, and the types of memory used also draw more power. The Xbox and PS4 don't have to care about these constraints as they are plugged into a wall, so they will go cheap and inefficient.

Now finally, cost. You may be saying now that this would cost too much to produce. Well, no. Look at how much a GTX 1050 manufactured on a 14nm process costs the consumer: around $110 – $120. Nvidia are going to be charging a considerable markup for these chips, so the cost to them per card is probably $40-$50. A mobile SoC is around $40 to produce in reality, and increasing the CUDA core count won't drive cost up too much. The volume that Nvidia will be selling to Nintendo, along with the benefit to them in terms of gaining a console win, means they won't be selling to Nintendo at a premium either.

My final prediction is a 640 CUDA core GPU alongside the Eurogamer specs, running just below an Xbox One in raw power.

TLDR: Using the leaked Eurogamer clock speeds, it is perfectly reasonable to get a 640 CUDA core chip into a tablet form factor that produces almost 1 TFLOPS docked and 400 GFLOPS portable, whilst drawing less than 10 Watts for the entire system and being affordable.

Discussion started here by ShaunSwitch
