CPU power management

Maximum power consumption of a processor

Our Intel CPU has a thermal design power (TDP) of 120W. The AMD CPU has a TDP of 200W.

The Intel CPU has 80 cores while the AMD one has 128 cores. For Intel, this gives us 1.5W per core, while for AMD, 1.56W.

The TDP is the average value the processor can sustain forever, and this is the power the cooling solution needs to be designed at for reliability. The TDP is measured under worst case load, with all cores running at 1.8Ghz (the base frequency).

C-State vs. P-State

We have two ways to control the power consumption:

This is done by using

C-State means that one or more subsystem are executing nothing, one or more subsystem of the CPU is at idle, powered down.

P-State the subsystem is actually running, but it does not require full performance, so the voltage and/or frequency it operates is decreased.

The states are numbered starting from 0. The higher the number, the more power is saved. C0 means no power saving. P0 means maximum performance (thus maximum frequency, voltage and power used).

C-state

A timeline of power saving using C states is as follow:

  1. normal operation is at c0
  2. the clock of idle core is stopped (C1)
  3. the local caches (L1/L2) of the core are flushed and the core is powered down (C3)
  4. when all the cores are powered down, the shared cache of the package (L3/LLC) are flushed and the whole package/CPU can be powered down
statedescription
C0operating state
C1a state where the processor is not executing instructions, but can return to an executing state essentially instantaneously
C2a state where the processor maintains all software-visible state, but may take longer to wake up
C3a state where the processor does not need to keep its cache coherent, but maintains other state

Running cpuid we can find all the supported C-states for a processor (Intel(R) Xeon(R) Gold 6122 CPU @ 1.80GHz):

   MONITOR/MWAIT (5):
      smallest monitor-line size (bytes)       = 0x40 (64)
      largest monitor-line size (bytes)        = 0x40 (64)
      enum of Monitor-MWAIT exts supported     = true
      supports intrs as break-event for MWAIT  = true
      number of C0 sub C-states using MWAIT    = 0x0 (0)
      number of C1 sub C-states using MWAIT    = 0x2 (2)
      number of C2 sub C-states using MWAIT    = 0x0 (0)
      number of C3 sub C-states using MWAIT    = 0x2 (2)
      number of C4 sub C-states using MWAIT    = 0x0 (0)
      number of C5 sub C-states using MWAIT    = 0x0 (0)
      number of C6 sub C-states using MWAIT    = 0x0 (0)
      number of C7 sub C-states using MWAIT    = 0x0 (0)

If I interpret this correctly:

P-state

Being in P-states means the CPU core is also in C0, since it has to be powered to execute some code.

P-states allow to change the voltage and frequency of the CPU core to decrease the power consumption.

A P-state refers to different frequency-voltage pairs. The highest operating point is the maximum state which is P0.

statedescription
P0maximum power and frequency
P1less than P0, voltage and frequency scaled
P2less than P1, voltage and frequency scaled

ACPI power state

The ACPI Specification defines the following four global “Gx” states and six sleep “Sx” states

GXnameSxdescription
G0workingS0The computer is running and executing instructions
G1sleepingS1Processor caches are flushed and the CPU stop executing instructions
G1sleepingS2CPU powered off, dirty caches flushed to RAM
G1sleepingS3Suspend to RAM
G1sleepingS4Suspend to disk, all content of the main memory is flushed to non volatile memory
G2soft offS5PSU still supplies power, a full reboot is required
G3mechanical offS6The system is safe for disassembly

When we are in any C-states, we are in G0.

Speed Select Technology

Speed Select Technology is a set of power management controls that allows a system administrator to customize per-core performance. By configuring the performance of specific cores and affinitizing workloads to those cores, higher software performance can be achieved. SST supports multiple types of customization:

Turbo Boost

TDP is the maximum power consumption the CPU can sustain. When the power consumption is low (e.g. many cores are in P1+ states), the CPU frequency can be increased beyond base frequency to take advantage of the headroom, since this condition does not increase the power consumption beyond TDP.

Modern CPUs are heavily reliant on “Turbo(Intel)” or “boost (AMD)” (TBT and TBTM).

In our case, the Intel 6122 is rated at 1.8GHz, A.K.A “stamp speed”. If we want to run the CPU at a consistent frequency, we’d have to choose 1.8GHz or below, and we’d lose significant performance if we were to disable turbo/boost.

Turbo boost max

During the manufacturing process, Intel is able to test each die and determine which cores possess the best overclocking capabilities. That information is then stored in the CPU in order from best to worst.

January 22, 2023