Friday, February 20, 2015

A look at Raspberry Pi 2 performance and overclocking

Raspberry Pi 2 significantly improves on original model


The Raspberry Pi 2 significantly increases performance when compared to the original Raspberry Pi. It corrects deficiencies in the design of the original SoC by integrating four more modern and faster Cortex-A7 ARMv7 CPU cores in a quad-core configuration, as opposed to the single ARM11 core in the original SoC, all within the constraints of a similar 40 nm manufacturing process. Whereas the CPU inside the BCM2835 processor of the original Raspberry Pi effectively ran without an L2 cache (which was tied to the GPU), the new Broadcom BCM2836 SoC contains a dedicated 512 KB CPU cache, improving memory performance and overall performance. The amount of RAM has also doubled to 1 GB. Other changes include more USB ports and a MicroSD card slot for storage instead of SD.

Compatibility with Raspbian


Otherwise, the new SoC as well as the device itself has been engineered to maintain hardware and software compatibility with the original Raspberry Pi, while running considerably faster. When using the Raspbian OS, an ARM11-compatible Debian-based distribution using armhf specifically maintained for the Raspberry Pi, only the kernel is specific to the Raspberry Pi 2, with the entire userland being 100% compatible. Although this misses out on some of the advantages of the newer ARMv7 instruction set (such as the reduced code size of Thumb2 instructions, which are used in ARMv7 Debian), applications that can take advantage of, for example, NEON SIMD instructions usually do so on a run-time detection basis (as they do in ARMv7 Debian), so that the most critical gains from the new instruction set can in theory still be obtained in Raspbian.

Nevertheless, the new device can run an OS specifically configured for ARMv7, such as Debian armhf and derived distributions such as Ubuntu, which take advantage of the reduced-size Thumb2 instruction set. An example of such a distribution that has been applied to the Raspberry Pi 2 is Ubuntu Snappy Core.

Components of Raspberry Pi 2 SoC clocked conservatively out of the box


The maximum CPU clock of the Cortex-A7 cores in the Raspberry Pi 2 is 900 MHz, while the L2 cache appears to be clocked at only 250 MHz by default, inheriting the clock rate of the original Pi's GPU cache. SDRAM is clocked at 450 MHz by default. The GPU is clocked at 250 MHz, similar to the original Raspberry Pi.

The configured speed of the L2 cache is particularly low, as we will see, since speeds up to 600 MHz seem to be stable when overclocking, resulting in a large performance increase. The CPU clock speed can also be bumped up somewhat.

The raspi-config utility in Raspbian at the time of writing contains just one overclocking option for the Raspberry Pi 2, which clocks the CPU at 1000 MHz, doubles L2 cache speed to 500 MHz and clocks SDRAM also at 500 MHz. Unfortunately, this setting turned out to be unstable on my device. This appears to be due to the SDRAM clock speed being set too high and causing problems. Bumping the SDRAM speed down to 483 MHz results in a stable system.

Overclocking test set-up


I have performed a number of overclocking tests with different clock configurations. The test set-up was as follows.

To prevent corruption of the root file system, I modified /etc/fstab to mount the root filesystem read-only at boot by adding "ro" to the mount flags. To remount with read-write capability when necessary after boot (on a stable system), I ran "sudo mount -o remount,rw /dev/mmcblk0p2 /".

The main stability test was performed using the single-threaded memtester package (available in Raspbian and Debian) using the command line "memtester 16M 10" (16 MB memory region, 10 loops). In several cases four of these commands were run in parallel to fully occupy the CPU and provide reliable stability information. In unstable configurations, this test almost always shows errors.
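The parallel run described above can be sketched as a small shell function. This is only an illustration, not the script actually used for the tests: the helper name run_parallel is hypothetical, and the memtester arguments in the usage comment are the ones quoted above.

```shell
#!/bin/sh
# run_parallel N CMD [ARGS...]   (hypothetical helper name)
# Start N copies of CMD in the background, wait for all of them,
# and return non-zero if any copy exited with an error.
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        # wait returns the exit status of that background job
        wait "$pid" || status=1
    done
    return "$status"
}

# Example: one memtester instance per core on the quad-core Pi 2.
# run_parallel 4 memtester 16M 10 && echo "no errors" || echo "errors seen"
```

The output of the instances interleaves on the terminal, so in practice it may be easier to redirect each run to its own log file.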

Memory performance was tested using a slightly modified version of the fastarm package (https://www.github.com/hglm/fastarm) with the command line "for x in 0 1 2 3 4 5 6 7 8 9; do ./benchmark --duration 1 --repeat 1 --memcpy e --test 0; done". Because of result variation due to cache allocation effects, I took the best result out of ten. Tests number 0 (memcpy of varying size, aligned, depends on CPU as well as memory) and 43 (4K page-aligned memcpy, a more pure memory subsystem test) were used.
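Taking the best of ten runs can be sketched as follows, assuming the score of each run has already been reduced to one number per line and that higher numbers are better (as the table below suggests). The helper name best_of is hypothetical, and the step that extracts the number from the benchmark output is deliberately left out because that output format is not shown here.

```shell
#!/bin/sh
# best_of: read one number per line on stdin and print the largest.
# (Hypothetical helper; assumes higher scores are better.)
best_of() {
    sort -n | tail -n 1
}

# Example with three made-up scores:
# printf '716\n698\n709\n' | best_of
```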

For a real-world CPU performance indication I used the command line "time zcat bullet3-Bullet-2.83-alpha.tar.gz >/dev/null" performed multiple times, which is effectively gzip decompression of a large file out of buffer cache memory.

Table with stability testing results


The following table shows stability testing results for a large number of CPU clock, core clock (L2 cache clock), and SDRAM clock configurations. Also included are some benchmark scores, including memory performance and CPU performance.

CPU     +Volt   Core    SDRAM   +Volt   Stability       Memcpy perf.
                                p i c   (memtester)     Varied  4K      zcat

Default:
900     ?       250     450     0 0 0   OK (slow)       716     1015    2.388s
Standard overclock (raspi-config "Pi 2" option):
1000    2       500     500     0 0 0   Fail
Other settings:
900     0       450     450     0 0 0   OK              778     1270    2.380s
900     0       600     467     0 0 0   Almost          804     1431    2.379s
900     2       600     467     0 0 0   OK (multi-test)
1000    0       467     467     0 0 0   OK (multi-test) 867     1410    2.146s
1000    0       500     483     0 0 0   OK (multi-test) 880     1502    2.146s
1000    0       500     483     2 0 0   OK (multi-test) 878     1502    2.169s
1000    2       500     500     0 0 0   Almost
1000    4       500     500     0 0 0   Almost
1000    0       500     500     2 2 0   Almost
1000    0       500     500     4 4 0   Almost?
1000    0       500     500     4 0 0   Fail            886     1415    2.143s
1000    2       500     500     4 0 0   Fail
1000    4       500     500     4 4 0   Fail (multi)
1000    0       500     500     6 6 6   ?
1000    2       600     467     0 0 0   OK (multi-test) 885     1518    2.145s
1000    2       600     500     4 0 0   OK (multi-test) 890     1553    2.142s
1000    2       667     500     4 0 0   Fail (freeze)
1000    6       667     500     6 0 0   Fail (freeze)
1050    0       466     466     4 4 4   OK
1050    0       466     533     4 4 4   Fail
1050    0       466     533     6 6 6   Fail (bitspr.)
1050    4       600     450     0 0 0   OK (multi-test) 916     1528    2.045s
1050    4       600     483     2 0 0   OK (multi-test) 924     1571    2.041s
1067    6       533     533     6 6 6   Fail
1067    4       533     533     8 8 0   Fail (bitflip)
1067    6       533     533     8 8 0   Fail (bitflip)
1067    6       533     500     4 4 0   Almost
1067    4       533     466     0 0 0   OK (multi test) 925     1521    2.010s
1100    0       466     466     0 0 0   Fail (boot)
1100    4       466     466     0 0 0   OK?
1100    4       600     467     0 0 0   Fail
1100    4       500     500     6 6 6   OK?
1100    4       500     500     6 6 0   OK?
1100    4       500     500     4 0 0   Almost
1100    4       500     500     6 0 0   OK?             950     1532    1.950s
1100    6       500     500     6 0 0   Almost
1100    4       533     533     6 0 4   Fail            962     1593    1.948s
1100    4       550     483     0 0 0   OK (multi-test) 944     1549    1.951s
1133    4       567     466     0 0 0   Almost          974     1578    1.893s
1133    4       567     467     4 0 0   Almost
1133    5       567     453     0 0 0   Almost          971     1571    1.896s
1133    8       567     453     0 0 0   Fail
1166    4       466     466     0 0 0   Almost          960     1451    1.841s
1167    4       466     466     2 2 4   Fail
1166    6       466     466     0 0 0   Fail            962     1451    1.841s
1167    8       500     500     4 0 0   Fail                            1.839s
1167    8       500     500     8 8 8   Fail
1200    8       600     450     4 0 0   Fail
The stable configurations show "OK (multi-test)" in the stability column, meaning they were stable during a test with multiple memtester processes running concurrently. Most unstable configurations have an SDRAM clock speed of 500 MHz or higher, or a CPU speed higher than 1100 MHz.

CPU frequency corresponds with the "arm_freq=" setting in /boot/config.txt. The CPU/main SoC voltage is set with the over_voltage setting. The core clock (the L2 cache speed on the Raspberry Pi 2) is set with core_freq. The SDRAM frequency is set with sdram_freq, while voltage settings for the SDRAM physical layer, I/O and controller are set using over_voltage_sdram_p, over_voltage_sdram_i and over_voltage_sdram_c, of which the physical layer voltage seems to be the most relevant to overclocking. An example of the relevant lines in /boot/config.txt for a particular overclocking configuration (1000 MHz CPU, with stable 483 MHz SDRAM, as well as 256 MB memory reserved for GPU) follows.
arm_freq=1000
over_voltage=0
core_freq=500
sdram_freq=483
over_voltage_sdram_p=0
over_voltage_sdram_i=0
over_voltage_sdram_c=0
gpu_mem=256
See the official documentation for more details.
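
After rebooting with new settings, it can be useful to check which clocks are actually in effect. The sketch below assumes the vcgencmd tool shipped with Raspbian is available and that "vcgencmd measure_clock" prints a line of the form "frequency(NN)=HZ"; the helper names to_mhz and show_clocks are hypothetical. Because of dynamic clocking, the values reported on an idle system may be lower than the configured maximum.

```shell
#!/bin/sh
# to_mhz LINE: convert a "frequency(NN)=HZ" line, as printed by
# `vcgencmd measure_clock`, to whole MHz (truncated).
to_mhz() {
    hz=${1#*=}
    echo $((hz / 1000000))
}

# show_clocks: print the current ARM and core clocks.
# Only works on a Raspberry Pi with vcgencmd installed.
show_clocks() {
    for clk in arm core; do
        echo "$clk: $(to_mhz "$(vcgencmd measure_clock "$clk")") MHz"
    done
}

# Example (on the Pi): show_clocks
```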

Observations based on stability testing


The following is apparent from testing my device:
  • The core_freq setting seems to be directly correlated with the L2 CPU cache in the new SoC, which has a large effect on performance. Depending on other frequencies, core_freq frequencies up to 600 MHz seem to be stable, giving a significant performance boost over the default configuration of 250 MHz.
  • When increasing CPU speed beyond roughly 1000 MHz, the CPU core voltage has to be bumped up.
  • Increasing SDRAM speed beyond about 483 MHz seems to cause instability on my device. Bumping up the SDRAM voltage (in particular the physical layer voltage, but not the I/O voltage or SDRAM controller voltage) may help stability a little. However, SDRAM speeds of 500 MHz and higher tend to cause stability problems regardless of voltages on my device.
  • Certain divisor relationships between CPU clock and core (L2 cache) clock (such as 2:1) seem to enhance stability and performance.

CPU overclocking conclusions


  • The default Raspberry Pi 2 core_freq (L2 CPU cache) setting of 250 MHz appears to be extremely conservative. At the default CPU frequency of 900 MHz, 450 MHz (which has a nice divisor of two) appears to be very stable and even 600 MHz can be stable.
  • Unfortunately, the standard Raspberry Pi 2 overclocking setting available in raspi-config at the time of writing (1000 MHz CPU, 500 MHz core clock, 500 MHz SDRAM) appears to be unstable on my device due to an SDRAM clock speed that is slightly too high. Instead of bumping the CPU voltage as performed by this setting, increasing the SDRAM voltage (primarily the physical layer voltage) may improve stability, but clocking the SDRAM slightly lower at 483 or 467 MHz seems to be the best solution.
  • It seems likely that certain SDRAM parameters (CAS latency, etc.) are set to fixed values by the kernel and that higher SDRAM speeds will be possible when these parameters are configurable or appropriately adjusted by the kernel for higher SDRAM clock speeds. However, the actual RAM chip used is an Elpida/Micron EDB8132B4PB-8D-F LPDDR2-800 chip specified for 400 MHz clock frequency, so the overclocking headroom may not be that high.

Table with stable high-performance clock configurations


The following table shows stable high-performance clock configurations tested on my device and their clock frequency ratios:
CPU     Over-   Core    Base
clock   volt    clock   Clock   CPU : Core      SDRAM   Overv.

1067    +4      533     533     2 : 1           467
1050    +4      600     150     7 : 4           483     +2
1000    +2      600     100     5 : 3           500     +4
1000            500     500     2 : 1           483     +2
 900    +2      600     133     3 : 2           467
 900            450     450     2 : 1           450
However, I may have to retest the configuration with an SDRAM frequency of 500 MHz because other configurations show such a setting to be unstable after extensive testing. Additionally, the 1100 MHz CPU frequency setting turned out not to be completely stable.
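
The "CPU : Core" column in the table above can be reproduced by dividing both frequencies by their greatest common divisor. The following is a minimal sketch of that arithmetic; the function names gcd and clock_ratio are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# gcd A B: greatest common divisor using Euclid's algorithm.
gcd() {
    a=$1
    b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# clock_ratio CPU_MHZ CORE_MHZ: print the reduced "CPU : Core" ratio.
clock_ratio() {
    g=$(gcd "$1" "$2")
    echo "$(( $1 / g )) : $(( $2 / g ))"
}

clock_ratio 1050 600   # prints "7 : 4"
clock_ratio 1000 600   # prints "5 : 3"
clock_ratio 900 450    # prints "2 : 1"
```

Note that 1067 and 533 share no common divisor, so the 2 : 1 listed for that configuration is the nearest simple ratio to the rounded MHz values rather than an exact one.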

Overclocking the GPU


By default, the Raspberry Pi as well as the Raspberry Pi 2 will use dynamic clocking, whereby the CPU speed, "core_freq" speed and SDRAM frequency are dynamically adjusted based on CPU load. Any GPU frequency settings, as governed by the "v3d_freq", "h264_freq" and "isp_freq" settings in config.txt, are ignored by default.

Using "force_turbo=1" allows overclocking of the GPU using the "v3d_freq", "h264_freq" and "isp_freq" options. "v3d_freq" corresponds to the frequency of the 3D block (the most relevant for overclocking), while "h264_freq" is the H.264 video block and "isp_freq" governs the camera interface. However, "force_turbo=1" also disables dynamic clocking, locking the CPU, core and SDRAM speeds to fixed maximum values, which is highly undesirable. Also note that using "force_turbo=1" may void the warranty of the device.

There is another setting, "avoid_pwm_pll=1", that allows "core_freq" to be set independently from that of the GPU on the original Raspberry Pi, at the cost of slightly reducing analog audio output quality. However, "force_turbo=1" is still required to be able to modify the GPU clock frequencies.

Because the Raspberry Pi 2 has an independent GPU with its own L2 cache separate from the L2 cache of the CPU, some of these limitations may have become unnecessary (in particular the requirement that the CPU is locked at a high speed with "force_turbo=1" in order to be able to overclock the GPU), and if that is the case these restrictions will hopefully be removed in the future.

When running 3D benchmarks, the following CPU and SDRAM settings were used (note that when using "force_turbo=1" to overclock the GPU, these frequencies are locked and do not scale down when the CPU is idle):
arm_freq=900
over_voltage=0
core_freq=450
sdram_freq=483
When running 3D GPU benchmarks without overclocking the GPU (force_turbo=0), the CPU and L2 cache frequencies appear to be scaled down quickly because the CPU load is relatively low. This creates a CPU bottleneck that reduces the throughput of the 3D benchmarks: the frame rate peaks initially and then drops to a lower base. To avoid this, we modify the sampling_down_factor of the ondemand cpufreq governor from 50 to 1000:
sudo sh -c "echo 1000 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor"
The following settings overclock the 3D block (V3D) of the GPU from 250 MHz to 300 MHz:
force_turbo=1
avoid_pwm_pll=1
v3d_freq=300
These are the results of benchmark testing with different V3D clock speeds:
v3d_freq        demo1   demo1   demo2   demo2   demo2   demo5   demo9   game
                        lights          lights  shadows
default          81.1    20.5    26.1     8.87    0.98   50.5    46.4   112
300              95.3            28.4     9.88    1.12   56.7    49.3   130
350             109*     27.4    29.9    10.9     1.24   62.3    51.6   148
400             120*     30.6    31.4    11.7     1.35   40-52*  53.5   108*
450              80*     33.7    20.2*   12.3     1.45   40-56*  55.0   111*
Although the clock frequency of either the CPU or the 3D block seemed to be scaled down in some cases at higher V3D speeds (presumably due to temperature measurements or voltage readings resulting in throttling), there were never any signs of stability issues when overclocking the GPU, up to the maximum tested speed of 450 MHz.

Regular dynamic downclocking of the CPU can occur due to USB power supply/cable issues


Initially, downclocking by the Raspberry Pi 2 kernel's under-voltage monitor seemed to be triggered a lot more frequently than on the original Raspberry Pi. This results in a rainbow-colored icon being displayed in the top-right corner of the screen. This even happens briefly during boot. On such occasions, presumably the CPU and other components are downclocked in order to ensure stability.

The rainbow-colored square suggests a power supply issue since it indicates a voltage that is too low. As it turns out, replacing the USB power cable I was using with a shorter one that is better insulated eliminates the under-voltage warnings, with the same 5V/2A power supply.
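
Besides watching for the on-screen icon, under-voltage events can also be checked from the command line. The sketch below rests on an assumption that goes beyond this article: that the installed firmware supports "vcgencmd get_throttled" (added in firmware releases newer than the one tested here), which prints a value such as "throttled=0x50005". The helper name decode_throttled is hypothetical, and only the six most relevant flag bits are decoded.

```shell
#!/bin/sh
# decode_throttled HEX: print the flags set in a value reported by
# `vcgencmd get_throttled` (assumed firmware feature, see text).
# Bits 0-2 describe the current state, bits 16-18 events since boot.
decode_throttled() {
    v=$(( $1 ))
    if [ "$v" -eq 0 ]; then
        echo "no flags set"
        return 0
    fi
    if [ $((v & 0x1)) -ne 0 ]; then echo "under-voltage now"; fi
    if [ $((v & 0x2)) -ne 0 ]; then echo "ARM frequency capped now"; fi
    if [ $((v & 0x4)) -ne 0 ]; then echo "throttled now"; fi
    if [ $((v & 0x10000)) -ne 0 ]; then echo "under-voltage since boot"; fi
    if [ $((v & 0x20000)) -ne 0 ]; then echo "ARM frequency capped since boot"; fi
    if [ $((v & 0x40000)) -ne 0 ]; then echo "throttled since boot"; fi
    return 0
}

# Example (on the Pi):
# decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"
```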

Updated 1 March 2015 (update explanation for CPU speed throttling).
Updated 25 March 2015 (update with USB power cable findings).

19 comments:

  1. Most useful overclocking guide for raspberry pi 2 i have found. Many thanks!

  2. Thanks for this great article. Trying my luck with 1067/533/467 now. 1100/550/483 doesn't seem to work for me, and anything with sdram 500 also seems to make it unstable.

  3. I'm also getting the small rainbow square on boot-up... But I'm not sure if this might be a result of the on board regulator having too slow a slew rate and under volting the board for a very short time when it's first turned on. I can't say that I see the rainbow square any other time. I've tried various power supplies some of them even rated at 3A and it still shows the voltage drop warning rainbow in the top right of the screen at first start-up.

  4. I use retropie to stress test overclock settings. Most emulators typically use just one cpu... but this one cpu really gets punished.
    Running rock solid at 1050/525/500 with 5 cpu over volt and a 19mm high aluminum heat sink on the CPU.

  5. You report the default SDRAM clock speed of 400 MHz. In my RPi 2, I observed it is 450 MHz (by running 'vcgencmd get_config int' with no overclocking-related setting in boot.config.txt).

    Replies
    1. Thanks for the correction, I have updated the article.

  6. First of all thank you for the effort made - definitely a lot time was necessary to test, gather and present the results this way.
    Since I actually would want to use one of the medium overclocking settings you have tested as stable, I wonder which of the following is actually correct.
    In paragraph "Table with stability testing results" in line 5 you mention "1000, 0, 500, 483, 0, 0, 0, OK (multi-test)" indicating no overvolting was used but a stable result was achieved. In paragraph "Table with stable high-performance clock configurations" in line 4 you mention a setting with the same frequencies "1000, 500, 500, 2 : 1, 483, +2" indicating SDRAM needed overvolting. Please let us know - which one is correct? Or am I getting something wrong?

    Replies
    1. Line 6 - forgive me for not reading on.

    2. After some dark places quakeworld team fortress multiplayer testing, cpu testing and some stability testing I came up with the following:

      Raspberry pi 2 b +

      arm_freq=1050
      over_voltage=4
      core_freq=525
      gpu_freq=350
      sdram_freq=480
      over_voltage_sdram_p=2
      over_voltage_sdram_i=2
      over_voltage_sdram_c=2
      gpu_mem=256


      *note* It took me trying out 3 USB cables to get one that would power the pi, Shorter and thicker are better. I ended up re-using a motorola tablet cable. I ordered a 1.5ft cable and the cheap thing wouldnt even boot the pi and the next cable kept making the multi colored box appear in the right corner of the screen and the red status light blink indicating low voltage.

      Picked core freq based on clock , 1050/2 = 525. picked gpu freq because 525 * (2/3) = 350. Picked ram of 480 because most people said the ram stability is best between 450 and 480 unless cas latency is adjusted and that cant be adjusted outside the kernel. Which brings me to my next point, the auto kernel update by hexxeh seemed faster and more stable.

      I also added a heat sink to the cpu and gpu. Wasn't able to put one on the ram because of the case, however the case has vent holes for the ram.

      Ran a couple of mvdsv quakeworld servers on stock settings for 130+ days of solid uptime. I normally run a headless unit with gpu mem set to 64 meg. Since my normal computer psu went out I decided to get the pi up and running the weekly quake match. The fps went from 25 to 60 fps to 60 to 110 fps in the dp quake client (I suspect that more optimized games like quake 3 would do much better with higher quality graphics)

      The only other thing i was wondering about was if it were possible to bring ram speed to 525, would it help with stability and performance??? Also does cas latency play into the equation, if i remember right cas latency of 1 - 2 was high end for ddr 2 and edo ram I remember seeing cas 4 - 12 for ddr3 memory and remember it having an impact for overclocking and stability.


      And so we'll see how the new settings hold up for the next 180 days. Cheers!!


  7. This was very informative. I learnt the basics of Raspberry Pi at http://au.rs-online.com/web/generalDisplay.html?id=infozone&file=expert-reviews/expert-reviews-raspberry

  8. Hi.
    Great work !!!
    really appreciated!!

    One question: how do I get "memtester" to run on OSMC environment?

    Thank a lot in advance,

    cheers

  9. An official Pi Foundation fix for RAM instability beyond ~480MHz is 'in the works'. Although it's not ready for general public release just yet, I've had the privilege of playing with it on a Pi2B that was never completely stable at much beyond 475MHz with any amount of manual tweaking. It's now completely stable at 500MHz with no manual tweaks, or 550MHz with minimal manual tweaks. Another tester has complete stability at 600MHz with minimal tweaks. The really good news is that it looks as though everyone should be stable at 500MHz with this fix. I'll try to remember to post back here when the fix goes public. :)

  10. The official Pi Foundation fix for RAM stability problems has been pushed to 'rpi-update' for those brave enough to try it. Bear in mind that you will be running an experimental kernel that may lead to other breakages, so it's highly advisable to make a full backup before going down this route.

    This has taken a small group of test Pi2B from a stable RAM clock limit of around 480MHz up to complete stability at 600MHz. Although 600MHz is not guaranteed, running 'rpi-update' followed by a reboot then adding the following to 'config.txt' will most likely get you there if your hardware is capable of it...

    sdram_freq=600
    sdram_schmoo=0x02000020
    over_voltage_sdram_p=6
    over_voltage_sdram_i=4
    over_voltage_sdram_c=4

    If this proves unstable, either drop the RAM frequency a little or raise all three voltages by 1 or 2. As always, YMMV. Hopefully, this should find its way into the official stable Raspbian kernel at some stage in the near future. Have fun! :-)

  11. Awesome! Thanks for the update, completely stable for me now. Just when everything gets tweaked out and I'm sitting happy with my pi2 ready to actually now *use* the darn thing, they go launch the pi3...

    Replies
    1. This comment has been removed by the author.

    2. You're welcome. :-)

      I won't be rushing out to buy a Pi3 just yet as some of the first batch of 300,000 are being reliably reported as reaching around 100°C just stress testing the CPUs at standard clock speeds with no GPU load. Let the early adopters be your guinea pigs. ;-)

      The official Foundation advice is that "you may need a heatsink" on a Pi3 under some circumstances. After seeing the results from a precision Flirc camera of just the CPUs being stressed, I sure as hell wouldn't run a Pi3 without one...

      https://imgur.com/gallery/tzgPU/

  12. Having your own particular web flag making programming pays off by making more deals for your business. Standard advertisements are a standout amongst the most prevalent and powerful techniques for web promoting.
    Design dine egne banner, Svendborg

  13. Hi! great research thanks! But the highest stable configuration I can run is:
    1067
    4
    533
    466

    I get no stability with 1100 or above.

  14. Thanks for the FANTASTIC post! This information is really good and thanks a ton for sharing it :-) I m looking forward desperately for the next post of yours.. graphics card prices
