The recently released Jetson Nano is an inexpensive alternative to the Jetson TX1:
| Board | CPU | GPU | Memory | Storage | Price |
|---|---|---|---|---|---|
| Jetson TX1 (Tegra X1) | 4x ARM Cortex A57 @ 1.73 GHz | 256x Maxwell @ 998 MHz (1 TFLOP) | 4GB LPDDR4 (25.6 GB/s) | 16 GB eMMC | $499 |
| Jetson Nano | 4x ARM Cortex A57 @ 1.43 GHz | 128x Maxwell @ 921 MHz (472 GFLOPS) | 4GB LPDDR4 (25.6 GB/s) | Micro SD | $99 |
Basically, for 1/5 the price you get 1/2 the GPU. A detailed comparison of the entire Jetson line is also available.
The X1 is the SoC that debuted in 2015 with the Nvidia Shield TV:
Fun Fact: During the GDC announcement, when Jensen and Cevat “play” Crysis 3 together, their gamepads aren’t connected to anything. Seth W. from Nvidia (as Jensen) and I (as Cevat) are playing backstage. “Pay no attention to that man behind the curtain!”
The memory bandwidth of 25.6 GB/s is a little disappointing. We did some work with K1 and X1 hardware, and memory ended up being the bottleneck. It’s “conveniently” left out of the above table, but the Xbox 360’s eDRAM / eDRAM-to-main / main memory bandwidth is 256 / 32 / 22.4 GB/s.
Put another way, the TX1’s GPU hits 1 TFLOP while the original Xbox One GPU is 1.31 TFLOPS with main memory bandwidth of 68.3 GB/s (plus ESRAM at over 100 GB/s, fugged-about-it). So the Xbone has 30% higher performance but almost 2.7x the memory bandwidth.
When I heard the Nintendo Switch was using a “customized” X1, I assumed the customization involved a new memory solution. Nope. Same LPDDR4 that (imho) would be a better fit for a GPU with 1/4-1/2 the performance. We haven’t done any Switch development, but I wouldn’t be surprised if many titles are bottlenecked on memory. The next most likely culprit is the CPU, if a title is overly dependent on 1-2 threads, but never the GPU.
Looks like we have to hold out for the TX2 to get “big boy pants”: 1.3 TFLOPS with 58.3 GB/s of bandwidth (almost 2.3x the X1’s).
Follow the official directions. On Mac:
```shell
# For the SD card in /dev/disk2
sudo diskutil partitionDisk /dev/disk2 1 GPT "Free Space" "%noformat%" 100%
unzip -p ~/Downloads/jetson-nano-dev-kit-sd-card-image | sudo dd of=/dev/rdisk2 bs=1m
# Wait 10-20 minutes
```
The OS install itself is over 9 GB, and the Nvidia demos are quite large, so a 16 GB SD card fills up quickly. We recommend at least a 32 GB SD card.
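After first boot it's worth sanity-checking how much of the card the install consumed (exact numbers will vary with card size and demos installed):

```shell
# Show root filesystem usage in human-readable units.
# On the Nano the root filesystem lives on the SD card.
df -h /
```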
It boots into Nvidia’s customized Ubuntu 18.04.
Install other software to taste; remember to grab the arm64/aarch64 versions of binaries instead of arm/arm32.
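If you're ever unsure which flavor a machine needs, the kernel will tell you; on the Nano this reports aarch64:

```shell
# Prints the machine hardware name: aarch64 on the Nano,
# armv6l/armv7l on 32-bit Raspberry Pi OS images.
uname -m
```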
When we benchmark hardware, the usual suspects are:
- 3DMARK, PCMARK, Cinebench
- Unigine (Heaven, Valley)
- Several games that have benchmark/demo modes (e.g. “Rise of the Tomb Raider”, “Shadow of Mordor”, etc.)
- Claymore/Phoenix Miner
- A few others
But they’re all limited to Windows and/or x86. The de facto standard for ARM platforms seems to be the Phoronix Test Suite. In Fall 2018, Phoronix did a comparison of a bunch of single-board computers that’s not exactly surprising but still interesting.
Purely for amusement, we’re also throwing in results for the Raspberry Pi Zero, which, to be fair, is in a completely different device class with a different target use-case.
| Test | Pi Zero | Pi 3 B | Nano | Notes |
|---|---|---|---|---|
| glxgears | 107 | 560 | 2350 | FPS. Zero using “Full KMS”; without it, it only manages 7.7 FPS. 3B using “Fake KMS”; “Full KMS” caused the display to stop working. |
| glmark2 | 399 | 383 | 1996 | Several tests failed to run on the Pis |
PTS is pretty nice. It provides an easy way to (re-)run a set of benchmarks based on a unique identifier. For example, to run the tests from the Fall 2018 ARM article:
```shell
sudo apt-get install -y php-cli php-xml
# Download PTS somewhere and run/compare against article
phoronix-test-suite benchmark 1809111-RA-ARMLINUX005
# Wait a few hours...
# Results are placed in ~/.phoronix-test-suite/test-results/
```
| Test | Pi Zero | Pi 3 B | Nano | TX1 | Notes |
|---|---|---|---|---|---|
| TTSIOD 3D Renderer | | 15.66 | 40.83 | 45.05 | FPS (higher is better) |
| C-Ray | | 2357 | 943 | 851 | Seconds (lower is better) |
| Primesieve | | 1543 | 466 | 401 | Seconds (lower is better) |
| AOBench | | 333 | 190 | 165 | Seconds (lower is better) |
| FLAC Audio Encoding | 971.18 | 387.09 | 103.57 | 78.86 | Seconds (lower is better) |
| LAME MP3 Encoding | 780 | 352.66 | 143.82 | 113.14 | Seconds (lower is better) |
| Perl (Pod2html) | 5.3830 | 1.2945 | 0.7154 | 0.6007 | Seconds (lower is better) |
| PostgreSQL (Read Only) | | 6640 | 12410 | 16079 | Higher is better |
| PyBench | | 24349 | 7030 | 6348 | ms (lower is better) |
| Scikit-Learn | | 844 | 496 | 434 | Seconds (lower is better) |
The “Pi 3 B” and “TX1” columns are reproduced from the OpenBenchmarking.org results. There’s also an older set of benchmarks.
These all seem to be predominantly CPU benchmarks, where the TX1 predictably bests the Nano by 10-20%, owing to its 20% higher CPU clock.
Don’t let the name “TTSIOD 3D Renderer” fool you: it’s a software renderer (i.e. non-hardware-accelerated; no GPUs were harmed by that test). That’s further evidenced by the “Socionext Developerbox” showing. Socionext isn’t some new, up-and-coming GPU company; that device has a 24-core ARM Cortex A53 @ 1 GHz (yes, 24, that’s not a typo).
There are more results for the Nano, including things like Nvidia TensorRT and temperature monitoring both with and without a fan. But GLmark2 is likely one of the only things that will run everywhere.
```shell
# Need to disable vsync for Nvidia hardware
__GL_SYNC_TO_VBLANK=0 glxgears
```
On the Pi, in raspi-config, Advanced Options > GL Driver > GL (Full KMS) > Ok adds `dtoverlay=vc4-kms-v3d` to the bottom of `/boot/config.txt`. Reboot and run glxgears.
Getting GLmark2 working on the Nano is easy:
```shell
sudo apt-get install -y glmark2
```
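Assuming the same vsync caveat applies to glmark2 as to glxgears on Nvidia hardware (an assumption on our part), run it uncapped the same way:

```shell
# Disable vsync for this invocation only, so scores aren't
# capped at the display refresh rate.
__GL_SYNC_TO_VBLANK=0 glmark2
```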
On the Pi, it’s currently broken. You can build from source using the commit right after Pi 3 support was merged:
```shell
sudo apt-get install -y libpng-dev libjpeg-dev
git clone https://github.com/glmark2/glmark2.git
cd glmark2
git checkout 55150cfd2903f9435648a16e6da9427d99c059b4
```
There’s a build error:
```
../src/gl-state-egl.cpp: In member function ‘bool GLStateEGL::gotValidDisplay()’:
../src/gl-state-egl.cpp:448:17: error: ‘GLMARK2_NATIVE_EGL_DISPLAY_ENUM’ was not declared in this scope
         GLMARK2_NATIVE_EGL_DISPLAY_ENUM, native_display_, NULL);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
In `src/gl-state-egl.cpp` at line 427, add:

```cpp
#else
// Platforms not in the above platform enums fall back to eglGetDisplay.
#define GLMARK2_NATIVE_EGL_DISPLAY_ENUM 0
```
Build everything and run it:
```shell
# `dispmanx-glesv2` is for the Pi
./waf configure --with-flavors=dispmanx-glesv2
./waf
sudo ./waf install
glmark2-es2-dispmanx --fullscreen
```
If it fails with `failed to add service: already-in-use?`, there are a couple of issue threads about it. Both mention commenting out the `dtoverlay=vc4-kms-v3d` line in `/boot/config.txt`, which was added when we enabled “GL (Full KMS)”.
After getting your system set up, take a look at “Hello AI World”, which does image recognition and comes pre-trained on 1000 objects. Start with “Building the Repo from Source”. Installing dependencies takes a while, but then everything builds pretty quickly.
```shell
cd jetson-inference/build/aarch64/bin
# Recognize what's in orange_0.jpg and place results in output.jpg
./imagenet-console orange_0.jpg output.jpg
# If you have a camera attached via CSI (e.g. Raspberry Pi Camera v2)
./imagenet-camera googlenet  # or `alexnet`
```
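To classify a whole directory of images rather than one at a time, a simple shell loop works; the `images/` directory and `out_` prefix here are just illustrative names, not part of the project:

```shell
# Run imagenet-console over every .jpg in images/, writing an
# annotated copy of each one prefixed with "out_".
for img in images/*.jpg; do
    ./imagenet-console "$img" "out_$(basename "$img")"
done
```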