Imagination’s stand-alone neural network accelerator – and two new GPUs
- Author: Ella Cai
- Published: 2017-09-21
Imagination Technologies has revealed hardware neural network accelerator (NNA) intellectual property, as well as a pair of GPUs.
2,048MACs/cycle of neural processing is claimed for a single core, with higher levels available through multi-core integration.
Scroll down for new Imagination GPUs
“Companies building SoCs for mobile, surveillance, automotive and consumer systems can integrate the PowerVR Series2NX neural network accelerator for high-performance computation of neural networks at very low power consumption in minimal silicon area,” said the firm. “‘Convolutional neural networks’, ‘recurrent neural networks’ and ‘long short term memory networks’ are enabling an explosion in technological progress across industries.”
“Dedicated hardware for neural network acceleration will become a standard IP block on future SoCs, just as CPUs and GPUs have done,” said Imagination marketing director Chris Longstaff.
Potential applications, according to Imagination (see below for details), include:
photography enhancement and predictive text enhancement in mobile devices
feature detection and eye tracking in augmented/virtual reality headsets
pedestrian detection and driver alertness monitoring in automotive safety systems
facial recognition and crowd behaviour analysis in smart surveillance
on-line fraud detection
content advice
‘predictive UX’ (user experience)
speech recognition and response in virtual assistants
collision avoidance and subject tracking in drones
The architecture is scalable across:
128-1,024 16bit MACs/clock and 256-2,048 8bit MACs/clock – and beyond with multi-core
Future cores with different performance points and feature sets to address multiple markets and applications
16, 12, 10, 8, 7, 6, 5 and 4bit data type support – 16bit for automotive, for example
Per-layer adjustment of bit depth for both weights and activations (see the sketch after this list)
Variable internal floating point precision – 32bit precision inside accumulator with variable output precision
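The per-layer bit-depth adjustment above is easiest to see in code. Below is a minimal, hedged sketch in Python/NumPy assuming a simple symmetric linear quantiser; Imagination has not disclosed the 2NX's actual quantisation scheme, so the method and the bit-depth choices shown here are illustrative only.
```python
# Illustrative only: a simple symmetric linear quantiser applied with a
# different bit depth per layer. Imagination has not disclosed the 2NX's
# actual quantisation scheme; this just shows what per-layer adjustment of
# weight and activation bit depth means in practice.
import numpy as np

def quantize(x, bits):
    """Quantise a float tensor to a signed grid of `bits` bits and back."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8bit, 7 for 4bit
    scale = float(np.max(np.abs(x))) / qmax or 1.0  # avoid divide-by-zero scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                                # value the MAC array would see

rng = np.random.default_rng(0)
layers = {
    "conv1 weights":     (rng.standard_normal((3, 3, 16)), 8),    # 8bit weights
    "conv1 activations": (rng.standard_normal((32, 32, 16)), 10), # 10bit activations
    "fc weights":        (rng.standard_normal((256, 10)), 4),     # 4bit weights
}
for name, (tensor, bits) in layers.items():
    err = float(np.mean(np.abs(tensor - quantize(tensor, bits))))
    print(f"{name}: {bits}bit, mean quantisation error {err:.4f}")
```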
The firm has not released performance figures, but is claiming an 8x performance density improvement versus DSP-only solutions. “In addition, neural networks are traditionally very bandwidth hungry, and the memory bandwidth requirements grow with the increase in size of neural network models. PowerVR 2NX can minimise bandwidth requirements for the external DDR memory to ensure a system is not bandwidth limited.”
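To see why lower weight bit depths translate into lower DDR bandwidth, here is a back-of-the-envelope calculation; the model size and frame rate below are assumptions for illustration, not Imagination figures.
```python
# Back-of-the-envelope only: DDR traffic needed just to stream the weights of
# a hypothetical 5M-parameter network once per inference, at bit depths the
# 2NX supports. Model size and frame rate are assumptions, not quoted figures.
params = 5_000_000            # hypothetical parameter count
fps = 30                      # hypothetical inference rate
for bits in (16, 8, 4):
    mbytes = params * bits / 8 / 1e6
    print(f"{bits:2d}bit weights: {mbytes:5.1f} MB/inference, "
          f"{mbytes * fps:6.1f} MB/s of weight traffic at {fps} inferences/s")
```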
Longstaff did reveal: “Recognising one face per second requires a fairly low level of performance. So typically, you would be looking at the smallest version of our core. We can’t give out exact areas, but this would be <1mm2 in 16nm.”
Without providing figures, Imagination is claiming:
Industry’s highest inferences/mW
Industry’s highest inferences/mm2
Industry’s lowest bandwidth solution – with support for weight and data bit depths down to 4bit
Inferences?
“An inference is where a neural network has processed an input and created an output,” said Longstaff. “For example, one inference could be where one input image is processed by a neural network engine, and a set of outputs are provided, which describe the likelihood of a given object type being in the picture.”
As examples, the firm is suggesting that 1,000 photos can be searched or sorted in two seconds on a 16nm process 2NX, compared with 60s taken by a “premium GPU” on a similar process, while 428,000 photos could be sorted for 1% of the charge on a certain battery, compared with 2,400 if the aforementioned GPU had 1% of energy to play with.
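Working the quoted examples back to rates, the arithmetic below is derived from the figures above and nothing more:
```python
# Arithmetic implied by the quoted examples, nothing more.
photos, t_2nx, t_gpu = 1_000, 2, 60            # photos sorted, seconds taken
print(f"2NX: {photos / t_2nx:.0f} photos/s vs GPU: {photos / t_gpu:.1f} photos/s "
      f"-> {t_gpu / t_2nx:.0f}x faster")

per_pct_2nx, per_pct_gpu = 428_000, 2_400      # photos per 1% of battery charge
print(f"Energy: {per_pct_2nx / per_pct_gpu:.0f}x more photos per 1% of charge")
```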
How does inference relate to MACs?
“The MAC/s figure is a measure of the performance of the convolution engine within the neural network engine,” said Longstaff. “It is one of the processing-heavy elements, but the MAC/s is not directly related to inferences/s.”
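In other words, inferences/s can only be estimated from MAC/s by dividing by the MAC cost of the particular network, and then only at the utilisation actually achieved. A worked example with hypothetical numbers:
```python
# Hypothetical numbers: converting MAC throughput into an inference rate
# requires the MAC cost of the particular network and the utilisation actually
# achieved, which is why MAC/s and inferences/s are not interchangeable.
macs_per_cycle = 2_048        # top-end 2NX configuration in 8bit mode
clock_hz = 800e6              # assumed clock frequency
macs_per_inference = 300e6    # assumed cost of one pass of a small CNN
utilisation = 0.6             # assumed; real layers rarely keep every MAC busy

inferences_per_s = macs_per_cycle * clock_hz * utilisation / macs_per_inference
print(f"~{inferences_per_s:,.0f} inferences/s under these assumptions")
```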
2NX includes hardware IP, software and tools to provide neural network solutions for SoCs.
It runs all common neural network computational layers.
Depending on the computation requirements of the inference tasks, it can be used standalone, or in combination with CPUs, GPUs and other processors.
2NX development resources include mapping and tuning tools, sample networks, evaluation tools and documentation.
The mapping tool enables porting from industry-standard machine learning frameworks such as Caffe and TensorFlow.
Imagination is also making available a deep neural network API to ease the transition between CPU, GPU and NNA. The single API works across multiple SoC configurations for prototyping on existing devices.
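Imagination has not published the tool or API interfaces in this article, so the following is a purely hypothetical Python sketch of the workflow the mapping tool and single NN API imply; the module name powervr_nn, its functions and the file names are invented placeholders, not a real API.
```python
# Hypothetical sketch only: 'powervr_nn', its functions and the file names are
# invented placeholders standing in for Imagination's mapping tool and DNN API,
# which are described in the text above but not published here.
import numpy as np
import powervr_nn as pvr                          # hypothetical package name

# Offline step: map a trained Caffe or TensorFlow model into the 2NX format,
# choosing per-layer bit depths to trade accuracy against memory bandwidth.
network = pvr.map_model("model.prototxt", weights="model.caffemodel",
                        weight_bits=8, activation_bits=8)

# Runtime step: the single API targets CPU, GPU or NNA, so an application can
# prototype on an existing SoC and later retarget the NNA without code changes.
runtime = pvr.Runtime(target="nna")               # or "cpu" / "gpu"
input_image = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder input
scores = runtime.run(network, input_image)        # one inference per call
```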
PowerVR 2NX NNA is available for licensing now.
Imagination: applications for 2NX neural network accelerator:
Mobile:
With the upcoming release of TensorFlow Lite and an API for Android, as well as momentum of the Caffe2Go framework, we will see an explosion in the number of AI-enabled smartphone applications. Companies need a highly efficient way to perform inference tasks for functions such as image recognition, speech recognition, computational photography and more. PowerVR 2NX is the only IP solution today that can deliver against all of the requirements for a deployable mobile solution, with its low power, low area, MMU and planned support for Android. In mobile devices where a GPU is mandated, companies can pair a new PowerVR Series9XE or 9XM GPU with the 2NX NNA in the same silicon footprint as a competing standalone GPU.
Smart surveillance:
Massive growth in numbers of cameras, both home and commercial installations, is driving the need for vision processing including neural networks. Smart cameras based on these technologies can be used for decision-making based on security alerts, retail analytics, demographics and engagement data. Taking into account bandwidth requirements, data confidentiality and other issues, cameras must be designed for some amount of ‘edge’ video analytics processing within the camera. Since these cameras typically have either no GPU or a very small GPU, and lower performance CPUs, what’s needed is an efficient, high-performance standalone neural network accelerator. The 2NX NNA is ideal, and is highly scalable to address both consumer and commercial implementations.
Automotive:
Applications for neural networks in vehicles include driver alertness monitoring, driver gaze tracking, seat occupancy, road sign detection, drivable path analysis, road user detection, driver recognition and others. As the number of autonomous vehicles and smart transportation systems increase over the next several years, these applications will continue to expand. Within automotive systems, a full hardware solution like the 2NX NNA is required to meet the associated performance points.
Home entertainment:
Devices such as set-top boxes and televisions will increasingly provide solutions based on neural networks, for example the ability to adapt preferences to certain users, provide automated child locks, and automatically pause and record programs based on user behavior. With such features, companies can increase their differentiation and revenues. Key to implementing neural networks on these devices will be highly efficient bandwidth and low cost as well as support for NN APIs – features at the heart of the 2NX NNA. There are numerous other emerging entertainment applications for NNA, including AR/VR.
“We’re really excited about potential applications for neural networks in VR and AR in terms of tracking, controlling, pose recognition, speech recognition and adding more interactivity to the experience,” said Tony Chia, founder of VR and AR tech firm Nanjing Ruiyue Technology. “A chipset incorporating PowerVR 2NX can provide the throughput and power efficiency needed to take the VR and AR experience to the next level.”
New GPUs
At the same time as 2NX, Imagination announced two GPUs: PowerVR Series9XE and PowerVR Series9XM.
Both are based on the firm’s existing Rogue architecture, not its recently-announced Furian architecture, which is optimised for gaming on phones.
“Furian is optimised for feature-rich applications and for performance/mW, as well as scaling to higher performance,” said Longstaff. “The Rogue architecture has evolved over many generations to be very cost optimised, offering the best fill-rate/mm2 and performance/mm2. Some Furian features may be introduced into Rogue over time.”
9XE has the same fill-rate density as the existing Series8XE, but with 20% better application performance.
9XM has 50% better performance density than the existing Series8XEP.
Bandwidth in both cases is improved by up to 25%.
API support includes OpenGL ES 3.2, Vulkan 1.0, OpenCL 1.2 EP, Renderscript and OpenVX 1.2.
OS support includes Android, Linux, Tizen, WebOS, QNX, Integrity and Chrome OS.
In addition, there is hardware support for security and virtualisation, as well as an option for lossless image compression.
Physical design optimisation kits are available for power/performance/area balancing.