White Paper

Xeon-D Vs Xeon-E for Embedded Radar Applications

Issue link: https://read.uberflip.com/i/1173442


www.mrcy.com
INNOVATION THAT MATTERS™

Corporate Headquarters: 50 Minuteman Road • Andover, MA 01810 USA • (978) 967-1401 • (866) 627-6951 • Fax (978) 256-3599
Europe - Mercury Systems, Ltd: Unit 1 - Easter Park, Benyon Road, Silchester, Reading RG7 2PQ United Kingdom • + 44 0 1189 702050 • Fax + 44 0 1189 702321

Mercury Systems and Innovation That Matters are trademarks of Mercury Systems, Inc. Other products mentioned may be trademarks or registered trademarks of their respective holders. Mercury Systems, Inc. believes this information is accurate as of its publication date and is not responsible for any inadvertent errors. The information contained herein is subject to change without notice.

Copyright © 2017 Mercury Systems, Inc. 3314.00E-0417-wp-STAP

Summary and conclusions

QR decomposition (QRD) accounts for the majority of the processing in STAP radar. As shown in this paper, on the order of 15 Gflop/s per core for QRD is something we can expect from an Intel server computer running optimized math libraries, and it should be possible to optimize this further. STAP radar, however, needs to do more than QRD, so we have also studied the performance of a somewhat more complete STAP application based on MITRE RT_STAP [1]. Given the data presented, it appears realistic to expect at least 10 Gflop/s per core across the overall STAP computation, rising to 30 Gflop/s per core for some functions (e.g. FFT). With the STAP example application used here, we would need about four cores to process the incoming data stream of 22 channels in real time.

One may ask how many channels can be processed in a slot, or even in a 19" rack. However, since real-world sampling speeds can be higher and other factors come into play (e.g. increased processing order, degrees of freedom, etc.), making such a general statement is not practical. What we can envision is how much STAP processing can be done in a compact 6U embedded chassis.
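
The core-count figure above comes down to dividing a sustained workload by a per-core rate and rounding up. The sketch below illustrates that step only. The function name `cores_needed` is hypothetical, and the 40 Gflop/s load is an inference from the paper's own numbers (about four cores at roughly 10 Gflop/s each), not a value the paper states. The 15 Gflop/s case reuses the paper's QRD-only rate purely to show how the count would shift.

```python
import math


def cores_needed(load_gflops: float, per_core_gflops: float) -> int:
    """Smallest whole number of cores whose combined rate covers the load.

    Assumes perfect scaling across cores, which real systems rarely achieve.
    """
    if load_gflops < 0 or per_core_gflops <= 0:
        raise ValueError("load must be >= 0 and per-core rate must be > 0")
    return math.ceil(load_gflops / per_core_gflops)


# Inferred, not stated in the paper: four cores at about 10 Gflop/s each
# implies a sustained load on the order of 40 Gflop/s for the 22-channel case.
implied_load_gflops = 4 * 10.0

print(cores_needed(implied_load_gflops, 10.0))  # overall STAP rate from the paper
print(cores_needed(implied_load_gflops, 15.0))  # QRD-only rate, for comparison
```

Rounding up matters here: a fractional core cannot be allocated, so any shortfall in the per-core rate costs a whole extra core.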
With OpenVPX technology it is practical to build a compact, rugged, deployed embedded system of around 10 slots. A deployed system typically dedicates some slots to data input and one to a switch card, and it also needs spare slots. A typical deployed system might therefore have around five server processor boards. If each board has 24 cores, we would have a 120-core STAP computer. Assuming we reach 10 to 20 Gflop/s per core overall, this would allow at least 1.2 Tflop/s, possibly extending up to 2.4 Tflop/s, of STAP processing per 19" rack.

We have compared Xeon D with Xeon E and measured that, for the same core count and clock speed, compute-bound functions run in isolation perform similarly. But even a compute-bound application such as STAP involves data movement, such as the corner turn, which takes time. Xeon E has twice the memory speed, which reduces the time for memory data movement by up to 50%. It also has nearly double the cache size, which lets the cores work from cache rather than memory to a greater extent. As shown herein, this makes scaling across cores more straightforward. The effect of cache size and memory speed can be even greater in memory-bound applications that require more data movement. The higher performance of Xeon E allows us to reduce the number of boards, and this reduces system size, weight (lower SWaP) and complexity.

As described herein, commercial embedded server-class processing technology is ready to make 3rd-order STAP and beyond a reality in deployed systems.

References

[1] K.C. Cain, J.A. Torres, and R.T. Williams, "RT_STAP: Real-time space-time adaptive processing benchmark", MITRE Technical Report, The MITRE Corporation, Center for Air Force C3 Systems, Bedford, MA, USA, 1997.

About the Author

Jonas Larsson is a Principal Systems Application Engineer for Mercury Systems. Mr Larsson has 18 years of experience architecting, implementing and promoting high-performance embedded system solutions. Mr Larsson earned his Bachelor of Science (BSc) degree in electrical engineering from Chalmers University of Technology in Sweden and his Master of Science (MSc) degree in Network Centred Computing from the University of Reading in England.

Table of Acronyms

CPU    Central Processing Unit
DOF    Degrees Of Freedom
FFT    Fast Fourier Transform
FMA    Fused Multiply Add (Intel)
FPGA   Field Programmable Gate Array
GPU    Graphics Processing Unit
MTI    Moving Target Indicator
OS     Operating System
QPI    QuickPath Interconnect (Intel)
SMP    Symmetric Multi-Processing
STAP   Space-Time Adaptive Processing
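
As a cross-check of the chassis-level estimate in the summary (five boards of 24 cores each, at 10 to 20 Gflop/s per core), the arithmetic can be written out as a minimal sketch. The function name `chassis_tflops` is hypothetical, and the calculation assumes perfect scaling across all cores, which the paper's own remarks on data movement suggest is an upper bound rather than a guarantee.

```python
def chassis_tflops(boards: int, cores_per_board: int, gflops_per_core: float) -> float:
    """Aggregate throughput in Tflop/s, assuming every core sustains the same rate."""
    total_cores = boards * cores_per_board
    return total_cores * gflops_per_core / 1000.0  # 1 Tflop/s = 1000 Gflop/s


# Figures taken from the summary: 5 boards, 24 cores per board,
# and a per-core rate between 10 and 20 Gflop/s.
boards, cores_per_board = 5, 24
low = chassis_tflops(boards, cores_per_board, 10.0)
high = chassis_tflops(boards, cores_per_board, 20.0)

print(f"{boards * cores_per_board} cores: {low:.1f} to {high:.1f} Tflop/s")
```

Keeping the board count, core count and per-core rate as separate parameters makes it easy to see how a different slot allocation or a lower sustained rate would change the total.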
