Intel Software Adrenaline

Haswell Architecture


The 4th gen Intel Core processor architecture focuses on three key elements: performance, modularity, and power innovations. Each element has its own goals, including improved performance of legacy (existing) code and the ability to extract greater parallelism with less coding work for developers. The modularity of the architecture gives the processor design its flexibility while providing a consistent optimization path for software developers. Developers can write a single application that runs, at different feature or performance levels, across the entire array of devices built around the architecture.

The 4th gen Intel Core processor architecture is available in various configurations, including two to four processing cores, three levels of graphics subsystems, differing idle and active power levels, interconnects, and platforms. These configurations greatly widen the power and performance ranges of the architecture compared with the 2nd and 3rd gen Intel Core processor architectures, and are enabled by the system agent, which acts as the intermediary between all of the components on the system-on-chip (SoC).

CPU MICROARCHITECTURE IMPROVEMENTS

The 4th gen Intel Core processor architecture includes many enhancements: new instruction set architecture (ISA) extensions, a better front-end design with improved branch prediction, changes to the x86 execution units, faster memory architectures, and power efficiency changes. Significant work also went into out-of-order scheduling and the memory hierarchy. Together, these changes produce an improved x86 design.

The additions to the instruction set fall into several categories. The new Intel® Advanced Vector Extensions 2 (Intel® AVX2) technology widens integer SIMD instructions, previously limited to the 128-bit registers of Streaming SIMD Extensions (SSE), to 256 bits, putting integer operations on par with what the first iteration of Intel AVX delivered for floating-point operations.
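
To make the change in width concrete, here is a small Python model of a packed 32-bit integer add. It is a sketch of the lane arithmetic only, not Intel intrinsics and not a measure of performance. The names `lanes_per_register`, `packed_add_epi32`, and `vector_add`, and the list-of-lanes representation, are illustrative assumptions. The point is that a 256-bit register holds eight 32-bit lanes where a 128-bit register holds four, so the same array is covered in half as many operations.

```python
LANE_BITS = 32
MASK = (1 << LANE_BITS) - 1  # each lane wraps around modulo 2**32


def lanes_per_register(register_bits):
    """Number of 32-bit integer lanes that fit in one register."""
    return register_bits // LANE_BITS


def packed_add_epi32(a, b):
    """Lane-wise add of two equal-length lane lists, wrapping at 32 bits."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [(x + y) & MASK for x, y in zip(a, b)]


def vector_add(xs, ys, register_bits):
    """Add two arrays one register at a time; return (result, operation count).

    A short final chunk stands in for the masked or scalar tail that real
    vector code would need when the length is not a multiple of the width.
    """
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same length")
    width = lanes_per_register(register_bits)
    out, ops = [], 0
    for i in range(0, len(xs), width):
        out.extend(packed_add_epi32(xs[i:i + width], ys[i:i + width]))
        ops += 1
    return out, ops


xs = list(range(16))
ys = [10] * 16
sum128, ops128 = vector_add(xs, ys, 128)  # four lanes per operation
sum256, ops256 = vector_add(xs, ys, 256)  # eight lanes per operation
print(ops128, ops256)  # 4 operations versus 2 for the same 16 elements
```

Both calls produce the same sums; only the number of register-wide operations differs.
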
Intel AVX2 also adds support for permutes, shifts, and gathers in order to simplify code. In addition, Intel has added nearly 100 instructions in the FMA (Fused Multiply Add) category, which operate on 128-bit and 256-bit vectors and enable performance improvements in floating-point operations.

Perhaps the most powerful addition to the 4th gen Intel Core processor architecture is Intel® Transactional Synchronization Extensions (Intel® TSX). Intel TSX is a memory technology and instruction set that lets software developers write specific parallel code with a focus on correctness and synchronization while the hardware of the 4th gen Intel Core processor architecture handles performance. With its two supported modes, Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM), Intel TSX supports legacy code while offering better performance to developers willing to update their software.

The 2nd gen Intel Core processor architecture took significant steps to improve the front-end of the processor, and the 4th gen Intel Core processor architecture continues in that direction. While much of the setup phase of the

For details on the enhancements to Intel® Quick Sync, see the white paper "Intel® Quick Sync Video Technology on Intel® Iris™ Graphics and Intel® HD Graphics Family—Flexible Transcode Performance and Quality".
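
One property of the FMA category mentioned above is worth illustrating. A fused multiply-add rounds once, after the add, instead of rounding the product and then rounding the sum. The sketch below models that arithmetic in plain Python using exact rationals. `separate_multiply_add` and `fused_multiply_add` are hypothetical helper names, and the model describes the rounding behavior only, not Intel's instructions or their speed.

```python
from fractions import Fraction


def separate_multiply_add(a, b, c):
    """a*b rounded to a double, then + c rounded again (two roundings)."""
    product = a * b
    return product + c


def fused_multiply_add(a, b, c):
    """Model of a*b + c with a single rounding.

    Fraction represents every double exactly, so the expression below is
    exact, and float() rounds it to the nearest double one time.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-54, falls between doubles.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print(separate_multiply_add(a, b, c))  # the product rounds to 1.0 first
print(fused_multiply_add(a, b, c))     # keeps the small residual
```

With these inputs, the two-step version returns 0.0 because the product rounds to exactly 1.0 before the add, while the fused model returns -2**-54.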
