Intel Software Adrenaline

Optimizing Java* and Apache Hadoop* for Intel® Architecture

Issue link: http://read.uberflip.com/i/140302

Conclusion

Apache Hadoop is transforming the way organizations analyze and store data, enabling new use cases for data analytics. Massive amounts of semistructured and unstructured data can now be easily manipulated, and the scalability of Apache Hadoop means that organizations can expand from one to thousands of nodes as their needs grow.

The Intel architecture provides a solid, high-performance foundation for Apache Hadoop clusters. Intel's and Oracle's work on improving performance with Java has yielded demonstrable improvements to Apache Hadoop performance. By combining servers based on the latest Intel Xeon processor family with Oracle's Java Virtual Machine, organizations can realize significant performance benefits across their Apache Hadoop clusters.

[Figure 4: TeraSort* benchmark performance increases using Intel® Integrated Performance Primitives (Intel® IPP) compression with common compression algorithms. Chart title: Intel® Integrated Performance Primitives (Intel® IPP) compression increases TeraSort* benchmark performance. Axis: Relative Job Running Time. Bars: Baseline (No Compression), zlib, zlib with Intel® IPP, LZO, LZO with Intel® IPP. Value labels: 85%, 60%, 40%, 30%.]

For more information on how your Apache Hadoop cluster can benefit from Intel architecture, visit www.intel.com/bigdata and hadoop.intel.com.

1. Performance baseline consisted of a server configured with an Intel® Xeon® 5690 processor, a 7200 rpm SATA hard drive, and a single gigabit Ethernet adapter. The enhanced configuration that resulted in a one-terabyte sort performance improvement from four hours to seven minutes consisted of a server configured with an Intel Xeon E5-2690 processor, Intel® SSD 520 series solid state drive, an Intel® Ethernet 10 Gigabit Server Adapter, and the Intel® Distribution for Apache Hadoop*.
2. Benchmarks performed using SPECjbb2005 and SPECjvm2008. See http://www.spec.org for more details.
3. Intel® Advanced Encryption Standard–New Instructions (Intel® AES-NI) requires a computer system with an AES-NI-enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® Core™ processors. For availability, consult your system manufacturer. For more information, see http://www.intel.com/content/www/us/en/architecture-and-technology/advanced-encryption-standard--aes-/data-protection-aes-general-technology.html.
4. The Apache Hadoop* cluster consisted of a master server and four slave servers connected over a 10 gigabit Ethernet network. The master server was configured with dual Intel® Xeon® X5570 processors and 64 GB of RAM. The slave nodes each consisted of dual Intel® Xeon® E5-2680 processors; 128 GB RAM; two dual-core Intel® QuickAssist Technology compression cards with hardware version C1 SKU4, firmware version 1.0.0, and driver version 1.2.0; seven 300 GB solid state drives in a RAID 0 configuration; and an external SAS enclosure with twenty-four 64 GB hard drives in a RAID 0 configuration. Each server was running CentOS Release 6.3 with Linux kernel 2.6.32-279.19.1.el6.x86_64 and Hadoop 1.0.4. The TeraSort and Sort benchmarks were run against a 500 GB dataset. The 50 percent TeraSort and 30 percent Sort performance increases were seen between running the benchmarks on JDK 1.6.0_14 and JDK 1.7.0_13.
5. The Apache Hadoop* cluster consisted of a master server and four slave servers connected over a gigabit Ethernet network. The master server was configured with dual Intel® Xeon® X5570 processors, 64 GB RAM, a single 64 GB solid state drive, and a single 500 GB hard drive. Each slave server was configured with dual Intel® Xeon® X5570 processors. Each server was running Ubuntu 11.10 64-bit with Linux kernel 3.0.0-12.20, Hadoop 0.20.203.0, and the Oracle Java HotSpot™ 64-bit server VM version 1.7.
6. The Apache Hadoop* cluster consisted of a master server and four slave servers connected over a gigabit Ethernet network. The slave servers where the benchmark was run were each configured with dual Intel® Xeon® E5-2680 processors, 64 GB RAM, two dual-core Intel® QuickAssist Technology compression cards, and three 300 GB solid state drives. Each server was running CentOS Release 6.3 64-bit and the Oracle Java HotSpot™ 64-bit server VM version 1.7.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, go to: http://www.intel.com/performance/resources/benchmark_limitations.htm.

Copyright © 2013, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

*Other names and brands may be claimed as the property of others.

Printed in USA 0413/TA/PRW/PDF Please Recycle 328915-001US
