Installation Guide

Big data Cumulus-Linux installation guide

Big Data and Cumulus Linux: Validated Design Guide

Big Data with Cumulus Linux

Objective

This Validated Design Guide presents a design and implementation approach for deploying big data analytics on network switches running Cumulus Linux. The design uses the Apache™ Hadoop® project as an example implementation of a framework for the distributed processing of large data sets across clusters of computers using simple programming models. Such a framework scales from a single machine to thousands of machines, each offering local compute and storage.

Network Demands of Apache Hadoop

Big data is, in some ways, the application that made the modern data center as we know it. Big data applications build resiliency into the application rather than relying on an infallible network, and they require a communication medium that scales from a few nodes to tens or even hundreds of thousands of nodes. Specifically, big data applications generate a large volume of inter-server communication and do not assume that all nodes sit on the same subnet. In other words, big data applications are served well by Layer 3 fabrics built on a Clos topology.

Apache Hadoop is a software framework that supports large-scale distributed data analysis on commodity servers, typically Linux-based compute nodes with local disks.

Figure 1. Network Traffic Patterns of Hadoop

Hadoop is a leading example of a modern data storage and processing platform, and it is ideally deployed in conjunction with a modern data center infrastructure. To make the best use of the scale-out capabilities of MapReduce and YARN in Hadoop, compute and storage resources should be deployed on a high-performance, Layer 3 Clos "leaf and spine" network fabric.

One popular distribution of Apache Hadoop is Hortonworks Data Platform (HDP), a 100% open source, enterprise-grade Hadoop distribution.
Hortonworks is a major contributor to open source initiatives (Apache Hadoop, HDFS, Pig, Hive, HBase, Zookeeper) and has extensive experience managing production-level Hadoop clusters.
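
As a rough illustration of how a Layer 3 leaf-and-spine Clos fabric scales with switch port counts, the following Python sketch estimates the number of attachable servers and the leaf oversubscription ratio for a two-tier design. It is only a sketch under stated assumptions: the class name, field names, port counts, and link speeds are hypothetical and are not taken from this guide, and it assumes each leaf has exactly one uplink to every spine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpineFabric:
    """Hypothetical two-tier Clos fabric; every value here is illustrative."""
    spine_ports: int      # ports per spine switch (one port per leaf)
    leaf_downlinks: int   # server-facing ports per leaf
    leaf_uplinks: int     # spine-facing ports per leaf (one per spine)
    downlink_gbps: float  # speed of each server-facing port
    uplink_gbps: float    # speed of each spine-facing port

    @property
    def max_leaves(self) -> int:
        # Each leaf uses one port on every spine, so the spine
        # port count caps the number of leaves in the fabric.
        return self.spine_ports

    @property
    def max_servers(self) -> int:
        # Assumes one server per leaf downlink port.
        return self.max_leaves * self.leaf_downlinks

    @property
    def oversubscription(self) -> float:
        # Ratio of server-facing bandwidth to spine-facing bandwidth per leaf.
        down = self.leaf_downlinks * self.downlink_gbps
        up = self.leaf_uplinks * self.uplink_gbps
        return down / up


# Assumed example: 32-port spines, leaves with 48 x 10G down and 4 x 40G up.
fabric = LeafSpineFabric(
    spine_ports=32,
    leaf_downlinks=48,
    leaf_uplinks=4,
    downlink_gbps=10.0,
    uplink_gbps=40.0,
)
print(f"leaves: {fabric.max_leaves}")
print(f"servers: {fabric.max_servers}")
print(f"oversubscription: {fabric.oversubscription:.1f}:1")
```

With these assumed numbers, the fabric tops out at 32 leaves and 1536 servers at 3:1 oversubscription. Adding spines (and matching leaf uplinks) lowers the ratio, while higher-radix spines raise the leaf count.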
