Friday, 23 Dec 2016
Posted in Big Data
Install Cloudera Hue on CentOS / Ubuntu
Introduction
Hue (Hadoop User Experience) provides a web-based interface to Hadoop and its related services. It is a lightweight web server based on the Django Python framework.
Image courtesy gethue
Tuesday, 20 Dec 2016
Posted in Big Data
Integrate Spark as Subscriber with Kafka
Apache Spark
Apache Spark is a robust big data analytics engine that can use Hadoop (HDFS) or a streaming source such as Kafka, Flume, or TCP sockets as the data source for computation. It is gaining popularity because it brings real-time processing capabilities to the big data ecosystem.
In many real scenarios, such as clickstream processing, customer recommendations, or managing real-time video streaming traffic, there is a clear need to move from batch processing to real-time processing. Many such use cases also require a robust distributed messaging system such as Apache Kafka, RabbitMQ, Message Queue, NATS, and many more.
Apache Kafka
Apache Kafka is one of the best-known distributed messaging systems and acts as the backbone for many data streaming pipelines and applications.
The Kafka project supports four core APIs: the Producer API, Consumer API, Streams API, and Connector API. Using these core APIs, we can develop applications that publish data to a topic or consume data from a topic.
In this tutorial, I will discuss using Spark Streaming to receive data from Kafka.
Monday, 19 Dec 2016
Posted in Big Data
Apache Kafka setup on CentOS
Apache Kafka
Apache Kafka is a distributed messaging system built from components such as publishers, subscribers, and brokers. It is popular because it is designed to store messages in a fault-tolerant way and because it supports building real-time streaming data pipelines and applications.
In this message broker system, we create a topic (category) and a set of producers that send messages on that topic to the brokers. The messages are then either broadcast to, or processed in parallel by, the consumers registered to that topic. Communication between the producers, brokers, and consumers uses the TCP protocol.
ZooKeeper is also an integral part of the system; it helps coordinate the distributed brokers and consumers.
This simple working model is shown below.
In this tutorial, I will discuss the steps for installing a simple Kafka messaging system.
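The topic, producer, and consumer roles described above map onto the command-line tools that ship with Kafka. Below is a minimal sketch that only prints the commands for one publish/subscribe round trip. It assumes Kafka 0.10.1-era tooling, a hypothetical install directory in KAFKA_HOME, ZooKeeper on localhost:2181, and a broker on localhost:9092; none of these values come from the post, and flag names differ in other Kafka versions.

```shell
#!/bin/sh
# Assumed install path and ports; none of these come from the post.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
ZOOKEEPER="${ZOOKEEPER:-localhost:2181}"
BROKER="${BROKER:-localhost:9092}"

# Print the three commands of a minimal publish/subscribe round trip
# for the given topic: create it, produce to it, consume from it.
kafka_commands() {
  topic="$1"
  echo "$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper $ZOOKEEPER --replication-factor 1 --partitions 1 --topic $topic"
  echo "$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list $BROKER --topic $topic"
  echo "$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server $BROKER --topic $topic --from-beginning"
}

# Example: show the commands for a hypothetical topic named "clicks".
kafka_commands clicks
```

The producer and consumer commands are meant to be run in separate terminals. Consumers that share a consumer group split the topic's partitions between them (parallel processing), while consumers in different groups each receive every message (broadcast).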
Saturday, 17 Dec 2016
Posted in Big Data
Apache Spark
Apache Spark is a powerful analytical engine for processing huge volumes of data using distributed in-memory data storage.
Apache Hadoop YARN
Hadoop is well known as a distributed computing system that consists of a distributed file system (HDFS), a resource management framework (YARN), and analytical computing jobs (such as MapReduce, Hive, Pig, Spark, etc.).
An Apache Spark analytical job can run on a standalone Spark cluster, a YARN cluster, or a Mesos cluster.
In this tutorial, I will go through the detailed steps, and the problems I faced, while setting up a Spark job to run on a remote YARN cluster. Since I have just one computer, I have created two users (sparkuser and hduser). Hadoop is installed as ‘hduser’ and Spark is installed as ‘sparkuser’.
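When submitting to YARN, Spark finds the ResourceManager through the Hadoop client configuration directory named by HADOOP_CONF_DIR (or YARN_CONF_DIR). Here is a rough sketch of a pre-flight check followed by a cluster-mode submit. The function names, the two-file check, and the example paths are my own assumptions for illustration, not steps taken from the post; in a two-user setup like the one above, the Spark user would also need read access to that directory.

```shell
#!/bin/sh
# Heuristic check (an assumption, not an official test): the directory
# should contain the client-side YARN and core Hadoop configuration files.
check_yarn_conf() {
  conf_dir="$1"
  [ -n "$conf_dir" ] && [ -f "$conf_dir/yarn-site.xml" ] && [ -f "$conf_dir/core-site.xml" ]
}

# Submit a job in cluster mode, refusing to start if the config looks wrong.
# Arguments: <hadoop conf dir> <main class> <application jar>
submit_on_yarn() {
  conf_dir="$1"; main_class="$2"; app_jar="$3"
  if ! check_yarn_conf "$conf_dir"; then
    echo "missing yarn-site.xml or core-site.xml in '$conf_dir'" >&2
    return 1
  fi
  # HADOOP_CONF_DIR is set only for this one spark-submit invocation.
  HADOOP_CONF_DIR="$conf_dir" spark-submit --master yarn --deploy-mode cluster \
      --class "$main_class" "$app_jar"
}

# Usage (hypothetical paths and class name):
#   submit_on_yarn /home/hduser/hadoop/etc/hadoop com.example.MyJob my-job.jar
```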
Thursday, 15 Dec 2016
Posted in Big Data
Install Spark on Standalone Scheduler
Apache Spark
Apache Spark is a cluster computing framework written in Scala. It is gaining popularity because it brings real-time solutions to the big data ecosystem.
Installation
Apache Spark can be installed in standalone mode by simply placing a compiled version of Spark on each node, or by building it yourself from the source code.
In this tutorial, I will provide details of installation using the compiled version of Spark.
Wednesday, 14 Dec 2016
Posted in Big Data
Installing Scala on CentOS
Scala
Scala is a programming language that supports both the object-oriented and functional programming paradigms. It is gaining popularity as a good choice for scalable distributed systems.
There are many ways to set up the language on a CentOS machine.
1. Install SBT
Refer to the article Spark Development using SBT in IntelliJ
Saturday, 10 Dec 2016
Posted in Big Data
Install Maven on CentOS
Maven is a powerful open source build tool (written in Java) for Java development projects. It can automate tasks such as compile, clean, build, and deploy, and it also handles dependency management.
To install Maven on CentOS, follow the steps below:
1. Download the Maven tarball
Download the tarball into the folder you want to extract it to, using the command below:
wget http://mirror.reverse.net/pub/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
2. Extract the tarball
tar xvf apache-maven-3.3.9-bin.tar.gz
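After extraction, Maven is usually made available by putting its bin directory on the PATH. The following is a small sketch of that follow-up, assuming the archive unpacks into a single top-level directory such as apache-maven-3.3.9. The helper name extract_maven, the /opt destination, and the M2_HOME variable are illustrative assumptions, not steps from this excerpt.

```shell
#!/bin/sh
# Extract a Maven binary tarball into a destination directory and print
# the path of the directory it created.
# Arguments: <tarball> <destination directory>
extract_maven() {
  tarball="$1"; dest="$2"
  mkdir -p "$dest" || return 1
  tar xzf "$tarball" -C "$dest" || return 1
  # The first archive entry starts with the top-level directory name.
  top="$(tar tzf "$tarball" | head -n 1 | cut -d/ -f1)"
  echo "$dest/$top"
}

# Usage (example paths):
#   M2_HOME="$(extract_maven apache-maven-3.3.9-bin.tar.gz /opt)"
#   export M2_HOME PATH="$M2_HOME/bin:$PATH"
#   mvn -version
```

To make the PATH change permanent, the two export settings would typically go into a shell profile file.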
Saturday, 10 Dec 2016
Posted in Big Data
Functional vs Object Oriented Programming
Many programming models exist today, such as procedural programming, structured programming, event-driven programming, object-oriented programming, functional programming, and many more. In this tutorial, we will focus on just two of them.
Overview of Object Oriented Programming
In this model, there is the concept of an object, which consists of a set of data and the operations on it.
An object is basically a model of a real thing such as an aeroplane, car, chair, table, or living person. An object can also represent a conceptual term such as Company, Bank, Account, or Customer.
Each object has attributes, called data, associated with it. We can manipulate the data in the object using the methods (operations) defined in it. This concept is called data encapsulation.
An object can also inherit attributes and operations from another object. This is called inheritance. For example, a Horse is a ‘kind of’ Animal; therefore, it can inherit the data and operations of Animal.
Many programming languages support the object-oriented model, such as Java, C++, PHP, Python, Scala, and many more.
Let's take an example in Scala that demonstrates a class called Account and an inherited class Saving Account.
Friday, 09 Dec 2016
Posted in Big Data
Spark Development using SBT in IntelliJ
Apache Spark
Apache Spark is an open source big data computational system. It is developed in the Scala programming language, which runs on the JVM (Java Virtual Machine) platform. Today, the popularity of Spark is increasing due to its in-memory data storage and real-time processing capabilities. It provides high-level APIs in Java, Scala, and Python, so we can run analytical queries through these APIs on Spark and get the desired insights. Spark can be deployed to a standalone cluster, Hadoop 2 (YARN), or Mesos.
SBT Overview
SBT stands for Simple Build Tool. A build tool helps automate tasks such as build, compile, test, package, run, and deploy. Other build tools include Maven, Ant, Gradle, and Ivy. SBT is one such build tool, focused mainly on Scala projects.
Today, I am going to write a basic query using the Spark high-level API in Scala 2.10, with IntelliJ as the IDE for development.
Now, all set. Let's get our hands dirty with some actual coding.
Wednesday, 07 Dec 2016
Posted in Big Data
CentOS 6 Installation using pen drive
On my Dell Inspiron 5010 laptop, Windows 7 crashed and I was not able to restore it. It had also been running too slowly, so I decided to format it and install CentOS 6 from scratch.
Below are the steps for installing CentOS:
1. Create a bootable pen drive on a Mac (using the dd command)
Prerequisite:
The pen drive should have ample space (350 MB for minimal boot media and 4.5 GB for full installation media). I did the minimal installation. The drive will be formatted before the ISO image is copied to it.
a. Download your preferred ISO image from https://www.centos.org/download/ (latest CentOS) or http://isoredirect.centos.org/centos/6/isos/x86_64/ (CentOS 6).
b. Find the device name of the USB stick.
First, list all the disks attached to the Mac:
diskutil list
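Once the USB stick's identifier is known from the listing, the image can be written with dd. Since writing to the wrong device destroys its data, the sketch below only prints the macOS commands instead of running them. The identifier disk2, the ISO file name, and the helper name usb_write_commands are hypothetical, and the exact dd options are an assumption rather than the ones used in the post.

```shell
#!/bin/sh
# Print (do not run) the macOS commands that write an ISO image to a USB stick.
# Arguments: <iso path> <whole-disk identifier from `diskutil list`, e.g. disk2>
usb_write_commands() {
  iso="$1"; disk="$2"
  case "$disk" in
    disk0*)
      echo "refusing: $disk is usually the internal disk" >&2; return 1 ;;
    disk[1-9]*s[0-9]*)
      echo "refusing: $disk is a partition, use the whole disk" >&2; return 1 ;;
    disk[1-9]*) ;;
    *)
      echo "expected an identifier like disk2, got '$disk'" >&2; return 1 ;;
  esac
  # Unmount all volumes on the stick so dd can open the device.
  echo "diskutil unmountDisk /dev/$disk"
  # rdiskN is the raw device, which is usually faster for dd on macOS.
  echo "sudo dd if=$iso of=/dev/r$disk bs=1m"
  echo "diskutil eject /dev/$disk"
}

# Example with hypothetical values:
usb_write_commands CentOS-6-minimal.iso disk2
```

Check the printed device name against the diskutil list output before running any of these commands by hand.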