Hadoop is an open-source framework written in Java for complex, high-volume computation. Today's industry data grows along the 3 Vs (Volume, Velocity, and Variety), which makes such data difficult to analyze and interpret. Hadoop's distributed, highly fault-tolerant file system (HDFS) is a solution for this 3V data expansion, and MapReduce is the programming platform for analyzing the data stored in HDFS.
Today, we will discuss the steps for a simple installation to get Hadoop up and running on a CentOS server machine.
Step 1: Installing Java
Hadoop requires Java 1.6 or a higher version. Check whether Java is already installed, and if not, install it using the command below.
[root@localhost ~]$ sudo yum install java-1.7.0-openjdk

Output:
......
Dependency Installed:
  giflib.x86_64 0:4.1.6-3.1.el6
  jpackage-utils.noarch 0:1.7.5-3.14.el6
  pcsc-lite-libs.x86_64 0:1.5.2-15.el6
  ttmkfdir.x86_64 0:3.0.9-32.1.el6
  tzdata-java.noarch 0:2015f-1.el6
  xorg-x11-fonts-Type1.noarch 0:7.2-11.el6
Complete!

[root@localhost ~]$ java -version

Output:
java version "1.7.0_85"
OpenJDK Runtime Environment (rhel-2.6.1.3.el6_7-x86_64 u85-b01)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)
Step 2: Create a dedicated Hadoop user
We recommend creating a dedicated (non-root) user for the Hadoop installation, so that the Hadoop processes are isolated from the rest of the system.