Showing posts with the label hadoop

Big Data - Jobs, tools and how to ace it

Big Data: Overview of Structure and Jobs

The demand for big data resources has increased dramatically in the past few years. The requirements to create and get the most out of a "Big Data" environment can be classified into 3 tiers:

Base Layer - DevOps and Infrastructure
Mid Layer - Understanding & manipulating data
Front Layer - Analytics, data science

I feel the jobs surrounding "Big Data" will ultimately reflect this as well. Learning Big Data should also be based on these tiers.

Software Suite/Tools

Base Layer - Summary

This layer forms the core infrastructure of the "big data" platform and should be horizontally scalable.

OS - Linux is the way forward for big data technologies: Red Hat, SUSE, Ubuntu, CentOS
Distributed Computing tools/software - Hadoop, Splunk
Data Storage - Splunk, MongoDB, Apache Cassandra
Configuration management - Ansible, Puppet, Chef
Others - Networking knowledge, Version Control (Git)

Mid Layer - Summary

This

Hadoop Install Steps Tutorial for beginners

Many of you will be trying to install Hadoop (v2.3.0 & above) and its components to learn about or try out the world of big data. This Hadoop installation is done on a 64-bit Ubuntu machine hosted on a Windows 8.1 machine using VirtualBox (a type 2 hypervisor). I have allocated 3 GB of RAM and 2 CPU cores to the VirtualBox Ubuntu guest, which is enough for "Pseudo-Distributed Operation" (aka Single Node Cluster) of Hadoop.

Hadoop Installation on Ubuntu: Guide

Prerequisites

Java 1.6.x or above (I used OpenJDK 1.7.0_51)
ssh, sshd, root access
Hadoop 2.3.0, Hadoop 2.4.0

Environment and User setup

Steps are shown in greyed boxes and can be copy pasted.

Install Java (JDK) from Ubuntu Software Centre. Locate the installation path and softlink it to a simplified location. (Mine was installed into /usr/lib/jvm/java-7-openjdk-amd64/)

# Install Java 7 JDK version (openjdk-7-jdk is the Ubuntu package name)
sudo apt-get install openjdk-7-jdk
# creates a java folder for future u
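Before going further, it can help to confirm that the tools from the prerequisites list are actually present. The following is only a minimal sketch: the helper name missing_tools is my own invention for illustration, and it assumes that java, ssh and sshd are expected to be on the current user's PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or Ubuntu):
# print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools taken from the prerequisites list. sshd usually lives in /usr/sbin,
# so it may be reported as missing if that directory is not on your PATH.
missing=$(missing_tools java ssh sshd)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:"
    echo "$missing"
else
    echo "All prerequisites found."
fi
```

If something is reported as missing, install it first and run the check again before continuing with the setup.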