
Posts

Showing posts with the label big data

Syslog Standards: A simple Comparison between RFC3164 & RFC5424

Syslog Standards: A simple comparison between RFC3164 (old format) & RFC5424 (new format). Although syslog standards have been around for quite a long time, a lot of people still don't understand the formats in detail. The original standard documents are quite lengthy to read, and the purpose of this article is to explain them with examples. Some things you need to understand: the RFC standards can be used with any syslog daemon (syslog-ng, rsyslog, etc.); always try to capture data in these standards, especially when you have a log-aggregation platform such as Splunk or Elastic, where templates for these formats are built in and make your life simpler; and syslog can work over both UDP and TCP. Links to the documents: the original BSD format (RFC3164) and the "new" format (RFC5424). RFC3164, the old format, originated from combining multiple implementations (year 2001).
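The difference is easiest to see on the wire. As a quick illustrative sketch (not from the original post), the util-linux logger command can emit a test message in either format; the server, port, and tag below are placeholders for a local rsyslog or syslog-ng listening on 514:

# RFC3164 (old BSD format) over UDP
logger --rfc3164 --server 127.0.0.1 --port 514 --udp --tag myapp --priority local0.info "test message, old format"

# RFC5424 (new format) over TCP
logger --rfc5424 --server 127.0.0.1 --port 514 --tcp --tag myapp --priority local0.info "test message, new format"

# Roughly what the daemon receives (timestamp, hostname and structured-data fields will differ):
# RFC3164:  <134>Jun  1 12:00:00 myhost myapp: test message, old format
# RFC5424:  <134>1 2024-06-01T12:00:00.000+00:00 myhost myapp - - - test message, new format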

Big Data - Jobs, tools and how to ace it

Big Data: Overview of Structure and Jobs. The demand for big data resources has increased dramatically in the past few years. The requirements to build and get the most out of a "Big Data" environment fall into 3 tiers: Base Layer - DevOps and infrastructure; Mid Layer - understanding & manipulating data; Front Layer - analytics and data science. I feel the jobs surrounding "Big Data" will ultimately reflect this, and learning Big Data should also be based on these tiers. Software Suite/Tools, Base Layer - Summary: this layer forms the core infrastructure of the "big data" platform and should be horizontally scalable. OS - Linux is the way forward for big data technologies (Red Hat, SUSE, Ubuntu, CentOS). Distributed computing tools/software - Hadoop, Splunk. Data storage - Splunk, MongoDB, Apache Cassandra. Configuration management - Ansible, Puppet, Chef. Others - networking knowledge, version control (Git). Mid Layer - Summary
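To make the base layer's configuration-management point concrete, here is a minimal illustrative sketch (not from the post): with Ansible on a control machine and a hypothetical inventory group named hadoop_nodes, a single ad-hoc command pushes the same package to every node, which is what keeps this layer horizontally scalable:

# Install the same JDK package on every node in the (hypothetical) hadoop_nodes group
ansible hadoop_nodes -m apt -a "name=openjdk-7-jdk state=present" --become

Puppet or Chef would express the same idea as a manifest or recipe applied across the fleet.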

Hadoop Install Steps Tutorial for beginners

Many of you would be trying to install Hadoop (v2.3.0 & above) and its components to learn and try out the world of big data. This Hadoop installation is done on a 64-bit Ubuntu machine hosted on a Windows 8.1 machine using VirtualBox (a type 2 hypervisor). I have allocated 3 GB of RAM and 2 CPU cores to the VirtualBox Ubuntu guest, which is plenty for "Pseudo-Distributed Operation" (aka a single-node cluster) of Hadoop. Hadoop Installation on Ubuntu: Guide. Prerequisites: Java 1.6.x or above (I used OpenJDK 1.7.0_51), ssh, sshd and root access, and Hadoop 2.3.0 or Hadoop 2.4.0. Environment and user setup: steps are shown in greyed boxes and can be copy-pasted. Install Java (JDK) from the Ubuntu Software Centre, then locate the installation path and symlink it to a simplified location (mine was installed into /usr/lib/jvm/java-7-openjdk-amd64/).
# Install the Java 7 JDK (on Ubuntu the package is openjdk-7-jdk)
sudo apt-get install openjdk-7-jdk
# creates a java folder for future use
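The preview cuts off at that point; as a plausible continuation (the paths and key settings below are my assumptions, not the post's exact steps), the usual next moves are to symlink the JDK, export JAVA_HOME, and set up passwordless SSH to localhost for pseudo-distributed mode:

# Symlink the JDK to a simpler path and point JAVA_HOME at it (assumed path)
sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java
echo 'export JAVA_HOME=/usr/lib/jvm/java' >> ~/.bashrc
source ~/.bashrc

# Passwordless SSH to localhost, needed for the single-node Hadoop daemons
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost echo "ssh works"   # should not prompt for a password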