Skip to main content

Posts

Showing posts with the label splunk

Big Data - Jobs, tools and how to ace it

Big Data : Overview of Structure and Jobs  The demand for big data resources have increased dramatically in past few years. The requirements to create and get most out of "Big Data" environment is classified into 3 tiers Base Layer - DevOps and Infrastructure Mid  Layer - Understanding & manipulating data Front Layer - Analytics, data science I feel the jobs surrounding "Big Data" would also ultimately reflect this. Learning Big Data should be also based on these tiers. Software Suite/Tools Base Layer - Summary This layer forms the core infrastructure of "big data" platform and should be horizontally scalable. OS - Linux is the way forward for big data technologies. RedHat, SuSe, Ubuntu, CentOS Distributed Computing tools/software - Hadoop, Splunk Data Storage - Splunk, MongoDB, Apache Cassandra Configuration management - Ansible, Puppet, Chef Others - Networking knowledge, Version Control (Git) Mid Layer - Summary This