Posts

Showing posts with the label data science

Syslog Standards: A simple Comparison between RFC3164 & RFC5424

Syslog Standards: A Simple Comparison between RFC3164 (old format) & RFC5424 (new format)

Though the syslog standards have been around for quite a long time, a lot of people still don't understand the formats in detail. The original standard documents are quite lengthy to read, and the purpose of this article is to explain them with examples.

Some things you might need to understand:
- The RFC standards can be used in any syslog daemon (syslog-ng, rsyslog, etc.).
- Always try to capture the data in these standards. Especially when you have log aggregation like Splunk or Elastic, these templates are built in, which makes your life simple.
- Syslog can work with both UDP and TCP.

Links to the documents:
- the original BSD format (RFC3164)
- the "new" format (RFC5424)

RFC3164 (the old format)
RFC3164 originated from combining multiple implementations (Year 2001)
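To give a rough feel for how the two formats differ on the wire, here is a minimal Python sketch. The two sample messages are adapted from the examples in the RFCs (the byte order mark in the RFC5424 sample is omitted). The helper names `decode_pri` and `guess_format` are made up for this illustration, and the format check is only a heuristic based on the version field that RFC5424 places right after the priority value; it is not a full parser.

```python
import re

# Sample messages adapted from the examples in RFC3164 and RFC5424.
RFC3164_EXAMPLE = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
RFC5424_EXAMPLE = ("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
                   "'su root' failed for lonvick on /dev/pts/8")

# Both formats start with a priority value in angle brackets: <PRI>
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_pri(message):
    """Return (facility, severity, rest_of_message).

    PRI is defined as facility * 8 + severity in both RFCs.
    """
    match = PRI_RE.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> header")
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2)


def guess_format(message):
    """Heuristic: RFC5424 has a version number and a space right after <PRI>."""
    _, _, rest = decode_pri(message)
    if re.match(r"^[1-9]\d{0,2} ", rest):
        return "RFC5424"
    return "RFC3164"


if __name__ == "__main__":
    for sample in (RFC3164_EXAMPLE, RFC5424_EXAMPLE):
        facility, severity, _ = decode_pri(sample)
        print(guess_format(sample), "facility:", facility, "severity:", severity)
```

Both samples carry priority 34, which decodes to facility 4 (security/authorization) and severity 2 (critical). The old format follows the priority with a month-day-time timestamp without a year, while the new format follows it with the version and an ISO 8601 style timestamp.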

Big Data - Jobs, tools and how to ace it

Big Data: Overview of Structure and Jobs

The demand for big data resources has increased dramatically in the past few years. The requirements to create and get the most out of a "Big Data" environment can be classified into 3 tiers:
- Base Layer - DevOps and Infrastructure
- Mid Layer - Understanding & manipulating data
- Front Layer - Analytics, data science

I feel the jobs surrounding "Big Data" will ultimately reflect this too. Learning Big Data should also be based on these tiers.

Software Suite/Tools

Base Layer - Summary
This layer forms the core infrastructure of a "big data" platform and should be horizontally scalable.
- OS - Linux is the way forward for big data technologies: Red Hat, SUSE, Ubuntu, CentOS
- Distributed computing tools/software - Hadoop, Splunk
- Data storage - Splunk, MongoDB, Apache Cassandra
- Configuration management - Ansible, Puppet, Chef
- Others - Networking knowledge, version control (Git)

Mid Layer - Summary
This