Slide1
MIS 5208
Ed Ferrara, MSIA, CISSP
eferrara@temple.edu
Week 9: Big Data & Splunk
Slide2
Agenda
Chapter 1 Introduction / Splunk & Big Data
What is Big Data?
Alternate Data Processing Techniques
Machine Data
What is Splunk?
Chapter 2
Variety of Data
Dealing with Data
Files & Directories
Slide3
What is Big Data? The Three Vs
Big Data are:
High volume
High velocity
High variety
Information assets that require new forms of processing to enable:
Enhanced decision making
Insight discovery
Process optimization
Volume – Data measured in petabytes
Highway sensors
Data processing logs
Amazon purchase data
Velocity – Speed of data generation and frequency of delivery
Variety – The number of different data types
Slide4
BIG DATA
Facebook had more than 1B users with more than 618M active on a daily basis
LinkedIn had more than 200M members – with the service adding 2 new members every second
Instagram members upload 40M photos per day
Twitter has 500M users – with the service adding 150K per day
WordPress has more than 40M new posts per day
Pandora music streaming service has more than 13,700 years of music
Etc.
Slide5
Splunk and the Kill Chain
There are four classes of data that security teams need to leverage for a complete view:
log data
binary data (flow and PCAP)
threat intelligence data
contextual data
If any of these data types are missing, there’s a higher risk that an attack will go unnoticed.
These data types are the building blocks for knowing what’s normal and what’s not in your environment.
This single question lies at the intersection of both system availability (IT operations and application) and security use cases.
Slide6
Splunk and the Kill Chain
Effective data-driven security decisions require:
Tens of terabytes of data per day without normalization
Access data anywhere in the environment, including:
Traditional security data sources
Personnel time management systems
HR databases
Industrial control systems
Hadoop data stores and custom enterprise applications that run the business
Delivers fast time-to-answer for forensic analysis and can be quickly operationalized for security operations teams
Makes data more available for analysis and helps staff view events in context.
https://www.splunk.com/web_assets/pdfs/secure/Splunk_for_Security.pdf
Slide7
Machine Data
Machine data contains a definitive record of all the activity and behavior of your customers, users, transactions, applications, servers, networks and mobile devices.
Machine data includes:
Configurations
API data
Message queues
Change events
Diagnostic command output
Call detail records
Sensor data from industrial systems
Machine data comes in an array of unpredictable formats, and the traditional set of monitoring and analysis tools was not designed for the variety, velocity, volume or variability of this data. A new approach, one specifically architected for this unique class of data, is required to quickly diagnose service problems, detect sophisticated security threats, understand the health and performance of remote equipment and demonstrate compliance.
Slide8
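To make the point about unpredictable formats concrete, here is a minimal Python sketch that tries several parsers in turn on raw lines. The patterns, field names and sample lines are all hypothetical illustrations, not Splunk's own format definitions.

```python
import json
import re

# Hypothetical pattern for a web-server access line (an assumption for this sketch).
ACCESS_RE = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)
# Generic key=value pairs, with optional double quotes around the value.
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_machine_data(line):
    """Return (format_name, fields) for one raw line, trying several formats in turn."""
    line = line.strip()
    if line.startswith("{"):
        try:
            return "json", json.loads(line)
        except json.JSONDecodeError:
            pass  # not valid JSON after all; fall through to the other formats
    match = ACCESS_RE.match(line)
    if match:
        return "access_log", match.groupdict()
    pairs = KV_RE.findall(line)
    if pairs:
        return "key_value", {key: value.strip('"') for key, value in pairs}
    return "unknown", {"raw": line}


# Made-up sample lines, one per format.
samples = [
    '10.0.0.5 - - [12/Mar/2014:10:15:32 -0500] "GET /cart HTTP/1.1" 404 512',
    'ts=2014-03-12T10:15:33 level=ERROR msg="disk full" host=web01',
    '{"sensor": "pump-7", "pressure_kpa": 412.5}',
    'free-form diagnostic output',
]
for sample in samples:
    print(parse_machine_data(sample))
```

Every new source tends to need another parser in a scheme like this, which is the maintenance burden the slide attributes to traditional tools.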
Splunk Data Sources
Slide9
Machine Data
Data Type: Application Logs
Where: Local log files, log4j, log4net, WebLogic, WebSphere, JBoss, .NET, PHP
What: User activity, fraud detection, application performance

Data Type: Business Process Logs
Where: Business process management logs
What: Customer activity across channels, purchases, account changes, trouble reports

Data Type: Call Detail Records
Where: Call detail records (CDRs), charging data records, event data records logged by telecoms and network switches
What: Billing, revenue assurance, customer assurance, partner settlements, marketing intelligence

Data Type: Clickstream Data
Where: Web server, routers, proxy servers, ad servers
What: Usability analysis, digital marketing and general research

Data Type: Configuration Files
Where: System configuration files
What: How an infrastructure has been set up, debugging failures, backdoor attacks, time bombs

Data Type: Database Audit Logs
Where: Database log files, audit tables
What: How database data was modified over time and who made the changes

Data Type: Filesystem Audit Logs
Where: Sensitive data stored in shared filesystems
What: Monitoring and auditing read access to sensitive data

Data Type: Management and Logging APIs
Where: Check Point firewalls log via the OPSEC Log Export API (OPSEC LEA) and other vendor-specific APIs from VMware and Citrix
What: Management data and log events
Slide10
Machine Data
Data Type: Message Queues
Where: JMS, RabbitMQ, and AquaLogic
What: Debug problems in complex applications and as the backbone of logging architectures for applications

Data Type: Operating System Metrics, Status and Diagnostic Commands
Where: CPU and memory utilization and status information using command-line utilities like ps and iostat on Unix and Linux and performance monitor on Windows
What: Troubleshooting, analyzing trends to discover latent issues and investigating security incidents

Data Type: SCADA Data
Where: Supervisory Control and Data Acquisition (SCADA)
What: Identify trends, patterns, anomalies in the SCADA infrastructure and used to drive customer value

Data Type: Packet/Flow Data
Where: tcpdump and tcpflow, which generate pcap or flow data and other useful packet-level and session-level information
What: Performance degradation, timeouts, bottlenecks or suspicious activity that indicates that the network may be compromised or the object of a remote attack
Slide11
Module Quiz
Machine data is always structured?
True
False
Machine data makes up more than ___% of the data accumulated by organizations.
10%
25%
50%
90%
Answers: False; 90%
Slide12
Module Quiz
Machine data can give you insights into:
Application performance
Security
Hardware monitoring
Sales
User behavior
Machine data is only log files on web servers.
True
False
Answers: All of the Above; False
Slide13
Splunk Components
Slide14
Splunk Components
Slide15
Index Data
Collects data from any source
Data enters
Inspectors decide how to process the data into a consistent format
When the indexer finds a match – Splunk tags the data type for future use
Events are then stored in the Splunk index
Slide16
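The indexing steps (data enters, is inspected, is tagged with a type, and is stored as events) can be sketched in Python as follows. This is a simplified illustration, not Splunk's actual pipeline. The names `_time`, `_raw`, `host` and `sourcetype` mirror Splunk's default fields. The event-breaking rule, the sourcetype rules and the assumption that timestamps are in UTC are all hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: an event starts on any line that begins with an ISO-like timestamp.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

# Hypothetical sourcetype rules: the first pattern that matches wins.
SOURCETYPE_RULES = [
    ("java_stacktrace", re.compile(r"Exception")),
    ("auth", re.compile(r"login", re.IGNORECASE)),
]


def break_events(raw_text):
    """Group raw lines into events; lines without a timestamp join the previous event."""
    events = []
    for line in raw_text.splitlines():
        if TIMESTAMP_RE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def index_event(raw_event, host):
    """Normalize the timestamp to epoch seconds and tag the event with a sourcetype."""
    match = TIMESTAMP_RE.match(raw_event)
    epoch = None
    if match:
        parsed = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        epoch = int(parsed.replace(tzinfo=timezone.utc).timestamp())  # assumes UTC input
    sourcetype = "unknown"
    for name, pattern in SOURCETYPE_RULES:
        if pattern.search(raw_event):
            sourcetype = name
            break
    return {"_time": epoch, "host": host, "sourcetype": sourcetype, "_raw": raw_event}


raw = (
    "2014-03-12 10:15:32 Failed login for user alice\n"
    "2014-03-12 10:15:40 java.lang.NullPointerException: boom\n"
    "    at com.example.Cart.add(Cart.java:42)\n"
    "2014-03-12 10:16:01 heartbeat ok"
)
index = [index_event(event, host="web01") for event in break_events(raw)]
for entry in index:
    print(entry["_time"], entry["sourcetype"])
```

Storing every timestamp in one format (epoch seconds here) is what lets events from different sources be compared and sorted on a single timeline.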
Splunk Event Processing
Slide17
Search & Investigate
Enter a query into the Splunk search bar
Run statistics using the Splunk search language
Collects and indexes log and machine data from any source
Powerful search, analysis and visualization capabilities
Slide18
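As a rough illustration of "search, then run statistics", the Python sketch below filters a handful of made-up events and counts them by a field. It only imitates the idea of a search followed by a `stats count by` step. The events and helper names are hypothetical, and nothing here calls Splunk.

```python
from collections import Counter

# Hypothetical indexed events, already reduced to extracted fields.
events = [
    {"sourcetype": "access_combined", "status": "404", "clientip": "10.0.0.5"},
    {"sourcetype": "access_combined", "status": "200", "clientip": "10.0.0.5"},
    {"sourcetype": "access_combined", "status": "404", "clientip": "10.0.0.9"},
    {"sourcetype": "access_combined", "status": "404", "clientip": "10.0.0.5"},
    {"sourcetype": "syslog", "status": "404", "clientip": "10.0.0.7"},
]


def search(events, **criteria):
    """Keep events whose fields equal every given criterion (the filtering half of a search)."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def stats_count_by(events, field):
    """Count events per value of one field (the statistics half of a search)."""
    return Counter(e[field] for e in events if field in e)


# Roughly what a search such as
#   sourcetype=access_combined status=404 | stats count by clientip
# would report for the sample events above.
matches = search(events, sourcetype="access_combined", status="404")
counts = stats_count_by(matches, "clientip")
print(counts.most_common())
```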
Add Knowledge
Data classification: Event types and transactions
Event types and transactions group together interesting sets of similar events.
Event types group together sets of events discovered through searches, while transactions are collections of conceptually-related events that span time.
Data interpretation: Fields and field extractions
Fields and field extractions make up the first order of Splunk Enterprise knowledge.
The fields that Splunk Enterprise automatically extracts from your IT data help bring meaning to your raw data, clarifying what can at first glance seem incomprehensible.
The fields that you extract manually expand and improve upon this layer of meaning.
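The two layers of field extraction can be sketched in Python: an automatic pass that picks up key=value pairs, and a manual pass that applies a hand-written pattern. The sample event, the regular expressions and the field names `user` and `src_ip` are assumptions made for illustration. Splunk's own extraction rules are configured differently.

```python
import re

# Generic key=value pairs, with optional double quotes around the value.
KV_RE = re.compile(r'(\w+)=("[^"]*"|[^\s,]+)')


def auto_extract(raw):
    """Pull out key=value pairs, similar in spirit to automatic field discovery."""
    return {key: value.strip('"') for key, value in KV_RE.findall(raw)}


def manual_extract(raw, pattern):
    """Apply a hand-written regex with named groups to add fields the automatic pass missed."""
    match = re.search(pattern, raw)
    return match.groupdict() if match else {}


raw_event = (
    'Mar 12 10:15:32 web01 sshd[2211]: Failed password for alice '
    'from 10.0.0.5 port=52113 action="deny"'
)
fields = auto_extract(raw_event)
fields.update(
    manual_extract(raw_event, r"Failed password for (?P<user>\w+) from (?P<src_ip>[\d.]+)")
)
print(fields)
```

The automatic pass finds `port` and `action`, while the user name and source address only become fields once the manual pattern is applied.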
Data models
Data models are representations of one or more datasets, and they drive the Pivot tool, enabling quick generation of useful tables, complex visualizations, and reports without needing to interact with the Splunk Enterprise search language.
Data models are designed by knowledge managers who fully understand the format and semantics of their indexed data.
Knowledge Objects
Event Types
Transactions
Tags
Saved Searches
Lookups
Slide19
Add Knowledge
Data normalization: Tags and aliases
Tags and aliases are used to manage and normalize sets of field information.
You can use tags and aliases to group sets of related field values together, and to give extracted fields tags that reflect different aspects of their identity.
For example, you can group events from a set of hosts in a particular location (such as a building or city) together: just give each host the same tag.
Or maybe you have two different sources using different field names to refer to the same data: you can normalize your data by using aliases (by aliasing clientip to ipaddress, for example).
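A small Python sketch of both ideas follows: an alias table that maps source-specific field names onto one normalized name, and a tag table that groups hosts by location. The tables, host names and location tags are hypothetical. In Splunk these would be defined as knowledge objects, not as Python dictionaries.

```python
# Hypothetical alias table: source-specific field name -> normalized field name.
FIELD_ALIASES = {"clientip": "ipaddress", "src": "ipaddress"}

# Hypothetical tag table: (field, value) -> set of tags.
TAGS = {
    ("host", "web01"): {"philadelphia"},
    ("host", "web02"): {"philadelphia"},
    ("host", "db01"): {"new_york"},
}


def apply_aliases(event):
    """Add the normalized name alongside the original field, leaving the original in place."""
    normalized = dict(event)
    for original, alias in FIELD_ALIASES.items():
        if original in event:
            normalized.setdefault(alias, event[original])
    return normalized


def tags_for(event):
    """Collect every tag attached to any field/value pair in the event."""
    tags = set()
    for field, value in event.items():
        tags |= TAGS.get((field, value), set())
    return tags


events = [
    {"host": "web01", "clientip": "10.0.0.5"},
    {"host": "db01", "src": "10.0.0.5"},
    {"host": "web02", "ipaddress": "10.0.0.9"},
]
normalized = [apply_aliases(e) for e in events]
philadelphia = [e for e in normalized if "philadelphia" in tags_for(e)]
print([e["ipaddress"] for e in normalized])
print([e["host"] for e in philadelphia])
```

After aliasing, all three events can be searched by `ipaddress` regardless of which source produced them, and one tag selects every host in a location.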
Data enrichment: Lookups and workflow actions
Lookups and workflow actions are categories of knowledge objects that extend the usefulness of your data in various ways.
Field lookups enable you to add fields to your data from external data sources such as static tables (CSV files) or Python-based commands.
Workflow actions enable interactions between fields in your data and other applications or web resources, such as a WHOIS lookup on a field containing an IP address.
Data Models cont.
A typical data model makes use of other knowledge object types discussed in this manual, including lookups, transactions, search-time field extractions, and calculated fields.
Slide20
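A field lookup against a static CSV table can be sketched in Python as follows. The table contents and the field name `status_description` are made up for this illustration, and the code only shows the enrichment idea, not how Splunk configures lookups.

```python
import csv
import io

# Hypothetical static lookup table mapping HTTP status codes to descriptions.
LOOKUP_CSV = """status,status_description
200,OK
404,Not Found
500,Internal Server Error
"""


def load_lookup(csv_text, key_field):
    """Read the CSV into a dict keyed by the lookup field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def enrich(event, lookup, key_field):
    """Return a copy of the event with any extra columns from the matching lookup row."""
    enriched = dict(event)
    row = lookup.get(event.get(key_field))
    if row:
        for name, value in row.items():
            enriched.setdefault(name, value)  # never overwrite fields already on the event
    return enriched


lookup = load_lookup(LOOKUP_CSV, "status")
print(enrich({"status": "404", "uri": "/cart"}, lookup, "status"))
print(enrich({"status": "302", "uri": "/login"}, lookup, "status"))
```

Events whose key has no row in the table pass through unchanged, so the lookup adds information without dropping data.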
Monitor & Alert
Type of alert: Alerts based on real-time searches that trigger every time the base search returns a result.
Base search is a...: Real-time search (runs over all time)
Description: Use this alert type if you need to know the moment a matching result comes in. This type is also useful if you need to design an alert for machine consumption (such as a workflow-oriented application). You can throttle these alerts to ensure that they don't trigger too frequently. Referred to as a "per-result alert."
Alert examples:
Trigger an alert for every failed login attempt, but alert at most once an hour for any given username.
Trigger an alert when a "file system full" error occurs on any host, but only send notifications for any given host once per 30 minutes.
Trigger an alert when a CPU on a host sustains 100% utilization for an extended period of time, but only alert once every 5 minutes.
Slide21
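The per-result alert with throttling (for example, one alert per failed login, but at most once an hour per username) can be sketched in Python like this. The class name, event layout and `_time` values are assumptions for illustration, and the sketch expects matching results to arrive in time order.

```python
class ThrottledAlert:
    """Fire on every matching result, but suppress repeats per key for a while."""

    def __init__(self, key_field, suppress_seconds):
        self.key_field = key_field
        self.suppress_seconds = suppress_seconds
        self._last_fired = {}  # key value -> time the alert last fired

    def on_result(self, event):
        """Return True if an alert should fire for this matching result."""
        key = event[self.key_field]
        now = event["_time"]
        last = self._last_fired.get(key)
        if last is not None and now - last < self.suppress_seconds:
            return False  # still inside the throttle period for this key
        self._last_fired[key] = now
        return True


# Failed-login results, throttled to one alert per username per hour.
alert = ThrottledAlert(key_field="user", suppress_seconds=3600)
failed_logins = [
    {"_time": 0, "user": "alice"},
    {"_time": 600, "user": "alice"},
    {"_time": 700, "user": "bob"},
    {"_time": 3600, "user": "alice"},
]
fired = [alert.on_result(event) for event in failed_logins]
print(fired)
```

Throttling is tracked per key, so a burst of failures for one user does not hide a first failure for another.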
Monitor & Alert
Type of alert: Alerts based on historical searches that run on a regular schedule.
Base search is a...: Historical search
Description: This alert type triggers whenever a scheduled run of a historical search returns results that meet a particular condition that you have configured in the alert definition. Best for cases where immediate reaction to an alert is not a priority. You can use throttling to reduce the frequency of redundant alerts. Referred to as a "scheduled alert."
Alert examples:
Trigger an alert whenever the number of items sold in the previous day is less than 500.
Trigger an alert when the number of 404 errors in any 1 hour interval exceeds 100.
Slide22
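The condition behind a scheduled alert such as "more than 100 404 errors in any 1 hour interval" can be sketched in Python as a function that a scheduler would call on each run. Bucketing by clock hour is a simplifying assumption of this sketch, and the event layout is hypothetical.

```python
from collections import Counter


def hours_exceeding(events, status="404", threshold=100):
    """Return the start times (epoch seconds) of clock hours whose count exceeds the threshold."""
    per_hour = Counter(e["_time"] // 3600 for e in events if e.get("status") == status)
    return sorted(hour * 3600 for hour, count in per_hour.items() if count > threshold)


# Made-up data: 101 errors in the first hour, exactly 100 in the second, plus some successes.
events = [{"_time": i * 30, "status": "404"} for i in range(101)]
events += [{"_time": 3600 + i * 30, "status": "404"} for i in range(100)]
events += [{"_time": i * 60, "status": "200"} for i in range(50)]

# A scheduled run would evaluate the condition and alert only if it returns any hours.
offending_hours = hours_exceeding(events)
print(offending_hours)
```

Because the check runs on a schedule over stored data, the alert fires some time after the threshold is crossed, which suits cases where immediate reaction is not a priority.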
Monitor & Alert
Type of alert: Alerts based on real-time searches that monitor events within a rolling time "window".
Base search is a...: Real-time search
Description: Use this alert type to monitor events in real time within a rolling time window of a width that you define, such as a minute, 10 minutes, or an hour. The alert triggers when its conditions are met by events as they pass through this window in real time. You can throttle these alerts to ensure that they don't trigger too frequently. Referred to as a "rolling-window alert."
Alert examples:
Trigger an alert whenever there are three consecutive failed logins for a user between now and 10 minutes ago, but don't alert for any given user more than once an hour.
Trigger an alert when a host is unable to complete an hourly file transfer to another host within the last hour, but don't alert more than once an hour for any particular host.
Slide23
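The rolling-window example (three consecutive failed logins for a user within 10 minutes, throttled to once an hour per user) can be sketched in Python with one small queue per user. The class name, field names and the `action` values are hypothetical, and the sketch assumes events arrive in time order.

```python
from collections import defaultdict, deque


class RollingWindowAlert:
    """Alert on N consecutive failures per user inside a sliding time window."""

    def __init__(self, window_seconds=600, required_failures=3, suppress_seconds=3600):
        self.window_seconds = window_seconds
        self.required_failures = required_failures
        self.suppress_seconds = suppress_seconds
        self._failures = defaultdict(deque)  # user -> times of recent consecutive failures
        self._last_fired = {}                # user -> time the alert last fired

    def on_event(self, event):
        """Return True if this login event should trigger an alert."""
        user, now = event["user"], event["_time"]
        recent = self._failures[user]
        if event["action"] == "success":
            recent.clear()  # a success breaks the run of consecutive failures
            return False
        recent.append(now)
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()  # drop failures that have slid out of the window
        if len(recent) < self.required_failures:
            return False
        last = self._last_fired.get(user)
        if last is not None and now - last < self.suppress_seconds:
            return False  # throttled for this user
        self._last_fired[user] = now
        return True


alert = RollingWindowAlert()
logins = [
    {"_time": 0, "user": "alice", "action": "failure"},
    {"_time": 100, "user": "alice", "action": "failure"},
    {"_time": 200, "user": "alice", "action": "failure"},
    {"_time": 300, "user": "alice", "action": "failure"},
]
print([alert.on_event(event) for event in logins])
```

The third failure fires the alert, and the fourth is suppressed by the per-user throttle even though the window condition still holds.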
Report & Analyze
When you create a search or a pivot that you would like to run again or share with others, you can save it as a report. This means that you can create reports from both the Search and the Pivot sides of Splunk Enterprise.
After you create a report you can:
Run the report on an ad hoc basis to review the results it returns on the report viewing page. You can get to the viewing page for a report by clicking the report's name on the Reports listing page.
Open the report and edit it so that it returns different data or displays its data in a different manner. Your report will open in either Pivot or Search, depending on how it was created.
This topic explains how you can create and edit reports.
In addition, if your permissions enable you to do so, you can:
Change the report permissions to share it with other Splunk Enterprise users.
Schedule the report so that it runs on a regular interval. Scheduled reports can be set up to perform actions each time they're run, such as sending the results of each report run to a set of stakeholders.
Accelerate slow-completing reports built in Search.
Add the report to a dashboard as a dashboard panel.
For more information about scheduling reports, see "Schedule reports," in this manual.
http://docs.splunk.com/Documentation/Splunk/6.0.2/Report/Createandeditreports
Slide24
Splunk User Roles
Slide25
Module 2 Quiz
Which of these is not a main component of Splunk?
Collect and Index the data
Search and Investigate
Add knowledge
Compress and Archive
The index does not play a major role in Splunk
True
False
Answers: Compress and Archive; False
Slide26
Module 2 Quiz
Data is broken into single events by:
Sourcetype
Host
Number of files
The “-” character
Time stamps are stored _____.
In a consistent format
Differently for each indexed item
Differently for each year
As image files
Answers: Sourcetype; In a consistent format
Slide27
Module 2 Quiz
Which role defines what apps a user will see by default:
Admin
Power
User
Which two apps ship with Splunk Enterprise
DB Connect
Search & Reporting
Sideview Utils
Home App
Answers: Admin; Search & Reporting and Home App
Slide28
Installing Splunk
Demonstration
Slide29
Thank you