Slide1
Splunk quick start
Mark Runals
Sr Security Engineer
Slide2
About Me
- Have been using Splunk for ~2 years
- ArcSight admin for 3 years (medium size deployment)
- Motto: Solve for 80% and move on
Slide3
Presentation Focus / Caveats
Focus:
- High level tips on architecture and methodologies that have worked for OSU (potentially best practices)
- Get funding
- Getting started
- Specific use cases
- ROI
Caveats:
- I don’t work for Splunk
- Everyone’s environment is different
- This brief won’t be sufficient to answer all questions =)
Slide4
Agenda
- Misc stuff
- How many FTEs are needed?
- General server architecture
- Premade content
- Commonly used config files
- Keeping configuration files updated
- Index creation strategy
- Misc stuff
Slide5
The value of visualization
[Example dashboard: "External Threats!!!1!!1!1". Panels: Top 5 Countries (China, United States, India, Brazil); Blocked IPs: action taken on 3,225 external IPs attacking us in the last <timeperiod>; Bro and Snort alerts in the last <timeperiod>]
Are you doing this sort of reporting?
Slide6
The value of visualization
[Example panel: Blocked IPs: action taken on 3,225 addresses in the last <timeperiod>]
Slide7
What is Splunk? / Why use Splunk?
Do we need to cover this?
Slide8
Internet2 Splunk Deal
- 3 year term license
- 1 TB max
- More information: http://www.internet2.edu/products-services/cloud-services-applications/splunk/#service-overview
Slide9
How many FTEs?
[Graphic labels: Little data, Lots of data, Complexity]
- Data diversity
- Log volume
- Environmental complexity
- Who creates content
- User diversity
- What’s your end game?
Algorithms I’ve heard:
- 1 FTE per 7 servers
- 1 FTE per TB daily volume
(not 1:1)
Slide10
FTE Requirements
Centrally Managed Service - Large Environment
Service work list:
- New client interaction
- Onboard new data
- Data management
- Knowledge management
- Deploying apps
- Training
- Content creation
- Testing
- Tuning Splunk
- Customer interaction
- Deployment management
- Politics
- Data requests
- General program management
- Planning
- Services support
- Fixing stuff
- General & random BS
Work categories:
- Program & Service Management
- Content Creation
- Care and Feeding
Staffing levels shown: 1 FTE, 2 FTE, 3 FTE
Slide11
Server Architecture
[Graphic from .conf2013 "Best Practices: Deploying Splunk on Physical, Virtual and Cloud Infrastructure"]
Slide12
Server Architecture
Functional Overview
- Search heads: where the user interacts with Splunk (searches, alerts, etc.)
- Indexers: ingest and store data, respond to queries
- Forwarders: collect data and send it to indexers
Note: a single server can perform all three functions depending on data volume
Slide13
Server Architecture
General Guidance
- CPUs / Cores
  - 3 GHz, 12 – 20 total cores
- General rule of thumb for indexers
  - 1 indexer per 100 GB of logs (daily throughput)
- Physical or Virtual
  - Virtual: 20 – 30% reduction in indexing performance
- Storage: Local vs SAN vs NAS vs other
  - >> IOPS is a big performance constraint <<
  - Production: if IOPS < 800 you need a different solution
  - RAID 1+0 arrays
- Windows or Linux
  - Windows: 10 – 20% reduction in indexing performance
Slide14
Server Architecture
Growth Factors
- 1:1 search to core ratio
- Add indexers before search heads
- More servers > fewer beefy servers
- How much incoming data?
- How many concurrent active users?
- Lots of real-time searches?
- What types of searches?
(similar to FTE questions)
Slide15
Content Development
SplunkBase
- SplunkBase: great place to get started
- An app can fulfill three types of functions
  - Data management (i.e. getting data in)
  - Knowledge management (i.e. defining fields)
  - Data visualization
- Suggested apps
  - Splunk on Splunk (SoS)
  - Fire Brigade
  - Windows Security Operations Center
  - Windows / Nix Apps - at least the TAs
  - Deployment Monitor (if using Deployment Server and on 5.x)
Slide16
Splunk Configs
Lots and lots; beyond the scope of this preso
Mostly use:
- inputs.conf: what is ingested (file paths, TCP/UDP ports, scripts). Typically lives on forwarders
- props.conf / transforms.conf: data management instructions (covered in the following slides). Live on indexers/search heads
Slide17
Splunk Configs
inputs.conf
Tells Splunk what data to collect:
- monitor: directories or specific files
- TCP/UDP: ports to listen on
- batch: read and then delete data
- script: run a local script
Common attributes and their general use:
- sourcetype: explicit sourcetyping
- host_segment: especially useful on syslog servers (path split by host)
- index: where should ‘this’ monitored data go
- disabled: some troubleshooting uses
- ignoreOlderThan: good for limiting system load
- crcSalt: read Splunk’s doc; especially useful for small files
Slide18
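As a sketch of how the inputs.conf attributes listed above fit together on a syslog server, the stanza below monitors a per-host directory tree. The directory layout, sourcetype, and index name are assumptions made for illustration, not values from the presentation.

```ini
# inputs.conf on the forwarder running on a syslog server
# Assumption: files land in /var/log/remote/<hostname>/<file>.log
[monitor:///var/log/remote/*/*.log]
# Explicit sourcetyping instead of letting Splunk guess
sourcetype = syslog
# Path split by host: var=1, log=2, remote=3, <hostname>=4
host_segment = 4
# Where 'this' monitored data should go (hypothetical index name)
index = network_syslog
# Skip files not modified in the last 7 days to limit system load
ignoreOlderThan = 7d
# Flip to true to switch the input off while troubleshooting
disabled = false
# crcSalt = <SOURCE>   ; read Splunk's doc before enabling
```

Keeping the index and sourcetype in the input stanza means the decision about where data goes is made once, at the point of collection.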
Splunk Configs
Two main data management configs
- props.conf
- transforms.conf
Capabilities (not a complete list)
- Timestamp recognition
- Line breaking
- Host override
- Sourcetype override
- Simple field extractions
- Complex field creation
Slide19
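To make the props.conf / transforms.conf capabilities listed above more concrete, here is a minimal sketch covering timestamp recognition, line breaking, and a host override. The sourcetype name, the log layout (a `ts=` timestamp and a `server=` token in each event), and the stanza names are assumptions for illustration only.

```ini
# props.conf (hypothetical sourcetype)
[my_app_log]
# Timestamp recognition: timestamp follows "ts=";
# look at most 30 characters past the prefix
TIME_PREFIX = ts=
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 30
# Line breaking: treat every line as its own event
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Host override at index time, defined in transforms.conf
TRANSFORMS-set_host = my_app_host_override

# transforms.conf
[my_app_host_override]
# Capture the value after "server=" and write it to the host field
REGEX = \sserver=(\S+)
FORMAT = host::$1
DEST_KEY = MetaData:Host
```

Because these are index time settings, they need to live wherever the data is first parsed (an indexer or a heavy forwarder).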
Splunk Configs
Props/Transforms Recommendations
[Diagram: for technology x, props.conf and transforms.conf both live in …/deployment-apps/<group>_<technology>_TA]
- Place both config files in the same folder (why? note the DS slides)
- Use a common naming convention
  - Keep in mind alpha sorting
  - Include a way to ID the type of configs
  - Splunk uses ‘TA’ = Technology Add-on
  - Examples: osu_shibboleth_props, osu_netflow_props
Slide20
Splunk Configs
Field Definitions – props.conf
Three options:
1. EXTRACT: relatively simple search time field extractions via regex
   [my_sourcetype]
   EXTRACT-name_field = (?<name>\S+)
   EXTRACT-device = device_id=(?<device>\S+)
2. REPORT: search time fields
3. TRANSFORMS: index time fields
   Both REPORT and TRANSFORMS call transforms.conf
   [my_sourcetype]
   REPORT-<class> = <transforms_stanza_name>
   TRANSFORMS-<class> = <transforms_stanza_name>
Note: defining fields isn’t required to search logs
Slide21
Splunk Configs
Field Extraction
Define fields inline (props):
[my_sourcetype]
EXTRACT-data_fields = user (?<user>\S+) logged in from (?<device>\S+)

Or define them in transforms:
[sourcetype_stanza_1]
REGEX = user (?<user>\S+) logged in from (?<device>\S+)
OR (transforms):
[sourcetype_stanza_1]
REGEX = user (\S+) logged in from (\S+)
FORMAT = user::$1 device::$2

Pro tip: fields for a new data source
1. Create search with rex
2. Email to SME for validation
3. Plug into configs
4. Profit
Slide22
Splunk Configs
Use EXTRACT or REPORT?
Generally speaking, EXTRACT and REPORT do the same thing. However, there are times to use REPORT to call transforms.conf, or to use transforms.conf in general:
- Delimiter based field definition
- Concatenate fields
- Reuse field extractions across multiple data sources/types
- Perform additional extraction within a particular field
- Set up configs for multi-value fields (requires use of fields.conf as well)
Slide23
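Two of the transforms.conf cases listed above (delimiter based fields and reuse across sourcetypes) can be sketched as follows. The sourcetype and stanza names are hypothetical, and the data is assumed to look like `key1=value1,key2=value2`.

```ini
# props.conf: two hypothetical sourcetypes share one extraction
[vendor_a_firewall]
REPORT-kv_pairs = comma_equals_kv

[vendor_b_proxy]
REPORT-kv_pairs = comma_equals_kv

# transforms.conf
[comma_equals_kv]
# First string separates pairs, second separates key from value
DELIMS = ",", "="
# If a key repeats in one event, keep every value (multi-value field)
MV_ADD = true
```

With EXTRACT, the same regex would have to be copied into each sourcetype stanza; the REPORT reference keeps the definition in one place.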
Update Configs
Do you have anything in-house? Chef, Puppet, other?
Our challenges:
- Each college IT shop is autonomous
- Nothing is standard
- No centralized asset management
Splunk Deployment Server
At what point should you use an automated update mechanism?
- Forwarders on servers out of your direct control
- More than one indexer or search head
- More than a handful of forwarders
Slide24
Update Configs
What to manage with Deployment Server?
- Smaller environment (e.g. single server indexer/search head)
  - More focused on forwarder inputs
- Medium to larger environment (e.g. multiple indexer or search head servers)
  - Forwarder inputs
  - Keep server configs in sync
Slide25
Update Configs
Setting up Deployment Server
1. Can be installed on any Splunk server (ideally not an indexer)
2. Put some content in SPLUNK_HOME/etc/deployment-apps
3. Create a serverclass.conf file in SPLUNK_HOME/etc/system/local
4. Create a deploymentclient.conf file on the local agent in SPLUNK_HOME/etc/system/local

Typical serverclass.conf* entry
[serverClass:some_servers]
whitelist.0 = server_name
restartSplunkd = true
[serverClass:some_servers:app:some_content]

Typical deploymentclient.conf
[target-broker:deploymentServer]
targetUri = splunk_ds.mycompany.com:8089

* $SPLUNK_HOME/etc/system/local/serverclass.conf
Slide26
Update Configs
Whitelisting Servers (serverclass.conf)
Options:
- Hostname
Considerations:
- Can use wildcards / regex
- Hostname collision (DC1)
- Requires upfront list of servers
- Did they use a (rational) naming convention?

[serverClass:psychobotany_servers_win]
whitelist.0 = psychobotany_dc01
whitelist.n = random_server_name
[serverClass:psychobotany_servers_win:app:win_inputs]
Slide27
Update Configs
Whitelisting Servers (serverclass.conf)
Options:
- Hostname
- IP address
Considerations:
- Can use wildcards / regex
- Doesn’t support CIDR
- Multiple private IP space?

[serverClass:psychobotany_servers_win]
whitelist.0 = 10.10.10.*
[serverClass:psychobotany_servers_win:app:win_inputs]
Slide28
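For the IP address option above, a minimal sketch of working around the lack of CIDR notation is to list one wildcard entry per range. The second subnet and the excluded address are assumptions added for illustration.

```ini
# serverclass.conf
[serverClass:psychobotany_servers_win]
# No CIDR notation, so each /24 gets its own wildcard entry;
# entries are numbered sequentially starting at 0
whitelist.0 = 10.10.10.*
whitelist.1 = 10.10.11.*
# Hypothetical exclusion: one host inside a whitelisted range
blacklist.0 = 10.10.10.250
[serverClass:psychobotany_servers_win:app:win_inputs]
```

This still does not help when two groups reuse the same private IP space, which is one reason to look at the other whitelisting options.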
Update Configs
Whitelisting Servers (serverclass.conf)
Options:
- Hostname
- IP address
- clientName string
Considerations:
- Can use wildcards / regex
- Key to rollout success at OSU

Local deploymentclient.conf
[deployment-client]
clientName = psychobotany_win_dc01

serverclass.conf
[serverClass:psychobotany_servers_win]
whitelist.0 = psychobotany_win_*
[serverClass:psychobotany_servers_win:app:win_inputs]
Slide29
Update Configs
Random Deployment Server Tips
- One DS can manage ~3k check-ins per minute (Linux) or 500 check-ins per minute (Windows)
- Change the default phonehome interval via a Deployment Server package
  - Great for troubleshooting
  - Default is every 30 seconds
- Can use DS to manage the indexes.conf file on idx/sh
- Put technology X props/transforms in the same package; deploy to both idx/sh
Slide30
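The phonehome tip above can be sketched as a small app pushed from the Deployment Server. The app name and the 300 second value are assumptions for illustration; pick an interval that fits your check-in budget.

```ini
# Hypothetical DS package:
# …/deployment-apps/osu_all_deploymentclient/local/deploymentclient.conf
[deployment-client]
# Check in every 5 minutes to reduce load on the DS
phoneHomeIntervalInSecs = 300
```

Pushing a shorter interval to a single troublesome client, through its own server class, gives faster feedback while troubleshooting without raising load everywhere.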
Update Configs
Splunk Deployment Server
Why bundle props/transforms together?
- Both files have settings that might be applied at index or search time
- Easier to just send updates out once
- Set restartSplunkd to false to avoid inopportune service restarts
- If the initial point of entry is a heavy forwarder (e.g. a syslog server) and you need to change index time fields, send the props/transforms files to it

[serverClass:all_search_heads]
whitelist.0 = search_head_0*
restartSplunkd = false
[serverClass:all_search_heads:app:company_sso_props]
[serverClass:all_search_heads:app:company_firewall_props]

[serverClass:all_indexers]
whitelist.0 = indexer_0*
restartSplunkd = false
[serverClass:all_indexers:app:company_sso_props]
[serverClass:all_indexers:app:company_firewall_props]
Slide31
Index Creation
splunk> index = ??
Slide32
Index Creation
General
- Don’t send data to ‘main’
  - Default out-of-the-box location for data
  - Create an alert to let you know when data IS in the main index
- Give some consideration to log volume
  - No need to be overly granular, but it can help search performance (e.g. finding rare events)
- Create indices with logical / role based boundaries
  - Groups or units, technologies (e.g. database, web, etc.)
  - Easiest way to grant permissions to data
- Use to set retention
  - Age out data based on storage or date
Slide34
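The index guidance above (role based boundaries, permissions, and retention by date or storage) might look like the following sketch. The index name, role name, and limits are assumptions chosen for illustration, not OSU's actual settings.

```ini
# indexes.conf (hypothetical index for one group's Windows data)
[psychobotany_windows]
homePath   = $SPLUNK_DB/psychobotany_windows/db
coldPath   = $SPLUNK_DB/psychobotany_windows/colddb
thawedPath = $SPLUNK_DB/psychobotany_windows/thaweddb
# Age out by date: 90 days * 86400 seconds
frozenTimePeriodInSecs = 7776000
# Age out by storage: cap the index at 50 GB (50 * 1024 MB)
maxTotalDataSizeMB = 51200

# authorize.conf (hypothetical role for that group's admins)
[role_psychobotany_admin]
# Grant search access by index rather than by host or sourcetype
srchIndexesAllowed = psychobotany_*
srchIndexesDefault = psychobotany_windows
```

Whichever retention limit is reached first applies, so it is worth setting both deliberately rather than relying on defaults.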
Index Creation
OSU’s General Strategy
Colleges
- 1 – 5 admins for entire technology stack
- Primary focus: audit compliance
- Large variety of log sources
- Easy RBAC!
- [Diagram labels: Psychobotany, Xenopsychology; Servers, IIS, Firewall x, Firewall y, Apache, IDS]
Office of the CIO
- Service organization
- Dedicated teams at various tiers
- RBAC about to become a PITA
- [Diagram labels: DC Firewalls, Server Management, Middleware, Basketweaving, Syslog]
Slide35
Miscellaneous
Random Thoughts
- Field creation
  - Can create fields using eval statements in props.conf (i.e. calculations, case statements, etc.)
- Shared resource for users?
  - Consider removing users’ scheduled search and real-time search abilities
  - Something to consider based on size/complexity of environment
- Create an app for each group
  - Ability for each group to create and share content ‘internally’
  - Gives group a sense of ownership
- Lots of syslog data?
  - Don’t send it directly to the indexers
  - Receive it on a server and ingest with a local universal or heavy forwarder
    - Universal forwarder: more efficient with high loads
    - Heavy forwarder: can adjust index time fields w/o restarting your indexers (e.g. the host field)
Slide36
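The eval based field creation mentioned above can be sketched with calculated fields in props.conf. The sourcetype and the `bytes` and `status` fields are assumed to exist already; they are placeholders, not fields from the presentation.

```ini
# props.conf (hypothetical sourcetype and field names)
[my_web_access]
# Calculation: convert an existing bytes field to megabytes
EVAL-mb = round(bytes / 1024 / 1024, 2)
# Case statement: label an existing numeric status code
EVAL-outcome = case(status >= 500, "server_error", status >= 400, "client_error", true(), "ok")
```

These fields are computed at search time, so they can be changed later without re-indexing any data.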
Miscellaneous
Splunk Config Order of Precedence
On boot:
1. SPLUNK_HOME/etc/default/…
2. SPLUNK_HOME/etc/apps/default/0-9…
3. SPLUNK_HOME/etc/apps/default/a-z….
4. SPLUNK_HOME/etc/apps/local/0-9…
5. SPLUNK_HOME/etc/apps/local/a-z….
6. SPLUNK_HOME/etc/local/…
Quick Takeaways
- Upgrades overwrite ../default/.. files
- Make all modifications in ../local/.. (might mean making a file)
- Last attribute read in ‘wins’ if it exists in multiple config files
Slide37
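The default versus local takeaway above can be illustrated with a single attribute. This sketch assumes the full system-level paths (etc/system/default and etc/system/local), which the slide abbreviates, and uses the Splunk Web port as the example; the 8443 value is hypothetical.

```ini
# $SPLUNK_HOME/etc/system/default/web.conf
# Shipped with Splunk and overwritten on upgrade: do not edit
[settings]
httpport = 8000

# $SPLUNK_HOME/etc/system/local/web.conf
# Your override: create the file if it does not exist
[settings]
httpport = 8443
```

Only the attribute being changed needs to appear in the local file; every other setting continues to come from the default copy.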
Miscellaneous
Random Admin Queries

Check for agents phoning home (lots of troubleshooting opportunities)
index=_internal source=*splunkd_access.log POST phonehome

Watch for packages being installed/uninstalled
index=_internal sourcetype=splunkd deployedapplication (removing OR installing OR uninstalling) NOT "removing app at location"
  | rex "DeployedApplication - (?<Action>\S+)\sapp(\=|\S+\s)(?<App>\S+)"
  | eval Action = case(Action="Removing" , "Removing" , Action="Uninstalling" , "Removing" , Action="Installing" , "Installing" , 1=1,"Fix me")
  | rex "(Removing|Installing) app=(?<Version>\S+)"
  | eval Version = if(isnull(Version),"5x","-= 6x =-")
  | dedup _time host Action App Version
  | table _time host Action App Version
  | sort -_time

Busy agent processing a lot of files
index=_internal "File descriptor cache is full"
  | rex "is full \((?<fd_limit>\d+)"
  | stats count by host, fd_limit
  | sort -fd_limit, -count
Slide38
Miscellaneous
Random Admin Queries

Check for agents pushing a lot of content
index=_internal "current data throughput"
  | rex "Current data throughput \((?<kb>\S+)"
  | eval rate=case(kb < 500, "256", kb >= 500 AND kb < 520, "512", kb >= 520 AND kb < 770, "768", kb >= 770 AND kb < 1210, "1024", 1=1, "Other")
  | stats count sparkline by host, rate
  | where count > 4
  | sort -rate,-count

Check for file/folder monitoring permission errors
index=_internal "permission denied"
  | stats count by host
  | sort -count

Alert on missing apps relative to serverclass.conf (i.e. spelling issues)
index=_internal source=*splunkd.log (component=application OR component=serverclass) warn OR error
Slide39
Miscellaneous
Random Admin Queries

Events of Interest (accounts created, deleted, delete command used, etc.)
(index=_internal "No space left on device") OR (index=_audit "| delete" NOT "index=*_audit") OR (index=_audit action="login attempt" info=failed sourcetype="audittrail") OR (index=_internal source=*splunkd.log component=serverclass warn NOT "machineTypes in app * is deprecated") OR (index=_audit action=edit_user (operation=create OR operation=remove))
  | eval Alert = case(action="edit_user" AND operation="create", "User account created", action="edit_user" AND operation="remove", "User account deleted", match(_raw, "Unable to load application"), "Serverclass.conf issue", match(_raw, "delete"), "Delete used", action="login attempt" AND info="failed", "Failed local login", match(_raw,"No space left on device"), "No space on device", 1=1, "fix me" )
  | eval Message = case(Alert="User account deleted", "User: " .user. " Deleted: " .object, Alert="User account created", "User: " .user. " Created: " .object, Alert="Failed local login", "User: " .user, Alert="Delete used", "User: " .user. " Search: " .search, Alert="Serverclass.conf issue", message. " (Probably a spelling issue)", Alert="No space on device", "Diskspace or inodes issues", 1=1, "fix me")
  | eval a_time = strftime(_time,"%m/%d/%y %k %p")
  | stats count by a_time host Alert Message
Slide40
?
Slide41
Resources
- runals.3@osu.edu
- runals.blogspot.com
- SplunkBase: apps.splunk.com
- Splunk Forum: answers.splunk.com
- Splunk Installation Manual (reference architecture, supported OS, etc.): http://docs.splunk.com/Documentation/Splunk/latest/Installation/Whatsinthismanual