Better Analytics Demand Better Dataflow

delilah
Uploaded On 2021-01-05

Presentation Transcript

Apache NiFi
Better Analytics Demand Better Dataflow
Presented by: Joe Witt, Apache NiFi PPMC Member

Slide 1: Apache NiFi’s job: Enterprise Dataflow Management
Automate the flow of data from any source…
to systems which extract meaning and insight…
and to those that store and make it available for users

Slide 2: Analytics need data with the following characteristics:
• Quality: Correct, complete, reliable
• Relevance: Right size, rate, format, schema, content, lightweight analysis
• Timeliness: All data has a half-life. Not all data is created equal.
• Secure: Confidential, unaltered
• Compliant: Authorized, traceable
• Recoverable: Errors happen. Iterate until it’s right.

Slide 3: Enterprise Dataflow: “What could possibly go wrong?”
Dataflow: Route, Transform, Mediate
Acquire
Analyze
Store

Slide 4: Dataflow across the enterprise
Edge Sites
Regional Sites
Corporate Datacenters
Partners

Slide 5: Challenges at the edge
Edge Sites
• Devices may
  • Have low power
  • Use legacy protocols and formats
  • Use emerging protocols and formats
• Communications may be
  • Unstable
  • High latency / Low throughput
  • Expensive
• Data acquired may be
  • Erroneous
  • Devoid of value or ‘noisy’
  • Time sensitive or tolerant
  • Of differing priority
  • Sensitive

Slide 6: Challenges at the core
Corporate Datacenters
Data may need transformation
• Enrichment
• Format/schema conversion
• Splitting or Aggregation
Systems may be
• Down, degraded, returning to service
• Rate or throughput sensitive
• Authorized for a subset of data
Scaling and reliability
• Controlled data loss only
• Up (node efficient) & Out (global volume)
Governance
• Keeping track of all the information flows
• Ability to understand and manage the flows
• Ability to detect and recover from mistakes
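
Slide 5 notes that acquired data may be "of differing priority", and slide 6 notes that downstream systems may be "rate or throughput sensitive". One common way to handle both is a bounded, prioritized hand-off queue between two processing steps. The slides do not spell out this design, so treat it as an assumption made for illustration. The class name `BoundedPriorityQueue`, its methods, and the sample items below are all hypothetical and are not part of any NiFi API.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Hypothetical hand-off queue between two dataflow steps (not a NiFi class)."""

    def __init__(self, max_items: int):
        self.max_items = max_items
        self._heap = []
        # Monotonic counter used as a tie-breaker so equal priorities stay FIFO.
        self._order = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.max_items

    def offer(self, item, priority: int) -> bool:
        """Refuse new work when full, so the upstream step has to slow down."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BoundedPriorityQueue(max_items=2)
accepted = [
    queue.offer("routine telemetry", priority=5),
    queue.offer("alarm", priority=1),
    queue.offer("debug log", priority=9),  # refused: the queue already holds 2 items
]
first = queue.poll()  # the alarm is delivered before the routine telemetry
```

Returning `False` from `offer` rather than dropping or blocking is one design choice among several; it leaves the decision about what to do with refused data to the caller, which fits the "controlled data loss only" goal on slide 6.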

Slide 7: Apache NiFi Foundational Concepts
1. The basic building blocks
2. Real-time Command and Control
3. The Power of Provenance

Slide 8: Flow File
HEADER
- UUID
- Name
- Size
- Entry Time
Attributes Map [[Key | Value]]
CONTENT
• Types
  • Events
  • Objects
  • Files
  • Messages
  • Media
• Formats
  • JSON
  • Avro
  • Text
  • Mp4
  • Proprietary
• Sizes
  • Bytes to GBs
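
The Flow File layout on slide 8 (a header with UUID, name, size, and entry time, an attributes map of key/value pairs, and opaque content) can be sketched as a small data structure. This is an illustrative model only, not NiFi's actual class. The field names, the immutable design, the `with_attribute` helper, and the `mime.type` attribute key are assumptions chosen for the example.

```python
import time
from dataclasses import dataclass, field, replace
from uuid import uuid4


@dataclass(frozen=True)
class FlowFile:
    """Illustrative model of the slide's Flow File (not the NiFi implementation)."""

    content: bytes = b""                            # opaque payload: JSON, Avro, text, media...
    name: str = ""
    attributes: dict = field(default_factory=dict)  # the Key | Value attributes map
    uuid: str = field(default_factory=lambda: str(uuid4()))
    entry_time: float = field(default_factory=time.time)

    @property
    def size(self) -> int:
        """Size in bytes, derived from the content rather than stored separately."""
        return len(self.content)

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        """Return a copy with one attribute added; identity and content are unchanged."""
        return replace(self, attributes={**self.attributes, key: value})


ff = FlowFile(content=b'{"temp": 21.5}', name="reading.json")
tagged = ff.with_attribute("mime.type", "application/json")
```

Keeping the content as raw bytes mirrors the slide's point that a Flow File can carry any type or format, from a few bytes to gigabytes; only the attributes are interpreted by the flow.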

Slide 9: Flow File Processor

Slide 10: Connections

Slide 11: Flow Controller

Slide 12: NiFi Architecture

Slide 13: NiFi Clustering Model

Slide 14: 2. Real-time command and control
Tighten the feedback loop
• Changes have consequences (good or bad)
• And you see them as they occur
Continuous Improvement
• Compare real-time vs. historical statistics
• View data provenance
• View Content at any stage
Intuitive user experience
• Visual programming
• Logical flow graph

Slide 15: 3. The Power of Provenance aka “Dude, where’s my data?”
Latency Optimization
• Intra process
• Inter process
• End-to-end
Compliance
• Prove handling
• Assess impact
Understanding
• Step through time
• View content
• View Context

Slide 16: Status and direction for NiFi
Existing Strengths
- Efficient use of each node
  - 100s of MB/s per node
  - 100Ks transactions/s per node
- Simple / Effective scaling model
- Runtime Command and Control
- Data Provenance
Roadmap Highlights
- Distributed durability of data
  - Maybe Kafka backed queues
- High Availability Cluster Manager
- Live / Rolling Upgrades
- Provenance Query Language / Reporting
- A complete user experience enabled by provenance

Slide 17: Learn more abou
Apache NiFi (incubating) site
http://nifi.incubator.apache.org
Subscribe to and collaborate at
dev@nifi.incubator.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
@ApacheNifi
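
Slide 15 lists what provenance is for: measuring latency, proving how data was handled, and stepping through time. A minimal sketch of that idea follows, assuming provenance is kept as an append-only list of events keyed by Flow File UUID. The event type labels, component names, and the `ProvenanceLog` class are hypothetical and do not reflect NiFi's actual provenance schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_uuid: str
    event_type: str   # hypothetical labels such as "RECEIVE", "ROUTE", "SEND"
    component: str    # which step of the flow handled the data
    timestamp: float  # seconds; any monotonic clock would do for this sketch


class ProvenanceLog:
    """Append-only event log answering "where has this piece of data been?"."""

    def __init__(self):
        self._events = []

    def record(self, flowfile_uuid, event_type, component, timestamp):
        self._events.append(
            ProvenanceEvent(flowfile_uuid, event_type, component, timestamp)
        )

    def history(self, flowfile_uuid):
        """All events for one Flow File, oldest first (stepping through time)."""
        events = [e for e in self._events if e.flowfile_uuid == flowfile_uuid]
        return sorted(events, key=lambda e: e.timestamp)

    def latency(self, flowfile_uuid):
        """End-to-end latency: time between the first and last recorded event."""
        events = self.history(flowfile_uuid)
        if not events:
            return None
        return events[-1].timestamp - events[0].timestamp


log = ProvenanceLog()
log.record("ff-1", "RECEIVE", "edge-listener", 100.0)
log.record("ff-2", "RECEIVE", "edge-listener", 100.5)
log.record("ff-1", "ROUTE", "priority-router", 100.2)
log.record("ff-1", "SEND", "analytics-sink", 101.0)

# The ordered list of components is the data's path through the flow.
path = [e.component for e in log.history("ff-1")]
```

Because events are only ever appended, the same log can serve both the compliance use ("prove handling") and the latency use without either one altering what the other sees.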