Slide1
Scientists See Promise in Deep-Learning Programs
Microsoft Seeks an Edge in Analyzing Big Data
Jeff Hawkins Develops a Brainy Big Data Company
Google Offers Big-Data Analytics
The Age of Big Data
How Big Data Became So Big
Why Hire a Lawyer? Computers Are Cheaper
Armies of Expensive Lawyers, Replaced by Cheaper Software
Slide2
The total amount of digital data in the world is estimated to exceed 1.8 Zettabytes (1.8 TRILLION Gigabytes)
The digital universe is doubling every 2 years
85% of that data is owned or controlled by corporations at some point in its lifecycle
Source: International Data Corporation (IDC) Study, 2012
Slide3
Big Data is Here
And it’s coming soon to a litigation near you…
What’s changed?
Slide4
The Great Commingling
Slide5
Redefining scalability in eDiscovery.
1, 1000, 1 × 10^12
Slide6
Predictive Coding is a Form of Machine Learning
What is Machine Learning?
Slide7
voice recognition software, e.g., calling your bank or credit card company
handwriting, facial or fingerprint recognition
analyzing market trends and guiding investment decisions
making decisions on applications for credit or loans
modeling and predicting severe weather patterns
filtering spam in your email inbox
targeted marketing on the internet
robotics
It’s already a part of our lives. . .
Slide8
KEY POINT: Predictive coding is just one part of a continuum of technology-assisted review (TAR) methods that we are already very familiar with in searching and analyzing data.
Key Words, Concept Clustering, Concept Search, Predictive Coding
Three supporting propositions:
Each successive approach incorporates the preceding approaches.
Each successive approach contains more supporting criteria.
All are ultimately based on the concept of pattern matching.
Slide9
Key Words = Simple pattern matching
External input: “wild,” “wolf,” “pet”
Terms shown: dog, cat, rhino, ferret, goldfish, cow, wolf, domestic, wild, pet
Slide10
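The key-word approach on the preceding slide (simple pattern matching against externally supplied terms) can be sketched roughly as follows. This is only an illustration: the three sample documents and the function name `keyword_hits` are hypothetical and do not come from the deck.

```python
import re

def keyword_hits(documents, keywords):
    """Return {doc_id: set of keywords that literally appear in the text}."""
    # Whole-word, case-insensitive patterns: this is plain pattern matching.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = {}
    for doc_id, text in documents.items():
        matched = {kw for kw, pattern in patterns.items() if pattern.search(text)}
        if matched:
            hits[doc_id] = matched
    return hits

# Hypothetical mini-collection (not from the slides).
documents = {
    "doc1": "The wolf is a wild animal.",
    "doc2": "A goldfish makes a quiet pet.",
    "doc3": "Wolves and tigers roam the reserve.",
}
keywords = ["wild", "wolf", "pet"]  # the external input shown on the slide
result = keyword_hits(documents, keywords)
```

Note that `doc3` is missed even though it is about wolves: literal matching finds only the exact strings supplied, which is the limitation the later approaches try to address.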
Concept Clustering = Organization based on internal relationships
Terms shown: dog, cat, domesticated, wild, pet, rhino, ferret, goldfish, cow, wolf, tiger
01110111011010010110110001100100 (wild)
011001000110111101100111 (dog)
011100000110010101110100 (pet)
Slide11
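Concept clustering, as described on the preceding slide, organizes terms by their internal relationships with no external input. A deliberately crude sketch, assuming "relationship" means "appears in the same document" and grouping terms into connected components, might look like this. Real tools use statistical methods that the deck does not specify; the sample term lists and the name `cluster_terms` are hypothetical.

```python
from itertools import combinations

def cluster_terms(documents):
    """Group terms that co-occur in at least one document (connected components)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for terms in documents:
        for term in terms:
            find(term)  # register every term, even in single-term documents
        for a, b in combinations(terms, 2):
            union(a, b)

    clusters = {}
    for term in parent:
        clusters.setdefault(find(term), set()).add(term)
    return sorted(sorted(cluster) for cluster in clusters.values())

# Hypothetical documents, each reduced to the terms it contains.
documents = [
    ["dog", "cat", "pet", "domesticated"],
    ["ferret", "goldfish", "pet"],
    ["wolf", "tiger", "rhino", "wild"],
    ["cow", "domesticated"],
]
clusters = cluster_terms(documents)
```

No key words were supplied: the two groups (roughly "domesticated/pet" and "wild") emerge from the documents alone.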
Concept Searching = Key words + Concept organization
External input: “zoo,” “wild,” “domesticated”
Terms shown: dog, cat, rhino, ferret, goldfish, cow, wolf, domesticated, wild, pet, tiger
Labels: farm, zoo
011110100110111101101111 (zoo)
01110111011010010110110001100100 (wild)
011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)
Slide12
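Concept searching, as summarized on the preceding slide, combines key words with concept organization. One way to picture it, as a sketch only, is to expand each key word to the concept group it belongs to and then match on the expanded set. The groups, documents, and function names below are hypothetical; the deck does not say how any particular product does this.

```python
import re

def expand_query(query_terms, clusters):
    """Add every term from any concept group that contains a query term."""
    query = set(query_terms)
    expanded = set(query)
    for cluster in clusters:
        if query & cluster:
            expanded |= cluster
    return expanded

def concept_search(documents, query_terms, clusters):
    """Return ids of documents containing any term of the expanded query."""
    terms = expand_query(query_terms, clusters)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if terms & set(re.findall(r"[a-z]+", text.lower()))
    )

# Hypothetical concept groups, e.g. the output of a clustering pass.
clusters = [
    {"zoo", "wild", "wolf", "tiger", "rhino"},
    {"farm", "domesticated", "cow"},
    {"pet", "dog", "cat", "ferret", "goldfish"},
]
documents = {
    "doc1": "The tiger paced in its enclosure.",
    "doc2": "The cow grazed all day.",
    "doc3": "Quarterly sales report.",
}
zoo_hits = concept_search(documents, ["zoo"], clusters)
both_hits = concept_search(documents, ["zoo", "domesticated"], clusters)
```

A literal key-word search for "zoo" would return nothing here; the concept search returns `doc1` because "tiger" sits in the same group as "zoo".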
Predictive Coding = document-level input + probabilistic modeling
Terms shown: dog, cat, rhino, ferret, goldfish, cow, wolf, domesticated, wild, pet, tiger
Labels: farm, zoo
External input: human-coded documents
Output: doc-level probability rankings
011110100110111101101111 (zoo)
01110111011010010110110001100100 (wild)
011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)
Slide13
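The bit strings that accompany the concept-clustering, concept-searching and predictive-coding slides appear to be the 8-bit ASCII encodings of the words next to them, presumably to make the point that the software sees only patterns of bits. Assuming that reading is right, a short sketch can decode them; the helper names are hypothetical.

```python
def decode_bits(bits):
    """Decode a string of 0s and 1s as consecutive 8-bit ASCII characters."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

def encode_text(text):
    """Encode text as a string of 8-bit binary character codes."""
    return "".join(format(ord(ch), "08b") for ch in text)

# Bit strings copied from the slides.
wild = decode_bits("01110111011010010110110001100100")
dog = decode_bits("011001000110111101100111")
pet = decode_bits("011100000110010101110100")
```

Running the decoder on the three strings gives back "wild", "dog" and "pet", matching the labels on the slides.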
Infer
Step 1: Sample documents from the entire set.
Slide14
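Step 1 (sampling documents from the entire set) can be sketched as below. The deck does not say how the sample is drawn or how large it should be, so simple random sampling without replacement, the collection size of 10,000, and the sample size of 500 are all assumptions made for illustration.

```python
import random

def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a simple random sample of document ids, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(list(doc_ids), sample_size)

# Hypothetical collection of 10,000 document ids.
doc_ids = [f"DOC-{n:05d}" for n in range(1, 10001)]
sample = draw_sample(doc_ids, 500)
```

Fixing the seed is a design choice aimed at being able to reproduce and explain the draw later; it is not something the slides prescribe.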
Step 2: Attorney review of sample documents to create training and control sets.
In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.
Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.
Responsive
Not Responsive
Slide15
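Step 2 ends with two sets of attorney-coded documents: a training set and a control set. A minimal sketch of that split follows, assuming a random partition with 30% held back as the control set. The fraction, the document ids, and the codes are hypothetical; the deck gives no numbers.

```python
import random

def split_coded_sample(coded_docs, control_fraction=0.3, seed=0):
    """Split attorney-coded documents into (training, control) dictionaries."""
    items = sorted(coded_docs.items())  # stable order before shuffling
    rng = random.Random(seed)
    rng.shuffle(items)
    n_control = int(len(items) * control_fraction)
    control = dict(items[:n_control])
    training = dict(items[n_control:])
    return training, control

# Hypothetical attorney codes: True = Responsive, False = Not Responsive.
coded_docs = {f"DOC-{n:03d}": (n % 2 == 0) for n in range(1, 11)}
training, control = split_coded_sample(coded_docs)
```

The control set is kept out of training so that it can serve later as an independent check on the model.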
Step 3: Create model from human-coded training set (responsive and not responsive).
In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.
Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.
wolves, wolf, pet
Word      Pos.   Neg.
wolf      .98    .08
dog       .56    .43
pet       .42    .28
raise     .61    .09
costner
dances
Word      Assoc      %
wolf      pet        .73
dog       wolf       .43
pet       raise      .88
raise     wolf       .61
raise     werewolf
011001000110111101100111
Slide16
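The Word / Pos. / Neg. table in Step 3 suggests a model built from per-word statistics over the human-coded training set. The slides do not say how those figures are computed. The sketch below assumes "Pos." is the fraction of responsive training documents containing the word and "Neg." the fraction of non-responsive ones; the training texts and the name `train_word_table` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train_word_table(training_docs):
    """training_docs: list of (text, is_responsive). Returns {word: (pos, neg)}."""
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for text, is_responsive in training_docs:
        words = tokenize(text)
        if is_responsive:
            n_pos += 1
            pos_counts.update(words)
        else:
            n_neg += 1
            neg_counts.update(words)
    table = {}
    for word in set(pos_counts) | set(neg_counts):
        pos = pos_counts[word] / n_pos if n_pos else 0.0
        neg = neg_counts[word] / n_neg if n_neg else 0.0
        table[word] = (pos, neg)
    return table

# Hypothetical human-coded training documents.
training_docs = [
    ("Can the wolf be domesticated as a pet?", True),
    ("Some people raise a wolf as a pet.", True),
    ("The wolf is a villain in Aesop.", False),
    ("Quarterly sales figures.", False),
]
table = train_word_table(training_docs)
```

In this toy set "wolf" appears in both responsive documents and one of the two non-responsive ones, so it is a weaker signal than "pet", which appears only in responsive documents.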
Step 4: Test model against sample (human-coded) set.
"Dances With Wolves" has the makings of a great work, one that recalls a variety of literary antecedents, everything from "Robinson Crusoe" and "Walden" to "Tarzan of the Apes." Michael Blake's screenplay touches both on man alone in nature and on the 19th-century white man's assuming his burden among the less privileged.
Wolves are sometimes kept as exotic pets, and in some rarer occasions, as working animals. Although closely related to dogs (which are believed to have split from wolves between 10,000 and 100,000 years ago), wolves do not show the same tractability as dogs in living alongside humans. Wolves also need much more space than dogs, about 10-15 sq. miles.
Slide17
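Step 4 compares the model's calls with the attorneys' coding on documents the model was not trained on. A sketch of that comparison, reporting the usual precision and recall figures, is below. The document ids and codes are hypothetical, and the deck does not say which measures a given tool reports.

```python
def evaluate(predicted, human_coded):
    """Compare model calls with human codes (doc id -> True if responsive)."""
    tp = sum(1 for d, y in human_coded.items() if y and predicted[d])
    fp = sum(1 for d, y in human_coded.items() if not y and predicted[d])
    fn = sum(1 for d, y in human_coded.items() if y and not predicted[d])
    tn = sum(1 for d, y in human_coded.items() if not y and not predicted[d])
    # Precision: share of documents the model called responsive that really are.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of truly responsive documents that the model found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Hypothetical control set of eight documents.
human_coded = {"a": True, "b": True, "c": True, "d": True,
               "e": False, "f": False, "g": False, "h": False}
predicted = {"a": True, "b": True, "c": True, "d": False,
             "e": True, "f": False, "g": False, "h": False}
report = evaluate(predicted, human_coded)
```

Here the model finds three of the four responsive documents (recall 0.75) and three of its four "responsive" calls are correct (precision 0.75).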
Yes
No
Apply model to remainder of documents that have not been reviewed
Responsive
Non-responsive
Slide18
Step 5: Apply model to entire set and rank documents.
Ranking scale: 100%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 0%
Slide19
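Step 5 scores every document and ranks the set from most to least likely responsive. The sketch below reuses the Pos./Neg. figures from the Step 3 table and combines them in a simplified naive-Bayes style. Treating those figures as word probabilities, the 0.5 prior, the 0.01 floor, and the three sample documents are all assumptions; the slides do not describe the actual model.

```python
import math
import re

def responsiveness_probability(text, word_table, prior=0.5, floor=0.01):
    """Estimate P(responsive) from the known words present in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    log_pos = math.log(prior)
    log_neg = math.log(1 - prior)
    for word in words:
        if word in word_table:
            pos, neg = word_table[word]
            log_pos += math.log(max(pos, floor))  # floor avoids log(0)
            log_neg += math.log(max(neg, floor))
    top = max(log_pos, log_neg)  # subtract the max for numerical stability
    p, n = math.exp(log_pos - top), math.exp(log_neg - top)
    return p / (p + n)

def rank_documents(documents, word_table):
    """Return (probability, doc_id) pairs, most likely responsive first."""
    scored = [(responsiveness_probability(text, word_table), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Pos./Neg. figures taken from the Step 3 slide.
word_table = {"wolf": (0.98, 0.08), "dog": (0.56, 0.43),
              "pet": (0.42, 0.28), "raise": (0.61, 0.09)}
# Hypothetical unreviewed documents.
documents = {
    "A": "People who raise a wolf as a pet",
    "B": "My dog sleeps all day",
    "C": "Quarterly sales report",
}
ranking = rank_documents(documents, word_table)
```

Document A, with three strongly "positive" words, lands near the top of the scale; document C contains no known words and stays at the 50% prior.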
PREDICTIVE CODING AND BIG DATA
NYLJ/Pangea3 Webinar
April 15, 2013
Slide20
OUTLINE
Mitigating Big Data in E-Discovery
Stakeholder Analysis
The New Reality of Predictive Coding
Long-Term Trends
Slide21
Mitigating Big Data in e-discovery
Predictive Coding and Big Data
Slide22
BIG DATA IN E-DISCOVERY
Bigger haystack: more documents in general
Corporate data culture: more relevant documents
More sources: poses collection/preservation challenges
Slide23
MITIGATING BIG DATA IN E-DISCOVERY
Some mitigating factors:
Principles of proportionality and cooperation
Information governance tools and document management
Technology-assisted review and predictive coding
Slide24
Stakeholder analysis
Predictive Coding and Big Data
Slide25
PREDICTIVE CODING STAKEHOLDER ANALYSIS
Judges: generally receptive
Clients: cost efficiencies vs. risk management
Lawyers: new model, building expertise
Slide26
The new reality of predictive coding
Predictive Coding and Big Data
Slide27
NEW REALITY OF PREDICTIVE CODING
Slide28
Long-term trends
Predictive Coding and Big Data
Slide29
LONG-TERM TRENDS
Over time, Big Data growth > predictive coding benefits
Some document-by-document human review necessary
Strategic nuances in a new discovery battleground
Slide30
CONTACT PANGEA3
Slide31
SEARCH (1)
How do we search for discoverable ESI?
Manually?
With automated assistance?
Which is “better” and why?
M.R. Grossman & G.V. Cormack, “The Grossman-Cormack Glossary of Technology-Assisted Review,” 7 Fed. Cts. Law R. 1 (2013)
Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review,” XVII Rich. J.L. & Tech. 11 (2011) (available at http://jolt.richmond.edu/v17i3/article11.pdf)
For a “shorter” discussion, see Efficient E-Discovery, ABA Journal 31 (Apr. 2012)
Slide32
SEARCH (2)
Using search terms? How accurate are these? See In re National Ass’n of Music Merchants, Musical Instruments and Equipment Antitrust Litig., 2011 WL 6372826 (S.D. Ca. Dec. 19, 2011)
Slide33
SEARCH (3)
Automated review or “predictive coding” as an alternative to the use of search terms. For decisions which address automated review, see:
EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409 (Del. Ct. Ch. Oct. 15, 2012)
In re Actos (Pioglitazone) Prod. Liability Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012)
Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24), aff’d, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Apr. 26, 2012)
Global Aerospace Inc. v. Landow Aviation, L.P., Consol. Case No. CL 61040 (VA Cir. Ct. Apr. 23, 2012)
Slide34
SEARCH (4)
WHAT LESSONS CAN BE DRAWN FROM THE DECISIONS?
Judge approved automated search at a “threshold” level. “Results” may be subject to challenge and later rulings.
Threshold superiority of automated vs. manual review recognized given volume of ESI and attorney review costs.
Large volumes of ESI in issue.
Party seeking to do automated review must offer “transparency of process” or something close to it.
“Reasonableness” of methodology is key.
Speculation by the opposing party is insufficient to defeat threshold approval.
Slide35
SEARCH (5)
LET’S TAKE A DEEP BREATH AND RECAP WHERE WE ARE TODAY, VENDOR HYPE NOTWITHSTANDING:
We have yet to see a judicial analysis of process and results in a contested matter.
Safe to assume that the proponent of a process will bear the burden of proof (whatever that burden might be).
Safe to assume at least some transparency of process may/will be expected.
If “reasonableness” is standard, how reasonable must the results be? Is “precision” of 80% enough? 90%? Remember, there are no agreed-on standards.
Slide36
INTERLUDE
Assume a party makes production of ESI based on search terms proposed by an adversary. Assume further that the adversary suspects “something” is missing.
Is suspicion enough to warrant direct access to the party’s databases by a consultant retained by the adversary?
If not, what proofs should be required?
Will an attorney’s certification or affidavit suffice?
Will/should the attorney become a witness?
Will experts be needed?
Note, with regard to proofs, S2 Automation LLC v. Micron Technology, Inc., No. 11-0884 (D.N.M. Aug. 9, 2012), where the court, relying on Rule 26(g)(1), required a party to disclose its search methodology.
Slide37
INTERLUDE
A collision between search and ethics?
Assume a party’s attorney knows that search terms proposed by adversary counsel, if applied to the party’s ESI, will not lead to the production of relevant (perhaps highly relevant) ESI.
Absent a lack of candor to adversary counsel or the court under RPC 3.4 (which implies, if not requires, some affirmative statement), does not RPC 1.6 require the party’s attorney to remain silent?
What if the “nonproduction” is learned later? If nothing else, will the party’s attorney suffer bad “PR”?
If the party’s attorney wants to advise the adversary, should the attorney secure her client’s informed consent? What if the client says, “no?”
(with thanks to the Hon. John M. Facciola)
Slide38
INTERLUDE
AS WE THINK ABOUT SEARCH, THINK ABOUT THE ETHICS ISSUES THAT USE OF A NONPARTY VENDOR MAY LEAD TO!