Creighton Barrett, Digital Archivist, Dalhousie University Archives

Uploaded On 2018-03-09


BitCurator User Forum, Northwestern University, April 27-28, 2017. Tools for identifying duplicate files and known software files.



Presentation Transcript

Slide1

Creighton Barrett

Digital Archivist, Dalhousie University Archives

BitCurator User Forum, Northwestern University

April 27-28, 2017

Tools for identifying duplicate files and known software files

Slide2

Tools

2017 BitCurator User Forum - Tools for identifying duplicate files and known software files

Slide3

FSlint (finds file system “lint”)

Duplicates

Installed packages

Bad names

Name clashes

Temp files

Bad symlinks

Bad IDs

Empty directories

Non stripped binaries

Redundant whitespace

Slide4

FSlint – Duplicates

Image source (BitCurator wiki): https://wiki.bitcurator.net/index.php?title=Identify_and_delete_duplicate_files

Slide5

FTK – Flag Duplicates

Simpler process than FSlint, still a powerful feature

Checks entire file and generates MD5

Assigns primary status to first instance of each MD5

Assigns secondary status to subsequent instances of each MD5
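
The primary/secondary rule described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (hash the whole file, call the first instance of each MD5 "Primary" and every later instance "Secondary"), not FTK's implementation. The function names and the choice to visit files in sorted path order are assumptions made for the sketch.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the entire file contents with MD5, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_duplicates(root):
    """Return {path: (md5, status)} for every regular file under root.

    The first file seen with a given MD5 is "Primary"; later files with
    the same MD5 are "Secondary". Visiting files in sorted path order is
    an assumption of this sketch, not documented FTK behaviour.
    """
    seen = set()
    flags = {}
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        md5 = md5_of_file(path)
        status = "Secondary" if md5 in seen else "Primary"
        seen.add(md5)
        flags[path] = (md5, status)
    return flags
```

Because the status depends only on which copy is encountered first, the traversal order decides which copy becomes "Primary".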

Slide6

FTK – Flag Duplicates

Slide7

NSRL Reference Data Set (RDS)

Image source (NSRL): https://www.nsrl.nist.gov/Documents/Data-Formats-of-the-NSRL-Reference-Data-Set-16.pdf

Slide8

NSRL Reference Data Set (RDS)

Hashsets and metadata used in file identification

Data can be used in third-party digital forensics tools

RDS is updated four times each year

As of v2.55, RDS is partitioned into four divisions:

Modern – applications created in or after 2000

Legacy – applications created in or before 1999

Android – Mobile apps for the Android OS

iOS – Mobile apps for iOS

Slide9

FTK – Known File Filter (KFF)

KFF data – hash values of known files that are compared against files in an FTK case

KFF data can come from pre-configured libraries (e.g., NSRL RDS, DHS, ICE, etc.) or custom libraries

FTK ships with a version of NSRL RDS bifurcated into “Ignore” and “Alert” libraries

KFF Server – used to process KFF data against evidence in an FTK case

KFF Import Utility – used to import and index KFF data
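
The comparison a known file filter performs can be pictured as a set lookup. The following Python sketch loads one hash column from CSV text into a set and classifies a hash against an "Ignore" and an "Alert" library. It does not describe how KFF Server or the KFF Import Utility work internally. The function names, the "No match" label, the rule that "Alert" wins when a hash is in both libraries, and the assumption that the library is a CSV with an `MD5` header column are all choices made for this illustration.

```python
import csv
import io


def load_hash_library(csv_text, column="MD5"):
    """Read one hash column from CSV text into a set of lowercase hex strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader if row.get(column)}


def kff_status(md5, ignore_library, alert_library):
    """Classify one hash against two libraries.

    "Alert" takes precedence if a hash appears in both libraries; that
    precedence and the "No match" label are assumptions of this sketch.
    """
    md5 = md5.lower()
    if md5 in alert_library:
        return "Alert"
    if md5 in ignore_library:
        return "Ignore"
    return "No match"


# Illustrative sample rows only; the second hash is a made-up placeholder.
SAMPLE_IGNORE_CSV = (
    '"MD5","FileName"\n'
    '"D41D8CD98F00B204E9800998ECF8427E","empty.dat"\n'
    '"00000000000000000000000000000001","placeholder.dll"\n'
)
```

A custom or provenance-based library would be loaded the same way and passed in place of, or merged with, the pre-configured sets.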

Slide10

FTK – Known File Filter (KFF)

Slide11

Other tools to work with NSRL RDS

nsrlsvr - https://github.com/rjhansen/nsrlsvr/

Keeps track of 40+ million hash values in an in-memory dataset to facilitate fast user queries

Supports custom libraries (“local corpus”)

nsrllookup - https://rjhansen.github.io/nsrllookup/

Command-line application

Works with tools like hashdeep: http://md5deep.sourceforge.net/

National Software Reference Library - MD5/SHA1/File Name search - http://nsrl.hashsets.com
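
Tools in the md5deep/hashdeep family print one line per file. In md5deep's default output that line is the hash, two spaces, then the path. As a rough illustration of the kind of input a lookup tool consumes, the Python sketch below emits lines in that shape. Whether a given nsrllookup version accepts exactly this format is an assumption to check against its documentation, and `md5deep_style_lines` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def md5deep_style_lines(root):
    """Yield '<md5>  <path>' for each regular file under root, sorted by path.

    Files are read fully into memory, which keeps the sketch short but is
    not suitable for very large files.
    """
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        yield f"{digest}  {path}"
```

Printing each yielded line gives text output that could be saved or piped to another program.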

Slide12

Bill Freedman fonds filtered in FTK

Filter | Description | # of files | Size
Unfiltered | All files in case | 26,651,084 | 3,568 GB
Primary status | Duplicate File indicator IS “Primary” | 731,417 | 83.48 GB
Secondary status | Duplicate File indicator IS “Secondary” | 16,569,218 | 271.5 GB
KFF Ignore | Match all files where KFF status IS “Ignore” | 2,548,119 | 44.29 GB
No KFF Ignore | Match all files where KFF status IS NOT “Ignore” + KFF status IS “Not checked” | 24,102,965 | 3,524 GB
Primary status + No KFF Ignore | Match all files where duplicate file indicator IS “Primary” + KFF status IS NOT “Ignore” | 626,351 | 71.95 GB
Actual files + Primary status + No KFF Ignore | Match all disk-bound files where duplicate file indicator IS “Primary” + KFF status IS NOT “Ignore” | 103,412 | 61.81 GB

Slide13

Questions

Does it matter which duplicate file is selected for preservation?

What if there are MD5 matches with different file names or extensions?

Can queries against NSRL RDS be incorporated into BitCurator workflows?

Could provenance-based libraries of “known file” hashes be incorporated into BitCurator workflows?

Can repositories share provenance-based hash libraries (expose our “local corpus” of MD5s…)?
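
One way to explore the question about MD5 matches with different file names or extensions is to group hashed files by MD5 and list only the groups whose names disagree. The Python sketch below assumes the input is a list of (md5, path) pairs exported from whatever tool produced the hashes. The function name and the input shape are hypothetical. Paths are expected to use forward slashes, or the separator of the platform running the code.

```python
from collections import defaultdict
from pathlib import PurePath


def name_conflicts(records):
    """Group (md5, path) records by MD5 and keep groups whose names differ.

    Returns {md5: sorted list of distinct file names} for every hash that
    appears under more than one file name (extension differences included).
    """
    names_by_md5 = defaultdict(set)
    for md5, path in records:
        names_by_md5[md5.lower()].add(PurePath(path).name)
    return {
        md5: sorted(names)
        for md5, names in names_by_md5.items()
        if len(names) > 1
    }
```

Hashes that occur several times under the same file name are left out, so the result isolates the cases where picking one copy would discard a different name or extension.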
