/
The Nuts & Bolts of GIS: The Nuts & Bolts of GIS:

The Nuts & Bolts of GIS: - PowerPoint Presentation

giovanna-bartolotta
giovanna-bartolotta . @giovanna-bartolotta
Follow
379 views
Uploaded On 2018-10-01

The Nuts & Bolts of GIS: - PPT Presentation

Dealing With Dirty Data Presented by John R Woodard MS GISP EA Kelly Wright MS Justin Walters WHAT TO EXPECT Todays workshop is about dealing with bad datainformation Three presenters ID: 683755

ctrl data article excel data ctrl excel article https architecture web text amp cells clean enterprise values shortcuts fill cell information tab

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "The Nuts & Bolts of GIS:" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

The Nuts & Bolts of GIS:Dealing With “Dirty” Data!

Presented by:

John R. Woodard MS, GISP, EA

Kelly Wright MS

Justin WaltersSlide2

WHAT TO EXPECT!

Today’s workshop is about dealing with bad data/information.

Three presenters

Each presenter will cover one section of the workshop (30 mins each)

Workshop Introduction & bringing the dirt! – John Woodard

Working with MS Excel to clean out the dirt! – Kelly Wright

Using the Python to squeeze out the dirt! – Justin Walters

Q&A & Wrap up!

Each presenter will give a short introduction about themselves

There will be a 10 min break after topic 2

Goals:

Everyone will learn about data errors and how to avoid them.

Everyone will learn about one or two tools that can help you find, fix, or avoid errors.

Provide references that can help you.Slide3

Who is John R. Woodard MS, GISP, EA?

PROFESSIONAL:

Currently, GIS/IT Project Developer for Davey Resource Group, Kent Ohio

16 years experience of developing GIS systems and data problem solving.

10 years U.S. Navy Reserves Seabees (Construction Battalion)

GISP (Geographic Information System Professional)Certification

EA (Enterprise Architecture) Certification

EDUCATION:

Masters in Digital Science at Kent State University.

MS in Geography/GIS from the University of Akron.

PERSONAL:

Scuba Diver &

Beekeeper (Yes, I deal with over 400,000 bees and sell honey!)

ArtistSlide4

WHY ARE WE HERE?

DIRT? – On who or what?

TO TALK ABOUT DIRT!

DIRT in my data! HELP!Slide5

What is “dirt” in data/information?

Dirty Data

is simply a catch all term that can be used to describe errors in data or information

.

Can anyone give me an example of “dirt” in a dataset?Slide6

Are these dirt?

Typos

Misspellings

Incomplete data

Inaccurate data

Duplicate data

Incorrect data

Not following principles

Not following formats

Data gathered on Tues

Inconsistent Data

Data type

Data ageSlide7

The Best Defense, is a Good Offense! - The Art of War, Sun Tzu

Let’s be honest, ensuring the cleanliness of your dataset is a constant struggle – you are literally waging an unending war……

Being Proactive and taking the initiative with strategies to prevent data errors is your best offense.

The field of Enterprise Architecture and its’ sub field of Data Architecture can provide help.

Enterprise Architecture works to establish and organize all of a company’s resources into a strategic plan that supports the organization’s mission.

Data Architecture is about organizing data within the enterprise architecture.Slide8

What is Data Architecture?

A strategic data plan that can fit within the Enterprise Architecture or stand alone (small system) that is:

1. business driven

2. based on real world objects and events

3. implemented into the proper environments.

The data architecture is created with models, policies, standards that govern how data is collected, created, stored and used within an information system.Slide9

The Open Group Architecture Framework (TOGAF) Architecture Principle Template

Name

Should both represent the essence of the rule as well as be easy to remember. Specific technology platforms should not be mentioned in the name or statement of a principle. Avoid ambiguous words in the Name and in the Statement such as: "support", "open", "consider", and for lack of good measure the word "avoid", itself, be careful with "manage(ment)", and look for unnecessary adjectives and adverbs (fluff).

Statement

Should succinctly and unambiguously communicate the fundamental rule. For the most part, the principles statements for managing information are similar from one organization to the next. It is vital that the principles statement be unambiguous.

Rationale

Should highlight the business benefits of adhering to the principle, using business terminology. Point to the similarity of information and technology principles to the principles governing business operations. Also describe the relationship to other principles, and the intentions regarding a balanced interpretation. Describe situations where one principle would be given precedence or carry more weight than another for making a decision.

Implications

Should highlight the requirements, both for the business and IT, for carrying out the principle - in terms of resources, costs, and activities/tasks. It will often be apparent that current systems, standards, or practices would be incongruent with the principle upon adoption. The impact to the business and consequences of adopting a principle should be clearly stated. The reader should readily discern the answer to: "How does this affect me?" It is important not to oversimplify, trivialize, or judge the merit of the impact. Some of the implications will be identified as potential impacts only, and may be speculative rather than fully analyzed.-The Open Group Architecture Framework (TOGAF)http://pubs.opengroup.org/architecture/togaf8-doc/arch/chap29.html#tagtcjh_2Slide10

Example principle of data architecture

Name

Data is an Asset

Statement

Data is an asset that has value to the enterprise and is managed accordingly.

Rationale

Data is a valuable corporate resource; it has real, measurable value. In simple terms, the purpose of data is to aid decision-making. Accurate, timely data is critical to accurate, timely decisions. Most corporate assets are carefully managed, and data is no exception. Data is the foundation of our decision-making, so we must also carefully manage data to ensure that we know where it is, can rely upon its accuracy, and can obtain it when and where we need it.

Implications

This is one of three closely-related principles regarding data: data is an asset; data is shared; and data is easily accessible. The implication is that there is an education task to ensure that all organizations within the enterprise understand the relationship between value of data, sharing of data, and accessibility to data.Stewards must have the authority and means to manage the data for which they are accountable. We must make the cultural transition from "data ownership" thinking to "data stewardship" thinking. The role of data steward is critical because obsolete, incorrect, or inconsistent data could be passed to enterprise personnel and adversely affect decisions across the enterprise.Part of the role of data steward, who manages the data, is to ensure data quality. Procedures must be developed and used to prevent and correct errors in the information and to improve those processes that produce flawed information. Data quality will need to be measured and steps taken to improve data quality - it is probable that policy and procedures will need to be developed for this as well.A forum with comprehensive enterprise-wide representation should decide on process changes suggested by the steward.Since data is an asset of value to the entire enterprise, data stewards accountable for properly managing the data must be assigned at the enterprise level.

The Open Group Architecture Framework (TOGAF)

http://pubs.opengroup.org/architecture/togaf8-doc/arch/chap29.html#tagtcjh_2Slide11

Data Standards

These are documented agreements (MOA’s) between departments, organizations, people, etc. that cover how data will be represented, formatted, defined, structured, transmitted, used, or managed.

Example:

Standard For Data Format (Streets)

All department’s agree to use the following format for street types

Fo

rmat

All Caps

Based on USPS Suffix AbbreviationsTwo, three, or four LettersList ST = Street RD = Road FLD = Field FRD = Ford CRK = Creek CT = Court Slide12

Standards can be maintained (enforced) in Geo-database domains.Slide13

Geo-database domains

How many of you use the domains in a file geo-database?

Have you planned their relationships to data standards or formats?

Domains are very useful for enforcing data formatting standards!

They prevent “personal” preferences from creating formatting errors.

They keep data entries consistent.

They prevent typos…..

Can you list a few more items?Slide14
Slide15

Early detection is also another method for preventing “dirty data” from corrupting you datasets. We often use MS Excel “flat files” for importing and exporting data/information between systems.

Ms. Kelly Wright will cover how to use MS Excel to detect and eliminate errors in your data.Slide16

Cleaning Dirty Data Part 2: Cleaning it up with ExcelSlide17

IntroductionWhy clean the data?Before you Clean, PlanWhy Excel?Pitfalls of Geodata

(Windows) Keyboard Shortcuts

Tools and FormulasSlide18

Who is Kelly Wright, MS?

PROFESSIONAL:

Currently, GIS Applications Specialist at LJB Inc. consulting of Miamisburg, Ohio

Previously, GIS Graduate Intern at ODOT, Data Analyst for Time Warner Cable

4 years experience in GIS, 7 years experience in Data Analysis

EDUCATION:

M.S. in Geographic Information Science and Technology from the University of Southern California (Go Trojans!)

B.A. in Geography from the University of Texas at Austin (Hook ‘

em Horns!)PERSONAL:No Matter What: Born a Buckeye, Always a BuckeyeSlide19

Why Clean the Data?Data Integrity Looks Professional

Costs U.S. businesses $600 billion per year

Legal Liability

Fraud

False Conclusions and Misdirected InvestmentsSlide20

Why Excel?UbiquitousReliableSupport and Resources

InteroperabilitySlide21

Watch Out For:Hyphenated wordsState/Province abbreviations

ZIP and ZIP+5

Case-sensitivity

Variations

Road” v. “Rd” v. “

rd

” v. “RD”

Extra SpacesNumbers as TextCharacters as Text Slide22

Before You Clean, PlanUnderstand the data before you change it

MAKE A BACKUP

Record all data transformations

in the order that you do them

Make sure all columns and rows are visible

Remove Filters

Remove Formatting

Manipulate Columns First

Always do the work in a new columnUse Keyboard ShortcutsSlide23

Keyboard Shortcuts 1

Ctrl+A

Select All

Ctrl+C

Copy

Ctrl+V

Paste

Ctrl+Alt+V

Paste Special

Ctrl+X

Cut

Ctrl+N

New Workbook

Ctrl+S

Save

Ctrl+R

Fill Right

Ctrl+D

Fill Down

Ctrl+Z

Undo

Ctrl+Y

Redo

Ctrl+F

Find

Ctrl+H

Replace

Ctrl+E

Flash FillSlide24

Keyboard Shortcuts 2

F2

Edit the active cell and move the cursor to the end of the text

F7

Check spelling in the active worksheet or selected range

Home

Moves to the beginning of a row in a worksheet

Ctrl+Home

Moves to the beginning of a worksheet (cell A1)

Ctrl+End

Moves to the end of a worksheet/Moves the cursor to the end of the text when in the formula bar

Page Down

Moves one screen down in a worksheet

Alt+PgDn

Moves one screen to the right in a worksheet

Alt+PgUp

Moves one screen to the left in a worksheet

Ctrl+1

Open the Format Cells window

Ctrl+F6

Switch between open workbooks or windowsSlide25

Keyboard Shortcuts 3

Ctrl+PgUp

Move between work sheets in the same document

Ctrl+PgDn

Move between work sheets in the same document

Ctrl+Tab

Move between Two or more open Excel files

Ctrl+Shift+=

Insert a new column or row

Ctrl+Space

Select an entire column in a worksheet

Shift+Space

Select an entire row in a worksheet

Ctrl+Shift+Space

Select all cells containing data (also Ctrl+A)

Ctrl+Shift+End

Select all text in the formula bar from the cursor position to the end

Ctrl+Shift+Arrow

Extend the selection of cells to the last nonblank cell in the same column or row as the active cell, or if the next cell is blank, to the next nonblank cell.Slide26

Keyboard Shortcuts 4

Shift+F3

Insert a function (puts the cursor in the formula bar)

Ctrl+Shift+(

Unhides any hidden rows within the selection

Ctrl+Shift+)

Unhides any hidden columns within the selection.

Ctrl+Shift+~

Applies the General number format.

Ctrl+`

Alternates between displaying cell values and displaying formulas.Slide27

Sort and FilterSortDoesn’t change the values at all, just puts them in order

Alphabetical/Numeric

Ascending/Descending

Can reveal problems caused by extra spaces, misspelling

Quick way to see how Excel “buckets” the data

Filter

See an immediate list of unique values

Determine what to Find and Replace

Identify situations like extra spaces or misspellingBEWARE: don’t copy and paste when filters are appliedData permutations may not apply to cells hidden when filtered

Home Tab > Sort & FilterSlide28

Find and Replace ToolMost important tool for cleaning dirty data!Remove spaces and nonprinting characters

Standardize spelling and case

Go To Special – F5

Blanks – Selects all blank cells in a highlighted dataset

Conditional Formatting – Selects all cells with a certain formatting, like color

Formulas – Selects cells filled by formula

Enter desired text in one cell and use

Ctrl+Enter

to fill all of the blank cells

Home Tab > Find & Select

Ctrl+F

Find

Ctrl+H

ReplaceSlide29

Find/Remove Duplicates 1Highlight Duplicate Data

Data Tab > Conditional Formatting > Highlight Cells Rules > Duplicate Values

Remove Duplicates

Data Tab > Data Tools > Remove Duplicates

Permanently removes duplicates – copy original range of cells to a new sheet or workbook

Determined by value (if date types do not match, they are not considered duplicate)

Excellent for generating a domainSlide30

Find/Remove Duplicates 2Filter for Unique Values

Data Tab > Sort & Filter > Advanced

Temporarily Hides Duplicates

Shortcut: Open Filter Tool to see all unique values

Conditionally Format Unique or Duplicate Values

Home Tab > Conditional Formatting

Very Powerful

Use resulting formatting to filterSlide31

“Fill Down” and “Flash Fill”Replicates values or formula to continue series downward in a column

Click and drag to show context menu

Double-click to apply to next adjacent blank

Paste values or continues a series

Flash Fill (2016 only)

Automatically fills data when a pattern is sensed

Home Tab > Fill

Data Tab > Flash Fill (2016)

Ctrl+D

Fill Down

Ctrl+E

Flash FillSlide32

FormulasType an equal sign “=“ to begin

Excel suggests formulas as you type

Formula Tab >

Show Formulas

Evaluate Formulas

Formula Library

Error CheckingSlide33

Case: UPPER, LOWER, and PROPERThese functions make data entries more uniform and ensures that Excel “buckets” these records together.

Columbus,

columbus

,

COlumbus

, and COLUMBUS

Excel (and ArcMap by extension) treats these as unique values

Causes problems with SQL queries

=LOWER – convert all characters to lower case=UPPER – convert all characters to upper case=PROPER – convert to camel case/sentence caseSlide34

Spaces and nonprinting charactersFind and Replace Tool=CODE

– returns the Unicode character set values for the first character in a text string

0 – 31, 32, 127, 129, 141, 143, 144, 157 and 160 (160 is non-breaking space

nbsp

) are nonprinting characters that may affect the behavior of filter, sort, and search

=CLEAN

– removes Unicode values 0 – 31 from the string

=TRIM

– removes Unicode value 32 (space) from the stringRemoves leading, trailing, and extra (duplicate) spaces from a text stringSlide35

Fixing Numbers Stored as TextHighlight range of cells and use Error Checking context menu

Apply “General” Formatting to entire dataset

Unhides issues with incorrect data types

Paste Special

and Multiply

Good for multiple columns at once

Put the number 1 into a blank cell that is not storing a number as text (i.e. a new column). “Copy” the cell (CTRL+C)

Select the group of cells that are incorrectly storing numbers as text

Home Tab > Paste > Paste SpecialClick Multiply > OKThis multiplies each cell by 1, returning a number as the resultSlide36

Fixing Dates and TimesSort column to identify common error types2-digit v. 4-digit years

DD/MM/YY v. MM/DD/YY

Date and time v. date only

24-hr clock v. 12-hr clock

A v. a.m., P v. p.m.

Highlight range of cells and use Error Checking context menu

Find & replace “.” or “-“ or “/” to standardize if the delimiting character is the only thing wrongSlide37

Parsing: Text to Columns ToolSplit Text Strings into more than one column

Use Delimiters or Fixed Width

Set the Data Format

General

Text

Date

Not dynamic

Split names, addresses, extract ZIP codes or cities, select first # of characters from a string

Data Tab > Text To Columns

Alt + A + ESlide38

Parsing: Functions to Split ColumnsThese functions use the position of each character within the string

=LEFT

– returns the first character/s

=LEFT([cell #], [# of chars])

=MID

– returns a specific number of characters from a string, starting at the position you specify, based on the number of characters you specify

=MID([cell #], [start position], [# of chars])

=RIGHT

– returns the last character/s=SEARCH – returns the position of the sub-string you are searching for=LEN – returns the number of charactersSlide39

Combining: CONCATENATE and “&”Combine the contents of two or more cells

=CONCATENATE

=CONCATENATE(

[Cell1]

,

[Cell2]

)

Add text and characters before, between, and/or after

=CONCATENATE([Cell1],”[chars or text]”,[Cell2])=CONCAT replaces this function in Excel 2016 and laterUsing the Ampersand & to combine text values=[Cell1]&[Cell2]=[Cell1]&””&[Cell2] to include a space (doesn’t need commas like CONCAT)Slide40

10 MIN BREAKSlide41

Justin Walters – Geospatial Specialist Davey Resource GroupProfessional:

Current: GIS project manager for Utility Vegetation Management since 2014

5 years experience Garmin International Inc., Cartography project lead, recreation team.

Education:

BA: Geographic Information Systems, Kent State University

BA: Environmental Geography, Kent State University

Minor: Mathematics, Physics, Climatology, Kent State UniversitySlide42

Cleaning Data Through ArcMap Automations

Method 1: Model Builder

Benefits: Easy, Intuitive

Limitations: Complete customization with complexities.

Method 2: Arcpy

Benefits: Highly customizable, higher functionality

Limitations: Understanding the languageSlide43

Model Builder AutomationsThe Problem

: The client wants a complete record of features in geodatabase format. Additionally a direct link to the data storage website is requested.

The Solution

: Create a tool that populates this field for a complete geodatabase. Slide44

ArcPy Automations - BeginnerThe Problem

: When utilizing tools to convert shapefiles into a geodatabase, inaccurate geometries often occur.

The Solution

: Repair the geometries prior to data analysis. Slide45

ArcPy Automations - IntermediateThe Problem

: Received parcels have the entire address in one field and they need separated into two

The Solution

: Splitting at the first space.Slide46

ArcPy Automations - Advanced

The Problem

: Customers that refuse work need to be highlighted and reported separate from fully compliant customers.

The Solution

: Automate the cross referencing

of work layers to parcels, applying

permissions from work areas to select

parcels.Slide47

Book: Data Strategy, by Sid Adelman, Larissa Moss, Majid Abai,Book: Enterprise Architecture As Strategy: Creating a Foundation for Business Execution, by Jeanne W. Ross and Peter Will

Web Article: Data Cleaning -

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1198040/

Web Article: Big Data’s dirty problem by Verne Kopytoff -

http://fortune.com/2014/06/30/big-data-dirty-problem/

Web Article: Data format -

https://en.wikipedia.org/wiki/Data_format

Website: 10 Data Principles (CDC) -

https://www.cdc.gov/nchhstp/programintegration/tenguidingprinciples.htmWebsite: Data Standards (USGS)- https://www2.usgs.gov/datamanagement/plan/datastandards.phpWebsite: -The Open Group Architecture Framework (TOGAF) http://pubs.opengroup.org/architecture/togaf8-doc/arch/chap29.html#tagtcjh_2REFERENCES:Slide48

REFERENCES:

Book: Data Strategy, by Sid Adelman, Larissa Moss, Majid Abai,

Book: Enterprise Architecture As Strategy: Creating a Foundation for Business Execution, by Jeanne W. Ross and Peter Will

Web Article: Data Cleaning -

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1198040/

Web Article: Big Data’s dirty problem by Verne Kopytoff -

http://fortune.com/2014/06/30/big-data-dirty-problem/

Web Article: Data format -

https://en.wikipedia.org/wiki/Data_formatWebsite: 10 Data Principles (CDC) - https://www.cdc.gov/nchhstp/programintegration/tenguidingprinciples.htmWebsite: Data Standards (USGS)- https://www2.usgs.gov/datamanagement/plan/datastandards.phpWebsite: -The Open Group Architecture Framework (TOGAF) http://pubs.opengroup.org/architecture/togaf8-doc/arch/chap29.html#tagtcjh_2Slide49

Web Article: Microsoft Support: Top Ten Ways to Clean Your Data - https://support.office.com/en-us/article/Top-ten-ways-to-clean-your-data-2844b620-677c-47a7-ac3e-c2e157d1db19

Web Article: 10 Super Neat Ways to Clean Data in Excel Spreadsheets -

10 Super Neat Ways to Clean Data in Excel Spreadsheets

Related YouTube Video:

https://www.youtube.com/watch?v=e0TfIbZXPeA

Web Article: 5 Tips for Cleaning Your Dirty Data -

http://www.msn.com/?ocid=wispr&pc=u477https://esj.com/articles/2012/05/22/5-tips-cleaning-data.aspx

Web Article: How to Clean Data in Excel: 5 Simple Tricks - https://www.livestories.com/blog/excel-tutorial-5-easy-ways-you-can-clean-data-in-excel Web Article: Get Only the Numeric or the Text Part from a String Using Excel Formulas - https://trumpexcel.com/get-numeric-text-part-of-string/ Web Article: 7 Amazing Things Excel Text-t0-Columns Can Do For You - https://trumpexcel.com/excel-text-to-columns-examples/ REFERENCES: Slide50

Web Article and Video: Microsoft Support - Auto Fill and Flash Fill - https://support.office.com/en-us/article/Use-AutoFill-and-Flash-Fill-2e79a709-c814-4b27-8bc2-c4dc84d49464

Web Article: Dirty Data Got You Down? Clean It Up -

https://www.govtechworks.com/dirty-data-got-you-down-clean-it-up/#gs.23JDD7E

Wikipedia: Data Cleansing -

https://en.wikipedia.org/wiki/Data_cleansing

Web Article: Excel keyboard shortcuts and function keys for Windows -

https://support.office.com/en-us/article/Excel-keyboard-shortcuts-and-function-keys-for-Windows-1798d9d5-842a-42b8-9c99-9b7213f0040f Web Article: Excel keyboard shortcuts - http://www.asap-utilities.com/excel-tips-shortcuts.php Web Article: 222 Excel shortcuts for Windows and Mac - https://exceljet.net/keyboard-shortcuts REFERENCES:Slide51

Web Article: 253 Shortcuts for Microsoft Excel 2016 (Windows) - https://shortcutworld.com/Excel/win/Microsoft-Excel_2016_ShortcutsWeb Article and Video: Microsoft Support – Overview of Formulas in Excel -

https://support.office.com/en-us/article/Overview-of-formulas-in-Excel-ecfdc708-9162-49e8-b993-c311f47ca173

Web Article: Microsoft Support – Excel Functions (alphabetical) -

https://support.office.com/en-us/article/Excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188

Web Article: Microsoft Support – Excel Functions (by category) -

https://support.office.com/en-us/article/Excel-functions-by-category-5f91f4e9-7b42-46d2-9bd1-63f26a86c0eb

REFERENCES: