Data, Data and Data


Added : 2017-08-23 Views :49K


Slide1

Data, Data and Data

DH Press

Slide2

Our Example Projects

These are set up in our test blog.

DH Press’ demo musicians project, from their supplied data set.

Maps

Timeline

Embedded audio in popups

Grove Road: A couple of blocks from our nearby historic district.

Concert Record: 17 years of performances.

Slide3

What we’re trying to do

From a web site for an historical district, capture information about each property, including:

Addresses and locations

Photographs if available

Styles

Dates

Block and lot numbers, and

Associated outbuildings such as garages, and their classifications.

Slide4

Don’t be intimidated

“Data munging” is the most time-consuming part of projects such as these.

Garbage-in, garbage-out is the rule, so take the time.

We’re going to breeze through this example just to show you the kinds of things you can do.

Your project and data source will probably be different, and you’ll need to find your own methods.

Slide5

Where to get your data

XML (we’ll be using WordPress’ built-in “export” function) from http://www.mphda.org

Exports (see above)

Tables

Spreadsheets

Downloads from databases

If you’re willing to spend the time to do the text processing, you can use almost anything.

Slide6

What an XML Entry looks like

<item>
  <title>221 Grove Road</title>
  <link>http://mphda.org/221-grove-road/</link>
  <pubDate>Fri, 02 Nov 2012 18:05:33 +0000</pubDate>
  [ … ]
  <content:encoded><![CDATA[221 Grove Road is a 2 1/2 story, 3 bay, rectangular plan,, wood frame, Dutch Colonial Revival, residential building. Constructed c. 1925, the side gambrel roofed house has shingle cladding, 6/6 windows, a shed roof over the first floor, and an off-center entrance.]]></content:encoded>
  <excerpt:encoded><![CDATA[Constructed c. 1925, the side gambrel roofed house has shingle cladding, 6/6 windows, a shed roof over the first floor, and an off-center entrance.]]></excerpt:encoded>
  <wp:post_id>1866</wp:post_id>
  <wp:post_date>2012-11-02 18:05:33</wp:post_date>
  [ … ]
  <wp:post_name>221-grove-road</wp:post_name>
  [ … ]
  <category domain="category" nicename="grove-road"><![CDATA[Grove Road]]></category>
  <wp:postmeta>
    <wp:meta_key>settings</wp:meta_key>
    <wp:meta_value><![CDATA[a:14:{s:9:"slideshow";s:2:"no";s:16:"slideshow_select";s:0:"";s:4:"safe";s:2:"no";s:7:"related";s:2:"no";s:4:"meta";s:2:"no";s:15:"meta_view_style";s:10:"horizontal";s:4:"love";s:2:"no";s:7:"sharing";s:3:"yes";s:6:"author";s:2:"no";s:7:"post_bg";s:0:"";s:8:"position";s:4:"left";s:6:"repeat";s:9:"no-repeat";s:10:"attachment";s:6:"scroll";s:5:"color";s:0:"";}]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_latitude</wp:meta_key>
    <wp:meta_value><![CDATA[40.747923]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_longitude</wp:meta_key>
    <wp:meta_value><![CDATA[-74.249036]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_title</wp:meta_key>
    <wp:meta_value><![CDATA[221 Grove Road]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Block</wp:meta_key>
    <wp:meta_value><![CDATA[1105]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Lot</wp:meta_key>
    <wp:meta_value><![CDATA[11]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Contributing Outbuildings</wp:meta_key>
    <wp:meta_value><![CDATA[1 stylistically similar detached garage (C)]]></wp:meta_value>
  </wp:postmeta>
</item>
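If Excel is not available, the same key/value pairs can be pulled out with a short script. The following is only a sketch in Python, not the method used in this presentation. The `wp` namespace URI is an assumption (it is what a WXR 1.2 export declares; check the `<rss>` element at the top of your own file), the sample is a trimmed copy of the entry above, and `item_to_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI (WXR 1.2); confirm it against the <rss> tag of your export.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Trimmed copy of the entry shown above, wrapped so it parses on its own.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
<item>
  <title>221 Grove Road</title>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_latitude</wp:meta_key>
    <wp:meta_value><![CDATA[40.747923]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_longitude</wp:meta_key>
    <wp:meta_value><![CDATA[-74.249036]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Block</wp:meta_key>
    <wp:meta_value><![CDATA[1105]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Lot</wp:meta_key>
    <wp:meta_value><![CDATA[11]]></wp:meta_value>
  </wp:postmeta>
</item>
</channel>
</rss>"""


def item_to_record(item):
    """Collect the title and every postmeta key/value of one <item> into a dict."""
    record = {"title": item.findtext("title", default="")}
    for meta in item.findall("wp:postmeta", NS):
        key = meta.findtext("wp:meta_key", default="", namespaces=NS)
        value = meta.findtext("wp:meta_value", default="", namespaces=NS)
        record[key] = value
    return record


root = ET.fromstring(SAMPLE)
records = [item_to_record(item) for item in root.iter("item")]
print(records[0])
```

For a real export, `ET.parse("your-export.xml").getroot()` would replace the `ET.fromstring(SAMPLE)` call; the file name here is a placeholder.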

Slide7

How to manage your data

Excel can be your best friend for cleaning your data

Excel macros

Advanced Excel functions

Also:

Find/Replace

Manual intervention

Slide8

Opening an XML file into Excel

Opens up the wp:postmeta keys and values, one to a line

Slide9

Filtering Imported XML Data

Columns can be filtered to only show the lines you want.

Notice how the header rows have functions attached to them.

Slide10

Filtering Imported XML Data 2

Reduce it to this and copy only what you need

Slide11

Copy only the cells filtered out, then use exciting Excel functions to get you closer.

From there, paste the filtered cells into a new table and run the “VLOOKUP” function.

This is a good way to turn rows into columns for this particular use case. Sort the table so blank rows fall to the bottom, then copy-and-paste the VALUES into a new sheet.
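The rows-to-columns step can also be sketched outside Excel. The Python below is an illustration under assumptions, not the workflow from the slides: it assumes the imported data has already been reduced to (title, key, value) rows, and the names `long_rows`, `WANTED`, and `pivot` are hypothetical. The values come from the 221 Grove Road entry shown earlier.

```python
from collections import defaultdict

# One row per meta key, as the XML import produces them (title, meta_key, meta_value).
long_rows = [
    ("221 Grove Road", "settings", "a:14:{...}"),  # unwanted key, filtered out below
    ("221 Grove Road", "_wp_geo_latitude", "40.747923"),
    ("221 Grove Road", "_wp_geo_longitude", "-74.249036"),
    ("221 Grove Road", "Block", "1105"),
    ("221 Grove Road", "Lot", "11"),
]

# The keys to keep, in the column order wanted for the output table.
WANTED = ["Block", "Lot", "_wp_geo_latitude", "_wp_geo_longitude"]


def pivot(rows, wanted):
    """Turn key/value rows into one row per title, one column per wanted key."""
    by_title = defaultdict(dict)
    for title, key, value in rows:
        if key in wanted:
            by_title[title][key] = value
    # A missing key becomes an empty cell, much like a failed lookup left blank.
    return [
        [title] + [fields.get(key, "") for key in wanted]
        for title, fields in by_title.items()
    ]


wide_rows = pivot(long_rows, WANTED)
print(wide_rows)
```

Grouping by title plays the role of the lookup key in VLOOKUP; if titles are not unique in your data, a post ID would be a safer key.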

Slide12

These work across sheets

The magic “VLOOKUP” function isn’t limited to just the sheet you’re working on.

Slide13

Now we’re getting closer to our real data

Slide14

Sometimes you have to do it by hand. That’s how the “style” column was derived. Once you have established a controlled vocabulary you can do a lookup.
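Once a controlled vocabulary exists, the lookup itself is mechanical. The sketch below shows one possible way to do it in Python; it is not how the “style” column in this project was produced. The vocabulary is an assumption: two terms come from the sample descriptions in this deck, and “Colonial Revival” is added only to show why longer terms are checked first. `find_style` is a hypothetical helper name.

```python
# Assumed vocabulary; replace it with the terms you settled on by hand.
STYLES = ["Colonial Revival", "Dutch Colonial Revival", "Tudor Revival"]


def find_style(description, vocabulary=STYLES):
    """Return the first vocabulary term found in the description, or "" if none."""
    text = description.lower()
    # Longest term first, so "Dutch Colonial Revival" wins over "Colonial Revival".
    for term in sorted(vocabulary, key=len, reverse=True):
        if term.lower() in text:
            return term
    return ""


description = (
    "221 Grove Road is a 2 1/2 story, 3 bay, rectangular plan,, wood frame, "
    "Dutch Colonial Revival, residential building."
)
print(find_style(description))
```

Entries that come back empty are the ones that still need a manual look.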

Slide15

Getting external data

An earlier version used a macro to determine longitude and latitude for entries

Google Docs uses an “ImportXML” function

Excel has the “WEBSERVICE” function. You can use it like this to return an array of geodata values from openstreetmap.org:

=WEBSERVICE("http://nominatim.openstreetmap.org/search/?format=xml&q=400 South Orange Ave., South Orange, NJ")

Then use FILTERXML on top of WEBSERVICE to get specific values:

=FILTERXML(C2,"//place/@lat")

This returns the latitude value for the address
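The same two steps, building the request and picking a value out of the XML reply, can be sketched in Python. This is an illustration only: it makes no network request, the response text is a trimmed stand-in rather than a real reply (the coordinates are borrowed from the 221 Grove Road entry), and `build_search_url` and `first_lat_lon` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_search_url(address):
    """Build the search URL used in the formula above, with the address escaped."""
    query = urlencode({"format": "xml", "q": address})
    return "http://nominatim.openstreetmap.org/search/?" + query


# Illustrative stand-in for a response; a real reply carries many more attributes.
SAMPLE_RESPONSE = """<searchresults>
  <place lat="40.747923" lon="-74.249036"/>
</searchresults>"""


def first_lat_lon(xml_text):
    """Return (lat, lon) of the first <place>, like FILTERXML with //place/@lat."""
    place = ET.fromstring(xml_text).find(".//place")
    if place is None:
        return None  # no match for the address
    return float(place.get("lat")), float(place.get("lon"))


print(build_search_url("400 South Orange Ave., South Orange, NJ"))
print(first_lat_lon(SAMPLE_RESPONSE))
```

Escaping the address matters more here than in Excel, because spaces and commas are not valid in a raw URL query string.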

Slide16

XML is cool, what about JSON?

JSON is a widely used serialized data format, which isn’t natively supported by Excel.

Power Query for Excel (https://www.microsoft.com/en-us/download/details.aspx?id=39379) might work for you.
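If Power Query is not an option, a few lines of Python can turn JSON into flat rows that paste into a spreadsheet. This is a sketch under assumptions: the JSON sample is invented for illustration (only the title and coordinates come from the entry shown earlier), and `flatten` is a hypothetical helper.

```python
import json

# Invented sample; real JSON from your source will have its own shape.
SAMPLE_JSON = """[
  {"title": "221 Grove Road", "geo": {"lat": "40.747923", "lon": "-74.249036"}}
]"""


def flatten(record, prefix=""):
    """Flatten nested objects into one level, joining key names with dots."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


rows = [flatten(record) for record in json.loads(SAMPLE_JSON)]
print(rows)
```

Each flattened dict is one spreadsheet row, with the dotted names serving as column headers.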

Slide17

How to proof your data

Be thorough

Read some of it for sense, and make sure your columns contain what you think they contain

Sort columns

Weird stuff drifts to the top or bottom

Search (ctrl-F) for anomalies

Special characters

Odd spaces – especially at the end of fields
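Some of these checks can be automated. The sketch below is one possible approach in Python, not a replacement for reading the data; the sample rows are made up to trigger each check, and `find_anomalies` is a hypothetical name.

```python
def find_anomalies(rows):
    """Report (row number, column, problem) for suspicious field values."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if value != value.strip():
                problems.append((row_number, column, "leading/trailing space"))
            if "  " in value:
                problems.append((row_number, column, "double space"))
            if any(ord(ch) > 127 for ch in value):
                problems.append((row_number, column, "non-ASCII character"))
    return problems


# Made-up rows: a trailing space in row 1, a non-breaking space in row 2.
sample_rows = [
    {"title": "221 Grove Road ", "style": "Dutch Colonial Revival"},
    {"title": "143 Grove Road", "style": "Tudor\u00a0Revival"},
]

for problem in find_anomalies(sample_rows):
    print(problem)
```

A non-ASCII character is not necessarily an error (accented names are legitimate); the report just tells you where to look.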

Slide18

Getting a proper CSV file

You can’t use Excel because Microsoft’s CSV format is different from everyone else’s. I know, big surprise.

DH Press requires all fields to be surrounded by quote marks

Proper escaping, etc. etc.
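For comparison, here is how a fully quoted CSV could be produced with Python's standard `csv` module. This is a sketch, not the route taken in this presentation (which uses Open Office or Google Sheets), and the column names are placeholders rather than headers from the DH Press spec.

```python
import csv
import io


def to_quoted_csv(header, rows):
    """Return CSV text with every field wrapped in quote marks."""
    buffer = io.StringIO(newline="")
    # QUOTE_ALL quotes every field; embedded quote marks are doubled automatically.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder headers; use the names your DH Press project expects.
header = ["title", "description"]
rows = [["221 Grove Road", 'A "Dutch Colonial Revival" house, c. 1925']]

print(to_quoted_csv(header, rows))
```

To write a file instead of a string, open it with `open(path, "w", encoding="utf-8", newline="")` and pass that to `csv.writer`, so special characters survive as UTF-8.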

Slide19

Converting your data

Convert it from Excel to a CSV file

Get your headers right, according to the DH Press spec

Copy and paste your data into Open Office or Google Sheets

Open Office is a free download, and runs anywhere (even on your phone)

“Save As” a CSV file from Google Sheets or Open Office.

Slide20

Use Open Office’s “Calc” app.

It has its own problems

You want to paste in the values from your Excel sheet but by default it makes all the fields fixed-width, thereby truncating your text.

So “143 Grove Road is a 2 story, 5 bay, rectangular plan, wood frame, shingle-clad, Tudor Revival, residential building. The c. 1910, side gable-roofed house has three pedimented dormers of which one is a cross gable, 6/2 windows, and a Doric column-supported portico. The entrance is flanked by diamond-paned sidelights, and the cross gable features overhanging bracketed eaves. A one story side porch is set back from the main block.”

Becomes “143 Grove Road is a 2 story, 5 bay, rectangular plan, wood frame, shingle-clad, Tudor Revival, residential building. The c. 1910, side gable-roofed house has three pedimented dormers of which one is a cross gable, 6/2 windows, and a Doric” (strangling sound here)

Slide21

Right-click to “Paste Special” to get the whole thing. Use UTF-8! It will keep your special characters and even linebreaks.

Slide22

Then select your settings for export. Seriously, despite what this says, use UTF-8.

Slide23

And follow the available DH Press tutorials for the rest…

Documentation is complete, detailed and hard-to-find.

A complete walk-through. Follow everything closely; it’s very detailed!

The DH Press documents page. Links to the DH Press demo project with a complete data set you can experiment with.

The project’s blog at UNC Chapel Hill.

Slide24

These concepts are portable

You could use them to put data into Timeliner

Or Timemapper (http://timemapper.okfnlabs.org/)

Or VisualEyes (http://viseyes.org/viseyes.htm)

Slide25

A few DH Press tips

A DH Press “marker” is a single line from your imported file.

Don’t rely on the usual “publish” or “update” buttons while you develop your project. Always use the “Save Settings” button first!

Anything you want to display (text, image) or use (as a filter) needs to be set up as a mote.

Build your “entry points” using motes, which will describe each marker.

When you define a mote as “short text,” that means it can operate as a filter. You can (and should) assign colors or icons to each term.

Slide26

And

Don’t be afraid to go back and rework your data

Some things you can easily fix

Other things will blow up your project

Google your problem – there’s probably a good answer out there.

Use the testing and error-checking section.

Be sure the right maps are available. I like to use OSM Base.

If you change or edit pieces of data, use the “rebuild” button for that mote.

Don’t be intimidated by the custom field utilities’ warnings. The utilities can be a huge timesaver.

