Data, Data and Data
DH Press
Our Example Projects
These are set up in our test blog.
DH Press’ demo musicians project, from their supplied data set.
Maps
Timeline
Embedded audio in popups
Grove Road: A couple of blocks from our nearby historic district.
Concert Record: 17 years of performances.
What we’re trying to do
From a web site for a historic district, capture information about each property, including:
Addresses and locations
Photographs if available
Styles
Dates
Block and lot numbers, and
Associated outbuildings such as garages, and their classifications.
Don’t be intimidated
“Data munging” is the most time-consuming part of projects such as these.
Garbage-in, garbage-out is the rule, so take the time.
We’re going to breeze through this example just to show you the kinds of things you can do.
Your project and data source will probably be different, and you’ll need to find your own methods.
Where to get your data
XML (we’ll be using WordPress’ built-in “export” function) from http://www.mphda.org
Exports (see above)
Tables
Spreadsheets
Downloads from databases
If you’re willing to spend the time to do the text processing, you can use almost anything.
What an XML entry looks like
<item>
  <title>221 Grove Road</title>
  <link>http://mphda.org/221-grove-road/</link>
  <pubDate>Fri, 02 Nov 2012 18:05:33 +0000</pubDate>
  [ … ]
  <content:encoded><![CDATA[221 Grove Road is a 2 1/2 story, 3 bay, rectangular plan,, wood frame, Dutch Colonial Revival, residential building. Constructed c. 1925, the side gambrel roofed house has shingle cladding, 6/6 windows, a shed roof over the first floor, and an off-center entrance.]]></content:encoded>
  <excerpt:encoded><![CDATA[Constructed c. 1925, the side gambrel roofed house has shingle cladding, 6/6 windows, a shed roof over the first floor, and an off-center entrance.]]></excerpt:encoded>
  <wp:post_id>1866</wp:post_id>
  <wp:post_date>2012-11-02 18:05:33</wp:post_date>
  [ … ]
  <wp:post_name>221-grove-road</wp:post_name>
  [ … ]
  <category domain="category" nicename="grove-road"><![CDATA[Grove Road]]></category>
  <wp:postmeta>
    <wp:meta_key>settings</wp:meta_key>
    <wp:meta_value><![CDATA[a:14:{s:9:"slideshow";s:2:"no";s:16:"slideshow_select";s:0:"";s:4:"safe";s:2:"no";s:7:"related";s:2:"no";s:4:"meta";s:2:"no";s:15:"meta_view_style";s:10:"horizontal";s:4:"love";s:2:"no";s:7:"sharing";s:3:"yes";s:6:"author";s:2:"no";s:7:"post_bg";s:0:"";s:8:"position";s:4:"left";s:6:"repeat";s:9:"no-repeat";s:10:"attachment";s:6:"scroll";s:5:"color";s:0:"";}]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_latitude</wp:meta_key>
    <wp:meta_value><![CDATA[40.747923]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_longitude</wp:meta_key>
    <wp:meta_value><![CDATA[-74.249036]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_wp_geo_title</wp:meta_key>
    <wp:meta_value><![CDATA[221 Grove Road]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Block</wp:meta_key>
    <wp:meta_value><![CDATA[1105]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Lot</wp:meta_key>
    <wp:meta_value><![CDATA[11]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Contributing Outbuildings</wp:meta_key>
    <wp:meta_value><![CDATA[1 stylistically similar detached garage (C)]]></wp:meta_value>
  </wp:postmeta>
</item>
How to manage your data
Excel can be your best friend for cleaning your data
Excel macros
Advanced Excel functions
Also:
Find/Replace
Manual intervention
Opening an XML file into Excel
Opens up the wp:postmeta keys and values, one to a line.
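If you would rather script this step than open the XML in Excel, here is a minimal Python sketch that produces the same one-key-per-line view. It assumes a WordPress export that declares the http://wordpress.org/export/1.2/ namespace (older exports may use 1.1 instead); postmeta_rows and the trimmed SAMPLE entry are hypothetical, not part of WordPress or DH Press.

```python
import xml.etree.ElementTree as ET

# Assumed WordPress export namespace (WXR 1.2); change it if your export
# file declares a different version.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical, trimmed copy of the 221 Grove Road entry above.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<item>
  <title>221 Grove Road</title>
  <wp:post_id>1866</wp:post_id>
  <wp:postmeta>
    <wp:meta_key>Block</wp:meta_key>
    <wp:meta_value><![CDATA[1105]]></wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>Lot</wp:meta_key>
    <wp:meta_value><![CDATA[11]]></wp:meta_value>
  </wp:postmeta>
</item>
</channel></rss>"""


def postmeta_rows(xml_text):
    """Yield one (post_id, title, meta_key, meta_value) tuple per wp:postmeta."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):  # <item> itself is not namespaced
        post_id = item.findtext("wp:post_id", default="", namespaces=NS)
        title = item.findtext("title", default="")
        for meta in item.findall("wp:postmeta", NS):
            key = meta.findtext("wp:meta_key", default="", namespaces=NS)
            value = meta.findtext("wp:meta_value", default="", namespaces=NS)
            yield post_id, title, key, value


for row in postmeta_rows(SAMPLE):
    print(row)
# ('1866', '221 Grove Road', 'Block', '1105')
# ('1866', '221 Grove Road', 'Lot', '11')
```

Keeping the post ID on every row plays the same role as the key column you later look up against in Excel.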
Filtering Imported XML Data
Columns can be filtered to show only the lines you want.
Notice how the header rows have filter functions attached to them.
Filtering Imported XML Data 2
Reduce it to this and copy only what you need.
Copy only the cells left visible after filtering, then use exciting Excel functions to get you closer.
From there, paste the filtered cells into a new table and run the “VLOOKUP” function.
This is a good way to turn rows into columns for this particular use case.
Sort the table so blank rows fall to the bottom, then copy and paste the VALUES into a new sheet.
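The filter-then-VLOOKUP routine is essentially a pivot from rows to columns. Here is a minimal Python sketch of the same idea, assuming rows shaped like (post_id, title, meta_key, meta_value); pivot_meta is a hypothetical helper, and the example rows are trimmed from the 221 Grove Road entry.

```python
def pivot_meta(rows, wanted_keys):
    """Turn (post_id, title, meta_key, meta_value) rows into one dict per post.

    Keys not in wanted_keys are skipped (the filtering step). Each wanted key
    becomes a column; a post missing that key gets "" where an exact-match
    VLOOKUP would return #N/A.
    """
    posts = {}
    for post_id, title, key, value in rows:
        record = posts.setdefault(post_id, {"post_id": post_id, "title": title})
        if key in wanted_keys:
            record[key] = value
    for record in posts.values():
        for key in wanted_keys:
            record.setdefault(key, "")
    return list(posts.values())


rows = [
    ("1866", "221 Grove Road", "settings", "a:14:{...}"),
    ("1866", "221 Grove Road", "Block", "1105"),
    ("1866", "221 Grove Road", "Lot", "11"),
]
print(pivot_meta(rows, ["Block", "Lot", "Contributing Outbuildings"]))
# [{'post_id': '1866', 'title': '221 Grove Road', 'Block': '1105', 'Lot': '11', 'Contributing Outbuildings': ''}]
```

Empty strings for missing keys make the gaps easy to spot when you sort, much like the blank rows in the spreadsheet version.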
These work across sheets
The magic “VLOOKUP” function isn’t limited to just the sheet you’re working on.
Now we’re getting closer to our real data
Sometimes you have to do it by hand. That’s how the “style” column was derived. Once you have established a controlled vocabulary, you can do a lookup.
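Once the vocabulary exists, the lookup can be as simple as checking which term appears in each description. A minimal Python sketch, assuming the style is written out verbatim somewhere in the description text; STYLES and lookup_style are hypothetical, and plain substring matching will miss misspellings, which still need the by-hand pass.

```python
# Hypothetical controlled vocabulary; replace it with the style terms you
# settled on while filling in the "style" column by hand.
STYLES = ["Colonial Revival", "Dutch Colonial Revival", "Tudor Revival"]


def lookup_style(description, vocabulary=STYLES):
    """Return the vocabulary term found in a description, or "" if none is."""
    text = description.lower()
    # Try longer terms first so "Dutch Colonial Revival" wins over the
    # shorter "Colonial Revival" that it contains.
    for term in sorted(vocabulary, key=len, reverse=True):
        if term.lower() in text:
            return term
    return ""


print(lookup_style("221 Grove Road is a 2 1/2 story, 3 bay, rectangular plan,, "
                   "wood frame, Dutch Colonial Revival, residential building."))
# Dutch Colonial Revival
```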
Getting external data
An earlier version used a macro to determine longitude and latitude for entries
Google Sheets has an “IMPORTXML” function
Excel has the “WEBSERVICE” function. You can use it like this to fetch geodata for an address from openstreetmap.org, returned as XML:
=WEBSERVICE("http://nominatim.openstreetmap.org/search/?format=xml&q=400 South Orange Ave., South Orange, NJ")
Then use FILTERXML on top of WEBSERVICE to get specific values. With the WEBSERVICE formula in cell C2:
=FILTERXML(C2,"//place/@lat")
This returns the latitude value for the address.
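The same WEBSERVICE plus FILTERXML pair can be sketched in Python with the standard library. This is a minimal sketch, assuming Nominatim still answers the endpoint from the WEBSERVICE example with XML containing place elements that carry lat and lon attributes; the helper names and the User-Agent string are hypothetical. Nominatim’s usage policy asks for an identifying User-Agent and roughly one request per second at most.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint copied from the WEBSERVICE example; check Nominatim's current
# documentation before relying on it.
BASE_URL = "http://nominatim.openstreetmap.org/search/"


def geocode_url(address):
    """Build the search URL, URL-encoding the address (spaces, commas)."""
    return BASE_URL + "?" + urllib.parse.urlencode({"format": "xml", "q": address})


def first_lat_lon(xml_text):
    """Rough equivalent of FILTERXML(C2, "//place/@lat"), plus @lon.

    Returns (lat, lon) for the first <place> element, or None if the
    search found nothing.
    """
    place = ET.fromstring(xml_text).find(".//place")
    if place is None:
        return None
    return float(place.get("lat")), float(place.get("lon"))


def geocode(address):
    """Fetch and parse one address. This makes a real network request."""
    # Placeholder User-Agent; identify your own project here.
    request = urllib.request.Request(
        geocode_url(address), headers={"User-Agent": "example-dhpress-geocoder"}
    )
    with urllib.request.urlopen(request) as response:
        return first_lat_lon(response.read().decode("utf-8"))


print(geocode_url("400 South Orange Ave., South Orange, NJ"))
# http://nominatim.openstreetmap.org/search/?format=xml&q=400+South+Orange+Ave.%2C+South+Orange%2C+NJ
```

Encoding the address up front avoids relying on the server to cope with raw spaces and commas in the query string.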
XML is cool, what about JSON?
JSON is a widely used serialized data format, which isn’t natively supported by Excel.
Power Query for Excel (https://www.microsoft.com/en-us/download/details.aspx?id=39379) might work for you.
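If Power Query isn’t an option, a few lines of Python can turn simple JSON into a CSV that Excel opens directly. This is a sketch under the assumption that the JSON is an array of flat objects; json_records_to_csv is a hypothetical helper, and nested objects or lists would need flattening first.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text.

    Columns are the union of all keys in first-seen order; a record that
    lacks a key gets an empty cell.
    """
    records = json.loads(json_text)
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


print(json_records_to_csv(
    '[{"title": "221 Grove Road", "Block": "1105", "Lot": "11"}, {"title": "143 Grove Road"}]'
), end="")
# title,Block,Lot
# 221 Grove Road,1105,11
# 143 Grove Road,,
```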
How to proof your data
Be thorough
Read some of it for sense, and make sure your columns contain what you think they contain
Sort columns
Weird stuff drifts to the top or bottom
Search (ctrl-F) for anomalies
Special characters
Odd spaces – especially at the end of fields
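Searching with Ctrl-F works, but on a large sheet a script can list every suspect cell at once. A minimal Python sketch, assuming the data has been saved as CSV text with a header row; find_anomalies and the sample rows are hypothetical. Non-ASCII characters are flagged for a look, not as errors: curly quotes and non-breaking spaces are common culprits, but accented names are legitimate.

```python
import csv
import io


def find_anomalies(csv_text):
    """Return (row_number, column, problem) tuples for suspicious cells.

    Row numbers count the header as row 1, like a spreadsheet.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    for row_number, row in enumerate(reader, start=2):
        for column, cell in zip(header, row):
            if cell != cell.strip():
                problems.append((row_number, column, "leading/trailing space"))
            if any(ord(ch) > 127 for ch in cell):
                problems.append((row_number, column, "non-ASCII character"))
    return problems


sample = "title,Style\n221 Grove Road ,Dutch Colonial Revival\n143 Grove Road,Tudor\u00a0Revival\n"
for problem in find_anomalies(sample):
    print(problem)
# (2, 'title', 'leading/trailing space')
# (3, 'Style', 'non-ASCII character')
```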
Getting a proper CSV file
You can’t use Excel, because Microsoft’s CSV output is different from everyone else’s (for example, it only quotes a field when it has to). I know, big surprise.
DH Press requires all fields to be surrounded by quote marks
Proper escaping of quote marks, line breaks, etc.
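Outside a spreadsheet, Python’s standard csv module can produce an all-fields-quoted, UTF-8 file. A minimal sketch, assuming plain "\n" line endings are acceptable to DH Press (worth checking against the DH Press spec); the column names and the helpers to_quoted_csv and save_csv are hypothetical.

```python
import csv
import io


def to_quoted_csv(header, rows):
    """Serialize rows as CSV text with every field wrapped in double quotes.

    csv.QUOTE_ALL quotes each field, and the csv module escapes a quote
    mark inside a field by doubling it; commas and line breaks inside a
    quoted field are kept as-is.
    """
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def save_csv(path, header, rows):
    """Write the quoted CSV to disk as UTF-8."""
    # newline="" stops Python from translating the "\n" line endings.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_quoted_csv(header, rows))


print(to_quoted_csv(["title", "description"],
                    [["221 Grove Road", 'A "Dutch Colonial Revival" house, c. 1925']]), end="")
# "title","description"
# "221 Grove Road","A ""Dutch Colonial Revival"" house, c. 1925"
```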
Converting your data
Convert it from Excel to a CSV file
Get your headers right, according to the DH Press spec
Copy and paste your data into Open Office or Google Sheets
Open Office is a free download, and runs anywhere (even on your phone)
“Save As” a CSV file from Open Office, or download it as CSV from Google Sheets.
Use Open Office’s “Calc” app.
It has its own problems
You want to paste in the values from your Excel sheet, but by default it makes all the fields fixed-width, truncating your text.
So “143 Grove Road is a 2 story, 5 bay, rectangular plan, wood frame, shingle-clad, Tudor Revival, residential building. The c. 1910, side gable-roofed house has three pedimented dormers of which one is a cross gable, 6/2 windows, and a Doric column-supported portico. The entrance is flanked by diamond-paned sidelights, and the cross gable features overhanging bracketed eaves. A one story side porch is set back from the main block.”
Becomes “143 Grove Road is a 2 story, 5 bay, rectangular plan, wood frame, shingle-clad, Tudor Revival, residential building. The c. 1910, side gable-roofed house has three pedimented dormers of which one is a cross gable, 6/2 windows, and a Doric” (strangling sound here)
Right-click to “Paste Special” to get the whole thing. Use UTF-8! It will keep your special characters and even line breaks.
Then select your settings for export. Seriously, despite what this says, use UTF-8.
And follow the available DH Press tutorials for the rest…
Documentation is complete, detailed and hard to find.
A complete walk-through. Follow everything closely; it’s very detailed!
The DH Press documents page. Links to the DH Press demo project with a complete data set you can experiment with.
The project’s blog at UNC Chapel Hill.
These concepts are portable
You could use them to put data into Timeliner, TimeMapper (http://timemapper.okfnlabs.org/) or VisualEyes (http://viseyes.org/viseyes.htm).
A few DH Press tips
A DH Press “marker” is a single line from your imported file.
Don’t rely on the usual “publish” or “update” buttons while you develop your project.
Always use the “Save Settings” button first!
Anything you want to display (text, image) or use (as a filter) needs to be set up as a mote.
Build your “entry points” using motes, which will describe each marker.
When you define a mote as “short text,” it can operate as a filter. You can (and should) assign colors or icons to each term.
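Because each term in a short-text mote can get its own color or icon, it helps to see the full list of terms before assigning them. A minimal Python sketch, assuming the data is the CSV you imported; term_counts and the sample rows are hypothetical. It deliberately does not strip whitespace, so a stray trailing space shows up as its own term, which is worth fixing before you build the mote.

```python
import csv
import io
from collections import Counter


def term_counts(csv_text, column):
    """Count each distinct value in one column, most common first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    return counts.most_common()


sample = ("title,Style\n"
          "221 Grove Road,Dutch Colonial Revival\n"
          "143 Grove Road,Tudor Revival\n"
          "House A,Tudor Revival\n"
          "House B,Tudor Revival \n")
print(term_counts(sample, "Style"))
# [('Tudor Revival', 2), ('Dutch Colonial Revival', 1), ('Tudor Revival ', 1)]
```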
And
Don’t be afraid to go back and rework your data
Some things you can easily fix
Other things will blow up your project
Google your problem – there’s probably a good answer out there.
Use the testing and error-checking section.
Be sure the right maps are available. I like to use OSM Base.
If you change or edit pieces of data, use the “rebuild” button for that mote.
Don’t be intimidated by the custom field utilities’ warnings. The utilities can be a huge timesaver.