Data Mining (scraping) & Interdisciplinary

2017-10-13

Slide1

Data Mining (scraping) & Interdisciplinary Research in Law: Gov Docs: Domestic and Foreign

Eric C. Glass & Dana Neacsu

Slide2

Web Scraping

Extracting and parsing formatted data from a web page (HTML, XHTML, JSON, etc.).

Automated or manual

Python

Beautiful Soup - https://www.crummy.com/software/BeautifulSoup/

Toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.

Manages encodings

Sits on top of popular Python parsers like lxml and html5lib

Gathering election results example:

http://www.b-list.org/weblog/2010/nov/02/news-done-broke/
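The Beautiful Soup workflow described on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the HTML fragment, the table layout, the candidate names and vote counts, and the function name `extract_rows` are all made up for the example, and it assumes the `beautifulsoup4` package is installed. It uses Python's built-in `html.parser`; `lxml` or `html5lib` could be named instead if they are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; a real scraper would download the page first.
SAMPLE_HTML = """
<table id="results">
  <tr><th>Candidate</th><th>Votes</th></tr>
  <tr><td>Smith</td><td>1,204</td></tr>
  <tr><td>Jones</td><td>987</td></tr>
</table>
"""


def extract_rows(html):
    """Return the data rows of the 'results' table as (name, votes) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="results")
    rows = []
    for tr in table.find_all("tr"):
        # The header row contains only <th> cells, so it yields no <td> text.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:
            name, votes = cells
            rows.append((name, int(votes.replace(",", ""))))
    return rows


if __name__ == "__main__":
    for name, votes in extract_rows(SAMPLE_HTML):
        print(name, votes)
```

The same pattern (find a container, loop over its rows, clean each cell) carries over to most tabular pages; only the tag names and attributes change.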

Slide3

Web Scraping

Automated tools (no programming)

Import.io

Cloud-based web application

https://www.import.io/

Apparently no longer free, but a free trial is available

Webscraper.io

Chrome browser plug-in, available for free

http://webscraper.io/

Sitemap building, data extraction, and export are all done within the browser

Have not used it, but there is a YouTube video:

https://www.youtube.com/watch?v=y00t5NpW7pY

Slide4

Web Scraping

Premade tools (many applications on GitHub)

Example – NYPD Crash Data Band Aid

http://blog.johnkrauss.com/nypd-crash-data-band-aid/

On GitHub - https://github.com/talos/nypd-crash-data-bandaid

NYPD released the data through “idiotically obfuscated PDFs”

Tool is built in Python on top of xpdf and wget
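A pipeline of this kind, downloading PDFs with wget and converting them to text with xpdf's `pdftotext`, might be wired together from Python roughly as follows. This is a sketch under stated assumptions, not the code in the linked repository: the function names, the example URL, and the file names are hypothetical, and both command-line tools must be installed for `fetch_and_convert` to run.

```python
import subprocess


def wget_command(url, pdf_path):
    """Build the wget call that saves `url` to `pdf_path`."""
    return ["wget", "-O", pdf_path, url]


def pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext call; -layout tries to preserve the page layout."""
    return ["pdftotext", "-layout", pdf_path, txt_path]


def fetch_and_convert(url, pdf_path, txt_path):
    """Download a PDF and convert it to text (requires wget and pdftotext)."""
    subprocess.run(wget_command(url, pdf_path), check=True)
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


if __name__ == "__main__":
    # Hypothetical URL; substitute the address of a real PDF report.
    text = fetch_and_convert(
        "https://example.org/report.pdf", "report.pdf", "report.txt"
    )
    print(text[:500])
```

Keeping the command construction in separate functions makes it possible to inspect the exact calls before anything is downloaded.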

Slide5

Training and Help

Lynda.com (through the libraries' license)

Python: Programming Efficiently

Codecademy

Python intro, in addition to a variety of web-based APIs

Digital centers in the libraries

Python open lab

R open lab

Collaboratory at Columbia University

An appointment-based free consulting service for students and researchers at Columbia University that offers assistance with planning and executing data-driven research projects, including help with data visualization, analysis, and prediction, both in conceptual terms and with concrete software implementations.

https://www.surveymonkey.com/r/CollaboratoryClinic

Slide6

Read your question for research clues:

Q. 1. What is the connection between gun ownership and public health?

Slide7

Open Data

Map of open data policies:

http://www.opendatapolicies.org/browse/

NYC Open Data

On March 7, 2012, former Mayor Bloomberg signed Local Law 11 of 2012, more commonly known as the “Open Data Law,” which amended the New York City administrative code to mandate that all public data be made available on a single web portal by the end of 2018.

https://opendata.cityofnewyork.us/

NYS Open Data

FOIA & FOIL requests

https://www.dos.ny.gov/coog/freedomfaq.html#denybroad
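For the open data portals listed on this slide, datasets can often be retrieved as JSON rather than scraped from HTML. The sketch below assumes a Socrata-style endpoint of the form `https://<domain>/resource/<dataset-id>.json` with a `$limit` query parameter; the domain `data.cityofnewyork.us`, the dataset id `abcd-1234`, the `borough` field, the sample response, and both function names are illustrative assumptions, not taken from the slides. The parsing step runs offline on the sample text.

```python
import json
from collections import Counter
from urllib.parse import urlencode


def build_query_url(domain, dataset_id, limit=100):
    """Build a Socrata-style JSON query URL (assumed endpoint pattern)."""
    # safe="$" keeps the "$" of "$limit" unescaped so the URL stays readable.
    query = urlencode({"$limit": limit}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


def count_by_field(json_text, field):
    """Count records by the value of `field`; missing values become 'unknown'."""
    records = json.loads(json_text)
    return Counter(record.get(field, "unknown") for record in records)


# Made-up response body standing in for what a portal might return.
SAMPLE_RESPONSE = (
    '[{"borough": "BRONX"}, {"borough": "QUEENS"}, {"borough": "BRONX"}, {}]'
)

if __name__ == "__main__":
    # "abcd-1234" is a placeholder, not a real dataset identifier.
    print(build_query_url("data.cityofnewyork.us", "abcd-1234", limit=5))
    print(count_by_field(SAMPLE_RESPONSE, "borough"))
```

To work with live data, the URL would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `count_by_field`; each portal's own API documentation gives the actual dataset ids and field names.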

Slide8

Collections

Inter-university Consortium for Political and Social Research (ICPSR)

http://www.icpsr.umich.edu/icpsrweb/ICPSR/

A rich data archive of over 7,500 titles presented with full documentation and most with data formatted for use in standard statistical packages.

ProQuest Statistical Insight

https://clio.columbia.edu/catalog/2334507

Provides statistical data from U.S. government publications from 1973, state and private sources from 1980, and international organizations from 1983.

Historical Statistics of the United States

https://clio.columbia.edu/catalog/5634151

ProQuest Statistical Abstract of the U.S.

http://www.columbia.edu/cgi-bin/cul/resolve?clio10126076

DSSC data catalogs

http://library.columbia.edu/locations/dssc.html

Social Sciences resources - http://library.columbia.edu/locations/dssc/data/socsc.html

Slide9

Open data:

USA.gov, Data.gov, Data.un.org

Government Agency websites

Sunlight Foundation

Closed data:

ICPSR

Govistics

Columbia Spatial Data Catalog

Slide10

Read your question for research clues:

Q. 2. Creating a false appearance of active trading in the market by investors is a domestic and international problem. Can it be regulated?

Slide11

Clarify your question (break it into smaller concepts)

Do you need contextual information? (literature search)

Can you go for the primary source, the rule?

Where do you start?

Slide12

Remember a mere Google search may give you the right starting point

Slide13

Answer – US regulations

Doing the research!

Depending on your research needs you may or may not skip the literature search (regular law-based databases)

Find free-of-charge gov docs databases

Use a library guide

Gov reports (CRS reports)

Agency reports and other activities

Use fee-based databases

Bloomberglaw.com

Practical Law (from Westlaw)

Slide14

Remember a mere Google search may give you the right starting point

Slide15

Answer – EU regulations

Doing the research!

Depending on your research needs you may or may not skip the literature search (regular law-based databases)

The main EU database is free of charge: Europa

Use a library guide

Use the database itself

Find legislation on your topic

The main UN databases are free of charge

UNCITRAL

Google Searches

Slide16

Questions?

Eric C. Glass Email: ecg2104@columbia.edu

Dana Neacsu Email: edn13@columbia.edu

