High Performance Computing with R PowerPoint Presentation, PPT - DocSlides

2017-12-05


Slide1

High Performance Computing with R

For many years, R had a reputation for being slow and unable to process large datasets. This was true until c. 2012, when its compiler was switched from 'parse tree' to 'byte code'. Its speed is now similar to Python, Java, Matlab & IDL. All of these are slower than Fortran, C and C++.

# Benchmarking R: ten-million-element vector on MacBook laptop
w <- rnorm(10000000)                # 0.9 sec
wsort <- sort(w)                    # 1.4 sec
wfft <- fft(w)                      # 1.8 sec
foo <- 0
loop <- function(n) {
  for (i in 1:n) {
    if (tan(i) > 0.5) foo = foo + atan(i)^{2/3 * w[i]}
  }
}
loop(10000000)                      # 6.6 sec  Loop with nonlinear computation
quartz() ; plot(w)                  # 190 sec
write(format(w, digits=2))          # 34 sec
save(w, file='w.out')               # 0.1 sec
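The slide does not say how the timings were measured. One plausible way is base R's system.time(), sketched below. The vector length n is an assumption, reduced from ten million so the sketch finishes quickly, and the elapsed times will vary by machine.

```r
# Minimal timing sketch using base R only.
# n is an assumed, reduced size (the slide uses 10000000).
n <- 1e5
set.seed(42)

t_gen  <- system.time(w <- rnorm(n))       # generate Gaussian random numbers
t_sort <- system.time(wsort <- sort(w))    # sort the vector
t_fft  <- system.time(wfft <- fft(w))      # fast Fourier transform

# Collect the wall-clock ("elapsed") seconds for each step
timings <- c(rnorm = t_gen[["elapsed"]],
             sort  = t_sort[["elapsed"]],
             fft   = t_fft[["elapsed"]])
print(timings)
```

Setting n back to 10000000 gives figures comparable in kind to those on the slide.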

Slide2

R programming tools

library(help='utils') & 'base' & 'tools'

Program flow (if, for, else, while, repeat, break, next, stop)
Host computer (system, list.files, source, readline, Rscript, pipe)
Editors, IDEs and GUIs (RStudio, Jupyter, edit, emacs, vi)
Debugging (debug, browser, try, traceback, Rprof, testthat)
LaTeX (xtable, Sweave, knitr)
Language interfaces (C, C++, Fortran, Python, Java, Julia, Matlab, SQL, HTML, Oracle, Tcl/Tk, BUGS, JAGS, Stan)

Use rpy2 to access R from Python:

conda install rpy2    ## on terminal, installs R and rpy2
pip install rpy2      ## in (i)Python, requires R previously installed
import rpy2
import rpy2.robjects as robjects
R = robjects.r
ranGauss = R.rnorm(100)
print(ranGauss)

Slide3

Strategies for speeding up R code

Precompile user-created functions

Use vector/matrix operations where possible

Avoid for(i in 1:N) loops where possible

Use external C, Fortran or C++ routines for computationally intensive steps

Profile your code: Rprof, CRAN rbenchmark, microbenchmark
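A minimal sketch of the first three strategies, using only packages shipped with R. The functions sum_sq_loop, sum_sq_vec and sum_sq_cmp are hypothetical names introduced for illustration. Recent R versions (3.4.0 onward) byte-compile functions automatically at run time, so the explicit cmpfun() call may show little or no gain there.

```r
library(compiler)  # ships with R; provides cmpfun()

# Hypothetical example: sum of squares with an explicit for loop
sum_sq_loop <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]^2
  }
  total
}

# Vectorized equivalent: no explicit loop in R code
sum_sq_vec <- function(x) sum(x^2)

# Explicitly byte-compiled copy of the loop version
sum_sq_cmp <- cmpfun(sum_sq_loop)

set.seed(1)
x <- rnorm(1e5)

t_loop <- system.time(r_loop <- sum_sq_loop(x))
t_vec  <- system.time(r_vec  <- sum_sq_vec(x))
t_cmp  <- system.time(r_cmp  <- sum_sq_cmp(x))

# All three versions should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(r_loop, r_vec)),
          isTRUE(all.equal(r_loop, r_cmp)))

# Elapsed seconds for each variant (machine dependent)
print(c(loop       = t_loop[["elapsed"]],
        vectorized = t_vec[["elapsed"]],
        compiled   = t_cmp[["elapsed"]]))
```

For finer-grained comparisons, the CRAN packages named on the slide (rbenchmark, microbenchmark) repeat each expression many times instead of timing a single run.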

Slide4

CRAN packages for parallel processing

R is not intrinsically designed for parallel processing but, due to utilities for interaction with the host computer, dozens of CRAN packages are now available to facilitate parallel & distributed computing

parallel and foreach functions distribute for loops to resident cores

multicore, batch & condor serve multicore computers

mclapply applies any function to each element of a vector in parallel

h2o facilitates machine learning (e.g. RFs, ANNs) in a parallel environment

CRAN HadoopStreaming & hive serve MapReduce in Hadoop environment

CRAN cloudRmpi serves MPI, and gputools, magma & OpenCL serve GPU clusters

RevoScaleR integrates R with Microsoft SQL Server

data.table, ff & bigmemory treat large out-of-memory datasets
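As an illustration of the mclapply entry above, here is a minimal sketch using the parallel package that ships with R. The function slow_square and the choice of at most two cores are assumptions made for the example. mclapply relies on process forking, which is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)  # ships with R; provides mclapply() and detectCores()

# Hypothetical worker function standing in for an expensive computation
slow_square <- function(x) {
  x^2
}

# Use up to two cores where forking is available, otherwise one.
# detectCores() can return NA, hence na.rm = TRUE.
n_cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, min(2L, detectCores(), na.rm = TRUE))
}

inputs <- 1:8

serial_res   <- lapply(inputs, slow_square)                        # one core
parallel_res <- mclapply(inputs, slow_square, mc.cores = n_cores)  # forked workers

# Both approaches return a list of results in input order
stopifnot(identical(serial_res, parallel_res))
print(unlist(parallel_res))
```

For work this small the cost of starting worker processes outweighs any gain. The parallel version pays off only when each call to the worker function is expensive.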

Slide5

While originally designed for an individual exploring small datasets, R can be pipelined and can treat megadatasets also

R scripts are very compact

Two Penn State Ph.D. theses completed in 10^2 lines of code

