
Slide1

Elementary Functions in Julia

Mustafa Mohamad

Department of Mechanical Engineering, MIT

18.337, Fall 2016

Slide2

Problem

Develop pure Julia code for the elementary functions:

Trigonometric functions: sin, cos, tan, sincos, asin, acos, atan, atan2

Hyperbolic functions: sinh, cosh, tanh, asinh, acosh, atanh

Exponential and logarithmic functions: log, log2, log10, log1p, ilog2, exp, exp2, exp10, expm1, cbrt, pow

Floating point manipulation: ldexp, ilog2, frexp

Slide3

Grand Goal

Excise the dependency on openlibm

openlibm is basically a fork of FreeBSD's msun math library plus patches (originally developed by Sun Microsystems)

Patches have accumulated upstream over the years that have not been ported over to openlibm

Slide4

The Julia Promise

Multiple dispatch, macros, and other features make it possible to write generic versions of algorithms and still get maximal performance (see the sketch after this list)

Generic versions are often easier to understand

Generic code also yields efficient implementations for Float16 and Float128

Float16: forthcoming in Julia 0.6/1.0

Float128: forthcoming in Julia 0.6/1.0Slide5
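To make the genericity point concrete, here is a toy sketch (not code from the talk; the horner name and the coefficients are made up): a single method that the compiler specializes for every floating-point type it is called with.

```julia
# One generic method; the compiler specializes it per element type.
function horner(x::T, coeffs::NTuple{N,T}) where {T<:AbstractFloat,N}
    r = coeffs[end]
    for i in N-1:-1:1
        r = muladd(r, x, coeffs[i])   # r*x + c, fused into an FMA where available
    end
    return r
end

# The same source code serves Float16, Float32, Float64
# (and Float128 from a quad-precision package):
horner(Float16(0.5), (Float16(1), Float16(1), Float16(0.5)))
horner(0.5f0, (1.0f0, 1.0f0, 0.5f0))
horner(0.5,   (1.0, 1.0, 0.5))
```

Because each call compiles a specialized method, the narrower types pay no dispatch or conversion cost at run time.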

Slide5

Background: Julia port of the SLEEF C library

Takes advantage of SIMD and minimal code branching

Inspect the generated code with @code_llvm or @code_native

The Julia-ported code benefits from automatic SIMD vectorization, as illustrated below

See: https://github.com/JuliaMath/Sleef.jl
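One assumed way to check this from the REPL (illustrative only; poly, poly!, and the coefficients are made up, the point is the branch-free loop body):

```julia
using InteractiveUtils  # for @code_llvm / @code_native outside the REPL

# A toy branch-free polynomial kernel applied over an array.
poly(x) = muladd(muladd(muladd(x, 0.5, 1.0), x, 1.0), x, 1.0)

function poly!(out::Vector{Float64}, xs::Vector{Float64})
    @inbounds @simd for i in eachindex(xs, out)
        out[i] = poly(xs[i])
    end
    return out
end

xs  = rand(1000)
out = similar(xs)
@code_llvm poly!(out, xs)    # branch-free body: look for <N x double> vector ops
@code_native poly!(out, xs)  # or inspect the generated machine code directly
```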

Slide6

SLEEF: good accuracy, but too slow

Avoiding branches necessitates unnecessary computation on some code paths

Double-double arithmetic is used to squeeze out extra bits of precision and avoid branching (the library is too slow on non-FMA systems, roughly 6-10x slower than Base)
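The double-double building blocks are the standard error-free transformations; a minimal sketch of them (not SLEEF's actual code) shows where FMA matters:

```julia
# Error-free transformations: each returns a value and its exact rounding error.
function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)   # rounding error of a + b
    return s, e                   # s + e == a + b exactly
end

function two_prod(a::Float64, b::Float64)
    p = a * b
    e = fma(a, b, -p)             # exact error of the product; cheap only with hardware FMA
    return p, e                   # p + e == a * b exactly
end
```

Without a hardware FMA, two_prod needs a much more expensive splitting scheme, which is where the 6-10x slowdown comes from.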

Slide7

Benchmarks

[Chart: median time ratio, SLEEF / Base]

Performance critically depends on the availability of a fused multiply-add (FMA) instruction

sin/cos/tan have bugs, so the benchmark here is not fair; the same goes for the log functions

The exp10 function is very fast compared to Base's exp10, but the exp10 in openlibm is naive: it just calls the pow function

Slide8

Amal.jl

https://github.com/JuliaMath/Amal.jl

Slide9

General lessons

Developing accurate and fast elementary functions is very hard and time consuming

Strict accuracy constraints for a high-quality implementation require errors to be < 1 ulp

Must take advantage of the IEEE representation of floating point numbers to extract maximal speed

Slide10

Important considerations

Ideally prefer polynomial approximants over rational approximants (division is much slower than multiplication or addition)

Availability of an FMA instruction (fused multiply-add with no intermediate rounding); see the sketch after this list

Avoid algorithms that depend on table lookups (the throughput of CPU instructions is outpacing memory access speed; as a result, table-based methods have fallen out of favor)
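A small illustration of the FMA point in Julia (example values chosen here, not from the slides): fma always fuses, falling back to slow software emulation without hardware support, while muladd fuses only when the hardware instruction makes it cheap.

```julia
a, b, c = 1.0 + 2.0^-27, 1.0 - 2.0^-27, -1.0

fma(a, b, c)     # -2.0^-54 ≈ -5.55e-17: the exact a*b + c, rounded once
muladd(a, b, c)  # same as fma when a hardware FMA is emitted, 0.0 if it lowers to mul + add
a * b + c        #  0.0: the separate product rounds 1 - 2^-54 up to 1.0 first
```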

Slide11

Testing

Critical

It is possible to exhaustively test all Float32 numbers in a couple of minutes (see the sketch after this list); unfortunately, it is not possible to test all Float64 numbers

Benchmarking: @code_llvm / @code_native, @code_warntype, BenchmarkTools.jl
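A sketch of what exhaustive Float32 testing can look like (an assumed harness, not the project's actual tests; max_ulp_error is a made-up helper):

```julia
# Iterate over every Float32 bit pattern and compare a candidate implementation
# against a higher-precision reference, recording the worst error in ulps.
function max_ulp_error(f, fref)
    worst = 0.0
    for u in UInt32(0):typemax(UInt32)
        x = reinterpret(Float32, u)
        isfinite(x) || continue
        y    = f(x)
        yref = Float32(fref(Float64(x)))      # reference result rounded to Float32
        isfinite(yref) || continue
        err = abs(Float64(y) - Float64(yref)) / eps(abs(yref))
        worst = max(worst, err)
    end
    return worst
end

# e.g. compare a candidate Float32 exp against Base's exp evaluated in Float64:
# max_ulp_error(exp, exp)
```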

Slide12

Developing approximants

1. Range reduction: reduce the argument to the smallest possible interval

2. Compute a minimax polynomial approximation on that reduced interval

3. Scale back if necessary

A Remez-based algorithm is used to compute the minimax polynomial; it minimizes the maximum error over the interval (the infinity norm, see the formula below)

For rational approximations, use a Padé approximant
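The minimax criterion behind the Remez step, written out explicitly (a standard statement, supplied here because the slide's formula did not survive extraction):

```latex
% Choose the degree-n polynomial p that minimizes the worst-case error of
% approximating f on the reduced interval [a, b]:
\[
  p^{*} \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}_n}\; \max_{x \in [a,b]} \bigl| f(x) - p(x) \bigr|
        \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}_n}\; \lVert f - p \rVert_{\infty}
\]
```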

Slide13

Developed Public Functions

exp, exp2, exp10

exp and exp2 have non-FMA versions (msun based)

exp10 (polynomial, < 1 ulp for both FMA and non-FMA)

log

Floating point manipulation functions: frexp, ldexp, significand, exponent
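A quick usage reminder for the floating point manipulation functions (these are Base's documented semantics; values shown for x = 12.5):

```julia
x = 12.5

fr, e = frexp(x)   # (0.78125, 4): x == fr * 2^e with 0.5 <= |fr| < 1
ldexp(fr, e)       # 12.5: rebuild x by scaling the fraction by 2^e
significand(x)     # 1.5625: the IEEE significand, 1 <= |s| < 2
exponent(x)        # 3: the IEEE exponent, so x == significand(x) * 2^exponent(x)
```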

Slide14

Contributing to Base's math.jl

Slide15

Pure Julia ldexp function

regular branch: ~1.5x speedup

subnormal input and subnormal output: ~3.6x speedup

normal input and subnormal output: ~2.6x speedup

subnormal input and normal output: ~13x speedup

Generic over both the argument x AND e (e may be Int128, BigInt, etc., without sacrificing Int64/Int32 performance)

Computes significand × 2^exponent without a multiplication, by adjusting the exponent field directly (a sketch of the idea follows)

Wins on readability and speed! Generic!
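A minimal sketch of the exponent-field trick (illustrative only, not Base's or Amal.jl's actual ldexp; all special cases simply fall back to naive scaling):

```julia
# Scale x by 2^e by adding e to the exponent bits instead of multiplying.
function ldexp_sketch(x::Float64, e::Int)
    u = reinterpret(UInt64, x)
    k = Int((u >> 52) & 0x7ff)                  # biased exponent field
    (k == 0 || k == 0x7ff) && return x * 2.0^e  # subnormal, Inf, or NaN: punt
    k += e
    (0 < k < 0x7ff) || return x * 2.0^e         # result leaves the normal range: punt
    return reinterpret(Float64, (u & ~(UInt64(0x7ff) << 52)) | (UInt64(k) << 52))
end
```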

Slide16

exp function example

https://github.com/JuliaMath/Amal.jl/blob/master/src/exp.jl

Method: observe that if x = k ln(2) + r, then exp(x) = 2^k exp(r)

1. Argument reduction: given x, find an integer k and a remainder r such that x = k ln(2) + r with |r| <= 0.5 ln(2)

2. Approximate exp(r) by a polynomial on the interval [-0.5 ln(2), 0.5 ln(2)]: exp(r) = 1.0 + r + polynomial(r)

3. Scale back: exp(x) = 2^k exp(r)

 

We have taken advantage of the float format!

Note the special representation!
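A minimal sketch of these three steps (illustrative only, with a low-order stand-in polynomial; Amal.jl's exp.jl uses tuned minimax coefficients and a more careful reduction):

```julia
function exp_sketch(x::Float64)
    LN2 = 0.6931471805599453
    k = round(x / LN2)        # integer k (as a Float64) with x ≈ k*ln(2) + r
    r = muladd(k, -LN2, x)    # r = x - k*ln(2), so |r| <= 0.5*ln(2)
    # exp(r) ≈ 1 + r + r^2/2 + r^3/6 + r^4/24  (stand-in for the minimax polynomial)
    p = muladd(r, muladd(r, muladd(r, muladd(r, 1/24, 1/6), 0.5), 1.0), 1.0)
    return ldexp(p, Int(k))   # scale back: exp(x) = 2^k * exp(r), just an exponent adjustment
end

exp_sketch(1.0)   # ≈ 2.7182, only 4-5 digits correct with this toy degree-4 polynomial
```

The final ldexp is where the float format is exploited: the 2^k factor is applied by adjusting the exponent field rather than by multiplying.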

Slide17

Median speed relative to Base: 0.95 (Amal/Base)

Possible to make this a tad faster

A custom macro converts the numeric coefficients to the same type as the argument, with no overhead (via dispatch)
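A sketch of what such a macro can look like (an assumption, not Amal.jl's actual macro; @poly is a made-up name):

```julia
# Horner evaluation with the literal coefficients converted to the argument's
# type; after specialization the conversions constant-fold away.
macro poly(x, coeffs...)
    ex = :(convert(typeof(t), $(esc(coeffs[end]))))
    for i in length(coeffs)-1:-1:1
        ex = :(muladd(t, $ex, convert(typeof(t), $(esc(coeffs[i])))))
    end
    return :(let t = $(esc(x)); $ex; end)
end

@poly(1.0f0, 1.0, 1.0, 0.5)   # 1 + t + 0.5t^2 at t = 1.0f0, computed in Float32: 2.5f0
```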

Slide18

exp for non-FMA systems

Padé approximation to exp(r)

A Remez algorithm is used to generate the polynomial used in this approximation

Therefore we can write exp(r) in terms of that polynomial (it is actually better to then carry out another long-division step to minimize cancellation effects; see the reconstruction below)

Slide19

Thank you for listening!