PuReMD - PowerPoint Presentation
Uploaded by mitsue-stanley on 2015-12-07

Presentation Transcript

Slide 1

PuReMD: A Reactive (ReaxFF) Molecular Dynamics Package
Hassan Metin Aktulga, Sagar Pandit, Joseph Fogarty, Ananth Grama
ayg@cs.purdue.edu

Acknowledgements: Department of Energy, National Science Foundation

Slide 2

Outline
- Sequential Realization: SerialReax
  - Algorithms and Numerical Techniques
  - Validation
  - Performance Characterization
- Parallel Realization: PuReMD
  - Parallelization: Algorithms and Techniques
  - Performance and Scalability
- Applications
  - Strain Relaxation in Si/Ge/Si Nanobars
  - Water-Silica Interface
  - Oxidative Stress in Lipid Bilayers

Slide 3

Sequential Realization: SerialReax
- Excellent per-timestep running time
  - efficient generation of neighbors lists
  - elimination of bond order derivative lists
  - cubic spline interpolation for non-bonded interactions
  - highly optimized linear solver for charge equilibration
- Linear scaling memory footprint
  - fully dynamic and adaptive interaction lists

Related publication:
Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques
H. M. Aktulga, S. A. Pandit, A. C. T. van Duin, A. Y. Grama
SIAM Journal on Scientific Computing, 2012.

Slide 4

SerialReax Components

Inputs: System Geometry, Control Parameters, Force Field Parameters
Outputs: Trajectory Snapshots, System Status Update, Program Log File

- Initialization: read input data, initialize data structs
- Neighbor Generation: 3D-grid based O(n) neighbor generation; several optimizations for improved performance
- Init Forces: initialize the QEq coefficient matrix, compute uncorrected bond orders, generate H-bond lists
- Compute Bonds: corrections are applied after all uncorrected bonds are computed
- Bonded Interactions: bonds, lone pairs, over/under-coordination, valence angles, hydrogen bonds, dihedrals/torsions
- QEq: large sparse linear system; PGMRES(50) or PCG; ILUT-based pre-conditioners give good performance
- van der Waals & Electrostatics: single pass over the far-neighbor list after charges are updated; interpolation with cubic splines for a nice speed-up
- Evolve the System: F_total = F_nonbonded + F_bonded; update x & v with velocity Verlet; NVE, NVT and NPT ensembles
- Reallocate: fully dynamic and adaptive memory management for efficient use of resources; enables large systems on a single processor

Slide 5

Generation of Neighbors List

(Figure: atom list, 3D neighbors grid, and the reordered atom list)

Slide 6

Neighbors List Optimizations
- Size of the neighbor grid cell is important
  - r_nbrs: neighbors cut-off distance
  - set the cell size to half of r_nbrs
- Reordering reduces look-ups significantly
- Atom-to-cell distances: if the distance d from an atom to a cell exceeds r_nbrs, there is no need to look into that cell
- Verlet lists for delayed re-neighboring

(Figure: when looking for the neighbors of atoms in a given box, only the surrounding grid cells need to be searched; with reordering, some of those cells need not be checked at all)
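As a rough illustration of the grid-based neighbor generation described above, here is a minimal cell-list sketch in Python. All names are hypothetical; PuReMD itself is written in C and adds the further optimizations from this slide (reordering, cell-distance pruning, Verlet lists), which are omitted here.

```python
import itertools
import math

def neighbor_list(positions, box, r_nbrs):
    """Grid-based O(n) neighbor generation (non-periodic, minimal sketch).

    The cell size is set to half of r_nbrs, as on the slide, so each
    atom's neighbors lie at most two cells away in every direction.
    """
    cell = r_nbrs / 2.0
    dims = [max(1, int(box[d] // cell)) for d in range(3)]
    cells = {}
    for i, p in enumerate(positions):
        # clamp so atoms sitting exactly on the box edge stay in range
        idx = tuple(min(dims[d] - 1, int(p[d] // cell)) for d in range(3))
        cells.setdefault(idx, []).append(i)

    nbrs = {i: [] for i in range(len(positions))}
    for idx, atoms in cells.items():
        # only look inside the cells within two cell-widths of this one
        for off in itertools.product((-2, -1, 0, 1, 2), repeat=3):
            key = tuple(idx[d] + off[d] for d in range(3))
            if key not in cells:
                continue
            for i in atoms:
                for j in cells[key]:
                    # i < j: store each pair once (half neighbor list)
                    if i < j and math.dist(positions[i], positions[j]) <= r_nbrs:
                        nbrs[i].append(j)
    return nbrs
```

A Verlet-list variant would build this list with a slightly larger cut-off and reuse it for several steps before re-neighboring.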

Slide 7

Elimination of Bond Order Derivative Lists
- BO_ij: bond order between atoms i and j; at the heart of all bonded interactions
- dBO_ij/dr_k: derivative of BO_ij with respect to atom k
  - non-zero for all k sharing a bond with i or j
  - costly force computations, large memory space needed
- Eliminate the dBO list:
  - delay the computation of dBO terms
  - accumulate force-related coefficients into CdBO_ij
  - compute dBO_ij/dr_k once, at the end of the step:
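The delayed-evaluation scheme can be sketched as follows. This is a minimal illustration with made-up names and a caller-supplied derivative function; the real code accumulates coefficients from every bonded-interaction loop and works with 3D vector derivatives.

```python
def apply_delayed_dbo_forces(bonds, forces, dbo_fn):
    """Delayed bond-order derivative evaluation.

    During the bonded-interaction loops, each interaction only adds its
    force-related coefficient into the bond's accumulated CdBO_ij (here:
    the value stored per (i, j) pair in `bonds`) instead of computing
    dBO_ij/dr_k immediately.  At the end of the step, each derivative is
    computed exactly once and scaled by the accumulated coefficient:

        f_k += CdBO_ij * dBO_ij/dr_k
    """
    for (i, j), cdbo in bonds.items():
        # dbo_fn(i, j) returns {k: dBO_ij/dr_k} for all k bonded to i or j
        for k, dbo in dbo_fn(i, j).items():
            forces[k] += cdbo * dbo
    return forces
```

The saving is that dBO_ij/dr_k is evaluated once per bond rather than once per bonded interaction that involves the bond, and no dBO list ever needs to be stored.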

f_k += CdBO_ij × dBO_ij/dr_k

Slide 8

Look-up Tables
- Expensive non-bonded force computations
  - too many interactions: r_nonb ~ 10 Å vs. r_bond ~ 4-5 Å
  - but simple, pair-wise interactions
- Reduce non-bonded force computations
  - create look-up tables
  - cubic spline interpolation for non-bonded energies & forces
- Experiment with a 6540-atom bulk water system: significant performance gain, high accuracy

Slide 9

Linear Solver for Charge Equilibration
- QEq method: used for equilibrating charges; the original QEq paper is cited 600+ times
- an approximation for distributing partial charges
- solution using Lagrange multipliers yields sparse linear systems
  - N = number of atoms; H: an N×N sparse matrix
  - s & t: fictitious charges, used to determine the actual charge q_i
- Too expensive for direct methods
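The linear systems implied above (their equations were lost with the slide graphics) take, in the standard QEq formulation used in the related SerialReax publications, the following form, where χ is the vector of atomic electronegativities; this reconstruction is from the QEq literature, not from the slide itself:

```latex
H\,\mathbf{s} = -\boldsymbol{\chi}, \qquad
H\,\mathbf{t} = -\mathbf{1}, \qquad
q_i = s_i - \frac{\sum_j s_j}{\sum_j t_j}\, t_i
```

Both systems share the matrix H, which is why they can be iterated together (as noted later under parallel optimizations).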

→ iterative (Krylov subspace) methods

Slide 10

Basic Solvers for QEq
- Sample systems
  - bulk water: 6540 atoms, liquid
  - lipid bilayer system: 56,800 atoms, biological system
  - PETN crystal: 48,256 atoms, solid
- Solvers: CG and GMRES
  - H has a heavy diagonal: diagonal pre-conditioning
  - slowly evolving environment: extrapolation from previous solutions
- Poor performance at a tolerance level of 10^-6, which is fairly satisfactory; much worse at a 10^-10 tolerance level, where cache effects are more pronounced

(Plots: # of iterations = # of matrix-vector multiplications; actual running time in seconds; fraction of total computation time)
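A minimal sketch of the diagonally preconditioned CG with an initial guess extrapolated from previous solutions, as described above. This is illustrative only (dense NumPy arrays, hypothetical function names); the solvers in SerialReax/PuReMD operate on sparse matrices with their own optimized kernels.

```python
import numpy as np

def pcg(H, b, x0, tol=1e-6, max_iter=200):
    """Conjugate gradient with diagonal (Jacobi) preconditioning.

    Works well here because the QEq matrix H has a heavy diagonal.
    """
    Minv = 1.0 / np.diag(H)          # Jacobi preconditioner: M^-1 = diag(H)^-1
    x = x0.copy()
    r = b - H @ x
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Hp = H @ p
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, it

def extrapolate_guess(prev, prev2):
    """Initial guess by linear extrapolation from the two previous steps'
    solutions -- cheap because the simulation environment evolves slowly."""
    return 2.0 * prev - prev2
```

A better starting guess directly cuts the iteration count, and with it the number of matrix-vector multiplications counted in the plots above.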

Slide 11

ILU-based Preconditioning
- ILU-based pre-conditioners: no fill-in, 10^-2 drop tolerance
- effective (considering only the solve time)
- no fill-in + threshold: nice scaling with system size
- but ILU factorization is expensive

(Plots: bulk water system, bilayer system; cache effects are still evident)

system/solver               | preconditioner time (s) | solve time (s) | total time (s)
bulk water / GMRES+ILU      | 0.50                    | 0.04           | 0.54
bulk water / GMRES+diagonal | ~0                      | 0.11           | 0.11

Slide 12

ILU-based Preconditioning
- Observation: the ILU factorization cost can be amortized
  - slowly changing simulation environment → re-usable pre-conditioners
  - PETN crystal: solid, re-usable for 1000s of steps!
  - bulk water: liquid, re-usable for 10-100s of steps!

Slide 13

Memory Management
- Compact data-structures
- Dynamic and adaptive lists
  - initially: allocate after estimation
  - at every step: monitor & re-allocate if necessary
- Low memory foot-print, linear scaling with system size

(Figure: the neighbors list, QEq matrix, and 3-body interaction lists store atom n-1's data immediately followed by atom n's data, in CSR format; the bonds and hbonds lists use a modified CSR with extra space reserved after each atom's data)
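The "modified CSR" idea can be sketched as follows: a simplified, hypothetical structure showing how reserving a little slack after each atom's entries lets new bonds be added without immediately re-packing the whole list. The real lists are re-estimated and re-allocated adaptively each step.

```python
class ModifiedCSR:
    """CSR-like storage with reserved slack after each row (atom), so a
    few new entries per atom can be added without a full re-allocation."""

    def __init__(self, estimates, slack=2):
        # start[i]: where atom i's entries begin; end[i]: one past its last entry
        self.start, self.end, pos = [], [], 0
        for est in estimates:
            self.start.append(pos)
            self.end.append(pos)
            pos += est + slack            # reserve estimate + slack slots
        self.data = [None] * pos

    def add(self, i, value):
        limit = self.start[i + 1] if i + 1 < len(self.start) else len(self.data)
        if self.end[i] == limit:
            # the monitor-&-re-allocate step would grow the list here
            raise MemoryError("atom %d overflowed its reserved space" % i)
        self.data[self.end[i]] = value
        self.end[i] += 1

    def entries(self, i):
        return self.data[self.start[i]:self.end[i]]
```

Plain CSR packs rows back-to-back and is kept for the read-mostly lists (neighbors, QEq matrix, 3-body interactions); the bonds/hbonds lists change every step, which is what the reserved slack is for.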

Slide 14

Validation
- Hexane (C6H14) structure comparison
- Excellent agreement!

Slide 15

Comparison to MD Methods
- Comparisons using hexane: systems of various sizes
- Ab-initio MD: CPMD
- Classical MD: GROMACS
- ReaxFF: SerialReax

Slide 16

Comparison to LAMMPS-ReaxF
- Time per time-step comparison
- QEq solver performance
- Memory foot-print
- different QEq formulations, similar results
- LAMMPS: CG with no preconditioner

Slide 17

Outline
- Sequential Realization: SerialReax
  - Algorithms and Numerical Techniques
  - Validation
  - Performance Characterization
- Parallel Realization: PuReMD
  - Parallelization: Algorithms and Techniques
  - Performance and Scalability
- Applications
  - Strain Relaxation in Si/Ge/Si Nanobars
  - Water-Silica Interface
  - Oxidative Stress in Lipid Bilayers

Slide 18

Parallel Realization: PuReMD
- Built on the SerialReax platform
  - excellent per-timestep running time
  - linear scaling memory footprint
- Extends its capabilities to large systems and longer time-scales
  - scalable algorithms and techniques
  - demonstrated scaling to over 3K cores

Related publication:
Parallel Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama
Parallel Computing, 2012.

Slide 19

Parallelization: Decomposition
- Domain decomposition: 3D torus

Slide 20

Parallelization: Outer-Shell

(Figure: outer-shell options — full shell, half shell, midpoint shell, and tower-plate shell — each drawn with the sub-domain b and shell thickness r, r/2 for the midpoint shell)

Slide 21

Parallelization: Outer-Shell

(Figure: the same outer-shell options — full shell, half shell, midpoint shell, tower-plate shell)

- the full shell is chosen due to dynamic bonding, despite the communication overhead

Slide 22

Parallelization: Boundary Interactions
- r_shell = max(3 × r_bond_cut, r_hbond_cut, r_nonb_cut)

Slide 23

Parallelization: Messaging

Slide 24

Parallelization: Messaging Performance
- Performance comparison: PuReMD with direct vs. staged messaging

Slide 25

Parallelization: Optimizations
- Truncate bond-related computations
  - double computation of bonds at the outer-shell hurts scalability as the sub-domain size shrinks
  - trim all bonds that are 3 or more hops away
- Scalable parallel solver for the QEq problem
  - GMRES/ILU factorization: not scalable
  - use CG + diagonal pre-conditioning
  - good initial guess: redundant computations to avoid reverse communication; iterate both systems together

Slide 26

PuReMD: Performance and Scalability
- Weak scaling test
- Strong scaling test
- Comparison to LAMMPS-REAX
- Platform: Hera cluster at LLNL
  - 4 AMD Opterons/node, 16 cores/node
  - 800 batch nodes: 10,800 cores, 127 TFLOPS
  - 32 GB memory/node
  - InfiniBand interconnect
  - 42nd on the TOP500 list as of Nov 2009

Slide 27

PuReMD: Weak Scaling
- Bulk water: 6540 atoms in a 40×40×40 Å³ box per core

Slide 28

PuReMD: Weak Scaling
- QEq scaling
- Efficiency

Slide 29

PuReMD: Strong Scaling
- Bulk water: 52,320 atoms in an 80×80×80 Å³ box

Slide 30

PuReMD: Strong Scaling
- Efficiency and throughput

Slide 31

Outline
- Sequential Realization: SerialReax
  - Algorithms and Numerical Techniques
  - Validation
  - Performance Characterization
- Parallel Realization: PuReMD
  - Parallelization: Algorithms and Techniques
  - Performance and Scalability
  - Validation
- Applications
  - Strain Relaxation in Si/Ge/Si Nanobars
  - Water-Silica Interface
  - Oxidative Stress in Lipid Bilayers

Slide 32

LAMMPS-Reax/C User Community
- Konstantin Shefov - Sankt-Peterburgskij Gosudarstvennyj Universitet
- Camilo Calderon - Boston University
- Ricardo Paupitz Barbosa dos Santos - Universidade Estadual de Maringa
- Shawn Coleman - University of Arkansas
- Paolo Valentini - University of Minnesota
- Hengji Zhang - University of Texas at Dallas
- Benjamin Jensen - Michigan Technological University
- Xiao Dong Han - Beijing University
- Robert Meissner - Fraunhofer Institute for Manufacturing Technology and Advanced Materials, Bremen
- James Larentzos - High Performance Technologies, Inc. (HPTi)
- Hegoi Manzano - Concrete Sustainability Hub, MIT
- Olivier Politano - Laboratoire Interdisciplinaire Carnot de Bourgogne
- Ti Leggett - University of Chicago

Slide 33

Si/Ge/Si Nanoscale Bars
- Motivation
  - Si/Ge/Si nanobars: ideal for MOSFETs
  - as produced: biaxial strain; desirable: uniaxial
  - design & production: understanding strain behavior is important

(Figure: Si/Ge/Si bar of width W and height H with periodic boundary conditions; axes [100] transverse, [001] vertical, [010] longitudinal)

Related publication:
Strain relaxation in Si/Ge/Si nanoscale bars from molecular dynamics simulations
Y. Park, H. M. Aktulga, A. Y. Grama, A. Strachan
Journal of Applied Physics 106, 1 (2009)

Slide 34

Si/Ge/Si Nanoscale Bars
- Key result: when the Ge section is roughly square shaped, it has almost uniaxial strain!
- Simple strain model derived from the MD results

(Figure: W = 20.09 nm Si/Ge/Si bar; plots of the average transverse Ge strain and the average strains for Si & Ge in each dimension)

Slide 35

Water-Silica Interface
- Motivation
  - a-SiO2: widely used in nano-electronic devices
  - also used in devices for in-vivo screening
  - understanding its interaction with water is critical for device reliability

Related publication:
A Reactive Simulation of the Silica-Water Interface
J. C. Fogarty, H. M. Aktulga, A. van Duin, A. Y. Grama, S. A. Pandit
Journal of Chemical Physics 132, 174704 (2010)

Slide 36

Water-Silica Interface
- Water model validation
- Silica model validation

Slide 37

Water-Silica Interface

(Figure: the silica-water interface, with Si, O, and H atoms labeled)

Key result: silica surface hydroxylation, as evidenced by experiments, is observed.

Proposed reaction: H2O + 2Si + O → 2SiOH

Slide 38

Water-Silica Interface
- Key result: hydrogen penetration is observed; some H atoms penetrate through the slab.
- Ab-initio simulations predict whole-molecule penetration.
- We propose that water diffuses through a thin film of silica via hydrogen hopping: rather than diffusing as whole units, water molecules dissociate at the surface, hydrogens diffuse through, and they combine with other dissociated water molecules at the other surface.

Slide 39

Oxidative Damage in Lipid Bilayers
- Motivation
  - modeling reactive processes in biological systems
  - ROS → oxidative stress → cancer & aging
- System preparation
  - 200 POPC lipids + 10,000 water molecules, and the same system with 1% H2O2 mixed in
  - mass spectrograph: a lipid molecule weighs 760 u
- Key result: oxidative damage observed
  - in pure water: 40% damaged
  - in 1% peroxide: 75% damaged

Slide 40

Conclusions
- An efficient and scalable realization of ReaxFF through the use of algorithmic and numerical techniques
- Detailed performance characterization; comparison to other methods
- Applications on various systems
- LAMMPS/Reax/C and the stand-alone PuReMD Reax implementations are available in the public domain.

Slide 41

References

Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques
H. M. Aktulga, S. A. Pandit, A. C. T. van Duin, A. Y. Grama
SIAM Journal on Scientific Computing (2012)

Parallel Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama
Parallel Computing (2012)

Strain relaxation in Si/Ge/Si nanoscale bars from molecular dynamics simulations
Y. Park, H. M. Aktulga, A. Y. Grama, A. Strachan
Journal of Applied Physics 106, 1 (2009)

A Reactive Simulation of the Silica-Water Interface
J. C. Fogarty, H. M. Aktulga, A. van Duin, A. Y. Grama, S. A. Pandit
Journal of Chemical Physics 132, 174704 (2010)
