/
Practical Performance Tuning Practical Performance Tuning

Practical Performance Tuning - PowerPoint Presentation

calandra-battersby
calandra-battersby . @calandra-battersby
Follow
373 views
Uploaded On 2018-03-19

Practical Performance Tuning - PPT Presentation

For Your Progress OpenEdge Database A Few Words About The Speaker Tom Bascom Roaming DBA amp Progress User since 1987 President DBAppraise LLC Remote Database Management Service ID: 656483

blocks amp data customer amp blocks customer data index sec cache file size oct users integer 000 100 variable dbname larger cluster

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Practical Performance Tuning" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Practical Performance Tuning

For Your

Progress

OpenEdge

DatabaseSlide2

A Few Words About The Speaker

Tom

Bascom

, Roaming DBA & Progress User since 1987

President,

DBAppraise

, LLC

Remote Database Management Service.

Simplifying the job of Managing and Monitoring the world’s best business applications.

tom@dbappraise.com

VP, White Star Software, LLC

Expert Consulting Services related to all aspects of Progress and

OpenEdge

.

tom@wss.com Slide3

Tom’s Top Twenty

Tuning Tips

In No Particular Order.Slide4

{inc/disclaimer.i}

Your

kilometerage

will

vary.

These tips are in no special order. But sometimes order does matter.

In an ideal world you will change one thing at a time, test and remove changes that do not have the desired effect.

The real world isn’t very ideal.Slide5

#20 Dump & LoadYou

shouldn’t

need to

routinely

dump & load.

If you’re on OE10,

Using type 2 areas,

That have been well defined,

And you have a well-behaved application.

The rest of us benefit from an occasional D&L. Slide6

#20 Dump & Load

DB-20 Highly Parallel Dump & Load

http://www.psdn.com/ak_download/media/exch_audio/2007/DB/DB-20_Bascom.pdfSlide7

#19 Stay Current

Up to date releases of Progress, your OS and your application are essential components of a well tuned system.

You cannot take advantage of the best techniques, improved algorithms or new hardware without staying up to date.

Failure to stay up to date may mean poor performance, increased costs and

uncompetitiveness

.Slide8

#19 Stay Current

TRX throughput in v8 -> v9 (CCLP)

Major 4GL execution speed improvements in 9.1E.

64 bit platform support and large memory.

10.1A Automatic Defragmentation.

10.1B Workgroup/

Multicore

bug addressed.

Significant improvements in DB read performance in 10.1C (-spin enhancements).Slide9

#19 The Impact of an Upgrade

conv89Slide10

#18 Parallelize

Step outside of the box and consider what portions of your system could benefit from being parallelized:

MRP Runs

Nightly Processing

Reports

Data Extracts

Data ImportsSlide11

#18 Parallelize

$

mbpro

dbname –p

exp.p

param

“01|0,3000”

$

mbpro

dbname

–p

exp.p

param

“02|3000,6000”

$

mbpro

dbname

–p

exp.p

param

“03|6000,9999”

/*

exp.p

*/

define variable

startCust

as integer no-undo.

define variable

endCust

as integer no-undo.

startCust

= integer( entry( 1, entry( 2,

session:parameter

, “|” ))).

endCust

= integer( entry( 2, entry( 2,

session:parameter

, “|” ))).

output to value( “export.” + entry( 1,

session:parameter

, “|” ).

for each customer no-lock where

custNum

>=

startCust

and

custNum

<

endCust

:

export customer.

end.

output close.

quit.Slide12

#17 Update Statistics

SQL-92 uses a cost based optimizer…

But it cannot optimize without knowledge of the cost! (data distribution).

Weekly or monthly “update statistics” is appropriate for most people.

Or when 20% of your data has changed.

This is a data intense process:

Run it during off hours if you can.

You might want to only do a few tables/indexes at a time.Slide13

#17 Update Statistics

$ cat customer.sql

UPDATE

TABLE STATISTICS

AND INDEX STATISTICS

AND ALL COLUMN STATISTICS

FOR

PUB.Customer

;

COMMIT WORK;

$DLC/bin/

sqlexp

\

-db

dbname

\

-S

portnum

\

-

infile

customer.sql \

-

outfile

customer.out

\

-user username \

-password passwd >> customer.err 2>&1

For More Information:

OpenEdge

Data Management: SQL Development

Chapter 10, Optimizing Query PerformanceSlide14

#16 Progress AppServer

Used to reduce network traffic and latency.

When properly implemented it will minimize the path length between business logic and data persistence layers.

IOW, for best performance, the

AppServer

should live on the same server as the database and use a self-service connection.

Exception: An

AppServer

which is dedicated to making Windows API calls.Slide15

procedure

asproc

for each …

end.

end.

#16 Progress

AppServer

run

asproc

on …

procedure

asproc

for each …

end.

end.Slide16

#15 fork() & exec()Very expensive system calls.

Multiple context switches, process creation & tear-down, IO to load executables etc.

Complex shell scripts (the old UNIX “

lp

” subsystem).

Long PATH variables.

4GL usage of UNIX, DOS, OS-COMMAND and INPUT THROUGH.Slide17

#15 fork() & exec()

define variable

i

as integer no-undo.

define variable

fSize

as integer no-undo.

etime

( yes ).

do

i

= 1 to 1000:

input through value( "

ls

-ld .." ).

import ^ ^ ^ ^

fSize

.

input close.

end.

display

etime

fSize

.

define variable

i

as integer no-undo.

define variable

fSize

as integer no-undo.

etime

( yes ).

do

i

= 1 to 1000: file-info:file-name = "..". fSize = file-info:file-size.end.display etime fSize.3140ms, at least 1000 calls to each ofopen(), close(), fork(), exec(), read()complete with multiple context switchesper invocation.16ms, 1,000 stat() calls.Slide18

#14 -spin

Almost all new machines, even desktops & laptops are now multi-core.

Do

NOT

use the old X * # of CPUs rule to set

–spin.

It is bogus.

Bigger is not always better with –spin!

Modest values (5,000 to 10,000)

generally

provide the best and most consistent results for the vast majority of people.

Use

readprobe.p

to explore.

Check out Rich

Banville’s

Superb Exchange 2008 Presentation!Slide19

#14 -spin

OPS-28 A New Spin on Some Old Latches

http://www.psdn.com/ak_download/media/exch_audio/2008/OPS/OPS-28_Banville.pptSlide20

#13 bi cluster size

The idea is to reduce the number and frequency of checkpoints giving APWs plenty of time to work.

Larger bi clusters permit spikes in the workload to take place without ambushing the APWs.

Easy to overlook when building new db via

prostrct

create…

512 is the default OE 10 bi cluster size.

8192 is good for small systems.

16384 is “a good start” for larger systems.

Longer REDO phase on startup so don’t get crazy.

NOT

a good idea for “Workgroup” database licenses. For WG small values (512 or 1024) are better.Slide21

#13 bi cluster size

$

grep

‘(4250)’

dbname.lg

Before-Image Cluster Size: 524288.

$

proutil

dbname

-C truncate bi -bi 16384

(1620) Before-image cluster size set to 16384 kb.

Before-Image Cluster Size: 16777216.

$

proutil

dbname

-C -

bigrow

8Slide22

#12 APW, AIW, BIW & WDOGAlways start a BIW

Always start an AIW

Start WDOG

Two APWs are usually enough:

Too many is just a (small) waste of CPU cycles.

If you are consistently flushing buffers at checkpoints increase bi cluster size and add an APW (one at a time until buffer flushes stop).Slide23

#12 APW, AIW, BIW & WDOG

APW Data

BIW & AIW DataSlide24

#11 Larger db Blocks

Larger blocks result in

much

more efficient IO.

Fewer IO ops mean less contention for disk.Moving from 1k to 4k is huge. 4k to 8k is relatively less huge but still very valuable.

8k works best in most cases. Especially read-heavy workloads.

Better disk space efficiency (tighter packing, less overhead).

Don’t forget to adjust –B and Rows Per Block!Slide25

#11 Larger db BlocksLarge Blocks reduce IO, fewer operations are needed to move the same amount of data.

More data can be packed into the same space because there is proportionally less overhead.

Because a large block can contain more data it has improved odds of being a cache “hit”.

Large blocks enable HW features to be leveraged. Especially SAN HW.Slide26

#11 Larger db Blocks

8k Blocks

8k Blocks

New ServerSlide27

#10 Type 2 Storage AreasData Clusters – contiguous blocks of data that are homogenous (just one table).

64 bit ROWID.

Variable (by area) rows per block.

All data should be in type 2 areas – until you

prove

otherwise.

Storage Optimization Strategies! Slide28

#10 Type 2 Storage Areas

Storage areasSlide29

#9 Transactions

Distinguish between a “business transaction” and a “database transaction”.

Do not try to abuse a database transaction to enforce a business rule:

You may need to create “reversing (business) transactions”.

Or

restartable

transactions.

For large database operations “chunk” your transactions.Slide30

#9 “Chunking” Transactions

define variable

i

as integer no-undo.

outer: do for customer transaction while true:

inner: do while true:

i

=

i

+ 1.

find next customer

exclusive-lock no-error.

if not available customer then leave outer.

discount = 0.

if

i

modulo

100

= 0 then next outer.

end.

end.Slide31

#8 Minimize Network Traffic

Use FIELD-LIST in queries.

Use –cache and –pls.

NO-LOCK queries pack multiple records into a request and eliminate lock downgrade requests.

Watch out for client-side sorting and selection on queries.

Remember that CAN-DO is evaluated on the CLIENT (yet another reason not to use it).

Use -

noautoresultlist

/FORWARD-ONLY.Slide32

#8 Minimize Network TrafficUse a secondary broker to isolate high activity clients (such as reports).

Consider setting –Mm to 8192 or larger.

Use –

Mn

to keep the number of clients per server low (3 or less).

Use –Mi 1 to spread connections across servers.Slide33

#7 Runaways, Orphans, Traps & Kills

Consume entire cores doing nothing useful.

These are sometimes caused by bugs.

But that is rare.

More likely is a poorly conceived policy of restricting user logins.

The UNIX “trap” command is often at the bottom of these problems.Slide34

#7 Runaways, Orphans, Traps & Kills

http://dbappraise.com/traps.htmlSlide35

#6 The Buffer Cache

The cure for disk IO is RAM.

Use RAM to buffer and cache IO ops.

Efficiency of –B:

Is loosely measured by

hit ratio

.

Changes follow an inverse square law.

So to make a noticeable change in hit ratio you must make a

large

change to –B.

100,000 is “a good start” (800MB @ 8k blocks).Slide36

#6 The Buffer Cache

Layer

Time

# of

Recs

# of Ops

Cost per Op

Relative

Progress

to –B

0.96

100,000

203,473

0.000005

1

-B to FS Cache

10.24

100,000

26,711

0.000383

75

FS

Cache to SAN

5.93

100,000

26,711

0.000222

45

-B to SAN Cache*

11.17

100,000

26,711

0.000605

120

SAN

Cache to Disk

200.35

100,00026,7110.0075001500-B to Disk211.52100,00026,7110.0079191585* Used concurrent IO to eliminate FS cache

In Big B You Should Trust!Slide37

#5 Rapid ReadersSimilar to a runaway – consumes a whole CPU

But is actually doing db IO

Usually caused by:

Table scans

Poor index selection.

Unmonitored batch processes and app-servers.

Really bad algorithm choices.Slide38

#5 Rapid Readers

High Read Rate

Suspicious user IO

Code being run!Slide39

#4 Balance & Isolate IO

Use more than one disk:

A fast disk can do 150 or so

random

IO Ops/sec.Kbytes/sec is a measure of

sequential

IO.

OLTP is mostly

random

.

Don’t waste time trying to “manually stripe”.

Instead, use “hardware” striping and mirroring.

Isolate AI extents for

safety

, not performance.

Isolate temp-file, application, OS and “other” IO.Slide40

#4 Balance & Isolate IO

fillTime

=

cacheSize

/ (

requestRate

serviceRate

)

Typical Production DB Example (4k db blocks):

4GB / ( 200

io

/sec – 800

io

/sec ) = cache doesn’t fill!

Heavy Update Production DB Example:

4GB / ( 1200

io

/sec – 800

io

/sec ) = 2621 seconds (RAID10)

4GB / ( 1200

io

/sec – 200

io

/sec ) = 1049 seconds (RAID5)

Maintenance Example:

4GB / ( 5000

io

/sec – 3200

io

/sec ) = 583 seconds (RAID10)

4GB / ( 5000

io

/sec – 200

io

/sec ) = 218 seconds (RAID5)Slide41

#3 Manage Temp File IOTemp-file IO can exceed db IO.

Sometimes by 2:1, 3:1 or more!

-T isolates temp file IO.

-t helps you to crudely diagnose the source of IO.

-y provides some detail regarding r-code swapping.

-

mmax

buffers r-code, 4096 is a good start for

ChUI

, 16384 for GUI.

Memory mapped procedure libraries cache r-code.

Use –Bt & -

tmpbsize

to tune 4GL temp-tables.Slide42

#3 Manage Temp File IO

-

rw

-r--r-- 1 VEILLELA users 579312 Oct 19 15:16

srtrAyhEb

-

rw

-r--r-- 1

wrightb

users 35697664 Oct 19 15:16 srtH6miqb

-

rw

-r--r-- 1 STEELEJL users 36772864 Oct 19 15:16 srtz37kyb

-

rw

-r--r-- 1 THERRIKS users 0 Oct 19 07:12

srt

--

Elab

-

rw

-r--r-- 1 root users 17649 Oct 19 15:16 lbiV6Qp7a

-

rw

-r--r-- 1 root users 34704 Oct 19 15:16

lbi-TymMa

-

rw

-r--r-- 1

wrightb

users 811008 Oct 19 15:16

DBIHDmiqc

-

rw

-r--r-- 1 BECKERLM users 8192 Oct 19 11:06 DBI--

Abac

-

rw-r--r-- 1 CALUBACJ users 8192 Oct 19 09:16 DBI--AbycCLIENT.MON (-y)Program access statistics: Times BytesReads from temp file: 0 0Writes to temp file: 0 0Loads of .r programs: 14 524594Saves of compilation .r's: 0 0Compilations of .p's: 0 0Checks of files with stat: 165 0Slide43

#2 Index CompactCompacts Indexes.

Removes deleted record placeholders.

Improves “utilization” = fewer levels & blocks and more index entries per read.

Runs online or offline.

Available since version 9.Slide44

#2 Index Compact

Do

NOT

set target % for 100!

Consider compacting when utilization < 70%… and blocks > 1,000.

proutil

dbname

–C

idxcompact

table.index

target%

INDEX BLOCK SUMMARY FOR AREA "

APP_FLAGS_Idx

" : 96

-------------------------------------------------------

Table Index Fields Levels Blocks Size %

Util

Factor

PUB.APP_FLAGS

AppNo

183 1 3

4764

37.1M

89.9

1.2

FaxDateTime

184 2 2

45

259.8K

72.4

1.6

FaxUserNotified 185 2 2 86

450.1K 65.6 1.7INDEX BLOCK SUMMARY FOR AREA "Cust_Index" : 10-------------------------------------------------------Table Index Fields Levels Blocks Size % Util FactorPUB.Customer Comments 13 1 1 1 874.0B 21.5 1.0 CountryPost 14 2 2 4 9.0K 56.5 1.9

CustNum 12 1 2 4 9.9K 62.2 1.8

Name 15 1 2 9 22.5K 62.9 1.7

SalesRep 16 1 1 1 1.3K 33.6 1.0Slide45

#1 Stupid 4GL TricksBad code will defeat any amount of heroic tuning and excellent hardware.

Luckily bad code is often advertised by the perpetrator as having been developed to “improve performance”.

Just because a feature of the 4GL

can

do something doesn’t mean that it

should

be used to do it.Slide46

#1 Stupid 4GL Tricks

/* SR#1234 – enhanced lookup to improve performance! */

update

cName

.

find first customer where

cName

matches customer.name

use-index

custNum

no-error.

Or –

find first customer where can-do(

cName

, name )

use-index

custNum

no-error.Slide47

Questions?Slide48

Thank You!