/
The Deepest Depths of The Deepest Depths of

The Deepest Depths of - PowerPoint Presentation

yoshiko-marsland
yoshiko-marsland . @yoshiko-marsland
Follow
391 views
Uploaded On 2016-10-18

The Deepest Depths of - PPT Presentation

promon Presented by Gus Bjorklund amp Dan Foreman And how it may help in troubleshooting certain DB problems Gus Bjorklund Progress Wizard Dan Foreman Progress User since 1984 Author of several Progress related ID: 477738

sec latch buf lock latch sec lock buf lru counts 65987456 activity amp time buffer total screen locked debghb share screens option

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "The Deepest Depths of" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

The Deepest Depths of promon

Presented by: Gus Bjorklund & Dan Foreman

And how it may help in troubleshooting certain DB problemsSlide2

Gus BjorklundProgress WizardSlide3

Dan ForemanProgress User since 1984Author of several Progress related PublicationsNews Flash – A new publication which is a superset of this

presentation, titled promon debghb, will be available shortlyAuthor of several cool and useful Progress DBA ToolsProMonitor & ProCheck & LockMonPro Dump&LoadBalanced BenchmarkBasketball Fanatic…which sometimes leads to unexpected trips to the ERSlide4

Dan ForemanSlide5

Brief History of the debghb option in promonAdded to promon

V6.3 by GusPurpose: The shared memory architecture introduced in V6.3 was quite a bit different and a way to monitor shared memory activity at a detailed level was neededThe debghb option was not a formally endorsed enhancement but written by Gus in his spare timedeb = DEBugghb = Gus’s initials….the middle initial is a

Top Secret

fanatically guarded by

the Finnish governmentSlide6

WarningsThe debghb option is not documented

by ProgressDO NOT call or email Gus or PTS for help in using this optionMany of the screens and/or metrics have no value to a DBAThe view of the data is not transactionally consistent sometimes even on the same screen; example to followSome of the data is not accurate (data overflow, rounding errors,

etc)

Some of the screens are broken (don’t display any data)

The

debghb

option can be altered, removed, or hidden by Progress any time they want toSlide7

Warnings #2DB activities in shared memory can be slowed down if certain options are enabled

05/08/13 OpenEdge Release 10 Monitor (R&D)19:10:44 Adjust Latch Options

1. Spins before timeout: 24000

2. Enable latch activity data collection

3. Enable latch timing data collection

4. Initial latch sleep time: 10 milliseconds

5. Maximum latch sleep time: 5000 milliseconds

6. Record Free Chain Search Depth Factor: 5Slide8

How do I access debghb?Start promonEnter: R&D (also works in lower case starting in V10)Enter: debghbYou have now entered the

debghb “zone”Two main differences in the world of debghbExtensions to some existing R&D screensEnables access to “This menu is not here Menu” Enter: “6” even though there is no visible option 6See next slideSlide9

This menu is not here Menu06/06/13 OpenEdge

Release 10 Monitor (R&D)15:23:01 This menu is not here Menu 1. Cache Entries 2. Hash Chain 3. Page Writer Queue 4. Lru Chains

5. Locked Buffers

6. Buffer Locks

7. Buffer Use Counts

8. Resource Queues

9. TXE Lock Activity

10. Adjust TXE Options

11. Latch Counts

12. Latch Times

13. I/O Wait Time by Type

14. I/O by File

15. Buffer Lock Queue

16. Semaphores

17. ShutdownSlide10

Operating HintsAllow at least 40-45 lines of screen dataAllow at least 120-140 columns of screen widthZero out the stats (“z”) to get a clean starting placeThis ‘zeroing’ does not wipe out the actual shared memory counters but only affects the current promon session

Update the stats periodically (“u”) to get snapshotsAll the above can be scriptedSlide11

Operating Hints - User# -1 Usecnt = # of concurrent processes accessing the blockWhen initially examining the BLQ there were 5 Clients accessing the same DBKEYBut before all 5 could be displayed:One Client dropped off, i.e. released the Buffer Lock,

before they could be displayedAnother one of the 5 is partially displayed; i.e. the -1 User#02/06/13 Status: Buffer Lock Queue00:37:21 User DBKEY Area T

Status

Type

Usect

-1

70658368 34 I

LOCKED

SHARE

5

746

70658368 34 I

LOCKED

SHARE

3

762

70658368 34 I

LOCKED

SHARE

3

826

70658368 34 I

LOCKED

SHARE

3Slide12

Useful Screens - CheckpointsExtensions to the ‘normal’ Checkpoint screenColumns of interestDuration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time

_CheckPoint._CheckPoint-Duration (V10.2B SP5)Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute fdatasync() system call_CheckPoint._CheckPoint.Synctime (V10.2B SP5)See http://www.makelinux.net/alp/060 for an excellent description of fdatasync (don’t confuse it with fsync).

Sample data on the next slideSlide13

Useful Screens - CheckpointsThe ‘Duration’ of the Checkpoints (i.e. the total freeze time) is very high for most of the CPs displayedA ‘Duration’ of less than 1 second

is a good goalThe 10 sec ‘Duration’ is approximately 1/3 of the CP ‘Len’ In other words, a CP is occurring approximately every 30 seconds and for 10 seconds of that period, NO transaction activity can take place.Ckpt

------ Database Writes

----

No. Time

Len

Freq

Dirty CPT

Q

Scan APW

Q Flushes

Duration Sync Time

2499

00:56:04

246

265

14097 14586 3003

62 0

2.12

0.42

2498 00:52:16

212

228

33792 35191 2915

199 0

6.49

4.85

2497 00:51:40

33

36

34536 37114

229

85 0

7.15

5.96

2496 00:51:09

30

31

34385 36950

164

151 0

7.22

5.89

2495 00:50:34

34

35

37992 40285

408

341 0

8.33

7.15

2494 00:49:59

34

35

40933 43429

132

363 0

10.20

9.05

2493 00:49:29

29

30

41377 43427

690

281 0

5.39

3.84Slide14

Useful Screens – Resource QueuesNHM (Not Here Menu) - Option #8Do not confuse Resources with LatchesLink to Banville’s presentationIn general the busiest locks will be:

DB Buf S LockDB Buf X LockRecord LockWaits that can be problematic:DB Buf I Lock (I = Intent but these are for Index blocks)Sample on the next slideSlide15

Useful Screens – Resource Queues01/31/13 Activity: Resource Queues00:31:57 01/31/13 00:26 to 01/31/13 00:31 (5

min)Queue - Requests - ------- Waits ------- Total /Sec Total /Sec PctRecord

Lock 1007903 3360 8 0 0.00

Trans

Commit 1631 5 0 0 0.00

DB

Buf

I Lock 1006724 3356 139476 465 0.14

Record Get 724869 2416 0 0 0.00

DB

Buf

Read 305596 1019 0 0 0.00

DB

Buf

Write 62727 209 0 0 0.00

DB

Buf

S Lock 33370848 111236 159591 532 0.00

DB

Buf

X Lock 1092894 3643 157022 523 0.14

DB

Buf

S Lock LRU2 20934886 69783 0 0 0.00

DB

Buf

X Lock LRU2 11088 37 0 0 0.00

DB

Buf

Write LRU2 3367 11 0 0 0.00

BI

Buf

Read 4788 16 0 0 0.00

BI

Buf

Write 16075 54 1096 4 0.07

TXE

Share Lock 1148821 3829 0 0 0.00

TXE Update Lock 10347 34 282 1 0.03

TXE Commit Lock 63540 212 1927 6 0.03Slide16

Useful Screens – Latch CountsNHM (Not Here Menu) - Option #11The R&D Blocked Clients screen doesn’t show Latch contention so debghb is the only place in promon where detailed Latch activity is visible

Definition of Naps: When –spin is ‘used up’ by a Progress Client, the process Naps (i.e. does no useful work) for a while and tries againGeneral Principle: Napping is badSamples on the next few slidesSlide17

Latch Counts – OM LatchOM (Object Cache) Latch activity can be totally eliminated by setting the -omsize parameter equal to or greater than the number of _StorageObject

records.04/24/13 Activity: Latch Counts00:59:28 04/24/13 00:54 to 04/24/13 00:59 (5 min 1 sec) -----

Locks -----

---

Busy

---

Naps

--------

Spins

-------- --

Nap Max

-

Owner

Total

/Sec

/Sec

Pct

/

Sec

/Sec

/

Lock

/Busy

Total HWM

MTX --

1563935

5195

86

1.6

45

6461297

1243

74724

259 300

USR --

1178290

3914

3

0.0

0

0

0

0

0 0

OM

--

21523860 71507

7322

10.2

139

9144588

127

1248

3 80Slide18

Latch Counts – USR LatchThe small contention on the USR (DB Connection Table) Latch is because Statement Caching is enabled

04/25/13 Activity: Latch Counts00:33:17 04/25/13 00:28 to 04/25/13 00:33 (5 min 0 sec) ----- Locks ----- ---- Busy

--- Naps --------- Spins

----------

---

Nap Max

Owner

Total

/Sec

/

Sec

Pct

/

Sec

/Sec

/Lock

/Busy

Total HWM

MTX

--

2402181

8007

17

0.2

45

2059022

257

120481

133 300

USR

--

1517252

5057

8

0.1

0

0

0

0

0 0

OM 4343

27667792

92225

1962

2.1

32

5781905

62

2946

0 80

BIB --

2170630

7235

0

0.0

1

49241

6

165982

11 300

SCH --

447

1

0

0.0

0

0

0

0

0 0

LKP --

90680

302

0

0.0

0

6

0

923

0 10

GST --

195

0

0

0.0

0

200

307

0

1 10

TXT --

703633

2345

0

0.0

0

811

0

10149

0 10

SEQ

--

505781

1685

0

0.0

0

18283

10

66889

4 10

AIB --

1947241

6490

0

0.0

0

4957

0

43745

1 20

TXQ --

3304146

11013

5

0.0

1

116828

10

20788

0 20

EC --

0

0

0

0.0

0

0

0

0

0 0

LKF --

1834458

6114

1

0.0

0

16091

2

15673

7 10

BFP --

0

0

0

0.0

0

0

0

0

0 0

BHT --

63583335

211944

40

0.0

9

1511922

7

37429

13 300

PWQ --

491

1

0

0.2

0

23

14

7184

0 0

CPQ --

535397

1784

0

0.0

0

23253

13

47135

0 160

LRU --

812009

2706

0

0.0

2

129408

47

160424

0 80

LRU --

0

0

0

0.0

0

0

0

0

0 0

BUF --

39136969

130456

13

0.0

1

191806

1

13868

3 40Slide19

Latch Counts – LRU ChainThe total number of Locks for LRU is the second highest of all the Resources shown (BHT – Buffer Hash Table – is #1)The # of Naps per Second is the highest of all latches (Zero is the goal)

01/31/13 Activity: Latch Counts00:05:38 01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks ----- ------ Busy ------

Naps Owner

Total /Sec

/

Sec

Pct

/Sec

MTX

--

1654034 5513

0 0.0

15

OM

--

8216844 27389

0 0.0

1

BHT

--

62371320 207904

0 0.0

3

CPQ

--

197126 657

0 0.0

0

LRU 830

40395944 134653

0 0.0

1402

LRU --

36 0

0 0.0

0

BUF --

32676880 108922

0 0.0

0

BUF --

39818994 132729

0 0.0

529

BUF --

31278094 104260

0 0.0

8

BUF --

33342130 111140

0 0.0

3Slide20

Latch Counts – LRU ChainThe # of locks on the second LRU (Alternate Buffer Cache) is nil because all the ABC Objects completely fit in the amount of –B2 memory allocated

01/31/13 Activity: Latch Counts00:05:38 01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks ----- ------ Busy ------ Naps

Owner

Total /Sec

/

Sec

Pct

/Sec

BHT

--

62371320 207904

0 0.0

3

CPQ

--

197126 657

0 0.0

0

LRU 830

40395944 134653

0 0.0

1402

LRU --

36 0

0 0.0

0

BUF -- 32676880 108922 0 0.0 0

BUF -- 39818994 132729 0 0.0 529Slide21

Latch Counts – LRU Chain‘Owner’ column: if the User# doesn’t change (in value or frequency) that can be a problem indicator because Latches should be held for only a fraction of a second

01/31/13 Activity: Latch Counts00:05:38 01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks ----- ------ Busy ------ Naps

Owner

Total /Sec

/

Sec

Pct

/Sec

BHT

--

62371320 207904

0 0.0

3

CPQ

--

197126 657

0 0.0

0

LRU 830

40395944 134653

0 0.0

1402

LRU --

36 0

0 0.0

0

BUF -- 32676880 108922 0 0.0 0

BUF -- 39818994 132729 0 0.0 529Slide22

Using Latch Counts to set -spinShort answer – Forget It!If it was that easy Progress would have done it alreadyPast attempts have not been successfulAlso the optimal value of –spin is not going to be the same for each Latch

General guidelines:Greater than 1000Less than 100000Current Default: 6000 * (# of CPU Cores)Default not advised if you have more than 16 CoresDan’s (Patent Pending) Formula: (DBA-Birthday-Year * )Gus’s formula: 5000Slide23

Useful Screens – Buffer Lock QueueNHM (Not Here Menu) - Option #15The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs toThe Buffer Lock Queue (BLQ) Screen shows the Area as well as the Block TypeExamples on the next two slidesSlide24

R&D Blocked ClientsThe R&D Blocked Clients screen doesn’t show enough information to identify the Object involved in this contention storm for DBKEY 65987456There were 29 Clients all blocked on the same DBKEY

01/31/13 Status: Blocked Clients00:26:41 Usr

Name Type Wait

Wait

Info Trans id Login

time

730

_AUTO-B SELF/ABL BKSH

65987456 601383708 01/30/13 23:22

735

_AUTO-B SELF/ABL BKSH

65987456 601383773 01/30/13 23:23

743

_AUTO-B SELF/ABL BKSH

65987456 601383921 01/30/13 23:22

747 _AUTO-B SELF/ABL BKSH

65987456 601384104 01/30/13 23:22

749 _AUTO-B SELF/ABL BKSH

65987456 601383895 01/30/13 23:23

755 _AUTO-B SELF/ABL BKSH

65987456 601384175 01/30/13 23:23

769 _AUTO-B SELF/ABL BKSH

65987456 601384161 01/30/13

23:22Slide25

Buffer Lock QueueIF there is a matching DBKEY on the the BLQ screen, we can get the Area# and the Block Type (I = Index)There were 29 processes on the Blocked Clients screen with this DBKEY and only 4 on the BLQ screen with the same DBKEY

01/31/13 Status: Buffer Lock Queue00:26:41 User DBKEY Area

T

Status

Type

Usect

-

1

65987456

34 I

LOCKED

SHARE

4

722

65987456

34 I

LOCKED

SHARE

4

772

65987456

34 I

LOCKED

SHARE

4

856

65987456

34

I

LOCKED

SHARE

4

859

65987456

34 I

LOCKED

SHARE

4

<lines unrelated to DBKEY 65987456 snipped>Slide26

Thank You!Questions?

Gus: gus@progress.comDan: dan@prodb.com