promon Presented by Gus Bjorklund amp Dan Foreman And how it may help in troubleshooting certain DB problems Gus Bjorklund Progress Wizard Dan Foreman Progress User since 1984 Author of several Progress related ID: 477738
Download Presentation The PPT/PDF document "The Deepest Depths of" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
The Deepest Depths of promon
Presented by: Gus Bjorklund & Dan Foreman
And how it may help in troubleshooting certain DB problemsSlide2
Gus BjorklundProgress WizardSlide3
Dan ForemanProgress User since 1984Author of several Progress related PublicationsNews Flash – A new publication which is a superset of this
presentation, titled promon debghb, will be available shortlyAuthor of several cool and useful Progress DBA ToolsProMonitor & ProCheck & LockMonPro Dump&LoadBalanced BenchmarkBasketball Fanatic…which sometimes leads to unexpected trips to the ERSlide4
Dan ForemanSlide5
Brief History of the debghb option in promonAdded to promon
V6.3 by GusPurpose: The shared memory architecture introduced in V6.3 was quite a bit different and a way to monitor shared memory activity at a detailed level was neededThe debghb option was not a formally endorsed enhancement but written by Gus in his spare timedeb = DEBugghb = Gus’s initials….the middle initial is a
Top Secret
fanatically guarded by
the Finnish governmentSlide6
WarningsThe debghb option is not documented
by ProgressDO NOT call or email Gus or PTS for help in using this optionMany of the screens and/or metrics have no value to a DBAThe view of the data is not transactionally consistent sometimes even on the same screen; example to followSome of the data is not accurate (data overflow, rounding errors,
etc)
Some of the screens are broken (don’t display any data)
The
debghb
option can be altered, removed, or hidden by Progress any time they want toSlide7
Warnings #2DB activities in shared memory can be slowed down if certain options are enabled
05/08/13 OpenEdge Release 10 Monitor (R&D)19:10:44 Adjust Latch Options
1. Spins before timeout: 24000
2. Enable latch activity data collection
3. Enable latch timing data collection
4. Initial latch sleep time: 10 milliseconds
5. Maximum latch sleep time: 5000 milliseconds
6. Record Free Chain Search Depth Factor: 5Slide8
How do I access debghb?Start promonEnter: R&D (also works in lower case starting in V10)Enter: debghbYou have now entered the
debghb “zone”Two main differences in the world of debghbExtensions to some existing R&D screensEnables access to “This menu is not here Menu” Enter: “6” even though there is no visible option 6See next slideSlide9
This menu is not here Menu06/06/13 OpenEdge
Release 10 Monitor (R&D)15:23:01 This menu is not here Menu 1. Cache Entries 2. Hash Chain 3. Page Writer Queue 4. Lru Chains
5. Locked Buffers
6. Buffer Locks
7. Buffer Use Counts
8. Resource Queues
9. TXE Lock Activity
10. Adjust TXE Options
11. Latch Counts
12. Latch Times
13. I/O Wait Time by Type
14. I/O by File
15. Buffer Lock Queue
16. Semaphores
17. ShutdownSlide10
Operating HintsAllow at least 40-45 lines of screen dataAllow at least 120-140 columns of screen widthZero out the stats (“z”) to get a clean starting placeThis ‘zeroing’ does not wipe out the actual shared memory counters but only affects the current promon session
Update the stats periodically (“u”) to get snapshotsAll the above can be scriptedSlide11
Operating Hints - User# -1 Usecnt = # of concurrent processes accessing the blockWhen initially examining the BLQ there were 5 Clients accessing the same DBKEYBut before all 5 could be displayed:One Client dropped off, i.e. released the Buffer Lock,
before they could be displayedAnother one of the 5 is partially displayed; i.e. the -1 User#02/06/13 Status: Buffer Lock Queue00:37:21 User DBKEY Area T
Status
Type
Usect
-1
70658368 34 I
LOCKED
SHARE
5
746
70658368 34 I
LOCKED
SHARE
3
762
70658368 34 I
LOCKED
SHARE
3
826
70658368 34 I
LOCKED
SHARE
3Slide12
Useful Screens - CheckpointsExtensions to the ‘normal’ Checkpoint screenColumns of interestDuration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time
_CheckPoint._CheckPoint-Duration (V10.2B SP5)Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute fdatasync() system call_CheckPoint._CheckPoint.Synctime (V10.2B SP5)See http://www.makelinux.net/alp/060 for an excellent description of fdatasync (don’t confuse it with fsync).
Sample data on the next slideSlide13
Useful Screens - CheckpointsThe ‘Duration’ of the Checkpoints (i.e. the total freeze time) is very high for most of the CPs displayedA ‘Duration’ of less than 1 second
is a good goalThe 10 sec ‘Duration’ is approximately 1/3 of the CP ‘Len’ In other words, a CP is occurring approximately every 30 seconds and for 10 seconds of that period, NO transaction activity can take place.Ckpt
------ Database Writes
----
No. Time
Len
Freq
Dirty CPT
Q
Scan APW
Q Flushes
Duration Sync Time
2499
00:56:04
246
265
14097 14586 3003
62 0
2.12
0.42
2498 00:52:16
212
228
33792 35191 2915
199 0
6.49
4.85
2497 00:51:40
33
36
34536 37114
229
85 0
7.15
5.96
2496 00:51:09
30
31
34385 36950
164
151 0
7.22
5.89
2495 00:50:34
34
35
37992 40285
408
341 0
8.33
7.15
2494 00:49:59
34
35
40933 43429
132
363 0
10.20
9.05
2493 00:49:29
29
30
41377 43427
690
281 0
5.39
3.84Slide14
Useful Screens – Resource QueuesNHM (Not Here Menu) - Option #8Do not confuse Resources with LatchesLink to Banville’s presentationIn general the busiest locks will be:
DB Buf S LockDB Buf X LockRecord LockWaits that can be problematic:DB Buf I Lock (I = Intent but these are for Index blocks)Sample on the next slideSlide15
Useful Screens – Resource Queues01/31/13 Activity: Resource Queues00:31:57 01/31/13 00:26 to 01/31/13 00:31 (5
min)Queue - Requests - ------- Waits ------- Total /Sec Total /Sec PctRecord
Lock 1007903 3360 8 0 0.00
Trans
Commit 1631 5 0 0 0.00
DB
Buf
I Lock 1006724 3356 139476 465 0.14
Record Get 724869 2416 0 0 0.00
DB
Buf
Read 305596 1019 0 0 0.00
DB
Buf
Write 62727 209 0 0 0.00
DB
Buf
S Lock 33370848 111236 159591 532 0.00
DB
Buf
X Lock 1092894 3643 157022 523 0.14
DB
Buf
S Lock LRU2 20934886 69783 0 0 0.00
DB
Buf
X Lock LRU2 11088 37 0 0 0.00
DB
Buf
Write LRU2 3367 11 0 0 0.00
BI
Buf
Read 4788 16 0 0 0.00
BI
Buf
Write 16075 54 1096 4 0.07
TXE
Share Lock 1148821 3829 0 0 0.00
TXE Update Lock 10347 34 282 1 0.03
TXE Commit Lock 63540 212 1927 6 0.03Slide16
Useful Screens – Latch CountsNHM (Not Here Menu) - Option #11The R&D Blocked Clients screen doesn’t show Latch contention so debghb is the only place in promon where detailed Latch activity is visible
Definition of Naps: When –spin is ‘used up’ by a Progress Client, the process Naps (i.e. does no useful work) for a while and tries againGeneral Principle: Napping is badSamples on the next few slidesSlide17
Latch Counts – OM LatchOM (Object Cache) Latch activity can be totally eliminated by setting the -omsize parameter equal to or greater than the number of _StorageObject
records.04/24/13 Activity: Latch Counts00:59:28 04/24/13 00:54 to 04/24/13 00:59 (5 min 1 sec) -----
Locks -----
---
Busy
---
Naps
--------
Spins
-------- --
Nap Max
-
Owner
Total
/Sec
/Sec
Pct
/
Sec
/Sec
/
Lock
/Busy
Total HWM
MTX --
1563935
5195
86
1.6
45
6461297
1243
74724
259 300
USR --
1178290
3914
3
0.0
0
0
0
0
0 0
OM
--
21523860 71507
7322
10.2
139
9144588
127
1248
3 80Slide18
Latch Counts – USR LatchThe small contention on the USR (DB Connection Table) Latch is because Statement Caching is enabled
04/25/13 Activity: Latch Counts00:33:17 04/25/13 00:28 to 04/25/13 00:33 (5 min 0 sec) ----- Locks ----- ---- Busy
--- Naps --------- Spins
----------
---
Nap Max
Owner
Total
/Sec
/
Sec
Pct
/
Sec
/Sec
/Lock
/Busy
Total HWM
MTX
--
2402181
8007
17
0.2
45
2059022
257
120481
133 300
USR
--
1517252
5057
8
0.1
0
0
0
0
0 0
OM 4343
27667792
92225
1962
2.1
32
5781905
62
2946
0 80
BIB --
2170630
7235
0
0.0
1
49241
6
165982
11 300
SCH --
447
1
0
0.0
0
0
0
0
0 0
LKP --
90680
302
0
0.0
0
6
0
923
0 10
GST --
195
0
0
0.0
0
200
307
0
1 10
TXT --
703633
2345
0
0.0
0
811
0
10149
0 10
SEQ
--
505781
1685
0
0.0
0
18283
10
66889
4 10
AIB --
1947241
6490
0
0.0
0
4957
0
43745
1 20
TXQ --
3304146
11013
5
0.0
1
116828
10
20788
0 20
EC --
0
0
0
0.0
0
0
0
0
0 0
LKF --
1834458
6114
1
0.0
0
16091
2
15673
7 10
BFP --
0
0
0
0.0
0
0
0
0
0 0
BHT --
63583335
211944
40
0.0
9
1511922
7
37429
13 300
PWQ --
491
1
0
0.2
0
23
14
7184
0 0
CPQ --
535397
1784
0
0.0
0
23253
13
47135
0 160
LRU --
812009
2706
0
0.0
2
129408
47
160424
0 80
LRU --
0
0
0
0.0
0
0
0
0
0 0
BUF --
39136969
130456
13
0.0
1
191806
1
13868
3 40Slide19
Latch Counts – LRU ChainThe total number of Locks for LRU is the second highest of all the Resources shown (BHT – Buffer Hash Table – is #1)The # of Naps per Second is the highest of all latches (Zero is the goal)
01/31/13 Activity: Latch Counts00:05:38 01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks ----- ------ Busy ------
Naps Owner
Total /Sec
/
Sec
Pct
/Sec
MTX
--
1654034 5513
0 0.0
15
OM
--
8216844 27389
0 0.0
1
BHT
--
62371320 207904
0 0.0
3
CPQ
--
197126 657
0 0.0
0
LRU 830
40395944 134653
0 0.0
1402
LRU --
36 0
0 0.0
0
BUF --
32676880 108922
0 0.0
0
BUF --
39818994 132729
0 0.0
529
BUF --
31278094 104260
0 0.0
8
BUF --
33342130 111140
0 0.0
3Slide20
Latch Counts – LRU ChainThe # of locks on the second LRU (Alternate Buffer Cache) is nil because all the ABC Objects completely fit in the amount of –B2 memory allocated
01/31/13 Activity: Latch Counts00:05:38 01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks ----- ------ Busy ------ Naps
Owner
Total /Sec
/
Sec
Pct
/Sec
BHT
--
62371320 207904
0 0.0
3
CPQ
--
197126 657
0 0.0
0
LRU 830
40395944 134653
0 0.0
1402
LRU --
36 0
0 0.0
0
BUF -- 32676880 108922 0 0.0 0
BUF -- 39818994 132729 0 0.0 529Slide21
Latch Counts – LRU Chain‘Owner’ column: if the User# doesn’t change (in value or frequency) that can be a problem indicator because Latches should be held for only a fraction of a second
01/31/13 Activity: Latch Counts00:05:38 01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks ----- ------ Busy ------ Naps
Owner
Total /Sec
/
Sec
Pct
/Sec
BHT
--
62371320 207904
0 0.0
3
CPQ
--
197126 657
0 0.0
0
LRU 830
40395944 134653
0 0.0
1402
LRU --
36 0
0 0.0
0
BUF -- 32676880 108922 0 0.0 0
BUF -- 39818994 132729 0 0.0 529Slide22
Using Latch Counts to set -spinShort answer – Forget It!If it was that easy Progress would have done it alreadyPast attempts have not been successfulAlso the optimal value of –spin is not going to be the same for each Latch
General guidelines:Greater than 1000Less than 100000Current Default: 6000 * (# of CPU Cores)Default not advised if you have more than 16 CoresDan’s (Patent Pending) Formula: (DBA-Birthday-Year * )Gus’s formula: 5000Slide23
Useful Screens – Buffer Lock QueueNHM (Not Here Menu) - Option #15The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs toThe Buffer Lock Queue (BLQ) Screen shows the Area as well as the Block TypeExamples on the next two slidesSlide24
R&D Blocked ClientsThe R&D Blocked Clients screen doesn’t show enough information to identify the Object involved in this contention storm for DBKEY 65987456There were 29 Clients all blocked on the same DBKEY
01/31/13 Status: Blocked Clients00:26:41 Usr
Name Type Wait
Wait
Info Trans id Login
time
730
_AUTO-B SELF/ABL BKSH
65987456 601383708 01/30/13 23:22
735
_AUTO-B SELF/ABL BKSH
65987456 601383773 01/30/13 23:23
743
_AUTO-B SELF/ABL BKSH
65987456 601383921 01/30/13 23:22
747 _AUTO-B SELF/ABL BKSH
65987456 601384104 01/30/13 23:22
749 _AUTO-B SELF/ABL BKSH
65987456 601383895 01/30/13 23:23
755 _AUTO-B SELF/ABL BKSH
65987456 601384175 01/30/13 23:23
769 _AUTO-B SELF/ABL BKSH
65987456 601384161 01/30/13
23:22Slide25
Buffer Lock QueueIF there is a matching DBKEY on the the BLQ screen, we can get the Area# and the Block Type (I = Index)There were 29 processes on the Blocked Clients screen with this DBKEY and only 4 on the BLQ screen with the same DBKEY
01/31/13 Status: Buffer Lock Queue00:26:41 User DBKEY Area
T
Status
Type
Usect
-
1
65987456
34 I
LOCKED
SHARE
4
722
65987456
34 I
LOCKED
SHARE
4
772
65987456
34 I
LOCKED
SHARE
4
856
65987456
34
I
LOCKED
SHARE
4
859
65987456
34 I
LOCKED
SHARE
4
<lines unrelated to DBKEY 65987456 snipped>Slide26
Thank You!Questions?
Gus: gus@progress.comDan: dan@prodb.com