Caches II - CSE 351 Summer 2020

Presentation Transcript

1. Caches II
CSE 351 Summer 2020
Instructor: Porter Jones
Teaching Assistants: Amy Xu, Callum Walker, Sam Wolfson, Tim Mandzyuk

2. Administrivia
- Questions doc: https://tinyurl.com/CSE351-7-29
- hw15 due Friday (7/31) – 10:30am
- No homework due Monday!
- Lab 3 due Friday (7/31) – 11:59pm
  - You get to write some buffer overflow exploits!
- Unit Summary 2 due next Wednesday (8/5) – 11:59pm

3. Memory Hierarchies
Some fundamental and enduring properties of hardware and software systems:
- Faster storage technologies almost always cost more per byte and have lower capacity
- The gaps between memory technology speeds are widening
  - True for: registers ↔ cache, cache ↔ DRAM, DRAM ↔ disk, etc.
- Well-written programs tend to exhibit good locality
These properties complement each other beautifully. They suggest an approach for organizing memory and storage systems known as a memory hierarchy:
- For each level k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1

4. An Example Memory Hierarchy
[Pyramid diagram: smaller, faster, costlier per byte at the top; larger, slower, cheaper per byte at the bottom]
- Registers: CPU registers hold words retrieved from L1 cache
- On-chip L1 cache (SRAM): holds cache lines retrieved from L2 cache
- Off-chip L2 cache (SRAM): holds cache lines retrieved from main memory
- Main memory (DRAM): holds disk blocks retrieved from local disks
- Local secondary storage (local disks): local disks hold files retrieved from disks on remote network servers
- Remote secondary storage (distributed file systems, web servers)

5. An Example Memory Hierarchy
[Same pyramid diagram: registers, on-chip L1 cache (SRAM), off-chip L2 cache (SRAM), main memory (DRAM), local secondary storage (local disks), remote secondary storage (distributed file systems, web servers); smaller, faster, costlier per byte at the top, larger, slower, cheaper per byte at the bottom]
- Registers: explicitly program-controlled (e.g. refer to exactly %rax, %rbx)
- Everything below registers: program sees "memory"; hardware manages caching transparently

6. An Example Memory Hierarchy
[Same pyramid diagram, annotated with typical access latencies; the slide also rescales each latency (L1 through remote storage, in order) to a human time scale for intuition]
- Registers: <1 ns
- On-chip L1 cache (SRAM): 1 ns (human scale: 5-10 s)
- Off-chip L2 cache (SRAM): 5-10 ns (1-2 min)
- Main memory (DRAM): 100 ns (15-30 min)
- Local secondary storage: SSD 150,000 ns (31 days); Disk 10,000,000 ns = 10 ms (66 months = 5.5 years)
- Remote secondary storage: 1-150 ms (1-15 years)

7. Intel Core i7 Cache Hierarchy
[Diagram: each core (Core 0 … Core 3) in the processor package has its own registers, L1 d-cache, L1 i-cache, and L2 unified cache; all cores share an L3 unified cache, which connects to main memory]
- Block size: 64 bytes for all caches
- L1 i-cache and d-cache: 32 KiB, 8-way, Access: 4 cycles
- L2 unified cache: 256 KiB, 8-way, Access: 11 cycles
- L3 unified cache (shared by all cores): 8 MiB, 16-way, Access: 30-40 cycles

8. Making memory accesses fast!
- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

9. Cache Organization (1)
Block Size (K): unit of transfer between the cache and Mem
- Given in bytes and always a power of 2 (e.g. 64 B)
- Blocks consist of adjacent bytes (differ in address by 1)
  - Spatial locality!
Note: The textbook uses "B" for block size

10. Cache Organization (1)
Block Size (K): unit of transfer between the cache and Mem
- Given in bytes and always a power of 2 (e.g. 64 B)
- Blocks consist of adjacent bytes (differ in address by 1)
  - Spatial locality!
Offset field
- Low-order k bits of address tell you which byte within a block
  - (address) mod 2^k = lowest k bits of address
  - i.e. (address) modulo (# of bytes in a block)
m-bit address (refers to byte in memory): [ Block Number: m-k bits | Block Offset: k bits ]
Note: The textbook uses "b" for offset bits
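As a side illustration (not part of the slides), here is a minimal C sketch of the block-number/offset split, assuming a 64 B block size (so k = 6) and an arbitrary example address:

    #include <stdio.h>

    // Illustrative only: split an address into block number and block offset.
    // Block size K = 2^k bytes; here K = 64 B, so k = 6.
    enum { K_OFFSET_BITS = 6 };

    int main(void) {
        unsigned addr = 0x7f3a;                                  // arbitrary example address
        unsigned offset = addr & ((1u << K_OFFSET_BITS) - 1);    // lowest k bits: byte within block
        unsigned block_number = addr >> K_OFFSET_BITS;           // remaining m-k bits
        printf("addr 0x%x -> block number 0x%x, offset %u\n", addr, block_number, offset);
        return 0;
    }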

11. Polling Question [Cache II-a]
If we have 6-bit addresses and block size = 4 B, which block and byte does 0x15 refer to?
Vote at: http://pollev.com/pbjones
  Block Num = 1, Block Offset = 1
  Block Num = 1, Block Offset = 5
  Block Num = 5, Block Offset = 1
  Block Num = 5, Block Offset = 5
  We're lost…

12. Cache Organization (2)
Cache Size (C): amount of data the cache can store
- Cache can only hold so much data (subset of next level)
- Given in bytes (C) or number of blocks (C/K)
- Example: C = 32 KiB = 512 blocks if using 64-B blocks
Where should data go in the cache?
- We need a mapping from memory addresses to specific locations in the cache to make checking the cache for an address fast
What is a data structure that provides fast lookup?
- Hash table!
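Spelling out the arithmetic in that example: C = 32 KiB = 2^15 bytes and K = 64 B = 2^6 bytes per block, so the cache holds C/K = 2^15 / 2^6 = 2^9 = 512 blocks.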

13. Review: Hash Tables for Fast Lookup
Insert: 5, 27, 34, 102, 119
Apply hash function to map data to "buckets" (here, buckets 0-9)
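Assuming the usual mod-10 hash implied by the ten buckets (the slide only shows the picture): 5 goes to bucket 5, 27 to bucket 7, 34 to bucket 4, 102 to bucket 2, and 119 to bucket 9; each value lands in the bucket given by (value) mod 10.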

14. Place Data in Cache by Hashing Address
Map to cache index from block number
- Use next log2(C/K) bits in the address (after the offset bits)
  - C/K is the number of sets here
- (block number) mod (# blocks in cache)
[Diagram: the 16 memory blocks (block numbers 0000-1111) map to the 4 cache indices (00-11)]
Here K = 4 B and C/K = 4

15. Place Data in Cache by Hashing Address
Map to cache index from block number
- Lets adjacent blocks fit in cache simultaneously!
  - Consecutive blocks go in consecutive cache indices
[Diagram: the 16 memory blocks (block numbers 0000-1111) map to the 4 cache indices (00-11)]
Here K = 4 B and C/K = 4
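To make the hash concrete, here is a small C sketch (mine, not the slides') using the toy parameters above, K = 4 B and C/K = 4 blocks; it shows consecutive blocks landing in consecutive cache indices and wrapping around:

    #include <stdio.h>

    // Illustrative only: direct-mapped index hash with K = 4 B blocks
    // and C/K = 4 blocks in the cache.
    enum { K = 4, NUM_BLOCKS = 4 };

    int main(void) {
        for (unsigned addr = 0; addr < 64; addr += K) {
            unsigned block_number = addr / K;            // drop the k offset bits
            unsigned index = block_number % NUM_BLOCKS;  // (block number) mod (# blocks in cache)
            printf("addr 0x%02x -> block %2u -> cache index %u\n", addr, block_number, index);
        }
        return 0;
    }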

16. Practice Question
6-bit addresses, block size K = 4 B, and our cache holds C/K = 4 blocks.
A request for address 0x2A results in a cache miss. Which set index does this block get loaded into and which 3 other addresses are loaded along with it?
No voting for this question

17. Place Data in Cache by Hashing Address
Collision!
- This might confuse the cache later when we access the data
- Solution?
[Diagram: two different memory blocks hash to the same cache index]
Here K = 4 B and C/K = 4

18. Tags Differentiate Blocks in Same Index
Tag = rest of address bits
- t bits = m - s - k
- Check this during a cache lookup
[Diagram: each cache index now stores a tag alongside its block data, identifying which memory block is currently cached there]
Here K = 4 B and C/K = 4

19. Checking for a Requested Address
CPU sends address request for chunk of data
- Address and requested data are not the same thing!
  - Analogy: your friend ≠ their phone number
TIO address breakdown:
m-bit address: [ Tag (t) | Index (s) | Offset (k) ]   (Tag + Index = Block Number)
- Index field tells you where to look in cache
- Tag field lets you check that data is the block you want
- Offset field selects specified start byte within block
Note: t and s sizes will change based on hash function
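A hedged C sketch (not from the slides) of the TIO split; tio_split, its field names, and the example address and field widths in main are made up for illustration:

    #include <stdio.h>

    // Illustrative only: split an address into Tag / Index / Offset fields,
    // given k offset bits and s index bits.
    struct tio { unsigned tag, index, offset; };

    static struct tio tio_split(unsigned addr, unsigned k, unsigned s) {
        struct tio f;
        f.offset = addr & ((1u << k) - 1);         // lowest k bits: byte within block
        f.index  = (addr >> k) & ((1u << s) - 1);  // next s bits: which set to look in
        f.tag    = addr >> (k + s);                // remaining t = m - s - k bits
        return f;
    }

    int main(void) {
        struct tio f = tio_split(0x74, 2, 2);      // arbitrary address and field widths
        printf("tag=0x%x index=%u offset=%u\n", f.tag, f.index, f.offset);
        return 0;
    }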

20. Checking for a Requested Address Example
Using 8-bit addresses. Cache Params: block size (K) = 4 B, cache size (C) = 32 B (which means number of sets is C/K = 8 sets).
- Offset bits (k) = log2(K) =
- Index bits (s) = log2(C/K) =
- Tag bits (t) = rest of the bits in the address = m - s - k =
What are the fields for address 0xBA?
- Tag bits (unique id for block):
- Index bits (cache set block maps to):
- Offset bits (byte offset within block):
m-bit address: [ Tag (t) | Index (s) | Offset (k) ]   (Tag + Index = Block Number)
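Worked answers (my computation; the slide leaves these blank to be filled in during lecture): k = log2(4) = 2, s = log2(32/4) = log2(8) = 3, t = 8 - 3 - 2 = 3. Address 0xBA = 0b 1011 1010, so tag = 0b101 = 5, index = 0b110 = 6, offset = 0b10 = 2.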

21. Cache Puzzle [Cache II-b]
Based on the following behavior, which of the following block sizes is NOT possible for our cache?
- Cache starts empty, also known as a cold cache
- Access (addr: hit/miss) stream: (14: miss), (15: hit), (16: miss)
  4 bytes
  8 bytes
  16 bytes
  32 bytes
  We're lost…
Vote at http://pollev.com/pbjones

22. Direct-Mapped Cache Problem
What happens if we access the following addresses?
- 8, 24, 8, 24, 8, …?
- Conflict in cache (misses!)
- Rest of cache goes unused
Solution?
[Diagram: memory block numbers 00 00 through 11 11, written as (tag bits | index bits), mapping into a 4-entry cache that stores a tag and block data per index; most entries are marked "??" because they go unused]
Here K = 4 B and C/K = 4
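Tracing the conflict with the toy parameters K = 4 B and C/K = 4 (my arithmetic, not on the slide): address 8 is block number 8/4 = 2, which maps to index 2 mod 4 = 2; address 24 is block number 24/4 = 6, which also maps to index 6 mod 4 = 2. The two blocks carry different tags, so each access evicts the other and every access misses, while indices 0, 1, and 3 never get used.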

23. Associativity
What if we could store data in any place in the cache?
- More complicated hardware = more power consumed, slower
So we combine the two ideas:
- Each address maps to exactly one set
- Each set can store a block in more than one way
[Diagram: the same 8 blocks organized four ways]
- 1-way: 8 sets, 1 block each (direct-mapped)
- 2-way: 4 sets, 2 blocks each
- 4-way: 2 sets, 4 blocks each
- 8-way: 1 set, 8 blocks (fully associative)
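One way to picture the sets-and-ways organization in C (a hypothetical layout with made-up names, not how any particular hardware stores it), sized here as the 2-way version of the 8-block example:

    #include <stdint.h>
    #include <stdbool.h>

    // Illustrative only: a cache is an array of sets, and each set holds E "ways" (lines).
    enum { K = 4,   // block size in bytes
           E = 2,   // associativity: blocks (ways) per set
           S = 4 }; // number of sets (8 blocks total / E ways)

    typedef struct {
        bool     valid;      // does this line hold real data?
        uint64_t tag;        // identifies which memory block is stored here
        uint8_t  data[K];    // the cached block itself
    } cache_line_t;

    typedef struct {
        cache_line_t ways[E];  // a matching block may sit in any of the E ways
    } cache_set_t;

    typedef struct {
        cache_set_t sets[S];   // an address maps to exactly one of these sets
    } cache_t;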

24. Cache Organization (3)
Associativity (E): # of ways for each set
- Such a cache is called an "E-way set associative cache"
- We now index into cache sets, of which there are C/(K*E)
  - Use lowest log2(C/(K*E)) = s bits of block address
- Direct-mapped: E = 1, so s = log2(C/K) as we saw previously
- Fully associative: E = C/K, so s = 0 bits
[Diagram: associativity increases from direct-mapped (only one way) to fully associative (only one set)]
m-bit address: [ Tag (t) | Index (s) | Offset (k) ]
- Index: selects the set
- Tag: used for tag comparison
- Offset: selects the byte from block
Note: The textbook uses "b" for offset bits
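A small C sketch (mine, not the slides') that tabulates how the index width s shrinks and the tag width t grows as E increases, for an assumed C = 32 B cache with K = 4 B blocks and m = 8-bit addresses:

    #include <stdio.h>

    // Illustrative only: index bits (s) and tag bits (t) versus associativity (E).
    static unsigned log2u(unsigned x) {        // x is assumed to be a power of 2
        unsigned n = 0;
        while ((1u << n) < x) n++;
        return n;
    }

    int main(void) {
        const unsigned C = 32, K = 4, m = 8;
        const unsigned k = log2u(K);           // offset bits
        for (unsigned E = 1; E <= C / K; E *= 2) {
            unsigned S = C / (K * E);          // number of sets
            unsigned s = log2u(S);             // index bits
            unsigned t = m - s - k;            // tag bits
            printf("E=%u: S=%u sets, s=%u index bits, t=%u tag bits\n", E, S, s, t);
        }
        return 0;
    }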

25. Example Placement
Where would data from address 0x1833 be placed?
- Binary: 0b 0001 1000 0011 0011
Cache params: block size = 16 B, capacity = 8 blocks, address = 16 bits
m-bit address: [ Tag (t) | Index (s) | Offset (k) ], where k = log2(block size), s = log2(# of sets), t = m - s - k
[Diagram: Set/Tag/Data tables for a direct-mapped cache (sets 0-7), a 2-way set associative cache (sets 0-3), and a 4-way set associative cache (sets 0-1)]
s = ? for each configuration
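Working the placement out (my computation; the slide leaves the fields blank): k = log2(16) = 4, so the offset is the low 4 bits (0b0011 = 0x3) and the block number is 0x183. Direct-mapped: 8 sets, s = 3, index = 0b011 = 3. 2-way set associative: 4 sets, s = 2, index = 0b11 = 3. 4-way set associative: 2 sets, s = 1, index = 0b1 = 1.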

26. Direct-Mapped Cache
Hash function: (block number) mod (# of blocks in cache)
- Each memory address maps to exactly one index in the cache
- Fast (and simpler) to find a block
[Diagram: memory block numbers 00 00 through 11 11, written as (tag bits | index bits), mapped into a cache with an index, tag, and block data per entry]
Here K = 4 B and C/K = 4

27. Direct-Mapped Cache Problem
What happens if we access the following addresses?
- 8, 24, 8, 24, 8, …?
- Conflict in cache (misses!)
- Rest of cache goes unused
Solution?
[Diagram: as on slide 22, both blocks map to the same cache index, so they repeatedly evict each other while most of the cache sits unused]
Here K = 4 B and C/K = 4

28. Notes Diagrams
[Memory hierarchy pyramid: registers, on-chip L1 cache (SRAM), off-chip L2 cache (SRAM), main memory (DRAM), local secondary storage (local disks), remote secondary storage (distributed file systems, web servers); smaller, faster, costlier per byte at the top, larger, slower, cheaper per byte at the bottom]
m-bit address (refers to a byte in memory): [ Block Number: m-k bits | Block Offset: k bits ]
m-bit address (refers to a byte in memory): [ Tag: t bits | Index: s bits | Offset: k bits ]
- Index: selects the index (set)
- Tag: used for tag comparison
- Offset: selects the byte from block