How To Calculate Page Size Virtual Memory
The memory management strategies for multiprogramming systems described in the previous lesson were based on two assumptions:

- A process had to be loaded entirely into memory before it could run.
- The memory allocated to a process had to be contiguous.

These seemed like obvious assumptions when multiprogramming was first introduced. However, as programs became larger and as systems started to load larger numbers of processes at the same time, these primitive memory management strategies began to break down.
In the 1970s, most computer systems began to switch to a memory management scheme called virtual memory, in which both of these assumptions were violated. Virtual memory is based on the observation that at any given time, a process is only using a small fraction of its code and data. A process which may consist of tens of thousands of lines of code might only be running in a few functions. In fact, a substantial fraction of most real world code consists of error handling routines which are seldom used.

Recall the principles of spatial and temporal locality discussed last week. These two principles also apply to virtual memory.

The idea behind virtual memory is that physical memory is divided into fixed size pages. Pages are typically 512 to 8192 bytes, with 4096 being a typical value. Page size is almost always a power of two, for reasons to be explained below. Loadable modules are also divided into a number of page frames. Page frames are always the same size as the pages in memory.

Page frames are loaded into memory only when they are needed. Adjacent page frames are not necessarily loaded into adjacent pages in memory. At any point in time during the execution of a program, only a small fraction of the total pages of a process may be loaded.

Note that there are two kinds of addresses (or two address spaces): virtual addresses and real addresses. Virtual addresses, also called logical addresses, are program generated. In a system without virtual memory, a physical address and a logical address are the same; the virtual address is placed on the address bus directly because it refers to a physical address in main memory. On systems with virtual memory, there is a mapping between virtual addresses and real addresses. In the figure above, suppose that the code for the next instruction to be executed is in page frame 3. The running program only knows the virtual address. Rather than putting the virtual address directly on the address bus, the system sends this address to the Memory Management Unit or MMU. The MMU provides the mapping between virtual addresses and physical addresses. In the case above, this mapping would convert a virtual address in page frame 3 to a real address in page 10. The data structure which contains this mapping is called a page table.
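To make the figure's mapping concrete, here is a minimal C sketch of the lookup the MMU performs. It is purely illustrative: a real MMU does this in hardware, and the table layout, sizes, and names below are assumptions, using the example mapping of page frame 3 to page 10.

```c
#include <stdint.h>

#define OFFSET_BITS 12                  /* assumed 4 KB pages        */
#define PAGE_SIZE   (1u << OFFSET_BITS)
#define NUM_FRAMES  16                  /* assumed toy address space */

/* Hypothetical page table entry: is this page frame loaded, and if
   so, into which physical page? */
struct pte { int present; uint32_t page; };

static struct pte page_table[NUM_FRAMES] = {
    [3] = { 1, 10 },                    /* page frame 3 -> page 10   */
};

/* Mimic the MMU: map a virtual address to a physical one, or report
   that the reference must fault because the frame is not loaded. */
long translate(uint32_t vaddr)
{
    uint32_t frame  = (vaddr >> OFFSET_BITS) & (NUM_FRAMES - 1);
    uint32_t offset = vaddr & (PAGE_SIZE - 1);

    if (!page_table[frame].present)
        return -1;                      /* would raise a page fault  */
    return ((long)page_table[frame].page << OFFSET_BITS) | offset;
}
```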
It is possible that a process might access a page which is not in memory. For instance, suppose that the process tries to access data in page frame 8. The MMU will determine that this page is not loaded, and this will generate a page fault. A page fault is an interrupt. The handler for a page fault locates the needed page frame on the disk, copies it to a page in memory, and updates the page table, telling it where the page frame is loaded. Control then returns to the process.
When a program first starts running, it will generate a number of page faults. Suppose that the first executable instruction is

MOV A704,AX ; copy value at address hex A704 to register AX

This one instruction will generate two page faults. The instruction itself will be at an address in the code segment, so the page frame which contains this instruction will have to be loaded into memory. Then the page in the data segment or stack which contains virtual address A704 will have to be loaded into another frame. However, if the program conforms to the principles of temporal and spatial locality, subsequent instructions are likely to be in the same page frame as this instruction, and subsequent accesses to data are likely to be in the same page frame of data.
Structure of the page table
A row of the page table typically has the following fields; a hypothetical C rendering follows the list.
- Page frame number (virtual address)
- Page number (physical or real address)
- Present/absent bit (is this page frame currently loaded?)
- Protection bit (read/write vs. read only)
- Dirty bit (has the data been modified?)
- Referenced bit (has this page been recently referenced?)
- Caching disabled bit (Can this data be cached? Computers which support memory mapped devices should not cache certain data.)
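As an illustration, such a row might be rendered in C with bitfields, as below. The field widths are assumptions; real hardware packs page table entries differently on every architecture.

```c
/* One row of a page table, per the field list above.  The widths
   are illustrative; real layouts vary by architecture. */
struct page_table_entry {
    unsigned page           : 20; /* physical page number            */
    unsigned present        : 1;  /* is this page frame loaded?      */
    unsigned read_only      : 1;  /* protection: read/write vs. read */
    unsigned dirty          : 1;  /* has the data been modified?     */
    unsigned referenced     : 1;  /* recently referenced?            */
    unsigned cache_disabled : 1;  /* e.g. for memory-mapped devices  */
};
```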
The MMU
When virtual memory was first proposed, many people thought that it would never work because it would be too difficult to implement paging rapidly enough. The page table provides a mapping between the logical (virtual) address and the real address. This mapping has to be very fast, and the page tables can be very large.

Obviously this mapping has to be implemented in hardware. The section of the CPU that provides this service is called the Memory Management Unit (MMU). Note that every memory reference has to go through the MMU.
Let's start with a toy implementation to demonstrate some of the issues that the MMU has to deal with. Suppose a system has a 16-bit virtual address space (the program addresses range from 0 .. 64K), but it only has 32K of memory, divided into eight pages of size 4K. A 16-bit virtual address can be divided into a 4-bit page reference and a 12-bit offset. Thus, each page and page frame is 4,096 bytes (2^12). The page table consists of 16 entries, one for each possible value of the 4-bit page field. The offset is just the address within that page. For example, the virtual address 0x5064

(0101 0000 0110 0100 in binary)

is divided into two parts. The first four bits refer to the page (0x5), and the final 12 bits refer to the offset within that page (0x064), which is 100 in decimal.

The page table has an entry for each logical page, 16 in all, and the 4-bit page field is an index into the page table.
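Because the page size is a power of two, this split is simple bit arithmetic, which is the reason promised above. A short sketch checking the example address:

```c
#include <stdio.h>

int main(void)
{
    unsigned vaddr  = 0x5064;          /* the example virtual address */
    unsigned page   = vaddr >> 12;     /* top 4 bits: page field      */
    unsigned offset = vaddr & 0x0FFF;  /* low 12 bits: offset in page */

    printf("page 0x%X, offset 0x%03X (%u decimal)\n", page, offset, offset);
    /* prints: page 0x5, offset 0x064 (100 decimal) */
    return 0;
}
```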
If the entire page table is loaded into registers, this is trivial to implement in hardware, and performance would be very fast, but it has some serious problems. The obvious one is that most modern systems allow processes to have very large address spaces. An address space of 32 bits per process is typical now. If the offset field is 12 bits (4,096-byte pages, also a typical figure), then the page field has to be 20 bits, so the page table has to have 2^20 (1,048,576) entries. This is clearly far too large to load into registers.
Also, keep in mind that each process would have its own page table, so every time a context switch occurred, the page table for the new process would have to be loaded into the registers, and this would make context switching very slow, far too slow for a typical system which may make a hundred or more context switches per second.

The alternative is to keep the page table in main memory, but this is inefficient because each memory reference now involves two memory references: one to fetch the appropriate page table entry and one to fetch the actual data. That is not a viable solution.
One solution is to have a multi-level page table. In the above example, a logical address was divided into two fields, a page field and an offset field. With a multi-level page table, the logical address is divided into three fields: one for the first-level page table, one for the second-level page table, and one for the offset of the address within the page.

Here is an example. A system with a 32-bit logical address field divides such an address into a 10-bit field for the first-level page table, a 10-bit field for the second-level page table, and a 12-bit field for the offset, creating a page size of 2^12 or 4,096 bytes.

There are potentially 2^20 entries in the page table, far too many to keep in memory at one time. However, keep in mind that although each process has a potential address space of 2^32, most processes only use a tiny fraction of this. In particular, when a process is loaded, there is a large chunk of unused memory in the middle of the address space for the stack and the heap to grow into.

Recall that in the page table design described above, each entry in the page table contained a pointer to real memory. In a two-level page table, each entry in the first-level page table contains a pointer to a second-level page table, and the second-level page table provides the address of the page in real memory. There are 2^10 or 1,024 entries in the first-level page table, each pointing to a second-level page table. The first-level page table is always in memory, but only a handful of second-level page tables are usually needed.

If the 32-bit virtual address is
00000101 11100110 11100100 01100100 or 0x05E6E464
to determine the physical address, this logical address is sent to the MMU, which extracts the first ten bits, 0000010111, to use as an index into the first-level page table. The first-level page table provides a pointer to one of 1,024 second-level page tables (in fact, most of these will not exist). This table is hopefully in memory. The MMU then extracts the second ten bits, 1001101110, and uses this as an index into the second-level page table, which provides a pointer to the actual page in real memory. The last 12 bits, 010001100100, provide the offset in the page for the actual byte which is requested.
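Here is a sketch of that two-level walk in C. The structure names and layout are invented for illustration; the real walk is performed by the MMU hardware.

```c
#include <stdint.h>
#include <stddef.h>

#define L1_ENTRIES 1024                       /* 10-bit first-level index */

typedef struct { int present; uint32_t page; } l2_entry;

/* Hypothetical first level: 1,024 pointers to second-level tables,
   most of them NULL because the address space is sparse. */
static l2_entry *first_level[L1_ENTRIES];

/* Walk the two levels as the MMU would for a 32-bit address.
   For 0x05E6E464: i1 = 23, i2 = 622, offset = 0x464. */
long walk(uint32_t vaddr)
{
    uint32_t i1     = vaddr >> 22;            /* top 10 bits    */
    uint32_t i2     = (vaddr >> 12) & 0x3FF;  /* middle 10 bits */
    uint32_t offset = vaddr & 0xFFF;          /* low 12 bits    */

    l2_entry *second = first_level[i1];
    if (second == NULL || !second[i2].present)
        return -1;                            /* page fault     */
    return ((long)second[i2].page << 12) | offset;
}
```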
This model is still too slow to be effective. Even if both of the required page tables are in memory, one logical memory access would now require three memory accesses, because the MMU would have to read the first page table and the second page table just to generate the physical address.

The solution is the Translation Lookaside Buffer or TLB. TLBs are based on the principles of temporal and spatial locality which most programs conform to. Most processes, no matter how large they are, are usually only executing steps in a small section of the text segment and accessing only a small fraction of their data during any given short time period. This means that they are accessing only a small number of pages. The idea behind a TLB is that these pages are cached.

The TLB is inside the MMU, and contains pointers to a small number of frequently used pages. Whenever the running program needs to access memory, it first checks to see if the pointer to the page containing that address is stored in the TLB. Most of the time it is, so the MMU can immediately put the real address on the address bus without having to read any other memory.

Suppose the TLB has slots for 32 pages. Even with this small number, it cannot do a linear search to determine if the page is in the TLB. It needs to search in parallel, and so this has to be implemented in hardware.

Entries in the TLB look very similar to entries in the actual page table; each record contains the virtual page number, the physical page number, a valid flag, a dirty bit, and protection information.
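In software we can only model the TLB's parallel search with a loop; real hardware compares all slots simultaneously. A hypothetical sketch of the record layout and the lookup:

```c
#include <stdint.h>

#define TLB_SLOTS 32

/* Modeled on the fields listed above; names are illustrative. */
struct tlb_entry {
    uint32_t virtual_page;
    uint32_t physical_page;
    uint8_t  valid, dirty, read_only;
};

static struct tlb_entry tlb[TLB_SLOTS];

/* Return the physical page for a virtual page, or -1 on a TLB miss,
   in which case the MMU falls back to the page table. */
long tlb_lookup(uint32_t virtual_page)
{
    for (int i = 0; i < TLB_SLOTS; i++)       /* parallel in hardware */
        if (tlb[i].valid && tlb[i].virtual_page == virtual_page)
            return tlb[i].physical_page;
    return -1;
}
```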
The relationship between the TLB, the MMU, and the cache is fairly complicated. Here is a typical arrangement.

The CPU generates a virtual address. First, the TLB is searched to see if the translation to a physical address is cached. If there is a hit, we know the physical address. The next step is to search the cache to see if the value is in the cache. If there is a hit, this value is returned to the CPU. In most cases, both of these queries are hits, so the value can be returned quickly without accessing main memory.

If the TLB does not have a pointer to the page that contains the virtual address, the next step is to search the actual page table. If there is a hit, then we can generate the physical address, and we then search the cache to determine if this value is in the cache. We also add this page to the TLB.

If the page is not in the page table, then a page fault is generated. In this case, the page will be read in from the disk, the page table will be updated, and the TLB will be updated.
Inverted Page Tables
Most page tables are relatively sparse; they contain space for perhaps a million pages, but only a tiny fraction of these pages are actually used by the typical process. Thus, a traditional page table is somewhat wasteful. Also, as 64-bit computers become more common, traditional page tables will become inconceivably huge. The solution is inverted page tables.

In a regular page table, there is an entry for each virtual page; in an inverted page table, there is an entry for each physical page, so the page table is much smaller. This would not seem to make much sense, because to access a particular virtual address it would be necessary to search each entry in the inverted page table to see if it is there. Of course, the system first checks the TLB, and if there is a hit, there is no need to access the page table at all.

An inverted page table uses associative or content addressable memory. Each cell in an associative memory contains a key field and a data field. To search an inverted page table, the system uses a hash function: the virtual page number is passed through the function to generate an index to the appropriate row of the inverted page table. Recall from your data structures course that a hash function allows very rapid search of large data sets.
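A rough C model of such a lookup, assuming a chained hash table keyed by process id and virtual page number; the names and the hash function are illustrative, not a real implementation:

```c
#include <stdint.h>

#define TABLE_SIZE 8192              /* one bucket per physical page */

/* Key: (process id, virtual page).  Data: the physical page. */
struct ipt_entry {
    uint32_t pid, virtual_page;
    uint32_t physical_page;
    struct ipt_entry *next;          /* chain for hash collisions    */
};

static struct ipt_entry *buckets[TABLE_SIZE];

static unsigned hash(uint32_t pid, uint32_t vpage)
{
    return (pid * 31u + vpage) % TABLE_SIZE;  /* toy hash function   */
}

/* Look up a virtual page; -1 means it is not in physical memory. */
long ipt_lookup(uint32_t pid, uint32_t vpage)
{
    for (struct ipt_entry *e = buckets[hash(pid, vpage)]; e; e = e->next)
        if (e->pid == pid && e->virtual_page == vpage)
            return e->physical_page;
    return -1;
}
```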
Page Replacement Algorithms
When a page fault occurs, the system has to copy a new page frame from disk into a page of memory. This may involve replacing a page which is already loaded. How does the system decide which page to replace? Note that if the page which is being replaced has been modified (the dirty bit is on), it has to be copied back to disk. It is important to have an efficient page replacement algorithm, because if the page which is replaced is accessed again soon, the result will be another page fault.

Page faults are time consuming. If the system is generating too many page faults, it is doing little real work. This situation is called thrashing.

Let's assume that we have perfect knowledge of the page reference stream into the future. The forward distance of a particular page at a particular time is the next place in the stream where that page is referenced again. The forward distance would be 1 if the very next reference was to that page, and infinity if that page will never be referenced again. The optimal page replacement algorithm would be to replace the page which has the greatest forward distance, in other words, to replace the page which will not be used for the longest time.

Unfortunately, the system has no way of knowing this. However, it is possible to make a reasonable guess about which pages may not be referenced again soon.

One simple and obvious strategy is the not-recently-used (NRU) algorithm. This algorithm says to replace a page in memory which has not been accessed recently. Because of the principle of temporal locality, we can assume that a page which has not been referenced recently is unlikely to be referenced soon.

The ideal implementation of the NRU algorithm would be to replace the page which has been least recently used (LRU), i.e. the page which has not been accessed for the longest time. However, it is difficult to implement a pure LRU algorithm, because this would require keeping track of how long ago each page was referenced, and the overhead associated with this is far too high to be practical. An approximation of LRU will be discussed below.
It is simple to implement a Not Recently Used algorithm. A row in a page table usually has a flag called the referenced bit. Most systems have a clock which periodically generates an interrupt about once every 20 msec. On each clock interrupt (or similar schedule), the referenced bits of all of the pages in the page table are set to zero. As each page is referenced, its referenced bit is turned on. When a page fault occurs and an active page needs to be replaced, the system looks for a page where the referenced bit is off, and replaces that page.

This is far from perfect. If a page fault occurs shortly after a clock interrupt, essentially all of the pages will be marked unreferenced, so the system will have little information about which page to overwrite.

This system is further complicated by the fact that if the dirty bit is on, the page has to be copied to disk. In this case, a page fault results in both a read-from-disk operation and a write-to-disk operation. Thus, a variant of this NRU strategy is to look first for a page where both the referenced bit and the dirty bit are off, and to replace a page with the dirty bit on only if there are no pages with both bits off. Of course this requires additional overhead, because the entire page table has to be searched. A sketch of this variant appears below.
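The following sketch implements the variant just described: prefer a page with both bits off, and fall back to an unreferenced but dirty page. The structure and names are assumptions for illustration.

```c
#include <stddef.h>

struct page_info { int referenced, dirty; /* plus frame number, etc. */ };

/* Return the index of the page to replace, or -1 if every page
   has been referenced since the last clock interrupt. */
int nru_victim(struct page_info *pages, size_t n)
{
    int fallback = -1;
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].referenced) {
            if (!pages[i].dirty)
                return (int)i;     /* best case: clean and unreferenced */
            if (fallback < 0)
                fallback = (int)i; /* dirty page: needs a disk write    */
        }
    }
    return fallback;
}
```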
Another low overhead strategy is the First In First Out (FIFO) page replacement algorithm. In this case, the system keeps track of the load time for each page, and it replaces the page with the longest time in memory. This can be easily implemented by putting each new page at the tail of a list in memory, and removing pages from the head.

The FIFO algorithm is included here for completeness, but it is not used in practice. There is no particular reason to assume that the page which was loaded longest ago is less likely to be accessed soon than a page which was just loaded.

We can improve on the NRU algorithm. The text describes a second chance algorithm and a clock algorithm, but the latter is simply an implementation of the former. The clock algorithm keeps all of the current pages in a circular list. There is a pointer which moves around this list like the hand of a clock. When a page fault occurs, the system examines the page that the hand is currently pointing to. If the referenced bit is off, this page is replaced in memory (copying the contents back to disk if the dirty bit is on). If the referenced bit is on for that page, it is set to off and the hand continues around the list. This continues until it finds a page where the referenced bit is off. The algorithm is guaranteed to find such a page, because the list is circular, and the hand will eventually get back to the page where it started; by then this page will have the referenced bit off.
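A compact sketch of the clock sweep, using an array as the circular list; the names are illustrative, and real kernels keep considerably more state per page.

```c
#include <stddef.h>

struct clock_page { int referenced; /* plus frame number, dirty bit... */ };

/* Sweep the circular list of n pages starting at *hand, clearing
   referenced bits as we pass; stop at the first page whose bit is
   already off.  Guaranteed to stop within one full sweep. */
size_t clock_victim(struct clock_page *pages, size_t n, size_t *hand)
{
    for (;;) {
        size_t i = *hand;
        *hand = (*hand + 1) % n;      /* advance the clock hand   */
        if (!pages[i].referenced)
            return i;                 /* replace this page        */
        pages[i].referenced = 0;      /* give it a second chance  */
    }
}
```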
It goes without saying that the referenced bit is turned on whenever a page is referenced.

This algorithm has an advantage over the NRU algorithm: when it finds a page frame with the referenced bit off, it can be certain that this page has not been referenced in one full sweep across all of the pages, whereas with the NRU algorithm, if the system finds a page with the referenced bit off, there is no guarantee regarding how long ago it was referenced, only that it has not been referenced since the last clock interrupt.

The clock algorithm or variants of it are used in many real operating systems.
It was stated above that the Least Recently Used (LRU) algorithm would be very effective, but that it is difficult to implement in practice. The text talks about two ways to implement LRU in hardware.

One method is for the system to maintain a counter which is incremented each time any instruction is accessed. The row of data for each page frame needs to maintain room for this counter. Whenever a particular page is referenced, the counter is incremented and its value is copied to the page frame.

When it is time to replace a page in memory, the system checks this value in each page and replaces the page where the value is the lowest.
Here is an example. Suppose that the value of the counter at the start of the scenario is 1000 (in practice, the counter should be very large; the text suggests 64 bits). There are eight pages in memory, numbered 1 to 8. Here is a string of page references.

6, 7, 3, 8, 8, 1, 4, 2, 5, 6, 1
After executing this string, here are the values of the counter for each of the eight pages (the counter is incremented at each of the eleven references, ending at 1011, and each page records the counter value of its most recent reference):

Page number | Counter value |
---|---|
1 | 1011 |
2 | 1008 |
3 | 1003 |
4 | 1007 |
5 | 1009 |
6 | 1010 |
7 | 1002 |
8 | 1005 |

If a page fault occurred at this point in time, and one of these eight pages had to be replaced, the system would look for the lowest value of the counter, see that it was page 7, and it would replace that page.
This algorithm can only be implemented if there is hardware support for it in the MMU, and few if any hardware developers have implemented such an algorithm.
A close approximation of the LRU algorithm can be implemented entirely in software. The text refers to this as Not Frequently Used (NFU) with aging. It works like this. Each page has a counter associated with it. Recall that operating systems have a clock interrupt which occurs roughly every 20 msec. At each clock interrupt, each counter is right shifted by one bit (this is equivalent to dividing by two; the least significant bit is lost). If the page has been referenced since the last interrupt, the most significant bit of the counter is set to one; otherwise it is set to zero. Then all of the referenced flags are set to zero.
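Before the worked example, here is how the aging step might look in C; the structure is hypothetical, and only the counter update is shown.

```c
#include <stdint.h>
#include <stddef.h>

struct aged_page { uint8_t counter; int referenced; };

/* Run at each clock interrupt: shift every counter right one bit,
   move the referenced flag into the most significant bit, and then
   clear the flag.  Recently used pages keep the largest counters. */
void age_counters(struct aged_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pages[i].counter >>= 1;
        if (pages[i].referenced)
            pages[i].counter |= 0x80;  /* MSB of the 8-bit counter */
        pages[i].referenced = 0;
    }
}
```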
Here is an example. There are six pages in memory, and the counter is 8 bits (eight bits is usually enough). During each time slice (the period of time between clock interrupts), the following pages are referenced. For example, during the first time slice, pages 2 and 4 were referenced; the others were loaded but not referenced.
Time slice | Pages referenced |
---|---|
1 | 2, 4 |
2 | 1, 2, 3, 4 |
3 | 2, 3, 4, 5, 6 |
4 | 1, 2 |
5 | 1, 2, 5, 6 |
6 | 2 |
7 | 1, 2, 6 |
8 | 1, 2, 5 |
After this set of eight time slices, the counters look like this:
Page | Counter |
---|---|
1 | 11011010 (218) |
2 | 11111111 (255) |
3 | 00000110 (6) |
4 | 00000111 (7) |
5 | 10010100 (148) |
6 | 01010100 (84) |
When a page fault occurs, the page with the lowest value of the counter is replaced. If a page fault occurred after the above string, it would replace page 3, because its counter had the lowest value.

Note that any page which was referenced in the last clock cycle will have a higher value than any page not so referenced, because its most significant bit will be one, and the most significant bit of any page not referenced in the last cycle will be zero. Likewise, any page referenced in the past two cycles will have a higher value than a page which has not been referenced in the past two cycles, and so on.
Working Sets

All of the above models of paging use demand paging. In demand paging, a page is only loaded when it is needed, and a page fault always loads exactly one page. When a process is first created, none of its pages are in memory, so there will always be a number of page faults when a process starts. After a while, enough pages will be loaded into memory so that the process runs for long periods with few page faults. The set of pages which need to be in memory in order for the program to run for an extended period with few page faults is called the working set.
Operating systems which are running at or near capacity will swap out processes; that is, remove all of their pages from memory and remove the job from the running queue. At some point a swapped out job will be swapped back in; that is, returned to the running queue. If the system is using demand paging, such a job will again generate a number of page faults before having enough pages in memory to run. Thus, it would make sense to keep track of the pages of a process which were in memory when the job was swapped out, and when the job is swapped back in, to restore all of these pages to memory before starting to run the job. Loading pages before they generate a page fault is called prepaging, and this model is called the working set model because it restores the entire working set of a process before running it.

This is more complicated than it seems. The operating system needs to keep track of which pages have been recently accessed. It is possible, indeed likely, that a process may have pages loaded which it has not accessed for a while, but which have not been replaced. These pages should not be considered to be a part of the working set.

We can define the working set w(k,t) at time t as the set of pages that have been referenced by a particular process within the past k memory references. Up to a point, as k increases, the size of the working set increases, but this function levels off quickly.

In order to implement this, the system would need to update a register at each memory reference, and this is far too costly to be done even in hardware. However, it is possible to have a reasonable simulation. A field for time of last use is associated with each page (although this cannot be kept entirely current). As with other page replacement algorithms, each page also has a referenced bit. At each clock tick (about once every 20 msec), the system clears this bit (sets it to zero), and each page reference sets it to one.
All operating systems keep track of the total running time of a process. When a page fault occurs, the system checks all of the pages associated with the current process. If the referenced bit is on, meaning that the page has been referenced since the last clock tick, the time of last use field is updated to the current value of the total run time of the process. If the referenced bit is zero, the value in the time of last use field is compared to the current total time value, and if the difference is greater than some value, tau, this page is removed.
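A sketch of this scan, with invented names; now stands for the process's total run time, tau for the working set window, and one candidate index is returned rather than removing pages in place.

```c
#include <stdint.h>
#include <stddef.h>

struct ws_page { int referenced; uint64_t last_use; };

/* Examine every page of the faulting process, as described above.
   Returns the index of a page that has left the working set, or -1. */
int ws_scan(struct ws_page *p, size_t n, uint64_t now, uint64_t tau)
{
    int victim = -1;
    for (size_t i = 0; i < n; i++) {
        if (p[i].referenced)
            p[i].last_use = now;   /* referenced this tick: still in set */
        else if (now - p[i].last_use > tau)
            victim = (int)i;       /* aged out of the working set        */
    }
    return victim;
}
```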
Here is an example. At an instant in time, the currently running process has six pages in memory. The total running time of the process is now 1000 msec (this value is always an approximation, since it is only updated at clock interrupts). The value of tau, set by the system, is 50 msec, meaning that pages which have not been referenced in the past 50 msec of run time for that process are candidates to be replaced because they are no longer in the working set. Here are the values of the six pages when a page fault occurs. Clearly the page table would have other data as well, but I only show the data relevant to this algorithm.
Page number | Time of last use | Referenced flag |
---|---|---|
1 | 980 | 1 |
2 | 980 | 0 |
3 | 960 | 0 |
4 | 980 | 1 |
5 | 930 | 1 |
6 | 940 | 0 |
The system needs to find a page to replace. The first page has been referenced since the last clock interrupt, so it is clearly not a candidate for replacement; its time of last use field is updated to 1000 (the current running time value). The second page has not been referenced in this clock cycle, but the current time (1000) minus its time of last use (980) is less than tau (20 is less than 50), so it is not a candidate for removal. The same is true of the third page (40 is less than 50). Pages 4 and 5 have been recently referenced, so their time of last use fields are updated to 1000. Page 6 has not been referenced since the last clock tick (its referenced bit is zero), and the current time minus its time of last use is 60 msec, which is greater than tau, so it is no longer in the working set and it is a candidate for replacement.
This algorithm works well, but there is a lot of overhead associated with it, because at each page fault every page associated with the process has to be examined and possibly updated. It can be combined with the clock algorithm to get an efficient page replacement strategy which is widely used in practice.

In this combined algorithm, the pages associated with a process are kept in a circular list. Each page has a referenced bit, a modified bit (dirty bit), and a time of last use field, as well as a pointer to the physical page address. At each page fault, the list is searched to find a suitable candidate for replacement. Searching starts where it ended at the last page fault (picture the hand of a clock). The algorithm differs from the one above in that not all pages are searched; searching stops when a suitable page for replacement is found. Also, if a page is not in the working set but its modified bit is on, meaning that the page cannot be replaced until its new values are written to disk, rather than performing the write immediately, the write to disk is scheduled and the hand moves on, looking for a better candidate page.
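A sketch of this combined algorithm, often called WSClock; the page structure, the schedule_write stub, and the names are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>

struct wsc_page { int referenced, dirty; uint64_t last_use; };

/* Placeholder: queue an asynchronous write-back of page i. */
static void schedule_write(size_t i) { (void)i; }

/* Circular scan from *hand; stop at the first clean page that is out
   of the working set.  Dirty candidates get a scheduled write and the
   hand moves on.  Returns -1 if no clean candidate in one sweep. */
int wsclock_victim(struct wsc_page *p, size_t n, size_t *hand,
                   uint64_t now, uint64_t tau)
{
    for (size_t seen = 0; seen < n; seen++) {
        size_t i = *hand;
        *hand = (*hand + 1) % n;
        if (p[i].referenced) {
            p[i].referenced = 0;     /* in the working set: skip it   */
            p[i].last_use = now;
        } else if (now - p[i].last_use > tau) {
            if (!p[i].dirty)
                return (int)i;       /* clean and out of the set      */
            schedule_write(i);       /* dirty: write back, keep going */
        }
    }
    return -1;
}
```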
Source: http://www.cs.rpi.edu/academics/courses/fall04/os/c12/