* Question on swapping @ 2002-12-04 15:17 Martin Maletinsky 2002-12-05 19:08 ` Joseph A Knapka 0 siblings, 1 reply; 5+ messages in thread From: Martin Maletinsky @ 2002-12-04 15:17 UTC (permalink / raw) To: linux-mm, kernelnewbies Hello, I am looking at the swapping mechanism in Linux. I have read the relevant chapter 16 in 'Understanding the Linux Kernel' from Bovet&Cesati, and looked at the 2.2.18 kernel source code. I still have the follwing question: Function try_to_swap_out() [p. 481 in 'Understanding the Linux Kernel']: If the page in question already belongs to the swap cache, the function performs no data transfer to the swap space on the disk (but only marks the page as swapped out). The corresponding comment in the try_to_swap_out() functions states 'Is the page already in the swap cache? If so, ..... - it is already up-to-date on disk. Understanding the Linux Kernel states on p. 482 'If the page belongs to the swap cache .... no memory transfer is performed'. Now my question is, couldn't the page have been modified since it was added to the swap cache (and written to disk), and thus differ from the data in the swap space? In this case shouldn't the page be written to disk (again)? Such a modification may result from a store operation of another process that shares the page with the process from which it was added to the swap cache, or by an I/O operation from some external device - in both cases the data stored in the corresponding page slot in the swap area differs from the data stored in the page frame. If later on the page frame is released by shrink_mmap() from the swap cache, and subsequently needs to be restored from the data in the swap area, the page frame's latest content is lost. Thanks in advance for any help with best regards Martin Maletinsky P.S. Please put me on CC: in your reply, since I am not in the mailing list. -- Supercomputing System AG email: maletinsky@scs.ch Martin Maletinsky phone: +41 (0)1 445 16 05 Technoparkstrasse 1 fax: +41 (0)1 445 16 10 CH-8005 Zurich -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question on swapping 2002-12-04 15:17 Question on swapping Martin Maletinsky @ 2002-12-05 19:08 ` Joseph A Knapka 2002-12-06 9:45 ` Martin Maletinsky 0 siblings, 1 reply; 5+ messages in thread From: Joseph A Knapka @ 2002-12-05 19:08 UTC (permalink / raw) To: Martin Maletinsky; +Cc: linux-mm, kernelnewbies Martin Maletinsky wrote: > Hello, > > I am looking at the swapping mechanism in Linux. I have read the relevant chapter 16 in 'Understanding the Linux Kernel' from Bovet&Cesati, and looked at the 2.2.18 kernel > source code. I still have the follwing question: > > Function try_to_swap_out() [p. 481 in 'Understanding the Linux Kernel']: > If the page in question already belongs to the swap cache, the function performs no data transfer to the swap space on the disk (but only marks the page as swapped out). > The corresponding comment in the try_to_swap_out() functions states 'Is the page already in the swap cache? If so, ..... - it is already up-to-date on disk. > Understanding the Linux Kernel states on p. 482 'If the page belongs to the swap cache .... no memory transfer is performed'. > Now my question is, couldn't the page have been modified since it was added to the swap cache (and written to disk), and thus differ from the data in the swap space? In > this case shouldn't the page be written to disk (again)? If the page is in the swap cache, it's *effectively* up to date on disk, because the swap cache page is *the* authoritative image of the page. If it's dirty it will get written out by page_launder() in short order, because whomever dirtied it set the page_dirty bit in the page struct. That issue is unimportant to the process doing the swap_out, though - all it cares about is that the page is going to be taken care of by the cache machinery. Cheers, -- Joe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question on swapping 2002-12-05 19:08 ` Joseph A Knapka @ 2002-12-06 9:45 ` Martin Maletinsky 2002-12-07 1:29 ` Amol Kumar Lad 0 siblings, 1 reply; 5+ messages in thread From: Martin Maletinsky @ 2002-12-06 9:45 UTC (permalink / raw) To: Joseph A Knapka; +Cc: linux-mm, kernelnewbies Hello Joe, Thank you for your reply. I have an additional question. Joseph A Knapka wrote: > Martin Maletinsky wrote: > > Hello, > > > > I am looking at the swapping mechanism in Linux. I have read the relevant chapter 16 in 'Understanding the Linux Kernel' from Bovet&Cesati, and looked at the 2.2.18 kernel > > source code. I still have the follwing question: > > > > Function try_to_swap_out() [p. 481 in 'Understanding the Linux Kernel']: > > If the page in question already belongs to the swap cache, the function performs no data transfer to the swap space on the disk (but only marks the page as swapped out). > > The corresponding comment in the try_to_swap_out() functions states 'Is the page already in the swap cache? If so, ..... - it is already up-to-date on disk. > > Understanding the Linux Kernel states on p. 482 'If the page belongs to the swap cache .... no memory transfer is performed'. > > Now my question is, couldn't the page have been modified since it was added to the swap cache (and written to disk), and thus differ from the data in the swap space? In > > this case shouldn't the page be written to disk (again)? > > If the page is in the swap cache, it's *effectively* up to date on disk, > because the swap cache page is *the* authoritative image of the page. > If it's dirty it will get written out by page_launder() in short > order, because whomever dirtied it set the page_dirty bit in the > page struct. That issue is unimportant to the process doing the > swap_out, though - all it cares about is that the page is going > to be taken care of by the cache machinery. Assume a page P that is marked as clean (i.e. PG_dirty bit not set), and is in the page cache. Additionaly assume that P is mapped by a process A. Now let A perform a store operation into the page P, which will mark A's page table entry mapping P as dirty (i.e. set the dirty bit). Subsequently assume that try_to_swap_out() is called on A's page table entry that maps P. try_to_swap_out() will see that P is in the swap cache already, and thus drop the pte. This leads to a situation, where P is in the swap cache, marked as clear (i.e. PG_dirty bit clear), while the disk image differs from the data that is in the main memory page frame. I would have expected try_to_swap_out() to check the page table entries dirty bit, and to mark the page dirty. However, I can't see any such code in the function (I pasted the relevant lines of code from linux 2.2.18 below). static int try_to_swap_out(struct task_struct * tsk, struct vm_area_struct* vma, unsigned long address, pte_t * page_table, int gfp_mask) { pte_t pte; unsigned long entry; unsigned long page; struct page * page_map; pte = *page_table; if (!pte_present(pte)) return 0; page = pte_page(pte); if (MAP_NR(page) >= max_mapnr) return 0; page_map = mem_map + MAP_NR(page); if (pte_young(pte)) { /* * Transfer the "accessed" bit from the page * tables to the global page map. */ set_pte(page_table, pte_mkold(pte)); flush_tlb_page(vma, address); set_bit(PG_referenced, &page_map->flags); return 0; } if (PageReserved(page_map) || PageLocked(page_map) || ((gfp_mask & __GFP_DMA) && !PageDMA(page_map))) return 0; /* * Is the page already in the swap cache? If so, then * we can just drop our reference to it without doing * any IO - it's already up-to-date on disk. * * Return 0, as we didn't actually free any real * memory, and we should just continue our scan. */ if (PageSwapCache(page_map)) { entry = page_map->offset; swap_duplicate(entry); set_pte(page_table, __pte(entry)); drop_pte: vma->vm_mm->rss--; flush_tlb_page(vma, address); __free_page(page_map); return 0; } ............ Thanks again, with best regards Martin -- Supercomputing System AG email: maletinsky@scs.ch Martin Maletinsky phone: +41 (0)1 445 16 05 Technoparkstrasse 1 fax: +41 (0)1 445 16 10 CH-8005 Zurich -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question on swapping 2002-12-06 9:45 ` Martin Maletinsky @ 2002-12-07 1:29 ` Amol Kumar Lad 2002-12-12 15:56 ` Joseph A Knapka 0 siblings, 1 reply; 5+ messages in thread From: Amol Kumar Lad @ 2002-12-07 1:29 UTC (permalink / raw) To: Martin Maletinsky; +Cc: Joseph A Knapka, linux-mm, kernelnewbies On Fri, 2002-12-06 at 04:45, Martin Maletinsky wrote: > Hello Joe, > > Thank you for your reply. I have an additional question. > > Joseph A Knapka wrote: > > > Martin Maletinsky wrote: > > > Hello, > > > > > > I am looking at the swapping mechanism in Linux. I have read the > relevant chapter 16 in 'Understanding the Linux Kernel' from > Bovet&Cesati, and looked at the 2.2.18 kernel > > > source code. I still have the follwing question: > > > > > > Function try_to_swap_out() [p. 481 in 'Understanding the Linux > Kernel']: > > > If the page in question already belongs to the swap cache, the > function performs no data transfer to the swap space on the disk (but > only marks the page as swapped out). > > > The corresponding comment in the try_to_swap_out() functions states > 'Is the page already in the swap cache? If so, ..... - it is already > up-to-date on disk. > > > Understanding the Linux Kernel states on p. 482 'If the page belongs > to the swap cache .... no memory transfer is performed'. > > > Now my question is, couldn't the page have been modified since it > was added to the swap cache (and written to disk), and thus differ from > the data in the swap space? In > > > this case shouldn't the page be written to disk (again)? > > > > If the page is in the swap cache, it's *effectively* up to date on > disk, > > because the swap cache page is *the* authoritative image of the page. > > If it's dirty it will get written out by page_launder() in short > > order, because whomever dirtied it set the page_dirty bit in the > > page struct. That issue is unimportant to the process doing the > > swap_out, though - all it cares about is that the page is going > > to be taken care of by the cache machinery. > > Assume a page P that is marked as clean (i.e. PG_dirty bit not set), and > is in the page cache. Additionaly assume that P is mapped by a process > A. Now let A perform a store > operation into the page P, which will mark A's page table entry mapping > P as dirty (i.e. set the dirty bit). > Subsequently assume that try_to_swap_out() is called on A's page table > entry that maps P. try_to_swap_out() will see that P is in the swap > cache already, >>>> For the first time P would never be found in the swap cache, infact try_to_swap_out shall do following a] Page is dirty (in page table entry), so set PG_DIRTY in struct page b] Allocate a swap entry and add this page to swap cache c] release the page, and add the modify page table entry to point it to swap entry Now We have a] Page table entry for P contains swap info b] Page P in swap cache c] PG_DIRTY _is_ set (infact for a page in swap cache this is always true) Do remember, along with the swap cache P may be party of inactive_dirty list. The actual swapping to backing store is done by page scanner. It shall do following. Assume it has decided to _really_ free P 1] As page is dirty, call the page write back function. Thus here for the first time page found its place in swap. 2] send P back home, to buddy allocator If process A again access the page, then page fault handler shall do following 1] allocate a swap cache page 2] read the page from swap. 3] Modify page table entry of A to point to this page Just give a little thought about all this, VM shall reveal herself to you :) bye Amol and thus drop the pte. > This leads to a situation, where P is in the swap cache, marked as clear > (i.e. PG_dirty bit clear), while the disk image differs from the data > that is in the main memory page > frame. > I would have expected try_to_swap_out() to check the page table entries > dirty bit, and to mark the page dirty. However, I can't see any such > code in the function (I pasted the > relevant lines of code from linux 2.2.18 below). > > static int try_to_swap_out(struct task_struct * tsk, struct > vm_area_struct* vma, > unsigned long address, pte_t * page_table, int gfp_mask) > { > pte_t pte; > unsigned long entry; > unsigned long page; > struct page * page_map; > > pte = *page_table; > if (!pte_present(pte)) > return 0; > page = pte_page(pte); > if (MAP_NR(page) >= max_mapnr) > return 0; > page_map = mem_map + MAP_NR(page); > > if (pte_young(pte)) { > /* > * Transfer the "accessed" bit from the page > * tables to the global page map. > */ > set_pte(page_table, pte_mkold(pte)); > flush_tlb_page(vma, address); > set_bit(PG_referenced, &page_map->flags); > return 0; > } > > if (PageReserved(page_map) > || PageLocked(page_map) > || ((gfp_mask & __GFP_DMA) && !PageDMA(page_map))) > return 0; > > /* > * Is the page already in the swap cache? If so, then > * we can just drop our reference to it without doing > * any IO - it's already up-to-date on disk. > * > * Return 0, as we didn't actually free any real > * memory, and we should just continue our scan. > */ > if (PageSwapCache(page_map)) { > entry = page_map->offset; > swap_duplicate(entry); > set_pte(page_table, __pte(entry)); > drop_pte: > vma->vm_mm->rss--; > flush_tlb_page(vma, address); > __free_page(page_map); > return 0; > } > ............ > > > Thanks again, with best regards > Martin > > -- > Supercomputing System AG email: maletinsky@scs.ch > Martin Maletinsky phone: +41 (0)1 445 16 05 > Technoparkstrasse 1 fax: +41 (0)1 445 16 10 > CH-8005 Zurich > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question on swapping 2002-12-07 1:29 ` Amol Kumar Lad @ 2002-12-12 15:56 ` Joseph A Knapka 0 siblings, 0 replies; 5+ messages in thread From: Joseph A Knapka @ 2002-12-12 15:56 UTC (permalink / raw) To: Amol Kumar Lad; +Cc: Martin Maletinsky, linux-mm, kernelnewbies Amol Kumar Lad wrote: >For the first time P would never be found in the swap cache, infact > try_to_swap_out shall do following > a] Page is dirty (in page table entry), so set PG_DIRTY in struct page This appears to be the *only* place in the kernel where pte dirty bits are propagated into the mem_map. > b] Allocate a swap entry and add this page to swap cache > c] release the page, and add the modify page table entry to point it to > swap entry > > Now We have > a] Page table entry for P contains swap info > b] Page P in swap cache > c] PG_DIRTY _is_ set (infact for a page in swap cache this is always > true) > > Do remember, along with the swap cache P may be party of inactive_dirty > list. > > The actual swapping to backing store is done by page scanner. > It shall do following. Assume it has decided to _really_ free P > 1] As page is dirty, call the page write back function. Thus here for > the first time page found its place in swap. > 2] send P back home, to buddy allocator > > If process A again access the page, then page fault handler shall do > following > 1] allocate a swap cache page > 2] read the page from swap. > 3] Modify page table entry of A to point to this page So now the page is not marked dirty in the mem_map. What if A *now* writes the page and then tries to swap it out? That's Martin's question: in that case, we have a page that's in the swap cache; whose page struct is *not* marked dirty; but which *is* actually dirty. How do we know the page will be kept up to date on disk? I used to understand how this worked, but I've forgotten. Or maybe I never really understood it. Cheers, -- Joe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2002-12-12 15:56 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-12-04 15:17 Question on swapping Martin Maletinsky 2002-12-05 19:08 ` Joseph A Knapka 2002-12-06 9:45 ` Martin Maletinsky 2002-12-07 1:29 ` Amol Kumar Lad 2002-12-12 15:56 ` Joseph A Knapka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox