Date: Wed, 10 Jun 2009 09:28:55 +0300
From: Izik Eidus
Subject: Re: [PATCH 0/4] RFC - ksm api change into madvise
Message-ID: <20090610092855.43be2405@woof.tlv.redhat.com>
References: <1242261048-4487-1-git-send-email-ieidus@redhat.com>
	<4A2D47C1.5020302@redhat.com>
	<4A2D7036.1010800@redhat.com>
	<20090609074848.5357839a@woof.tlv.redhat.com>
To: Hugh Dickins
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
	nickpiggin@yahoo.com.au, chrisw@redhat.com, riel@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Tue, 9 Jun 2009 20:27:29 +0100 (BST)
Hugh Dickins wrote:

> On Tue, 9 Jun 2009, Hugh Dickins wrote:
> > On Tue, 9 Jun 2009, Izik Eidus wrote:
> > > How does this look?
> >
> > One improvement to make now, though: you've elsewhere avoided
> > the pgd,pud,pmd,pte descent in ksm.c (using get_pte instead), and
> > page_check_address() is not static to rmap.c (filemap_xip wanted
> > it), so please continue to use that.  It's not exported, right, but
> > I think Chris was already decisive that we should abandon modular
> > KSM, yes?
>
> I think you can simplify it further, can't you?  Isn't the get_pte()
> preamble in try_to_merge_one_page() just unnecessary overhead now?
> See untested code below.  Or even move the trylock/unlock of the page
> into write_protect_page() if you prefer.  Later on we'll uninline
> rmap.c's vma_address() so you can use it instead of your
> addr_in_vma() copy.
>
> Hugh

Great!  What do you think of the version below?  Is there anything else
we want to add, or should we start sending it to Andrew?

By the way, may I add your Signed-off-by to this patch?

(The only thing I changed was moving the *orig_pte = *ptep assignment
out of the pte_write() branch, so that we also merge pages that are
already write-protected, and adding orig_pte = __pte(0) to avoid the
annoying warning about it being used uninitialized.)
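To spell those two changes out, here is a rough sketch (condensed from
the second patch below, not the full function) of how
try_to_merge_one_page() now treats orig_pte:

	pte_t orig_pte = __pte(0);	/* only to silence gcc's
					 * "may be used uninitialized"
					 * warning */

	...

	/*
	 * write_protect_page() now assigns *orig_pte = *ptep on every
	 * successful return -- even when the pte was already write
	 * protected -- so already-write-protected pages get merged too.
	 */
	if (write_protect_page(oldpage, vma, &orig_pte))
		goto out_putpage;	/* orig_pte never read on failure */

	if (pages_identical(oldpage, newpage))
		ret = replace_page(vma, oldpage, newpage, orig_pte,
				   newprot);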
From 7304f4404d91a40e234b0530de6d3bfc8c5925a2 Mon Sep 17 00:00:00 2001
From: Izik Eidus
Date: Tue, 9 Jun 2009 21:00:55 +0300
Subject: [PATCH 1/2] ksm: remove ksm from being a module.

Signed-off-by: Izik Eidus
---
 include/linux/mm.h   |    2 +-
 include/linux/rmap.h |    4 ++--
 mm/Kconfig           |    5 ++---
 mm/memory.c          |    2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e617bab..cdc08d2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1258,7 +1258,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
 
-#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+#if defined(CONFIG_KSM)
 int replace_page(struct vm_area_struct *vma, struct page *oldpage,
 		 struct page *newpage, pte_t orig_pte, pgprot_t prot);
 #endif
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 469376d..939c171 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -118,7 +118,7 @@ static inline int try_to_munlock(struct page *page)
 }
 #endif
 
-#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+#if defined(CONFIG_KSM)
 int page_wrprotect(struct page *page, int *odirect_sync, int count_offset);
 #endif
 
@@ -136,7 +136,7 @@ static inline int page_mkclean(struct page *page)
 	return 0;
 }
 
-#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+#if defined(CONFIG_KSM)
 static inline int page_wrprotect(struct page *page, int *odirect_sync,
 				 int count_offset)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index 5ebfd18..e7c118f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -227,10 +227,9 @@ config MMU_NOTIFIER
 	bool
 
 config KSM
-	tristate "Enable KSM for page sharing"
+	bool "Enable KSM for page sharing"
 	help
-	  Enable the KSM kernel module to allow page sharing of equal pages
-	  among different tasks.
+	  Enable KSM to allow page sharing of equal pages among different tasks.
 
 config NOMMU_INITIAL_TRIM_EXCESS
 	int "Turn on mmap() excess space trimming before booting"
diff --git a/mm/memory.c b/mm/memory.c
index 8b4e40e..e23d4dd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1617,7 +1617,7 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_mixed);
 
-#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+#if defined(CONFIG_KSM)
 
 /**
  * replace_page - replace page in vma with new page
-- 
1.5.6.5
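A note on the #if changes above, in case the dropped CONFIG_KSM_MODULE
test looks odd: it follows the standard kbuild convention, sketched
here with KSM as the example (the symbols are kbuild's, nothing new is
introduced by the patch):

	/*
	 * tristate KSM: =y defines CONFIG_KSM, =m defines
	 * CONFIG_KSM_MODULE, so code shared by both kinds of build
	 * had to test both symbols:
	 */
	#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
	/* ... */
	#endif

	/* bool KSM: only =y is possible, so one test now suffices: */
	#if defined(CONFIG_KSM)
	/* ... */
	#endif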
From 3d9975dea43ae848f21875b5f99accecf1366765 Mon Sep 17 00:00:00 2001
From: Izik Eidus
Date: Wed, 10 Jun 2009 09:16:26 +0300
Subject: [PATCH 2/2] ksm: remove page_wrprotect() from rmap.c

Remove page_wrprotect() from rmap.c and instead embed the needed code
in ksm.c.

Hugh pointed out that for the KSM use case we don't have to walk the
rmap and write-protect the page mapping by mapping: when an anonymous
page is mapped more than once it must already be write-protected, and
when it is mapped just once there is no need to walk the rmap at all --
we can write-protect it from inside ksm.c instead.

Thanks.

Signed-off-by: Izik Eidus
---
 include/linux/rmap.h |   12 ----
 mm/ksm.c             |   86 +++++++++++++++++++++----------
 mm/rmap.c            |  139 --------------------------------------------------
 3 files changed, 59 insertions(+), 178 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 939c171..350e76d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -118,10 +118,6 @@ static inline int try_to_munlock(struct page *page)
 }
 #endif
 
-#if defined(CONFIG_KSM)
-int page_wrprotect(struct page *page, int *odirect_sync, int count_offset);
-#endif
-
 #else /* !CONFIG_MMU */
 
 #define anon_vma_init() do {} while (0)
@@ -136,14 +132,6 @@ static inline int page_mkclean(struct page *page)
 	return 0;
 }
 
-#if defined(CONFIG_KSM)
-static inline int page_wrprotect(struct page *page, int *odirect_sync,
-				 int count_offset)
-{
-	return 0;
-}
-#endif
-
 #endif /* CONFIG_MMU */
 
 /*
diff --git a/mm/ksm.c b/mm/ksm.c
index 74d921b..9d4be62 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -642,6 +643,60 @@ static inline int pages_identical(struct page *page1, struct page *page2)
 	return !memcmp_pages(page1, page2);
 }
 
+static inline int write_protect_page(struct page *page,
+				     struct vm_area_struct *vma,
+				     pte_t *orig_pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long addr;
+	pte_t *ptep;
+	spinlock_t *ptl;
+	int swapped;
+	int ret = 1;
+
+	addr = addr_in_vma(vma, page);
+	if (addr == -EFAULT)
+		goto out;
+
+	ptep = page_check_address(page, mm, addr, &ptl, 0);
+	if (!ptep)
+		goto out;
+
+	if (pte_write(*ptep)) {
+		pte_t entry;
+
+		swapped = PageSwapCache(page);
+		flush_cache_page(vma, addr, page_to_pfn(page));
+		/*
+		 * Ok this is tricky: when get_user_pages_fast() runs it
+		 * doesn't take any lock, so the check we are about to make,
+		 * comparing the page count against the map count, is racy,
+		 * and O_DIRECT can happen right after the check.
+		 * So we clear the pte and flush the TLB before the check;
+		 * this ensures that no O_DIRECT can happen after the check
+		 * or in the middle of the check.
+		 */
+		entry = ptep_clear_flush(vma, addr, ptep);
+		/*
+		 * Check that no O_DIRECT or similar I/O is in progress on
+		 * the page
+		 */
+		if ((page_mapcount(page) + 2 + swapped) != page_count(page)) {
+			set_pte_at_notify(mm, addr, ptep, entry);
+			goto out_unlock;
+		}
+		entry = pte_wrprotect(entry);
+		set_pte_at_notify(mm, addr, ptep, entry);
+	}
+	*orig_pte = *ptep;
+	ret = 0;
+
+out_unlock:
+	pte_unmap_unlock(ptep, ptl);
+out:
+	return ret;
+}
+
 /*
  * try_to_merge_one_page - take two pages and merge them into one
  * @mm: mm_struct that hold vma pointing into oldpage
@@ -661,9 +716,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
 				 pgprot_t newprot)
 {
 	int ret = 1;
-	int odirect_sync;
-	unsigned long page_addr_in_vma;
-	pte_t orig_pte, *orig_ptep;
+	pte_t orig_pte = __pte(0);
 
 	if (!PageAnon(oldpage))
 		goto out;
@@ -671,42 +724,21 @@ static int try_to_merge_one_page(struct mm_struct *mm,
 	get_page(newpage);
 	get_page(oldpage);
 
-	page_addr_in_vma = addr_in_vma(vma, oldpage);
-	if (page_addr_in_vma == -EFAULT)
-		goto out_putpage;
-
-	orig_ptep = get_pte(mm, page_addr_in_vma);
-	if (!orig_ptep)
-		goto out_putpage;
-	orig_pte = *orig_ptep;
-	pte_unmap(orig_ptep);
-	if (!pte_present(orig_pte))
-		goto out_putpage;
-	if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
-		goto out_putpage;
 	/*
 	 * we need the page lock to read a stable PageSwapCache in
-	 * page_wrprotect().
+	 * write_protect_page().
 	 * we use trylock_page() instead of lock_page(), because we don't
 	 * want to wait here; we prefer to continue scanning and merging
 	 * different pages and to come back to this page when it is unlocked.
 	 */
 	if (!trylock_page(oldpage))
 		goto out_putpage;
-	/*
-	 * page_wrprotect checks if the page is swapped or in swap cache;
-	 * in the future we might want to run here if_present_pte and then
-	 * swap_free
-	 */
-	if (!page_wrprotect(oldpage, &odirect_sync, 2)) {
+
+	if (write_protect_page(oldpage, vma, &orig_pte)) {
 		unlock_page(oldpage);
 		goto out_putpage;
 	}
 	unlock_page(oldpage);
-	if (!odirect_sync)
-		goto out_putpage;
-
-	orig_pte = pte_wrprotect(orig_pte);
 
 	if (pages_identical(oldpage, newpage))
 		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);
diff --git a/mm/rmap.c b/mm/rmap.c
index f53074c..c3ba0b9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -585,145 +585,6 @@ int page_mkclean(struct page *page)
 }
 EXPORT_SYMBOL_GPL(page_mkclean);
 
-#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
-
-static int page_wrprotect_one(struct page *page, struct vm_area_struct *vma,
-			      int *odirect_sync, int count_offset)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long address;
-	pte_t *pte;
-	spinlock_t *ptl;
-	int ret = 0;
-
-	address = vma_address(page, vma);
-	if (address == -EFAULT)
-		goto out;
-
-	pte = page_check_address(page, mm, address, &ptl, 0);
-	if (!pte)
-		goto out;
-
-	if (pte_write(*pte)) {
-		pte_t entry;
-
-		flush_cache_page(vma, address, pte_pfn(*pte));
-		/*
-		 * Ok this is tricky: when get_user_pages_fast() runs it
-		 * doesn't take any lock, so the check we are about to make,
-		 * comparing the page count against the map count, is racy,
-		 * and O_DIRECT can happen right after the check.
-		 * So we clear the pte and flush the TLB before the check;
-		 * this ensures that no O_DIRECT can happen after the check
-		 * or in the middle of the check.
-		 */
-		entry = ptep_clear_flush(vma, address, pte);
-		/*
-		 * Check that no O_DIRECT or similar I/O is in progress on
-		 * the page
-		 */
-		if ((page_mapcount(page) + count_offset) != page_count(page)) {
-			*odirect_sync = 0;
-			set_pte_at_notify(mm, address, pte, entry);
-			goto out_unlock;
-		}
-		entry = pte_wrprotect(entry);
-		set_pte_at_notify(mm, address, pte, entry);
-	}
-	ret = 1;
-
-out_unlock:
-	pte_unmap_unlock(pte, ptl);
-out:
-	return ret;
-}
-
-static int page_wrprotect_file(struct page *page, int *odirect_sync,
-			       int count_offset)
-{
-	struct address_space *mapping;
-	struct prio_tree_iter iter;
-	struct vm_area_struct *vma;
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-	int ret = 0;
-
-	mapping = page_mapping(page);
-	if (!mapping)
-		return ret;
-
-	spin_lock(&mapping->i_mmap_lock);
-
-	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff)
-		ret += page_wrprotect_one(page, vma, odirect_sync,
-					  count_offset);
-
-	spin_unlock(&mapping->i_mmap_lock);
-
-	return ret;
-}
-
-static int page_wrprotect_anon(struct page *page, int *odirect_sync,
-			       int count_offset)
-{
-	struct vm_area_struct *vma;
-	struct anon_vma *anon_vma;
-	int ret = 0;
-
-	anon_vma = page_lock_anon_vma(page);
-	if (!anon_vma)
-		return ret;
-
-	/*
-	 * If the page is inside the swap cache, its _count number was
-	 * increased by one, therefore we have to increase count_offset
-	 * by one.
-	 */
-	if (PageSwapCache(page))
-		count_offset++;
-
-	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
-		ret += page_wrprotect_one(page, vma, odirect_sync,
-					  count_offset);
-
-	page_unlock_anon_vma(anon_vma);
-
-	return ret;
-}
-
-/**
- * page_wrprotect - set all ptes pointing to a page as readonly
- * @page: the page to set as readonly
- * @odirect_sync: boolean value that is set to 0 when some of the ptes were
- *                not marked as readonly because page_wrprotect_one() was not
- *                able to mark these ptes as readonly without opening a window
- *                for a race with O_DIRECT
- * @count_offset: number of times page_wrprotect() caller had called
- *                get_page() on the page
- *
- * returns the number of ptes which were marked as readonly.
- * (ptes that were readonly before this function was called are counted
- * as well)
- */
-int page_wrprotect(struct page *page, int *odirect_sync, int count_offset)
-{
-	int ret = 0;
-
-	/*
-	 * Page lock is needed for anon pages for the PageSwapCache check,
-	 * and for page_mapping for filebacked pages
-	 */
-	BUG_ON(!PageLocked(page));
-
-	*odirect_sync = 1;
-	if (PageAnon(page))
-		ret = page_wrprotect_anon(page, odirect_sync, count_offset);
-	else
-		ret = page_wrprotect_file(page, odirect_sync, count_offset);
-
-	return ret;
-}
-EXPORT_SYMBOL(page_wrprotect);
-
-#endif
-
 /**
  * __page_set_anon_rmap - setup new anonymous rmap
  * @page: the page to add the mapping to
-- 
1.5.6.5
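For anyone reading along, the bail-out test in write_protect_page() is
plain reference-count arithmetic.  With the two references KSM itself
holds on the page (the count_offset of 2 that was passed to the old
page_wrprotect()), the expected state of an anon page we may merge is:

	/*
	 * page_count(page) == page_mapcount(page)  (one ref per mapping pte)
	 *                   + swapped              (+1 held by swap cache)
	 *                   + 2                    (held by KSM itself)
	 *
	 * Any extra reference -- typically O_DIRECT having pinned the
	 * page with get_user_pages_fast() -- breaks the equality, so we
	 * skip the page rather than merge it under live I/O:
	 */
	if ((page_mapcount(page) + 2 + swapped) != page_count(page))
		goto out_unlock;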
> 
> static inline int write_protect_page(struct page *page,
> 				     struct vm_area_struct *vma,
> 				     pte_t *orig_pte)
> {
> 	struct mm_struct *mm = vma->vm_mm;
> 	unsigned long addr;
> 	pte_t *ptep;
> 	spinlock_t *ptl;
> 	int swapped;
> 	int ret = 1;
> 
> 	addr = addr_in_vma(vma, page);
> 	if (addr == -EFAULT)
> 		goto out;
> 
> 	ptep = page_check_address(page, mm, addr, &ptl, 0);
> 	if (!ptep)
> 		goto out;
> 
> 	if (pte_write(*ptep)) {
> 		pte_t entry;
> 
> 		swapped = PageSwapCache(page);
> 		flush_cache_page(vma, addr, page_to_pfn(page));
> 		/*
> 		 * Ok this is tricky: when get_user_pages_fast() runs it
> 		 * doesn't take any lock, so the check we are about to make,
> 		 * comparing the page count against the map count, is racy,
> 		 * and O_DIRECT can happen right after the check.
> 		 * So we clear the pte and flush the TLB before the check;
> 		 * this ensures that no O_DIRECT can happen after the check
> 		 * or in the middle of the check.
> 		 */
> 		entry = ptep_clear_flush(vma, addr, ptep);
> 		/*
> 		 * Check that no O_DIRECT or similar I/O is in progress on
> 		 * the page
> 		 */
> 		if ((page_mapcount(page) + 2 + swapped) != page_count(page)) {
> 			set_pte_at_notify(mm, addr, ptep, entry);
> 			goto out_unlock;
> 		}
> 		entry = pte_wrprotect(entry);
> 		set_pte_at_notify(mm, addr, ptep, entry);
> 		*orig_pte = *ptep;
> 	}
> 	ret = 0;
> 
> out_unlock:
> 	pte_unmap_unlock(ptep, ptl);
> out:
> 	return ret;
> }
> 
> /*
>  * try_to_merge_one_page - take two pages and merge them into one
>  * @mm: mm_struct that hold vma pointing into oldpage
>  * @vma: the vma that hold the pte pointing into oldpage
>  * @oldpage: the page that we want to replace with newpage
>  * @newpage: the page that we want to map instead of oldpage
>  * @newprot: the new permission of the pte inside vma
>  * note:
>  * oldpage should be an anon page while newpage should be a file-mapped
>  * page
>  *
>  * this function returns 0 if the pages were merged, 1 otherwise.
>  */
> static int try_to_merge_one_page(struct mm_struct *mm,
> 				 struct vm_area_struct *vma,
> 				 struct page *oldpage,
> 				 struct page *newpage,
> 				 pgprot_t newprot)
> {
> 	int ret = 1;
> 	pte_t orig_pte;
> 
> 	if (!PageAnon(oldpage))
> 		goto out;
> 
> 	get_page(newpage);
> 	get_page(oldpage);
> 
> 	/*
> 	 * we need the page lock to read a stable PageSwapCache in
> 	 * write_protect_page().
> 	 * we use trylock_page() instead of lock_page(), because we don't
> 	 * want to wait here; we prefer to continue scanning and merging
> 	 * different pages and to come back to this page when it is unlocked.
> 	 */
> 	if (!trylock_page(oldpage))
> 		goto out_putpage;
> 
> 	if (write_protect_page(oldpage, vma, &orig_pte)) {
> 		unlock_page(oldpage);
> 		goto out_putpage;
> 	}
> 	unlock_page(oldpage);
> 
> 	if (pages_identical(oldpage, newpage))
> 		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);
> 
> out_putpage:
> 	put_page(oldpage);
> 	put_page(newpage);
> out:
> 	return ret;
> }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org