* [PATCH V5 1/4] mm: Refactor do_wp_page, extract the reuse case
2015-02-22 13:42 [PATCH V5 0/4] Refactor do_wp_page, no functional change Shachar Raindel
@ 2015-02-22 13:42 ` Shachar Raindel
2015-02-24 13:39 ` Michal Hocko
2015-02-22 13:42 ` [PATCH V5 2/4] mm: Refactor do_wp_page - rewrite the unlock flow Shachar Raindel
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2015-02-22 13:42 UTC (permalink / raw)
To: linux-mm
Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, raindel, Dave Hansen
When do_wp_page is ending, in several cases it needs to reuse the
existing page. This is achieved by making the page table entry writable,
and possibly updating the page-cache state.
Currently, this logic is "called" via a goto jump. This makes the
control flow of the function harder to follow. It also goes against the
coding style guidelines on the use of goto.
As the code can easily be refactored into a specialized function,
refactor it out and simplify the code flow in do_wp_page.
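For illustration only (this is not kernel code and not part of the patch;
every name below is made up), the shape of the change is that a block which
several paths used to reach through a shared goto label becomes a small
helper that each path calls and returns through. A minimal standalone C
sketch:

#include <stdio.h>

/* Hypothetical status values standing in for VM_FAULT_WRITE and 0. */
#define FAULT_WRITE	1

struct fault_ctx {
	int shared;	/* writable shared mapping */
	int last_user;	/* we hold the only reference to the page */
};

/*
 * Extracted helper: roughly what used to live under the shared "reuse:"
 * label.  It does the reuse work and reports the result explicitly.
 */
static int page_reuse(struct fault_ctx *ctx)
{
	printf("marking page writable (shared=%d)\n", ctx->shared);
	return FAULT_WRITE;
}

/* Callers return through the helper instead of jumping to a label. */
static int wp_fault(struct fault_ctx *ctx)
{
	if (ctx->shared)
		return page_reuse(ctx);
	if (ctx->last_user)
		return page_reuse(ctx);
	return 0;	/* would fall through to the copy path */
}

int main(void)
{
	struct fault_ctx ctx = { .shared = 1, .last_user = 0 };

	printf("fault result: %d\n", wp_fault(&ctx));
	return 0;
}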
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
---
mm/memory.c | 117 +++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 68 insertions(+), 49 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..7a04414 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1983,6 +1983,65 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
}
/*
+ * Handle write page faults for pages that can be reused in the current vma
+ *
+ * This can happen either due to the mapping being with the VM_SHARED flag,
+ * or due to us being the last reference standing to the page. In either
+ * case, all we need to do here is to mark the page as writable and update
+ * any related book-keeping.
+ */
+static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *page_table,
+ spinlock_t *ptl, pte_t orig_pte,
+ struct page *page, int page_mkwrite,
+ int dirty_shared)
+ __releases(ptl)
+{
+ pte_t entry;
+ /*
+ * Clear the pages cpupid information as the existing
+ * information potentially belongs to a now completely
+ * unrelated process.
+ */
+ if (page)
+ page_cpupid_xchg_last(page, (1 << LAST_CPUPID_SHIFT) - 1);
+
+ flush_cache_page(vma, address, pte_pfn(orig_pte));
+ entry = pte_mkyoung(orig_pte);
+ entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (ptep_set_access_flags(vma, address, page_table, entry, 1))
+ update_mmu_cache(vma, address, page_table);
+ pte_unmap_unlock(page_table, ptl);
+
+ if (dirty_shared) {
+ struct address_space *mapping;
+ int dirtied;
+
+ if (!page_mkwrite)
+ lock_page(page);
+
+ dirtied = set_page_dirty(page);
+ VM_BUG_ON_PAGE(PageAnon(page), page);
+ mapping = page->mapping;
+ unlock_page(page);
+ page_cache_release(page);
+
+ if ((dirtied || page_mkwrite) && mapping) {
+ /*
+ * Some device drivers do not set page.mapping
+ * but still dirty their pages
+ */
+ balance_dirty_pages_ratelimited(mapping);
+ }
+
+ if (!page_mkwrite)
+ file_update_time(vma->vm_file);
+ }
+
+ return VM_FAULT_WRITE;
+}
+
+/*
* This routine handles present pages, when users try to write
* to a shared page. It is done by copying the page to a new address
* and decrementing the shared-page counter for the old page.
@@ -2008,8 +2067,6 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *old_page, *new_page = NULL;
pte_t entry;
int ret = 0;
- int page_mkwrite = 0;
- bool dirty_shared = false;
unsigned long mmun_start = 0; /* For mmu_notifiers */
unsigned long mmun_end = 0; /* For mmu_notifiers */
struct mem_cgroup *memcg;
@@ -2026,7 +2083,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
(VM_WRITE|VM_SHARED))
- goto reuse;
+ return wp_page_reuse(mm, vma, address, page_table, ptl,
+ orig_pte, old_page, 0, 0);
goto gotten;
}
@@ -2055,12 +2113,16 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
page_move_anon_rmap(old_page, vma, address);
unlock_page(old_page);
- goto reuse;
+ return wp_page_reuse(mm, vma, address, page_table, ptl,
+ orig_pte, old_page, 0, 0);
}
unlock_page(old_page);
} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
(VM_WRITE|VM_SHARED))) {
+ int page_mkwrite = 0;
+
page_cache_get(old_page);
+
/*
* Only catch write-faults on shared writable pages,
* read-only shared pages can get COWed by
@@ -2091,51 +2153,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
page_mkwrite = 1;
}
- dirty_shared = true;
-
-reuse:
- /*
- * Clear the pages cpupid information as the existing
- * information potentially belongs to a now completely
- * unrelated process.
- */
- if (old_page)
- page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1);
-
- flush_cache_page(vma, address, pte_pfn(orig_pte));
- entry = pte_mkyoung(orig_pte);
- entry = maybe_mkwrite(pte_mkdirty(entry), vma);
- if (ptep_set_access_flags(vma, address, page_table, entry,1))
- update_mmu_cache(vma, address, page_table);
- pte_unmap_unlock(page_table, ptl);
- ret |= VM_FAULT_WRITE;
-
- if (dirty_shared) {
- struct address_space *mapping;
- int dirtied;
-
- if (!page_mkwrite)
- lock_page(old_page);
-
- dirtied = set_page_dirty(old_page);
- VM_BUG_ON_PAGE(PageAnon(old_page), old_page);
- mapping = old_page->mapping;
- unlock_page(old_page);
- page_cache_release(old_page);
-
- if ((dirtied || page_mkwrite) && mapping) {
- /*
- * Some device drivers do not set page.mapping
- * but still dirty their pages
- */
- balance_dirty_pages_ratelimited(mapping);
- }
-
- if (!page_mkwrite)
- file_update_time(vma->vm_file);
- }
-
- return ret;
+ return wp_page_reuse(mm, vma, address, page_table, ptl,
+ orig_pte, old_page, page_mkwrite, 1);
}
/*
--
1.7.11.2
* Re: [PATCH V5 1/4] mm: Refactor do_wp_page, extract the reuse case
2015-02-22 13:42 ` [PATCH V5 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
@ 2015-02-24 13:39 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-02-24 13:39 UTC (permalink / raw)
To: Shachar Raindel
Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, Dave Hansen
On Sun 22-02-15 15:42:15, Shachar Raindel wrote:
> When do_wp_page is ending, in several cases it needs to reuse the
> existing page. This is achieved by making the page table entry writable,
> and possibly updating the page-cache state.
>
> Currently, this logic is "called" via a goto jump. This makes the
> control flow of the function harder to follow. It also goes against the
> coding style guidelines on the use of goto.
>
> As the code can easily be refactored into a specialized function,
> refactor it out and simplify the code flow in do_wp_page.
>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Haggai Eran <haggaie@mellanox.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Michel Lespinasse <walken@google.com>
Makes sense to me
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memory.c | 117 +++++++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 68 insertions(+), 49 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8068893..7a04414 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1983,6 +1983,65 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> }
>
> /*
> + * Handle write page faults for pages that can be reused in the current vma
> + *
> + * This can happen either due to the mapping being with the VM_SHARED flag,
> + * or due to us being the last reference standing to the page. In either
> + * case, all we need to do here is to mark the page as writable and update
> + * any related book-keeping.
> + */
> +static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, pte_t *page_table,
> + spinlock_t *ptl, pte_t orig_pte,
> + struct page *page, int page_mkwrite,
> + int dirty_shared)
> + __releases(ptl)
> +{
> + pte_t entry;
> + /*
> + * Clear the pages cpupid information as the existing
> + * information potentially belongs to a now completely
> + * unrelated process.
> + */
> + if (page)
> + page_cpupid_xchg_last(page, (1 << LAST_CPUPID_SHIFT) - 1);
> +
> + flush_cache_page(vma, address, pte_pfn(orig_pte));
> + entry = pte_mkyoung(orig_pte);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + if (ptep_set_access_flags(vma, address, page_table, entry, 1))
> + update_mmu_cache(vma, address, page_table);
> + pte_unmap_unlock(page_table, ptl);
> +
> + if (dirty_shared) {
> + struct address_space *mapping;
> + int dirtied;
> +
> + if (!page_mkwrite)
> + lock_page(page);
> +
> + dirtied = set_page_dirty(page);
> + VM_BUG_ON_PAGE(PageAnon(page), page);
> + mapping = page->mapping;
> + unlock_page(page);
> + page_cache_release(page);
> +
> + if ((dirtied || page_mkwrite) && mapping) {
> + /*
> + * Some device drivers do not set page.mapping
> + * but still dirty their pages
> + */
> + balance_dirty_pages_ratelimited(mapping);
> + }
> +
> + if (!page_mkwrite)
> + file_update_time(vma->vm_file);
> + }
> +
> + return VM_FAULT_WRITE;
> +}
> +
> +/*
> * This routine handles present pages, when users try to write
> * to a shared page. It is done by copying the page to a new address
> * and decrementing the shared-page counter for the old page.
> @@ -2008,8 +2067,6 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> struct page *old_page, *new_page = NULL;
> pte_t entry;
> int ret = 0;
> - int page_mkwrite = 0;
> - bool dirty_shared = false;
> unsigned long mmun_start = 0; /* For mmu_notifiers */
> unsigned long mmun_end = 0; /* For mmu_notifiers */
> struct mem_cgroup *memcg;
> @@ -2026,7 +2083,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> */
> if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> (VM_WRITE|VM_SHARED))
> - goto reuse;
> + return wp_page_reuse(mm, vma, address, page_table, ptl,
> + orig_pte, old_page, 0, 0);
> goto gotten;
> }
>
> @@ -2055,12 +2113,16 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> */
> page_move_anon_rmap(old_page, vma, address);
> unlock_page(old_page);
> - goto reuse;
> + return wp_page_reuse(mm, vma, address, page_table, ptl,
> + orig_pte, old_page, 0, 0);
> }
> unlock_page(old_page);
> } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> (VM_WRITE|VM_SHARED))) {
> + int page_mkwrite = 0;
> +
> page_cache_get(old_page);
> +
> /*
> * Only catch write-faults on shared writable pages,
> * read-only shared pages can get COWed by
> @@ -2091,51 +2153,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> page_mkwrite = 1;
> }
>
> - dirty_shared = true;
> -
> -reuse:
> - /*
> - * Clear the pages cpupid information as the existing
> - * information potentially belongs to a now completely
> - * unrelated process.
> - */
> - if (old_page)
> - page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1);
> -
> - flush_cache_page(vma, address, pte_pfn(orig_pte));
> - entry = pte_mkyoung(orig_pte);
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> - if (ptep_set_access_flags(vma, address, page_table, entry,1))
> - update_mmu_cache(vma, address, page_table);
> - pte_unmap_unlock(page_table, ptl);
> - ret |= VM_FAULT_WRITE;
> -
> - if (dirty_shared) {
> - struct address_space *mapping;
> - int dirtied;
> -
> - if (!page_mkwrite)
> - lock_page(old_page);
> -
> - dirtied = set_page_dirty(old_page);
> - VM_BUG_ON_PAGE(PageAnon(old_page), old_page);
> - mapping = old_page->mapping;
> - unlock_page(old_page);
> - page_cache_release(old_page);
> -
> - if ((dirtied || page_mkwrite) && mapping) {
> - /*
> - * Some device drivers do not set page.mapping
> - * but still dirty their pages
> - */
> - balance_dirty_pages_ratelimited(mapping);
> - }
> -
> - if (!page_mkwrite)
> - file_update_time(vma->vm_file);
> - }
> -
> - return ret;
> + return wp_page_reuse(mm, vma, address, page_table, ptl,
> + orig_pte, old_page, page_mkwrite, 1);
> }
>
> /*
> --
> 1.7.11.2
>
--
Michal Hocko
SUSE Labs
* [PATCH V5 2/4] mm: Refactor do_wp_page - rewrite the unlock flow
2015-02-22 13:42 [PATCH V5 0/4] Refactor do_wp_page, no functional change Shachar Raindel
2015-02-22 13:42 ` [PATCH V5 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
@ 2015-02-22 13:42 ` Shachar Raindel
2015-02-24 13:45 ` Michal Hocko
2015-02-22 13:42 ` [PATCH V5 3/4] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2015-02-22 13:42 UTC (permalink / raw)
To: linux-mm
Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, raindel, Dave Hansen
When do_wp_page is ending, in several cases it needs to unlock the
pages and ptls it was accessing.
Currently, this logic is "called" via a goto jump. This makes the
control flow of the function harder to follow. Readability is further
hampered by the unlock case containing a large amount of logic needed in
only one of the three cases.
Using goto for cleanup is generally allowed. However, moving the
trivial unlocking flows to the relevant call sites allows deeper
refactoring in the next patch.
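To illustrate the pattern (a hypothetical, standalone C sketch, not the
kernel code), the shared "unlock:" label that every early exit jumped to is
replaced by each exit performing its own short cleanup and returning
immediately:

#include <stdio.h>

/* Hypothetical stand-ins for the pte lock and the page reference. */
static void unlock_table(void)
{
	printf("pte unmapped and unlocked\n");
}

static void release_page(void)
{
	printf("page reference dropped\n");
}

/*
 * Previously both exits jumped to one "unlock:" label that had to cope with
 * every caller's situation.  Now the trivial cleanup sits at each call site.
 */
static int handle_fault(int pte_changed)
{
	if (pte_changed) {
		unlock_table();
		release_page();
		return 0;	/* lost the race, nothing left to do */
	}

	printf("write fault handled\n");
	unlock_table();
	return 1;		/* stands in for VM_FAULT_WRITE */
}

int main(void)
{
	printf("result: %d\n", handle_fault(0));
	return 0;
}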
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
---
mm/memory.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 7a04414..3afd9ce 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2066,7 +2066,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
{
struct page *old_page, *new_page = NULL;
pte_t entry;
- int ret = 0;
+ int page_copied = 0;
unsigned long mmun_start = 0; /* For mmu_notifiers */
unsigned long mmun_end = 0; /* For mmu_notifiers */
struct mem_cgroup *memcg;
@@ -2101,7 +2101,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
&ptl);
if (!pte_same(*page_table, orig_pte)) {
unlock_page(old_page);
- goto unlock;
+ pte_unmap_unlock(page_table, ptl);
+ page_cache_release(old_page);
+ return 0;
}
page_cache_release(old_page);
}
@@ -2148,7 +2150,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
&ptl);
if (!pte_same(*page_table, orig_pte)) {
unlock_page(old_page);
- goto unlock;
+ pte_unmap_unlock(page_table, ptl);
+ page_cache_release(old_page);
+ return 0;
}
page_mkwrite = 1;
}
@@ -2246,29 +2250,28 @@ gotten:
/* Free the old page.. */
new_page = old_page;
- ret |= VM_FAULT_WRITE;
+ page_copied = 1;
} else
mem_cgroup_cancel_charge(new_page, memcg);
if (new_page)
page_cache_release(new_page);
-unlock:
+
pte_unmap_unlock(page_table, ptl);
- if (mmun_end > mmun_start)
- mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+ mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
if (old_page) {
/*
* Don't let another task, with possibly unlocked vma,
* keep the mlocked page.
*/
- if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
+ if (page_copied && (vma->vm_flags & VM_LOCKED)) {
lock_page(old_page); /* LRU manipulation */
munlock_vma_page(old_page);
unlock_page(old_page);
}
page_cache_release(old_page);
}
- return ret;
+ return page_copied ? VM_FAULT_WRITE : 0;
oom_free_new:
page_cache_release(new_page);
oom:
--
1.7.11.2
* Re: [PATCH V5 2/4] mm: Refactor do_wp_page - rewrite the unlock flow
2015-02-22 13:42 ` [PATCH V5 2/4] mm: Refactor do_wp_page - rewrite the unlock flow Shachar Raindel
@ 2015-02-24 13:45 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-02-24 13:45 UTC (permalink / raw)
To: Shachar Raindel
Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, Dave Hansen
On Sun 22-02-15 15:42:16, Shachar Raindel wrote:
> When do_wp_page is ending, in several cases it needs to unlock the
> pages and ptls it was accessing.
>
> Currently, this logic is "called" via a goto jump. This makes the
> control flow of the function harder to follow. Readability is further
> hampered by the unlock case containing a large amount of logic needed
> in only one of the three cases.
>
> Using goto for cleanup is generally allowed. However, moving the
> trivial unlocking flows to the relevant call sites allows deeper
> refactoring in the next patch.
>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Haggai Eran <haggaie@mellanox.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Michel Lespinasse <walken@google.com>
Neat!
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memory.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7a04414..3afd9ce 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2066,7 +2066,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> {
> struct page *old_page, *new_page = NULL;
> pte_t entry;
> - int ret = 0;
> + int page_copied = 0;
> unsigned long mmun_start = 0; /* For mmu_notifiers */
> unsigned long mmun_end = 0; /* For mmu_notifiers */
> struct mem_cgroup *memcg;
> @@ -2101,7 +2101,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> &ptl);
> if (!pte_same(*page_table, orig_pte)) {
> unlock_page(old_page);
> - goto unlock;
> + pte_unmap_unlock(page_table, ptl);
> + page_cache_release(old_page);
> + return 0;
> }
> page_cache_release(old_page);
> }
> @@ -2148,7 +2150,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> &ptl);
> if (!pte_same(*page_table, orig_pte)) {
> unlock_page(old_page);
> - goto unlock;
> + pte_unmap_unlock(page_table, ptl);
> + page_cache_release(old_page);
> + return 0;
> }
> page_mkwrite = 1;
> }
> @@ -2246,29 +2250,28 @@ gotten:
>
> /* Free the old page.. */
> new_page = old_page;
> - ret |= VM_FAULT_WRITE;
> + page_copied = 1;
> } else
> mem_cgroup_cancel_charge(new_page, memcg);
>
> if (new_page)
> page_cache_release(new_page);
> -unlock:
> +
> pte_unmap_unlock(page_table, ptl);
> - if (mmun_end > mmun_start)
> - mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> + mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> if (old_page) {
> /*
> * Don't let another task, with possibly unlocked vma,
> * keep the mlocked page.
> */
> - if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
> + if (page_copied && (vma->vm_flags & VM_LOCKED)) {
> lock_page(old_page); /* LRU manipulation */
> munlock_vma_page(old_page);
> unlock_page(old_page);
> }
> page_cache_release(old_page);
> }
> - return ret;
> + return page_copied ? VM_FAULT_WRITE : 0;
> oom_free_new:
> page_cache_release(new_page);
> oom:
> --
> 1.7.11.2
>
--
Michal Hocko
SUSE Labs
* [PATCH V5 3/4] mm: refactor do_wp_page, extract the page copy flow
2015-02-22 13:42 [PATCH V5 0/4] Refactor do_wp_page, no functional change Shachar Raindel
2015-02-22 13:42 ` [PATCH V5 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
2015-02-22 13:42 ` [PATCH V5 2/4] mm: Refactor do_wp_page - rewrite the unlock flow Shachar Raindel
@ 2015-02-22 13:42 ` Shachar Raindel
2015-02-24 13:56 ` Michal Hocko
2015-02-22 13:42 ` [PATCH V5 4/4] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
2015-02-24 0:20 ` [PATCH V5 0/4] Refactor do_wp_page, no functional change Andrew Morton
4 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2015-02-22 13:42 UTC (permalink / raw)
To: linux-mm
Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, raindel, Dave Hansen
In some cases, do_wp_page needs to copy the page suffering a write fault
to a new location. When the function logic decides to do this, it is done
by jumping with a "goto" to the relevant code block. This makes the code
really hard to understand, and it goes against the kernel coding style
guidelines.
This patch extracts the page copy and page table update logic to a
separate function. It also cleans up the naming, from "gotten" to
"wp_page_copy", and adds a few comments.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
---
mm/memory.c | 265 +++++++++++++++++++++++++++++++++---------------------------
1 file changed, 147 insertions(+), 118 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 3afd9ce..a06e705 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2042,6 +2042,146 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
}
/*
+ * Handle the case of a page which we actually need to copy to a new page.
+ *
+ * Called with mmap_sem locked and the old page referenced, but
+ * without the ptl held.
+ *
+ * High level logic flow:
+ *
+ * - Allocate a page, copy the content of the old page to the new one.
+ * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
+ * - Take the PTL. If the pte changed, bail out and release the allocated page
+ * - If the pte is still the way we remember it, update the page table and all
+ * relevant references. This includes dropping the reference the page-table
+ * held to the old page, as well as updating the rmap.
+ * - In any case, unlock the PTL and drop the reference we took to the old page.
+ */
+static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *page_table, pmd_t *pmd,
+ pte_t orig_pte, struct page *old_page)
+{
+ struct page *new_page = NULL;
+ spinlock_t *ptl = NULL;
+ pte_t entry;
+ int page_copied = 0;
+ const unsigned long mmun_start = address & PAGE_MASK; /* For mmu_notifiers */
+ const unsigned long mmun_end = mmun_start + PAGE_SIZE; /* For mmu_notifiers */
+ struct mem_cgroup *memcg;
+
+ if (unlikely(anon_vma_prepare(vma)))
+ goto oom;
+
+ if (is_zero_pfn(pte_pfn(orig_pte))) {
+ new_page = alloc_zeroed_user_highpage_movable(vma, address);
+ if (!new_page)
+ goto oom;
+ } else {
+ new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+ if (!new_page)
+ goto oom;
+ cow_user_page(new_page, old_page, address, vma);
+ }
+ __SetPageUptodate(new_page);
+
+ if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
+ goto oom_free_new;
+
+ mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+
+ /*
+ * Re-check the pte - we dropped the lock
+ */
+ page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+ if (likely(pte_same(*page_table, orig_pte))) {
+ if (old_page) {
+ if (!PageAnon(old_page)) {
+ dec_mm_counter_fast(mm, MM_FILEPAGES);
+ inc_mm_counter_fast(mm, MM_ANONPAGES);
+ }
+ } else {
+ inc_mm_counter_fast(mm, MM_ANONPAGES);
+ }
+ flush_cache_page(vma, address, pte_pfn(orig_pte));
+ entry = mk_pte(new_page, vma->vm_page_prot);
+ entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ /*
+ * Clear the pte entry and flush it first, before updating the
+ * pte with the new entry. This will avoid a race condition
+ * seen in the presence of one thread doing SMC and another
+ * thread doing COW.
+ */
+ ptep_clear_flush_notify(vma, address, page_table);
+ page_add_new_anon_rmap(new_page, vma, address);
+ mem_cgroup_commit_charge(new_page, memcg, false);
+ lru_cache_add_active_or_unevictable(new_page, vma);
+ /*
+ * We call the notify macro here because, when using secondary
+ * mmu page tables (such as kvm shadow page tables), we want the
+ * new page to be mapped directly into the secondary page table.
+ */
+ set_pte_at_notify(mm, address, page_table, entry);
+ update_mmu_cache(vma, address, page_table);
+ if (old_page) {
+ /*
+ * Only after switching the pte to the new page may
+ * we remove the mapcount here. Otherwise another
+ * process may come and find the rmap count decremented
+ * before the pte is switched to the new page, and
+ * "reuse" the old page writing into it while our pte
+ * here still points into it and can be read by other
+ * threads.
+ *
+ * The critical issue is to order this
+ * page_remove_rmap with the ptp_clear_flush above.
+ * Those stores are ordered by (if nothing else,)
+ * the barrier present in the atomic_add_negative
+ * in page_remove_rmap.
+ *
+ * Then the TLB flush in ptep_clear_flush ensures that
+ * no process can access the old page before the
+ * decremented mapcount is visible. And the old page
+ * cannot be reused until after the decremented
+ * mapcount is visible. So transitively, TLBs to
+ * old page will be flushed before it can be reused.
+ */
+ page_remove_rmap(old_page);
+ }
+
+ /* Free the old page.. */
+ new_page = old_page;
+ page_copied = 1;
+ } else {
+ mem_cgroup_cancel_charge(new_page, memcg);
+ }
+
+ if (new_page)
+ page_cache_release(new_page);
+
+ pte_unmap_unlock(page_table, ptl);
+ mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+ if (old_page) {
+ /*
+ * Don't let another task, with possibly unlocked vma,
+ * keep the mlocked page.
+ */
+ if (page_copied && (vma->vm_flags & VM_LOCKED)) {
+ lock_page(old_page); /* LRU manipulation */
+ munlock_vma_page(old_page);
+ unlock_page(old_page);
+ }
+ page_cache_release(old_page);
+ }
+ return page_copied ? VM_FAULT_WRITE : 0;
+oom_free_new:
+ page_cache_release(new_page);
+oom:
+ if (old_page)
+ page_cache_release(old_page);
+ return VM_FAULT_OOM;
+}
+
+/*
* This routine handles present pages, when users try to write
* to a shared page. It is done by copying the page to a new address
* and decrementing the shared-page counter for the old page.
@@ -2064,12 +2204,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
spinlock_t *ptl, pte_t orig_pte)
__releases(ptl)
{
- struct page *old_page, *new_page = NULL;
- pte_t entry;
- int page_copied = 0;
- unsigned long mmun_start = 0; /* For mmu_notifiers */
- unsigned long mmun_end = 0; /* For mmu_notifiers */
- struct mem_cgroup *memcg;
+ struct page *old_page;
old_page = vm_normal_page(vma, address, orig_pte);
if (!old_page) {
@@ -2085,7 +2220,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
(VM_WRITE|VM_SHARED))
return wp_page_reuse(mm, vma, address, page_table, ptl,
orig_pte, old_page, 0, 0);
- goto gotten;
+
+ pte_unmap_unlock(page_table, ptl);
+ return wp_page_copy(mm, vma, address, page_table, pmd,
+ orig_pte, old_page);
}
/*
@@ -2165,119 +2303,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
* Ok, we need to copy. Oh, well..
*/
page_cache_get(old_page);
-gotten:
- pte_unmap_unlock(page_table, ptl);
-
- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
-
- if (is_zero_pfn(pte_pfn(orig_pte))) {
- new_page = alloc_zeroed_user_highpage_movable(vma, address);
- if (!new_page)
- goto oom;
- } else {
- new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
- if (!new_page)
- goto oom;
- cow_user_page(new_page, old_page, address, vma);
- }
- __SetPageUptodate(new_page);
-
- if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
- goto oom_free_new;
-
- mmun_start = address & PAGE_MASK;
- mmun_end = mmun_start + PAGE_SIZE;
- mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-
- /*
- * Re-check the pte - we dropped the lock
- */
- page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
- if (likely(pte_same(*page_table, orig_pte))) {
- if (old_page) {
- if (!PageAnon(old_page)) {
- dec_mm_counter_fast(mm, MM_FILEPAGES);
- inc_mm_counter_fast(mm, MM_ANONPAGES);
- }
- } else
- inc_mm_counter_fast(mm, MM_ANONPAGES);
- flush_cache_page(vma, address, pte_pfn(orig_pte));
- entry = mk_pte(new_page, vma->vm_page_prot);
- entry = maybe_mkwrite(pte_mkdirty(entry), vma);
- /*
- * Clear the pte entry and flush it first, before updating the
- * pte with the new entry. This will avoid a race condition
- * seen in the presence of one thread doing SMC and another
- * thread doing COW.
- */
- ptep_clear_flush_notify(vma, address, page_table);
- page_add_new_anon_rmap(new_page, vma, address);
- mem_cgroup_commit_charge(new_page, memcg, false);
- lru_cache_add_active_or_unevictable(new_page, vma);
- /*
- * We call the notify macro here because, when using secondary
- * mmu page tables (such as kvm shadow page tables), we want the
- * new page to be mapped directly into the secondary page table.
- */
- set_pte_at_notify(mm, address, page_table, entry);
- update_mmu_cache(vma, address, page_table);
- if (old_page) {
- /*
- * Only after switching the pte to the new page may
- * we remove the mapcount here. Otherwise another
- * process may come and find the rmap count decremented
- * before the pte is switched to the new page, and
- * "reuse" the old page writing into it while our pte
- * here still points into it and can be read by other
- * threads.
- *
- * The critical issue is to order this
- * page_remove_rmap with the ptp_clear_flush above.
- * Those stores are ordered by (if nothing else,)
- * the barrier present in the atomic_add_negative
- * in page_remove_rmap.
- *
- * Then the TLB flush in ptep_clear_flush ensures that
- * no process can access the old page before the
- * decremented mapcount is visible. And the old page
- * cannot be reused until after the decremented
- * mapcount is visible. So transitively, TLBs to
- * old page will be flushed before it can be reused.
- */
- page_remove_rmap(old_page);
- }
-
- /* Free the old page.. */
- new_page = old_page;
- page_copied = 1;
- } else
- mem_cgroup_cancel_charge(new_page, memcg);
-
- if (new_page)
- page_cache_release(new_page);
pte_unmap_unlock(page_table, ptl);
- mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
- if (old_page) {
- /*
- * Don't let another task, with possibly unlocked vma,
- * keep the mlocked page.
- */
- if (page_copied && (vma->vm_flags & VM_LOCKED)) {
- lock_page(old_page); /* LRU manipulation */
- munlock_vma_page(old_page);
- unlock_page(old_page);
- }
- page_cache_release(old_page);
- }
- return page_copied ? VM_FAULT_WRITE : 0;
-oom_free_new:
- page_cache_release(new_page);
-oom:
- if (old_page)
- page_cache_release(old_page);
- return VM_FAULT_OOM;
+ return wp_page_copy(mm, vma, address, page_table, pmd,
+ orig_pte, old_page);
}
static void unmap_mapping_range_vma(struct vm_area_struct *vma,
--
1.7.11.2
* Re: [PATCH V5 3/4] mm: refactor do_wp_page, extract the page copy flow
2015-02-22 13:42 ` [PATCH V5 3/4] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
@ 2015-02-24 13:56 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-02-24 13:56 UTC (permalink / raw)
To: Shachar Raindel
Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, Dave Hansen
On Sun 22-02-15 15:42:17, Shachar Raindel wrote:
> In some cases, do_wp_page needs to copy the page suffering a write
> fault to a new location. When the function logic decides to do this, it
> is done by jumping with a "goto" to the relevant code block. This makes
> the code really hard to understand, and it goes against the kernel
> coding style guidelines.
>
> This patch extracts the page copy and page table update logic to a
> separate function. It also cleans up the naming, from "gotten" to
> "wp_page_copy", and adds a few comments.
>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Haggai Eran <haggaie@mellanox.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Michel Lespinasse <walken@google.com>
Nice cleanup as well!
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memory.c | 265 +++++++++++++++++++++++++++++++++---------------------------
> 1 file changed, 147 insertions(+), 118 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3afd9ce..a06e705 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2042,6 +2042,146 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
> }
>
> /*
> + * Handle the case of a page which we actually need to copy to a new page.
> + *
> + * Called with mmap_sem locked and the old page referenced, but
> + * without the ptl held.
> + *
> + * High level logic flow:
> + *
> + * - Allocate a page, copy the content of the old page to the new one.
> + * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
> + * - Take the PTL. If the pte changed, bail out and release the allocated page
> + * - If the pte is still the way we remember it, update the page table and all
> + * relevant references. This includes dropping the reference the page-table
> + * held to the old page, as well as updating the rmap.
> + * - In any case, unlock the PTL and drop the reference we took to the old page.
> + */
> +static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, pte_t *page_table, pmd_t *pmd,
> + pte_t orig_pte, struct page *old_page)
> +{
> + struct page *new_page = NULL;
> + spinlock_t *ptl = NULL;
> + pte_t entry;
> + int page_copied = 0;
> + const unsigned long mmun_start = address & PAGE_MASK; /* For mmu_notifiers */
> + const unsigned long mmun_end = mmun_start + PAGE_SIZE; /* For mmu_notifiers */
> + struct mem_cgroup *memcg;
> +
> + if (unlikely(anon_vma_prepare(vma)))
> + goto oom;
> +
> + if (is_zero_pfn(pte_pfn(orig_pte))) {
> + new_page = alloc_zeroed_user_highpage_movable(vma, address);
> + if (!new_page)
> + goto oom;
> + } else {
> + new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> + if (!new_page)
> + goto oom;
> + cow_user_page(new_page, old_page, address, vma);
> + }
> + __SetPageUptodate(new_page);
> +
> + if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
> + goto oom_free_new;
> +
> + mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
> +
> + /*
> + * Re-check the pte - we dropped the lock
> + */
> + page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> + if (likely(pte_same(*page_table, orig_pte))) {
> + if (old_page) {
> + if (!PageAnon(old_page)) {
> + dec_mm_counter_fast(mm, MM_FILEPAGES);
> + inc_mm_counter_fast(mm, MM_ANONPAGES);
> + }
> + } else {
> + inc_mm_counter_fast(mm, MM_ANONPAGES);
> + }
> + flush_cache_page(vma, address, pte_pfn(orig_pte));
> + entry = mk_pte(new_page, vma->vm_page_prot);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + /*
> + * Clear the pte entry and flush it first, before updating the
> + * pte with the new entry. This will avoid a race condition
> + * seen in the presence of one thread doing SMC and another
> + * thread doing COW.
> + */
> + ptep_clear_flush_notify(vma, address, page_table);
> + page_add_new_anon_rmap(new_page, vma, address);
> + mem_cgroup_commit_charge(new_page, memcg, false);
> + lru_cache_add_active_or_unevictable(new_page, vma);
> + /*
> + * We call the notify macro here because, when using secondary
> + * mmu page tables (such as kvm shadow page tables), we want the
> + * new page to be mapped directly into the secondary page table.
> + */
> + set_pte_at_notify(mm, address, page_table, entry);
> + update_mmu_cache(vma, address, page_table);
> + if (old_page) {
> + /*
> + * Only after switching the pte to the new page may
> + * we remove the mapcount here. Otherwise another
> + * process may come and find the rmap count decremented
> + * before the pte is switched to the new page, and
> + * "reuse" the old page writing into it while our pte
> + * here still points into it and can be read by other
> + * threads.
> + *
> + * The critical issue is to order this
> + * page_remove_rmap with the ptp_clear_flush above.
> + * Those stores are ordered by (if nothing else,)
> + * the barrier present in the atomic_add_negative
> + * in page_remove_rmap.
> + *
> + * Then the TLB flush in ptep_clear_flush ensures that
> + * no process can access the old page before the
> + * decremented mapcount is visible. And the old page
> + * cannot be reused until after the decremented
> + * mapcount is visible. So transitively, TLBs to
> + * old page will be flushed before it can be reused.
> + */
> + page_remove_rmap(old_page);
> + }
> +
> + /* Free the old page.. */
> + new_page = old_page;
> + page_copied = 1;
> + } else {
> + mem_cgroup_cancel_charge(new_page, memcg);
> + }
> +
> + if (new_page)
> + page_cache_release(new_page);
> +
> + pte_unmap_unlock(page_table, ptl);
> + mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> + if (old_page) {
> + /*
> + * Don't let another task, with possibly unlocked vma,
> + * keep the mlocked page.
> + */
> + if (page_copied && (vma->vm_flags & VM_LOCKED)) {
> + lock_page(old_page); /* LRU manipulation */
> + munlock_vma_page(old_page);
> + unlock_page(old_page);
> + }
> + page_cache_release(old_page);
> + }
> + return page_copied ? VM_FAULT_WRITE : 0;
> +oom_free_new:
> + page_cache_release(new_page);
> +oom:
> + if (old_page)
> + page_cache_release(old_page);
> + return VM_FAULT_OOM;
> +}
> +
> +/*
> * This routine handles present pages, when users try to write
> * to a shared page. It is done by copying the page to a new address
> * and decrementing the shared-page counter for the old page.
> @@ -2064,12 +2204,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> spinlock_t *ptl, pte_t orig_pte)
> __releases(ptl)
> {
> - struct page *old_page, *new_page = NULL;
> - pte_t entry;
> - int page_copied = 0;
> - unsigned long mmun_start = 0; /* For mmu_notifiers */
> - unsigned long mmun_end = 0; /* For mmu_notifiers */
> - struct mem_cgroup *memcg;
> + struct page *old_page;
>
> old_page = vm_normal_page(vma, address, orig_pte);
> if (!old_page) {
> @@ -2085,7 +2220,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> (VM_WRITE|VM_SHARED))
> return wp_page_reuse(mm, vma, address, page_table, ptl,
> orig_pte, old_page, 0, 0);
> - goto gotten;
> +
> + pte_unmap_unlock(page_table, ptl);
> + return wp_page_copy(mm, vma, address, page_table, pmd,
> + orig_pte, old_page);
> }
>
> /*
> @@ -2165,119 +2303,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> * Ok, we need to copy. Oh, well..
> */
> page_cache_get(old_page);
> -gotten:
> - pte_unmap_unlock(page_table, ptl);
> -
> - if (unlikely(anon_vma_prepare(vma)))
> - goto oom;
> -
> - if (is_zero_pfn(pte_pfn(orig_pte))) {
> - new_page = alloc_zeroed_user_highpage_movable(vma, address);
> - if (!new_page)
> - goto oom;
> - } else {
> - new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> - if (!new_page)
> - goto oom;
> - cow_user_page(new_page, old_page, address, vma);
> - }
> - __SetPageUptodate(new_page);
> -
> - if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
> - goto oom_free_new;
> -
> - mmun_start = address & PAGE_MASK;
> - mmun_end = mmun_start + PAGE_SIZE;
> - mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
> -
> - /*
> - * Re-check the pte - we dropped the lock
> - */
> - page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> - if (likely(pte_same(*page_table, orig_pte))) {
> - if (old_page) {
> - if (!PageAnon(old_page)) {
> - dec_mm_counter_fast(mm, MM_FILEPAGES);
> - inc_mm_counter_fast(mm, MM_ANONPAGES);
> - }
> - } else
> - inc_mm_counter_fast(mm, MM_ANONPAGES);
> - flush_cache_page(vma, address, pte_pfn(orig_pte));
> - entry = mk_pte(new_page, vma->vm_page_prot);
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> - /*
> - * Clear the pte entry and flush it first, before updating the
> - * pte with the new entry. This will avoid a race condition
> - * seen in the presence of one thread doing SMC and another
> - * thread doing COW.
> - */
> - ptep_clear_flush_notify(vma, address, page_table);
> - page_add_new_anon_rmap(new_page, vma, address);
> - mem_cgroup_commit_charge(new_page, memcg, false);
> - lru_cache_add_active_or_unevictable(new_page, vma);
> - /*
> - * We call the notify macro here because, when using secondary
> - * mmu page tables (such as kvm shadow page tables), we want the
> - * new page to be mapped directly into the secondary page table.
> - */
> - set_pte_at_notify(mm, address, page_table, entry);
> - update_mmu_cache(vma, address, page_table);
> - if (old_page) {
> - /*
> - * Only after switching the pte to the new page may
> - * we remove the mapcount here. Otherwise another
> - * process may come and find the rmap count decremented
> - * before the pte is switched to the new page, and
> - * "reuse" the old page writing into it while our pte
> - * here still points into it and can be read by other
> - * threads.
> - *
> - * The critical issue is to order this
> - * page_remove_rmap with the ptp_clear_flush above.
> - * Those stores are ordered by (if nothing else,)
> - * the barrier present in the atomic_add_negative
> - * in page_remove_rmap.
> - *
> - * Then the TLB flush in ptep_clear_flush ensures that
> - * no process can access the old page before the
> - * decremented mapcount is visible. And the old page
> - * cannot be reused until after the decremented
> - * mapcount is visible. So transitively, TLBs to
> - * old page will be flushed before it can be reused.
> - */
> - page_remove_rmap(old_page);
> - }
> -
> - /* Free the old page.. */
> - new_page = old_page;
> - page_copied = 1;
> - } else
> - mem_cgroup_cancel_charge(new_page, memcg);
> -
> - if (new_page)
> - page_cache_release(new_page);
>
> pte_unmap_unlock(page_table, ptl);
> - mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> - if (old_page) {
> - /*
> - * Don't let another task, with possibly unlocked vma,
> - * keep the mlocked page.
> - */
> - if (page_copied && (vma->vm_flags & VM_LOCKED)) {
> - lock_page(old_page); /* LRU manipulation */
> - munlock_vma_page(old_page);
> - unlock_page(old_page);
> - }
> - page_cache_release(old_page);
> - }
> - return page_copied ? VM_FAULT_WRITE : 0;
> -oom_free_new:
> - page_cache_release(new_page);
> -oom:
> - if (old_page)
> - page_cache_release(old_page);
> - return VM_FAULT_OOM;
> + return wp_page_copy(mm, vma, address, page_table, pmd,
> + orig_pte, old_page);
> }
>
> static void unmap_mapping_range_vma(struct vm_area_struct *vma,
> --
> 1.7.11.2
>
--
Michal Hocko
SUSE Labs
* [PATCH V5 4/4] mm: Refactor do_wp_page handling of shared vma into a function
2015-02-22 13:42 [PATCH V5 0/4] Refactor do_wp_page, no functional change Shachar Raindel
` (2 preceding siblings ...)
2015-02-22 13:42 ` [PATCH V5 3/4] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
@ 2015-02-22 13:42 ` Shachar Raindel
2015-02-24 13:59 ` Michal Hocko
2015-02-24 0:20 ` [PATCH V5 0/4] Refactor do_wp_page, no functional change Andrew Morton
4 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2015-02-22 13:42 UTC (permalink / raw)
To: linux-mm
Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, raindel, Dave Hansen
The do_wp_page function is extremely long. Extract the logic for
handling a page belonging to a shared vma into a function of its own.
This helps the readability of the code, without making any functional
change.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
---
mm/memory.c | 86 ++++++++++++++++++++++++++++++++++---------------------------
1 file changed, 48 insertions(+), 38 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index a06e705..b246d22 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2181,6 +2181,52 @@ oom:
return VM_FAULT_OOM;
}
+static int wp_page_shared(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *page_table,
+ pmd_t *pmd, spinlock_t *ptl, pte_t orig_pte,
+ struct page *old_page)
+ __releases(ptl)
+{
+ int page_mkwrite = 0;
+
+ page_cache_get(old_page);
+
+ /*
+ * Only catch write-faults on shared writable pages,
+ * read-only shared pages can get COWed by
+ * get_user_pages(.write=1, .force=1).
+ */
+ if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
+ int tmp;
+
+ pte_unmap_unlock(page_table, ptl);
+ tmp = do_page_mkwrite(vma, old_page, address);
+ if (unlikely(!tmp || (tmp &
+ (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+ page_cache_release(old_page);
+ return tmp;
+ }
+ /*
+ * Since we dropped the lock we need to revalidate
+ * the PTE as someone else may have changed it. If
+ * they did, we just return, as we can count on the
+ * MMU to tell us if they didn't also make it writable.
+ */
+ page_table = pte_offset_map_lock(mm, pmd, address,
+ &ptl);
+ if (!pte_same(*page_table, orig_pte)) {
+ unlock_page(old_page);
+ pte_unmap_unlock(page_table, ptl);
+ page_cache_release(old_page);
+ return 0;
+ }
+ page_mkwrite = 1;
+ }
+
+ return wp_page_reuse(mm, vma, address, page_table, ptl,
+ orig_pte, old_page, page_mkwrite, 1);
+}
+
/*
* This routine handles present pages, when users try to write
* to a shared page. It is done by copying the page to a new address
@@ -2259,44 +2305,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unlock_page(old_page);
} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
(VM_WRITE|VM_SHARED))) {
- int page_mkwrite = 0;
-
- page_cache_get(old_page);
-
- /*
- * Only catch write-faults on shared writable pages,
- * read-only shared pages can get COWed by
- * get_user_pages(.write=1, .force=1).
- */
- if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
- int tmp;
-
- pte_unmap_unlock(page_table, ptl);
- tmp = do_page_mkwrite(vma, old_page, address);
- if (unlikely(!tmp || (tmp &
- (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
- page_cache_release(old_page);
- return tmp;
- }
- /*
- * Since we dropped the lock we need to revalidate
- * the PTE as someone else may have changed it. If
- * they did, we just return, as we can count on the
- * MMU to tell us if they didn't also make it writable.
- */
- page_table = pte_offset_map_lock(mm, pmd, address,
- &ptl);
- if (!pte_same(*page_table, orig_pte)) {
- unlock_page(old_page);
- pte_unmap_unlock(page_table, ptl);
- page_cache_release(old_page);
- return 0;
- }
- page_mkwrite = 1;
- }
-
- return wp_page_reuse(mm, vma, address, page_table, ptl,
- orig_pte, old_page, page_mkwrite, 1);
+ return wp_page_shared(mm, vma, address, page_table, pmd,
+ ptl, orig_pte, old_page);
}
/*
--
1.7.11.2
* Re: [PATCH V5 4/4] mm: Refactor do_wp_page handling of shared vma into a function
2015-02-22 13:42 ` [PATCH V5 4/4] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
@ 2015-02-24 13:59 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-02-24 13:59 UTC (permalink / raw)
To: Shachar Raindel
Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
pfeiner, hannes, sagig, walken, Dave Hansen
On Sun 22-02-15 15:42:18, Shachar Raindel wrote:
> The do_wp_page function is extremely long. Extract the logic for
> handling a page belonging to a shared vma into a function of its own.
>
> This helps the readability of the code, without making any functional
> change.
>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Haggai Eran <haggaie@mellanox.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Michel Lespinasse <walken@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memory.c | 86 ++++++++++++++++++++++++++++++++++---------------------------
> 1 file changed, 48 insertions(+), 38 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index a06e705..b246d22 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2181,6 +2181,52 @@ oom:
> return VM_FAULT_OOM;
> }
>
> +static int wp_page_shared(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, pte_t *page_table,
> + pmd_t *pmd, spinlock_t *ptl, pte_t orig_pte,
> + struct page *old_page)
> + __releases(ptl)
> +{
> + int page_mkwrite = 0;
> +
> + page_cache_get(old_page);
> +
> + /*
> + * Only catch write-faults on shared writable pages,
> + * read-only shared pages can get COWed by
> + * get_user_pages(.write=1, .force=1).
> + */
> + if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
> + int tmp;
> +
> + pte_unmap_unlock(page_table, ptl);
> + tmp = do_page_mkwrite(vma, old_page, address);
> + if (unlikely(!tmp || (tmp &
> + (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> + page_cache_release(old_page);
> + return tmp;
> + }
> + /*
> + * Since we dropped the lock we need to revalidate
> + * the PTE as someone else may have changed it. If
> + * they did, we just return, as we can count on the
> + * MMU to tell us if they didn't also make it writable.
> + */
> + page_table = pte_offset_map_lock(mm, pmd, address,
> + &ptl);
> + if (!pte_same(*page_table, orig_pte)) {
> + unlock_page(old_page);
> + pte_unmap_unlock(page_table, ptl);
> + page_cache_release(old_page);
> + return 0;
> + }
> + page_mkwrite = 1;
> + }
> +
> + return wp_page_reuse(mm, vma, address, page_table, ptl,
> + orig_pte, old_page, page_mkwrite, 1);
> +}
> +
> /*
> * This routine handles present pages, when users try to write
> * to a shared page. It is done by copying the page to a new address
> @@ -2259,44 +2305,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> unlock_page(old_page);
> } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> (VM_WRITE|VM_SHARED))) {
> - int page_mkwrite = 0;
> -
> - page_cache_get(old_page);
> -
> - /*
> - * Only catch write-faults on shared writable pages,
> - * read-only shared pages can get COWed by
> - * get_user_pages(.write=1, .force=1).
> - */
> - if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
> - int tmp;
> -
> - pte_unmap_unlock(page_table, ptl);
> - tmp = do_page_mkwrite(vma, old_page, address);
> - if (unlikely(!tmp || (tmp &
> - (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> - page_cache_release(old_page);
> - return tmp;
> - }
> - /*
> - * Since we dropped the lock we need to revalidate
> - * the PTE as someone else may have changed it. If
> - * they did, we just return, as we can count on the
> - * MMU to tell us if they didn't also make it writable.
> - */
> - page_table = pte_offset_map_lock(mm, pmd, address,
> - &ptl);
> - if (!pte_same(*page_table, orig_pte)) {
> - unlock_page(old_page);
> - pte_unmap_unlock(page_table, ptl);
> - page_cache_release(old_page);
> - return 0;
> - }
> - page_mkwrite = 1;
> - }
> -
> - return wp_page_reuse(mm, vma, address, page_table, ptl,
> - orig_pte, old_page, page_mkwrite, 1);
> + return wp_page_shared(mm, vma, address, page_table, pmd,
> + ptl, orig_pte, old_page);
> }
>
> /*
> --
> 1.7.11.2
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH V5 0/4] Refactor do_wp_page, no functional change
2015-02-22 13:42 [PATCH V5 0/4] Refactor do_wp_page, no functional change Shachar Raindel
` (3 preceding siblings ...)
2015-02-22 13:42 ` [PATCH V5 4/4] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
@ 2015-02-24 0:20 ` Andrew Morton
2015-02-24 8:36 ` Shachar Raindel
4 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2015-02-24 0:20 UTC (permalink / raw)
To: Shachar Raindel
Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, torvalds, haggaie, aarcange, pfeiner,
hannes, sagig, walken
On Sun, 22 Feb 2015 15:42:14 +0200 Shachar Raindel <raindel@mellanox.com> wrote:
> Currently do_wp_page contains 265 code lines. It also contains 9 goto
> statements, of which 5 target labels that are not cleanup
> related. This makes the function extremely difficult to
> understand. The following patches are an attempt at breaking the
> function into its basic components and making it easier to understand.
>
> The patches are straightforward function extractions from
> do_wp_page. As we extract functions, we remove unneeded parameters and
> simplify the code as much as possible. However, the functionality is
> supposed to remain completely unchanged. The patches also attempt to
> document the functionality of each extracted function. In patch 2, we
> split the unlock logic so that each use case contains only the logic
> relevant to its specific needs, instead of having a huge number of
> conditional decisions in a single unlock flow.
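To make the described pattern concrete, here is a minimal, purely
illustrative sketch (placeholder names, not the actual do_wp_page code):
a block that several paths reach via goto becomes a static helper whose
return value the callers simply pass back.

/* Before: write-fault paths converge on a shared label. */
static int handle_write_fault_old(int shared, int last_reference)
{
        if (shared)
                goto reuse;
        if (last_reference)
                goto reuse;
        return 0;                       /* fall through to the copy path */
reuse:
        /* mark the PTE writable, update any book-keeping */
        return 1;
}

/* After: the label body is a function of its own. */
static int page_reuse(void)
{
        /* mark the PTE writable, update any book-keeping */
        return 1;
}

static int handle_write_fault_new(int shared, int last_reference)
{
        if (shared)
                return page_reuse();
        if (last_reference)
                return page_reuse();
        return 0;                       /* copy path */
}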
gcc-4.4.4:
text data bss dec hex filename
40898 186 13344 54428 d49c mm/memory.o-before
41422 186 13456 55064 d718 mm/memory.o-after
gcc-4.8.2:
text data bss dec hex filename
35261 12118 13904 61283 ef63 mm/memory.o
35646 12278 14032 61956 f204 mm/memory.o
The more recent compiler is more interesting but either way, that's a
somewhat disappointing increase in code size for refactoring of a
single function.
I had a brief poke around and couldn't find any obvious improvements
to make.
* RE: [PATCH V5 0/4] Refactor do_wp_page, no functional change
2015-02-24 0:20 ` [PATCH V5 0/4] Refactor do_wp_page, no functional change Andrew Morton
@ 2015-02-24 8:36 ` Shachar Raindel
0 siblings, 0 replies; 11+ messages in thread
From: Shachar Raindel @ 2015-02-24 8:36 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
dave.hansen, n-horiguchi, torvalds, Haggai Eran, aarcange,
pfeiner, hannes, Sagi Grimberg, walken
> -----Original Message-----
> From: Andrew Morton [mailto:akpm@linux-foundation.org]
> Sent: Tuesday, February 24, 2015 2:20 AM
> To: Shachar Raindel
> Cc: linux-mm@kvack.org; kirill.shutemov@linux.intel.com;
> mgorman@suse.de; riel@redhat.com; ak@linux.intel.com;
> matthew.r.wilcox@intel.com; dave.hansen@linux.intel.com; n-
> horiguchi@ah.jp.nec.com; torvalds@linux-foundation.org; Haggai Eran;
> aarcange@redhat.com; pfeiner@google.com; hannes@cmpxchg.org; Sagi
> Grimberg; walken@google.com
> Subject: Re: [PATCH V5 0/4] Refactor do_wp_page, no functional change
>
> On Sun, 22 Feb 2015 15:42:14 +0200 Shachar Raindel
> <raindel@mellanox.com> wrote:
>
> > Currently do_wp_page contains 265 code lines. It also contains 9 goto
> > statements, of which 5 target labels that are not cleanup
> > related. This makes the function extremely difficult to
> > understand. The following patches are an attempt at breaking the
> > function into its basic components and making it easier to understand.
> >
> > The patches are straightforward function extractions from
> > do_wp_page. As we extract functions, we remove unneeded parameters and
> > simplify the code as much as possible. However, the functionality is
> > supposed to remain completely unchanged. The patches also attempt to
> > document the functionality of each extracted function. In patch 2, we
> > split the unlock logic so that each use case contains only the logic
> > relevant to its specific needs, instead of having a huge number of
> > conditional decisions in a single unlock flow.
>
> gcc-4.4.4:
>
> text data bss dec hex filename
> 40898 186 13344 54428 d49c mm/memory.o-before
> 41422 186 13456 55064 d718 mm/memory.o-after
>
> gcc-4.8.2:
>
> text data bss dec hex filename
> 35261 12118 13904 61283 ef63 mm/memory.o
> 35646 12278 14032 61956 f204 mm/memory.o
>
My results (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16)) differ:
text data bss dec hex filename
29957 70 32 30059 756b memory.o.next-20150219
30061 70 32 30163 75d3 memory.o.next-20150219+1
30093 70 32 30195 75f3 memory.o.next-20150219+2
30165 70 32 30267 763b memory.o.next-20150219+3
30165 70 32 30267 763b memory.o.next-20150219+4
> The more recent compiler is more interesting but either way, that's a
> somewhat disappointing increase in code size for refactoring of a
> single function.
>
It seems that the majority of the size impact (104 bytes out of 208) originates
from the first patch - "mm: Refactor do_wp_page, extract the reuse case".
This is probably due to changing 3 gotos into returns of a function call.
As gcc cannot do tail-call optimization there, it is forced to emit 3 new
call sites. Adding an inline to wp_page_reuse reduced the total size to:
text data bss dec hex filename
30109 70 32 30211 7603 memory.o
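To illustrate the call-site point with a hypothetical sketch (the names
below are placeholders, not the real wp_page_reuse signature): every
return of a helper call that replaces a goto is an actual call
instruction unless the compiler is allowed to inline the helper, so
marking the extracted function static inline lets gcc fold its body back
into the callers and recover most of the lost text size.

/*
 * Hypothetical stand-in for wp_page_reuse; "inline" allows gcc to fold
 * the body into each caller instead of emitting real calls.
 */
static inline int reuse_entry(int page_mkwrite, int dirty_shared)
{
        /* make the PTE writable, balance dirty accounting, ... */
        return 0;
}

static int write_fault(int shared, int last_reference)
{
        if (shared)
                return reuse_entry(1, 1);       /* call site 1 */
        if (last_reference)
                return reuse_entry(0, 0);       /* call site 2 */
        return -1;                              /* copy-on-write path */
}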
> I had a brief poke around and couldn't find any obvious improvements
> to make.
IMHO, 152 bytes in code size is a small price to pay for code that is
not spaghetti.
The patch adding the strategic inline, which saves 56 bytes with my GCC:
diff --git a/mm/memory.c b/mm/memory.c
index b246d22..9025285 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1990,7 +1990,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
* case, all we need to do here is to mark the page as writable and update
* any related book-keeping.
*/
-static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
+static inline int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *page_table,
spinlock_t *ptl, pte_t orig_pte,
struct page *page, int page_mkwrite,
Thanks,
--Shachar