linux-mm.kvack.org archive mirror
* [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
@ 2010-03-19  6:26 Naoya Horiguchi
  2010-03-19  7:10 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Naoya Horiguchi @ 2010-03-19  6:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, andi.kleen, fengguang.wu

When we look into pagemap using page-types with the -p option, the
pfn values for hugepages look wrong (see below).
This is because the pte was evaluated only once per vma
although it should be updated for each hugepage. This patch fixes it.

$ page-types -p 3277 -Nl -b huge
voffset   offset  len     flags
7f21e8a00 11e400  1       ___U___________H_G________________
7f21e8a01 11e401  1ff     ________________TG________________
7f21e8c00 11e400  1       ___U___________H_G________________
7f21e8c01 11e401  1ff     ________________TG________________
             ^^^
             should not be the same

With this patch applied:

$ page-types -p 3386 -Nl -b huge
voffset   offset   len    flags
7fec7a600 112c00   1      ___UD__________H_G________________
7fec7a601 112c01   1ff    ________________TG________________
7fec7a800 113200   1      ___UD__________H_G________________
7fec7a801 113201   1ff    ________________TG________________
             ^^^
             OK

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/task_mmu.c |   37 +++++++++++++++++++------------------
 include/linux/mm.h |    4 ++--
 mm/pagewalk.c      |   14 ++++----------
 3 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2a3ef17..cc14479 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -662,31 +662,32 @@ static u64 huge_pte_to_pagemap_entry(pte_t pte, int offset)
 	return pme;
 }
 
-static int pagemap_hugetlb_range(pte_t *pte, unsigned long addr,
+/* This function walks only within @vma */
+static int pagemap_hugetlb_range(struct vm_area_struct *vma, unsigned long addr,
 				 unsigned long end, struct mm_walk *walk)
 {
-	struct vm_area_struct *vma;
+	struct mm_struct *mm = walk->mm;
 	struct pagemapread *pm = walk->private;
 	struct hstate *hs = NULL;
 	int err = 0;
-
-	vma = find_vma(walk->mm, addr);
-	if (vma)
-		hs = hstate_vma(vma);
+	pte_t *pte = NULL;
+
+	BUG_ON(!mm);
+	BUG_ON(!vma || !is_vm_hugetlb_page(vma));
+	BUG_ON(addr < vma->vm_start || addr >= vma->vm_end);
+	hs = hstate_vma(vma);
+	BUG_ON(!hs);
+	pte = huge_pte_offset(mm, addr);
+	if (!pte)
+		return err;
 	for (; addr != end; addr += PAGE_SIZE) {
 		u64 pfn = PM_NOT_PRESENT;
-
-		if (vma && (addr >= vma->vm_end)) {
-			vma = find_vma(walk->mm, addr);
-			if (vma)
-				hs = hstate_vma(vma);
-		}
-
-		if (vma && (vma->vm_start <= addr) && is_vm_hugetlb_page(vma)) {
-			/* calculate pfn of the "raw" page in the hugepage. */
-			int offset = (addr & ~huge_page_mask(hs)) >> PAGE_SHIFT;
-			pfn = huge_pte_to_pagemap_entry(*pte, offset);
-		}
+		/* calculate pfn of the "raw" page in the hugepage. */
+		int offset = (addr & ~huge_page_mask(hs)) >> PAGE_SHIFT;
+		/* next hugepage */
+		if (!offset)
+			pte = huge_pte_offset(mm, addr);
+		pfn = huge_pte_to_pagemap_entry(*pte, offset);
 		err = add_to_pagemap(addr, pfn, pm);
 		if (err)
 			return err;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3899395..5faafc2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -783,8 +783,8 @@ struct mm_walk {
 	int (*pmd_entry)(pmd_t *, unsigned long, unsigned long, struct mm_walk *);
 	int (*pte_entry)(pte_t *, unsigned long, unsigned long, struct mm_walk *);
 	int (*pte_hole)(unsigned long, unsigned long, struct mm_walk *);
-	int (*hugetlb_entry)(pte_t *, unsigned long, unsigned long,
-			     struct mm_walk *);
+	int (*hugetlb_entry)(struct vm_area_struct *,
+			     unsigned long, unsigned long, struct mm_walk *);
 	struct mm_struct *mm;
 	void *private;
 };
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 7b47a57..3148dc5 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -128,20 +128,14 @@ int walk_page_range(unsigned long addr, unsigned long end,
 		vma = find_vma(walk->mm, addr);
 #ifdef CONFIG_HUGETLB_PAGE
 		if (vma && is_vm_hugetlb_page(vma)) {
-			pte_t *pte;
-			struct hstate *hs;
-
 			if (vma->vm_end < next)
 				next = vma->vm_end;
-			hs = hstate_vma(vma);
-			pte = huge_pte_offset(walk->mm,
-					      addr & huge_page_mask(hs));
-			if (pte && !huge_pte_none(huge_ptep_get(pte))
-			    && walk->hugetlb_entry)
-				err = walk->hugetlb_entry(pte, addr,
-							  next, walk);
+			if (walk->hugetlb_entry)
+				err = walk->hugetlb_entry(vma, addr, next,
+							  walk);
 			if (err)
 				break;
+			pgd = pgd_offset(walk->mm, next);
 			continue;
 		}
 #endif
-- 
1.7.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: &lt;dont@kvack.org&gt;


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  6:26 [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage Naoya Horiguchi
@ 2010-03-19  7:10 ` KAMEZAWA Hiroyuki
  2010-03-19  7:27   ` KAMEZAWA Hiroyuki
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-19  7:10 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Fri, 19 Mar 2010 15:26:36 +0900
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> When we look into pagemap using page-types with option -p, the value
> of pfn for hugepages looks wrong (see below.)
> This is because pte was evaluated only once for one vma
> although it should be updated for each hugepage. This patch fixes it.
> 
> $ page-types -p 3277 -Nl -b huge
> voffset   offset  len     flags
> 7f21e8a00 11e400  1       ___U___________H_G________________
> 7f21e8a01 11e401  1ff     ________________TG________________
> 7f21e8c00 11e400  1       ___U___________H_G________________
> 7f21e8c01 11e401  1ff     ________________TG________________
>              ^^^
>              should not be the same
> 
> With this patch applied:
> 
> $ page-types -p 3386 -Nl -b huge
> voffset   offset   len    flags
> 7fec7a600 112c00   1      ___UD__________H_G________________
> 7fec7a601 112c01   1ff    ________________TG________________
> 7fec7a800 113200   1      ___UD__________H_G________________
> 7fec7a801 113201   1ff    ________________TG________________
>              ^^^
>              OK
> 
Hmm. Is this a bug? To me, it's just shown at the hugepage's page size, by design.

_And_, doesn't this patch implicitly change the behavior of walk_page_range()?
No influence on other users? (such as memcontrol.c in mmotm. Ask Nishimura-san ;)


some nitpicks.


> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  fs/proc/task_mmu.c |   37 +++++++++++++++++++------------------
>  include/linux/mm.h |    4 ++--
>  mm/pagewalk.c      |   14 ++++----------
>  3 files changed, 25 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 2a3ef17..cc14479 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -662,31 +662,32 @@ static u64 huge_pte_to_pagemap_entry(pte_t pte, int offset)
>  	return pme;
>  }
>  
> -static int pagemap_hugetlb_range(pte_t *pte, unsigned long addr,
> +/* This function walks only within @vma */
> +static int pagemap_hugetlb_range(struct vm_area_struct *vma, unsigned long addr,
>  				 unsigned long end, struct mm_walk *walk)
>  {
> -	struct vm_area_struct *vma;
> +	struct mm_struct *mm = walk->mm;
>  	struct pagemapread *pm = walk->private;
>  	struct hstate *hs = NULL;
>  	int err = 0;
> -
> -	vma = find_vma(walk->mm, addr);
> -	if (vma)
> -		hs = hstate_vma(vma);
> +	pte_t *pte = NULL;
> +
> +	BUG_ON(!mm);
> +	BUG_ON(!vma || !is_vm_hugetlb_page(vma));
> +	BUG_ON(addr < vma->vm_start || addr >= vma->vm_end);

This is my personal opinion, and may not be popular.

When you add a BUG_ON(), please confirm that you have a real concern about it.
After reading one, a code reader will take care to avoid calling this
function under the above conditions, yes.

But this function itself is only for pagemap_read(), and it seems no
one else will call it externally in the future.
The above 3 BUG_ON()s can never trigger because of the simple logic
around this, so they seem like unnecessary noise to me.

If your changes in walk_page_range() are the concern behind adding the
above BUG_ON()s, please avoid such changes.

Bye.
-Kame

> +	hs = hstate_vma(vma);
> +	BUG_ON(!hs);
> +	pte = huge_pte_offset(mm, addr);
> +	if (!pte)
> +		return err;
>  	for (; addr != end; addr += PAGE_SIZE) {
>  		u64 pfn = PM_NOT_PRESENT;
> -
> -		if (vma && (addr >= vma->vm_end)) {
> -			vma = find_vma(walk->mm, addr);
> -			if (vma)
> -				hs = hstate_vma(vma);
> -		}
> -
> -		if (vma && (vma->vm_start <= addr) && is_vm_hugetlb_page(vma)) {
> -			/* calculate pfn of the "raw" page in the hugepage. */
> -			int offset = (addr & ~huge_page_mask(hs)) >> PAGE_SHIFT;
> -			pfn = huge_pte_to_pagemap_entry(*pte, offset);
> -		}
> +		/* calculate pfn of the "raw" page in the hugepage. */
> +		int offset = (addr & ~huge_page_mask(hs)) >> PAGE_SHIFT;
> +		/* next hugepage */
> +		if (!offset)
> +			pte = huge_pte_offset(mm, addr);
> +		pfn = huge_pte_to_pagemap_entry(*pte, offset);
>  		err = add_to_pagemap(addr, pfn, pm);
>  		if (err)
>  			return err;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3899395..5faafc2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -783,8 +783,8 @@ struct mm_walk {
>  	int (*pmd_entry)(pmd_t *, unsigned long, unsigned long, struct mm_walk *);
>  	int (*pte_entry)(pte_t *, unsigned long, unsigned long, struct mm_walk *);
>  	int (*pte_hole)(unsigned long, unsigned long, struct mm_walk *);
> -	int (*hugetlb_entry)(pte_t *, unsigned long, unsigned long,
> -			     struct mm_walk *);
> +	int (*hugetlb_entry)(struct vm_area_struct *,
> +			     unsigned long, unsigned long, struct mm_walk *);
>  	struct mm_struct *mm;
>  	void *private;
>  };
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 7b47a57..3148dc5 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -128,20 +128,14 @@ int walk_page_range(unsigned long addr, unsigned long end,
>  		vma = find_vma(walk->mm, addr);
>  #ifdef CONFIG_HUGETLB_PAGE
>  		if (vma && is_vm_hugetlb_page(vma)) {
> -			pte_t *pte;
> -			struct hstate *hs;
> -
>  			if (vma->vm_end < next)
>  				next = vma->vm_end;
> -			hs = hstate_vma(vma);
> -			pte = huge_pte_offset(walk->mm,
> -					      addr & huge_page_mask(hs));
> -			if (pte && !huge_pte_none(huge_ptep_get(pte))
> -			    && walk->hugetlb_entry)
> -				err = walk->hugetlb_entry(pte, addr,
> -							  next, walk);
> +			if (walk->hugetlb_entry)
> +				err = walk->hugetlb_entry(vma, addr, next,
> +							  walk);
>  			if (err)
>  				break;
> +			pgd = pgd_offset(walk->mm, next);
>  			continue;
>  		}
>  #endif
> -- 
> 1.7.0
> 


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  7:10 ` KAMEZAWA Hiroyuki
@ 2010-03-19  7:27   ` KAMEZAWA Hiroyuki
  2010-03-19  8:13     ` KAMEZAWA Hiroyuki
  2010-03-19  8:40     ` Naoya Horiguchi
  2010-03-19  8:37   ` Naoya Horiguchi
  2010-03-19  8:48   ` Daisuke Nishimura
  2 siblings, 2 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-19  7:27 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Fri, 19 Mar 2010 16:10:23 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Fri, 19 Mar 2010 15:26:36 +0900
> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > When we look into pagemap using page-types with option -p, the value
> > of pfn for hugepages looks wrong (see below.)
> > This is because pte was evaluated only once for one vma
> > although it should be updated for each hugepage. This patch fixes it.
> > 
> > $ page-types -p 3277 -Nl -b huge
> > voffset   offset  len     flags
> > 7f21e8a00 11e400  1       ___U___________H_G________________
> > 7f21e8a01 11e401  1ff     ________________TG________________
> > 7f21e8c00 11e400  1       ___U___________H_G________________
> > 7f21e8c01 11e401  1ff     ________________TG________________
> >              ^^^
> >              should not be the same
> > 
> > With this patch applied:
> > 
> > $ page-types -p 3386 -Nl -b huge
> > voffset   offset   len    flags
> > 7fec7a600 112c00   1      ___UD__________H_G________________
> > 7fec7a601 112c01   1ff    ________________TG________________
> > 7fec7a800 113200   1      ___UD__________H_G________________
> > 7fec7a801 113201   1ff    ________________TG________________
> >              ^^^
> >              OK
> > 
> Hmm. Is this bug ? To me, it's just shown in hugepage's pagesize, by design.
> 
I'm sorry, it seems this is a bugfix.

But this means hugetlb_entry() is not called per hugetlb entry... doesn't it?

Why can't hugetlb_entry() be called per hugetlb entry? Don't we need code
for the case where pmd_size != hugetlb_size in walk_page_range() as a generic fix?

Thanks,
-Kame


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  7:27   ` KAMEZAWA Hiroyuki
@ 2010-03-19  8:13     ` KAMEZAWA Hiroyuki
  2010-03-24  5:18       ` Naoya Horiguchi
  2010-03-19  8:40     ` Naoya Horiguchi
  1 sibling, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-19  8:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Fri, 19 Mar 2010 16:27:32 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Fri, 19 Mar 2010 16:10:23 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Fri, 19 Mar 2010 15:26:36 +0900
> > Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> > 
> > > When we look into pagemap using page-types with option -p, the value
> > > of pfn for hugepages looks wrong (see below.)
> > > This is because pte was evaluated only once for one vma
> > > although it should be updated for each hugepage. This patch fixes it.
> > > 
> > > $ page-types -p 3277 -Nl -b huge
> > > voffset   offset  len     flags
> > > 7f21e8a00 11e400  1       ___U___________H_G________________
> > > 7f21e8a01 11e401  1ff     ________________TG________________
> > > 7f21e8c00 11e400  1       ___U___________H_G________________
> > > 7f21e8c01 11e401  1ff     ________________TG________________
> > >              ^^^
> > >              should not be the same
> > > 
> > > With this patch applied:
> > > 
> > > $ page-types -p 3386 -Nl -b huge
> > > voffset   offset   len    flags
> > > 7fec7a600 112c00   1      ___UD__________H_G________________
> > > 7fec7a601 112c01   1ff    ________________TG________________
> > > 7fec7a800 113200   1      ___UD__________H_G________________
> > > 7fec7a801 113201   1ff    ________________TG________________
> > >              ^^^
> > >              OK
> > > 
> > Hmm. Is this bug ? To me, it's just shown in hugepage's pagesize, by design.
> > 
> I'm sorry it seems this is bugfix.
> 
> But, this means hugeltb_entry() is not called per hugetlb entry...isn't it ?
> 
> Why hugetlb_entry() cannot be called per hugeltb entry ? Don't we need a code
> for a case as pmd_size != hugetlb_size in walk_page_range() for generic fix ?
> 

How about this style? This is an idea-level patch, not tested at all.
(I have no test environment for multiple hugepage sizes.)

Feel free to reuse fragments from this patch.
==

Not-tested-at-all!!

Now, walk_page_range() supports page-table walks on hugetlb.
But it assumes that hugepage-size == pmd_size and that the hugetlb
callback is called per pmd, not per pte of a hugepage.

On some arches (ia64), the hugepage size is not the pmd size (by config).

This patch makes the hugetlb callback be called per hugetlb pte.

---
 mm/pagewalk.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 67 insertions(+), 14 deletions(-)

Index: mmotm-2.6.34-Mar11/mm/pagewalk.c
===================================================================
--- mmotm-2.6.34-Mar11.orig/mm/pagewalk.c
+++ mmotm-2.6.34-Mar11/mm/pagewalk.c
@@ -79,7 +79,65 @@ static int walk_pud_range(pgd_t *pgd, un
 
 	return err;
 }
+#ifdef CONFIG_HUGETLB_PAGE
+static unsigned long
+__hugepage_next_offset(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end)
+{
+	struct hstate *hs = hstate_vma(vma);
+	unsigned long size = huge_page_size(hs);
+	unsigned long base = addr & huge_page_mask(hs);
+	unsigned long limit;
+	/* Hugepage is very tightly coupled with vma. */
+	if (end > vma->vm_end)
+		limit = vma->vm_end;
+	else
+		limit = end;
+
+	if (base + size < limit)
+		return base + size;
+	return limit;
+}
 
+static int walk_hugepage_range(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long end, struct mm_walk *walk)
+{
+	pte_t *pte;
+	struct hstate *hs = hstate_vma(vma);
+	unsigned long size = huge_page_size(hs);
+	unsigned long next;
+	int err = 0;
+
+	/*
+	 * [addr...end) is guaranteed to be within a single vma
+	 * (see __hugepage_next_offset())
+	 */
+	do {
+		pte = huge_pte_offset(walk->mm, addr & huge_page_mask(hs));
+		next = (addr & huge_page_mask(hs)) + size;
+		if (next > end)
+			next = end;
+		if (pte && !huge_pte_none(huge_ptep_get(pte))
+		    && walk->hugetlb_entry)
+			err = walk->hugetlb_entry(pte, addr, next, walk);
+		if (err)
+			break;
+	} while (addr = next, addr != end);
+
+	return err;
+}
+#else
+static unsigned long
+__hugepage_next_offset(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end)
+{
+	BUG();
+	return 0;
+}
+static int walk_hugepage_range(struct vm_area_struct *vma,
+	unsigned long addr, unsigned long end, struct mm_walk *walk)
+{
+	return 0;
+}
+#endif
 /**
  * walk_page_range - walk a memory map's page tables with a callback
  * @mm: memory map to walk
@@ -126,25 +184,20 @@ int walk_page_range(unsigned long addr, 
 		 * we can't handled it in the same manner as non-huge pages.
 		 */
 		vma = find_vma(walk->mm, addr);
-#ifdef CONFIG_HUGETLB_PAGE
-		if (vma && is_vm_hugetlb_page(vma)) {
-			pte_t *pte;
-			struct hstate *hs;
 
-			if (vma->vm_end < next)
-				next = vma->vm_end;
-			hs = hstate_vma(vma);
-			pte = huge_pte_offset(walk->mm,
-					      addr & huge_page_mask(hs));
-			if (pte && !huge_pte_none(huge_ptep_get(pte))
-			    && walk->hugetlb_entry)
-				err = walk->hugetlb_entry(pte, addr,
-							  next, walk);
+		if (vma && is_vm_hugetlb_page(vma)) {
+			/*
+			 * In many archs, hugepage-size fits pmd size. But in
+			 * some arch, we cannot assume that.
+			 */
+			next = __hugepage_next_offset(vma, addr, end);
+			err = walk_hugepage_range(vma, addr, next, walk);
 			if (err)
 				break;
+			/* We don't know next's pgd == pgd+1 */
+			pgd = pgd_offset(walk->mm, next);
 			continue;
 		}
-#endif
 		if (pgd_none_or_clear_bad(pgd)) {
 			if (walk->pte_hole)
 				err = walk->pte_hole(addr, next, walk);





* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  7:10 ` KAMEZAWA Hiroyuki
  2010-03-19  7:27   ` KAMEZAWA Hiroyuki
@ 2010-03-19  8:37   ` Naoya Horiguchi
  2010-03-19  8:48   ` Daisuke Nishimura
  2 siblings, 0 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2010-03-19  8:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Fri, Mar 19, 2010 at 04:10:23PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 19 Mar 2010 15:26:36 +0900
> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > When we look into pagemap using page-types with option -p, the value
> > of pfn for hugepages looks wrong (see below.)
> > This is because pte was evaluated only once for one vma
> > although it should be updated for each hugepage. This patch fixes it.
> > 
> > $ page-types -p 3277 -Nl -b huge
> > voffset   offset  len     flags
> > 7f21e8a00 11e400  1       ___U___________H_G________________
> > 7f21e8a01 11e401  1ff     ________________TG________________
> > 7f21e8c00 11e400  1       ___U___________H_G________________
> > 7f21e8c01 11e401  1ff     ________________TG________________
> >              ^^^
> >              should not be the same
> > 
> > With this patch applied:
> > 
> > $ page-types -p 3386 -Nl -b huge
> > voffset   offset   len    flags
> > 7fec7a600 112c00   1      ___UD__________H_G________________
> > 7fec7a601 112c01   1ff    ________________TG________________
> > 7fec7a800 113200   1      ___UD__________H_G________________
> > 7fec7a801 113201   1ff    ________________TG________________
> >              ^^^
> >              OK
> > 
> Hmm. Is this bug ? To me, it's just shown in hugepage's pagesize, by design.
> 
> _And_, Doesn't this patch change behavior of walk_pagemap_range() implicitly ?

No. It just fixes the bug in the offset calculation.

> No influence to other users ? (as memcontrol.c. in mmotm. Ask Nishimura-san ;)
> 

I asked him to double-check.

> 
> some nitpicks.
> 
> 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  fs/proc/task_mmu.c |   37 +++++++++++++++++++------------------
> >  include/linux/mm.h |    4 ++--
> >  mm/pagewalk.c      |   14 ++++----------
> >  3 files changed, 25 insertions(+), 30 deletions(-)
> > 
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 2a3ef17..cc14479 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -662,31 +662,32 @@ static u64 huge_pte_to_pagemap_entry(pte_t pte, int offset)
> >  	return pme;
> >  }
> >  
> > -static int pagemap_hugetlb_range(pte_t *pte, unsigned long addr,
> > +/* This function walks only within @vma */
> > +static int pagemap_hugetlb_range(struct vm_area_struct *vma, unsigned long addr,
> >  				 unsigned long end, struct mm_walk *walk)
> >  {
> > -	struct vm_area_struct *vma;
> > +	struct mm_struct *mm = walk->mm;
> >  	struct pagemapread *pm = walk->private;
> >  	struct hstate *hs = NULL;
> >  	int err = 0;
> > -
> > -	vma = find_vma(walk->mm, addr);
> > -	if (vma)
> > -		hs = hstate_vma(vma);
> > +	pte_t *pte = NULL;
> > +
> > +	BUG_ON(!mm);
> > +	BUG_ON(!vma || !is_vm_hugetlb_page(vma));
> > +	BUG_ON(addr < vma->vm_start || addr >= vma->vm_end);
> 
> This is my personal opinion, may not be popular.
> 
> When you add BUG_ON(), please confirm "you have real concern about this."
> After reading this, code reader will take care of avoiding calling this
> function with above condition. yes.
> 
> But, this function itself is only for pagemap_read() and it seems no
> other one will call this function externally in future.
> Above 3 BUG_ON will never happen because of simple logic around this.
> Then, it seems unnecessary noise to me.
> 

OK. It sounds reasonable.

> If your changes in walk_page_range() causes concerns to add above BUG_ON()s,
> please avoid such changes. 

There is no concern about it.  I'll drop the BUG_ON()s.

Thanks,
Naoya Horiguchi


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  7:27   ` KAMEZAWA Hiroyuki
  2010-03-19  8:13     ` KAMEZAWA Hiroyuki
@ 2010-03-19  8:40     ` Naoya Horiguchi
  1 sibling, 0 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2010-03-19  8:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Fri, Mar 19, 2010 at 04:27:32PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 19 Mar 2010 16:10:23 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Fri, 19 Mar 2010 15:26:36 +0900
> > Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> > 
> > > When we look into pagemap using page-types with option -p, the value
> > > of pfn for hugepages looks wrong (see below.)
> > > This is because pte was evaluated only once for one vma
> > > although it should be updated for each hugepage. This patch fixes it.
> > > 
> > > $ page-types -p 3277 -Nl -b huge
> > > voffset   offset  len     flags
> > > 7f21e8a00 11e400  1       ___U___________H_G________________
> > > 7f21e8a01 11e401  1ff     ________________TG________________
> > > 7f21e8c00 11e400  1       ___U___________H_G________________
> > > 7f21e8c01 11e401  1ff     ________________TG________________
> > >              ^^^
> > >              should not be the same
> > > 
> > > With this patch applied:
> > > 
> > > $ page-types -p 3386 -Nl -b huge
> > > voffset   offset   len    flags
> > > 7fec7a600 112c00   1      ___UD__________H_G________________
> > > 7fec7a601 112c01   1ff    ________________TG________________
> > > 7fec7a800 113200   1      ___UD__________H_G________________
> > > 7fec7a801 113201   1ff    ________________TG________________
> > >              ^^^
> > >              OK
> > > 
> > Hmm. Is this bug ? To me, it's just shown in hugepage's pagesize, by design.
> > 
> I'm sorry it seems this is bugfix.
> 
> But, this means hugeltb_entry() is not called per hugetlb entry...isn't it ?
> 

Correct. hugetlb_entry() is called per vma.

> Why hugetlb_entry() cannot be called per hugeltb entry ? Don't we need a code
> for a case as pmd_size != hugetlb_size in walk_page_range() for generic fix ?
> 

Because on some architectures there is no generic means to know whether
a pgd/pmd/pud/pte is a hugetlb entry or not.
For the second question, vma-based walking is a generic solution and
works for the "pmd_size != hugetlb_size" case.

Thanks,
Naoya Horiguchi


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  7:10 ` KAMEZAWA Hiroyuki
  2010-03-19  7:27   ` KAMEZAWA Hiroyuki
  2010-03-19  8:37   ` Naoya Horiguchi
@ 2010-03-19  8:48   ` Daisuke Nishimura
  2 siblings, 0 replies; 9+ messages in thread
From: Daisuke Nishimura @ 2010-03-19  8:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: nishimura, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	andi.kleen, fengguang.wu

On Fri, 19 Mar 2010 16:10:23 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 19 Mar 2010 15:26:36 +0900
> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > When we look into pagemap using page-types with option -p, the value
> > of pfn for hugepages looks wrong (see below.)
> > This is because pte was evaluated only once for one vma
> > although it should be updated for each hugepage. This patch fixes it.
> > 
> > $ page-types -p 3277 -Nl -b huge
> > voffset   offset  len     flags
> > 7f21e8a00 11e400  1       ___U___________H_G________________
> > 7f21e8a01 11e401  1ff     ________________TG________________
> > 7f21e8c00 11e400  1       ___U___________H_G________________
> > 7f21e8c01 11e401  1ff     ________________TG________________
> >              ^^^
> >              should not be the same
> > 
> > With this patch applied:
> > 
> > $ page-types -p 3386 -Nl -b huge
> > voffset   offset   len    flags
> > 7fec7a600 112c00   1      ___UD__________H_G________________
> > 7fec7a601 112c01   1ff    ________________TG________________
> > 7fec7a800 113200   1      ___UD__________H_G________________
> > 7fec7a801 113201   1ff    ________________TG________________
> >              ^^^
> >              OK
> > 
> Hmm. Is this bug ? To me, it's just shown in hugepage's pagesize, by design.
> 
> _And_, Doesn't this patch change behavior of walk_pagemap_range() implicitly ?
> No influence to other users ? (as memcontrol.c. in mmotm. Ask Nishimura-san ;)
> 
From the viewpoint of memcg, this change in walk_page_range() has no influence,
because memcg does "if (is_vm_hugetlb_page(vma)) break;" before calling walk_page_range().
But we must check the other callers too.


Thanks,
Daisuke Nishimura.


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-19  8:13     ` KAMEZAWA Hiroyuki
@ 2010-03-24  5:18       ` Naoya Horiguchi
  2010-03-24  5:22         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Naoya Horiguchi @ 2010-03-24  5:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Fri, Mar 19, 2010 at 05:13:10PM +0900, KAMEZAWA Hiroyuki wrote:
...
> > 
> > But, this means hugeltb_entry() is not called per hugetlb entry...isn't it ?
> > 
> > Why hugetlb_entry() cannot be called per hugeltb entry ? Don't we need a code
> > for a case as pmd_size != hugetlb_size in walk_page_range() for generic fix ?
> > 
> 
> How about this style ? This is an idea-level patch. not tested at all.
> (I have no test enviroment for multiple hugepage size.)
> 
> feel free to reuse fragments from this patch.
>

So the point is calling hugetlb_entry() for each huge page, right?

It looks good.
I've rewritten my patch based on your idea and made sure it works.
Is it OK to add your Signed-off-by?

Thanks,
Naoya Horiguchi


* Re: [PATCH 2/2] [BUGFIX] pagemap: fix pfn calculation for hugepage
  2010-03-24  5:18       ` Naoya Horiguchi
@ 2010-03-24  5:22         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-24  5:22 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm, andi.kleen, fengguang.wu

On Wed, 24 Mar 2010 14:18:45 +0900
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> On Fri, Mar 19, 2010 at 05:13:10PM +0900, KAMEZAWA Hiroyuki wrote:
> ...
> > > 
> > > But, this means hugeltb_entry() is not called per hugetlb entry...isn't it ?
> > > 
> > > Why hugetlb_entry() cannot be called per hugeltb entry ? Don't we need a code
> > > for a case as pmd_size != hugetlb_size in walk_page_range() for generic fix ?
> > > 
> > 
> > How about this style ? This is an idea-level patch. not tested at all.
> > (I have no test enviroment for multiple hugepage size.)
> > 
> > feel free to reuse fragments from this patch.
> >
> 
> So the point is calling hugetlb_entry() for each huge page, right?
> 
yes.

> It looks good.
> I've rewritten my patch based on your idea and make sure it works.
> Is it ok to add your Signed-off-by?
Of course.

Thanks.
-Kame

