linux-mm.kvack.org archive mirror
* regression in /proc/self/numa_maps with huge pages
@ 2011-10-19 19:35 Stephen Hemminger
  2011-10-19 20:10 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2011-10-19 19:35 UTC
  To: Stephen Wilson
  Cc: linux-mm, KOSAKI Motohiro, Hugh Dickins, David Rientjes,
	Lee Schermerhorn, Alexey Dobriyan, Andrew Morton

We are working on an application that uses a library which both uses
huge pages and parses numa_maps.  This application can no longer
identify the socket id correctly for huge pages because the string
'huge' is no longer part of /proc/self/numa_maps.

Basically, the application sets up huge page mmaps, then reads
/proc/self/numa_maps and skips all entries without the string " huge ".
Then it looks for the address and socket info.
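
A minimal sketch of that kind of parsing, not the library's actual code.
The names parse_huge_line() and struct huge_mapping are hypothetical, and
the sketch assumes that each line starts with the mapping's hex address,
that fields are separated by spaces, that file= paths contain no spaces,
and that per-node counts appear as N<node>=<pages>.  Unlike a plain
substring search for " huge ", it matches "huge" as a whole field, so the
flag is still found when it is the last field on the line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: one hugetlb mapping found in numa_maps. */
struct huge_mapping {
	unsigned long long addr;	/* start address of the mapping */
	int node;			/* first node listed as N<node>=<pages> */
};

/*
 * Parse one numa_maps line.  Returns 1 and fills *out when the line has
 * a standalone "huge" field and at least one N<node>= field, else 0.
 * Assumes space-separated fields and file= paths without spaces.
 */
static int parse_huge_line(const char *line, struct huge_mapping *out)
{
	unsigned long long addr;
	const char *p;
	char *end;
	int is_huge = 0;
	int node = -1;

	assert(line != NULL && out != NULL);

	/* First field: start address in hex, without a 0x prefix. */
	addr = strtoull(line, &end, 16);
	if (end == line)
		return 0;

	/* Walk the remaining whitespace-separated fields. */
	for (p = end; *p != '\0'; ) {
		size_t len;

		p += strspn(p, " \n");
		len = strcspn(p, " \n");
		if (len == 4 && strncmp(p, "huge", 4) == 0) {
			is_huge = 1;
		} else if (node < 0 && p[0] == 'N' &&
			   isdigit((unsigned char)p[1])) {
			long n = strtol(p + 1, &end, 10);

			/* Accept only the N<node>= form. */
			if (*end == '=')
				node = (int)n;
		}
		p += len;
	}

	if (!is_huge || node < 0)
		return 0;

	out->addr = addr;
	out->node = node;
	return 1;
}
```

A caller would read /proc/self/numa_maps line by line (for example with
fgets()) and pass each line to parse_huge_line(), ignoring the lines for
which it returns 0.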

Why was this information dropped? Looks like the desire to be generic
overstepped the desire to remain compatible.


This regression in kernel ABI was introduced by:
commit 29ea2f6982f1edc4302729116f2246dd7b45471d
Author: Stephen Wilson <wilsons@start.ca>
Date:   Tue May 24 17:12:42 2011 -0700

    mm: use walk_page_range() instead of custom page table walking code
    
    Converting show_numa_map() to use the generic routine decouples the
    function from mempolicy.c, allowing it to be moved out of the mm subsystem
    and into fs/proc.
    
    Also, include KSM pages in /proc/pid/numa_maps statistics.  The pagewalk
    logic implemented by check_pte_range() failed to account for such pages as
    they were not applicable to the page migration case.
    
    Signed-off-by: Stephen Wilson <wilsons@start.ca>
    Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: Alexey Dobriyan <adobriyan@gmail.com>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: regression in /proc/self/numa_maps with huge pages
  2011-10-19 19:35 regression in /proc/self/numa_maps with huge pages Stephen Hemminger
@ 2011-10-19 20:10 ` Andrew Morton
  2011-10-19 20:46   ` Stephen Hemminger
  2011-10-19 20:52   ` David Rientjes
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2011-10-19 20:10 UTC
  To: Stephen Hemminger
  Cc: Stephen Wilson, linux-mm, KOSAKI Motohiro, Hugh Dickins,
	David Rientjes, Lee Schermerhorn, Alexey Dobriyan

On Wed, 19 Oct 2011 12:35:30 -0700
Stephen Hemminger <shemminger@vyatta.com> wrote:

> We are working on an application that uses a library that uses
> both huge pages and parses numa_maps.  This application is no longer
> able to identify the socket id correctly for huge pages because the
> string 'huge' is no longer part of /proc/self/numa_maps.
> 
> Basically, application sets up huge page mmaps, then reads /proc/self/numa_maps
> and skips all entries without the string " huge ".  Then it looks for address
> and socket info.
> 
> Why was this information dropped?

Mistake?

> Looks like the desire to be generic
> overstepped the desire to remain compatible.

Or it was a mistake.

This?

--- a/fs/proc/task_mmu.c~a
+++ a/fs/proc/task_mmu.c
@@ -1009,6 +1009,9 @@ static int show_numa_map(struct seq_file
 		seq_printf(m, " stack");
 	}
 
+	if (is_vm_hugetlb_page(vma))
+		seq_printf(m, " huge");
+
 	walk_page_range(vma->vm_start, vma->vm_end, &walk);
 
 	if (!md->pages)


* Re: regression in /proc/self/numa_maps with huge pages
  2011-10-19 20:10 ` Andrew Morton
@ 2011-10-19 20:46   ` Stephen Hemminger
  2011-10-19 20:52   ` David Rientjes
  1 sibling, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2011-10-19 20:46 UTC
  To: Andrew Morton
  Cc: Stephen Wilson, linux-mm, KOSAKI Motohiro, Hugh Dickins,
	David Rientjes, Lee Schermerhorn, Alexey Dobriyan

On Wed, 19 Oct 2011 13:10:07 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 19 Oct 2011 12:35:30 -0700
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
> > We are working on an application that uses a library that uses
> > both huge pages and parses numa_maps.  This application is no longer
> > able to identify the socket id correctly for huge pages because the
> > string 'huge' is no longer part of /proc/self/numa_maps.
> > 
> > Basically, application sets up huge page mmaps, then reads /proc/self/numa_maps
> > and skips all entries without the string " huge ".  Then it looks for address
> > and socket info.
> > 
> > Why was this information dropped?
> 
> Mistake?
> 
> > Looks like the desire to be generic
> > overstepped the desire to remain compatible.
> 
> Or it was a mistake.
> 
> This?
> 
> --- a/fs/proc/task_mmu.c~a
> +++ a/fs/proc/task_mmu.c
> @@ -1009,6 +1009,9 @@ static int show_numa_map(struct seq_file
>  		seq_printf(m, " stack");
>  	}
>  
> +	if (is_vm_hugetlb_page(vma))
> +		seq_printf(m, " huge");
> +
>  	walk_page_range(vma->vm_start, vma->vm_end, &walk);
>  
>  	if (!md->pages)
> 

That works; the application is happy.
We never would have found it except that someone put the CPU in the wrong
socket on one dual-CPU motherboard, so everything is reported as socket 1!


* Re: regression in /proc/self/numa_maps with huge pages
  2011-10-19 20:10 ` Andrew Morton
  2011-10-19 20:46   ` Stephen Hemminger
@ 2011-10-19 20:52   ` David Rientjes
  1 sibling, 0 replies; 4+ messages in thread
From: David Rientjes @ 2011-10-19 20:52 UTC
  To: Andrew Morton
  Cc: Stephen Hemminger, Stephen Wilson, linux-mm, KOSAKI Motohiro,
	Hugh Dickins, Lee Schermerhorn, Alexey Dobriyan, Dave Hansen

On Wed, 19 Oct 2011, Andrew Morton wrote:

> > We are working on an application that uses a library that uses
> > both huge pages and parses numa_maps.  This application is no longer
> > able to identify the socket id correctly for huge pages because the
> > string 'huge' is no longer part of /proc/self/numa_maps.
> > 
> > Basically, application sets up huge page mmaps, then reads /proc/self/numa_maps
> > and skips all entries without the string " huge ".  Then it looks for address
> > and socket info.
> > 
> > Why was this information dropped?
> 
> Mistake?
> 
> > Looks like the desire to be generic
> > overstepped the desire to remain compatible.
> 
> Or it was a mistake.
> 
> This?
> 
> --- a/fs/proc/task_mmu.c~a
> +++ a/fs/proc/task_mmu.c
> @@ -1009,6 +1009,9 @@ static int show_numa_map(struct seq_file
>  		seq_printf(m, " stack");
>  	}
>  
> +	if (is_vm_hugetlb_page(vma))
> +		seq_printf(m, " huge");
> +
>  	walk_page_range(vma->vm_start, vma->vm_end, &walk);
>  
>  	if (!md->pages)

Hmm, Dave Hansen (cc'd) was working on a patch that would add a pagesize=
field to /proc/pid/numa_maps because there's now a discrepancy in what is
labeled "huge."  Hugetlbfs pages, for which "huge" would be shown again
with the patch above, always have their page counts reported in units of
their hugepage size (2M or 1G on x86, other sizes on other archs), and
"huge" alone does not say which size that is.  THP page counts, on the
other hand, are always reported in PAGE_SIZE pages.

So adding "huge" back is ambiguous in terms of hugetlbfs page size and
doesn't represent THP hugepages.  No objection to the patch if it's strictly
for numa_maps compatibility starting from 3.0, but we need to extend the
output with a pagesize= type field unless we want to require users to use
/proc/pid/smaps anytime they want to parse the page counts emitted by
numa_maps.
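
To make the unit mismatch concrete, here is a rough sketch of how a
consumer might turn a numa_maps page count into bytes.  The enum, the
function name and the parameters are all hypothetical.  The sketch assumes
the caller already knows whether the entry is a hugetlbfs mapping and has
obtained its huge page size from somewhere else (for instance from
/proc/pid/smaps), since numa_maps itself does not report it.

```c
#include <assert.h>

/* Hypothetical classification of a numa_maps entry. */
enum mapping_kind {
	MAPPING_BASE_PAGES,	/* normal and THP: counts in PAGE_SIZE units */
	MAPPING_HUGETLB,	/* hugetlbfs: counts in huge page units */
};

/*
 * Convert a per-node page count from numa_maps into bytes.
 * page_size is the base PAGE_SIZE; hugetlb_page_size is the size of the
 * huge pages backing the mapping (an input the caller must supply).
 */
static unsigned long long numa_count_to_bytes(unsigned long long count,
					      enum mapping_kind kind,
					      unsigned long long page_size,
					      unsigned long long hugetlb_page_size)
{
	assert(page_size > 0 && hugetlb_page_size >= page_size);

	if (kind == MAPPING_HUGETLB)
		return count * hugetlb_page_size;
	return count * page_size;
}
```

With 4K base pages and 2M huge pages, a count of 3 on a hugetlbfs mapping
and a count of 1536 on a THP-backed mapping both describe 6M of memory,
which is why "huge" alone is not enough to interpret the numbers.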

