From: Stephen Wilson <wilsons@start.ca>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Wilson <wilsons@start.ca>,
Alexander Viro <viro@zeniv.linux.org.uk>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Hugh Dickins <hughd@google.com>,
David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: [PATCH 0/8] avoid allocation in show_numa_map()
Date: Wed, 4 May 2011 22:37:15 -0400 [thread overview]
Message-ID: <20110505023715.GA4569@fibrous.localdomain> (raw)
In-Reply-To: <20110504161020.e2d0a7f2.akpm@linux-foundation.org>
On Wed, May 04, 2011 at 04:10:20PM -0700, Andrew Morton wrote:
> On Wed, 27 Apr 2011 19:35:41 -0400
> Stephen Wilson <wilsons@start.ca> wrote:
>
> > Recently a concern was raised[1] that performing an allocation while holding a
> > reference on a tasks mm could lead to a stalemate in the oom killer. The
> > concern was specific to the goings-on in /proc. Hugh Dickins stated the issue
> > thusly:
> >
> > ...imagine what happens if the system is out of memory, and the mm
> > we're looking at is selected for killing by the OOM killer: while we
> > wait in __get_free_page for more memory, no memory is freed from the
> > selected mm because it cannot reach exit_mmap while we hold that
> > reference.
> >
> > The primary goal of this series is to eliminate repeated allocation/free cycles
> > currently happening in show_numa_maps() while we hold a reference to an mm.
> >
> > The strategy is to perform the allocation once when /proc/pid/numa_maps is
> > opened, before a reference on the target tasks mm is taken.
> >
> > Unfortunately, show_numa_maps() is implemented in mm/mempolicy.c while the
> > primary procfs implementation lives in fs/proc/task_mmu.c. This makes
> > clean cooperation between show_numa_maps() and the other seq_file operations
> > (start(), stop(), etc) difficult.
> >
> >
> > Patches 1-5 convert show_numa_maps() to use the generic walk_page_range()
> > functionality instead of the mempolicy.c specific page table walking logic.
> > Also, get_vma_policy() is exported. This makes the show_numa_maps()
> > implementation independent of mempolicy.c.
> >
> > Patch 6 moves show_numa_maps() and supporting routines over to
> > fs/proc/task_mmu.c.
> >
> > Finally, patches 7 and 8 provide minor cleanup and eliminates the troublesome
> > allocation.
> >
> >
> > Please note that moving show_numa_maps() into fs/proc/task_mmu.c essentially
> > reverts 1a75a6c825 and 48fce3429d. Also, please see the discussion at [2]. My
> > main justifications for moving the code back into task_mmu.c is:
> >
> > - Having the show() operation "miles away" from the corresponding
> > seq_file iteration operations is a maintenance burden.
> >
> > - The need to export ad hoc info like struct proc_maps_private is
> > eliminated.
> >
> >
> > These patches are based on v2.6.39-rc5.
>
> The patches look reasonable. It would be nice to get some more review
> happening (poke).
If anyone would like me to resend the series please let me know.
> >
> > Please note that this series is VERY LIGHTLY TESTED. I have been using
> > CONFIG_NUMA_EMU=y thus far as I will not have access to a real NUMA system for
> > another week or two.
>
> "lightly tested" evokes fear, but the patches don't look too scary to
> me.
Indeed. I hope to have some real hardware to test the patches on by
the end of the week; fingers crossed. Will certainly address any
issues that come up at that time.
> Did you look at using apply_to_page_range()?
I did not look into it deeply, no. The main reason for using
walk_page_range() was that it supports hugetlb vma's in the same way as
was done in mempolicy.c's check_huge_range(). The algorithm was a very
natural fit so I ran with it.
> I'm trying to remember why we're carrying both walk_page_range() and
> apply_to_page_range() but can't immediately think of a reason.
>
> There's also an apply_to_page_range_batch() in -mm but that code is
> broken on PPC and not much is happening with it, so it will probably go
> away again.
--
steve
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2011-05-05 2:37 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-27 23:35 Stephen Wilson
2011-04-27 23:35 ` [PATCH 1/8] mm: export get_vma_policy() Stephen Wilson
2011-05-09 7:39 ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 2/8] mm: use walk_page_range() instead of custom page table walking code Stephen Wilson
2011-05-09 7:38 ` KOSAKI Motohiro
2011-05-09 19:36 ` Stephen Wilson
2011-05-10 0:20 ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 3/8] mm: remove MPOL_MF_STATS Stephen Wilson
2011-05-09 7:44 ` KOSAKI Motohiro
2011-05-09 19:39 ` Stephen Wilson
2011-04-27 23:35 ` [PATCH 4/8] mm: make gather_stats() type-safe and remove forward declaration Stephen Wilson
2011-05-09 7:45 ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 5/8] mm: remove check_huge_range() Stephen Wilson
2011-05-09 7:46 ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 6/8] mm: proc: move show_numa_map() to fs/proc/task_mmu.c Stephen Wilson
2011-05-09 7:49 ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 7/8] proc: make struct proc_maps_private truly private Stephen Wilson
2011-05-09 7:51 ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 8/8] proc: allocate storage for numa_maps statistics once Stephen Wilson
2011-05-09 8:24 ` KOSAKI Motohiro
2011-05-04 23:10 ` [PATCH 0/8] avoid allocation in show_numa_map() Andrew Morton
2011-05-05 2:37 ` Stephen Wilson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110505023715.GA4569@fibrous.localdomain \
--to=wilsons@start.ca \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=jeremy@goop.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox