From: David Rientjes <rientjes@google.com>
To: "Ricardo M. Correia" <ricardo.correia@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Brian Behlendorf <behlendorf1@llnl.gov>,
Andreas Dilger <andreas.dilger@oracle.com>
Subject: Re: Propagating GFP_NOFS inside __vmalloc()
Date: Mon, 15 Nov 2010 13:28:54 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1011151303130.8167@chino.kir.corp.google.com>
In-Reply-To: <1289840500.13446.65.camel@oralap>
On Mon, 15 Nov 2010, Ricardo M. Correia wrote:
> When __vmalloc() / __vmalloc_area_node() calls map_vm_area(), the latter can
> allocate pages with GFP_KERNEL despite the caller of __vmalloc having requested
> a more strict gfp mask.
>
> We fix this by introducing a per-thread gfp_mask, similar to gfp_allowed_mask
> but which only applies to the current thread. __vmalloc_area_node() will now
> temporarily restrict the per-thread gfp_mask when it calls map_vm_area().
>
> This new per-thread gfp mask may also be used for other useful purposes, for
> example, after thread creation, to make sure that certain threads
> (e.g. filesystem I/O threads) never allocate memory with certain flags (e.g.
> __GFP_FS or __GFP_IO).
I dislike this approach not only for its performance degradation in core
areas like the page and slab allocators, but also because it requires full
knowledge of the callchain to determine the gfp flags of the allocation.
This will become nasty very quickly.
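
To illustrate the callchain concern, here is a minimal userspace sketch of
a per-thread mask.  It is not the patch under discussion: gfp_t, the
DEMO_GFP_* bit values, thread_gfp_mask, and every demo_* function are
made-up stand-ins, and calloc() stands in for the page allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's gfp_t and flag bits. */
typedef unsigned int gfp_t;
#define DEMO_GFP_WAIT	0x1u
#define DEMO_GFP_IO	0x2u
#define DEMO_GFP_FS	0x4u
#define DEMO_GFP_KERNEL	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Hypothetical per-thread mask; all bits allowed by default. */
static _Thread_local gfp_t thread_gfp_mask = ~0u;

/* Records the flags the "allocator" actually saw, for inspection. */
static gfp_t last_effective_flags;

static void *demo_alloc_page(gfp_t flags)
{
	/* Extra load and AND on every allocation, restricted or not. */
	flags &= thread_gfp_mask;
	last_effective_flags = flags;
	return calloc(1, 4096);
}

/* Models a low-level helper with a hard-coded GFP_KERNEL. */
static void *demo_map_area(void)
{
	return demo_alloc_page(DEMO_GFP_KERNEL);
}

/* Models a caller that wants NOFS behaviour around the helper. */
static void *demo_vmalloc_nofs(void)
{
	gfp_t saved = thread_gfp_mask;
	void *p;

	thread_gfp_mask &= ~DEMO_GFP_FS;	/* temporarily restrict */
	p = demo_map_area();
	thread_gfp_mask = saved;		/* restore on the way out */
	return p;
}
```

Reading demo_map_area() alone suggests a GFP_KERNEL allocation, yet the
flags that reach demo_alloc_page() depend on whichever caller last touched
thread_gfp_mask.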

This proposal essentially defines an entirely new method for passing gfp
flags to the page allocator when it isn't strictly needed.  I think the
problem you're addressing can be solved in one of two ways:

 - create lower-level functions in each arch that pass a gfp argument to
   the allocator rather than hard-coded GFP_KERNEL, or

 - avoid doing anything other than GFP_KERNEL allocations for __vmalloc():
   the only current users are gfs2, ntfs, and ceph (the page allocator
   __vmalloc() can be discounted since it's done at boot and GFP_ATOMIC
   here has almost no chance of failing since the size is determined based
   on what is available).

The first option really addresses the bug that you're running into and can
be addressed in a relatively simple way by redefining current users of
pmd_alloc_one(), for instance, as a form of a new lower-level
__pmd_alloc_one():

	/* Lower-level variant: the caller chooses the gfp mask. */
	static inline pmd_t *__pmd_alloc_one(struct mm_struct *mm,
					unsigned long addr, gfp_t flags)
	{
		return (pmd_t *)get_zeroed_page(flags);
	}

	/* Existing interface: keeps its current default mask. */
	static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
	{
		return __pmd_alloc_one(mm, addr, GFP_KERNEL|__GFP_REPEAT);
	}

and then using __pmd_alloc_one() in the vmalloc path with the passed mask
rather than pmd_alloc_one().  This _will_ be slightly intrusive because it
will require fixing up some short callchains to pass the appropriate mask,
but that will be limited to the vmalloc code and arch code that currently
does unconditional GFP_KERNEL allocations.  Both are bugs that you'll be
addressing for each architecture, so the intrusiveness of that change has
merit (and be sure to cc linux-arch@vger.kernel.org on it as well).

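As a rough userspace model of what "passing the appropriate mask" through
a short callchain could look like: everything below is hypothetical (the
DEMO_GFP_* values, demo_get_zeroed_page(), and demo_vmap_pmd_range() are
invented for illustration, with calloc() standing in for the page
allocator), so treat it as a sketch of the pattern and not as the arch
code itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's gfp_t and flag bits. */
typedef unsigned int gfp_t;
#define DEMO_GFP_WAIT	0x1u
#define DEMO_GFP_IO	0x2u
#define DEMO_GFP_FS	0x4u
#define DEMO_GFP_REPEAT	0x8u
#define DEMO_GFP_KERNEL	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS	(DEMO_GFP_WAIT | DEMO_GFP_IO)

/* Records the flags the "allocator" actually saw, for inspection. */
static gfp_t last_flags;

static void *demo_get_zeroed_page(gfp_t flags)
{
	last_flags = flags;
	return calloc(1, 4096);
}

/* Lower-level variant: the mask is an explicit argument. */
static void *demo_pmd_alloc_one_gfp(gfp_t flags)
{
	return demo_get_zeroed_page(flags);
}

/* Existing interface: unchanged behaviour for all other callers. */
static void *demo_pmd_alloc_one(void)
{
	return demo_pmd_alloc_one_gfp(DEMO_GFP_KERNEL | DEMO_GFP_REPEAT);
}

/* Models the vmalloc mapping path: forwards the caller's mask. */
static void *demo_vmap_pmd_range(gfp_t gfp_mask)
{
	return demo_pmd_alloc_one_gfp(gfp_mask);
}
```

Calling demo_vmap_pmd_range(DEMO_GFP_NOFS) reaches the allocator without
the FS bit, while demo_pmd_alloc_one() keeps its existing default.  The
mask is visible at each call site and nothing is added to the allocator's
fast path.
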
I only mention the second option because passing GFP_NOFS to __vmalloc()
for sufficiently large sizes has a much higher probability of failing if
you're running into issues where GFP_KERNEL is causing synchronous
reclaim.  We may not be able to do any better in the contexts in which
gfs2, ntfs, and ceph use it without some sort of preallocation at an
earlier time, but the likelihood of those allocations failing is much
higher than that of the typical vmalloc() that tries really hard with
__GFP_REPEAT to allocate the memory required.