Re: Propagating GFP_NOFS inside __vmalloc()

From: David Rientjes <rientjes@google.com>
To: "Ricardo M. Correia" <ricardo.correia@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Brian Behlendorf <behlendorf1@llnl.gov>,
	Andreas Dilger <andreas.dilger@oracle.com>
Subject: Re: Propagating GFP_NOFS inside __vmalloc()
Date: Mon, 15 Nov 2010 13:28:54 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1011151303130.8167@chino.kir.corp.google.com>
In-Reply-To: <1289840500.13446.65.camel@oralap>

On Mon, 15 Nov 2010, Ricardo M. Correia wrote:

> When __vmalloc() / __vmalloc_area_node() calls map_vm_area(), the latter can
> allocate pages with GFP_KERNEL despite the caller of __vmalloc having requested
> a more strict gfp mask.
> 
> We fix this by introducing a per-thread gfp_mask, similar to gfp_allowed_mask
> but which only applies to the current thread. __vmalloc_area_node() will now
> temporarily restrict the per-thread gfp_mask when it calls map_vm_area().
> 
> This new per-thread gfp mask may also be used for other useful purposes, for
> example, after thread creation, to make sure that certain threads
> (e.g. filesystem I/O threads) never allocate memory with certain flags (e.g.
> __GFP_FS or __GFP_IO).

I dislike this approach not only for its performance degradation in core 
areas like the page and slab allocators, but also because it requires full 
knowledge of the callchain to determine the gfp flags of the allocation.  
This will become nasty very quickly.
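
For reference, the per-thread mask idea quoted above can be sketched in
userspace roughly as follows.  This is only an illustration under stated
assumptions, not the code from the patch: thread_gfp_mask,
mock_alloc_page(), mock_map_vm_area() and mock_vmalloc_area() are
hypothetical stand-ins, and the flag bit values are made up.  It shows
both concerns: every allocation pays for an extra AND, and the flags an
allocation really gets depend on who is further up the callchain.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; not copied from any kernel header. */
#define __GFP_WAIT 0x10u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* Hypothetical per-thread mask; all bits are allowed by default. */
static _Thread_local gfp_t thread_gfp_mask = ~0u;

/* Records the flags the mock allocator actually honoured. */
static gfp_t last_effective_gfp;

/* Mock page allocator: every call applies the per-thread mask. */
static void *mock_alloc_page(gfp_t flags)
{
	last_effective_gfp = flags & thread_gfp_mask;
	return calloc(1, 4096);
}

/* Stand-in for map_vm_area(): hard-codes GFP_KERNEL internally. */
static void *mock_map_vm_area(void)
{
	return mock_alloc_page(GFP_KERNEL);
}

/* Stand-in for __vmalloc_area_node(): restrict, call down, restore. */
static void *mock_vmalloc_area(gfp_t gfp_mask)
{
	gfp_t saved = thread_gfp_mask;
	void *p;

	thread_gfp_mask &= gfp_mask;
	p = mock_map_vm_area();
	thread_gfp_mask = saved;
	return p;
}
```

Reading mock_map_vm_area() on its own suggests a GFP_KERNEL allocation;
only by knowing that mock_vmalloc_area(GFP_NOFS) is the caller can you
tell that __GFP_FS has been stripped.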

This proposal essentially defines an entirely new method for passing gfp
flags to the page allocator when it isn't strictly needed.  I think the
problem you're addressing can be solved in one of two ways:

 - create lower-level functions in each arch that pass a gfp argument to 
   the allocator rather than hard-coded GFP_KERNEL, or

 - avoid doing anything other than GFP_KERNEL allocations for __vmalloc():
   the only current users are gfs2, ntfs, and ceph (the __vmalloc() call
   in the page allocator can be discounted since it's done at boot and
   GFP_ATOMIC there has almost no chance of failing since the size is
   determined based on what is available).

The first option really addresses the bug that you're running into and can
be implemented in a relatively simple way by redefining pmd_alloc_one(),
for instance, in terms of a new lower-level __pmd_alloc_one():

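	/* Lower-level variant: the caller chooses the gfp mask. */
	static inline pmd_t *__pmd_alloc_one(struct mm_struct *mm,
					unsigned long addr, gfp_t flags)
	{
		return (pmd_t *)get_zeroed_page(flags);
	}

	/* Existing interface keeps its current GFP_KERNEL behavior. */
	static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
	{
		return __pmd_alloc_one(mm, addr, GFP_KERNEL|__GFP_REPEAT);
	}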

and then using __pmd_alloc_one() in the vmalloc path with the passed mask
rather than pmd_alloc_one().  This _will_ be slightly intrusive because it
will require fixing up some short callchains to pass the appropriate mask,
but that will be limited to the vmalloc code and the arch code that
currently does unconditional GFP_KERNEL allocations.  Both are bugs that
you'll be addressing for each architecture, so the intrusiveness of that
change has merit (and be sure to cc linux-arch@vger.kernel.org on it as
well).
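
To make the shape of that change concrete, here is a minimal userspace
sketch, not kernel code.  The assumptions: mock_get_zeroed_page() stands
in for get_zeroed_page() and returns a pointer rather than an unsigned
long, vmap_pmd_alloc() is a hypothetical name for whatever helper the
vmalloc mapping path would use, pmd_t and struct mm_struct are dummies,
and the flag bit values are made up.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; not copied from any kernel header. */
#define __GFP_WAIT   0x10u
#define __GFP_IO     0x40u
#define __GFP_FS     0x80u
#define __GFP_REPEAT 0x400u
#define GFP_NOFS     (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL   (__GFP_WAIT | __GFP_IO | __GFP_FS)

struct mm_struct;                             /* opaque, unused by the mock */
typedef struct { unsigned long pmd; } pmd_t;  /* dummy page-table entry */

/* Records the mask that reached the mock allocator. */
static gfp_t last_gfp;

/* Userspace stand-in for get_zeroed_page(). */
static void *mock_get_zeroed_page(gfp_t flags)
{
	last_gfp = flags;
	return calloc(1, 4096);
}

/* Lower-level variant: the caller chooses the gfp mask. */
static pmd_t *__pmd_alloc_one(struct mm_struct *mm, unsigned long addr,
			      gfp_t flags)
{
	(void)mm;
	(void)addr;
	return mock_get_zeroed_page(flags);
}

/* Existing interface: unchanged behavior for all other callers. */
static pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
	return __pmd_alloc_one(mm, addr, GFP_KERNEL | __GFP_REPEAT);
}

/* Hypothetical vmalloc-side helper: forwards the mask it was given. */
static pmd_t *vmap_pmd_alloc(struct mm_struct *mm, unsigned long addr,
			     gfp_t gfp_mask)
{
	return __pmd_alloc_one(mm, addr, gfp_mask);
}
```

With this, vmap_pmd_alloc(mm, addr, GFP_NOFS) reaches the allocator
without __GFP_FS set, while callers of pmd_alloc_one() still get
GFP_KERNEL|__GFP_REPEAT as before.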

I only mention the second option because passing GFP_NOFS to __vmalloc()
for sufficiently large sizes has a much higher probability of failing if
you're running into issues where GFP_KERNEL is causing synchronous
reclaim.  We may not be able to do any better in the contexts in which
gfs2, ntfs, and ceph use it without some sort of preallocation at an
earlier time, but the likelihood of those allocations failing is much
higher than for the typical vmalloc() that tries really hard with
__GFP_REPEAT to allocate the memory required.

