Re: [PATCH 0/2] Move away from non-failing small allocations

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@suse.cz, akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, david@fromorbit.com, mgorman@suse.de,
	riel@redhat.com, fengguang.wu@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] Move away from non-failing small allocations
Date: Tue, 17 Mar 2015 23:06:34 +0900	[thread overview]
Message-ID: <201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20150317090738.GB28112@dhcp22.suse.cz>

Michal Hocko wrote:
> On Mon 16-03-15 15:38:43, Andrew Morton wrote:
> > Realistically, I don't think this overall effort will be successful -
> > we'll add the knob, it won't get enough testing and any attempt to
> > alter the default will be us deliberately destabilizing the kernel
> > without knowing how badly :(
> 
> Without the knob we do not allow users to test this at all though and
> the transition will _never_ happen. Which is IMHO bad.
> 

Even with the knob, quite little users will test this. The consequence is
likely that end users rush into customer support center about obscure bugs.
I'm working at a support center, and such bugs are really annoying.

> > I wonder if we can alter the behaviour only for filesystem code, so we
> > constrain the new behaviour just to that code where we're having
> > problems.  Most/all fs code goes via vfs methods so there's a reasonably
> > small set of places where we can call
> 
> We are seeing issues with the fs code now because the test cases which
> led to the current discussion exercise FS code. The code which does
> lock(); kmalloc(GFP_KERNEL) is not reduced there though. I am pretty sure
> we can find other subsystems if we try hard enough.

I'm expecting for patches which avoids deadlock by lock(); kmalloc(GFP_KERNEL).

> > static inline void enter_fs_code(struct super_block *sb)
> > {
> > 	if (sb->my_small_allocations_can_fail)
> > 		current->small_allocations_can_fail++;
> > }
> > 
> > that way (or something similar) we can select the behaviour on a per-fs
> > basis and the rest of the kernel remains unaffected.  Other subsystems
> > can opt in as well.
> 
> This is basically leading to GFP_MAYFAIL which is completely backwards
> (the hard requirement should be an exception not a default rule).
> I really do not want to end up with stuffing random may_fail annotations
> all over the kernel.
> 

I wish that GFP_NOFS / GFP_NOIO regions are annotated with

  static inline void enter_fs_code(void)
  {
  #ifdef CONFIG_DEBUG_GFP_FLAGS
  	current->in_fs_code++;
  #endif
  }

  static inline void leave_fs_code(void)
  {
  #ifdef CONFIG_DEBUG_GFP_FLAGS
  	current->in_fs_code--;
  #endif
  }

  static inline void enter_io_code(void)
  {
  #ifdef CONFIG_DEBUG_GFP_FLAGS
  	current->in_io_code++;
  #endif
  }

  static inline void leave_io_code(void)
  {
  #ifdef CONFIG_DEBUG_GFP_FLAGS
  	current->in_io_code--;
  #endif
  }

so that inappropriate GFP_KERNEL usage inside GFP_NOFS region are catchable
by doing

  struct page *
  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
                          struct zonelist *zonelist, nodemask_t *nodemask)
  {
  	struct zoneref *preferred_zoneref;
  	struct page *page = NULL;
  	unsigned int cpuset_mems_cookie;
  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
  	struct alloc_context ac = {
  		.high_zoneidx = gfp_zone(gfp_mask),
  		.nodemask = nodemask,
  		.migratetype = gfpflags_to_migratetype(gfp_mask),
  	};
  	
  	gfp_mask &= gfp_allowed_mask;
 +#ifdef CONFIG_DEBUG_GFP_FLAGS
 +	WARN_ON(current->in_fs_code & (gfp_mask & __GFP_FS));
 +	WARN_ON(current->in_io_code & (gfp_mask & __GFP_IO));
 +#endif
  
  	lockdep_trace_alloc(gfp_mask);
  

. It is difficult for non-fs developers to determine whether they need to use
GFP_NOFS than GFP_KERNEL in their code. An example is seen at
http://marc.info/?l=linux-security-module&m=138556479607024&w=2 .

Moreover, I don't know how GFP flags are managed when stacked like
"a swap file on ext4 on top of LVM (with snapshots) on a RAID array
connected over iSCSI" (quoted from comments on Jon's writeup), but I
wish that the distinction between GFP_KERNEL / GFP_NOFS / GFP_NOIO
are removed from memory allocating function callers by doing

  static inline void enter_fs_code(void)
  {
  	current->in_fs_code++;
  }

  static inline void leave_fs_code(void)
  {
  	current->in_fs_code--;
  }

  static inline void enter_io_code(void)
  {
  	current->in_io_code++;
  }

  static inline void leave_io_code(void)
  {
  	current->in_io_code--;
  }

  struct page *
  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
                          struct zonelist *zonelist, nodemask_t *nodemask)
  {
  	struct zoneref *preferred_zoneref;
  	struct page *page = NULL;
  	unsigned int cpuset_mems_cookie;
  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
  	struct alloc_context ac = {
  		.high_zoneidx = gfp_zone(gfp_mask),
  		.nodemask = nodemask,
  		.migratetype = gfpflags_to_migratetype(gfp_mask),
  	};
  	
  	gfp_mask &= gfp_allowed_mask;
 +	if (current->in_fs_code)
 +		gfp_mask &= ~__GFP_FS;
 +	if (current->in_io_code)
 +		gfp_mask &= ~__GFP_IO;
  
  	lockdep_trace_alloc(gfp_mask);
  

so that GFP flags passed to memory allocations involved by stacking
will be appropriately masked.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-03-17 14:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-11 20:54 Michal Hocko
2015-03-11 20:54 ` [PATCH 1/2] mm: Allow small allocations to fail Michal Hocko
2015-03-12 12:54   ` Tetsuo Handa
2015-03-12 13:12     ` Michal Hocko
2015-03-15  5:43   ` Tetsuo Handa
2015-03-15 12:13     ` Michal Hocko
2015-03-15 13:06       ` Tetsuo Handa
2015-03-16  7:46         ` [PATCH 1/2 v2] " Michal Hocko
2015-03-16 21:11           ` Johannes Weiner
2015-03-17 10:25             ` Michal Hocko
2015-03-17 13:29               ` Johannes Weiner
2015-03-17 14:17                 ` Michal Hocko
2015-03-17 17:26                   ` Johannes Weiner
2015-03-17 19:41                     ` Michal Hocko
2015-03-18  9:10                       ` Vlastimil Babka
2015-03-18 12:04                         ` Michal Hocko
2015-03-18 12:36                         ` Tetsuo Handa
2015-03-18 11:35                       ` Tetsuo Handa
2015-03-17 11:13           ` Tetsuo Handa
2015-03-17 13:15             ` Michal Hocko
2015-03-18 11:33               ` Tetsuo Handa
2015-03-18 12:23                 ` Michal Hocko
2015-03-19 11:03                   ` Tetsuo Handa
2015-03-11 20:54 ` [PATCH 2/2] mmotm: Enable small allocation " Michal Hocko
2015-03-11 22:36 ` [PATCH 0/2] Move away from non-failing small allocations Sasha Levin
2015-03-16 22:38 ` Andrew Morton
2015-03-17  9:07   ` Michal Hocko
2015-03-17 14:06     ` Tetsuo Handa [this message]
2015-04-02 11:53       ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox