From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@suse.cz, akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, david@fromorbit.com, mgorman@suse.de,
riel@redhat.com, fengguang.wu@intel.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] Move away from non-failing small allocations
Date: Tue, 17 Mar 2015 23:06:34 +0900 [thread overview]
Message-ID: <201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20150317090738.GB28112@dhcp22.suse.cz>
Michal Hocko wrote:
> On Mon 16-03-15 15:38:43, Andrew Morton wrote:
> > Realistically, I don't think this overall effort will be successful -
> > we'll add the knob, it won't get enough testing and any attempt to
> > alter the default will be us deliberately destabilizing the kernel
> > without knowing how badly :(
>
> Without the knob we do not allow users to test this at all though and
> the transition will _never_ happen. Which is IMHO bad.
>
Even with the knob, quite little users will test this. The consequence is
likely that end users rush into customer support center about obscure bugs.
I'm working at a support center, and such bugs are really annoying.
> > I wonder if we can alter the behaviour only for filesystem code, so we
> > constrain the new behaviour just to that code where we're having
> > problems. Most/all fs code goes via vfs methods so there's a reasonably
> > small set of places where we can call
>
> We are seeing issues with the fs code now because the test cases which
> led to the current discussion exercise FS code. The code which does
> lock(); kmalloc(GFP_KERNEL) is not reduced there though. I am pretty sure
> we can find other subsystems if we try hard enough.
I'm expecting for patches which avoids deadlock by lock(); kmalloc(GFP_KERNEL).
> > static inline void enter_fs_code(struct super_block *sb)
> > {
> > if (sb->my_small_allocations_can_fail)
> > current->small_allocations_can_fail++;
> > }
> >
> > that way (or something similar) we can select the behaviour on a per-fs
> > basis and the rest of the kernel remains unaffected. Other subsystems
> > can opt in as well.
>
> This is basically leading to GFP_MAYFAIL which is completely backwards
> (the hard requirement should be an exception not a default rule).
> I really do not want to end up with stuffing random may_fail annotations
> all over the kernel.
>
I wish that GFP_NOFS / GFP_NOIO regions are annotated with
static inline void enter_fs_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
current->in_fs_code++;
#endif
}
static inline void leave_fs_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
current->in_fs_code--;
#endif
}
static inline void enter_io_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
current->in_io_code++;
#endif
}
static inline void leave_io_code(void)
{
#ifdef CONFIG_DEBUG_GFP_FLAGS
current->in_io_code--;
#endif
}
so that inappropriate GFP_KERNEL usage inside GFP_NOFS region are catchable
by doing
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, nodemask_t *nodemask)
{
struct zoneref *preferred_zoneref;
struct page *page = NULL;
unsigned int cpuset_mems_cookie;
int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = {
.high_zoneidx = gfp_zone(gfp_mask),
.nodemask = nodemask,
.migratetype = gfpflags_to_migratetype(gfp_mask),
};
gfp_mask &= gfp_allowed_mask;
+#ifdef CONFIG_DEBUG_GFP_FLAGS
+ WARN_ON(current->in_fs_code & (gfp_mask & __GFP_FS));
+ WARN_ON(current->in_io_code & (gfp_mask & __GFP_IO));
+#endif
lockdep_trace_alloc(gfp_mask);
. It is difficult for non-fs developers to determine whether they need to use
GFP_NOFS than GFP_KERNEL in their code. An example is seen at
http://marc.info/?l=linux-security-module&m=138556479607024&w=2 .
Moreover, I don't know how GFP flags are managed when stacked like
"a swap file on ext4 on top of LVM (with snapshots) on a RAID array
connected over iSCSI" (quoted from comments on Jon's writeup), but I
wish that the distinction between GFP_KERNEL / GFP_NOFS / GFP_NOIO
are removed from memory allocating function callers by doing
static inline void enter_fs_code(void)
{
current->in_fs_code++;
}
static inline void leave_fs_code(void)
{
current->in_fs_code--;
}
static inline void enter_io_code(void)
{
current->in_io_code++;
}
static inline void leave_io_code(void)
{
current->in_io_code--;
}
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, nodemask_t *nodemask)
{
struct zoneref *preferred_zoneref;
struct page *page = NULL;
unsigned int cpuset_mems_cookie;
int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = {
.high_zoneidx = gfp_zone(gfp_mask),
.nodemask = nodemask,
.migratetype = gfpflags_to_migratetype(gfp_mask),
};
gfp_mask &= gfp_allowed_mask;
+ if (current->in_fs_code)
+ gfp_mask &= ~__GFP_FS;
+ if (current->in_io_code)
+ gfp_mask &= ~__GFP_IO;
lockdep_trace_alloc(gfp_mask);
so that GFP flags passed to memory allocations involved by stacking
will be appropriately masked.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-03-17 14:06 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-11 20:54 Michal Hocko
2015-03-11 20:54 ` [PATCH 1/2] mm: Allow small allocations to fail Michal Hocko
2015-03-12 12:54 ` Tetsuo Handa
2015-03-12 13:12 ` Michal Hocko
2015-03-15 5:43 ` Tetsuo Handa
2015-03-15 12:13 ` Michal Hocko
2015-03-15 13:06 ` Tetsuo Handa
2015-03-16 7:46 ` [PATCH 1/2 v2] " Michal Hocko
2015-03-16 21:11 ` Johannes Weiner
2015-03-17 10:25 ` Michal Hocko
2015-03-17 13:29 ` Johannes Weiner
2015-03-17 14:17 ` Michal Hocko
2015-03-17 17:26 ` Johannes Weiner
2015-03-17 19:41 ` Michal Hocko
2015-03-18 9:10 ` Vlastimil Babka
2015-03-18 12:04 ` Michal Hocko
2015-03-18 12:36 ` Tetsuo Handa
2015-03-18 11:35 ` Tetsuo Handa
2015-03-17 11:13 ` Tetsuo Handa
2015-03-17 13:15 ` Michal Hocko
2015-03-18 11:33 ` Tetsuo Handa
2015-03-18 12:23 ` Michal Hocko
2015-03-19 11:03 ` Tetsuo Handa
2015-03-11 20:54 ` [PATCH 2/2] mmotm: Enable small allocation " Michal Hocko
2015-03-11 22:36 ` [PATCH 0/2] Move away from non-failing small allocations Sasha Levin
2015-03-16 22:38 ` Andrew Morton
2015-03-17 9:07 ` Michal Hocko
2015-03-17 14:06 ` Tetsuo Handa [this message]
2015-04-02 11:53 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp \
--to=penguin-kernel@i-love.sakura.ne.jp \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=fengguang.wu@intel.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox