Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>,
	Xishi Qiu <qiuxishi@huawei.com>, Michal Hocko <mhocko@kernel.org>,
	Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Yisheng Xie <xieyisheng1@huawei.com>
Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
Date: Mon, 1 Aug 2016 14:30:51 +0900	[thread overview]
Message-ID: <20160801053051.GA8623@js1304-P5Q-DELUXE> (raw)
In-Reply-To: <CALCETrV++O=ynMKYwdhG-AksnVXX6hBpBxtXfNaa_dhVLMu2Tg@mail.gmail.com>

On Fri, Jul 29, 2016 at 12:47:38PM -0700, Andy Lutomirski wrote:
> ---------- Forwarded message ----------
> From: "Joonsoo Kim" <iamjoonsoo.kim@lge.com>
> Date: Jul 28, 2016 7:57 PM
> Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
> To: "Andy Lutomirski" <luto@kernel.org>
> Cc: "Xishi Qiu" <qiuxishi@huawei.com>, "Michal Hocko"
> <mhocko@kernel.org>, "Tejun Heo" <tj@kernel.org>, "Ingo Molnar"
> <mingo@kernel.org>, "Peter Zijlstra" <peterz@infradead.org>, "LKML"
> <linux-kernel@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>,
> "Yisheng Xie" <xieyisheng1@huawei.com>
> 
> > On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote:
> > > On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu <qiuxishi@huawei.com> wrote:
> > > > On 2016/7/28 17:43, Michal Hocko wrote:
> > > >
> > > >> On Thu 28-07-16 16:45:06, Xishi Qiu wrote:
> > > >>> On 2016/7/28 15:58, Michal Hocko wrote:
> > > >>>
> > > >>>> On Thu 28-07-16 15:41:53, Xishi Qiu wrote:
> > > >>>>> On 2016/7/28 15:20, Michal Hocko wrote:
> > > >>>>>
> > > >>>>>> On Thu 28-07-16 15:08:26, Xishi Qiu wrote:
> > > >>>>>>> Usually THREAD_SIZE_ORDER is 2, it means we need to alloc 16kb continuous
> > > >>>>>>> physical memory during fork a new process.
> > > >>>>>>>
> > > >>>>>>> If the system's memory is very small, especially the smart phone, maybe there
> > > >>>>>>> is only 1G memory. So the free memory is very small and compaction is not
> > > >>>>>>> always success in slowpath(__alloc_pages_slowpath), then alloc thread stack
> > > >>>>>>> may be failed for memory fragment.
> > > >>>>>>
> > > >>>>>> Well, with the current implementation of the page allocator those
> > > >>>>>> requests will not fail in most cases. The oom killer would be invoked in
> > > >>>>>> order to free up some memory.
> > > >>>>>>
> > > >>>>>
> > > >>>>> Hi Michal,
> > > >>>>>
> > > >>>>> Yes, it success in most cases, but I did have seen this problem in some
> > > >>>>> stress-test.
> > > >>>>>
> > > >>>>> DMA free:470628kB, but alloc 2 order block failed during fork a new process.
> > > >>>>> There are so many memory fragments and the large block may be soon taken by
> > > >>>>> others after compact because of stress-test.
> > > >>>>>
> > > >>>>> --- dmesg messages ---
> > > >>>>> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1
> > > >>>>
> > > >>>> Yes but this is __GFP_DMA allocation. I guess you have already reported
> > > >>>> this failure and you've been told that this is quite unexpected for the
> > > >>>> kernel stack allocation. It is your out-of-tree patch which just makes
> > > >>>> things worse because DMA restricted allocations are considered "lowmem"
> > > >>>> and so they do not invoke OOM killer and do not retry like regular
> > > >>>> GFP_KERNEL allocations.
> > > >>>
> > > >>> Hi Michal,
> > > >>>
> > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the problem.
> > > >>
> > > >> You are restricting the allocation request to a single zone which is
> > > >> definitely not good. Look at how many larger order pages are available
> > > >> in the Normal zone.
> > > >>
> > > >>> If we do oom-killer, maybe we will get a large block later, but there
> > > >>> is enough free memory before oom(although most of them are fragments).
> > > >>
> > > >> Killing a task is of course the last resort action. It would give you
> > > >> larger order blocks used for the victims thread.
> > > >>
> > > >>> I wonder if we can alloc success without kill any process in this situation.
> > > >>
> > > >> Sure it would be preferable to compact that memory but that might be
> > > >> hard with your restriction in place. Consider that DMA zone would tend
> > > >> to be less movable than normal zones as users would have to pin it for
> > > >> DMA. Your DMA is really large so this might turn out to just happen to
> > > >> work but note that the primary problem here is that you put a zone
> > > >> restriction for your allocations.
> > > >>
> > > >>> Maybe use vmalloc is a good way, but I don't know the influence.
> > > >>
> > > >> You can have a look at vmalloc patches posted by Andy. They are not that
> > > >> trivial.
> > > >>
> > > >
> > > > Hi Michal,
> > > >
> > > > Thank you for your comment, could you give me the link?
> > > >
> > >
> > > I've been keeping it mostly up to date in this branch:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack
> > >
> > > It's currently out of sync due to a bunch of the patches being queued
> > > elsewhere for the merge window.
> >
> > Hello, Andy.
> >
> > I have some questions about it.
> >
> > IIUC, to turn on HAVE_ARCH_VMAP_STACK on different architecture, there
> > is nothing to be done in architecture side if the architecture doesn't
> > support lazily faults in top-level paging entries for the vmalloc
> > area. Is my understanding is correct?
> >
> 
> There should be nothing fundamental that needs to be done.  On the
> other hand, it might be good to make sure the arch code can print a
> clean stack trace on stack overflow.
> 
> If it's helpful, I just pushed out anew

You mean that you can turn on HAVE_ARCH_VMAP_STACK on the other arch? It
would be helpful. :)

> 
> > And, I'd like to know how you search problematic places using kernel
> > stack for DMA.
> >
> 
> I did some searching for problematic sg_init_buf calls using
> Coccinelle.  I'm not very good at Coccinelle, so I may have missed
> something.

I'm also not familiar with Coccinelle. Could you share your .cocci
script? I can think of following one but there would be a better way.

virtual report

@stack_var depends on report@
type T1;
expression E1, E2;
identifier I1;
@@
(
* T1 I1;
)
...
(
* sg_init_one(E1, &I1, E2)
|
* sg_set_buf(E1, &I1, E2)
)

@stack_arr depends on report@
type T1;
expression E1, E2, E3;
identifier I1;
@@
(
* T1 I1[E1];
)
...
(
* sg_init_one(E2, I1, E3)
|
* sg_set_buf(E2, I1, E3)
)


> For the most part, DMA API debugging should have found the problems
> already.  The ones I found were in drivers that didn't do real DMA:
> crypto users and virtio.

Ah... using stack for DMA API is already prohibited.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2016-08-01  5:25 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-28  7:08 Xishi Qiu
2016-07-28  7:20 ` Michal Hocko
2016-07-28  7:41   ` Xishi Qiu
2016-07-28  7:58     ` Michal Hocko
2016-07-28  8:45       ` Xishi Qiu
2016-07-28  9:43         ` Michal Hocko
2016-07-28 10:51           ` Xishi Qiu
2016-07-28 15:07             ` Andy Lutomirski
2016-07-29  3:01               ` Joonsoo Kim
2016-07-29 19:47                 ` Andy Lutomirski
2016-08-01  5:30                   ` Joonsoo Kim [this message]
2016-08-10 11:59                     ` Andy Lutomirski
2016-08-16  4:18                       ` Joonsoo Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160801053051.GA8623@js1304-P5Q-DELUXE \
    --to=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qiuxishi@huawei.com \
    --cc=tj@kernel.org \
    --cc=xieyisheng1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox