linux-mm.kvack.org archive mirror
From: 陈宗志 <baotiao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Vlastimil Babka <vbabka@suse.cz>, linux-mm@kvack.org
Subject: Re: why the kmalloc return fail when there is free physical address but return success after dropping page caches
Date: Wed, 25 May 2016 17:25:05 +0800	[thread overview]
Message-ID: <CAGbZs7j=c=eRYFGvpv5NRhKs16Vq-cQTcbTZTKa4xKP4QGRuzQ@mail.gmail.com> (raw)
In-Reply-To: <20160518144148.GD21200@dastard>


Hi Dave

> >> The machine's status is described below:
> >>
> >> The machine has 96G of physical memory. Actual memory use is about
> >> 64G, and the page cache uses about 32G. We also use swap; at that
> >> time about 10G of it was in use (we set the maximum swap size to
> >> 32G). At that moment, we found XFS reporting:
> >>
> >> |Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> >> deadlock in kmem_alloc (mode:0x250) |

> Pretty sure that's a GFP_NOFS allocation context.

You are right, it is a GFP_NOFS allocation from XFS; XFS uses the GFP_NOFS
flag to avoid recursing back into the filesystem during reclaim.


> > Just once, or many times?
>
> The message appears many times.
> From the code, I see that XFS keeps retrying kmalloc(), warning every 100 attempts.

> The current upstream kernels report much more information - process,
> size of allocation, etc.
>
> In general, the cause of such problems is memory fragmentation
> preventing a large contiguous allocation from taking place (e.g.
> when you try to read a file with millions of extents).
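To see why plenty of free memory does not help here, I wrote a toy model of the buddy allocator's order math (illustrative only, not kernel code): an allocation must be served from a single free block of sufficient order, no matter how much total memory is free in smaller pieces.

```python
# Toy model of buddy-allocator order math (illustrative, not kernel code).
# An allocation needs one free block of order >= the requested order;
# total free memory is irrelevant if it is all in small fragments.

PAGE_SIZE = 4096  # bytes, typical x86-64 page size

def order_for(nbytes):
    """Smallest buddy order whose block (2^order pages) holds nbytes."""
    pages = -(-nbytes // PAGE_SIZE)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order

def can_allocate(order, free_counts):
    """True if any free block of the needed order or higher exists
    (higher-order blocks can be split to satisfy the request)."""
    return any(free_counts[o] for o in range(order, len(free_counts)))

def total_free_bytes(free_counts):
    return sum(c * (1 << o) * PAGE_SIZE for o, c in enumerate(free_counts))

# A badly fragmented zone: ~50 MB free, but nothing at order >= 4.
fragmented = [9000, 1200, 300, 40, 0, 0, 0, 0, 0, 0, 0]
order = order_for(64 * 1024)            # a 64 KiB contiguous request
print(order)                            # -> 4 (16 pages)
print(can_allocate(order, fragmented))  # -> False, despite the free memory
print(total_free_bytes(fragmented) // (1 << 20), "MB free")
```

So a multi-page kmalloc() can fail indefinitely in exactly the situation the buddyinfo dumps below show: large order-0/1 counts, empty high orders.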

> >> in the system. But there is still 32G of page cache.
> >>
> >> So I run
> >>
> >> |echo 3 > /proc/sys/vm/drop_caches |
> >>
> >> to drop the page cache.
> >>
> >> Then the system is fine.
> >
> > Are you saying that the error message was repeated infinitely until
> > you did the drop_caches?
>
>
> No, the error message doesn't appear after I run drop_caches.


Yes, you are right. Before I ran echo 3 > /proc/sys/vm/drop_caches,
/proc/buddyinfo looked like this:

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32   2983   2230   1037    290    121     63     47     61     16      0      0
Node 0, zone   Normal  13707   1126    285    268    291    160     64     21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61     22      0      0      0


After that operation, /proc/buddyinfo looked like this:

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32  61091  22791   3659    348    169     81     89     63     16      0      0
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922    799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843    447    232      2      0


We can see that after the operation there are many more free blocks at every order, including the large ones.
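The two snapshots can be compared quantitatively. A small sketch (Node 0, Normal zone, counts copied from the dumps above) computing how much free memory sits in blocks of order >= 4, i.e. blocks big enough for a 64 KiB contiguous allocation:

```python
# Quantifying the before/after /proc/buddyinfo dumps for Node 0's
# Normal zone: columns are free-block counts per buddy order 0..10.

PAGE_SIZE = 4096

before = [13707, 1126, 285, 268, 291, 160, 64, 21, 11, 0, 0]
after  = [781723, 532596, 246195, 57076, 9853, 4061, 1922, 799, 217, 19, 0]

def free_mb(counts, min_order=0):
    """MB of free memory held in blocks of at least min_order."""
    pages = sum(c * (1 << o) for o, c in enumerate(counts) if o >= min_order)
    return pages * PAGE_SIZE / (1 << 20)

print(f"before: {free_mb(before, 4):.0f} MB in order>=4 blocks")  # -> 76 MB
print(f"after:  {free_mb(after, 4):.0f} MB in order>=4 blocks")   # -> 2258 MB
```

So dropping the caches grew the pool of high-order blocks in that zone by roughly a factor of 30, which is why the XFS warning stopped.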

Besides /proc/buddyinfo, is there any other command to get memory
fragmentation info?

And besides the drop_caches operation, is there any other way to avoid
memory fragmentation?




> IIRC, the reason the system can't recover itself is that memory
> compaction is not triggered from GFP_NOFS allocation context, which
> means memory reclaim won't try to create contiguous regions by
> moving things around and hence the allocation will not succeed until
> a significant amount of memory is freed by some other trigger....


If GFP_NOFS allocations do not trigger memory compaction, where can I
find that logic in the kernel source code?

Thank you.

On Wed, May 18, 2016 at 10:41 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, May 18, 2016 at 04:58:31PM +0800, baotiao wrote:
> > Thanks for your reply
> >
> > >> Hello everyone, I met an interesting kernel memory problem. Can anyone
> > >> help me explain what happened in the kernel?
> > >
> > > Which kernel version is that?
> >
> > The kernel version is 3.10.0-327.4.5.el7.x86_64
>
> RHEL7 kernel. Best you report the problem to your RH support
> contact - the RHEL7 kernels are far different to upstream kernels..
>
> [...]
>
> Of course - freeing memory will cause contiguous free space to
> reform. Then the allocation will succeed.
>
> IIRC, the reason the system can't recover itself is that memory
> compaction is not triggered from GFP_NOFS allocation context, which
> means memory reclaim won't try to create contiguous regions by
> moving things around and hence the allocation will not succeed until
> a significant amount of memory is freed by some other trigger....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
---
Blog: http://www.chenzongzhi.info
Twitter: https://twitter.com/baotiao
Git: https://github.com/baotiao


Thread overview: 5+ messages
2016-05-18  2:38 baotiao
2016-05-18  8:45 ` Vlastimil Babka
2016-05-18  8:58   ` baotiao
2016-05-18 14:41     ` Dave Chinner
2016-05-25  9:25       ` 陈宗志 [this message]