Hi Dave

> >> The machine's status is describe as blow:
> >>
> >> the machine has 96 physical memory. And the real use memory is about
> >> 64G, and the page cache use about 32G. we also use the swap area, at
> >> that time we have about 10G(we set the swap max size to 32G). At that
> >> moment, we find xfs report
> >>
> >> |Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> >> deadlock in kmem_alloc (mode:0x250) |

Pretty sure that's a GFP_NOFS allocation context.

You are right, it is a GFP_NOFS operator from the xfs,  xfs use GFP_NOFS
flag to avoid recursive filesystem call


> > Just once, or many times?
>
> the message appear many times
> from the code, I know that xfs will try 100 time of kmalloc() function

The curent upstream kernels report much more information - process,
size of allocation, etc.

In general, the cause of such problems is memory fragmentation
preventing a large contiguous allocation from taking place (e.g.
when you try to read a file with millions of extents).

> >> in the system. But there is still 32G page cache.
> >>
> >> So I run
> >>
> >> |echo 3 > /proc/sys/vm/drop_caches |
> >>
> >> to drop the page cache.
> >>
> >> Then the system is fine.
> >
> > Are you saying that the error message was repeated infinitely until you
did the drop_caches?
>
>
> No. the error message don't appear after I drop_cache.


Yes, you are right, before I echo 3 > /proc/sys/vm/drop_caches, the
/proc/buddyinfo is list blow:
Node 0, zone      DMA      0      0      0      1      2      1      1
0      1      1      3
Node 0, zone    DMA32   2983   2230   1037    290    121     63     47
61     16      0      0
Node 0, zone   Normal  13707   1126    285    268    291    160     64
21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61
22      0      0      0


after the operator the /proc/buddyinfo is list blow:
Node 0, zone      DMA      0      0      0      1      2      1      1
0      1      1      3
Node 0, zone    DMA32  61091  22791   3659    348    169     81     89
63     16      0      0
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922
799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843
447    232      2      0


we can find that after the operator, we get more large size pages

beside the /proc/buddyinfo, is there any other command the get the memory
fragmentation info?

And beside the drop_caches operator, is there any other command can avoid
the memory fragmentation?


IIRC, the reason the system can't recover itself is that memory
compaction is not triggered from GFP_NOFS allocation context, which
means memory reclaim won't try to create contiguous regions by
moving things around and hence the allocation will not succeed until
a significant amount of memory is freed by some other trigger....


The GFP_NOFS will not triggered memory compaction, where can I find the
logic in kernel source code?

thank you

On Wed, May 18, 2016 at 10:41 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, May 18, 2016 at 04:58:31PM +0800, baotiao wrote:
> > Thanks for your reply
> >
> > >> Hello every, I meet an interesting kernel memory problem. Can anyone
> > >> help me explain what happen under the kernel
> > >
> > > Which kernel version is that?
> >
> > The kernel version is 3.10.0-327.4.5.el7.x86_64
>
> RHEL7 kernel. Best you report the problem to your RH support
> contact - the RHEL7 kernels are far different to upstream kernels..
>
> > >> The machine's status is describe as blow:
> > >>
> > >> the machine has 96 physical memory. And the real use memory is about
> > >> 64G, and the page cache use about 32G. we also use the swap area, at
> > >> that time we have about 10G(we set the swap max size to 32G). At that
> > >> moment, we find xfs report
> > >>
> > >> |Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> > >> deadlock in kmem_alloc (mode:0x250) |
>
> Pretty sure that's a GFP_NOFS allocation context.
>
> > > Just once, or many times?
> >
> > the message appear many times
> > from the code, I know that xfs will try 100 time of kmalloc() function
>
> The curent upstream kernels report much more information - process,
> size of allocation, etc.
>
> In general, the cause of such problems is memory fragmentation
> preventing a large contiguous allocation from taking place (e.g.
> when you try to read a file with millions of extents).
>
> > >> in the system. But there is still 32G page cache.
> > >>
> > >> So I run
> > >>
> > >> |echo 3 > /proc/sys/vm/drop_caches |
> > >>
> > >> to drop the page cache.
> > >>
> > >> Then the system is fine.
> > >
> > > Are you saying that the error message was repeated infinitely until
> you did the drop_caches?
> >
> >
> > No. the error message don't appear after I drop_cache.
>
> Of course - freeing memory will cause contiguous free space to
> reform. then the allocation will succeed.
>
> IIRC, the reason the system can't recover itself is that memory
> compaction is not triggered from GFP_NOFS allocation context, which
> means memory reclaim won't try to create contiguous regions by
> moving things around and hence the allocation will not succeed until
> a significant amount of memory is freed by some other trigger....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>


-- 
---
Blog: http://www.chenzongzhi.info
Twitter: https://twitter.com/baotiao <https://twitter.com/#%21/baotiao>
Git: https://github.com/baotiao