Hi Dave

> >> The machine's status is described below:
> >>
> >> the machine has 96G of physical memory. The memory actually in use is about
> >> 64G, and the page cache uses about 32G. We also use swap; at
> >> that time we had about 10G in it (we set the maximum swap size to 32G). At that
> >> moment, we found that xfs reported
> >>
> >> Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> >> deadlock in kmem_alloc (mode:0x250)
>
> Pretty sure that's a GFP_NOFS allocation context.

You are right, it is a GFP_NOFS allocation from xfs; xfs uses the GFP_NOFS
flag to avoid recursing back into the filesystem.

> > Just once, or many times?
> >
> > the message appears many times
> > from the code, I know that xfs retries kmalloc() 100 times before
> > printing this message
>
> The current upstream kernels report much more information - process,
> size of allocation, etc.
>
> In general, the cause of such problems is memory fragmentation
> preventing a large contiguous allocation from taking place (e.g.
> when you try to read a file with millions of extents).
>
> >> in the system. But there is still 32G page cache.
> >>
> >> So I ran
> >>
> >> echo 3 > /proc/sys/vm/drop_caches
> >>
> >> to drop the page cache.
> >>
> >> Then the system was fine.
> >
> > Are you saying that the error message was repeated infinitely until
> > you did the drop_caches?
>
> No. The error message doesn't appear after I drop the caches.
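A minimal sketch of how the mode:0x250 in the XFS message decodes, assuming the gfp_t bit values from include/linux/gfp.h in 3.10-era kernels (RHEL7 may carry changes, and later upstream kernels renumbered these bits). Under that assumption 0x250 is __GFP_WAIT | __GFP_IO | __GFP_NOWARN with __GFP_FS clear, i.e. a GFP_NOFS context. The helper compaction_allowed() is hypothetical: it models the early exit assumed to sit at the top of try_to_compact_pages() in 3.10-era mm/compaction.c (no compaction for order-0 requests or when __GFP_FS or __GFP_IO is clear), to illustrate the point about compaction made in this thread; check it against the real kernel tree.

```python
# Hypothetical helper, not from the thread: decode an XFS "mode:0x..." value.
# ASSUMPTION: gfp_t bit values as in include/linux/gfp.h of 3.10-era kernels;
# later upstream kernels renumbered these bits, so this table is not portable.
GFP_BITS_3_10 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
}

GFP_NOFS = 0x10 | 0x40        # __GFP_WAIT | __GFP_IO
GFP_KERNEL = GFP_NOFS | 0x80  # __GFP_WAIT | __GFP_IO | __GFP_FS


def decode_gfp(mode: int) -> list[str]:
    """Names of the known flag bits set in `mode`, lowest bit first."""
    return [name for bit, name in sorted(GFP_BITS_3_10.items()) if mode & bit]


def is_nofs_context(mode: int) -> bool:
    """True if the reclaim-related bits match GFP_NOFS (i.e. no __GFP_FS)."""
    return (mode & GFP_KERNEL) == GFP_NOFS


def compaction_allowed(mode: int, order: int) -> bool:
    """Hypothetical model of the early exit assumed in 3.10-era
    try_to_compact_pages(): skip order-0 requests, and skip compaction
    whenever __GFP_FS or __GFP_IO is clear."""
    return order > 0 and bool(mode & 0x80) and bool(mode & 0x40)


if __name__ == "__main__":
    mode = 0x250  # value from the XFS message in this thread
    print(decode_gfp(mode))
    print("GFP_NOFS context:", is_nofs_context(mode))
    print("compaction for an order-3 request:", compaction_allowed(mode, 3))
```

Run as-is, it lists __GFP_WAIT, __GFP_IO and __GFP_NOWARN, reports a GFP_NOFS context, and reports that an order-3 request from this context would not be compacted, while the same request with GFP_KERNEL would be.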
Yes, you are right. Before I ran echo 3 > /proc/sys/vm/drop_caches, /proc/buddyinfo
looked like this (the columns are the free-block counts for orders 0 to 10):

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32   2983   2230   1037    290    121     63     47     61     16      0      0
Node 0, zone   Normal  13707   1126    285    268    291    160     64     21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61     22      0      0      0

After dropping the caches, /proc/buddyinfo looked like this:

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32  61091  22791   3659    348    169     81     89     63     16      0      0
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922    799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843    447    232      2      0

We can see that after dropping the caches there are many more free blocks
at the larger orders.

Besides /proc/buddyinfo, is there any other command to get memory
fragmentation information? And besides drop_caches, is there any other
way to avoid memory fragmentation?

> IIRC, the reason the system can't recover itself is that memory
> compaction is not triggered from GFP_NOFS allocation context, which
> means memory reclaim won't try to create contiguous regions by
> moving things around and hence the allocation will not succeed until
> a significant amount of memory is freed by some other trigger....

If GFP_NOFS does not trigger memory compaction, where can I find that
logic in the kernel source code?

thank you

On Wed, May 18, 2016 at 10:41 PM, Dave Chinner wrote:
> On Wed, May 18, 2016 at 04:58:31PM +0800, baotiao wrote:
> > Thanks for your reply
> >
> > >> Hello everyone, I have run into an interesting kernel memory problem.
> > >> Can anyone help me explain what is happening in the kernel?
> > >
> > > Which kernel version is that?
> >
> > The kernel version is 3.10.0-327.4.5.el7.x86_64
>
> RHEL7 kernel. Best you report the problem to your RH support
> contact - the RHEL7 kernels are far different to upstream kernels.
>
> > >> The machine's status is described below:
> > >>
> > >> the machine has 96G of physical memory. The memory actually in use is about
> > >> 64G, and the page cache uses about 32G.
> > >> We also use swap; at
> > >> that time we had about 10G in it (we set the maximum swap size to 32G). At that
> > >> moment, we found that xfs reported
> > >>
> > >> Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> > >> deadlock in kmem_alloc (mode:0x250)
> >
> > Pretty sure that's a GFP_NOFS allocation context.
> >
> > Just once, or many times?
> >
> > the message appears many times
> > from the code, I know that xfs retries kmalloc() 100 times before
> > printing this message
>
> The current upstream kernels report much more information - process,
> size of allocation, etc.
>
> In general, the cause of such problems is memory fragmentation
> preventing a large contiguous allocation from taking place (e.g.
> when you try to read a file with millions of extents).
>
> > >> in the system. But there is still 32G page cache.
> > >>
> > >> So I ran
> > >>
> > >> echo 3 > /proc/sys/vm/drop_caches
> > >>
> > >> to drop the page cache.
> > >>
> > >> Then the system was fine.
> > >
> > > Are you saying that the error message was repeated infinitely until
> > > you did the drop_caches?
> >
> > No. The error message doesn't appear after I drop the caches.
>
> Of course - freeing memory will cause contiguous free space to
> reform, then the allocation will succeed.
>
> IIRC, the reason the system can't recover itself is that memory
> compaction is not triggered from GFP_NOFS allocation context, which
> means memory reclaim won't try to create contiguous regions by
> moving things around and hence the allocation will not succeed until
> a significant amount of memory is freed by some other trigger....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

--
---
Blog: http://www.chenzongzhi.info
Twitter: https://twitter.com/baotiao
Git: https://github.com/baotiao
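To put numbers on the /proc/buddyinfo output quoted earlier in this message, here is a small sketch. The helper names are hypothetical, and it assumes 4 KiB base pages (as on x86_64). Each column after the zone name counts free blocks of 2^order pages, so the free memory held in large contiguous blocks can be summed per zone and compared before and after drop_caches.

```python
# Hypothetical helpers for reading /proc/buddyinfo-style text.
# ASSUMPTION: 4 KiB base pages (x86_64); each column after the zone name is
# the number of free blocks of 2**order pages, for order 0, 1, 2, ...
PAGE_KIB = 4


def parse_buddyinfo(text: str) -> dict[tuple[str, str], list[int]]:
    """Map (node, zone) to its per-order free-block counts."""
    zones = {}
    for line in text.strip().splitlines():
        head, _, rest = line.partition("zone")
        node = head.replace("Node", "").strip(" ,")
        fields = rest.split()
        zones[(node, fields[0])] = [int(x) for x in fields[1:]]
    return zones


def free_kib_at_or_above(counts: list[int], order: int) -> int:
    """Free memory (KiB) held in blocks of at least 2**order pages."""
    return sum(n * 2**o * PAGE_KIB for o, n in enumerate(counts) if o >= order)


def highest_free_order(counts: list[int]) -> int:
    """Largest order that still has a free block, or -1 if none."""
    return max((o for o, n in enumerate(counts) if n), default=-1)


# Normal-zone rows copied from the buddyinfo output in this thread.
BEFORE = """
Node 0, zone   Normal  13707   1126    285    268    291    160     64     21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61     22      0      0      0
"""
AFTER = """
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922    799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843    447    232      2      0
"""

if __name__ == "__main__":
    before, after = parse_buddyinfo(BEFORE), parse_buddyinfo(AFTER)
    for key in before:
        print(key,
              "highest order:", highest_free_order(before[key]), "->",
              highest_free_order(after[key]),
              "| KiB in order>=4 blocks:", free_kib_at_or_above(before[key], 4),
              "->", free_kib_at_or_above(after[key], 4))
```

For Node 0, zone Normal, the highest order with a free block moves from 8 to 9, and free memory held in blocks of order 4 (64 KiB) or larger grows from 77504 KiB to 2312640 KiB. Summing by order rather than counting blocks is a deliberate choice here, since a large allocation can only be satisfied from blocks at or above its order.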