From: Vlastimil Babka <vbabka@suse.cz>
To: Joonsoo Kim <js1304@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Nazarewicz <mina86@mina86.com>,
Minchan Kim <minchan@kernel.org>, Mel Gorman <mgorman@suse.de>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
Date: Wed, 18 Nov 2015 16:34:30 +0100
Message-ID: <564C9A86.1090906@suse.cz>
In-Reply-To: <1447053784-27811-2-git-send-email-iamjoonsoo.kim@lge.com>
On 11/09/2015 08:23 AM, Joonsoo Kim wrote:
> CMA allocation should be guaranteed to succeed by definition, but,
> unfortunately, it would be failed sometimes. It is hard to track down
> the problem, because it is related to page reference manipulation and
> we don't have any facility to analyze it.
Reminds me of PeterZ's VM_PINNED patchset. What happened to it?
https://lwn.net/Articles/600502/
> This patch adds tracepoints to track down page reference manipulation.
> With it, we can find exact reason of failure and can fix the problem.
> Following is an example of tracepoint output.
>
> <...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
> <...>-9018 [004] 92.678378: kernel_stack:
> => get_page_from_freelist (ffffffff81176659)
> => __alloc_pages_nodemask (ffffffff81176d22)
> => alloc_pages_vma (ffffffff811bf675)
> => handle_mm_fault (ffffffff8119e693)
> => __do_page_fault (ffffffff810631ea)
> => trace_do_page_fault (ffffffff81063543)
> => do_async_page_fault (ffffffff8105c40a)
> => async_page_fault (ffffffff817581d8)
> [snip]
> <...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
> [snip]
> ...
> ...
> <...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
> [snip]
> <...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
> => release_pages (ffffffff8117c9e4)
> => free_pages_and_swap_cache (ffffffff811b0697)
> => tlb_flush_mmu_free (ffffffff81199616)
> => tlb_finish_mmu (ffffffff8119a62c)
> => exit_mmap (ffffffff811a53f7)
> => mmput (ffffffff81073f47)
> => do_exit (ffffffff810794e9)
> => do_group_exit (ffffffff81079def)
> => SyS_exit_group (ffffffff81079e74)
> => entry_SYSCALL_64_fastpath (ffffffff817560b6)
>
> This output shows that problem comes from exit path. In exit path,
> to improve performance, pages are not freed immediately. They are gathered
> and processed by batch. During this process, migration cannot be possible
> and CMA allocation is failed. This problem is hard to find without this
> page reference tracepoint facility.
Yeah, but once you realized this was the problem, what was the fix? Presumably
not removing batching from the exit path? Shouldn't CMA in this case just try
waiting for the pins to go away, which would eventually happen? And for
long-term pins, VM_PINNED would make sure the pages are migrated away from CMA
pageblocks first?
So I'm worried that this is quite a nontrivial change for a very specific use
case.
> Enabling this feature bloat kernel text 20 KB in my configuration.
It's not just that, see below.
[...]
> static inline int page_ref_freeze(struct page *page, int count)
> {
> - return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
> + int ret = likely(atomic_cmpxchg(&page->_count, count, 0) == count);
The "likely" makes no sense anymore, does it? It now wraps a value that is
merely assigned, not a branch condition.
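To illustrate, here is a minimal user-space sketch, not the patch's code. It
assumes GCC or Clang for __builtin_expect and uses C11 atomics in place of the
kernel's atomic_cmpxchg(). The names fake_page, page_ref_freeze_sketch and
freeze_or_bail are hypothetical. The helper stores the result with no hint,
and the hint sits on the caller's branch that actually consumes it.

```c
#include <stdatomic.h>

/* User-space stand-in for the kernel's likely(); assumes GCC or Clang. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical stand-in for struct page; only the counter matters here. */
struct fake_page {
	atomic_int count;
};

/*
 * Freeze the count to 0 if it currently equals 'count'.
 * Unlike the kernel's atomic_cmpxchg(), which returns the old value,
 * C11 atomic_compare_exchange_strong() returns true on success.
 */
static int page_ref_freeze_sketch(struct fake_page *page, int count)
{
	int expected = count;
	/* Plain assignment: there is no branch here, so no hint is given. */
	int ret = atomic_compare_exchange_strong(&page->count, &expected, 0);

	/* A tracepoint call recording 'ret' would go here. */
	return ret;
}

/* Hypothetical caller: the hint annotates the branch that uses the result. */
static int freeze_or_bail(struct fake_page *page, int count)
{
	if (likely(page_ref_freeze_sketch(page, count)))
		return 0;
	return -1;
}
```

Whether the hint is worth keeping at the call sites is a separate question;
the sketch only shows where it would have an effect.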
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 957d3da..71d2399 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -28,3 +28,7 @@ config DEBUG_PAGEALLOC
>
> config PAGE_POISONING
> bool
> +
> +config DEBUG_PAGE_REF
> + bool "Enable tracepoint to track down page reference manipulation"
So you should probably state the costs: the extra memory, and also the fact
that all page ref manipulations are now turned into function calls, even if
the tracepoints are disabled. Patch 1 didn't change that many callsites, so
maybe it would be feasible to have the tracepoints inline, where being disabled
has near-zero overhead?
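Roughly what I have in mind, as a user-space sketch under stated assumptions
rather than a proposed implementation. Kernel tracepoints guard their call
sites with a static key (jump label), so the disabled case is close to free.
Here a plain bool flag stands in for that key, and fake_page,
page_ref_trace_enabled, page_ref_trace_slow and page_ref_inc_sketch are
hypothetical names. The refcount update stays inline and only the tracing slow
path is an out-of-line call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's unlikely(); assumes GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for struct page; only the counter matters here. */
struct fake_page {
	atomic_int count;
};

/* Stand-in for the tracepoint's static key; a plain flag in this sketch. */
static bool page_ref_trace_enabled;

/* Counts how often the slow path ran, so the behaviour can be observed. */
static int trace_hits;

/* Out-of-line slow path, reached only when tracing is enabled. */
static __attribute__((noinline))
void page_ref_trace_slow(struct fake_page *page, int val)
{
	(void)page;
	(void)val;
	trace_hits++;
}

/* Inline fast path: the atomic op plus one predictable, rarely taken test. */
static inline void page_ref_inc_sketch(struct fake_page *page)
{
	atomic_fetch_add(&page->count, 1);
	if (unlikely(page_ref_trace_enabled))
		page_ref_trace_slow(page, 1);
}
```

With the flag clear, callers pay for the atomic increment and a single test;
in the kernel a static key would reduce that test to a patched-out jump.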