From: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Minchan Kim <minchan@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Michal Nazarewicz <mina86@mina86.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 09/13] mm, compaction: skip buddy pages by their order in the migrate scanner
Date: Mon, 23 Jun 2014 17:29:29 +0800
Message-ID: <53A7F379.7020504@cn.fujitsu.com>
In-Reply-To: <1403279383-5862-10-git-send-email-vbabka@suse.cz>

On 06/20/2014 11:49 PM, Vlastimil Babka wrote:
> The migration scanner skips PageBuddy pages, but does not consider their order
> as checking page_order() is generally unsafe without holding the zone->lock,
> and acquiring the lock just for the check wouldn't be a good tradeoff.
>
> Still, this could avoid some iterations over the rest of the buddy page, and
> if we are careful, the race window between PageBuddy() check and page_order()
> is small, and the worst thing that can happen is that we skip too much and miss
> some isolation candidates. This is not that bad, as compaction can already fail
> for many other reasons like parallel allocations, and those have much larger
> race window.
>
> This patch therefore makes the migration scanner obtain the buddy page order
> and use it to skip the whole buddy page, if the order appears to be in the
> valid range.
>
> It's important that the page_order() is read only once, so that the value used
> in the checks and in the pfn calculation is the same. But in theory the
> compiler can replace the local variable by multiple inlines of page_order().
> Therefore, the patch introduces page_order_unsafe() that uses ACCESS_ONCE to
> prevent this.
>
> Testing with stress-highalloc from mmtests shows a 15% reduction in number of
> pages scanned by migration scanner. This change is also a prerequisite for a
> later patch which is detecting when a cc->order block of pages contains
> non-buddy pages that cannot be isolated, and the scanner should thus skip to
> the next block immediately.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
Fair enough.
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  mm/compaction.c | 36 +++++++++++++++++++++++++++++++-----
>  mm/internal.h   | 16 +++++++++++++++-
>  2 files changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 41c7005..df0961b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -270,8 +270,15 @@ static inline bool compact_should_abort(struct compact_control *cc)
>  static bool suitable_migration_target(struct page *page)
>  {
>  	/* If the page is a large free page, then disallow migration */
> -	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> -		return false;
> +	if (PageBuddy(page)) {
> +		/*
> +		 * We are checking page_order without zone->lock taken. But
> +		 * the only small danger is that we skip a potentially suitable
> +		 * pageblock, so it's not worth to check order for valid range.
> +		 */
> +		if (page_order_unsafe(page) >= pageblock_order)
> +			return false;
> +	}
>
>  	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
>  	if (migrate_async_suitable(get_pageblock_migratetype(page)))
> @@ -591,11 +598,23 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  			valid_page = page;
>
>  		/*
> -		 * Skip if free. page_order cannot be used without zone->lock
> -		 * as nothing prevents parallel allocations or buddy merging.
> +		 * Skip if free. We read page order here without zone lock
> +		 * which is generally unsafe, but the race window is small and
> +		 * the worst thing that can happen is that we skip some
> +		 * potential isolation targets.
>  		 */
> -		if (PageBuddy(page))
> +		if (PageBuddy(page)) {
> +			unsigned long freepage_order = page_order_unsafe(page);
> +
> +			/*
> +			 * Without lock, we cannot be sure that what we got is
> +			 * a valid page order. Consider only values in the
> +			 * valid order range to prevent low_pfn overflow.
> +			 */
> +			if (freepage_order > 0 && freepage_order < MAX_ORDER)
> +				low_pfn += (1UL << freepage_order) - 1;
>  			continue;
> +		}
>
>  		/*
>  		 * Check may be lockless but that's ok as we recheck later.
> @@ -683,6 +702,13 @@ next_pageblock:
>  		low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
>  	}
>
> +	/*
> +	 * The PageBuddy() check could have potentially brought us outside
> +	 * the range to be scanned.
> +	 */
> +	if (unlikely(low_pfn > end_pfn))
> +		low_pfn = end_pfn;
> +
>  	acct_isolated(zone, locked, cc);
>
>  	if (locked)
> diff --git a/mm/internal.h b/mm/internal.h
> index 2c187d2..584cd69 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -171,7 +171,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>   * general, page_zone(page)->lock must be held by the caller to prevent the
>   * page from being allocated in parallel and returning garbage as the order.
>   * If a caller does not hold page_zone(page)->lock, it must guarantee that the
> - * page cannot be allocated or merged in parallel.
> + * page cannot be allocated or merged in parallel. Alternatively, it must
> + * handle invalid values gracefully, and use page_order_unsafe() below.
>   */
>  static inline unsigned long page_order(struct page *page)
>  {
> @@ -179,6 +180,19 @@ static inline unsigned long page_order(struct page *page)
>  	return page_private(page);
>  }
>
> +/*
> + * Like page_order(), but for callers who cannot afford to hold the zone lock.
> + * PageBuddy() should be checked first by the caller to minimize race window,
> + * and invalid values must be handled gracefully.
> + *
> + * ACCESS_ONCE is used so that if the caller assigns the result into a local
> + * variable and e.g. tests it for valid range before using, the compiler cannot
> + * decide to remove the variable and inline the page_private(page) multiple
> + * times, potentially observing different values in the tests and the actual
> + * use of the result.
> + */
> +#define page_order_unsafe(page) ACCESS_ONCE(page_private(page))
> +
>  static inline bool is_cow_mapping(vm_flags_t flags)
>  {
>  	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>
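
For readers following the hunks above, here is a minimal userspace sketch of the skip logic: read the order exactly once, use it only if it falls in the valid range, and clamp the cursor if the skip ran past the end of the range. It is not the kernel code. struct fake_page, read_once_ulong(), scan_range() and MAX_ORDER_SKETCH (with its value of 11) are hypothetical names assumed for illustration; only the range check, the "(1UL << order) - 1" step and the final clamp are taken from the patch.

```c
#include <assert.h>

/* Assumed value for illustration; stands in for the kernel's MAX_ORDER. */
#define MAX_ORDER_SKETCH 11UL

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct fake_page {
	int buddy;		/* models PageBuddy() */
	unsigned long private;	/* models page_private(); may be stale or garbage */
};

/* Force a single load through a volatile cast, in the spirit of ACCESS_ONCE(). */
#define read_once_ulong(x) (*(volatile unsigned long *)&(x))

/*
 * Walk [start_pfn, end_pfn), skipping free buddy pages by their order.
 * Stores the number of pages actually inspected in *visited and returns
 * the final scan position, which never exceeds end_pfn.
 */
static unsigned long scan_range(struct fake_page *pages,
				unsigned long start_pfn,
				unsigned long end_pfn,
				unsigned long *visited)
{
	unsigned long low_pfn;

	assert(start_pfn <= end_pfn);
	*visited = 0;

	for (low_pfn = start_pfn; low_pfn < end_pfn; low_pfn++) {
		struct fake_page *page = &pages[low_pfn - start_pfn];

		(*visited)++;

		if (page->buddy) {
			/* One read: the value tested is the value used. */
			unsigned long order = read_once_ulong(page->private);

			/* Ignore values outside the valid order range. */
			if (order > 0 && order < MAX_ORDER_SKETCH)
				low_pfn += (1UL << order) - 1;
			continue;
		}
		/* A real scanner would try to isolate the page here. */
	}

	/* The skip may have carried us past the end of the range. */
	if (low_pfn > end_pfn)
		low_pfn = end_pfn;

	return low_pfn;
}
```

With an order-3 buddy page at pfn 0 of a 16-page range, the sketch inspects pfn 0 once and resumes at pfn 8, so 9 pages are inspected instead of 16. An out-of-range order is treated like order 0: the scanner just moves on to the next pfn.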
--
Thanks.
Zhang Yanfei