From: Alexander Duyck <alexander.duyck@gmail.com>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Tejun Heo <tj@kernel.org>, Hugh Dickins <hughd@google.com>,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Yang Shi <yang.shi@linux.alibaba.com>,
Matthew Wilcox <willy@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
kbuild test robot <lkp@intel.com>, linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
cgroups@vger.kernel.org, Shakeel Butt <shakeelb@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Wei Yang <richard.weiyang@gmail.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction
Date: Thu, 16 Jul 2020 14:32:39 -0700
Message-ID: <CAKgT0Ue72SfAmxCS+tay1NjioW9WBOvVgrhwUtVPz2aDCrcHPQ@mail.gmail.com>
In-Reply-To: <1594429136-20002-16-git-send-email-alex.shi@linux.alibaba.com>

On Fri, Jul 10, 2020 at 5:59 PM Alex Shi <alex.shi@linux.alibaba.com> wrote:
>
> Johannes Weiner has suggested:
> "So here is a crazy idea that may be worth exploring:
>
> Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
> linked list.
>
> Can we make PageLRU atomic and use it to stabilize the lru_lock
> instead, and then use the lru_lock only to serialize list operations?
> ..."
>
> Yes, this patch does so in __isolate_lru_page, which is the core
> page isolation function in the compaction and shrinking paths.
> With this patch, compaction only deals with pages that had PageLRU set
> and are now isolated, skipping just-allocated pages that have no LRU
> bit. The isolation also excludes the other isolation paths in memcg
> move_account, page migration, and THP split_huge_page.
>
> As a side effect, PageLRU may be cleared during the
> shrink_inactive_list path because of isolation. If so, we can skip
> that page.
>
> Hugh Dickins <hughd@google.com> fixed the following bugs in an early
> version of this patch:
>
> Fix lots of crashes under compaction load: isolate_migratepages_block()
> must clean up appropriately when rejecting a page, setting PageLRU again
> if it had been cleared; and a put_page() after get_page_unless_zero()
> cannot safely be done while holding locked_lruvec - it may turn out to
> be the final put_page(), which will take an lruvec lock when PageLRU.
> And move __isolate_lru_page_prepare back after get_page_unless_zero to
> make trylock_page() safe:
> trylock_page() is not safe to use at this time: its setting PG_locked
> can race with the page being freed or allocated ("Bad page"), and can
> also erase flags being set by one of those "sole owners" of a freshly
> allocated page who use non-atomic __SetPageFlag().
>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> include/linux/swap.h | 2 +-
> mm/compaction.c | 42 +++++++++++++++++++++++++++++++++---------
> mm/vmscan.c | 38 ++++++++++++++++++++++----------------
> 3 files changed, 56 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 2c29399b29a0..6d23d3beeff7 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -358,7 +358,7 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
> extern unsigned long zone_reclaimable_pages(struct zone *zone);
> extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> gfp_t gfp_mask, nodemask_t *mask);
> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
> +extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode);
> extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
> unsigned long nr_pages,
> gfp_t gfp_mask,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f14780fc296a..2da2933fe56b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -869,6 +869,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
> if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
> if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
> low_pfn = end_pfn;
> + page = NULL;
> goto isolate_abort;
> }
> valid_page = page;
> @@ -950,6 +951,21 @@ static bool too_many_isolated(pg_data_t *pgdat)
> if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
> goto isolate_fail;
>
> + /*
> + * Be careful not to clear PageLRU until after we're
> + * sure the page is not being freed elsewhere -- the
> + * page release code relies on it.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto isolate_fail;
> +
> + if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
> + goto isolate_fail_put;
> +
> + /* Try isolate the page */
> + if (!TestClearPageLRU(page))
> + goto isolate_fail_put;
> +
> /* If we already hold the lock, we can skip some rechecking */
> if (!locked) {
> locked = compact_lock_irqsave(&pgdat->lru_lock,

Why not do the __isolate_lru_page_prepare before getting the page?
That way you can avoid performing an extra atomic operation on non-LRU
pages.

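
To make the cost argument concrete, here is a minimal user-space sketch,
not kernel code. It assumes a hypothetical struct model_page and model_*
helpers that stand in for struct page, get_page_unless_zero() and
put_page(), and it models only a read-only PageLRU check from the prepare
step. The quoted commit message notes that the real prepare step can call
trylock_page(), which is not safe without a reference, so this
illustrates only the atomic-operation count, not a safe reordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page; not kernel code. */
struct model_page {
	atomic_int refcount;	/* models page->_refcount */
	atomic_uint flags;	/* bit 0 models PG_lru */
};

#define MODEL_PG_LRU 0x1u

/* Counts calls that perform an atomic read-modify-write on the page. */
static unsigned long atomic_rmw_ops;

static void model_page_init(struct model_page *p, int refcount,
			    unsigned int flags)
{
	atomic_init(&p->refcount, refcount);
	atomic_init(&p->flags, flags);
}

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
static bool model_get_page_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	atomic_rmw_ops++;
	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Models put_page() without the freeing side: just drop the reference. */
static void model_put_page(struct model_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	atomic_rmw_ops++;
	assert(old > 0);
}

/* Models only a read-only LRU check; a plain load, no atomic RMW. */
static bool model_prepare_ok(struct model_page *p)
{
	return (atomic_load(&p->flags) & MODEL_PG_LRU) != 0;
}

/* Ordering used in the patch: take the reference, then run the checks. */
static bool isolate_get_first(struct model_page *p)
{
	if (!model_get_page_unless_zero(p))
		return false;
	if (!model_prepare_ok(p)) {
		model_put_page(p);
		return false;
	}
	return true;
}

/* Ordering asked about above: run the checks, then take the reference. */
static bool isolate_check_first(struct model_page *p)
{
	if (!model_prepare_ok(p))
		return false;
	return model_get_page_unless_zero(p);
}
```

In this model a non-LRU page costs two atomic read-modify-write
operations with the get-first ordering (one to take the reference, one to
drop it) and none with the check-first ordering.
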
> @@ -962,10 +978,6 @@ static bool too_many_isolated(pg_data_t *pgdat)
> goto isolate_abort;
> }
>
> - /* Recheck PageLRU and PageCompound under lock */
> - if (!PageLRU(page))
> - goto isolate_fail;
> -
> /*
> * Page become compound since the non-locked check,
> * and it's on LRU. It can only be a THP so the order
> @@ -973,16 +985,13 @@ static bool too_many_isolated(pg_data_t *pgdat)
> */
> if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
> low_pfn += compound_nr(page) - 1;
> - goto isolate_fail;
> + SetPageLRU(page);
> + goto isolate_fail_put;
> }
> }
>
> lruvec = mem_cgroup_page_lruvec(page, pgdat);
>
> - /* Try isolate the page */
> - if (__isolate_lru_page(page, isolate_mode) != 0)
> - goto isolate_fail;
> -
> /* The whole page is taken off the LRU; skip the tail pages. */
> if (PageCompound(page))
> low_pfn += compound_nr(page) - 1;
> @@ -1011,6 +1020,15 @@ static bool too_many_isolated(pg_data_t *pgdat)
> }
>
> continue;
> +
> +isolate_fail_put:
> + /* Avoid potential deadlock in freeing page under lru_lock */
> + if (locked) {
> + spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> + locked = false;
> + }
> + put_page(page);
> +
> isolate_fail:
> if (!skip_on_failure)
> continue;
> @@ -1047,9 +1065,15 @@ static bool too_many_isolated(pg_data_t *pgdat)
> if (unlikely(low_pfn > end_pfn))
> low_pfn = end_pfn;
>
> + page = NULL;
> +
> isolate_abort:
> if (locked)
> spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> + if (page) {
> + SetPageLRU(page);
> + put_page(page);
> + }
>
> /*
> * Updated the cached scanner pfn once the pageblock has been scanned
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 18986fefd49b..f77748adc340 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1544,7 +1544,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> *
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> +int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode)
> {
> int ret = -EINVAL;
>
> @@ -1598,20 +1598,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
> return ret;
>
> - if (likely(get_page_unless_zero(page))) {
> - /*
> - * Be careful not to clear PageLRU until after we're
> - * sure the page is not being freed elsewhere -- the
> - * page release code relies on it.
> - */
> - ClearPageLRU(page);
> - ret = 0;
> - }
> -
> - return ret;
> + return 0;
> }
>
> -
> /*
> * Update LRU sizes after isolating pages. The LRU size updates must
> * be complete before mem_cgroup_update_lru_size due to a sanity check.
> @@ -1691,17 +1680,34 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> * only when the page is being freed somewhere else.
> */
> scan += nr_pages;
> - switch (__isolate_lru_page(page, mode)) {
> + switch (__isolate_lru_page_prepare(page, mode)) {
> case 0:
> + /*
> + * Be careful not to clear PageLRU until after we're
> + * sure the page is not being freed elsewhere -- the
> + * page release code relies on it.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto busy;
> +
> + if (!TestClearPageLRU(page)) {
> + /*
> + * This page may in other isolation path,
> + * but we still hold lru_lock.
> + */
> + put_page(page);
> + goto busy;
> + }
> +

I wonder if it would make sense to combine these two atomic ops, their
tests, and the put_page into a single inline function. Then you could
do just one check: if it succeeds, you run the block of code below;
otherwise you fall through into the -EBUSY case.

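
As a rough illustration of what such a combined helper could look like,
here is a user-space sketch built on C11 atomics rather than the kernel's
page API. The struct model_page type and every model_* name, including
model_try_isolate(), are hypothetical and exist only for this example.
They mirror get_page_unless_zero(), TestClearPageLRU() and put_page() in
spirit, and the model put does not free anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page; not kernel code. */
struct model_page {
	atomic_int refcount;	/* models page->_refcount */
	atomic_uint flags;	/* bit 0 models PG_lru */
};

#define MODEL_PG_LRU 0x1u

static void model_page_init(struct model_page *p, int refcount,
			    unsigned int flags)
{
	atomic_init(&p->refcount, refcount);
	atomic_init(&p->flags, flags);
}

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
static bool model_get_page_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Models put_page() without the freeing side: just drop the reference. */
static void model_put_page(struct model_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
}

/* Models TestClearPageLRU(): clear the bit, report whether it was set. */
static bool model_test_clear_lru(struct model_page *p)
{
	return (atomic_fetch_and(&p->flags, ~MODEL_PG_LRU) & MODEL_PG_LRU) != 0;
}

/*
 * Hypothetical combined helper: one call, one result to test.
 * Returns true when the caller now holds a reference and owns the
 * isolation; returns false with the page left as it was found.
 */
static inline bool model_try_isolate(struct model_page *p)
{
	if (!model_get_page_unless_zero(p))
		return false;		/* being freed elsewhere */

	if (!model_test_clear_lru(p)) {
		model_put_page(p);	/* another path isolated it first */
		return false;
	}
	return true;
}
```

With a helper of this shape, the case 0 branch would reduce to a single
test: on success it runs the accounting and list_move() to dst, and on
failure it takes the same path as -EBUSY.
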
> nr_taken += nr_pages;
> nr_zone_taken[page_zonenum(page)] += nr_pages;
> list_move(&page->lru, dst);
> break;
> -
> +busy:
> case -EBUSY:
> /* else it is being freed elsewhere */
> list_move(&page->lru, src);
> - continue;
> + break;
>
> default:
> BUG();
> --
> 1.8.3.1
>
>