From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=4F7u=A4=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
	SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1EB4BC433E1
	for <linux-mm@archiver.kernel.org>; Fri, 17 Jul 2020 20:31:03 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id A965621741
	for <linux-mm@archiver.kernel.org>; Fri, 17 Jul 2020 20:31:02 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="b6CIPYc2"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A965621741
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id E60046B0008; Fri, 17 Jul 2020 16:31:01 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id E0EDC8D0003; Fri, 17 Jul 2020 16:31:01 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id D03B76B000C; Fri, 17 Jul 2020 16:31:01 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70])
	by kanga.kvack.org (Postfix) with ESMTP id C3BE96B0008
	for <linux-mm@kvack.org>; Fri, 17 Jul 2020 16:31:01 -0400 (EDT)
Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay03.hostedemail.com (Postfix) with ESMTP id 83E378248D52
	for <linux-mm@kvack.org>; Fri, 17 Jul 2020 20:31:01 +0000 (UTC)
X-FDA: 77048711922.22.ant98_4116edf26f0d
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin22.hostedemail.com (Postfix) with ESMTP id 44A381803AF0D
	for <linux-mm@kvack.org>; Fri, 17 Jul 2020 20:31:01 +0000 (UTC)
X-HE-Tag: ant98_4116edf26f0d
X-Filterd-Recvd-Size: 12082
Received: from mail-il1-f195.google.com (mail-il1-f195.google.com [209.85.166.195])
	by imf19.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Fri, 17 Jul 2020 20:31:00 +0000 (UTC)
Received: by mail-il1-f195.google.com with SMTP id a11so8446874ilk.0
        for <linux-mm@kvack.org>; Fri, 17 Jul 2020 13:31:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=Fa0GcN94SAkMndYvc3rpkB4mYfViww4odBhNhRRf9p4=;
        b=b6CIPYc2VxBTb04tX/asXA0o9d0VrXp8fIGsmGa87N7hweYpBaZQSSRtQliAvy2SpA
         x1RVaFTjrFwbtpLQTVaabypRIwb+plt5beFUWUODt1NqHYc+NjmgMyN769sU6E77xUI7
         Z0mxC1qfRMX8tPq6m4bqljR5bhlcINhTUvT/+8xH0+CuJR3KvUxUld+cUoR+7Wpk4yRr
         UA1YkYlBLGuCr1sk5f+sEcuoBocb7aA0JsqpZbAb7xosNfdsr0c3UC3/H1TKcaNp6m/I
         v/bxxezggEE41VMfiDkyDDc499gkYAPpxOz7oGPGSbkSXR3RLPoM3y17FWmnbElWkvwd
         nqVQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=Fa0GcN94SAkMndYvc3rpkB4mYfViww4odBhNhRRf9p4=;
        b=pgeTzAd8h8GUwCCI/TXkSOG72PUfEf/Amwaoa3ntniJQa/4z+Jb12g/XrUADVWwoqq
         y14wi445l+phAj45q9LG5LNqC5VV6cNFTqfND3qMO4wYYJud/1LHNDLwVLdR1U2+H+ma
         moVqHBXQD4qkskND09Y8Pt+eSO28ox97kmQN6UknxY2qwYE0TOqXct4PS+6vsw58kUtA
         k8/4wQFkAdTa8ntwKDp/6vPNutM+Fw8QJwd1b0C47Ly2QhMxDQcvYDDPnD5Xw7/X+Glw
         8UxyvMxtGqLcTzwxyxLzm5w829a2CFlh1v/XStX7GnY18tdxbhGbNiKIX/au1fkHLSaj
         myhA==
X-Gm-Message-State: AOAM5306gF92hAW3uHzkMBslyHu+IyCQ0rdXmLbGiwPnfmgjaUTRwFiI
	fYZrbg30CBygl8Cd0r6hUX2DB4pqAY/WtEiWqSQ=
X-Google-Smtp-Source: ABdhPJwa13P8NrbQ/dNqsrcei2jQUP+iK4pzjDOkoi+cVp406kaWYHUtuX9XxJ8BQAYglTHtRZx0XoKXSHpSMbw3sEE=
X-Received: by 2002:a92:8544:: with SMTP id f65mr11605442ilh.42.1595017859659;
 Fri, 17 Jul 2020 13:30:59 -0700 (PDT)
MIME-Version: 1.0
References: <1594429136-20002-1-git-send-email-alex.shi@linux.alibaba.com> <1594429136-20002-17-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1594429136-20002-17-git-send-email-alex.shi@linux.alibaba.com>
From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 17 Jul 2020 13:30:48 -0700
Message-ID: <CAKgT0Udcry01samXT54RkurNqFKnVmv-686ZFHF+iw4b+12T_A@mail.gmail.com>
Subject: Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Mel Gorman <mgorman@techsingularity.net>, 
	Tejun Heo <tj@kernel.org>, Hugh Dickins <hughd@google.com>, 
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>, Daniel Jordan <daniel.m.jordan@oracle.com>, 
	Yang Shi <yang.shi@linux.alibaba.com>, Matthew Wilcox <willy@infradead.org>, 
	Johannes Weiner <hannes@cmpxchg.org>, kbuild test robot <lkp@intel.com>, linux-mm <linux-mm@kvack.org>, 
	LKML <linux-kernel@vger.kernel.org>, cgroups@vger.kernel.org, 
	Shakeel Butt <shakeelb@google.com>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, 
	Wei Yang <richard.weiyang@gmail.com>, "Kirill A. Shutemov" <kirill@shutemov.name>
Content-Type: text/plain; charset="UTF-8"
X-Rspamd-Queue-Id: 44A381803AF0D
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam01
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, Jul 10, 2020 at 5:59 PM Alex Shi <alex.shi@linux.alibaba.com> wrote:
>
> This patch reorder the isolation steps during munlock, move the lru lock
> to guard each pages, unfold __munlock_isolate_lru_page func, to do the
> preparation for lru lock change.
>
> __split_huge_page_refcount doesn't exist, but we still have to guard
> PageMlocked and PageLRU for tail page in __split_huge_page_tail.
>
> [lkp@intel.com: found a sleeping function bug ... at mm/rmap.c]
> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/mlock.c | 93 ++++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 51 insertions(+), 42 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 228ba5a8e0a5..0bdde88b4438 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -103,25 +103,6 @@ void mlock_vma_page(struct page *page)
>  }
>
>  /*
> - * Isolate a page from LRU with optional get_page() pin.
> - * Assumes lru_lock already held and page already pinned.
> - */
> -static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
> -{
> -       if (TestClearPageLRU(page)) {
> -               struct lruvec *lruvec;
> -
> -               lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> -               if (getpage)
> -                       get_page(page);
> -               del_page_from_lru_list(page, lruvec, page_lru(page));
> -               return true;
> -       }
> -
> -       return false;
> -}
> -
> -/*
>   * Finish munlock after successful page isolation
>   *
>   * Page must be locked. This is a wrapper for try_to_munlock()
> @@ -181,6 +162,7 @@ static void __munlock_isolation_failed(struct page *page)
>  unsigned int munlock_vma_page(struct page *page)
>  {
>         int nr_pages;
> +       bool clearlru = false;
>         pg_data_t *pgdat = page_pgdat(page);
>
>         /* For try_to_munlock() and to serialize with page migration */
> @@ -189,32 +171,42 @@ unsigned int munlock_vma_page(struct page *page)
>         VM_BUG_ON_PAGE(PageTail(page), page);
>
>         /*
> -        * Serialize with any parallel __split_huge_page_refcount() which
> +        * Serialize split tail pages in __split_huge_page_tail() which
>          * might otherwise copy PageMlocked to part of the tail pages before
>          * we clear it in the head page. It also stabilizes hpage_nr_pages().
>          */
> +       get_page(page);

I don't think this get_page() call needs to be up here. It could be
left down before we delete the page from the LRU list as it is really
needed to take a reference on the page before we call
__munlock_isolated_page(), or at least that is the way it looks to me.
By doing that you can avoid a bunch of cleanup in these exception
cases.

> +       clearlru = TestClearPageLRU(page);

I'm not sure I fully understand the reason for moving this here. By
clearing this flag before you clear Mlocked does this give you some
sort of extra protection? I don't see how since Mlocked doesn't
necessarily imply the page is on LRU.

>         spin_lock_irq(&pgdat->lru_lock);
>
>         if (!TestClearPageMlocked(page)) {
> -               /* Potentially, PTE-mapped THP: do not skip the rest PTEs */
> -               nr_pages = 1;
> -               goto unlock_out;
> +               if (clearlru)
> +                       SetPageLRU(page);
> +               /*
> +                * Potentially, PTE-mapped THP: do not skip the rest PTEs
> +                * Reuse lock as memory barrier for release_pages racing.
> +                */
> +               spin_unlock_irq(&pgdat->lru_lock);
> +               put_page(page);
> +               return 0;
>         }
>
>         nr_pages = hpage_nr_pages(page);
>         __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
>
> -       if (__munlock_isolate_lru_page(page, true)) {
> +       if (clearlru) {
> +               struct lruvec *lruvec;
> +

You could just place the get_page() call here.

> +               lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> +               del_page_from_lru_list(page, lruvec, page_lru(page));
>                 spin_unlock_irq(&pgdat->lru_lock);
>                 __munlock_isolated_page(page);
> -               goto out;
> +       } else {
> +               spin_unlock_irq(&pgdat->lru_lock);
> +               put_page(page);
> +               __munlock_isolation_failed(page);

If you move the get_page() as I suggested above there wouldn't be a
need for the put_page(). It then becomes possible to simplify the code
a bit by merging the unlock paths and doing an if/else with the
__munlock functions like so:
if (clearlru) {
    ...
    del_page_from_lru..
}

spin_unlock_irq()

if (clearlru)
    __munlock_isolated_page();
else
    __munlock_isolated_failed();

>         }
> -       __munlock_isolation_failed(page);
> -
> -unlock_out:
> -       spin_unlock_irq(&pgdat->lru_lock);
>
> -out:
>         return nr_pages - 1;
>  }
>
> @@ -297,34 +289,51 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>         pagevec_init(&pvec_putback);
>
>         /* Phase 1: page isolation */
> -       spin_lock_irq(&zone->zone_pgdat->lru_lock);
>         for (i = 0; i < nr; i++) {
>                 struct page *page = pvec->pages[i];
> +               struct lruvec *lruvec;
> +               bool clearlru;
>
> -               if (TestClearPageMlocked(page)) {
> -                       /*
> -                        * We already have pin from follow_page_mask()
> -                        * so we can spare the get_page() here.
> -                        */
> -                       if (__munlock_isolate_lru_page(page, false))
> -                               continue;
> -                       else
> -                               __munlock_isolation_failed(page);
> -               } else {
> +               clearlru = TestClearPageLRU(page);
> +               spin_lock_irq(&zone->zone_pgdat->lru_lock);

I still don't see what you are gaining by moving the bit test up to
this point. Seems like it would be better left below with the lock
just being used to prevent a possible race while you are pulling the
page out of the LRU list.

> +
> +               if (!TestClearPageMlocked(page)) {
>                         delta_munlocked++;
> +                       if (clearlru)
> +                               SetPageLRU(page);
> +                       goto putback;
> +               }
> +
> +               if (!clearlru) {
> +                       __munlock_isolation_failed(page);
> +                       goto putback;
>                 }

With the other function you were processing this outside of the lock,
here you are doing it inside. It would probably make more sense here
to follow similar logic and take care of the del_page_from_lru_list
ifr clealru is set, unlock, and then if clearlru is set continue else
track the isolation failure. That way you can avoid having to use as
many jump labels.

>                 /*
> +                * Isolate this page.
> +                * We already have pin from follow_page_mask()
> +                * so we can spare the get_page() here.
> +                */
> +               lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> +               del_page_from_lru_list(page, lruvec, page_lru(page));
> +               spin_unlock_irq(&zone->zone_pgdat->lru_lock);
> +               continue;
> +
> +               /*
>                  * We won't be munlocking this page in the next phase
>                  * but we still need to release the follow_page_mask()
>                  * pin. We cannot do it under lru_lock however. If it's
>                  * the last pin, __page_cache_release() would deadlock.
>                  */
> +putback:
> +               spin_unlock_irq(&zone->zone_pgdat->lru_lock);
>                 pagevec_add(&pvec_putback, pvec->pages[i]);
>                 pvec->pages[i] = NULL;
>         }
> +       /* tempary disable irq, will remove later */
> +       local_irq_disable();
>         __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
> -       spin_unlock_irq(&zone->zone_pgdat->lru_lock);
> +       local_irq_enable();
>
>         /* Now we can release pins of pages that we are not munlocking */
>         pagevec_release(&pvec_putback);
> --
> 1.8.3.1
>
>