From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EB4BC433E1 for ; Fri, 17 Jul 2020 20:31:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A965621741 for ; Fri, 17 Jul 2020 20:31:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="b6CIPYc2" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A965621741 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E60046B0008; Fri, 17 Jul 2020 16:31:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E0EDC8D0003; Fri, 17 Jul 2020 16:31:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D03B76B000C; Fri, 17 Jul 2020 16:31:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70]) by kanga.kvack.org (Postfix) with ESMTP id C3BE96B0008 for ; Fri, 17 Jul 2020 16:31:01 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 83E378248D52 for ; Fri, 17 Jul 2020 20:31:01 +0000 (UTC) X-FDA: 77048711922.22.ant98_4116edf26f0d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 44A381803AF0D for ; Fri, 17 Jul 2020 20:31:01 +0000 (UTC) X-HE-Tag: ant98_4116edf26f0d X-Filterd-Recvd-Size: 12082 Received: from mail-il1-f195.google.com (mail-il1-f195.google.com [209.85.166.195]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 17 Jul 2020 20:31:00 +0000 (UTC) Received: by mail-il1-f195.google.com with SMTP id a11so8446874ilk.0 for ; Fri, 17 Jul 2020 13:31:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Fa0GcN94SAkMndYvc3rpkB4mYfViww4odBhNhRRf9p4=; b=b6CIPYc2VxBTb04tX/asXA0o9d0VrXp8fIGsmGa87N7hweYpBaZQSSRtQliAvy2SpA x1RVaFTjrFwbtpLQTVaabypRIwb+plt5beFUWUODt1NqHYc+NjmgMyN769sU6E77xUI7 Z0mxC1qfRMX8tPq6m4bqljR5bhlcINhTUvT/+8xH0+CuJR3KvUxUld+cUoR+7Wpk4yRr UA1YkYlBLGuCr1sk5f+sEcuoBocb7aA0JsqpZbAb7xosNfdsr0c3UC3/H1TKcaNp6m/I v/bxxezggEE41VMfiDkyDDc499gkYAPpxOz7oGPGSbkSXR3RLPoM3y17FWmnbElWkvwd nqVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Fa0GcN94SAkMndYvc3rpkB4mYfViww4odBhNhRRf9p4=; b=pgeTzAd8h8GUwCCI/TXkSOG72PUfEf/Amwaoa3ntniJQa/4z+Jb12g/XrUADVWwoqq y14wi445l+phAj45q9LG5LNqC5VV6cNFTqfND3qMO4wYYJud/1LHNDLwVLdR1U2+H+ma moVqHBXQD4qkskND09Y8Pt+eSO28ox97kmQN6UknxY2qwYE0TOqXct4PS+6vsw58kUtA k8/4wQFkAdTa8ntwKDp/6vPNutM+Fw8QJwd1b0C47Ly2QhMxDQcvYDDPnD5Xw7/X+Glw 8UxyvMxtGqLcTzwxyxLzm5w829a2CFlh1v/XStX7GnY18tdxbhGbNiKIX/au1fkHLSaj myhA== X-Gm-Message-State: AOAM5306gF92hAW3uHzkMBslyHu+IyCQ0rdXmLbGiwPnfmgjaUTRwFiI fYZrbg30CBygl8Cd0r6hUX2DB4pqAY/WtEiWqSQ= X-Google-Smtp-Source: ABdhPJwa13P8NrbQ/dNqsrcei2jQUP+iK4pzjDOkoi+cVp406kaWYHUtuX9XxJ8BQAYglTHtRZx0XoKXSHpSMbw3sEE= X-Received: by 2002:a92:8544:: with SMTP id f65mr11605442ilh.42.1595017859659; Fri, 17 Jul 2020 13:30:59 -0700 (PDT) MIME-Version: 1.0 References: <1594429136-20002-1-git-send-email-alex.shi@linux.alibaba.com> <1594429136-20002-17-git-send-email-alex.shi@linux.alibaba.com> In-Reply-To: <1594429136-20002-17-git-send-email-alex.shi@linux.alibaba.com> From: Alexander Duyck Date: Fri, 17 Jul 2020 13:30:48 -0700 Message-ID: Subject: Re: [PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock To: Alex Shi Cc: Andrew Morton , Mel Gorman , Tejun Heo , Hugh Dickins , Konstantin Khlebnikov , Daniel Jordan , Yang Shi , Matthew Wilcox , Johannes Weiner , kbuild test robot , linux-mm , LKML , cgroups@vger.kernel.org, Shakeel Butt , Joonsoo Kim , Wei Yang , "Kirill A. Shutemov" Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 44A381803AF0D X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Jul 10, 2020 at 5:59 PM Alex Shi wrote: > > This patch reorder the isolation steps during munlock, move the lru lock > to guard each pages, unfold __munlock_isolate_lru_page func, to do the > preparation for lru lock change. > > __split_huge_page_refcount doesn't exist, but we still have to guard > PageMlocked and PageLRU for tail page in __split_huge_page_tail. > > [lkp@intel.com: found a sleeping function bug ... at mm/rmap.c] > Signed-off-by: Alex Shi > Cc: Kirill A. Shutemov > Cc: Andrew Morton > Cc: Johannes Weiner > Cc: Matthew Wilcox > Cc: Hugh Dickins > Cc: linux-mm@kvack.org > Cc: linux-kernel@vger.kernel.org > --- > mm/mlock.c | 93 ++++++++++++++++++++++++++++++++++---------------------------- > 1 file changed, 51 insertions(+), 42 deletions(-) > > diff --git a/mm/mlock.c b/mm/mlock.c > index 228ba5a8e0a5..0bdde88b4438 100644 > --- a/mm/mlock.c > +++ b/mm/mlock.c > @@ -103,25 +103,6 @@ void mlock_vma_page(struct page *page) > } > > /* > - * Isolate a page from LRU with optional get_page() pin. > - * Assumes lru_lock already held and page already pinned. > - */ > -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) > -{ > - if (TestClearPageLRU(page)) { > - struct lruvec *lruvec; > - > - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); > - if (getpage) > - get_page(page); > - del_page_from_lru_list(page, lruvec, page_lru(page)); > - return true; > - } > - > - return false; > -} > - > -/* > * Finish munlock after successful page isolation > * > * Page must be locked. This is a wrapper for try_to_munlock() > @@ -181,6 +162,7 @@ static void __munlock_isolation_failed(struct page *page) > unsigned int munlock_vma_page(struct page *page) > { > int nr_pages; > + bool clearlru = false; > pg_data_t *pgdat = page_pgdat(page); > > /* For try_to_munlock() and to serialize with page migration */ > @@ -189,32 +171,42 @@ unsigned int munlock_vma_page(struct page *page) > VM_BUG_ON_PAGE(PageTail(page), page); > > /* > - * Serialize with any parallel __split_huge_page_refcount() which > + * Serialize split tail pages in __split_huge_page_tail() which > * might otherwise copy PageMlocked to part of the tail pages before > * we clear it in the head page. It also stabilizes hpage_nr_pages(). > */ > + get_page(page); I don't think this get_page() call needs to be up here. It could be left down before we delete the page from the LRU list as it is really needed to take a reference on the page before we call __munlock_isolated_page(), or at least that is the way it looks to me. By doing that you can avoid a bunch of cleanup in these exception cases. > + clearlru = TestClearPageLRU(page); I'm not sure I fully understand the reason for moving this here. By clearing this flag before you clear Mlocked does this give you some sort of extra protection? I don't see how since Mlocked doesn't necessarily imply the page is on LRU. > spin_lock_irq(&pgdat->lru_lock); > > if (!TestClearPageMlocked(page)) { > - /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ > - nr_pages = 1; > - goto unlock_out; > + if (clearlru) > + SetPageLRU(page); > + /* > + * Potentially, PTE-mapped THP: do not skip the rest PTEs > + * Reuse lock as memory barrier for release_pages racing. > + */ > + spin_unlock_irq(&pgdat->lru_lock); > + put_page(page); > + return 0; > } > > nr_pages = hpage_nr_pages(page); > __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); > > - if (__munlock_isolate_lru_page(page, true)) { > + if (clearlru) { > + struct lruvec *lruvec; > + You could just place the get_page() call here. > + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); > + del_page_from_lru_list(page, lruvec, page_lru(page)); > spin_unlock_irq(&pgdat->lru_lock); > __munlock_isolated_page(page); > - goto out; > + } else { > + spin_unlock_irq(&pgdat->lru_lock); > + put_page(page); > + __munlock_isolation_failed(page); If you move the get_page() as I suggested above there wouldn't be a need for the put_page(). It then becomes possible to simplify the code a bit by merging the unlock paths and doing an if/else with the __munlock functions like so: if (clearlru) { ... del_page_from_lru.. } spin_unlock_irq() if (clearlru) __munlock_isolated_page(); else __munlock_isolated_failed(); > } > - __munlock_isolation_failed(page); > - > -unlock_out: > - spin_unlock_irq(&pgdat->lru_lock); > > -out: > return nr_pages - 1; > } > > @@ -297,34 +289,51 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) > pagevec_init(&pvec_putback); > > /* Phase 1: page isolation */ > - spin_lock_irq(&zone->zone_pgdat->lru_lock); > for (i = 0; i < nr; i++) { > struct page *page = pvec->pages[i]; > + struct lruvec *lruvec; > + bool clearlru; > > - if (TestClearPageMlocked(page)) { > - /* > - * We already have pin from follow_page_mask() > - * so we can spare the get_page() here. > - */ > - if (__munlock_isolate_lru_page(page, false)) > - continue; > - else > - __munlock_isolation_failed(page); > - } else { > + clearlru = TestClearPageLRU(page); > + spin_lock_irq(&zone->zone_pgdat->lru_lock); I still don't see what you are gaining by moving the bit test up to this point. Seems like it would be better left below with the lock just being used to prevent a possible race while you are pulling the page out of the LRU list. > + > + if (!TestClearPageMlocked(page)) { > delta_munlocked++; > + if (clearlru) > + SetPageLRU(page); > + goto putback; > + } > + > + if (!clearlru) { > + __munlock_isolation_failed(page); > + goto putback; > } With the other function you were processing this outside of the lock, here you are doing it inside. It would probably make more sense here to follow similar logic and take care of the del_page_from_lru_list ifr clealru is set, unlock, and then if clearlru is set continue else track the isolation failure. That way you can avoid having to use as many jump labels. > /* > + * Isolate this page. > + * We already have pin from follow_page_mask() > + * so we can spare the get_page() here. > + */ > + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); > + del_page_from_lru_list(page, lruvec, page_lru(page)); > + spin_unlock_irq(&zone->zone_pgdat->lru_lock); > + continue; > + > + /* > * We won't be munlocking this page in the next phase > * but we still need to release the follow_page_mask() > * pin. We cannot do it under lru_lock however. If it's > * the last pin, __page_cache_release() would deadlock. > */ > +putback: > + spin_unlock_irq(&zone->zone_pgdat->lru_lock); > pagevec_add(&pvec_putback, pvec->pages[i]); > pvec->pages[i] = NULL; > } > + /* tempary disable irq, will remove later */ > + local_irq_disable(); > __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); > - spin_unlock_irq(&zone->zone_pgdat->lru_lock); > + local_irq_enable(); > > /* Now we can release pins of pages that we are not munlocking */ > pagevec_release(&pvec_putback); > -- > 1.8.3.1 > >