From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <76609f9c-7c2b-a64a-5999-f359e6acc3ca@huawei.com>
Date: Mon, 7 Apr 2025 15:41:58 +0800
MIME-Version: 1.0
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to LRU batch
To: David Hildenbrand
Cc: <21cnbao@gmail.com>, Kefeng Wang
References: <1720075944-27201-1-git-send-email-yangge1116@126.com>
 <4119c1d0-5010-b2e7-3f1c-edd37f16f1f2@huawei.com>
 <91ac638d-b2d6-4683-ab29-fb647f58af63@redhat.com>
 <076babae-9fc6-13f5-36a3-95dde0115f77@huawei.com>
 <26870d6f-8bb9-44de-9d1f-dcb1b5a93eae@redhat.com>
From: Jinjiang Tu
In-Reply-To: <26870d6f-8bb9-44de-9d1f-dcb1b5a93eae@redhat.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 8bit

On 2025/4/1 22:33, David Hildenbrand wrote:
> On 27.03.25 12:16, Jinjiang Tu wrote:
>>
>> On 2025/3/26 20:46, David Hildenbrand wrote:
>>> On 26.03.25 13:42, Jinjiang Tu wrote:
>>>> Hi,
>>>>
>>>
>>> Hi!
>>>
>>>> We noticed a 12.3% performance regression for the LibMicro pwrite
>>>> testcase due to commit 33dfe9204f29 ("mm/gup: clear the LRU flag of
>>>> a page before adding to LRU batch").
>>>>
>>>> The testcase is executed as follows, and the file is a tmpfs file:
>>>>       pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>>>
>>> Do we know how much that reflects real workloads? (IOW, how much
>>> should we care)
>>
>> No, it's hard to say.
>>
>>>> This testcase writes 1KB (only one page) to the tmpfs file and
>>>> repeats this step many times. The flame graph shows the performance
>>>> regression comes from folio_mark_accessed() and
>>>> workingset_activation().
>>>>
>>>> folio_mark_accessed() is called for the same page many times. Before
>>>> this patch, each call will add the page to cpu_fbatches.activate.
>>>> When the fbatch is full, the fbatch is drained and the page is
>>>> promoted to the active list. And then, folio_mark_accessed() does
>>>> nothing in later calls.
>>>>
>>>> But after this patch, the folio's LRU flag is cleared after it is
>>>> added to cpu_fbatches.activate.
>>>> After that, folio_mark_accessed() will never call folio_activate()
>>>> again because the page no longer has the LRU flag, so the fbatch will
>>>> not become full and the folio will not be marked active; later
>>>> folio_mark_accessed() calls will always call workingset_activation(),
>>>> leading to the performance regression.
>>>
>>> Would there be a good place to drain the LRU to effectively get that
>>> processed? (we can always try draining if the LRU flag is not set)
>>
>> Maybe we could search the cpu_fbatches.activate of the local cpu in
>> __lru_cache_activate_folio()? Draining other fbatches is meaningless.
>
> So the current behavior is that folio_mark_accessed() will end up
> calling folio_activate() once, and then __lru_cache_activate_folio()
> until the LRU was drained (which can take a looong time).
>
> The old behavior was that folio_mark_accessed() would keep calling
> folio_activate() until the LRU was drained, simply because it was full
> of "all the same pages". Only *after* the LRU was drained would
> folio_mark_accessed() actually not do anything (desired behavior).
>
> So the overhead comes primarily from __lru_cache_activate_folio()
> searching through the list "more" repeatedly because the LRU does get
> drained less frequently; and it would never find it in there in this
> case.
>
> So ... it used to be suboptimal before, now it's more suboptimal I
> guess?! :)
>
> We'd need a way to better identify "this folio is already queued for
> activation". Searching that list as well would be one option, but the
> whole "search the list" approach is nasty.

Sorry for the late reply.

I tried searching the activate batch with the following diff, and the
performance regression disappears.

diff --git a/mm/swap.c b/mm/swap.c
index f81c8aa93e67..cfe484ff12e6 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -400,6 +400,21 @@ static void __lru_cache_activate_folio(struct folio *folio)
                }
        }

+       fbatch = this_cpu_ptr(&cpu_fbatches.activate);
+       for (i = folio_batch_count(fbatch) - 1; i >= 0; i--) {
+               struct folio *batch_folio = fbatch->folios[i];
+
+               if (batch_folio == folio) {
+                       struct lruvec *lruvec;
+                       unsigned long flags;
+
+                       lruvec = folio_lruvec_lock_irqsave(folio, &flags);
+                       folio_activate_fn(lruvec, folio);
+                       unlock_page_lruvec_irqrestore(lruvec, flags);
+                       break;
+               }
+       }
+
        local_unlock(&cpu_fbatches.lock);
 }

> Maybe we can simply set the folio as active when staging it for
> activation, after having cleared the LRU flag? We already do that
> during folio_add.
>
> As the LRU flag was cleared, nobody should be messing with that folio
> until the cache was drained and the move was successful.

Yes, anyone who wants to delete the folio from the LRU list should test
the LRU flag.

I looked at some folio_test_active() callers;
madvise_cold_or_pageout_pte_range() will set the workingset flag if the
active flag is set. It seems more reasonable to set the active flag
after adding to the activate batch, since adding to the activate batch
already means active.

> Pretty sure this doesn't work, but just to throw out an idea:

Thanks, I will try it ASAP.
>
> From c26e1c0ceda6c818826e5b89dc7c7c9279138f80 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand
> Date: Tue, 1 Apr 2025 16:31:56 +0200
> Subject: [PATCH] test
>
> Signed-off-by: David Hildenbrand
> ---
>  mm/swap.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index fc8281ef42415..bbf9aa76db87f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -175,6 +175,8 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>      folios_put(fbatch);
>  }
>
> +static void lru_activate(struct lruvec *lruvec, struct folio *folio);
> +
>  static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>          struct folio *folio, move_fn_t move_fn,
>          bool on_lru, bool disable_irq)
> @@ -191,6 +193,10 @@ static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>      else
>          local_lock(&cpu_fbatches.lock);
>
> +    /* We'll only perform the actual list move deferred. */
> +    if (move_fn == lru_activate)
> +        folio_set_active(folio);
> +
>      if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || folio_test_large(folio) ||
>          lru_cache_disabled())
>          folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
> @@ -299,12 +305,14 @@ static void lru_activate(struct lruvec *lruvec, struct folio *folio)
>  {
>      long nr_pages = folio_nr_pages(folio);
>
> -    if (folio_test_active(folio) || folio_test_unevictable(folio))
> -        return;
> -
> +    /*
> +     * We set the folio active after clearing the LRU flag, and set the
> +     * LRU flag only after moving it to the right list.
> +     */
> +    VM_WARN_ON_ONCE(!folio_test_active(folio));
> +    VM_WARN_ON_ONCE(folio_test_unevictable(folio));
>
>      lruvec_del_folio(lruvec, folio);
> -    folio_set_active(folio);
>      lruvec_add_folio(lruvec, folio);
>      trace_mm_lru_activate(folio);
>
> @@ -342,7 +350,10 @@ void folio_activate(struct folio *folio)
>          return;
>
>      lruvec = folio_lruvec_lock_irq(folio);
> -    lru_activate(lruvec, folio);
> +    if (!folio_test_unevictable(folio)) {
> +        folio_set_active(folio);
> +        lru_activate(lruvec, folio);
> +    }
>      unlock_page_lruvec_irq(lruvec);
>      folio_set_lru(folio);
>  }
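
For context, the path the thread keeps referring to is roughly the
following. This is a simplified, from-memory sketch of
folio_mark_accessed() in mm/swap.c, not verbatim kernel code: the MGLRU
branch and minor details are omitted, and the exact code differs between
kernel versions. It only illustrates why a folio parked in
cpu_fbatches.activate with its LRU flag cleared keeps reaching
workingset_activation() on every access until the batches are drained.

/*
 * Simplified sketch of folio_mark_accessed() (not verbatim kernel code).
 * Once a folio sits in cpu_fbatches.activate with the LRU flag cleared,
 * folio_test_lru() fails, __lru_cache_activate_folio() cannot find it in
 * the lru_add batch, and workingset_activation() runs on every call.
 */
void folio_mark_accessed(struct folio *folio)
{
        if (!folio_test_referenced(folio)) {
                folio_set_referenced(folio);
        } else if (folio_test_unevictable(folio)) {
                /* Unevictable folios are never rotated; nothing to do. */
        } else if (!folio_test_active(folio)) {
                /*
                 * On the LRU: queue the folio for activation via
                 * cpu_fbatches.activate. Otherwise assume it sits in a
                 * per-CPU batch and try to mark it active there.
                 */
                if (folio_test_lru(folio))
                        folio_activate(folio);
                else
                        __lru_cache_activate_folio(folio);
                folio_clear_referenced(folio);
                workingset_activation(folio);
        }
        if (folio_test_idle(folio))
                folio_clear_idle(folio);
}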