From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <48150a28-ed48-49ff-9432-9cd30cda4da4@linux.alibaba.com>
Date: Wed, 5 Jun 2024 19:37:55 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] mm/gup: don't check page lru flag before draining it
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: David Hildenbrand, yangge1116, akpm@linux-foundation.org, Matthew Wilcox
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, liuzixing@hygon.cn
References: <1717498121-20926-1-git-send-email-yangge1116@126.com> <0d7a4405-9a2e-4bd1-ba89-a31486155233@redhat.com> <776de760-e817-43b2-bd00-8ce96f4e37a8@redhat.com> <7063920f-963a-4b3e-a3f3-c5cc227bc877@redhat.com>
In-Reply-To: <7063920f-963a-4b3e-a3f3-c5cc227bc877@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
On 2024/6/5 17:53, David Hildenbrand wrote:
> On 05.06.24 11:41, David Hildenbrand wrote:
>> On 05.06.24 03:18, yangge1116 wrote:
>>>
>>>
>>> On 2024/6/4 9:47 PM, David Hildenbrand wrote:
>>>> On 04.06.24 12:48, yangge1116@126.com wrote:
>>>>> From: yangge
>>>>>
>>>>> Adding a page to a pagevec takes a reference on the page, and
>>>>> removing it from the pagevec drops that reference. Page migration
>>>>> requires that a page not be referenced by anything except its
>>>>> mapping, so before migrating a page we should try to drain it from
>>>>> the pagevec in case it sits there. However, folio_test_lru() is
>>>>> not sufficient to tell whether the page is in a pagevec or not;
>>>>> if the page is in a pagevec, the migration will fail.
>>>>>
>>>>> Remove the condition and drain the LRU once to ensure the page is
>>>>> not referenced by a pagevec.
>>>>
>>>> What you are saying is that we might have a page on which
>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches,
>>>> correct?
>>>
>>> Yes
>>>
>>>>
>>>> Can you describe under which circumstances that happens?
>>>>
>>>
>>> If we call folio_activate() to move a page from the inactive LRU list
>>> to the active LRU list, the page is not only on an LRU list, but also
>>> in one of the cpu_fbatches.
>>>
>>> void folio_activate(struct folio *folio)
>>> {
>>>        if (folio_test_lru(folio) && !folio_test_active(folio) &&
>>>            !folio_test_unevictable(folio)) {
>>>            struct folio_batch *fbatch;
>>>
>>>            folio_get(folio);
>>>            /* After this, the folio is on the LRU list, and its
>>>               refcount has been increased by one. */
>>>
>>>            local_lock(&cpu_fbatches.lock);
>>>            fbatch = this_cpu_ptr(&cpu_fbatches.activate);
>>>            folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
>>>            local_unlock(&cpu_fbatches.lock);
>>>        }
>>> }
>>
>> Interesting, the !SMP variant does the folio_test_clear_lru().
>>
>> It would be really helpful if we could reliably identify whether the
>> LRU batching code has a raised reference on a folio.
>>
>> We have the same scenario in
>> * folio_deactivate()
>> * folio_mark_lazyfree()
>>
>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio).
>>
>> No expert on that code, I'm wondering if we could move the
>> folio_test_clear_lru() out, such that we can more reliably identify
>> whether a folio is on the LRU batch or not.
>
> I'm sure there would be something extremely broken with the following
> (I don't know what I'm doing ;) ), but I wonder if there would be a way
> to make something like that work (and perform well enough?).
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 67786cb771305..642e471c3ec5a 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>         for (i = 0; i < folio_batch_count(fbatch); i++) {
>                 struct folio *folio = fbatch->folios[i];
>
> -               /* block memcg migration while the folio moves between lru */
> -               if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
> -                       continue;
> -
>                 folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
>                 move_fn(lruvec, folio);
>
> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>   */
>  void folio_rotate_reclaimable(struct folio *folio)
>  {
> -       if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> -           !folio_test_unevictable(folio) && folio_test_lru(folio)) {
> +       if (folio_test_lru(folio) && !folio_test_locked(folio) &&
> +           !folio_test_dirty(folio) && !folio_test_unevictable(folio) &&
> +           folio_test_clear_lru(folio)) {
>                 struct folio_batch *fbatch;
>                 unsigned long flags;
>
> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu)
>  void folio_activate(struct folio *folio)
>  {
>         if (folio_test_lru(folio) && !folio_test_active(folio) &&
> -           !folio_test_unevictable(folio)) {
> +           !folio_test_unevictable(folio) && folio_test_clear_lru(folio)) {

IMO, this seems to violate the semantics of the LRU flag, since the folio
is clearly still on the LRU list. With your changes, I think we should
drain the pagevecs before calling folio_test_lru(); otherwise some cases
will fail the folio_test_lru() check if the folio remains in the pagevecs
for an extended period.
>                 struct folio_batch *fbatch;
>
>                 folio_get(folio);
> @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio)
>         /* Deactivating an unevictable folio will not accelerate reclaim */
>         if (folio_test_unevictable(folio))
>                 return;
> +       if (!folio_test_clear_lru(folio))
> +               return;
>
>         folio_get(folio);
>         local_lock(&cpu_fbatches.lock);
> @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio)
>  void folio_deactivate(struct folio *folio)
>  {
>         if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
> -           (folio_test_active(folio) || lru_gen_enabled())) {
> +           (folio_test_active(folio) || lru_gen_enabled()) &&
> +           folio_test_clear_lru(folio)) {
>                 struct folio_batch *fbatch;
>
>                 folio_get(folio);
> @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio)
>  {
>         if (folio_test_lru(folio) && folio_test_anon(folio) &&
>             folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
> -           !folio_test_unevictable(folio)) {
> +           !folio_test_unevictable(folio) &&
> +           folio_test_clear_lru(folio)) {
>                 struct folio_batch *fbatch;
>
>                 folio_get(folio);