From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 528A3C27C52 for ; Thu, 6 Jun 2024 08:50:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C96CF6B00A8; Thu, 6 Jun 2024 04:50:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C47E36B00A9; Thu, 6 Jun 2024 04:50:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B0F166B00AA; Thu, 6 Jun 2024 04:50:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 929B56B00A8 for ; Thu, 6 Jun 2024 04:50:41 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 0E40414042C for ; Thu, 6 Jun 2024 08:50:40 +0000 (UTC) X-FDA: 82199843082.14.0BAB2B0 Received: from m16.mail.126.com (m16.mail.126.com [220.197.31.7]) by imf22.hostedemail.com (Postfix) with ESMTP id 5F476C0010 for ; Thu, 6 Jun 2024 08:50:36 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=126.com header.s=s110527 header.b=JG0a6zGJ; spf=pass (imf22.hostedemail.com: domain of yangge1116@126.com designates 220.197.31.7 as permitted sender) smtp.mailfrom=yangge1116@126.com; dmarc=pass (policy=none) header.from=126.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717663838; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=l9dgqjvvyjtqb4tZl+GZ+hz5ridGS7XuFizCC/+kJg8=; b=oOgtK5mMXt1pkfE4qhSeiWryhmhTHvsTvPrpFNRq8V0mqOzERQdGZvG4aNBuGrWXT/xgay OE7FuLk8jjxxgva2FdeMsiiVVcbvu/XjDsRtyGN+cOEEXsZcmjTiAGaay1lHUM4o6I+Zzx E7xmxd/WrraTzHCZE93fzBooYmfTpDE= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=126.com header.s=s110527 header.b=JG0a6zGJ; spf=pass (imf22.hostedemail.com: domain of yangge1116@126.com designates 220.197.31.7 as permitted sender) smtp.mailfrom=yangge1116@126.com; dmarc=pass (policy=none) header.from=126.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717663838; a=rsa-sha256; cv=none; b=RuDi40Ol7MJSMT4Mg+adbo2GseaPC6/vtCJRYngBNQCgfTrn5kuvtAahpBCVVFsx2GpA8r aBQZy1kt7d0EVIsKC+kDhZ02qggInVykgjQCBoOqsWnXxWZIdEvQnTO5FxgyUdvVYy161L lXX2MsqTTruO2ld70uoXw5mqL3fwBHo= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=126.com; s=s110527; h=Subject:From:Message-ID:Date:MIME-Version: Content-Type; bh=l9dgqjvvyjtqb4tZl+GZ+hz5ridGS7XuFizCC/+kJg8=; b=JG0a6zGJdsac0KUjHwfMDue2zbAzSpXiprd2iMLcDRUPd/40d9dDpKY0FhULg9 c0kitkA9W23noUTOO8WTryEWXdbKkW3XsdLJCE/+7YDT4MuNHTh8aVnx2q2JW72c hE+LEJyTXMeJHHRFKbRAKkoJm39xdLS2QusGXZHPY5Y8A= Received: from [172.21.21.216] (unknown [118.242.3.34]) by gzga-smtp-mta-g0-0 (Coremail) with SMTP id _____wD3v81LeGFmGm5wBA--.59082S2; Thu, 06 Jun 2024 16:50:21 +0800 (CST) Subject: Re: [PATCH] mm/gup: don't check page lru flag before draining it To: David Hildenbrand , akpm@linux-foundation.org, Matthew Wilcox Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn References: <1717498121-20926-1-git-send-email-yangge1116@126.com> <0d7a4405-9a2e-4bd1-ba89-a31486155233@redhat.com> <776de760-e817-43b2-bd00-8ce96f4e37a8@redhat.com> <7063920f-963a-4b3e-a3f3-c5cc227bc877@redhat.com> <19590664-5190-4d30-ba0d-ec9d0ea373d3@redhat.com> From: yangge1116 Message-ID: <08a54ebe-bf4d-0560-aaa8-a196c6c88276@126.com> Date: Thu, 6 Jun 2024 16:50:19 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <19590664-5190-4d30-ba0d-ec9d0ea373d3@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-CM-TRANSID:_____wD3v81LeGFmGm5wBA--.59082S2 X-Coremail-Antispam: 1Uf129KBjvJXoW3Aw15ZF1ruFW7Ar4rtr4Dtwb_yoWxXrWrpF W8GF1qgFWDJF1UCr42qF1Fyr10kr90qr4UXF47Gry2yFn8tr1DKF47tw15CFsxJr18uF1I va4jqFn3WF1YqFDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07bwoGLUUUUU= X-Originating-IP: [118.242.3.34] X-CM-SenderInfo: 51dqwwjhrrila6rslhhfrp/1tbiWQ71G2VLalhxNQABsg X-Stat-Signature: nhefd8ea1p65bbqxpttji4exwna45m6q X-Rspamd-Queue-Id: 5F476C0010 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1717663836-830745 X-HE-Meta: U2FsdGVkX1/FxaXpSJsEBs+IbjPWSU/vSkrqsmJLdNyS7yCiq2RZ2oYlkzcJOgR4fR6Tro2yQHkjbdk1GKPF+KBc4AeCwBAb5aGITbdWyvgq+BZWRLAr6SOYq3yueWPVX4yUvzzzz5kxXb7f8b03o7c5r9Yb6zwg8viLfVbID22AAIno4h78cNUnEGZ2Z0wvG/TGWuX2qwoLuJVtK9EvU7XAuUY8H0+n4hkjeCoIcLHQ1cdxlAX80HTgThYTlLqWz1pD/o5stGkZKq2UGgC/+PPcwGyNcVMfCvBmIWxXV9rkpZTgYCC1tU3xtdWozwHVbLsUn7Q7JrSKR2Cq3vs9RtnGIO1JX5kjxAJ8aSTeHcbhkfPctfzB6Dx/n378btrzFFkVa83hhn5J2vlmpZ60yIab0OWDMm8lWuO4alZCmQJW0G6Sw40IeqWUVBZMEjVK521nOhfunRsZ88BGT/qkVBmgLm00+NKL/iCPqKgMYVnllhT/pm2Xj+WdBBvlP5CG0U+gRhSzOmx051PULcYkNueeLiyTPznoK864nfu4DWPa44nGYhQo4r9Ue7lwgg7CBGfjdWLzc3xAhKifiqBZlqDyvmFwLMV189qhS2J6xf6WawDJ/I8GhI3I6iiPZ73+FckBwR8/sTIxtVSGzKlaWUnV+J5A4dPsgE709iEYuYOrTxoY8zjs4m+opiPlXggSgniZv+z9V3Nf/LVWaRH/qZiAcF5r+o/+BWzEE1+MIJcA8nmVxh5muDgenJG5H9F8A9Dy88sjHaWKq2SM6IpLpps+h3GTqjCANRmURl4VhB97HlBdbHq8Ohj0AuC8i+s8Uc27lprhpVkDiVRhRs3dFZgGuqyaFEARhIyHzYSz/qnZSc54YXGBLWewPp32EsAR5UEMLXCLk2G4nGZkCN2EO+6aThzWm/pDj6O/6PBZYiylyj0JGhIM7BYpLHejGtEM5Cw0zjn56j19evvnK5s tAF8/sMj H8rwMxtxG9hnZTvcaqng5FhqFcuT2U+epyWLpLWCOs98ARBsdrBy0C0P4tPKd2mwbmvDFkwuxL83VkDc5zMPe2R24g0iNXd0UwQzeFdvz4RmYAS4gf+OTOORjNHWXzK+SHv7ZnW4QJQr4K5UGfbHkqm1gR/3/q7a5Olf43aPnB/Iu8eNnluRZoIQg49JhIWxGqPb23r1MBU/qfe37tqvlAC9aKSIVOMdyL7+UUnqhZM875c5OtXqA2MrCvN8Q4Gsf1oOr27vCzYDZfw8JBOGgwvhMuLrlIEPcGJhbN7mm6EJDIJ91oLvV7ISOFZqiXV0KzYEHB6PTHDrmjDKAeA3eFfiK/8N2vQ+g1D2i7Ct7dJyFHC9g9N9cLJFnyVgPjnQJQBG7an04fsvs4TQ3W9T+0f23A9/MrMv+/8lU+Yc0+RSDI1WH1n5hfEYlGI2DlZQegfyrvIlB4YLr2vM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: 在 2024/6/6 下午3:39, David Hildenbrand 写道: > On 06.06.24 03:35, yangge1116 wrote: >> >> >> 在 2024/6/5 下午5:53, David Hildenbrand 写道: >>> On 05.06.24 11:41, David Hildenbrand wrote: >>>> On 05.06.24 03:18, yangge1116 wrote: >>>>> >>>>> >>>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道: >>>>>> On 04.06.24 12:48, yangge1116@126.com wrote: >>>>>>> From: yangge >>>>>>> >>>>>>> If a page is added in pagevec, its ref count increases one, remove >>>>>>> the page from pagevec decreases one. Page migration requires the >>>>>>> page is not referenced by others except page mapping. Before >>>>>>> migrating a page, we should try to drain the page from pagevec in >>>>>>> case the page is in it, however, folio_test_lru() is not sufficient >>>>>>> to tell whether the page is in pagevec or not, if the page is in >>>>>>> pagevec, the migration will fail. >>>>>>> >>>>>>> Remove the condition and drain lru once to ensure the page is not >>>>>>> referenced by pagevec. >>>>>> >>>>>> What you are saying is that we might have a page on which >>>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches, >>>>>> correct? >>>>> >>>>> Yes >>>>> >>>>>> >>>>>> Can you describe under which circumstances that happens? >>>>>> >>>>> >>>>> If we call folio_activate() to move a page from inactive LRU list to >>>>> active LRU list, the page is not only in LRU list, but also in one of >>>>> the cpu_fbatches. >>>>> >>>>> void folio_activate(struct folio *folio) >>>>> { >>>>>         if (folio_test_lru(folio) && !folio_test_active(folio) && >>>>>             !folio_test_unevictable(folio)) { >>>>>             struct folio_batch *fbatch; >>>>> >>>>>             folio_get(folio); >>>>>             //After this, folio is in LRU list, and its ref count have >>>>> increased one. >>>>> >>>>>             local_lock(&cpu_fbatches.lock); >>>>>             fbatch = this_cpu_ptr(&cpu_fbatches.activate); >>>>>             folio_batch_add_and_move(fbatch, folio, >>>>> folio_activate_fn); >>>>>             local_unlock(&cpu_fbatches.lock); >>>>>         } >>>>> } >>>> >>>> Interesting, the !SMP variant does the folio_test_clear_lru(). >>>> >>>> It would be really helpful if we could reliably identify whether LRU >>>> batching code has a raised reference on a folio. >>>> >>>> We have the same scenario in >>>> * folio_deactivate() >>>> * folio_mark_lazyfree() >>>> >>>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio). >>>> >>>> No expert on that code, I'm wondering if we could move the >>>> folio_test_clear_lru() out, such that we can more reliably identify >>>> whether a folio is on the LRU batch or not. >>> >>> I'm sure there would be something extremely broken with the following >>> (I don't know what I'm doing ;) ), but I wonder if there would be a way >>> to make something like that work (and perform well enough?). >>> >>> diff --git a/mm/swap.c b/mm/swap.c >>> index 67786cb771305..642e471c3ec5a 100644 >>> --- a/mm/swap.c >>> +++ b/mm/swap.c >>> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch >>> *fbatch, move_fn_t move_fn) >>>           for (i = 0; i < folio_batch_count(fbatch); i++) { >>>                   struct folio *folio = fbatch->folios[i]; >>> >>> -               /* block memcg migration while the folio moves between >>> lru */ >>> -               if (move_fn != lru_add_fn && >>> !folio_test_clear_lru(folio)) >>> -                       continue; >>> - >>>                   folio_lruvec_relock_irqsave(folio, &lruvec, &flags); >>>                   move_fn(lruvec, folio); >>> >>> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec, >>> struct folio *folio) >>>     */ >>>    void folio_rotate_reclaimable(struct folio *folio) >>>    { >>> -       if (!folio_test_locked(folio) && !folio_test_dirty(folio) && >>> -           !folio_test_unevictable(folio) && folio_test_lru(folio)) { >>> +       if (folio_test_lru(folio) && !folio_test_locked(folio) && >>> +           !folio_test_dirty(folio) && >>> !folio_test_unevictable(folio) && >>> +           folio_test_clear_lru(folio)) { >>>                   struct folio_batch *fbatch; >>>                   unsigned long flags; >>> >>> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu) >>>    void folio_activate(struct folio *folio) >>>    { >>>           if (folio_test_lru(folio) && !folio_test_active(folio) && >>> -           !folio_test_unevictable(folio)) { >>> +           !folio_test_unevictable(folio) && >>> folio_test_clear_lru(folio)) { >>>                   struct folio_batch *fbatch; >>> >>>                   folio_get(folio); >>> @@ -699,6 +696,8 @@ void deactivate_file_folio(struct folio *folio) >>>           /* Deactivating an unevictable folio will not accelerate >>> reclaim */ >>>           if (folio_test_unevictable(folio)) >>>                   return; >>> +       if (!folio_test_clear_lru(folio)) >>> +               return; >>> >>>           folio_get(folio); >>>           local_lock(&cpu_fbatches.lock); >>> @@ -718,7 +717,8 @@ void deactivate_file_folio(struct folio *folio) >>>    void folio_deactivate(struct folio *folio) >>>    { >>>           if (folio_test_lru(folio) && !folio_test_unevictable(folio) && >>> -           (folio_test_active(folio) || lru_gen_enabled())) { >>> +           (folio_test_active(folio) || lru_gen_enabled()) && >>> +           folio_test_clear_lru(folio)) { >>>                   struct folio_batch *fbatch; >>> >>>                   folio_get(folio); >>> @@ -740,7 +740,8 @@ void folio_mark_lazyfree(struct folio *folio) >>>    { >>>           if (folio_test_lru(folio) && folio_test_anon(folio) && >>>               folio_test_swapbacked(folio) && >>> !folio_test_swapcache(folio) && >>> -           !folio_test_unevictable(folio)) { >>> +           !folio_test_unevictable(folio) && >>> +           folio_test_clear_lru(folio)) { >>>                   struct folio_batch *fbatch; >>> >>>                   folio_get(folio); >> >> With your changes, we will call folio_test_clear_lru(folio) firstly to >> clear the LRU flag, and then call folio_get(folio) to pin the folio, >> seems a little unreasonable. Normally, folio_get(folio) is called >> firstly to pin the page, and then some other functions is called to >> handle the folio. > > Right, if that really matters (not sure if it does) we could do > > if (folio_test_lru(folio) && ... >     folio_get(folio); >     if (!folio_test_clear_lru(folio)) { >         folio_put(folio) >     } else { >         struct folio_batch *fbatch; >     } > } > Right, it seems can work. As discussed above, it will make the visible time where it is cleared "longer". Other users of lru_add_drain_all() don't check whether folio is in lru, seems there's no need to check here.