From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69865CF6495 for ; Sun, 29 Sep 2024 02:19:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CE0B68000B; Sat, 28 Sep 2024 22:19:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C757180009; Sat, 28 Sep 2024 22:19:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ABB258000B; Sat, 28 Sep 2024 22:19:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 8903C80009 for ; Sat, 28 Sep 2024 22:19:16 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id E882441318 for ; Sun, 29 Sep 2024 02:19:15 +0000 (UTC) X-FDA: 82616168670.04.7D57593 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by imf15.hostedemail.com (Postfix) with ESMTP id 038E3A0007 for ; Sun, 29 Sep 2024 02:19:12 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=huawei.com; spf=pass (imf15.hostedemail.com: domain of linmiaohe@huawei.com designates 45.249.212.187 as permitted sender) smtp.mailfrom=linmiaohe@huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1727576335; a=rsa-sha256; cv=none; b=W0MsOhyXMkOY+f17S5uEBtdb0BGu3f6SYUCKs3orLF+lIXDMQSHrn81nldlWu0ivh/urMe JWZA07jNimafuxj1HYIGidFOxl7S+7Fmpgc1BM5dtW5Ft3taEn4ZL9+FozqCfmFBztiEY2 uDzh+FEuEbngjAB9LUr9wCbqllW76JU= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=huawei.com; spf=pass (imf15.hostedemail.com: domain of linmiaohe@huawei.com designates 45.249.212.187 as permitted sender) smtp.mailfrom=linmiaohe@huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1727576335; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ClZdcGQFF+wtJ4qCXU5qRDFVmeJViUrIdZpv/lOGNFI=; b=Z6Qd+zxwPSsuQLWMT2ZEnUyckQ1V5FbSiEGea2LCE1orE6FA3mH5PPRivfdqY8graRnP+w RwOUItRtsD71vImjKVT2xwOz6UfCd1+Zab/rWaCogOGQgYFoQQOq8dMOCswZzf/4I5jcSe RJKdqaQdVko3LXcGfLouaeYmceYruLg= Received: from mail.maildlp.com (unknown [172.19.163.48]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4XGSXp24NDzySZG; Sun, 29 Sep 2024 10:18:02 +0800 (CST) Received: from kwepemd200019.china.huawei.com (unknown [7.221.188.193]) by mail.maildlp.com (Postfix) with ESMTPS id 533C018007C; Sun, 29 Sep 2024 10:19:07 +0800 (CST) Received: from [10.173.127.72] (10.173.127.72) by kwepemd200019.china.huawei.com (7.221.188.193) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Sun, 29 Sep 2024 10:19:06 +0800 Subject: Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() To: Kefeng Wang , David Hildenbrand , Matthew Wilcox CC: Andrew Morton , Oscar Salvador , Naoya Horiguchi , , , Jonathan Cameron References: <20240827114728.3212578-1-wangkefeng.wang@huawei.com> <20240827114728.3212578-2-wangkefeng.wang@huawei.com> <20a75b57-12a6-468f-bd7c-0aeb2f259228@redhat.com> <170546a8-e442-91e9-31e8-60a91018172a@huawei.com> <841eb150-fac6-461e-808f-e6ae607c7d81@huawei.com> From: Miaohe Lin Message-ID: <986bca05-7fd9-49ac-9129-934f31c28af6@huawei.com> Date: Sun, 29 Sep 2024 10:19:05 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0 MIME-Version: 1.0 In-Reply-To: <841eb150-fac6-461e-808f-e6ae607c7d81@huawei.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.173.127.72] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To kwepemd200019.china.huawei.com (7.221.188.193) X-Rspam-User: X-Stat-Signature: 74tcq1fs5tffdazn7hisg5smdd58d1n4 X-Rspamd-Queue-Id: 038E3A0007 X-Rspamd-Server: rspam02 X-HE-Tag: 1727576352-719231 X-HE-Meta: U2FsdGVkX1+PImFeE6EXgzIJ4qQVf25ryPij2U5/dD9bDqRxbNC2O5MsJrRSfZuWaEzx3GOVJKFAwpprBsrZxptWYw2TXDYNfG51jCFLvq2y8EAujoAjzYDcJTfemcLpURH4SyQu+1KoVn9dxIL6yo6iXcgvIIBywualE4wzejTpUX6xepme6iUSSPioQzZUAI+ujJd4RJqSo37Fnjhm5RkP3DRH2J+GQTSJ/4QICOCgY3OMuh1Xx3JjnLWFLNyypZgLw+GZnaj53/DDW14qu2mbt3UbqaU1n2cFvc8pJ6BtLm/Oc+rY2CCjWq0bh7XRKmQJbDDo7dIX3dCRK6HTYQFXTSra3pkwI1vTVqTiIgRrLfRp+Asz+rxme49ujz28k6n4Buhswh7+HX+gQx7V371Odv975h6JYVkIpyaUVAcj8ZaSHmYlwJHONcUUVwiR8yGq/Dm4vpf9ihKC7ZAXtS/ikYarec8HeBPjOUSBf3ovXowZUJTgNA/sfkt7RmUIo7bv+T7AdOTKH2fu5DnZM96kanAz4J6WKR2ikD7SR2LaVKDnJNenvQ5Q4pq5hV3rHuwz1YbJSTRyK95b+3SEKtVjZzPAuk2Lx4q5N0MkgljNv7F/+OawhId3FgEKSjV0NfF8AlauhmKUwBs0USbpq+lOJ3mWPCzvnD+hvCSTfYXW/V3UJ0lJav8GVZbrPi/0wZpGjtNA3BX6GEJAoumvUkydde+2R+q1BL3rI6vjbFtfYBQWem80sK3u3nscxlf90ybiQ5P/9ue5mIwzo9WQPYIINL9a+dDNBqBtBkM0ObLtogusSH+hK25MBWh/OetdAkc6QYujg6TRTEku0maCVj08mBYGurBwtLAJxfvq39LRA/cI4St2UsyOvUlj2fY0YE0zccBT+vtA97l130putPXnBidZD8OuLD7cCiadVOf9UpKbsp0ibDprIkjEMDDp1QXi2NR0845+w6HiSw2 v9us/MN0 8trS6S0d7xB3UibXRIHlGLSWinyzggKGe9WEkEVxHI+TAGGSOJiLmlM6+Acn/vDAsOGVhIA9vots09f3Iyf22T7eY3xWuVmTWXWe1jpb6xiZ7artinxNUTXcsQba1Hua3L8zWAbSuohviwNOBwfHKdad66SjuUBb+zcBI9X8N9jSYYqwXANb734814b66yJ4FO2PW1jpxtcT+n7kTxUdgmf4MBB+LV8SG/YiyML+N2OoqqdOfWb9amoqeoyOqjhPHu1I9fYnNB96djX9odxoEvtgkSQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/9/29 10:04, Kefeng Wang wrote: > > > On 2024/9/29 9:16, Miaohe Lin wrote: >> On 2024/9/28 16:39, David Hildenbrand wrote: >>> On 28.09.24 10:34, David Hildenbrand wrote: >>>> On 28.09.24 06:55, Matthew Wilcox wrote: >>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote: >>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then >>>>>> remove unused head variable. >>>>> >>>>> I just noticed this got merged.  You're going to hit BUG_ON with it. >>>>> >>>>>> -        if (PageHuge(page)) { >>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1; >>>>>> -            isolate_hugetlb(folio, &source); >>>>>> -            continue; >>>>>> -        } else if (PageTransHuge(page)) >>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1; >>>>>> +        /* >>>>>> +         * No reference or lock is held on the folio, so it might >>>>>> +         * be modified concurrently (e.g. split).  As such, >>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer >>>>>> +         * loop will revisit the split folio later. >>>>>> +         */ >>>>>> +        if (folio_test_large(folio)) { >>>>> >>>>> But it's not fine.  Look at the implementation of folio_test_large(): >>>>> >>>>> static inline bool folio_test_large(const struct folio *folio) >>>>> { >>>>>            return folio_test_head(folio); >>>>> } >>>>> >>>>> That's going to be provided by: >>>>> >>>>> #define FOLIO_TEST_FLAG(name, page)                                     \ >>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \ >>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); } >>>>> >>>>> and here's the BUG: >>>>> >>>>> static const unsigned long *const_folio_flags(const struct folio *folio, >>>>>                    unsigned n) >>>>> { >>>>>            const struct page *page = &folio->page; >>>>> >>>>>            VM_BUG_ON_PGFLAGS(PageTail(page), page); >>>>>            VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page); >>>>>            return &page[n].flags; >>>>> } >>>>> >>>>> (this page can be transformed from a head page to a tail page because, >>>>> as the comment notes, we don't hold a reference. >>>>> >>>>> Please back this out. >>>> >>>> Should we generalize the approach in dump_folio() to locally copy a >>>> folio, so we can safely perform checks before deciding whether we want >>>> to try grabbing a reference on the real folio (if it's still a folio :) )? >>>> >>> >>> Oh, and I forgot: isn't the existing code already racy? >>> >>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page); > > Yes, in v1[1], I asked same question for existing code for PageTransHuge(page), > >   "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here, >    but it seems that we don't guarantee the page won't be a tail page." > > > we could delay the calculation after we got a ref, but the traversal of pfn may slow down a little if hint a tail pfn, is it acceptable? > > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) >                 page = pfn_to_page(pfn); >                 folio = page_folio(page); > > -               /* > -                * No reference or lock is held on the folio, so it might > -                * be modified concurrently (e.g. split).  As such, > -                * folio_nr_pages() may read garbage.  This is fine as the outer > -                * loop will revisit the split folio later. > -                */ > -               if (folio_test_large(folio)) > -                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1; > - >                 /* >                  * HWPoison pages have elevated reference counts so the migration would >                  * fail on them. It also doesn't make any sense to migrate them in the > @@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) >                                 folio_isolate_lru(folio); >                         if (folio_mapped(folio)) >                                 unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK); > +                       if (folio_test_large(folio)) > +                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1; >                         continue; >                 } > > @@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) >                                 dump_page(page, "isolation failed"); >                         } >                 } > + > +               if (folio_test_large(folio)) > +                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1; >  put_folio: >                 folio_put(folio); >         } > > >> >> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to >> transform from a head page to a tail page as it's isolated? > start_isolate_page_range() is only isolate free pages, so maybe irrelevant. A page transform from a head page to a tail page should through the below steps: 1. The compound page is freed into buddy. 2. It's merged into larger order in buddy. 3. It's allocated as a larger order compound page. Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss something? Thanks.