From: Kefeng Wang
To: Miaohe Lin, David Hildenbrand, Matthew Wilcox
Cc: Andrew Morton, Oscar Salvador, Naoya Horiguchi, Jonathan Cameron
Subject: Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
Date: Sun, 29 Sep 2024 10:04:38 +0800
Message-ID: <841eb150-fac6-461e-808f-e6ae607c7d81@huawei.com>
In-Reply-To: <170546a8-e442-91e9-31e8-60a91018172a@huawei.com>

On 2024/9/29 9:16, Miaohe Lin wrote:
> On 2024/9/28 16:39, David Hildenbrand wrote:
>> On 28.09.24 10:34, David Hildenbrand wrote:
>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>> remove unused head variable.
>>>>
>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>
>>>>> -        if (PageHuge(page)) {
>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>> -            isolate_hugetlb(folio, &source);
>>>>> -            continue;
>>>>> -        } else if (PageTransHuge(page))
>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>> +        /*
>>>>> +         * No reference or lock is held on the folio, so it might
>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>> +         * loop will revisit the split folio later.
>>>>> +         */
>>>>> +        if (folio_test_large(folio)) {
>>>>
>>>> But it's not fine.
>>>> Look at the implementation of folio_test_large():
>>>>
>>>> static inline bool folio_test_large(const struct folio *folio)
>>>> {
>>>>           return folio_test_head(folio);
>>>> }
>>>>
>>>> That's going to be provided by:
>>>>
>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>
>>>> and here's the BUG:
>>>>
>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>                   unsigned n)
>>>> {
>>>>           const struct page *page = &folio->page;
>>>>
>>>>           VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>           VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>           return &page[n].flags;
>>>> }
>>>>
>>>> (this page can be transformed from a head page to a tail page because,
>>>> as the comment notes, we don't hold a reference.
>>>>
>>>> Please back this out.
>>>
>>> Should we generalize the approach in dump_folio() to locally copy a
>>> folio, so we can safely perform checks before deciding whether we want
>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>
>>
>> Oh, and I forgot: isn't the existing code already racy?
>>
>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

Yes, in v1[1] I asked the same question about the existing
PageTransHuge(page) check:

  "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here, but
  it seems that we don't guarantee the page won't be a tail page."

We could delay the calculation until after we have taken a reference,
but the pfn traversal may slow down a little if we hit a tail pfn. Is
that acceptable?
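For illustration only, a minimal userspace sketch of the failure mode quoted above, assuming a heavily simplified page layout: struct mock_page, MOCK_PG_HEAD and the mock_* helpers are hypothetical stand-ins, not kernel APIs. The race is replayed sequentially, and VM_BUG_ON_PGFLAGS() is modeled as a returned flag rather than a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code: it only mimics PG_head in
 * ->flags, bit 0 of ->compound_head marking a tail page, and the PageTail()
 * assertion in const_folio_flags().
 */
#define MOCK_PG_HEAD (1UL << 0)

struct mock_page {
	unsigned long flags;     /* MOCK_PG_HEAD set on a head page */
	uintptr_t compound_head; /* bit 0 set => tail page; rest points at head */
};

static bool mock_page_tail(const struct mock_page *page)
{
	return page->compound_head & 1;
}

/*
 * Models folio_test_large() -> folio_test_head() -> const_folio_flags().
 * Instead of crashing, *would_bug reports whether
 * VM_BUG_ON_PGFLAGS(PageTail(page), page) would have fired.
 */
static bool mock_folio_test_large(const struct mock_page *folio, bool *would_bug)
{
	*would_bug = mock_page_tail(folio);
	return folio->flags & MOCK_PG_HEAD;
}

/* Replays the race sequentially; returns true if the assertion would fire. */
static bool mock_stale_folio_race(void)
{
	struct mock_page pages[4] = { 0 };
	const struct mock_page *folio;
	bool would_bug;

	/* pfn 2 is the head of an order-1 folio covering pfns 2-3. */
	pages[2].flags = MOCK_PG_HEAD;
	pages[3].compound_head = (uintptr_t)&pages[2] | 1;

	/* The loop derives the folio without taking a reference. */
	folio = &pages[2];
	assert(!mock_page_tail(folio));

	/*
	 * Meanwhile the folio is freed and pfns 0-3 are reused as an
	 * order-2 folio headed at pfn 0, so pfn 2 is now a tail page.
	 */
	pages[0].flags = MOCK_PG_HEAD;
	pages[1].compound_head = (uintptr_t)&pages[0] | 1;
	pages[2].flags = 0;
	pages[2].compound_head = (uintptr_t)&pages[0] | 1;
	pages[3].compound_head = (uintptr_t)&pages[0] | 1;

	(void)mock_folio_test_large(folio, &would_bug);
	return would_bug;
}
```

In this model mock_stale_folio_race() returns true: the struct page that was a head page when the folio pointer was derived has become a tail page, so the modeled PageTail() assertion would fire.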

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = pfn_to_page(pfn);
 		folio = page_folio(page);
 
-		/*
-		 * No reference or lock is held on the folio, so it might
-		 * be modified concurrently (e.g. split).  As such,
-		 * folio_nr_pages() may read garbage.  This is fine as the outer
-		 * loop will revisit the split folio later.
-		 */
-		if (folio_test_large(folio))
-			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
 		 * fail on them. It also doesn't make any sense to migrate them in the
@@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				folio_isolate_lru(folio);
 			if (folio_mapped(folio))
 				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+			if (folio_test_large(folio))
+				pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 			continue;
 		}
 
@@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				dump_page(page, "isolation failed");
 			}
 		}
+
+		if (folio_test_large(folio))
+			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 put_folio:
 		folio_put(folio);
 	}

>
> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
> transform from a head page to a tail page as it's isolated?

start_isolate_page_range() only isolates free pages, so it is probably
irrelevant here.

>
> Thanks.
>

[1] https://lore.kernel.org/linux-mm/4e693aa6-d742-4fe7-bd97-3d375f96fcfa@huawei.com/
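A rough userspace model of the reordering in the diff, assuming each pfn is described by a hypothetical struct mock_pfn (head pfn, folio size, and whether folio_try_get() would succeed); none of these names are kernel APIs. The skip to the last pfn of a large folio is taken only once a reference is held. When no reference can be taken, the loop falls back to visiting each pfn individually, which is the slowdown mentioned earlier. The HWPoison branch and the actual isolation are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: each entry describes the folio a
 * pfn belongs to, as it would be seen once a reference is held.
 */
struct mock_pfn {
	unsigned long head_pfn; /* first pfn of the folio (cf. folio_pfn()) */
	unsigned long nr_pages; /* folio size (cf. folio_nr_pages()) */
	bool can_get_ref;       /* would folio_try_get() succeed? */
};

/* Describe pfns [head_pfn, head_pfn + nr_pages) as one folio. */
static void mock_fill_folio(struct mock_pfn *map, unsigned long head_pfn,
			    unsigned long nr_pages, bool can_get_ref)
{
	assert(nr_pages >= 1);
	for (unsigned long i = 0; i < nr_pages; i++) {
		map[head_pfn + i].head_pfn = head_pfn;
		map[head_pfn + i].nr_pages = nr_pages;
		map[head_pfn + i].can_get_ref = can_get_ref;
	}
}

/*
 * Walk [start_pfn, end_pfn) like the reworked loop: the folio size is
 * trusted, and the rest of the folio skipped, only after a reference is
 * "taken". Returns the number of loop iterations.
 */
static unsigned long mock_migrate_range_iterations(const struct mock_pfn *map,
						   unsigned long start_pfn,
						   unsigned long end_pfn)
{
	unsigned long pfn, iterations = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		const struct mock_pfn *f = &map[pfn];

		iterations++;
		if (!f->can_get_ref)
			continue;	/* no reference: step to the next pfn */

		/* ... the folio would be isolated for migration here ... */

		if (f->nr_pages > 1)
			pfn = f->head_pfn + f->nr_pages - 1;
		/* ... and folio_put() would drop the reference here ... */
	}
	return iterations;
}
```

With pfns 0-3 forming a referenced order-2 folio and pfns 4-7 an order-2 folio whose reference cannot be taken, walking pfns 0-7 takes 5 iterations (one for the first folio, four for the second). If both folios can be referenced, it takes 2.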