From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06462CED618 for ; Wed, 9 Oct 2024 07:27:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3F5B46B00B1; Wed, 9 Oct 2024 03:27:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3A5C16B00B8; Wed, 9 Oct 2024 03:27:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 293F26B00D3; Wed, 9 Oct 2024 03:27:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 0C4BB6B00B1 for ; Wed, 9 Oct 2024 03:27:20 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id C9189120FAA for ; Wed, 9 Oct 2024 07:27:17 +0000 (UTC) X-FDA: 82653232998.03.5082511 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by imf10.hostedemail.com (Postfix) with ESMTP id 54C76C0015 for ; Wed, 9 Oct 2024 07:27:14 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=none; spf=pass (imf10.hostedemail.com: domain of wangkefeng.wang@huawei.com designates 45.249.212.188 as permitted sender) smtp.mailfrom=wangkefeng.wang@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728458794; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i01zztBXBYWWTHra3ne7+rXChpfgk1d95tHQZjaE7VU=; b=43FjqApdknwXH8xzhCk6WE4gcRa13cfYUVPYawpqJcIjkA3gbd3lEKAeqJPWzwHfqcmitR XNluRKcVpChZp2dKrl00HVVTkJb+19tXs5mBXfxkRZlrjXqqVScC+Ad+FNVo7Lp6jprhIt lpLN0ZDyMWgR+9jPsdFPWbZEvmF7mko= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none; spf=pass (imf10.hostedemail.com: domain of wangkefeng.wang@huawei.com designates 45.249.212.188 as permitted sender) smtp.mailfrom=wangkefeng.wang@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728458794; a=rsa-sha256; cv=none; b=U25WjlNe8rL8P0nIHhs8/F75bv/EVVYjjXuL5Kzbe6y+1q1fZG1fXz2qdohWE/mXN/VgdA vKlj9z274rMq4A8EYyJZlki2g8fnySl14w590AJoBVBxy6rl9TEFyS/BWrxBbGABbCLD0O Xg3AyrJsrjFyyxtHNvgJiA73ZNDT+3s= Received: from mail.maildlp.com (unknown [172.19.88.194]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4XNkt74d6jzfd3B; Wed, 9 Oct 2024 15:24:47 +0800 (CST) Received: from dggpemf100008.china.huawei.com (unknown [7.185.36.138]) by mail.maildlp.com (Postfix) with ESMTPS id 56AC3140123; Wed, 9 Oct 2024 15:27:11 +0800 (CST) Received: from [10.174.177.243] (10.174.177.243) by dggpemf100008.china.huawei.com (7.185.36.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Wed, 9 Oct 2024 15:27:10 +0800 Message-ID: <5ad0923e-7780-4147-b23d-637c8863141f@huawei.com> Date: Wed, 9 Oct 2024 15:27:10 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() To: David Hildenbrand , Miaohe Lin , Matthew Wilcox CC: Andrew Morton , Oscar Salvador , Naoya Horiguchi , , , Jonathan Cameron References: <20240827114728.3212578-1-wangkefeng.wang@huawei.com> <20240827114728.3212578-2-wangkefeng.wang@huawei.com> <20a75b57-12a6-468f-bd7c-0aeb2f259228@redhat.com> <170546a8-e442-91e9-31e8-60a91018172a@huawei.com> <841eb150-fac6-461e-808f-e6ae607c7d81@huawei.com> <986bca05-7fd9-49ac-9129-934f31c28af6@huawei.com> Content-Language: en-US From: Kefeng Wang In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.177.243] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggpemf100008.china.huawei.com (7.185.36.138) X-Rspam-User: X-Stat-Signature: dd933fok7n9wj53sxhzcjtb3qdzsf7an X-Rspamd-Queue-Id: 54C76C0015 X-Rspamd-Server: rspam11 X-HE-Tag: 1728458834-475584 X-HE-Meta: U2FsdGVkX19ssSCAmLdzsrdzjolU66TfHUYxh5m9i7xZ42KQsr1ybkVQZA4uj+gtVVm2tdng31Aejo0bryZdBbUgeB7CwI0rPpCyO+rIV5vaTMdkV7XyJ/24WIH4EQ/2ZABo2Vwj0S8edOv5mR55L7bJHoPz67Noy7t/cZX+L6KY2TNJ7qYsqduWkJq9e0spxpYgvTpWYwYU9xKFq6JMiRRwiIIWIa9hzb3srBVjtDfRmf4E7wr1293hJ4VVomHqM7LSRPkaC+SaAtTBFYbCGtAwBb9ENvxhh7mqiad5bHAwOgnG5Lm/HgQRuLpjAazec6pTcKSCLlhlhoyjXhtuY5hMPol8lqCvrJ+ZoTG+9ES7+PhXTGcXrEG/emn19VprsL5z/o/F4QiTuXcaFYkFTKihoBZS6eIf26I2JW6V7H/8L5h/PNsDZRje33ujRBT2oRf8BoP6gZFhwr4g7nGE9m1u0GvruzulFbVV4SYeKsVKt9kNJEuwV8iQxf+x2jq2hXYU3+Xkh9IKgXpuN6WSGyHo53Y5gHCTk1bv1Uhaw7wLM6rq3AYHBZWTRaGivMQFVXkKlQkHmWXqlbSzshcYO7KjacyojffY4UHvQIxjRCSlPAkNCsWhjk7rMnjoHjiUODmv1vtaIB3z6/DvMPX2Wz+LXkbYLvEJIQIxFukKN98NKBHDzMIjJj3DM2hVRfC83EVfWeeIsxd2MiVTODMAtPEKHxKBLCH/SQ3HCD/3cHHMSTJbcOcpfqF0vG/zkHYuo8TwexiqyERaHuxRSPaPHztmIht66MxWzb82uqguUwEkcp74wSjmIhnZLKy6qIZCQXYV6887PsZWAqm55lZyjy6cA6c2SPXor0CghabuebmSO+iCPfRIvaQbRZ9eWFb8iOZXiT1HAYD4HJL+2jeI8fAEFElZ63HBqk9ieiQBuHxzWOdEHpwn83CmShJVOQXGPwQ/X1V1f9rTdVoxuTW YpMkTWzL QF2kTTy/pyo6fhSF+BYH2k1rnVlDJtg6JhWS+B9V3QAQ7nytq+KIAtPCVn1+8t5tmwTKPC4ICaveYNirIyTB1nmr01/zxmh4k8vxa0R7SvaumTU+0flYqKqTMXq0Bjtfdzrpyi6vSz92vaqe/9FFsrnnc9VuNN7M+Oj2JOOpdKvxedKx3F5WnEFiRhTXEC8itY09p5Qrq5pwVvttQTYPiQwvgLuTSLenx9GQD+Q2oYi2urCDHzJVDdvU2oy9J6TEkmet/eyrHbuZy0jtZRXhDRsAfSthhNnpMWQG1aBHA0aQ9XZEM2BYx+IYX1L6Nac3uYhOC X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/9/30 17:25, David Hildenbrand wrote: > On 29.09.24 04:19, Miaohe Lin wrote: >> On 2024/9/29 10:04, Kefeng Wang wrote: >>> >>> >>> On 2024/9/29 9:16, Miaohe Lin wrote: >>>> On 2024/9/28 16:39, David Hildenbrand wrote: >>>>> On 28.09.24 10:34, David Hildenbrand wrote: >>>>>> On 28.09.24 06:55, Matthew Wilcox wrote: >>>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote: >>>>>>>> Directly use a folio for HugeTLB and THP when calculate the next >>>>>>>> pfn, then >>>>>>>> remove unused head variable. >>>>>>> >>>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it. >>>>>>> >>>>>>>> -        if (PageHuge(page)) { >>>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1; >>>>>>>> -            isolate_hugetlb(folio, &source); >>>>>>>> -            continue; >>>>>>>> -        } else if (PageTransHuge(page)) >>>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1; >>>>>>>> +        /* >>>>>>>> +         * No reference or lock is held on the folio, so it might >>>>>>>> +         * be modified concurrently (e.g. split).  As such, >>>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as >>>>>>>> the outer >>>>>>>> +         * loop will revisit the split folio later. >>>>>>>> +         */ >>>>>>>> +        if (folio_test_large(folio)) { >>>>>>> >>>>>>> But it's not fine.  Look at the implementation of >>>>>>> folio_test_large(): >>>>>>> >>>>>>> static inline bool folio_test_large(const struct folio *folio) >>>>>>> { >>>>>>>             return folio_test_head(folio); >>>>>>> } >>>>>>> >>>>>>> That's going to be provided by: >>>>>>> >>>>>>> #define FOLIO_TEST_FLAG(name, >>>>>>> page)                                     \ >>>>>>> static __always_inline bool folio_test_##name(const struct folio >>>>>>> *folio) \ >>>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); } >>>>>>> >>>>>>> and here's the BUG: >>>>>>> >>>>>>> static const unsigned long *const_folio_flags(const struct folio >>>>>>> *folio, >>>>>>>                     unsigned n) >>>>>>> { >>>>>>>             const struct page *page = &folio->page; >>>>>>> >>>>>>>             VM_BUG_ON_PGFLAGS(PageTail(page), page); >>>>>>>             VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page- >>>>>>> >flags), page); >>>>>>>             return &page[n].flags; >>>>>>> } >>>>>>> >>>>>>> (this page can be transformed from a head page to a tail page >>>>>>> because, >>>>>>> as the comment notes, we don't hold a reference. >>>>>>> >>>>>>> Please back this out. >>>>>> >>>>>> Should we generalize the approach in dump_folio() to locally copy a >>>>>> folio, so we can safely perform checks before deciding whether we >>>>>> want >>>>>> to try grabbing a reference on the real folio (if it's still a >>>>>> folio :) )? >>>>>> >>>>> >>>>> Oh, and I forgot: isn't the existing code already racy? >>>>> >>>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page); >>> >>> Yes, in v1[1], I asked same question for existing code for >>> PageTransHuge(page), >>> >>>    "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here, >>>     but it seems that we don't guarantee the page won't be a tail page." >>> >>> ... >>>> >>>> do_migrate_range is called after start_isolate_page_range(). So a >>>> page might not be able to >>>> transform from a head page to a tail page as it's isolated? >>> start_isolate_page_range() is only isolate free pages, so maybe >>> irrelevant. >> >> A page transform from a head page to a tail page should through the >> below steps: >> 1. The compound page is freed into buddy. >> 2. It's merged into larger order in buddy. >> 3. It's allocated as a larger order compound page. >> >> Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss >> something? > > By isolated, you mean that the pageblock is isolated, and all free pages > are in the MIGRATE_ISOLATE buddy list. Nice observation. > > Indeed, a tail page could become a head page (concurrent split is > possible), but a head page should not become a tail for the reason you > mention. > > Even mm/page_reporting.c will skip isolated pageblocks. > > I wonder if there are some corner cases, but nothing comes to mind that > would perform compound allocations from the MIGRATE_ISOLATE list. > As David/Miaohe explanation, it seems that we can't hint the VM_BUG_ON, so no more changes, thanks for all the review and comments.