From: Pankaj Raghav
Date: Fri, 30 Aug 2024 16:59:57 +0200
Subject: Re: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
To: Luis Chamberlain, Zi Yan
Cc: Matthew Wilcox, Sven Schnelle, brauner@kernel.org, akpm@linux-foundation.org,
 chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org, djwong@kernel.org,
 hare@suse.de, gost.dev@samsung.com, linux-xfs@vger.kernel.org, hch@lst.de,
 david@fromorbit.com, yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, john.g.garry@oracle.com, cl@os.amperecomputing.com,
 p.raghav@samsung.com, ryan.roberts@arm.com, David Howells, linux-s390@vger.kernel.org
Message-ID: <2477a817-b482-43ed-9fd3-a7f8f948495f@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>
 <20240822135018.1931258-5-kernel@pankajraghav.com>
 <221FAE59-097C-4D31-A500-B09EDB07C285@nvidia.com>

On 30/08/2024 01:41, Luis Chamberlain wrote:
> On Thu, Aug 29, 2024 at 06:12:26PM -0400, Zi Yan wrote:
>> The issue is that the change to split_huge_page() makes
>> split_huge_page_to_list_to_order() unlock the wrong subpage.
>> split_huge_page() used to pass the “page” pointer to
>> split_huge_page_to_list_to_order(), which keeps that “page” still locked.
>> But this patch changes the “page” passed into
>> split_huge_page_to_list_to_order() to always be the head page.
>>
>> This fixes the crash on my x86 VM, but it can be improved:
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 7c50aeed0522..eff5d2fb5d4e 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -320,10 +320,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins);
>>  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>  		unsigned int new_order);
>>  int split_folio_to_list(struct folio *folio, struct list_head *list);
>> -static inline int split_huge_page(struct page *page)
>> -{
>> -	return split_folio(page_folio(page));
>> -}
>> +int split_huge_page(struct page *page);
>>  void deferred_split_folio(struct folio *folio);
>>
>>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c29af9451d92..4d723dab4336 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3297,6 +3297,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>  	return ret;
>>  }
>>
>> +int split_huge_page(struct page *page)
>> +{
>> +	unsigned int min_order = 0;
>> +	struct folio *folio = page_folio(page);
>> +
>> +	if (folio_test_anon(folio))
>> +		goto out;
>> +
>> +	if (!folio->mapping) {
>> +		if (folio_test_pmd_mappable(folio))
>> +			count_vm_event(THP_SPLIT_PAGE_FAILED);
>> +		return -EBUSY;
>> +	}
>> +
>> +	min_order = mapping_min_folio_order(folio->mapping);
>> +out:
>> +	return split_huge_page_to_list_to_order(page, NULL, min_order);
>> +}
>> +
>>  int split_folio_to_list(struct folio *folio, struct list_head *list)
>>  {
>>  	unsigned int min_order = 0;
>
>
> Confirmed, and also although you suggest it can be improved, I thought
> we could do that by sharing more code and putting things in the headers.
> The below also fixes this and tries to share more code, but I think it is
> perhaps less easy to understand than your patch.
>

It feels a bit weird to pass both the folio and the page to
`split_page_folio_to_list()`. How about we extract the code that returns
the min order so that we don't repeat it? Something like this:

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c275aa9cc105..d27febd5c639 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -331,10 +331,24 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
 int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
+int min_order_for_split(struct folio *folio);
 int split_folio_to_list(struct folio *folio, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
-	return split_folio(page_folio(page));
+	struct folio *folio = page_folio(page);
+	int ret = min_order_for_split(folio);
+
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * split_huge_page() locks the page before splitting and
+	 * expects the same page that has been split to be locked when
+	 * returned. split_folio_to_list() cannot be used here because
+	 * it converts the page to folio and passes the head page to be
+	 * split.
+	 */
+	return split_huge_page_to_list_to_order(page, NULL, ret);
 }
 void deferred_split_folio(struct folio *folio);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 169f1a71c95d..b167e036d01b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3529,12 +3529,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	return ret;
 }

-int split_folio_to_list(struct folio *folio, struct list_head *list)
+int min_order_for_split(struct folio *folio)
 {
-	unsigned int min_order = 0;
-
 	if (folio_test_anon(folio))
-		goto out;
+		return 0;

 	if (!folio->mapping) {
 		if (folio_test_pmd_mappable(folio))
@@ -3542,10 +3540,17 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
 		return -EBUSY;
 	}

-	min_order = mapping_min_folio_order(folio->mapping);
-out:
-	return split_huge_page_to_list_to_order(&folio->page, list,
-					min_order);
+	return mapping_min_folio_order(folio->mapping);
+}
+
+int split_folio_to_list(struct folio *folio, struct list_head *list)
+{
+	int ret = min_order_for_split(folio);
+
+	if (ret < 0)
+		return ret;
+
+	return split_huge_page_to_list_to_order(&folio->page, list, ret);
+}

 void __folio_undo_large_rmappable(struct folio *folio)

> So I think your patch is cleaner and easier as a fix.
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index c275aa9cc105..99cd9c7bf55b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -97,6 +97,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
>  	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
>
>  #define split_folio(f) split_folio_to_list(f, NULL)
> +#define split_folio_to_list(f, list) split_page_folio_to_list(&f->page, f, list)
>
>  #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
>  #define HPAGE_PMD_SHIFT PMD_SHIFT
> @@ -331,10 +332,11 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
>  bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
>  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>  		unsigned int new_order);
> -int split_folio_to_list(struct folio *folio, struct list_head *list);
> +int split_page_folio_to_list(struct page *page, struct folio *folio,
> +			     struct list_head *list);
>  static inline int split_huge_page(struct page *page)
>  {
> -	return split_folio(page_folio(page));
> +	return split_page_folio_to_list(page, page_folio(page), NULL);
>  }
>  void deferred_split_folio(struct folio *folio);
>
> @@ -511,7 +513,9 @@ static inline int split_huge_page(struct page *page)
>  	return 0;
>  }
>
> -static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
> +static inline int split_page_folio_to_list(struct page *page,
> +					   struct folio *folio,
> +					   struct list_head *list)
>  {
>  	return 0;
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 169f1a71c95d..b115bfe63b52 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3529,7 +3529,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>  	return ret;
>  }
>
> -int split_folio_to_list(struct folio *folio, struct list_head *list)
> +int split_page_folio_to_list(struct page *page, struct folio *folio,
> +			     struct list_head *list)
>  {
>  	unsigned int min_order = 0;
>
> @@ -3544,8 +3545,7 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
>
>  	min_order = mapping_min_folio_order(folio->mapping);
>  out:
> -	return split_huge_page_to_list_to_order(&folio->page, list,
> -					min_order);
> +	return split_huge_page_to_list_to_order(page, list, min_order);
>  }
>
>  void __folio_undo_large_rmappable(struct folio *folio)
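
As context for the fix being discussed above: the locking contract is the
part that split_folio(page_folio(page)) cannot honor. The sketch below is
illustrative only (the example_split_one_page() helper is hypothetical and
not from the thread or the kernel tree); it shows the caller pattern that
breaks if the head page, rather than the subpage passed in, is the one left
locked after the split.

/*
 * Illustrative caller pattern: the caller locks one specific subpage of a
 * large folio, asks for a split, and expects that same subpage (which may
 * be a tail page of the original folio) to still be the locked one when
 * the call returns.  Passing &folio->page (the head page) down to
 * split_huge_page_to_list_to_order() breaks this, because the head page is
 * then the page that stays locked.
 */
static int example_split_one_page(struct page *page)
{
	int ret;

	lock_page(page);		/* caller locks this subpage */
	ret = split_huge_page(page);	/* must leave 'page' locked */
	unlock_page(page);		/* unlocks the same subpage */

	return ret;
}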