Subject: Re: [RFC PATCH 3/5] mm: thp: split huge page to any lower order pages.
From: Miaohe Lin
To: Zi Yan
CC: Roman Gushchin, Shuah Khan, Yang Shi, Hugh Dickins, "Kirill A. Shutemov",
 Matthew Wilcox, Yu Zhao
Date: Wed, 23 Mar 2022 10:31:06 +0800
References: <20220321142128.2471199-1-zi.yan@sent.com>
 <20220321142128.2471199-4-zi.yan@sent.com>
 <165ec1a8-2b35-f6fb-82d3-b94613dd437a@huawei.com>

On 2022/3/22 22:30, Zi Yan wrote:
> On 21 Mar 2022, at 23:21, Miaohe Lin wrote:
>
>> On 2022/3/21 22:21, Zi Yan wrote:
>>> From: Zi Yan
>>>
>>> To split a THP to any lower order pages, we need to reform THPs on
>>> subpages at given order and add page refcount based on the new page
>>> order.
>>> Also we need to reinitialize page_deferred_list after removing
>>> the page from the split_queue, otherwise a subsequent split will see
>>> list corruption when checking the page_deferred_list again.
>>>
>>> It has many uses, like minimizing the number of pages after
>>> truncating a pagecache THP. For anonymous THPs, we can only split them
>>> to order-0 like before until we add support for any size anonymous THPs.
>>>
>>> Signed-off-by: Zi Yan
>>> ---
>>>  include/linux/huge_mm.h |   8 +++
>>>  mm/huge_memory.c        | 111 ++++++++++++++++++++++++++++++----------
>>>  2 files changed, 91 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2999190adc22..c7153cd7e9e4 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -186,6 +186,8 @@ void free_transhuge_page(struct page *page);
>>>
>>>  bool can_split_folio(struct folio *folio, int *pextra_pins);
>>>  int split_huge_page_to_list(struct page *page, struct list_head *list);
>>> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> +		unsigned int new_order);
>>>  static inline int split_huge_page(struct page *page)
>>>  {
>>>  	return split_huge_page_to_list(page, NULL);
>>> @@ -355,6 +357,12 @@ split_huge_page_to_list(struct page *page, struct list_head *list)
>>>  {
>>>  	return 0;
>>>  }
>>> +static inline int
>>> +split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> +		unsigned int new_order)
>>> +{
>>> +	return 0;
>>> +}
>>>  static inline int split_huge_page(struct page *page)
>>>  {
>>>  	return 0;
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index fcfa46af6c4c..3617aa3ad0b1 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2236,11 +2236,13 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>>>  static void unmap_page(struct page *page)
>>>  {
>>>  	struct folio *folio = page_folio(page);
>>> -	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>>> -		TTU_SYNC;
>>> +	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SYNC;
>>>
>>>  	VM_BUG_ON_PAGE(!PageHead(page), page);
>>>
>>> +	if (folio_order(folio) >= HPAGE_PMD_ORDER)
>>> +		ttu_flags |= TTU_SPLIT_HUGE_PMD;
>>> +
>>>  	/*
>>>  	 * Anon pages need migration entries to preserve them, but file
>>>  	 * pages can simply be left unmapped, then faulted back on demand.
>>> @@ -2254,9 +2256,9 @@ static void unmap_page(struct page *page)
>>>  	VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
>>>  }
>>>
>>> -static void remap_page(struct folio *folio, unsigned long nr)
>>> +static void remap_page(struct folio *folio, unsigned short nr)
>>>  {
>>> -	int i = 0;
>>> +	unsigned int i;
>>>
>>>  	/* If unmap_page() uses try_to_migrate() on file, remove this check */
>>>  	if (!folio_test_anon(folio))
>>> @@ -2274,7 +2276,6 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>>  		struct lruvec *lruvec, struct list_head *list)
>>>  {
>>>  	VM_BUG_ON_PAGE(!PageHead(head), head);
>>> -	VM_BUG_ON_PAGE(PageCompound(tail), head);
>>>  	VM_BUG_ON_PAGE(PageLRU(tail), head);
>>>  	lockdep_assert_held(&lruvec->lru_lock);
>>>
>>> @@ -2295,9 +2296,10 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>>  }
>>>
>>>  static void __split_huge_page_tail(struct page *head, int tail,
>>> -		struct lruvec *lruvec, struct list_head *list)
>>> +		struct lruvec *lruvec, struct list_head *list, unsigned int new_order)
>>>  {
>>>  	struct page *page_tail = head + tail;
>>> +	unsigned long compound_head_flag = new_order ? (1L << PG_head) : 0;
>>>
>>>  	VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
>>>
>>> @@ -2321,6 +2323,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>>  #ifdef CONFIG_64BIT
>>>  			 (1L << PG_arch_2) |
>>>  #endif
>>> +			 compound_head_flag |
>>>  			 (1L << PG_dirty)));
>>>
>>>  	/* ->mapping in first tail page is compound_mapcount */
>>> @@ -2329,7 +2332,10 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>>  	page_tail->mapping = head->mapping;
>>>  	page_tail->index = head->index + tail;
>>>
>>> -	/* Page flags must be visible before we make the page non-compound. */
>>> +	/*
>>> +	 * Page flags must be visible before we make the page non-compound or
>>> +	 * a compound page in new_order.
>>> +	 */
>>>  	smp_wmb();
>>>
>>>  	/*
>>> @@ -2339,10 +2345,15 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>>  	 * which needs correct compound_head().
>>>  	 */
>>>  	clear_compound_head(page_tail);
>>> +	if (new_order) {
>>> +		prep_compound_page(page_tail, new_order);
>>> +		prep_transhuge_page(page_tail);
>>> +	}
>>
>> Many thanks for your series. It looks really good. One question:
>> IIUC, there seems to be an assumption that LRU compound pages are
>> always PageTransHuge, and PageTransHuge just checks PageHead:
>>
>> static inline int PageTransHuge(struct page *page)
>> {
>> 	VM_BUG_ON_PAGE(PageTail(page), page);
>> 	return PageHead(page);
>> }
>>
>> So LRU pages of any order (> 0) might be wrongly treated as THPs of
>> order HPAGE_PMD_ORDER. We should ensure thp_nr_pages() is used instead
>> of the hard-coded HPAGE_PMD_ORDER.
>>
>> Look at the code snippet below:
>> mm/mempolicy.c:
>> static struct page *new_page(struct page *page, unsigned long start)
>> {
>> 	...
>> 	} else if (PageTransHuge(page)) {
>> 		struct page *thp;
>>
>> 		thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
>> 					 HPAGE_PMD_ORDER);
>> 					 ^^^^^^^^^^^^^^^^
>> 		if (!thp)
>> 			return NULL;
>> 		prep_transhuge_page(thp);
>> 		return thp;
>> 	}
>> 	...
>> }
>>
>> HPAGE_PMD_ORDER is used here instead of a size derived from thp_nr_pages(),
>> so lower-order pages might be handled as if their order were
>> HPAGE_PMD_ORDER. All such usages might need to be fixed.
>> Or am I missing something?
>>
>> Thanks again for your work. :)
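(To make the suggestion above concrete: the adjustment to new_page() would
look roughly like the fragment below. This is an untested illustration, not
part of the posted series; compound_order() is used here as the order-based
counterpart of the thp_nr_pages() sizing mentioned above.)

-		thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-					 HPAGE_PMD_ORDER);
+		/* allocate at the source page's actual order, not PMD order */
+		thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
+					 compound_order(page));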
>
> THP will still only have HPAGE_PMD_ORDER and will not be split into any
> order other than 0. This series only allows splitting a huge page cache
> folio (added by Matthew) into any lower order. I have an explicit
> VM_BUG_ON() to ensure new_order is only 0 when a non-page-cache page is
> the input, since there is still a non-trivial amount of work left to add
> any-order THP support to the kernel. IIRC, Yu Zhao (cc'd) was planning
> to work on that.
>

Many thanks for clarifying. I'm sorry, but I haven't followed Matthew's
patches. I am wondering: can a huge page cache folio be treated as a THP?
If so, how do we ensure the correctness of the huge page cache?

Thanks again!

> Thanks for checking the patches.

BTW: I like your patches. They're really interesting. :)
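(As a closing illustration of how the new interface from the patch
description is meant to be called: a hypothetical, untested caller sketch,
not from this series. Everything other than split_huge_page_to_list_to_order()
is an existing kernel helper.)

	/*
	 * Hypothetical caller: after a partial truncate, split a pagecache
	 * THP down to order-2 folios instead of all the way to base pages.
	 * Anonymous THPs must still be split to order 0, per the VM_BUG_ON()
	 * mentioned above.  The folio is assumed to be locked, as
	 * split_huge_page() already requires.
	 */
	int err;

	if (folio_test_large(folio) && !folio_test_anon(folio))
		err = split_huge_page_to_list_to_order(&folio->page, NULL, 2);
	else
		err = split_huge_page(&folio->page);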