Message-ID: <10aea3c3-42be-4f82-8961-75d5142a9653@arm.com>
Date: Fri, 6 Sep 2024 11:12:10 +0530
Subject: Re: [PATCH v2 1/2] mm: Abstract THP allocation
To: Ryan Roberts, akpm@linux-foundation.org, david@redhat.com,
 willy@infradead.org, kirill.shutemov@linux.intel.com
Cc: anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org,
 vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 jack@suse.cz, mark.rutland@arm.com, hughd@google.com,
 aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com,
 jglisse@google.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240904100923.290042-1-dev.jain@arm.com>
 <20240904100923.290042-2-dev.jain@arm.com>
 <336ce914-43dc-4613-a339-1a33f16f71ad@arm.com>
From: Dev Jain
In-Reply-To: <336ce914-43dc-4613-a339-1a33f16f71ad@arm.com>

On 9/5/24 16:38, Ryan Roberts wrote:
> On 04/09/2024 11:09, Dev Jain wrote:
>> In preparation for the second patch, abstract away the THP allocation
>> logic present in the create_huge_pmd() path, which corresponds to the
>> faulting case when no page is present.
>>
>> There should be no functional change as a result of applying
>> this patch.
>>
>> Signed-off-by: Dev Jain
>> ---
>>  mm/huge_memory.c | 110 +++++++++++++++++++++++++++++------------------
>>  1 file changed, 67 insertions(+), 43 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 67c86a5d64a6..58125fbcc532 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -943,47 +943,89 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>>  }
>>  EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>>
>> -static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>> -			struct page *page, gfp_t gfp)
>> +static vm_fault_t thp_fault_alloc(gfp_t gfp, int order, struct vm_area_struct *vma,
>
> Is there a reason for specifying order as a parameter? Previously it was
> hardcoded to HPAGE_PMD_ORDER. But now, thp_fault_alloc() and
> __thp_fault_success_stats() both take order and map_pmd_thp() is still
> implicitly mapping a PMD-sized block. Unless there is a reason you need this
> parameter in the next patch (I don't think there is?) I suggest simplifying.

If I am not wrong, thp_fault_alloc() and __thp_fault_success_stats()
will remain the same in the mTHP case?
I chose to pass order so that these functions can be reused by other
callers in the future: they are generic over the folio order, while
map_pmd_thp() (as the name suggests) maps only a PMD-mappable THP. (A
short sketch at the end of this mail illustrates the point.)

>
>> +				  unsigned long haddr, struct folio **foliop,
>
> FWIW, I agree with Kirill's suggestion to just return folio* and drop the output
> param.

As I replied to Kirill, I do not think that is a good idea. If you git
grep the tree for "foliop", you will find several places where this
pattern is used; for example, see alloc_charge_folio() in
mm/khugepaged.c. The intent is to do the stat computation and set
*foliop inside the function itself, so that the caller is only
concerned with the return value. Also, if we return the folio instead
of VM_FAULT_FALLBACK from thp_fault_alloc(), we end up with two "if
(unlikely(!folio))" branches: the first doing the stat computation
inside the function, and the second in the caller setting ret =
VM_FAULT_FALLBACK and then doing goto out/release. (The second sketch
at the end of this mail shows the two shapes side by side.) And it is
already inconsistent to split the stat computation from the setting of
the return value, given that the stats have names of the form
count_vm_event(THP_FAULT_FALLBACK) and friends; at that point it would
just be better to open-code the function. If I am missing something
and your suggestion can be implemented neatly, please guide.

>
> Thanks,
> Ryan
>
>> +				  unsigned long addr)
>>  {
>> -	struct vm_area_struct *vma = vmf->vma;
>> -	struct folio *folio = page_folio(page);
>> -	pgtable_t pgtable;
>> -	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>> -	vm_fault_t ret = 0;
>> +	struct folio *folio = vma_alloc_folio(gfp, order, vma, haddr, true);
>>
>> -	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>> +	*foliop = folio;
>> +	if (unlikely(!folio)) {
>> +		count_vm_event(THP_FAULT_FALLBACK);
>> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
>> +		return VM_FAULT_FALLBACK;
>> +	}
>>
>> +	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>>  	if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>>  		folio_put(folio);
>>  		count_vm_event(THP_FAULT_FALLBACK);
>>  		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
>> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
>> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
>> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>  		return VM_FAULT_FALLBACK;
>>  	}
>>  	folio_throttle_swaprate(folio, gfp);
>>
>> -	pgtable = pte_alloc_one(vma->vm_mm);
>> -	if (unlikely(!pgtable)) {
>> -		ret = VM_FAULT_OOM;
>> -		goto release;
>> -	}
>> -
>> -	folio_zero_user(folio, vmf->address);
>> +	folio_zero_user(folio, addr);
>>  	/*
>>  	 * The memory barrier inside __folio_mark_uptodate makes sure that
>>  	 * folio_zero_user writes become visible before the set_pmd_at()
>>  	 * write.
>>  	 */
>>  	__folio_mark_uptodate(folio);
>> +	return 0;
>> +}
>> +
>> +static void __thp_fault_success_stats(struct vm_area_struct *vma, int order)
>> +{
>> +	count_vm_event(THP_FAULT_ALLOC);
>> +	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
>> +	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>> +}
>> +
>> +static void map_pmd_thp(struct folio *folio, struct vm_fault *vmf,
>> +			struct vm_area_struct *vma, unsigned long haddr,
>> +			pgtable_t pgtable)
>> +{
>> +	pmd_t entry;
>> +
>> +	entry = mk_huge_pmd(&folio->page, vma->vm_page_prot);
>> +	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>> +	folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
>> +	folio_add_lru_vma(folio, vma);
>> +	pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>> +	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
>> +	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
>> +	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> +	mm_inc_nr_ptes(vma->vm_mm);
>> +}
>> +
>> +static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>> +{
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct folio *folio = NULL;
>> +	pgtable_t pgtable;
>> +	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>> +	vm_fault_t ret = 0;
>> +	gfp_t gfp = vma_thp_gfp_mask(vma);
>> +
>> +	pgtable = pte_alloc_one(vma->vm_mm);
>> +	if (unlikely(!pgtable)) {
>> +		ret = VM_FAULT_OOM;
>> +		goto release;
>> +	}
>> +
>> +	ret = thp_fault_alloc(gfp, HPAGE_PMD_ORDER, vma, haddr, &folio,
>> +			      vmf->address);
>> +	if (ret)
>> +		goto release;
>>
>>  	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> +
>>  	if (unlikely(!pmd_none(*vmf->pmd))) {
>>  		goto unlock_release;
>>  	} else {
>> -		pmd_t entry;
>> -
>>  		ret = check_stable_address_space(vma->vm_mm);
>>  		if (ret)
>>  			goto unlock_release;
>> @@ -997,20 +1039,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>  			VM_BUG_ON(ret & VM_FAULT_FALLBACK);
>>  			return ret;
>>  		}
>> -
>> -		entry = mk_huge_pmd(page, vma->vm_page_prot);
>> -		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>> -		folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
>> -		folio_add_lru_vma(folio, vma);
>> -		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>> -		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
>> -		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
>> -		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> -		mm_inc_nr_ptes(vma->vm_mm);
>> +		map_pmd_thp(folio, vmf, vma, haddr, pgtable);
>>  		spin_unlock(vmf->ptl);
>> -		count_vm_event(THP_FAULT_ALLOC);
>> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>> -		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>> +		__thp_fault_success_stats(vma, HPAGE_PMD_ORDER);
>>  	}
>>
>>  	return 0;
>> @@ -1019,7 +1050,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>  release:
>>  	if (pgtable)
>>  		pte_free(vma->vm_mm, pgtable);
>> -	folio_put(folio);
>> +	if (folio)
>> +		folio_put(folio);
>>  	return ret;
>>
>>  }
>> @@ -1077,8 +1109,6 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
>>  vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>  {
>>  	struct vm_area_struct *vma = vmf->vma;
>> -	gfp_t gfp;
>> -	struct folio *folio;
>>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>>  	vm_fault_t ret;
>>
>> @@ -1129,14 +1159,8 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>  		}
>>  		return ret;
>>  	}
>> -	gfp = vma_thp_gfp_mask(vma);
>> -	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
>> -	if (unlikely(!folio)) {
>> -		count_vm_event(THP_FAULT_FALLBACK);
>> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
>> -		return VM_FAULT_FALLBACK;
>> -	}
>> -	return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
>> +
>> +	return __do_huge_pmd_anonymous_page(vmf);
>>  }
>>
>>  static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
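
To make the order-genericity point above concrete, here is a minimal
standalone sketch (plain userspace C, not kernel code; the stats
helper, the value 9 for HPAGE_PMD_ORDER on 4K base pages, and the
order-4 mTHP caller are all illustrative stand-ins): because order is
a parameter, the helper body is untouched when a new order shows up.

	/* Standalone sketch, not kernel code: every name below is a
	 * simplified stand-in used only to illustrate the order parameter. */
	#include <stdio.h>

	#define HPAGE_PMD_ORDER	9	/* PMD-sized THP on 4K base pages (assumption) */

	/* Stand-in for count_mthp_stat(): keyed by order, so one call site
	 * accounts correctly for any THP size. */
	static void count_mthp_stat_sketch(int order, const char *event)
	{
		printf("order %d: %s\n", order, event);
	}

	/* Mirrors the shape of __thp_fault_success_stats(): nothing in the
	 * body cares which order was faulted in. */
	static void thp_fault_success_stats_sketch(int order)
	{
		count_mthp_stat_sketch(order, "MTHP_STAT_ANON_FAULT_ALLOC");
	}

	int main(void)
	{
		thp_fault_success_stats_sketch(HPAGE_PMD_ORDER);	/* PMD THP fault path */
		thp_fault_success_stats_sketch(4);	/* hypothetical future mTHP caller */
		return 0;
	}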
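
Similarly, to show the two error-handling shapes discussed for
thp_fault_alloc(), a second standalone sketch (again plain C with
stand-in types, and malloc() standing in for vma_alloc_folio(); none
of this is the real kernel API) contrasts returning vm_fault_t plus an
output parameter against returning the folio pointer, where the caller
must repeat the NULL check just to choose the fault code.

	/* Standalone sketch, not kernel code: stand-in types to contrast the
	 * two helper shapes discussed above. */
	#include <stdio.h>
	#include <stdlib.h>

	struct folio { int order; };
	typedef int vm_fault_t;
	#define VM_FAULT_FALLBACK 1

	/* Stand-in for the count_vm_event()/count_mthp_stat() fallback accounting. */
	static void count_fallback_stats(int order)
	{
		printf("fallback accounted, order %d\n", order);
	}

	/* Shape A (as in the patch): return vm_fault_t, hand the folio back via
	 * an output parameter. The single !folio branch both accounts the
	 * fallback and produces the code the caller propagates. */
	static vm_fault_t thp_fault_alloc_sketch(int order, struct folio **foliop)
	{
		struct folio *folio = malloc(sizeof(*folio)); /* stand-in for vma_alloc_folio() */

		*foliop = folio;
		if (!folio) {
			count_fallback_stats(order);
			return VM_FAULT_FALLBACK;
		}
		folio->order = order;
		return 0;
	}

	/* Shape B (the suggestion): return the folio. The helper still needs a
	 * !folio branch for the stats, and the caller needs a second one to
	 * turn NULL into VM_FAULT_FALLBACK. */
	static struct folio *thp_fault_alloc_ret_folio(int order)
	{
		struct folio *folio = malloc(sizeof(*folio));

		if (!folio) {			/* first NULL check: accounting */
			count_fallback_stats(order);
			return NULL;
		}
		folio->order = order;
		return folio;
	}

	int main(void)
	{
		struct folio *folio;
		vm_fault_t ret;

		ret = thp_fault_alloc_sketch(9, &folio);
		if (ret)			/* caller only looks at the code */
			return 1;
		free(folio);

		folio = thp_fault_alloc_ret_folio(9);
		if (!folio)			/* second NULL check, only to pick the code */
			return 1;
		free(folio);
		return 0;
	}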