Message-ID: <0c417434-94a8-4b1c-b798-47da6ccf964f@arm.com>
Date: Thu, 5 Sep 2024 14:15:11 +0530
Subject: Re: [PATCH v2 1/2] mm: Abstract THP allocation
From: Dev Jain <dev.jain@arm.com>
To: "Kirill A. Shutemov"
Cc: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, mark.rutland@arm.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, jglisse@google.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240904100923.290042-1-dev.jain@arm.com> <20240904100923.290042-2-dev.jain@arm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 9/5/24 13:50, Kirill A. Shutemov wrote:
> On Wed, Sep 04, 2024 at 03:39:22PM +0530, Dev Jain wrote:
>> In preparation for the second patch, abstract away the THP allocation
>> logic present in the create_huge_pmd() path, which corresponds to the
>> faulting case when no page is present.
>>
>> There should be no functional change as a result of applying
>> this patch.
>>
>> Signed-off-by: Dev Jain
>> ---
>>  mm/huge_memory.c | 110 +++++++++++++++++++++++++++++------------------
>>  1 file changed, 67 insertions(+), 43 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 67c86a5d64a6..58125fbcc532 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -943,47 +943,89 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>>  }
>>  EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>>
>> -static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>> -		struct page *page, gfp_t gfp)
>> +static vm_fault_t thp_fault_alloc(gfp_t gfp, int order, struct vm_area_struct *vma,
>> +				  unsigned long haddr, struct folio **foliop,
>> +				  unsigned long addr)
> foliop is awkward.
>
> Why not return folio? NULL would indicate to the caller to fallback.

I took inspiration from other call sites like mfill_atomic_pte_copy(),
which have a double pointer to the folio. If we do as you say, then in
thp_fault_alloc() we would have to do all the fallback stat computation
and return the folio, then check in the caller whether that is NULL,
then set ret = VM_FAULT_FALLBACK, then goto out/release.
Basically, the stat computation for the fallback and the actual setting
of the return value to fallback would get separated. Currently, I can
simultaneously set the folio pointer and the return value.

>
>> {
>> -	struct vm_area_struct *vma = vmf->vma;
>> -	struct folio *folio = page_folio(page);
>> -	pgtable_t pgtable;
>> -	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>> -	vm_fault_t ret = 0;
>> +	struct folio *folio = vma_alloc_folio(gfp, order, vma, haddr, true);
>>
>> -	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>> +	*foliop = folio;
>> +	if (unlikely(!folio)) {
>> +		count_vm_event(THP_FAULT_FALLBACK);
>> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
>> +		return VM_FAULT_FALLBACK;
>> +	}
>>
>> +	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>>  	if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>>  		folio_put(folio);
>>  		count_vm_event(THP_FAULT_FALLBACK);
>>  		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
>> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
>> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
>> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>  		return VM_FAULT_FALLBACK;
>>  	}
>>  	folio_throttle_swaprate(folio, gfp);
>>
>> -	pgtable = pte_alloc_one(vma->vm_mm);
>> -	if (unlikely(!pgtable)) {
>> -		ret = VM_FAULT_OOM;
>> -		goto release;
>> -	}
>> -
>> -	folio_zero_user(folio, vmf->address);
>> +	folio_zero_user(folio, addr);
>>  	/*
>>  	 * The memory barrier inside __folio_mark_uptodate makes sure that
>>  	 * folio_zero_user writes become visible before the set_pmd_at()
>>  	 * write.
>>  	 */
>>  	__folio_mark_uptodate(folio);
>> +	return 0;
>> +}
>> +
>> +static void __thp_fault_success_stats(struct vm_area_struct *vma, int order)
>> +{
>> +	count_vm_event(THP_FAULT_ALLOC);
>> +	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
>> +	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>> +}
>> +
>> +static void map_pmd_thp(struct folio *folio, struct vm_fault *vmf,
>> +			struct vm_area_struct *vma, unsigned long haddr,
>> +			pgtable_t pgtable)
>> +{
>> +	pmd_t entry;
>> +
>> +	entry = mk_huge_pmd(&folio->page, vma->vm_page_prot);
>> +	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>> +	folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
>> +	folio_add_lru_vma(folio, vma);
>> +	pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>> +	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
>> +	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
>> +	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> +	mm_inc_nr_ptes(vma->vm_mm);
>> +}
>> +
>> +static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>> +{
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct folio *folio = NULL;
>> +	pgtable_t pgtable;
>> +	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>> +	vm_fault_t ret = 0;
>> +	gfp_t gfp = vma_thp_gfp_mask(vma);
>> +
>> +	pgtable = pte_alloc_one(vma->vm_mm);
>> +	if (unlikely(!pgtable)) {
>> +		ret = VM_FAULT_OOM;
>> +		goto release;
>> +	}
>> +
>> +	ret = thp_fault_alloc(gfp, HPAGE_PMD_ORDER, vma, haddr, &folio,
>> +			      vmf->address);
>> +	if (ret)
>> +		goto release;

> THP page allocation has higher probability to fail than pgtable
> allocation. It is better to allocate it first, before pgtable and do
> less work on error path.

Thanks, makes sense.