Date: Thu, 5 Sep 2024 11:20:42 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Dev Jain
Cc: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
	ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
	cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
	dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
	jack@suse.cz, mark.rutland@arm.com, hughd@google.com,
	aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com,
	ioworker0@gmail.com, jglisse@google.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 1/2] mm: Abstract THP allocation
References: <20240904100923.290042-1-dev.jain@arm.com>
 <20240904100923.290042-2-dev.jain@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240904100923.290042-2-dev.jain@arm.com>

On Wed, Sep 04, 2024 at 03:39:22PM +0530, Dev Jain wrote:
> In preparation for the second patch, abstract away the THP allocation
> logic present in the create_huge_pmd() path, which corresponds to the
> faulting case when no page is present.
> 
> There should be no functional change as a result of applying
> this patch.
> 
> Signed-off-by: Dev Jain
> ---
>  mm/huge_memory.c | 110 +++++++++++++++++++++++++++++------------------
>  1 file changed, 67 insertions(+), 43 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 67c86a5d64a6..58125fbcc532 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -943,47 +943,89 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>  }
>  EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>  
> -static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> -		struct page *page, gfp_t gfp)
> +static vm_fault_t thp_fault_alloc(gfp_t gfp, int order, struct vm_area_struct *vma,
> +				  unsigned long haddr, struct folio **foliop,
> +				  unsigned long addr)

foliop is awkward. Why not return the folio? NULL would indicate to the
caller to fall back.

>  {
> -	struct vm_area_struct *vma = vmf->vma;
> -	struct folio *folio = page_folio(page);
> -	pgtable_t pgtable;
> -	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> -	vm_fault_t ret = 0;
> +	struct folio *folio = vma_alloc_folio(gfp, order, vma, haddr, true);
>  
> -	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
> +	*foliop = folio;
> +	if (unlikely(!folio)) {
> +		count_vm_event(THP_FAULT_FALLBACK);
> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
> +		return VM_FAULT_FALLBACK;
> +	}
>  
> +	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>  	if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>  		folio_put(folio);
>  		count_vm_event(THP_FAULT_FALLBACK);
>  		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  		return VM_FAULT_FALLBACK;
>  	}
>  	folio_throttle_swaprate(folio, gfp);
>  
> -	pgtable = pte_alloc_one(vma->vm_mm);
> -	if (unlikely(!pgtable)) {
> -		ret = VM_FAULT_OOM;
> -		goto release;
> -	}
> -
> -	folio_zero_user(folio, vmf->address);
> +	folio_zero_user(folio, addr);
>  	/*
>  	 * The memory barrier inside __folio_mark_uptodate makes sure that
>  	 * folio_zero_user writes become visible before the set_pmd_at()
>  	 * write.
>  	 */
>  	__folio_mark_uptodate(folio);
> +	return 0;
> +}
> +
> +static void __thp_fault_success_stats(struct vm_area_struct *vma, int order)
> +{
> +	count_vm_event(THP_FAULT_ALLOC);
> +	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
> +	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
> +}
> +
> +static void map_pmd_thp(struct folio *folio, struct vm_fault *vmf,
> +			struct vm_area_struct *vma, unsigned long haddr,
> +			pgtable_t pgtable)
> +{
> +	pmd_t entry;
> +
> +	entry = mk_huge_pmd(&folio->page, vma->vm_page_prot);
> +	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> +	folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
> +	folio_add_lru_vma(folio, vma);
> +	pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
> +	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
> +	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
> +	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
> +	mm_inc_nr_ptes(vma->vm_mm);
> +}
> +
> +static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct folio *folio = NULL;
> +	pgtable_t pgtable;
> +	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> +	vm_fault_t ret = 0;
> +	gfp_t gfp = vma_thp_gfp_mask(vma);
> +
> +	pgtable = pte_alloc_one(vma->vm_mm);
> +	if (unlikely(!pgtable)) {
> +		ret = VM_FAULT_OOM;
> +		goto release;
> +	}
> +
> +	ret = thp_fault_alloc(gfp, HPAGE_PMD_ORDER, vma, haddr, &folio,
> +			      vmf->address);
> +	if (ret)
> +		goto release;

THP page allocation has a higher probability of failing than pgtable
allocation. It is better to allocate it first, before the pgtable, and do
less work on the error path.

>  	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> +
>  	if (unlikely(!pmd_none(*vmf->pmd))) {
>  		goto unlock_release;
>  	} else {
> -		pmd_t entry;
> -
>  		ret = check_stable_address_space(vma->vm_mm);
>  		if (ret)
>  			goto unlock_release;
> @@ -997,20 +1039,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>  			VM_BUG_ON(ret & VM_FAULT_FALLBACK);
>  			return ret;
>  		}
> -
> -		entry = mk_huge_pmd(page, vma->vm_page_prot);
> -		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> -		folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
> -		folio_add_lru_vma(folio, vma);
> -		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
> -		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
> -		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
> -		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
> -		mm_inc_nr_ptes(vma->vm_mm);
> +		map_pmd_thp(folio, vmf, vma, haddr, pgtable);
>  		spin_unlock(vmf->ptl);
> -		count_vm_event(THP_FAULT_ALLOC);
> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
> -		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
> +		__thp_fault_success_stats(vma, HPAGE_PMD_ORDER);
>  	}
>  
>  	return 0;
> @@ -1019,7 +1050,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>  release:
>  	if (pgtable)
>  		pte_free(vma->vm_mm, pgtable);
> -	folio_put(folio);
> +	if (folio)
> +		folio_put(folio);
>  	return ret;
>  
>  }
> @@ -1077,8 +1109,6 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
>  vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> -	gfp_t gfp;
> -	struct folio *folio;
>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>  	vm_fault_t ret;
>  
> @@ -1129,14 +1159,8 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  		}
>  		return ret;
>  	}
> -	gfp = vma_thp_gfp_mask(vma);
> -	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
> -	if (unlikely(!folio)) {
> -		count_vm_event(THP_FAULT_FALLBACK);
> -		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
> -		return VM_FAULT_FALLBACK;
> -	}
> -	return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
> +
> +	return __do_huge_pmd_anonymous_page(vmf);
>  }
>  
>  static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
> -- 
> 2.30.2
> 

-- 
 Kiryl Shutsemau / Kirill A. Shutemov
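
To make the first suggestion above concrete, here is a rough, untested
sketch of thp_fault_alloc() handing back the folio itself, with NULL
telling the caller to fall back. The helpers and stat counters are the
ones already used in the quoted patch; only the return convention differs.

static struct folio *thp_fault_alloc(gfp_t gfp, int order,
				     struct vm_area_struct *vma,
				     unsigned long haddr, unsigned long addr)
{
	/* Sketch only: same logic as the patch, but NULL means "fall back". */
	struct folio *folio = vma_alloc_folio(gfp, order, vma, haddr, true);

	if (unlikely(!folio)) {
		count_vm_event(THP_FAULT_FALLBACK);
		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
		return NULL;
	}

	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
	if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
		folio_put(folio);
		count_vm_event(THP_FAULT_FALLBACK);
		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
		return NULL;
	}
	folio_throttle_swaprate(folio, gfp);

	folio_zero_user(folio, addr);
	/*
	 * As in the patch: the barrier in __folio_mark_uptodate() orders the
	 * folio_zero_user() writes before the later set_pmd_at() write.
	 */
	__folio_mark_uptodate(folio);
	return folio;
}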
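
Likewise, a rough, untested sketch of the suggested allocation order in
__do_huge_pmd_anonymous_page(): the THP, being the allocation most likely
to fail, is tried before the page table, so the error path has less to
unwind. It assumes the NULL-returning thp_fault_alloc() sketched above,
reuses map_pmd_thp() and __thp_fault_success_stats() from the quoted
patch, and omits the real function's userfaultfd_missing() handling for
brevity.

static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
	gfp_t gfp = vma_thp_gfp_mask(vma);
	struct folio *folio;
	pgtable_t pgtable;
	vm_fault_t ret;

	/* Allocate (and charge/zero) the THP first: it is the likely failure. */
	folio = thp_fault_alloc(gfp, HPAGE_PMD_ORDER, vma, haddr, vmf->address);
	if (unlikely(!folio))
		return VM_FAULT_FALLBACK;

	/* Only then the cheaper page-table allocation. */
	pgtable = pte_alloc_one(vma->vm_mm);
	if (unlikely(!pgtable)) {
		folio_put(folio);
		return VM_FAULT_OOM;
	}

	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
	if (unlikely(!pmd_none(*vmf->pmd))) {
		/* Raced with another fault: nothing to map, back out. */
		ret = 0;
		goto unlock_release;
	}

	ret = check_stable_address_space(vma->vm_mm);
	if (ret)
		goto unlock_release;

	/* userfaultfd_missing() handling of the real function omitted here. */

	map_pmd_thp(folio, vmf, vma, haddr, pgtable);
	spin_unlock(vmf->ptl);
	__thp_fault_success_stats(vma, HPAGE_PMD_ORDER);
	return 0;

unlock_release:
	spin_unlock(vmf->ptl);
	pte_free(vma->vm_mm, pgtable);
	folio_put(folio);
	return ret;
}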