Date: Wed, 14 May 2025 16:42:02 -0700
X-Mailer: git-send-email 2.49.0.1045.g170613ef41-goog
Subject: [RFC PATCH v2 23/51] mm: hugetlb: Refactor out hugetlb_alloc_folio()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, linux-fsdevel@vger.kernel.org
Cc: ackerleytng@google.com, aik@amd.com, ajones@ventanamicro.com,
	akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com,
	anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com,
	binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com,
	chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com,
	david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk,
	erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com,
	haibo1.xu@intel.com, hch@infradead.org, hughd@google.com,
	ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
	james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
	jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
	jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
	kent.overstreet@linux.dev, kirill.shutemov@intel.com,
	liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
	mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
	michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
	nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
	palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
	pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
	pgonda@google.com, pvorel@suse.cz, qperret@google.com,
	quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
	quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
	rick.p.edgecombe@intel.com, rientjes@google.com,
	roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
	steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
	tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com,
	vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
	vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
	willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com,
	yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com

Refactor hugetlb_alloc_folio() out of alloc_hugetlb_folio(). The new helper
handles allocation of a folio and cgroup charging. In addition to flags that
control charging during allocation, hugetlb_alloc_folio() also takes memory
policy parameters.

This refactoring decouples hugetlb page allocation from hugetlbfs, where
(1) the subpool is stored at the fs mount, (2) reservations are made during
mmap and stored in the vma, (3) mpol must be stored at vma->vm_policy, and
(4) a vma must be used for allocation even if the pages are not meant to be
used by a host process.

This decoupling will allow hugetlb_alloc_folio() to be used by guest_memfd
in later patches. In guest_memfd, (1) a subpool is created per-fd and stored
on the inode, (2) no vma-related reservations are used, (3) mpol may not be
associated with a vma, since (4) private pages will not be mappable to
userspace and hence have no associated vmas.
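
To illustrate the intended call pattern (this sketch is not part of this
patch), a VMA-less user such as guest_memfd could wrap the refactored helper
roughly as follows. The name gmem_alloc_hugetlb_folio() and the per-inode
subpool are placeholders, and this assumes the subpool helpers are visible to
such a caller:

  /*
   * Illustrative sketch only, not part of this patch: a VMA-less caller.
   * gmem_alloc_hugetlb_folio() and the per-inode subpool are assumed names.
   */
  static struct folio *gmem_alloc_hugetlb_folio(struct hstate *h,
  						struct hugepage_subpool *spool,
  						struct mempolicy *mpol,
  						pgoff_t ilx)
  {
  	struct folio *folio;

  	/* Debit a per-fd subpool instead of a hugetlbfs mount subpool. */
  	if (hugepage_subpool_get_pages(spool, 1) < 0)
  		return ERR_PTR(-ENOSPC);

  	/*
  	 * No VMA reservation exists, so charge the cgroup reservation and
  	 * do not consume an existing hstate reservation.
  	 */
  	folio = hugetlb_alloc_folio(h, mpol, ilx,
  				    /* charge_cgroup_rsvd */ true,
  				    /* use_existing_reservation */ false);
  	if (IS_ERR_OR_NULL(folio)) {
  		hugepage_subpool_put_pages(spool, 1);
  		return folio ? folio : ERR_PTR(-ENOSPC);
  	}

  	hugetlb_set_folio_subpool(folio, spool);
  	return folio;
  }

hugetlbfs, by contrast, keeps handling the VMA reservation and resv_map
bookkeeping around such a call, as alloc_hugetlb_folio() continues to do
below.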

This could hopefully also open hugetlb up as a more generic source of hugetlb
pages that are not bound to hugetlbfs, with the complexities of
userspace/mmap/vma-related reservations confined to hugetlbfs.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I60528f246341268acbf0ed5de7752ae2cacbef93
---
 include/linux/hugetlb.h |  12 +++
 mm/hugetlb.c            | 192 ++++++++++++++++++++++------------
 2 files changed, 118 insertions(+), 86 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8f3ac832ee7f..8ba941d88956 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -698,6 +698,9 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 void wait_for_freed_hugetlb_folios(void);
+struct folio *hugetlb_alloc_folio(struct hstate *h, struct mempolicy *mpol,
+				  pgoff_t ilx, bool charge_cgroup_rsvd,
+				  bool use_existing_reservation);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				  unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1099,6 +1102,15 @@ static inline void wait_for_freed_hugetlb_folios(void)
 {
 }
 
+static inline struct folio *hugetlb_alloc_folio(struct hstate *h,
+						struct mempolicy *mpol,
+						pgoff_t ilx,
+						bool charge_cgroup_rsvd,
+						bool use_existing_reservation)
+{
+	return NULL;
+}
+
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 						unsigned long addr,
 						bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 29d1a3fb10df..5b088fe002a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2954,6 +2954,101 @@ void wait_for_freed_hugetlb_folios(void)
 	flush_work(&free_hpage_work);
 }
 
+/**
+ * hugetlb_alloc_folio() - Allocates a hugetlb folio.
+ *
+ * @h: struct hstate to allocate from.
+ * @mpol: struct mempolicy to apply for this folio allocation.
+ * @ilx: Interleave index for interpretation of @mpol.
+ * @charge_cgroup_rsvd: Set to true to charge cgroup reservation.
+ * @use_existing_reservation: Set to true if this allocation should use an
+ *	existing hstate reservation.
+ *
+ * This function handles cgroup and global hstate reservations. VMA-related
+ * reservations and subpool debiting must be handled by the caller if
+ * necessary.
+ *
+ * Return: folio on success or negated error otherwise.
+ */
+struct folio *hugetlb_alloc_folio(struct hstate *h, struct mempolicy *mpol,
+				  pgoff_t ilx, bool charge_cgroup_rsvd,
+				  bool use_existing_reservation)
+{
+	unsigned int nr_pages = pages_per_huge_page(h);
+	struct hugetlb_cgroup *h_cg = NULL;
+	struct folio *folio = NULL;
+	nodemask_t *nodemask;
+	gfp_t gfp_mask;
+	int nid;
+	int idx;
+	int ret;
+
+	idx = hstate_index(h);
+
+	if (charge_cgroup_rsvd) {
+		if (hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg))
+			goto out;
+	}
+
+	if (hugetlb_cgroup_charge_cgroup(idx, nr_pages, &h_cg))
+		goto out_uncharge_cgroup_reservation;
+
+	gfp_mask = htlb_alloc_mask(h);
+	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
+
+	spin_lock_irq(&hugetlb_lock);
+
+	if (use_existing_reservation || available_huge_pages(h))
+		folio = dequeue_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
+
+	if (!folio) {
+		spin_unlock_irq(&hugetlb_lock);
+		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
+		if (!folio)
+			goto out_uncharge_cgroup;
+		spin_lock_irq(&hugetlb_lock);
+		list_add(&folio->lru, &h->hugepage_activelist);
+		folio_ref_unfreeze(folio, 1);
+		/* Fall through */
+	}
+
+	if (use_existing_reservation) {
+		folio_set_hugetlb_restore_reserve(folio);
+		h->resv_huge_pages--;
+	}
+
+	hugetlb_cgroup_commit_charge(idx, nr_pages, h_cg, folio);
+
+	if (charge_cgroup_rsvd)
+		hugetlb_cgroup_commit_charge_rsvd(idx, nr_pages, h_cg, folio);
+
+	spin_unlock_irq(&hugetlb_lock);
+
+	gfp_mask = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+	ret = mem_cgroup_charge_hugetlb(folio, gfp_mask);
+	/*
+	 * Unconditionally increment NR_HUGETLB here. If it turns out that
+	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
+	 * decrement NR_HUGETLB.
+	 */
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
+
+	if (ret == -ENOMEM) {
+		free_huge_folio(folio);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return folio;
+
+out_uncharge_cgroup:
+	hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg);
+out_uncharge_cgroup_reservation:
+	if (charge_cgroup_rsvd)
+		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg);
+out:
+	folio = ERR_PTR(-ENOSPC);
+	return folio;
+}
+
 /*
  * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
  * faults of hugetlb private mappings on top of a non-page-cache folio (in
@@ -2971,16 +3066,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	bool reservation_exists;
 	bool charge_cgroup_rsvd;
 	struct folio *folio;
-	int ret, idx;
-	struct hugetlb_cgroup *h_cg = NULL;
-	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 	struct mempolicy *mpol;
-	nodemask_t *nodemask;
-	gfp_t gfp_mask;
 	pgoff_t ilx;
-	int nid;
-
-	idx = hstate_index(h);
 
 	if (cow_from_owner) {
 		/*
@@ -3020,69 +3107,22 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	}
 	reservation_exists = vma_reservation_exists || subpool_reservation_exists;
 
-	/*
-	 * If a vma_reservation_exists, we can skip charging hugetlb
-	 * reservations since that was charged in hugetlb_reserve_pages() when
-	 * the reservation was recorded on the resv_map.
-	 */
-	charge_cgroup_rsvd = !vma_reservation_exists;
-	if (charge_cgroup_rsvd) {
-		ret = hugetlb_cgroup_charge_cgroup_rsvd(
-			idx, pages_per_huge_page(h), &h_cg);
-		if (ret)
-			goto out_subpool_put;
-	}
-
 	mpol = get_vma_policy(vma, addr, h->order, &ilx);
-	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
-	if (ret) {
-		mpol_cond_put(mpol);
-		goto out_uncharge_cgroup_reservation;
-	}
-
-	gfp_mask = htlb_alloc_mask(h);
-	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
-
-	spin_lock_irq(&hugetlb_lock);
-
-	folio = NULL;
-	if (reservation_exists || available_huge_pages(h))
-		folio = dequeue_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
-
-	if (!folio) {
-		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
-		if (!folio) {
-			mpol_cond_put(mpol);
-			goto out_uncharge_cgroup;
-		}
-		spin_lock_irq(&hugetlb_lock);
-		list_add(&folio->lru, &h->hugepage_activelist);
-		folio_ref_unfreeze(folio, 1);
-		/* Fall through */
-	}
-
 	/*
-	 * Either dequeued or buddy-allocated folio needs to add special
-	 * mark to the folio when it consumes a global reservation.
+	 * If a vma_reservation_exists, we can skip charging cgroup reservations
+	 * since that was charged during vma reservation. Use a reservation as
+	 * long as it exists.
 	 */
-	if (reservation_exists) {
-		folio_set_hugetlb_restore_reserve(folio);
-		h->resv_huge_pages--;
-	}
-
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
-
-	if (charge_cgroup_rsvd) {
-		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
-						  h_cg, folio);
-	}
-
-	spin_unlock_irq(&hugetlb_lock);
+	charge_cgroup_rsvd = !vma_reservation_exists;
+	folio = hugetlb_alloc_folio(h, mpol, ilx, charge_cgroup_rsvd,
+				    reservation_exists);
 
 	mpol_cond_put(mpol);
 
+	if (IS_ERR_OR_NULL(folio))
+		goto out_subpool_put;
+
 	hugetlb_set_folio_subpool(folio, spool);
 
 	/* If vma accounting wasn't bypassed earlier, follow up with commit. */
@@ -3091,9 +3131,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		/*
 		 * If there is a discrepancy in reservation status between the
 		 * time of vma_needs_reservation() and vma_commit_reservation(),
-		 * then there the page must have been added to the reservation
-		 * map between vma_needs_reservation() and
-		 * vma_commit_reservation().
+		 * then the page must have been added to the reservation map
+		 * between vma_needs_reservation() and vma_commit_reservation().
 		 *
 		 * Adjust for the subpool count incremented above AND
 		 * in hugetlb_reserve_pages for the same page. Also,
@@ -3115,27 +3154,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		}
 	}
 
-	ret = mem_cgroup_charge_hugetlb(folio, gfp);
-	/*
-	 * Unconditionally increment NR_HUGETLB here. If it turns out that
-	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
-	 * decrement NR_HUGETLB.
-	 */
-	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
-
-	if (ret == -ENOMEM) {
-		free_huge_folio(folio);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	return folio;
 
-out_uncharge_cgroup:
-	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
-out_uncharge_cgroup_reservation:
-	if (charge_cgroup_rsvd)
-		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
-						    h_cg);
 out_subpool_put:
 	if (!vma_reservation_exists)
 		hugepage_subpool_put_pages(spool, 1);
-- 
2.49.0.1045.g170613ef41-goog