Date: Wed, 14 May 2025 16:41:57 -0700
Message-ID: <782bb82a0d2d62b616daebb77dc3d9e345fb76fa.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 18/51] mm: hugetlb: Cleanup interpretation of map_chg_state within alloc_hugetlb_folio()
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, linux-fsdevel@vger.kernel.org
Cc: ackerleytng@google.com, aik@amd.com, ajones@ventanamicro.com,
	akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com,
	anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com,
	binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com,
	chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com,
	david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk,
	erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com,
	haibo1.xu@intel.com, hch@infradead.org, hughd@google.com,
	ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
	james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
	jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
	jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
	kent.overstreet@linux.dev, kirill.shutemov@intel.com,
	liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
	mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
	michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
	nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
	palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
	pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
	pgonda@google.com, pvorel@suse.cz, qperret@google.com,
	quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
	quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
	rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
	rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
	steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
	tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com,
	vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
	vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
	willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com,
	yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

Interpreting map_chg_state inline, within alloc_hugetlb_folio(),
improves readability.

Instead of having cow_from_owner and the result of
vma_needs_reservation() compute a map_chg_state, and then interpreting
map_chg_state within alloc_hugetlb_folio() to determine whether to

+ Get a page from the subpool or
+ Charge cgroup reservations or
+ Commit vma reservations or
+ Clean up reservations,

this refactoring makes those decisions based only on whether a
vma_reservation_exists. If a vma_reservation_exists, the subpool had
already been debited and the cgroup had been charged, hence
alloc_hugetlb_folio() should not double-debit or double-charge.

If the vma reservation can't be used (as in cow_from_owner), then the
vma reservation effectively does not exist and vma_reservation_exists
is set to false.

The conditions for committing reservations or cleaning up are also
updated to be paired with the corresponding conditions guarding
reservation creation.

Signed-off-by: Ackerley Tng
Change-Id: I22d72a2cae61fb64dc78e0a870b254811a06a31e
---
 mm/hugetlb.c | 94 ++++++++++++++++++++++------------------------------
 1 file changed, 39 insertions(+), 55 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 597f2b9f62b5..67144af7ab79 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2968,25 +2968,6 @@ void wait_for_freed_hugetlb_folios(void)
 	flush_work(&free_hpage_work);
 }
 
-typedef enum {
-	/*
-	 * For either 0/1: we checked the per-vma resv map, and one resv
-	 * count either can be reused (0), or an extra needed (1).
-	 */
-	MAP_CHG_REUSE = 0,
-	MAP_CHG_NEEDED = 1,
-	/*
-	 * Cannot use per-vma resv count can be used, hence a new resv
-	 * count is enforced.
-	 *
-	 * NOTE: This is mostly identical to MAP_CHG_NEEDED, except
-	 * that currently vma_needs_reservation() has an unwanted side
-	 * effect to either use end() or commit() to complete the
-	 * transaction. Hence it needs to differenciate from NEEDED.
-	 */
-	MAP_CHG_ENFORCED = 2,
-} map_chg_state;
-
 /*
  * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
  * faults of hugetlb private mappings on top of a non-page-cache folio (in
@@ -3000,46 +2981,45 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	bool subpool_reservation_exists;
+	bool vma_reservation_exists;
 	bool reservation_exists;
+	bool charge_cgroup_rsvd;
 	struct folio *folio;
-	long retval;
-	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 
 	idx = hstate_index(h);
 
-	/* Whether we need a separate per-vma reservation? */
 	if (cow_from_owner) {
 		/*
 		 * Special case! Since it's a CoW on top of a reserved
 		 * page, the private resv map doesn't count. So it cannot
 		 * consume the per-vma resv map even if it's reserved.
 		 */
-		map_chg = MAP_CHG_ENFORCED;
+		vma_reservation_exists = false;
 	} else {
 		/*
 		 * Examine the region/reserve map to determine if the process
-		 * has a reservation for the page to be allocated. A return
-		 * code of zero indicates a reservation exists (no change).
+		 * has a reservation for the page to be allocated and debit the
+		 * reservation. If the number of pages required is 0, a
+		 * reservation exists.
 		 */
-		retval = vma_needs_reservation(h, vma, addr);
-		if (retval < 0)
+		int npages_req = vma_needs_reservation(h, vma, addr);
+
+		if (npages_req < 0)
 			return ERR_PTR(-ENOMEM);
-		map_chg = retval ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
+
+		vma_reservation_exists = npages_req == 0;
 	}
 
 	/*
-	 * Whether we need a separate global reservation?
-	 *
-	 * Processes that did not create the mapping will have no
-	 * reserves as indicated by the region/reserve map. Check
-	 * that the allocation will not exceed the subpool limit.
-	 * Or if it can get one from the pool reservation directly.
+	 * Debit subpool only if a vma reservation does not exist. If
+	 * vma_reservation_exists, the vma reservation was either moved from the
+	 * subpool or taken directly from hstate in hugetlb_reserve_pages().
 	 */
 	subpool_reservation_exists = false;
-	if (map_chg) {
+	if (!vma_reservation_exists) {
 		int npages_req = hugepage_subpool_get_pages(spool, 1);
 
 		if (npages_req < 0)
@@ -3047,13 +3027,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 		subpool_reservation_exists = npages_req == 0;
 	}
-	reservation_exists = !map_chg || subpool_reservation_exists;
+
+	reservation_exists = vma_reservation_exists || subpool_reservation_exists;
 
 	/*
-	 * If this allocation is not consuming a per-vma reservation,
-	 * charge the hugetlb cgroup now.
+	 * If a vma_reservation_exists, we can skip charging hugetlb
+	 * reservations since that was charged in hugetlb_reserve_pages() when
+	 * the reservation was recorded on the resv_map.
 	 */
-	if (map_chg) {
+	charge_cgroup_rsvd = !vma_reservation_exists;
+	if (charge_cgroup_rsvd) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
 		if (ret)
@@ -3091,10 +3074,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	}
 
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
-	/* If allocation is not consuming a reservation, also store the
-	 * hugetlb_cgroup pointer on the page.
-	 */
-	if (map_chg) {
+
+	if (charge_cgroup_rsvd) {
 		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
 						  h_cg, folio);
 	}
@@ -3103,25 +3084,27 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	hugetlb_set_folio_subpool(folio, spool);
 
-	if (map_chg != MAP_CHG_ENFORCED) {
-		/* commit() is only needed if the map_chg is not enforced */
-		retval = vma_commit_reservation(h, vma, addr);
+	/* If vma accounting wasn't bypassed earlier, follow up with commit. */
+	if (!cow_from_owner) {
+		int ret = vma_commit_reservation(h, vma, addr);
 		/*
-		 * Check for possible race conditions. When it happens..
-		 * The page was added to the reservation map between
-		 * vma_needs_reservation and vma_commit_reservation.
-		 * This indicates a race with hugetlb_reserve_pages.
+		 * If there is a discrepancy in reservation status between the
+		 * time of vma_needs_reservation() and vma_commit_reservation(),
+		 * then the page must have been added to the reservation
+		 * map between vma_needs_reservation() and
+		 * vma_commit_reservation().
+		 *
 		 * Adjust for the subpool count incremented above AND
 		 * in hugetlb_reserve_pages for the same page. Also,
 		 * the reservation count added in hugetlb_reserve_pages
 		 * no longer applies.
 		 */
-		if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0)) {
+		if (unlikely(!vma_reservation_exists && ret == 0)) {
 			long rsv_adjust;
 
 			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
 			hugetlb_acct_memory(h, -rsv_adjust);
-			if (map_chg) {
+			if (charge_cgroup_rsvd) {
 				spin_lock_irq(&hugetlb_lock);
 				hugetlb_cgroup_uncharge_folio_rsvd(
 				    hstate_index(h), pages_per_huge_page(h),
@@ -3149,14 +3132,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
 out_uncharge_cgroup_reservation:
-	if (map_chg)
+	if (charge_cgroup_rsvd)
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg)
+	if (!vma_reservation_exists)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
-	if (map_chg != MAP_CHG_ENFORCED)
+	/* If vma accounting wasn't bypassed earlier, cleanup. */
+	if (!cow_from_owner)
 		vma_end_reservation(h, vma, addr);
 	return ERR_PTR(-ENOSPC);
 }
-- 
2.49.0.1045.g170613ef41-goog
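
The decisions described in the commit message can be modelled in userspace roughly as below. This is only an illustrative sketch and not part of the patch: struct alloc_decisions and decide() are hypothetical names that do not exist in mm/hugetlb.c, npages_req stands in for a successful return of vma_needs_reservation(), and error paths are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the patch's decisions; not kernel code. */
struct alloc_decisions {
	bool vma_reservation_exists;	/* a usable per-vma reservation was found */
	bool debit_subpool;		/* would call hugepage_subpool_get_pages() */
	bool charge_cgroup_rsvd;	/* would charge the cgroup reservation */
	bool commit_or_end_vma_resv;	/* would call vma_commit/end_reservation() */
};

/*
 * npages_req models a successful vma_needs_reservation() result:
 * 0 means a reservation exists, 1 means one is needed. The value is
 * ignored when cow_from_owner is set, because the patch skips the call
 * in that case.
 */
static struct alloc_decisions decide(bool cow_from_owner, int npages_req)
{
	struct alloc_decisions d;

	assert(npages_req >= 0);	/* error returns are not modelled */

	d.vma_reservation_exists = !cow_from_owner && npages_req == 0;
	/* Subpool debit and cgroup charge both hinge on the same flag. */
	d.debit_subpool = !d.vma_reservation_exists;
	d.charge_cgroup_rsvd = !d.vma_reservation_exists;
	/* Commit/end is paired with whether vma accounting was started. */
	d.commit_or_end_vma_resv = !cow_from_owner;
	return d;
}
```

In this model, decide(true, 0) corresponds to the cow_from_owner case: the subpool is debited and the cgroup is charged, but no vma commit or end follows, since vma_needs_reservation() was never called.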