From: Ackerley Tng
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, riel@surriel.com, leitao@debian.org, akpm@linux-foundation.org, peterx@redhat.com, muchun.song@linux.dev, osalvador@suse.de, roman.gushchin@linux.dev, nao.horiguchi@gmail.com
Subject: Re: [PATCH 4/7] mm/hugetlb: Clean up map/global resv accounting when allocate
Date: Sat, 28 Dec 2024 00:06:34 +0000
In-Reply-To: <20241201212240.533824-5-peterx@redhat.com> (message from Peter Xu on Sun, 1 Dec 2024 16:22:37 -0500)
Peter Xu writes:

>
>
> Signed-off-by: Peter Xu
> ---
>  mm/hugetlb.c | 116 +++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 80 insertions(+), 36 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index dfd479a857b6..14cfe0bb01e4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2956,6 +2956,25 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
>  	return ret;
>  }
>  
> +typedef enum {
> +	/*
> +	 * For either 0/1: we checked the per-vma resv map, and one resv
> +	 * count either can be reused (0), or an extra needed (1).
> +	 */
> +	MAP_CHG_REUSE = 0,
> +	MAP_CHG_NEEDED = 1,
> +	/*
> +	 * Cannot use per-vma resv count can be used, hence a new resv
> +	 * count is enforced.
> +	 *
> +	 * NOTE: This is mostly identical to MAP_CHG_NEEDED, except
> +	 * that currently vma_needs_reservation() has an unwanted side
> +	 * effect to either use end() or commit() to complete the
> +	 * transaction.  Hence it needs to differenciate from NEEDED.
> +	 */
> +	MAP_CHG_ENFORCED = 2,
> +} map_chg_state;
> +
>  /*
>   * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
>   * faults of hugetlb private mappings on top of a non-page-cache folio (in
> @@ -2969,12 +2988,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	struct hugepage_subpool *spool = subpool_vma(vma);
>  	struct hstate *h = hstate_vma(vma);
>  	struct folio *folio;
> -	long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> -	long gbl_chg;
> +	long retval, gbl_chg, nr_pages = pages_per_huge_page(h);
> +	map_chg_state map_chg;
>  	int memcg_charge_ret, ret, idx;
>  	struct hugetlb_cgroup *h_cg = NULL;
>  	struct mem_cgroup *memcg;
> -	bool deferred_reserve;
>  	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
>  
>  	memcg = get_mem_cgroup_from_current();
> @@ -2985,36 +3003,56 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	}
>  
>  	idx = hstate_index(h);
> -	/*
> -	 * Examine the region/reserve map to determine if the process
> -	 * has a reservation for the page to be allocated.  A return
> -	 * code of zero indicates a reservation exists (no change).
> -	 */
> -	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> -	if (map_chg < 0) {
> -		if (!memcg_charge_ret)
> -			mem_cgroup_cancel_charge(memcg, nr_pages);
> -		mem_cgroup_put(memcg);
> -		return ERR_PTR(-ENOMEM);
> +
> +	/* Whether we need a separate per-vma reservation? */
> +	if (cow_from_owner) {
> +		/*
> +		 * Special case!  Since it's a CoW on top of a reserved
> +		 * page, the private resv map doesn't count.  So it cannot
> +		 * consume the per-vma resv map even if it's reserved.
> +		 */
> +		map_chg = MAP_CHG_ENFORCED;
> +	} else {
> +		/*
> +		 * Examine the region/reserve map to determine if the process
> +		 * has a reservation for the page to be allocated.  A return
> +		 * code of zero indicates a reservation exists (no change).
> +		 */
> +		retval = vma_needs_reservation(h, vma, addr);
> +		if (retval < 0) {
> +			if (!memcg_charge_ret)
> +				mem_cgroup_cancel_charge(memcg, nr_pages);
> +			mem_cgroup_put(memcg);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +		map_chg = retval ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
>  	}
>  
>  	/*
> +	 * Whether we need a separate global reservation?
> +	 *
>  	 * Processes that did not create the mapping will have no
>  	 * reserves as indicated by the region/reserve map. Check
>  	 * that the allocation will not exceed the subpool limit.
> -	 * Allocations for MAP_NORESERVE mappings also need to be
> -	 * checked against any subpool limit.
> +	 * Or if it can get one from the pool reservation directly.
>  	 */
> -	if (map_chg || cow_from_owner) {
> +	if (map_chg) {
>  		gbl_chg = hugepage_subpool_get_pages(spool, 1);
>  		if (gbl_chg < 0)
>  			goto out_end_reservation;
> +	} else {
> +		/*
> +		 * If we have the vma reservation ready, no need for extra
> +		 * global reservation.
> +		 */
> +		gbl_chg = 0;
>  	}
>  
> -	/* If this allocation is not consuming a reservation, charge it now.
> +	/*
> +	 * If this allocation is not consuming a per-vma reservation,
> +	 * charge the hugetlb cgroup now.
>  	 */
> -	deferred_reserve = map_chg || cow_from_owner;
> -	if (deferred_reserve) {
> +	if (map_chg) {
>  		ret = hugetlb_cgroup_charge_cgroup_rsvd(
>  			idx, pages_per_huge_page(h), &h_cg);

Should hugetlb_cgroup_charge_cgroup_rsvd() be called when
map_chg == MAP_CHG_ENFORCED?

>  		if (ret)
> @@ -3038,7 +3076,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	if (!folio)
>  		goto out_uncharge_cgroup;
>  	spin_lock_irq(&hugetlb_lock);
> -	if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
> +	if (vma_has_reserves(vma, gbl_chg)) {
>  		folio_set_hugetlb_restore_reserve(folio);
>  		h->resv_huge_pages--;
>  	}
> @@ -3051,7 +3089,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	/* If allocation is not consuming a reservation, also store the
>  	 * hugetlb_cgroup pointer on the page.
>  	 */
> -	if (deferred_reserve) {
> +	if (map_chg) {
>  		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
>  						  h_cg, folio);
>  	}

same for this,

> @@ -3060,26 +3098,31 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  
>  	hugetlb_set_folio_subpool(folio, spool);
>  
> -	map_commit = vma_commit_reservation(h, vma, addr);
> -	if (unlikely(map_chg > map_commit)) {
> +	if (map_chg != MAP_CHG_ENFORCED) {
> +		/* commit() is only needed if the map_chg is not enforced */
> +		retval = vma_commit_reservation(h, vma, addr);
>  		/*
> +		 * Check for possible race conditions.  When it happens..
>  		 * The page was added to the reservation map between
>  		 * vma_needs_reservation and vma_commit_reservation.
>  		 * This indicates a race with hugetlb_reserve_pages.
>  		 * Adjust for the subpool count incremented above AND
> -		 * in hugetlb_reserve_pages for the same page. Also,
> +		 * in hugetlb_reserve_pages for the same page.  Also,
>  		 * the reservation count added in hugetlb_reserve_pages
>  		 * no longer applies.
>  		 */
> -		long rsv_adjust;
> +		if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0)) {
> +			long rsv_adjust;
>  
> -		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
> -		hugetlb_acct_memory(h, -rsv_adjust);
> -		if (deferred_reserve) {
> -			spin_lock_irq(&hugetlb_lock);
> -			hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> -					pages_per_huge_page(h), folio);
> -			spin_unlock_irq(&hugetlb_lock);
> +			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
> +			hugetlb_acct_memory(h, -rsv_adjust);
> +			if (map_chg) {
> +				spin_lock_irq(&hugetlb_lock);
> +				hugetlb_cgroup_uncharge_folio_rsvd(
> +					hstate_index(h), pages_per_huge_page(h),
> +					folio);
> +				spin_unlock_irq(&hugetlb_lock);
> +			}
>  		}
>  	}
>  
> @@ -3093,14 +3136,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  out_uncharge_cgroup:
>  	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
>  out_uncharge_cgroup_reservation:
> -	if (deferred_reserve)
> +	if (map_chg)
>  		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
>  						    h_cg);

and same for this.

> out_subpool_put:
> -	if (map_chg || cow_from_owner)
> +	if (map_chg)
>  		hugepage_subpool_put_pages(spool, 1);
> out_end_reservation:
> -	vma_end_reservation(h, vma, addr);
> +	if (map_chg != MAP_CHG_ENFORCED)
> +		vma_end_reservation(h, vma, addr);
>  	if (!memcg_charge_ret)
>  		mem_cgroup_cancel_charge(memcg, nr_pages);
>  	mem_cgroup_put(memcg);