Date: Tue, 20 May 2025 08:50:24 +0200
From: Oscar Salvador
To: Andrey Alekhin
Cc: muchun.song@linux.dev, linux-mm@kvack.org
Subject: Re: [PATCH] mm: free surplus huge pages properly on NUMA systems
References: <20250515191327.41089-1-andrei.aleohin@gmail.com>
In-Reply-To: <20250515191327.41089-1-andrei.aleohin@gmail.com>

On Thu, May 15, 2025 at 10:13:27PM +0300, Andrey
Alekhin wrote:
> == History ==
>
> Wrong values of huge pages counters were detected on Red Hat 9.0 (linux
> 5.14) when running ltp test hugemmap10. Inspection of linux source code
> showed that the problem was not fixed even in linux 6.14.

Besides code inspection, have you tried this on a current upstream kernel?
We have recently changed the way we deal with surplus huge pages.

> == Problem ==
>
> free_huge_folio function does not properly free surplus huge pages on
> NUMA systems. free_huge_folio checks surplus huge page counter only on
> current node (where folio is allocated), but gather_surplus_pages
> function can allocate surplus huge pages on any node.
>
> The following sequence is possible on NUMA system:
>
> n - overall number of huge pages
> f - number of free huge pages
> s - number of surplus huge pages
> huge page counters: [before]
>                        |
>                     [after]
>
> Process runs on node #1
>                  |
>        node0         node1
> 1) addr1 = mmap(MAP_SHARED, ...) // 1 huge page is mmaped (cur_nid=1)
>    [n=2 f=2 s=0] [n=1 f=1 s=0] r=0
>                  |
>    [n=2 f=2 s=0] [n=1 f=1 s=0] r=1

I take it that 'r' means reserved?

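For illustration, the accounting described in the report can be modelled in
userspace. This is a sketch only: struct toy_hstate, toy_free_huge_folio()
and toy_demo() are hypothetical names made up for this example, not kernel
code, and the model assumes the free path consults only the surplus counter
of the node the folio lives on, as the report states.

```c
#define TOY_NR_NODES 2

/* Hypothetical stand-in for the hugetlb counters; not the kernel's struct hstate. */
struct toy_hstate {
	unsigned long surplus_huge_pages;                    /* global surplus count */
	unsigned long surplus_huge_pages_node[TOY_NR_NODES]; /* per-node surplus count */
	unsigned long free_huge_pages_node[TOY_NR_NODES];    /* per-node free count */
};

/*
 * Models the check described in the report: only the counter of the node
 * the folio lives on (nid) is consulted.
 * Returns 1 if the page was released as surplus, 0 if it went back to the
 * free pool of its node.
 */
static int toy_free_huge_folio(struct toy_hstate *h, int nid)
{
	if (h->surplus_huge_pages_node[nid]) {
		h->surplus_huge_pages_node[nid]--;
		h->surplus_huge_pages--;
		return 1;
	}
	h->free_huge_pages_node[nid]++;
	return 0;
}

/*
 * The surplus page was accounted on node 0, but the folio being freed
 * lives on node 1. Returns the global surplus count left after the free.
 */
static unsigned long toy_demo(void)
{
	struct toy_hstate h = {
		.surplus_huge_pages = 1,
		.surplus_huge_pages_node = { 1, 0 },
	};

	toy_free_huge_folio(&h, 1);
	return h.surplus_huge_pages;
}
```

In this model the global surplus count is still 1 after the free, because
node 1 has no surplus of its own and the page is simply returned to node 1's
free pool. Whether a current upstream kernel still behaves like this is
exactly the open question above.
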
>  void free_huge_folio(struct folio *folio)
>  {
>  	/*
> @@ -1833,6 +1850,8 @@ void free_huge_folio(struct folio *folio)
>  	struct hugepage_subpool *spool = hugetlb_folio_subpool(folio);
>  	bool restore_reserve;
>  	unsigned long flags;
> +	int node;
> +	nodemask_t *mbind_nodemask, alloc_nodemask;
>
>  	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
>  	VM_BUG_ON_FOLIO(folio_mapcount(folio), folio);
> @@ -1883,6 +1902,25 @@ void free_huge_folio(struct folio *folio)
>  		remove_hugetlb_folio(h, folio, true);
>  		spin_unlock_irqrestore(&hugetlb_lock, flags);
>  		update_and_free_hugetlb_folio(h, folio, true);
> +	} else if (h->surplus_huge_pages) {
> +		mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
> +		if (mbind_nodemask)
> +			nodes_and(alloc_nodemask, *mbind_nodemask,
> +				  cpuset_current_mems_allowed);
> +		else
> +			alloc_nodemask = cpuset_current_mems_allowed;
> +
> +		for_each_node_mask(node, alloc_nodemask) {
> +			if (h->surplus_huge_pages_node[node]) {
> +				h->surplus_huge_pages_node[node]--;
> +				h->surplus_huge_pages--;
> +				break;
> +			}
> +		}

I am not really happy with this; it feels quite ad hoc, to be honest.
If we really are hitting this, and I need to take a closer look, we should
merge the two 'else if' branches that handle surplus pages, because as it
stands it is not really obvious what is going on.
We might need some comments as well.

Let me think about this.

-- 
Oscar Salvador
SUSE Labs