From: Jinjiang Tu
To: Oscar Salvador
Date: Wed, 9 Apr 2025 11:29:44 +0800
Subject: Re: [PATCH v3] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Message-ID: <1cd7663b-f052-c52b-1f73-946f97b9fe03@huawei.com>
References: <20250407124706.2688092-1-tujinjiang@huawei.com>
Sender: owner-linux-mm@kvack.org

On 2025/4/8 21:10, Oscar Salvador wrote:
> On Mon, Apr 07, 2025 at 08:47:06PM +0800, Jinjiang Tu wrote:
>> In set_max_huge_pages(), min_count should mean the acquired persistent
>> huge pages, but it contains surplus huge pages. This leads to failing
>> to free the free huge pages of a Node.
>>
>> Steps to reproduce:
>> 1) create 5 hugetlb folios in Node0
>> 2) run a program to use all the hugetlb folios
>> 3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios.
>>    Thus the 5 hugetlb folios in Node0 are accounted as surplus.
>> 4) create 5 hugetlb folios in Node1
>> 5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios
>>
>> The result:
>>         Node0    Node1
>> Total   5        5
>> Free    0        5
>> Surp    5        5
> I would put this after the explanation, as otherwise it is a bit hard to
> follow.
>
>> We couldn't subtract surplus_huge_pages from min_count, since free hugetlb
>> folios may be surplus due to HVO. In __update_and_free_hugetlb_folio(),
>> hugetlb_vmemmap_restore_folio() may fail, add the folio back to the pool and
>> treat it as surplus. If we directly subtract surplus_huge_pages from
>> min_count, some free folios will be subtracted twice.
>>
>> To fix it, check if count is less than the number of free huge pages that
>> could be destroyed (i.e., available_huge_pages(h)), and remove hugetlb
>> folios if so.
> But this is not true, you are no longer comparing against
> available_huge_pages(h) as you did in v2.
>
> I would go with something along these lines as changelog.
>
> "In set_max_huge_pages(), min_count is computed taking into account also
> surplus huge pages, which in some cases might leave us unable to free
> huge pages and end up accounting them as surplus instead.
>
> One way to solve it is to subtract surplus_huge_pages directly, but we
> cannot do it blindly because there might be surplus pages that are also
> free pages, which might happen when we fail to restore the vmemmap for
> HVO-optimized pages.
> So we could be subtracting the same page twice.
>
> In order to work around this, let us first compute the number of
> free persistent pages, and use that along with surplus pages
> to compute min_count."
>
> And then put the PoC.

Thanks very much, I will update the changelog and send a new version later.

>> Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
>> Signed-off-by: Jinjiang Tu
> Acked-by: Oscar Salvador
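
The accounting discussed above can be sketched as a small toy model. This is an illustration only, not the kernel code and not the patch itself: the Pool class, the three min_count_* helpers, freeable(), and the free_surplus parameter are hypothetical names, and the formulas are assumptions reconstructed from the changelog text. It compares a floor that still includes surplus pages, a blind subtraction of all surplus pages, and a floor based on free persistent pages.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy stand-in for the global hstate counters (not kernel code)."""
    nr: int        # all huge pages in the pool, surplus included
    free: int      # free huge pages
    surplus: int   # pages accounted as surplus
    resv: int = 0  # reserved huge pages

    @property
    def persistent(self) -> int:
        return self.nr - self.surplus


def min_count_with_surplus(p: Pool, count: int) -> int:
    # Floor derived from totals that still include surplus pages.
    return max(count, p.resv + p.nr - p.free)


def min_count_blind(p: Pool, count: int) -> int:
    # Subtracts every surplus page, even those already counted in p.free.
    return max(count, p.resv + p.nr - p.free - p.surplus)


def min_count_persistent(p: Pool, count: int, free_surplus: int = 0) -> int:
    # free_surplus (hypothetical input): free pages that are also surplus,
    # e.g. after a failed vmemmap restore.
    persistent_free = p.free - free_surplus
    return max(count, p.resv + p.persistent - persistent_free)


def freeable(p: Pool, min_count: int) -> int:
    # Persistent pages that may be released while staying above the floor.
    return max(0, p.persistent - min_count)


# State after step 4 of the reproducer: 5 in-use surplus pages on Node0,
# 5 free persistent pages on Node1; step 5 is assumed to request count = 0.
after_step4 = Pool(nr=10, free=5, surplus=5)
print(freeable(after_step4, min_count_with_surplus(after_step4, 0)))  # 0
print(freeable(after_step4, min_count_persistent(after_step4, 0)))    # 5

# Same totals, but 2 of the 5 free pages are themselves surplus.
hvo_case = Pool(nr=10, free=5, surplus=5)
print(min_count_blind(hvo_case, 0))                       # 0
print(min_count_persistent(hvo_case, 0, free_surplus=2))  # 2
```

In the first case the floor that includes surplus pages leaves nothing to free, which matches the reported result for Node1. In the second case the blind subtraction yields a floor of 0 although 2 persistent pages are still in use, which is the double subtraction the changelog warns about.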