Date: Mon, 9 Jun 2025 12:13:33 +0100
Subject: Re: [RFC] mm: khugepaged: use largest enabled hugepage order for min_free_kbytes
To: Zi Yan
Cc: Andrew Morton, david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hughd@google.com, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Juan Yescas, Breno Leitao
References: <20250606143700.3256414-1-usamaarif642@gmail.com> <35A3819F-C8EE-48DB-8EB4-093C04DEF504@nvidia.com>
From: Usama Arif
In-Reply-To: <35A3819F-C8EE-48DB-8EB4-093C04DEF504@nvidia.com>

On 06/06/2025 17:10, Zi Yan wrote:
> On 6 Jun 2025, at 11:38, Usama Arif wrote:
>
>> On 06/06/2025 16:18, Zi Yan wrote:
>>> On 6 Jun 2025, at 10:37, Usama Arif wrote:
>>>
>>>> On arm64 machines with 64K PAGE_SIZE, the min_free_kbytes and hence the
>>>> watermarks are evaluated to extremely high values, for e.g. a server with
>>>> 480G of memory, only 2M mTHP hugepage size set to madvise, with the rest
>>>> of the sizes set to never, the min, low and high watermarks evaluate to
>>>> 11.2G, 14G and 16.8G respectively.
>>>> In contrast for 4K PAGE_SIZE of the same machine, with only 2M THP hugepage
>>>> size set to madvise, the min, low and high watermarks evaluate to 86M, 566M
>>>> and 1G respectively.
>>>> This is because set_recommended_min_free_kbytes is designed for PMD
>>>> hugepages (pageblock_order = min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)).
>>>> Such high watermark values can cause performance and latency issues in
>>>> memory bound applications on arm servers that use 64K PAGE_SIZE, even though
>>>> most of them would never actually use a 512M PMD THP.
>>>>
>>>> Instead of using HPAGE_PMD_ORDER for pageblock_order use the highest large
>>>> folio order enabled in set_recommended_min_free_kbytes.
>>>> With this patch, when only 2M THP hugepage size is set to madvise for the
>>>> same machine with 64K page size, with the rest of the sizes set to never,
>>>> the min, low and high watermarks evaluate to 2.08G, 2.6G and 3.1G
>>>> respectively. When 512M THP hugepage size is set to madvise for the same
>>>> machine with 64K page size, the min, low and high watermarks evaluate to
>>>> 11.2G, 14G and 16.8G respectively, the same as without this patch.
>>>
>>> Getting pageblock_order involved here might be confusing. I think you just
>>> want to adjust min, low and high watermarks to reasonable values.
>>> Is it OK to rename min_thp_pageblock_nr_pages to min_nr_free_pages_per_zone
>>> and move MIGRATE_PCPTYPES * MIGRATE_PCPTYPES inside? Otherwise, the changes
>>> look reasonable to me.
>>
>> Hi Zi,
>>
>> Thanks for the review!
>>
>> I forgot to change it in another place, sorry about that! So can't move
>> MIGRATE_PCPTYPES * MIGRATE_PCPTYPES into the combined function.
>> Have added the additional place where min_thp_pageblock_nr_pages() is called
>> as a fixlet here:
>> https://lore.kernel.org/all/a179fd65-dc3f-4769-9916-3033497188ba@gmail.com/
>>
>> I think at least in this context the original name pageblock_nr_pages isn't
>> correct as it's min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER).
>> The new name min_thp_pageblock_nr_pages is also not really good, so happy
>> to change it to something appropriate.
>
> Got it. pageblock is the defragmentation granularity. If user only wants
> 2MB mTHP, maybe pageblock order should be adjusted. Otherwise,
> kernel will defragment at 512MB granularity, which might not be efficient.
> Maybe make pageblock_order a boot time parameter?
>
> In addition, we are mixing two things together:
> 1. min, low, and high watermarks: they affect when memory reclaim and compaction
>    will be triggered;
> 2. pageblock order: it is the granularity of defragmentation for creating
>    mTHP/THP.
>
> In your use case, you want to lower watermarks, right? Considering what you
> said below, I wonder if we want a way of enforcing vm.min_free_kbytes,
> like a new sysctl knob, vm.force_min_free_kbytes (yeah the suggestion
> is lame, sorry).
>
> I think for 2, we might want to decouple pageblock order from defragmentation
> granularity.
>

This is a good point. I only did it for the watermarks in the RFC, but there
is no reason for defragmentation to be done at 512M granularity, and doing so
is probably very inefficient?

Instead of replacing pageblock_nr_pages just for
set_recommended_min_free_kbytes, maybe we just need to change the definition
of pageblock_order in [1] to take into account the highest enabled large
folio order instead of HPAGE_PMD_ORDER?

[1] https://elixir.bootlin.com/linux/v6.15.1/source/include/linux/pageblock-flags.h#L50

I really want to avoid a solution that requires changing a Kconfig option or
the kernel command line.
It would mean a reboot whenever a server switches to a workload that works
best with a different THP size, and that would make workload orchestration
a nightmare.

>
>>>
>>> Another concern on tying watermarks to highest THP order is that if
>>> user enables PMD THP on such systems with 2MB mTHP enabled initially,
>>> it could trigger unexpected memory reclaim and compaction, right?
>>> That might surprise user, since they just want to adjust availability
>>> of THP sizes, but the whole system suddenly begins to be busy.
>>> Have you experimented with it?
>>>
>>
>> Yes I would imagine it would trigger reclaim and compaction if the system memory
>> is too low, but that should hopefully be expected? If the user is enabling 512M
>> THP, they should expect changes by kernel to allow them to give hugepage of
>> that size.
>> Also hopefully, no one is enabling PMD THPs when the system is so low on
>> memory that it triggers reclaim! There would be an OOM after just a few
>> of those are faulted in.
>
>
> Best Regards,
> Yan, Zi
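
For reference, here is a rough Python sketch of why the block size dominates the watermarks, based on my reading of set_recommended_min_free_kbytes() in mm/khugepaged.c and the watermark setup in mm/page_alloc.c. The function names are made up for illustration, and nr_zones = 2, MIGRATE_PCPTYPES = 3 and the default watermark_scale_factor of 10 are assumptions rather than values taken from the patch. It also aggregates over the whole system, whereas the kernel works per zone, so it should only land in the same range as the numbers quoted above.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

MIGRATE_PCPTYPES = 3          # assumed value of the kernel constant
WATERMARK_SCALE_FACTOR = 10   # assumed default of vm.watermark_scale_factor


def recommended_min_free_bytes(block_bytes, nr_zones, lowmem_bytes):
    """Hypothetical helper: memory recommended to stay free for a block size."""
    # Two free blocks per zone to assist fragmentation avoidance ...
    blocks = 2 * nr_zones
    # ... plus one block per pair of pcp migrate types, per zone.
    blocks += nr_zones * MIGRATE_PCPTYPES * MIGRATE_PCPTYPES
    # Never reserve more than 5% of low memory.
    return min(blocks * block_bytes, lowmem_bytes // 20)


def watermarks(min_bytes, managed_bytes):
    """Hypothetical helper: (min, low, high) derived from the min watermark."""
    # The gap between watermarks is the larger of min/4 and a fraction of
    # managed memory controlled by watermark_scale_factor.
    step = max(min_bytes // 4, managed_bytes * WATERMARK_SCALE_FACTOR // 10000)
    return min_bytes, min_bytes + step, min_bytes + 2 * step


if __name__ == "__main__":
    # 64K pages: a PMD-sized block is 512M. Assumes 2 zones and 480G of memory.
    min_b = recommended_min_free_bytes(512 * MIB, 2, 480 * GIB)
    print(*(f"{b / GIB:.2f}" for b in watermarks(min_b, 480 * GIB)))
    # prints: 11.00 13.75 16.50
```

With a 2M block the same formula asks for only 44M, which is why the choice of order used in this calculation matters so much on 64K page size systems.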