From: Usama Arif <usamaarif642@gmail.com>
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hughd@google.com, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif <usamaarif642@gmail.com>
Subject: [RFC] mm: khugepaged: use largest enabled hugepage order for min_free_kbytes
Date: Fri, 6 Jun 2025 15:37:00 +0100
Message-ID: <20250606143700.3256414-1-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.47.1
MIME-Version: 1.0
On arm64 machines with a 64K PAGE_SIZE, min_free_kbytes and hence the
watermarks evaluate to extremely high values. For example, on a server
with 480G of memory, with only the 2M mTHP hugepage size set to madvise
and the rest of the sizes set to never, the min, low and high watermarks
evaluate to 11.2G, 14G and 16.8G respectively. In contrast, on the same
machine with a 4K PAGE_SIZE and only the 2M THP hugepage size set to
madvise, the min, low and high watermarks evaluate to 86M, 566M and 1G
respectively.
This is because set_recommended_min_free_kbytes is designed around PMD
hugepages (pageblock_order = min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)).
Such high watermark values can cause performance and latency issues in
memory-bound applications on arm64 servers that use a 64K PAGE_SIZE,
even though most of them would never actually use a 512M PMD THP.

Instead of using HPAGE_PMD_ORDER for pageblock_order, use the highest
large folio order enabled in set_recommended_min_free_kbytes. With this
patch, when only the 2M THP hugepage size is set to madvise on the same
machine with a 64K page size, with the rest of the sizes set to never,
the min, low and high watermarks evaluate to 2.08G, 2.6G and 3.1G
respectively. When the 512M THP hugepage size is set to madvise on the
same machine with a 64K page size, the min, low and high watermarks
evaluate to 11.2G, 14G and 16.8G respectively, the same as without this
patch.

An alternative solution would be to lower PAGE_BLOCK_ORDER by changing
ARCH_FORCE_MAX_ORDER to a lower value for ARM64_64K_PAGES. However, that
is not dynamic with hugepage size, would require different kernel builds
for different hugepage sizes, and most users won't know it needs to be
done, as it can be difficult to determine that the performance and
latency issues come from the high watermark values.

All watermark numbers are for the zones of the nodes that had the
highest number of pages, i.e. the value for the min size for 4K is
obtained using:

cat /proc/zoneinfo | grep -i min | awk '{print $2}' | sort -n | tail -n 1 | awk '{print $1 * 4096 / 1024 / 1024}'

and for 64K using:

cat /proc/zoneinfo | grep -i min | awk '{print $2}' | sort -n | tail -n 1 | awk '{print $1 * 65536 / 1024 / 1024}'

An arbitrary min of 128 pages is used when no hugepage sizes are
enabled.
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 include/linux/huge_mm.h | 25 +++++++++++++++++++++++++
 mm/khugepaged.c         | 32 ++++++++++++++++++++++++++++----
 mm/shmem.c              | 29 +++++------------------------
 3 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..fb4e51ef0acb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -170,6 +170,25 @@ static inline void count_mthp_stat(int order, enum mthp_stat_item item)
 }
 #endif
 
+/*
+ * Definitions for "huge tmpfs": tmpfs mounted with the huge= option
+ *
+ * SHMEM_HUGE_NEVER:
+ *	disables huge pages for the mount;
+ * SHMEM_HUGE_ALWAYS:
+ *	enables huge pages for the mount;
+ * SHMEM_HUGE_WITHIN_SIZE:
+ *	only allocate huge pages if the page will be fully within i_size,
+ *	also respect madvise() hints;
+ * SHMEM_HUGE_ADVISE:
+ *	only allocate huge pages if requested with madvise();
+ */
+
+#define SHMEM_HUGE_NEVER	0
+#define SHMEM_HUGE_ALWAYS	1
+#define SHMEM_HUGE_WITHIN_SIZE	2
+#define SHMEM_HUGE_ADVISE	3
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
 extern unsigned long transparent_hugepage_flags;
@@ -177,6 +196,12 @@ extern unsigned long huge_anon_orders_always;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
 
+extern int shmem_huge __read_mostly;
+extern unsigned long huge_shmem_orders_always;
+extern unsigned long huge_shmem_orders_madvise;
+extern unsigned long huge_shmem_orders_inherit;
+extern unsigned long huge_shmem_orders_within_size;
+
 static inline bool hugepage_global_enabled(void)
 {
 	return transparent_hugepage_flags &
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 15203ea7d007..e64cba74eb2a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2607,6 +2607,26 @@ static int khugepaged(void *none)
 	return 0;
 }
 
+static int thp_highest_allowable_order(void)
+{
+	unsigned long orders = READ_ONCE(huge_anon_orders_always)
+		| READ_ONCE(huge_anon_orders_madvise)
+		| READ_ONCE(huge_shmem_orders_always)
+		| READ_ONCE(huge_shmem_orders_madvise)
+		| READ_ONCE(huge_shmem_orders_within_size);
+
+	if (hugepage_global_enabled())
+		orders |= READ_ONCE(huge_anon_orders_inherit);
+	if (shmem_huge != SHMEM_HUGE_NEVER)
+		orders |= READ_ONCE(huge_shmem_orders_inherit);
+
+	return orders == 0 ? 0 : fls(orders) - 1;
+}
+
+static unsigned long min_thp_pageblock_nr_pages(void)
+{
+	return (1UL << min(thp_highest_allowable_order(), PAGE_BLOCK_ORDER));
+}
+
 static void set_recommended_min_free_kbytes(void)
 {
 	struct zone *zone;
@@ -2638,12 +2658,16 @@ static void set_recommended_min_free_kbytes(void)
 	 * second to avoid subsequent fallbacks of other types There are 3
 	 * MIGRATE_TYPES we care about.
 	 */
-	recommended_min += pageblock_nr_pages * nr_zones *
+	recommended_min += min_thp_pageblock_nr_pages() * nr_zones *
 			   MIGRATE_PCPTYPES * MIGRATE_PCPTYPES;
 
-	/* don't ever allow to reserve more than 5% of the lowmem */
-	recommended_min = min(recommended_min,
-			      (unsigned long) nr_free_buffer_pages() / 20);
+	/*
+	 * Don't ever allow to reserve more than 5% of the lowmem.
+	 * Use a min of 128 pages when all THP orders are set to never.
+	 */
+	recommended_min = clamp(recommended_min, 128,
+				(unsigned long) nr_free_buffer_pages() / 20);
+
 	recommended_min <<= (PAGE_SHIFT-10);
 
 	if (recommended_min > min_free_kbytes) {
diff --git a/mm/shmem.c b/mm/shmem.c
index 0c5fb4ffa03a..8e92678d1175 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -136,10 +136,10 @@ struct shmem_options {
 };
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static unsigned long huge_shmem_orders_always __read_mostly;
-static unsigned long huge_shmem_orders_madvise __read_mostly;
-static unsigned long huge_shmem_orders_inherit __read_mostly;
-static unsigned long huge_shmem_orders_within_size __read_mostly;
+unsigned long huge_shmem_orders_always __read_mostly;
+unsigned long huge_shmem_orders_madvise __read_mostly;
+unsigned long huge_shmem_orders_inherit __read_mostly;
+unsigned long huge_shmem_orders_within_size __read_mostly;
 static bool shmem_orders_configured __initdata;
 #endif
 
@@ -516,25 +516,6 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 	return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
 }
 
-/*
- * Definitions for "huge tmpfs": tmpfs mounted with the huge= option
- *
- * SHMEM_HUGE_NEVER:
- *	disables huge pages for the mount;
- * SHMEM_HUGE_ALWAYS:
- *	enables huge pages for the mount;
- * SHMEM_HUGE_WITHIN_SIZE:
- *	only allocate huge pages if the page will be fully within i_size,
- *	also respect madvise() hints;
- * SHMEM_HUGE_ADVISE:
- *	only allocate huge pages if requested with madvise();
- */
-
-#define SHMEM_HUGE_NEVER	0
-#define SHMEM_HUGE_ALWAYS	1
-#define SHMEM_HUGE_WITHIN_SIZE	2
-#define SHMEM_HUGE_ADVISE	3
-
 /*
  * Special values.
  * Only can be set via /sys/kernel/mm/transparent_hugepage/shmem_enabled:
@@ -551,7 +532,7 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /* ifdef here to avoid bloating shmem.o when not necessary */
-static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
+int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 static int tmpfs_huge __read_mostly = SHMEM_HUGE_NEVER;
 
 /**
-- 
2.47.1