Subject: Re: [RFC] mm: khugepaged: use largest enabled hugepage order for min_free_kbytes
From: Usama Arif <usamaarif642@gmail.com>
To: Lorenzo Stoakes, ziy@nvidia.com
Cc: Andrew Morton, david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org,
    shakeel.butt@linux.dev, riel@surriel.com, baolin.wang@linux.alibaba.com,
    Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
    dev.jain@arm.com, hughd@google.com, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kernel-team@meta.com
Date: Mon, 9 Jun 2025 13:12:25 +0100
References: <20250606143700.3256414-1-usamaarif642@gmail.com>
 <063273aa-a852-492a-93da-ba5229f544ca@lucifer.local>
 <8200fd8b-edae-44ab-be47-7dfccab25a24@gmail.com>
In-Reply-To: <8200fd8b-edae-44ab-be47-7dfccab25a24@gmail.com>

> I don't like it either :)
> Pressed "Ctrl+enter" instead of "enter" by mistake, which sent the email
> prematurely :)

Adding replies to the rest of the comments in this email.

As I mentioned in my reply to David in [1], pageblock_nr_pages is not really
1 << PAGE_BLOCK_ORDER, but 1 << min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER) when
THP is enabled.
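To make that concrete, the effective value works out to something like the
below (my rough sketch, not the literal upstream macros):

	/*
	 * Rough sketch, not the upstream definition: with THP enabled the
	 * pageblock order is capped by the PMD order, which is 9 on x86-64
	 * with 4K base pages.
	 */
	#define effective_pageblock_order	min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
	#define effective_pageblock_nr_pages	(1UL << effective_pageblock_order)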
It needs a better name, but I think the right approach is just to change
pageblock_order as recommended in [2].

[1] https://lore.kernel.org/all/4adf1f8b-781d-4ab0-b82e-49795ad712cb@gmail.com/
[2] https://lore.kernel.org/all/c600a6c0-aa59-4896-9e0d-3649a32d1771@gmail.com/

>
>>> +{
>>> +	return (1UL << min(thp_highest_allowable_order(), PAGE_BLOCK_ORDER));
>>> +}
>>> +
>>>  static void set_recommended_min_free_kbytes(void)
>>>  {
>>>  	struct zone *zone;
>>> @@ -2638,12 +2658,16 @@ static void set_recommended_min_free_kbytes(void)
>>
>> You provide a 'patchlet' in
>> https://lore.kernel.org/all/a179fd65-dc3f-4769-9916-3033497188ba@gmail.com/
>>
>> That also does:
>>
>>  	/* Ensure 2 pageblocks are free to assist fragmentation avoidance */
>> -	recommended_min = pageblock_nr_pages * nr_zones * 2;
>> +	recommended_min = min_thp_pageblock_nr_pages() * nr_zones * 2;
>>
>> So comment here - this comment is now incorrect, this isn't 2 page blocks,
>> it's 2 of 'sub-pageblock size as if page blocks were dynamically altered by
>> always/madvise THP size'.
>>
>> Again, this whole thing strikes me as we're doing things at the wrong level
>> of abstraction.
>>
>> And you're definitely now not helping avoid pageblock-sized
>> fragmentation. You're accepting that you need less so... why not reduce
>> pageblock size? :)
>>

Yes agreed.

>>  	/*
>>  	 * Make sure that on average at least two pageblocks are almost free
>>  	 * of another type, one for a migratetype to fall back to and a
>>
>> ^ remainder of comment
>>
>>>  	 * second to avoid subsequent fallbacks of other types There are 3
>>>  	 * MIGRATE_TYPES we care about.
>>>  	 */
>>> -	recommended_min += pageblock_nr_pages * nr_zones *
>>> +	recommended_min += min_thp_pageblock_nr_pages() * nr_zones *
>>>  			   MIGRATE_PCPTYPES * MIGRATE_PCPTYPES;
>>
>> This just seems wrong now and contradicts the comment - you're setting
>> minimum pages based on migrate PCP types that operate at pageblock order
>> but without reference to the actual number of page block pages?
>>
>> So the comment is just wrong now? 'make sure there are at least two
>> pageblocks', well this isn't what you're doing is it? So why then are we
>> making reference to PCP counts etc.?
>>
>> This seems like we're essentially just tuning these numbers somewhat
>> arbitrarily to reduce them?
>>
>>>
>>> -	/* don't ever allow to reserve more than 5% of the lowmem */
>>> -	recommended_min = min(recommended_min,
>>> -			      (unsigned long) nr_free_buffer_pages() / 20);
>>> +	/*
>>> +	 * Don't ever allow to reserve more than 5% of the lowmem.
>>> +	 * Use a min of 128 pages when all THP orders are set to never.
>>
>> Why? Did you just choose this number out of the blue?

Mentioned this in the previous comment.

>>
>> Previously, on x86-64 with thp -> never on everything and a pageblock
>> order of 9, wouldn't this be a much higher value?
>>
>> I mean just putting '128' here is not acceptable. It needs to be justified
>> (even if empirically with data to back it) and defined as a named thing.
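For a rough sense of scale (my numbers, assuming the usual order-9 pageblocks,
i.e. 512 pages, and MIGRATE_PCPTYPES == 3), the existing calculation gives:

	recommended_min = 512 * nr_zones * 2		/* two free pageblocks */
			+ 512 * nr_zones * 3 * 3	/* PCP migratetype fallbacks */
			= 512 * nr_zones * 11 pages

so with, say, 3 populated zones that is ~16896 pages (~66 MB with 4K pages)
before the 5% lowmem cap, compared to 512 KB for a 128-page floor.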
>>
>>
>>> +	 */
>>> +	recommended_min = clamp(recommended_min, 128,
>>> +				(unsigned long) nr_free_buffer_pages() / 20);
>>> +
>>>  	recommended_min <<= (PAGE_SHIFT-10);
>>>
>>>  	if (recommended_min > min_free_kbytes) {
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 0c5fb4ffa03a..8e92678d1175 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -136,10 +136,10 @@ struct shmem_options {
>>>  };
>>>
>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> -static unsigned long huge_shmem_orders_always __read_mostly;
>>> -static unsigned long huge_shmem_orders_madvise __read_mostly;
>>> -static unsigned long huge_shmem_orders_inherit __read_mostly;
>>> -static unsigned long huge_shmem_orders_within_size __read_mostly;
>>> +unsigned long huge_shmem_orders_always __read_mostly;
>>> +unsigned long huge_shmem_orders_madvise __read_mostly;
>>> +unsigned long huge_shmem_orders_inherit __read_mostly;
>>> +unsigned long huge_shmem_orders_within_size __read_mostly;
>>
>> Again, we really shouldn't need to do this.

Agreed. For the RFC I just did it the same way as the anon ones when I hit a
build error trying to use these, but a much better approach would be to have a
function in shmem that returns the largest allowable shmem THP order.

>>
>>>  static bool shmem_orders_configured __initdata;
>>>  #endif
>>>
>>> @@ -516,25 +516,6 @@ static bool shmem_confirm_swap(struct address_space *mapping,
>>>  	return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
>>>  }
>>>
>>> -/*
>>> - * Definitions for "huge tmpfs": tmpfs mounted with the huge= option
>>> - *
>>> - * SHMEM_HUGE_NEVER:
>>> - *	disables huge pages for the mount;
>>> - * SHMEM_HUGE_ALWAYS:
>>> - *	enables huge pages for the mount;
>>> - * SHMEM_HUGE_WITHIN_SIZE:
>>> - *	only allocate huge pages if the page will be fully within i_size,
>>> - *	also respect madvise() hints;
>>> - * SHMEM_HUGE_ADVISE:
>>> - *	only allocate huge pages if requested with madvise();
>>> - */
>>> -
>>> -#define SHMEM_HUGE_NEVER	0
>>> -#define SHMEM_HUGE_ALWAYS	1
>>> -#define SHMEM_HUGE_WITHIN_SIZE	2
>>> -#define SHMEM_HUGE_ADVISE	3
>>> -
>>
>> Again we really shouldn't need to do this, just provide some function from
>> shmem that gives you what you need.
>>
>>>  /*
>>>   * Special values.
>>>   * Only can be set via /sys/kernel/mm/transparent_hugepage/shmem_enabled:
>>> @@ -551,7 +532,7 @@ static bool shmem_confirm_swap(struct address_space *mapping,
>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>  /* ifdef here to avoid bloating shmem.o when not necessary */
>>>
>>> -static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
>>> +int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
>>
>> Same comment.
>>
>>>  static int tmpfs_huge __read_mostly = SHMEM_HUGE_NEVER;
>>>
>>>  /**
>>> --
>>> 2.47.1
>>>
>
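As an aside, the kind of shmem helper I mean above would be roughly the
following, kept inside mm/shmem.c so none of the huge_shmem_orders_* variables
or SHMEM_HUGE_* values need to be exposed (name and details are hypothetical,
untested sketch):

	/*
	 * Hypothetical helper, untested: highest order that any shmem/tmpfs
	 * THP control currently allows. Staying in mm/shmem.c means the
	 * huge_shmem_orders_* bitmaps and shmem_huge can remain static.
	 */
	unsigned int shmem_highest_allowable_order(void)
	{
		unsigned long orders = huge_shmem_orders_always |
				       huge_shmem_orders_madvise |
				       huge_shmem_orders_within_size |
				       huge_shmem_orders_inherit;

		/* Simplification: treat a non-"never" global knob as PMD-sized. */
		if (shmem_huge != SHMEM_HUGE_NEVER)
			orders |= BIT(HPAGE_PMD_ORDER);

		return orders ? fls_long(orders) - 1 : 0;
	}

khugepaged could then take the max() of this and the anon equivalent when
sizing min_free_kbytes, instead of reaching into shmem internals.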