From: Yu Zhao
Date: Mon, 17 Jul 2023 11:07:57 -0600
Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
To: David Hildenbrand
Cc: Ryan Roberts, Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230714160407.4142030-1-ryan.roberts@arm.com> <20230714161733.4144503-3-ryan.roberts@arm.com> <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com>
In-Reply-To: <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com>

On Mon, Jul 
17, 2023 at 7:06 AM David Hildenbrand wrote:
>
> On 14.07.23 19:17, Yu Zhao wrote:
> > On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts wrote:
> >>
> >> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
> >> allocated in large folios of a determined order. All pages of the large
> >> folio are pte-mapped during the same page fault, significantly reducing
> >> the number of page faults. The number of per-page operations (e.g. ref
> >> counting, rmap management, lru list management) is also significantly
> >> reduced since those ops now become per-folio.
> >>
> >> The new behaviour is hidden behind the new FLEXIBLE_THP Kconfig, which
> >> defaults to disabled for now; the long term aim is for this to default to
> >> enabled, but there are some risks around internal fragmentation that
> >> need to be better understood first.
> >>
> >> When enabled, the folio order is determined as such: For a vma, process
> >> or system that has explicitly disabled THP, we continue to allocate
> >> order-0. THP is most likely disabled to avoid any possible internal
> >> fragmentation so we honour that request.
> >>
> >> Otherwise, the return value of arch_wants_pte_order() is used. For vmas
> >> that have not explicitly opted-in to use transparent hugepages (e.g.
> >> where thp=madvise and the vma does not have MADV_HUGEPAGE), then
> >> arch_wants_pte_order() is limited by the new cmdline parameter,
> >> `flexthp_unhinted_max`. This allows for a performance boost without
> >> requiring any explicit opt-in from the workload while allowing the
> >> sysadmin to tune between performance and internal fragmentation.
> >>
> >> arch_wants_pte_order() can be overridden by the architecture if desired.
> >> Some architectures (e.g. arm64) can coalesce TLB entries if a contiguous
> >> set of ptes map physically contiguous, naturally aligned memory, so this
> >> mechanism allows the architecture to optimize as required.
> >>
> >> If the preferred order can't be used (e.g. because the folio would
> >> breach the bounds of the vma, or because ptes in the region are already
> >> mapped) then we fall back to a suitable lower order; first
> >> PAGE_ALLOC_COSTLY_ORDER, then order-0.
> >>
> >> Signed-off-by: Ryan Roberts
> >> ---
> >>  .../admin-guide/kernel-parameters.txt         |  10 +
> >>  mm/Kconfig                                    |  10 +
> >>  mm/memory.c                                   | 187 ++++++++++++++++--
> >>  3 files changed, 190 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index a1457995fd41..405d624e2191 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -1497,6 +1497,16 @@
> >>                         See Documentation/admin-guide/sysctl/net.rst for
> >>                         fb_tunnels_only_for_init_ns
> >>
> >> +       flexthp_unhinted_max=
> >> +                       [KNL] Requires CONFIG_FLEXIBLE_THP enabled. The maximum
> >> +                       folio size that will be allocated for an anonymous vma
> >> +                       that has neither explicitly opted in nor out of using
> >> +                       transparent hugepages. The size must be a power-of-2 in
> >> +                       the range [PAGE_SIZE, PMD_SIZE). A larger size improves
> >> +                       performance by reducing page faults, while a smaller
> >> +                       size reduces internal fragmentation. Default: max(64K,
> >> +                       PAGE_SIZE). Format: size[KMG].
> >> +
> >
> > Let's split this parameter into a separate patch.
> >
>
> Just a general comment after stumbling over patch #2, let's not start
> splitting patches into things that don't make any sense on their own;
> that just makes review a lot harder.

Sorry to hear this -- but there are also non-subjective reasons we
split patches this way. Initially we had minimal to no common ground,
so we had to divide and conquer in the smallest possible steps. If you
look at previous discussions, there was a disagreement on patch 2 in
v2 -- that's the patch you asked to be squashed into the main patch 3.
Fortunately we've resolved that. If that disagreement had persisted,
we would have left patch 2 out rather than let it bog down patch 3,
which would work the same for all arches except arm and could be
merged separately.

> For this case here, I'd suggest first adding the general infrastructure
> and then adding tunables we want to have on top.
>
> I agree that toggling that at runtime (for example via sysfs as raised
> by me previously) would be nicer.
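
For readers following the thread, the order selection described in the
quoted commit message can be sketched roughly as below. This is a
userspace illustration only, not the code from the patch: every
sketch_* name is hypothetical, the 4K page size and 2M PMD size are
assumed values for one common configuration, and the arch-preferred
order is passed in as a plain argument instead of calling
arch_wants_pte_order(). The only things taken from the thread are the
validity rule and default for flexthp_unhinted_max, the cap for
unhinted vmas, and the fallback sequence.

```c
#include <stdbool.h>

/* Assumed values for illustration; real ones depend on arch and config. */
#define SKETCH_PAGE_SIZE     (1UL << 12)  /* assumes 4K pages */
#define SKETCH_PMD_SIZE      (1UL << 21)  /* assumes 2M PMD mappings */
#define SKETCH_COSTLY_ORDER  3            /* PAGE_ALLOC_COSTLY_ORDER */

/* Power-of-2 size in the range [PAGE_SIZE, PMD_SIZE), per the doc text. */
static bool sketch_unhinted_max_valid(unsigned long size)
{
	bool pow2 = size != 0 && (size & (size - 1)) == 0;

	return pow2 && size >= SKETCH_PAGE_SIZE && size < SKETCH_PMD_SIZE;
}

/* Documented default: max(64K, PAGE_SIZE). */
static unsigned long sketch_unhinted_max_default(void)
{
	unsigned long sz = 64UL * 1024;

	return sz > SKETCH_PAGE_SIZE ? sz : SKETCH_PAGE_SIZE;
}

/* Smallest order whose folio size is at least 'size' bytes. */
static int sketch_size_to_order(unsigned long size)
{
	int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Preferred order for an anonymous fault:
 *  - THP explicitly disabled: order-0.
 *  - vma without an explicit opt-in: arch order capped by unhinted_max.
 *  - vma with an explicit opt-in: arch order as is.
 */
static int sketch_preferred_order(bool thp_disabled, bool hinted,
				  int arch_order, unsigned long unhinted_max)
{
	if (thp_disabled)
		return 0;
	if (!hinted) {
		int cap = sketch_size_to_order(unhinted_max);

		if (arch_order > cap)
			arch_order = cap;
	}
	return arch_order;
}

/* Fallback when an order can't be used: COSTLY_ORDER first, then 0. */
static int sketch_next_lower_order(int order)
{
	return order > SKETCH_COSTLY_ORDER ? SKETCH_COSTLY_ORDER : 0;
}
```

With these assumed values, an unhinted vma on an arch preferring order-6
would be capped at order-4 (64K) by the default, and would then fall
back to order-3 and finally order-0 if the larger folio can't be used.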