From: Barry Song <21cnbao@gmail.com>
Date: Fri, 6 Dec 2024 18:09:43 +1300
Subject: Re: [RFC PATCH v1 0/4] Control folio sizes used for page cache memory
To: Ryan Roberts
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)", David Hildenbrand, Lance Yang, Baolin Wang, Gavin Shan, Pankaj Raghav, Daniel Gomez, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240717071257.4141363-1-ryan.roberts@arm.com> <480f34d0-a943-40da-9c69-2353fe311cf7@arm.com>

It's unusual that many emails sent days ago are resurfacing on LKML.
Please ignore them. By the way, does anyone know what happened?

On Fri, Dec 6, 2024 at 5:12 AM Barry Song wrote:
>
> On Thu, Aug 8, 2024 at 10:27 PM Ryan Roberts wrote:
> >
> > On 17/07/2024 08:12, Ryan Roberts wrote:
> > > Hi All,
> > >
> > > This series is an RFC that adds sysfs and kernel cmdline controls to configure
> > > the set of allowed large folio sizes that can be used when allocating
> > > file-memory for the page cache. As part of the control mechanism, it provides
> > > for a special-case "preferred folio size for executable mappings" marker.
> > >
> > > I'm trying to solve 2 separate problems with this series:
> > >
> > > 1. Reduce pressure in iTLB and improve performance on arm64: This is a modified
> > > approach for the change at [1]. Instead of hardcoding the preferred executable
> > > folio size into the arch, user space can now select it. This decouples the arch
> > > code and also makes the mechanism more generic; it can be bypassed (the default)
> > > or any folio size can be set. For my use case, 64K is preferred, but I've also
> > > heard from Willy of a use case where putting all text into 2M PMD-sized folios
> > > is preferred. This approach avoids the need for synchronous MADV_COLLAPSE (and
> > > therefore faulting in all text ahead of time) to achieve that.
> >
> > Just a polite bump on this; I'd really like to get something like this merged to
> > help reduce iTLB pressure. We had a discussion at the THP Cabal meeting a few
> > weeks back without a solid conclusion. I haven't heard any concrete objections
> > yet, but also only a lukewarm reception. How can I move this forwards?
>
> Hi Ryan,
>
> These requirements seem to apply to anon, swap, pagecache, and shmem to
> some extent. While the swapin_enabled knob was rejected, the shmem_enabled
> option is already in place.
>
> I wonder if it's possible to use the existing 'enabled' setting across all
> cases, as from an architectural perspective with cont-pte, pagecache may not
> differ from anon. The demand for reducing page faults, LRU overhead, etc.,
> also seems quite similar.
>
> I imagine that once Android's file systems support mTHP, we'll uniformly enable
> 64KB for anon, swap, shmem, and page cache. It should then be sufficient to
> enable all of them using a single knob:
> '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/enabled'.
>
> Is there anything that makes pagecache and shmem significantly different
> from anon? In my Android case, they all seem the same. However, I assume
> there might be other use cases where differentiating them is necessary?
>
> >
> > Thanks,
> > Ryan
> >
> >
> > >
> > > 2. Reduce memory fragmentation in systems under high memory pressure (e.g.
> > > Android): The theory goes that if all folios are 64K, then failure to allocate a
> > > 64K folio should become unlikely. But if the page cache is allocating lots of
> > > different orders, with most allocations having an order below 64K (as is the
> > > case today) then the ability to allocate 64K folios diminishes. By providing control
> > > over the allowed set of folio sizes, we can tune to avoid crucial 64K folio
> > > allocation failure. Additionally I've heard (second hand) of the need to disable
> > > large folios in the page cache entirely due to latency concerns in some
> > > settings. These controls allow all of this without kernel changes.
> > >
> > > The value of (1) is clear and the performance improvements are documented in
> > > patch 2. I don't yet have any data demonstrating the theory for (2) since I
> > > can't reproduce the setup that Barry had at [2]. But my view is that by adding
> > > these controls we will enable the community to explore further, in the same way
> > > that the anon mTHP controls helped harden the understanding for anonymous
> > > memory.
> > >
> > > ---
> > > This series depends on the "mTHP allocation stats for file-backed memory" series
> > > at [3], which itself applies on top of yesterday's mm-unstable (650b6752c8a3). All
> > > mm selftests have been run; no regressions were observed.
> > >
> > > [1] https://lore.kernel.org/linux-mm/20240215154059.2863126-1-ryan.roberts@arm.com/
> > > [2] https://www.youtube.com/watch?v=ht7eGWqwmNs&list=PLbzoR-pLrL6oj1rVTXLnV7cOuetvjKn9q&index=4
> > > [3] https://lore.kernel.org/linux-mm/20240716135907.4047689-1-ryan.roberts@arm.com/
> > >
> > > Thanks,
> > > Ryan
> > >
> > > Ryan Roberts (4):
> > >   mm: mTHP user controls to configure pagecache large folio sizes
> > >   mm: Introduce "always+exec" for mTHP file_enabled control
> > >   mm: Override mTHP "enabled" defaults at kernel cmdline
> > >   mm: Override mTHP "file_enabled" defaults at kernel cmdline
> > >
> > >  .../admin-guide/kernel-parameters.txt      |  16 ++
> > >  Documentation/admin-guide/mm/transhuge.rst |  66 +++++++-
> > >  include/linux/huge_mm.h                    |  61 ++++---
> > >  mm/filemap.c                               |  26 ++-
> > >  mm/huge_memory.c                           | 158 +++++++++++++++++-
> > >  mm/readahead.c                             |  43 ++++-
> > >  6 files changed, 329 insertions(+), 41 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> >
>
> Thanks
> Barry
>
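
For anyone wanting to see how the per-size 'enabled' knobs mentioned above are currently set on a given machine, here is a minimal sketch. It assumes a kernel that exposes the hugepages-<size>kB directories under /sys/kernel/mm/transparent_hugepage and that each 'enabled' file marks the active choice in square brackets (e.g. "always inherit madvise [never]"). The helper names selected_mode and read_mthp_modes are made up for illustration; nothing here comes from the series itself.

```python
from pathlib import Path

# Assumed location of the THP sysfs controls; adjust if your system differs.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def selected_mode(text):
    """Return the bracketed (active) token from a sysfs choice string.

    Returns None when no token is bracketed.
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_mthp_modes(root=THP_ROOT):
    """Map each hugepages-<size>kB directory name to its active 'enabled' mode."""
    modes = {}
    for knob in sorted(Path(root).glob("hugepages-*kB/enabled")):
        try:
            modes[knob.parent.name] = selected_mode(knob.read_text())
        except OSError:
            # Unreadable knob (permissions, races): record it as unknown.
            modes[knob.parent.name] = None
    return modes


if __name__ == "__main__":
    for size, mode in read_mthp_modes().items():
        print(f"{size}: {mode}")
```

On a system without these directories the script simply prints nothing. It only reads the knobs; writing a new mode would need root and is deliberately left out.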