Date: Mon, 21 Oct 2024 11:54:23 +0300
From: "Kirill A. Shutemov"
To: Baolin Wang
Cc: Matthew Wilcox, akpm@linux-foundation.org, hughd@google.com,
	david@redhat.com, wangkefeng.wang@huawei.com, 21cnbao@gmail.com,
	ryan.roberts@arm.com, ioworker0@gmail.com, da.gomez@samsung.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov"
Subject: Re: [RFC PATCH v3 0/4] Support large folios for tmpfs
References: <6dohx7zna7x6hxzo4cwnwarep3a7rohx4qxubds3uujfb7gp3c@2xaubczl2n6d>
	<8e48cf24-83e1-486e-b89c-41edb7eeff3e@linux.alibaba.com>
In-Reply-To: <8e48cf24-83e1-486e-b89c-41edb7eeff3e@linux.alibaba.com>

On Mon, Oct 21, 2024 at 02:24:18PM +0800, Baolin Wang wrote:
>
>
> On 2024/10/17 19:26, Kirill A. Shutemov wrote:
> > On Thu, Oct 17, 2024 at 05:34:15PM +0800, Baolin Wang wrote:
> > > + Kirill
> > >
> > > On 2024/10/16 22:06, Matthew Wilcox wrote:
> > > > On Thu, Oct 10, 2024 at 05:58:10PM +0800, Baolin Wang wrote:
> > > > > Considering that tmpfs already has the 'huge=' option to control the THP
> > > > > allocation, it is necessary to maintain compatibility with the 'huge='
> > > > > option, as well as considering the 'deny' and 'force' option controlled
> > > > > by '/sys/kernel/mm/transparent_hugepage/shmem_enabled'.
> > > >
> > > > No, it's not. No other filesystem honours these settings. tmpfs would
> > > > not have had these settings if it were written today. It should simply
> > > > ignore them, the way that NFS ignores the "intr" mount option now that
> > > > we have a better solution to the original problem.
> > > >
> > > > To reiterate my position:
> > > >
> > > >  - When using tmpfs as a filesystem, it should behave like other
> > > >    filesystems.
> > > >  - When using tmpfs to implement MAP_ANONYMOUS | MAP_SHARED, it should
> > > >    behave like anonymous memory.
> > >
> > > I do agree with your point to some extent, but the ‘huge=’ option has
> > > existed for nearly 8 years, and the huge orders based on write size may not
> > > achieve the performance of PMD-sized THP in some scenarios, such as when the
> > > write length is consistently 4K. So, I am still concerned that ignoring the
> > > 'huge' option could lead to compatibility issues.
> >
> > Yeah, I don't think we are there yet to ignore the mount option.
>
> OK.
>
> > Maybe we need to get a new generic interface to request the semantics
> > tmpfs has with huge= on per-inode level on any fs. Like a set of FADV_*
> > handles to make kernel allocate PMD-size folio on any allocation or on
> > allocations within i_size. I think this behaviour is useful beyond tmpfs.
> >
> > Then huge= implementation for tmpfs can be re-defined to set these
> > per-inode FADV_ flags by default. This way we can keep tmpfs compatible
> > with current deployments and less special comparing to rest of
> > filesystems on kernel side.
>
> I did a quick search, and I didn't find any other fs that require PMD-sized
> huge pages, so I am not sure if FADV_* is useful for filesystems other than
> tmpfs. Please correct me if I missed something.

What do you mean by "require"? THPs are always opportunistic.

IIUC, we don't have a way to hint the kernel to use huge pages for a file
on read from backing storage. Readahead is not always the right way.

> > If huge= is not set, tmpfs would behave the same way as the rest of
> > filesystems.
>
> So if 'huge=' is not set, tmpfs write()/fallocate() can still allocate large
> folios based on the write size? If yes, that means it will change the
> default huge behavior for tmpfs. Because previously having 'huge=' is not
> set means the huge option is 'SHMEM_HUGE_NEVER', which is similar to what I
> mentioned:
> "Another possible choice is to make the huge pages allocation based on write
> size as the *default* behavior for tmpfs, ..."

I am more worried about breaking existing users of huge pages. So changing
the behaviour for users who don't specify huge= is okay with me.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
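
For readers following the "huge orders based on write size" point quoted
above, here is a rough userspace sketch of why consistently 4K writes would
never reach a PMD-sized folio under such a policy. It is an illustration
only, not the code from the patch series: order_for_write() is a
hypothetical helper, and PAGE_SHIFT = 12 and PMD_ORDER = 9 are assumptions
matching 4 KiB base pages with a 2 MiB PMD (as on x86-64).

```c
#include <stddef.h>

#define PAGE_SHIFT 12u  /* assumption: 4 KiB base pages */
#define PMD_ORDER  9u   /* assumption: 2 MiB PMD, 512 base pages */

/*
 * Hypothetical helper, not a kernel function.
 *
 * Pick the largest folio order such that the folio
 *   (a) covers no more whole pages than the write length,
 *   (b) stays naturally aligned at page index 'index', and
 *   (c) does not exceed PMD_ORDER.
 */
static unsigned int order_for_write(unsigned long index, size_t len)
{
	unsigned long pages = len >> PAGE_SHIFT; /* whole pages in the write */
	unsigned int order = 0;

	/* (2ul << order) is the page count of the next order up. */
	while (order < PMD_ORDER &&
	       (2ul << order) <= pages &&
	       (index & ((2ul << order) - 1)) == 0)
		order++;

	return order;
}
```

Under these assumptions a 4 KiB write at index 0 gives order 0, a 64 KiB
write gives order 4, and only a write of at least 2 MiB at a 2 MiB-aligned
index reaches order 9. That gap is what a huge=-style request for PMD-sized
folios would cover.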