From: Baolin Wang
To: "Kirill A. Shutemov"
Cc: Matthew Wilcox, akpm@linux-foundation.org, hughd@google.com, david@redhat.com, wangkefeng.wang@huawei.com, 21cnbao@gmail.com, ryan.roberts@arm.com, ioworker0@gmail.com, da.gomez@samsung.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov"
Subject: Re: [RFC PATCH v3 0/4] Support large folios for tmpfs
Date: Mon, 21 Oct 2024 14:24:18 +0800
Message-ID: <8e48cf24-83e1-486e-b89c-41edb7eeff3e@linux.alibaba.com>
In-Reply-To: <6dohx7zna7x6hxzo4cwnwarep3a7rohx4qxubds3uujfb7gp3c@2xaubczl2n6d>
References: <6dohx7zna7x6hxzo4cwnwarep3a7rohx4qxubds3uujfb7gp3c@2xaubczl2n6d>

On 2024/10/17 19:26, Kirill A.
Shutemov wrote:
> On Thu, Oct 17, 2024 at 05:34:15PM +0800, Baolin Wang wrote:
>> + Kirill
>>
>> On 2024/10/16 22:06, Matthew Wilcox wrote:
>>> On Thu, Oct 10, 2024 at 05:58:10PM +0800, Baolin Wang wrote:
>>>> Considering that tmpfs already has the 'huge=' option to control the THP
>>>> allocation, it is necessary to maintain compatibility with the 'huge='
>>>> option, as well as considering the 'deny' and 'force' option controlled
>>>> by '/sys/kernel/mm/transparent_hugepage/shmem_enabled'.
>>>
>>> No, it's not. No other filesystem honours these settings. tmpfs would
>>> not have had these settings if it were written today. It should simply
>>> ignore them, the way that NFS ignores the "intr" mount option now that
>>> we have a better solution to the original problem.
>>>
>>> To reiterate my position:
>>>
>>> - When using tmpfs as a filesystem, it should behave like other
>>> filesystems.
>>> - When using tmpfs to implement MAP_ANONYMOUS | MAP_SHARED, it should
>>> behave like anonymous memory.
>>
>> I do agree with your point to some extent, but the ‘huge=’ option has
>> existed for nearly 8 years, and the huge orders based on write size may not
>> achieve the performance of PMD-sized THP in some scenarios, such as when the
>> write length is consistently 4K. So, I am still concerned that ignoring the
>> 'huge' option could lead to compatibility issues.
>
> Yeah, I don't think we are there yet to ignore the mount option.

OK.

> Maybe we need to get a new generic interface to request the semantics
> tmpfs has with huge= on per-inode level on any fs. Like a set of FADV_*
> handles to make kernel allocate PMD-size folio on any allocation or on
> allocations within i_size. I think this behaviour is useful beyond tmpfs.
>
> Then huge= implementation for tmpfs can be re-defined to set these
> per-inode FADV_ flags by default. This way we can keep tmpfs compatible
> with current deployments and less special comparing to rest of
> filesystems on kernel side.

I did a quick search, and I didn't find any other fs that requires PMD-sized
huge pages, so I am not sure if FADV_* is useful for filesystems other than
tmpfs. Please correct me if I missed something.

> If huge= is not set, tmpfs would behave the same way as the rest of
> filesystems.

So if 'huge=' is not set, tmpfs write()/fallocate() can still allocate large
folios based on the write size? If yes, that means it will change the default
huge behavior for tmpfs, because previously leaving 'huge=' unset meant the
huge option was 'SHMEM_HUGE_NEVER'. That is similar to what I mentioned:
"Another possible choice is to make the huge pages allocation based on write
size as the *default* behavior for tmpfs, ..."
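
To make the 4K-write concern concrete, here is a minimal Python sketch of the two policies being compared. It is only a model, not kernel code. It assumes 4 KiB base pages and a PMD order of 9 (2 MiB), and it ignores index alignment, i_size checks and allocation fallback. The policy name "write_size" and both helper functions are hypothetical and exist only for this illustration.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB base pages
PMD_ORDER = 9     # assumption: 2 MiB PMD-sized folios

def order_from_write_size(nbytes: int) -> int:
    """Largest folio order that fits in the write length, capped at PMD_ORDER."""
    pages = nbytes >> PAGE_SHIFT
    if pages <= 1:
        return 0
    return min(pages.bit_length() - 1, PMD_ORDER)

def folio_order(policy: str, nbytes: int) -> int:
    """Model of the folio order chosen for one write() under a given policy."""
    if policy == "never":        # behaviour today when 'huge=' is not set
        return 0
    if policy == "always":       # 'huge=always': PMD-sized regardless of write length
        return PMD_ORDER
    if policy == "write_size":   # hypothetical default: follow the write length
        return order_from_write_size(nbytes)
    raise ValueError(f"unknown policy: {policy}")

if __name__ == "__main__":
    for size in (4096, 64 * 1024, 2 * 1024 * 1024):
        print(size, [folio_order(p, size) for p in ("never", "always", "write_size")])
```

In this model a workload that always writes 4K gets order-0 folios under the write-size policy while 'huge=always' would still give it PMD-sized folios, which is the gap described above.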