From: David Hildenbrand
Organization: Red Hat
Date: Wed, 23 Oct 2024 11:27:10 +0200
Subject: Re: [RFC PATCH v3 0/4] Support large folios for tmpfs
To: Baolin Wang, Daniel Gomez, "Kirill A. Shutemov"
Cc: Matthew Wilcox, akpm@linux-foundation.org, hughd@google.com,
 wangkefeng.wang@huawei.com, 21cnbao@gmail.com, ryan.roberts@arm.com,
 ioworker0@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 "Kirill A . Shutemov"
Message-ID: <1b0f9f94-06a6-48ac-a68e-848bce1008e9@redhat.com>
In-Reply-To: <7eb412d1-f90e-4363-8c7b-072f1124f8a6@linux.alibaba.com>
References: <6dohx7zna7x6hxzo4cwnwarep3a7rohx4qxubds3uujfb7gp3c@2xaubczl2n6d>
 <8e48cf24-83e1-486e-b89c-41edb7eeff3e@linux.alibaba.com>
 <486a72c6-5877-4a95-a587-2a32faa8785d@redhat.com>
 <7eb412d1-f90e-4363-8c7b-072f1124f8a6@linux.alibaba.com>

On 23.10.24 10:04, Baolin Wang wrote:
>
>
> On 2024/10/22 23:31, David Hildenbrand wrote:
>> On 22.10.24 05:41, Baolin Wang wrote:
>>>
>>>
>>> On 2024/10/21 21:34, Daniel Gomez wrote:
>>>> On Mon Oct 21, 2024 at 10:54 AM CEST, Kirill A.
Shutemov wrote:
>>>>> On Mon, Oct 21, 2024 at 02:24:18PM +0800, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/10/17 19:26, Kirill A. Shutemov wrote:
>>>>>>> On Thu, Oct 17, 2024 at 05:34:15PM +0800, Baolin Wang wrote:
>>>>>>>> + Kirill
>>>>>>>>
>>>>>>>> On 2024/10/16 22:06, Matthew Wilcox wrote:
>>>>>>>>> On Thu, Oct 10, 2024 at 05:58:10PM +0800, Baolin Wang wrote:
>>>>>>>>>> Considering that tmpfs already has the 'huge=' option to control the THP
>>>>>>>>>> allocation, it is necessary to maintain compatibility with the 'huge='
>>>>>>>>>> option, as well as considering the 'deny' and 'force' option controlled
>>>>>>>>>> by '/sys/kernel/mm/transparent_hugepage/shmem_enabled'.
>>>>>>>>>
>>>>>>>>> No, it's not.  No other filesystem honours these settings.  tmpfs would
>>>>>>>>> not have had these settings if it were written today.  It should simply
>>>>>>>>> ignore them, the way that NFS ignores the "intr" mount option now that
>>>>>>>>> we have a better solution to the original problem.
>>>>>>>>>
>>>>>>>>> To reiterate my position:
>>>>>>>>>
>>>>>>>>>      - When using tmpfs as a filesystem, it should behave like other
>>>>>>>>>        filesystems.
>>>>>>>>>      - When using tmpfs to implement MAP_ANONYMOUS | MAP_SHARED, it should
>>>>>>>>>        behave like anonymous memory.
>>>>>>>>
>>>>>>>> I do agree with your point to some extent, but the ‘huge=’ option has
>>>>>>>> existed for nearly 8 years, and the huge orders based on write size may not
>>>>>>>> achieve the performance of PMD-sized THP in some scenarios, such as when the
>>>>>>>> write length is consistently 4K. So, I am still concerned that ignoring the
>>>>>>>> 'huge' option could lead to compatibility issues.
>>>>>>>
>>>>>>> Yeah, I don't think we are there yet to ignore the mount option.
>>>>>>
>>>>>> OK.
>>>>>>
>>>>>>> Maybe we need to get a new generic interface to request the semantics
>>>>>>> tmpfs has with huge= on per-inode level on any fs. Like a set of FADV_*
>>>>>>> handles to make kernel allocate PMD-size folio on any allocation or on
>>>>>>> allocations within i_size. I think this behaviour is useful beyond tmpfs.
>>>>>>>
>>>>>>> Then huge= implementation for tmpfs can be re-defined to set these
>>>>>>> per-inode FADV_ flags by default. This way we can keep tmpfs compatible
>>>>>>> with current deployments and less special comparing to rest of
>>>>>>> filesystems on kernel side.
>>>>>>
>>>>>> I did a quick search, and I didn't find any other fs that require PMD-sized
>>>>>> huge pages, so I am not sure if FADV_* is useful for filesystems other than
>>>>>> tmpfs. Please correct me if I missed something.
>>>>>
>>>>> What do you mean by "require"? THPs are always opportunistic.
>>>>>
>>>>> IIUC, we don't have a way to hint kernel to use huge pages for a file on
>>>>> read from backing storage. Readahead is not always the right way.
>>>>>
>>>>>>> If huge= is not set, tmpfs would behave the same way as the rest of
>>>>>>> filesystems.
>>>>>>
>>>>>> So if 'huge=' is not set, tmpfs write()/fallocate() can still allocate large
>>>>>> folios based on the write size? If yes, that means it will change the
>>>>>> default huge behavior for tmpfs. Because previously having 'huge=' is not
>>>>>> set means the huge option is 'SHMEM_HUGE_NEVER', which is similar to what I
>>>>>> mentioned:
>>>>>> "Another possible choice is to make the huge pages allocation based on write
>>>>>> size as the *default* behavior for tmpfs, ..."
>>>>>
>>>>> I am more worried about breaking existing users of huge pages. So changing
>>>>> behaviour of users who don't specify huge is okay to me.
>>>>
>>>> I think moving tmpfs to allocate large folios opportunistically by
>>>> default (as it was proposed initially) doesn't necessary conflict with
>>>> the default behaviour (huge=never). We just need to clarify that in
>>>> the documentation.
>>>>
>>>> However, and IIRC, one of the requests from Hugh was to have a way to
>>>> disable large folios which is something other FS do not have control
>>>> of as of today. Ryan sent a proposal to actually control that globally
>>>> but I think it didn't move forward. So, what are we missing to go back
>>>> to implement large folios in tmpfs in the default case, as any other fs
>>>> leveraging large folios?
>>>
>>> IMHO, as I discussed with Kirill, we still need maintain compatibility
>>> with the 'huge=' mount option. This means that if 'huge=never' is set
>>> for tmpfs, huge page allocation will still be prohibited (which can
>>> address Hugh's request?). However, if 'huge=' is not set, we can
>>> allocate large folios based on the write size.
>>
>> I consider allocating large folios in shmem/tmpfs on the write path less
>> controversial than allocating them on the page fault path -- especially
>> as long as we stay within the size to-be-written.
>>
>> I think in RHEL THP on shmem/tmpfs are disabled as default (e.g.,
>> shmem_enabled=never). Maybe because of some rather undesired
>> side-effects (maybe some are historical?): I recall issues with VMs with
>> THP+ memory ballooning, as we cannot reclaim pages of folios if
>> splitting fails). I assume most of these problematic use cases don't use
>> tmpfs as an ordinary file system (write()/read()), but mmap() the whole
>> thing.
>>
>> Sadly, I don't find any information about shmem/tmpfs + THP in the RHEL
>> documentation; most documentation is only concerned about anon THP.
>> Which makes me conclude that they are not suggested as of now.
>>
>> I see more issues with allocating them on the page fault path and not
>> having a way to disable it -- compared to allocating them on the write()
>> path.
>
> I may not understand your issues. IIUC, you can disable allocating huge
> pages on the page fault path by using the 'huge=never' mount option or
> setting shmem_enabled=deny. No?

That's what I am saying: if there is some way to disable it that will
keep working, great.

-- 
Cheers,

David / dhildenb