From: Ryan Roberts
Date: Tue, 9 Jul 2024 09:28:48 +0100
Subject: Re: [PATCH v5 0/6] add mTHP support for anonymous shmem
To: Daniel Gomez, David Hildenbrand
Cc: Baolin Wang, Matthew Wilcox, "akpm@linux-foundation.org", "hughd@google.com", "wangkefeng.wang@huawei.com", "ying.huang@intel.com", "21cnbao@gmail.com" <21cnbao@gmail.com>, "shy828301@gmail.com", "ziy@nvidia.com", "ioworker0@gmail.com", Pankaj Raghav, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
References: <27beaa0e-697e-4e30-9ac6-5de22228aec1@redhat.com> <6d4c0191-18a9-4c8f-8814-d4775557383e@redhat.com> <32f04739-0cd0-4a9e-9419-c5a13c333c28@redhat.com>

On 07/07/2024 17:39, Daniel Gomez wrote:
> On Fri, Jul 05, 2024 at 10:59:02AM GMT, David Hildenbrand wrote:
>> On 05.07.24 10:45, Ryan Roberts wrote:
>>> On 05/07/2024 06:47, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/7/5 03:49, Matthew Wilcox wrote:
>>>>> On Thu, Jul 04, 2024 at 09:19:10PM +0200, David Hildenbrand wrote:
>>>>>> On 04.07.24 21:03, David Hildenbrand wrote:
>>>>>>>> shmem has two uses:
>>>>>>>>
>>>>>>>>     - MAP_ANONYMOUS | MAP_SHARED (this patch set)
>>>>>>>>     - tmpfs
>>>>>>>>
>>>>>>>> For the second use case we don't want controls *at all*, we want the
>>>>>>>> same heuristics used for all other filesystems to apply to tmpfs.
>>>>>>>
>>>>>>> As discussed in the MM meeting, Hugh had a different opinion on that.
>>>>>>
>>>>>> FWIW, I just recalled that I wrote a quick summary:
>>>>>>
>>>>>> https://lkml.kernel.org/r/f1783ff0-65bd-4b2b-8952-52b6822a0835@redhat.com
>>>>>>
>>>>>> I believe the meetings are recorded as well, but never looked at recordings.
>>>>>
>>>>> That's not what I understood Hugh to mean.  To me, it seemed that Hugh
>>>>> was expressing an opinion on using shmem as shmem, not on using it as
>>>>> tmpfs.
>>>>>
>>>>> If I misunderstood Hugh, well, I still disagree.  We should not have
>>>>> separate controls for this.  tmpfs is just not that special.
>>>
>>> I wasn't at the meeting that's being referred to, but I thought we previously
>>> agreed that tmpfs *is* special because in some configurations it's not backed by
>>> swap so is locked in RAM?
>>
>> There are multiple things to that, like:
>>
>> * Machines only having limited/no swap configured
>> * tmpfs can be configured to never go to swap
>> * memfd/tmpfs files getting used purely for mmap(): there is no real
>> difference to MAP_ANON|MAP_SHARED besides the processes we share that
>> memory with.
>>
>> Especially when it comes to memory waste concerns and access behavior in
>> some cases, tmpfs behaved much more like anonymous memory. But there are for
>> sure other use cases where tmpfs is not that special.
>
> Having controls to select the allowable folio order allocations for
> tmpfs does not address any of these issues. The suggested filesystem
> approach [1] involves allocating orders in larger chunks, but always
> the same size you would allocate when using order-0 folios.

Well, you can't know that you will never allocate more.
If you allocate a 2M block, you probably have some good readahead data that
tells you that you are likely to keep reading sequentially, but you don't know
for sure that the application won't stop after just 4K.

> So,
> it's a conservative approach. Using mTHP knobs in tmpfs would cause:
> * Over-allocation when using mTHP and/or THP under the 'always' flag.
> * Allocate in bigger chunks in a non-optimal way, when
> not all mTHP and THP orders are enabled.
> * Operate in a similar manner as in [1] when all mTHP and THP orders
> are enabled and 'within_size' flag is used (assuming we use patch 11
> from [1]).

Large folios may still be considered scarce resources even if the amount of
memory allocated is still the same. And if shmem isn't backed by swap, then
once you have allocated a large folio for shmem, it is stuck in shmem, even if
it would be better used somewhere else.

And it's possible (likely even, in my opinion) that allocating lots of
different folio sizes will exacerbate memory fragmentation, leading to more
order-0 fallbacks, which would hurt overall system performance in the long
run, compared with restricting to a couple of folio sizes.

I'm starting some work to actually measure how limiting the folio sizes
allocated for page cache memory can help reduce large folio allocation failure
overall. My hypothesis is that the data will show us that in an environment
like Android, where memory pressure is high, limiting everything to order-0
and order-4 will significantly improve the allocation success rate of order-4.
Let's see.

>
> [1] Last 3 patches of this series:
> https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
>
> My understanding of why mTHP was preferred is to raise awareness in
> user space and allow tmpfs mounts used at boot time to operate in
> 'safe' mode (no large folios). Does it make more sense to have a large
> folios enable flag to control order allocation as in [1], instead of
> every single order possible?
My intuition is towards every order possible, as per above. Let's see what the
data tells us.

>
>>
>> My opinion is that we need to let people configure orders (if you feel like
>> it, configure all), but *select* the order to allocate based on readahead
>> information -- in contrast to anonymous memory where we start at the highest
>> order and don't have readahead information available.
>>
>> Maybe we need different "order allocation" logic for read/write vs. fault,
>> not sure.
>
> I would suggest [1] the file size of the write for the write
> and fallocate paths. But when does it make sense to use readahead
> information? Maybe when swap is involved?
>
>>
>> But I don't maintain that code, so I can only give stupid suggestions and
>> repeat what I understood from the meeting with Hugh and Kirill :)
>>
>> --
>> Cheers,
>>
>> David / dhildenb
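
For illustration only, here is a rough userspace model of the "configure the
allowed orders, then select an order per allocation from the write size" idea
discussed in this thread. It is a sketch under stated assumptions, not shmem or
page cache code: PAGE_SIZE, pick_order() and plan_write() are hypothetical
names, a 4K base page is assumed, folios are assumed to be naturally aligned,
and readahead state is not modelled at all. The policy shown never allocates
past the end of the write, which is one reading of the conservative approach
described for [1].

```python
PAGE_SIZE = 4096  # assumption: 4K base pages


def pick_order(index, npages, enabled_orders):
    """Pick the largest enabled order whose folio can start at page `index`.

    The folio must be naturally aligned (index is a multiple of its size in
    pages) and must not extend past the `npages` pages still to be covered.
    Order 0 is always allowed as the fallback.
    """
    best = 0
    for order in sorted(enabled_orders):
        size = 1 << order  # folio size in pages
        if index % size == 0 and size <= npages:
            best = order
    return best


def plan_write(offset, length, enabled_orders):
    """Return the list of folio orders used to cover a write.

    `offset` and `length` are in bytes; `enabled_orders` models the set of
    orders the administrator has enabled (hypothetical stand-in for the
    per-order mTHP controls).
    """
    if length <= 0:
        return []
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    index, orders = first, []
    while index <= last:
        order = pick_order(index, last - index + 1, enabled_orders)
        orders.append(order)
        index += 1 << order
    return orders
```

With only order-0 and order-4 enabled, plan_write(0, 80 * 1024, {0, 4}) covers
the 20 pages with one order-4 folio followed by four order-0 folios. Because
the folios never extend past the write, the total memory is the same as with
order-0 alone; only the mix of folio sizes changes, which is the part the
fragmentation argument above is concerned with.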