Subject: Re: [RFC PATCH 0/2] mm: filemap: add filemap_grab_folios
From: Nikita Kalyazin <kalyazin@amazon.com>
To: David Hildenbrand
Date: Fri, 10 Jan 2025 18:54:03 +0000
References: <20250110154659.95464-1-kalyazin@amazon.com> <5608af05-0b7a-4e11-b381-8b57b701e316@redhat.com>
In-Reply-To: <5608af05-0b7a-4e11-b381-8b57b701e316@redhat.com>
On 10/01/2025 17:01, David Hildenbrand wrote:
> On 10.01.25 16:46, Nikita Kalyazin wrote:
>> Based on David's suggestion for speeding up guest_memfd memory
>> population [1] made at the guest_memfd upstream call on 5 Dec 2024 [2],
>> this adds `filemap_grab_folios` that grabs multiple folios at a time.
>>
>
> Hi,

Hi :)

>
>> Motivation
>>
>> When profiling guest_memfd population and comparing the results with
>> population of anonymous memory via UFFDIO_COPY, I observed that the
>> former was up to 20% slower, mainly due to adding newly allocated pages
>> to the pagecache.  As far as I can see, the two main contributors to it
>> are pagecache locking and tree traversals needed for every folio.  The
>> RFC attempts to partially mitigate those by adding multiple folios at a
>> time to the pagecache.
>>
>> Testing
>>
>> With the change applied, I was able to observe a 10.3% (708 to 635 ms)
>> speedup in a selftest that populated 3GiB guest_memfd and a 9.5% (990 to
>> 904 ms) speedup when restoring a 3GiB guest_memfd VM snapshot using a
>> custom Firecracker version, both on Intel Ice Lake.
>
> Does that mean that it's still 10% slower (based on the 20% above), or
> were the 20% from a different micro-benchmark?

Yes, it is still slower:
 - isolated/selftest: 2.3%
 - Firecracker setup: 8.9%

Not sure why the values are so different though.  I'll try to find an
explanation.

>>
>> Limitations
>>
>> While `filemap_grab_folios` handles THP/large folios internally and
>> deals with reclaim artifacts in the pagecache (shadows), for simplicity
>> reasons, the RFC does not support those as it demonstrates the
>> optimisation applied to guest_memfd, which only uses small folios and
>> does not support reclaim at the moment.
>
> It might be worth pointing out that, while support for larger folios is
> in the works, there will be scenarios where small folios are unavoidable
> in the future (mixture of shared and private memory).
>
> How hard would it be to just naturally support large folios as well?

I don't think it would be impossible.  It's just one more dimension that
needs to be handled.  The `__filemap_add_folio` logic is already rather
complex, and correctly processing multiple folios while also splitting
them when necessary looks substantially convoluted to me.  So my idea
was to discuss/validate the multi-folio approach first before rolling
the sleeves up.

> We do have memfd_pin_folios() that can deal with that and provides a
> slightly similar interface (struct folio **folios).
>
> For reference, the interface is:
>
> long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>                      struct folio **folios, unsigned int max_folios,
>                      pgoff_t *offset)
>
> Maybe what you propose could even be used to further improve
> memfd_pin_folios() internally? However, it must do this FOLL_PIN thingy,
> so it must process each and every folio it processed.

Thanks for the pointer.  Yeah, I see what you mean.  I guess it can
potentially allocate/add folios in a batch and then pin them?  Although
swap/readahead logic may make it more difficult to implement.

> --
> Cheers,
>
> David / dhildenb
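P.S. To make the locking arithmetic behind the batching idea concrete, here
is a userspace toy model.  This is not kernel code and not the actual patch:
the names (`insert_one_by_one`, `insert_batched`) and the mutex-protected
array standing in for the pagecache are all invented for illustration.  It
only shows why inserting N folios one at a time costs N lock round-trips,
while inserting them in batches of B costs ceil(N/B):

```c
/*
 * Toy userspace model of the batched-insert idea.  A mutex-protected
 * array stands in for the pagecache; each function returns how many
 * times the lock had to be taken to insert n items.
 */
#include <pthread.h>
#include <stddef.h>

#define CACHE_CAPACITY 1024

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache[CACHE_CAPACITY];
static size_t cache_used;

/* Insert n items one at a time: one lock round-trip per item. */
size_t insert_one_by_one(size_t n)
{
	size_t locks = 0;

	for (size_t i = 0; i < n && cache_used < CACHE_CAPACITY; i++) {
		pthread_mutex_lock(&cache_lock);
		locks++;
		cache[cache_used++] = (int)i;
		pthread_mutex_unlock(&cache_lock);
	}
	return locks;
}

/* Insert n items in batches: one lock round-trip per batch. */
size_t insert_batched(size_t n, size_t batch)
{
	size_t locks = 0;

	for (size_t done = 0; done < n; done += batch) {
		size_t todo = n - done < batch ? n - done : batch;

		pthread_mutex_lock(&cache_lock);
		locks++;
		for (size_t i = 0; i < todo && cache_used < CACHE_CAPACITY; i++)
			cache[cache_used++] = (int)(done + i);
		pthread_mutex_unlock(&cache_lock);
	}
	return locks;
}
```

For 8 items, the one-at-a-time path takes the lock 8 times, while batches of
4 take it only twice; the real pagecache also saves the per-folio tree
traversal, which this toy does not model.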