Message-ID: <55b6b3ec-eaa8-494b-9bc7-741fe0c3bc63@amazon.com>
Date: Wed, 20 Nov 2024 15:58:29 +0000
Subject: Re: [RFC PATCH 0/4] KVM: ioctl for populating guest_memfd
From: Nikita Kalyazin
To: David Hildenbrand
Cc: "Sean Christopherson"
References: <20241024095429.54052-1-kalyazin@amazon.com> <08aeaf6e-dc89-413a-86a6-b9772c9b2faf@amazon.com> <01b0a528-bec0-41d7-80f6-8afe213bd56b@redhat.com>
Sender: owner-linux-mm@kvack.org

On 20/11/2024 15:13, David Hildenbrand wrote:
> Hi!

Hi! :)

>> Results:
>>  - MAP_PRIVATE: 968 ms
>>  - MAP_SHARED: 1646 ms
>
> At least here it is expected to some degree: as soon as the page cache
> is involved map/unmap gets slower, because we are effectively
> maintaining two datastructures (page tables + page cache) instead of
> only a single one (page cache)
>
> Can you make sure that THP/large folios don't interfere in your
> experiments (e.g., madvise(MADV_NOHUGEPAGE))?

I was using the transparent_hugepage=never command line argument in my
testing.

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

Is that sufficient to exclude the THP/large folio factor?

>> While this logic is intuitive, its performance effect is more
>> significant that I would expect.
>
> Yes. How much of the performance difference would remain if you hack out
> the atomic op just to play with it? I suspect there will still be some
> difference.

I have tried that, but could not see any noticeable difference in the
overall results.

It looks like a big portion of the bottleneck has moved from
shmem_get_folio_gfp/folio_mark_uptodate to
finish_fault/__pte_offset_map_lock somehow. I have no good explanation
for why:

Orig:
  - 69.62% do_fault
     + 44.61% __do_fault
     + 20.26% filemap_map_pages
     + 3.48% finish_fault

Hacked:
  - 67.39% do_fault
     + 32.45% __do_fault
     + 21.87% filemap_map_pages
     + 11.97% finish_fault

Orig:
  - 3.48% finish_fault
     - 1.28% set_pte_range
          0.96% folio_add_file_rmap_ptes
     - 0.91% __pte_offset_map_lock
          0.54% _raw_spin_lock

Hacked:
  - 11.97% finish_fault
     - 8.59% __pte_offset_map_lock
        - 6.27% _raw_spin_lock
             preempt_count_add
          1.00% __pte_offset_map
     - 1.28% set_pte_range
        - folio_add_file_rmap_ptes
             __mod_node_page_state

> Note that we might improve allocation times with guest_memfd when
> allocating larger folios.

I suppose it may not always be an option, depending on the requirements
for consistency of the allocation latency. E.g. if a large folio isn't
available at the time, the performance would degrade to the base case
(please correct me if I'm missing something).

> Heh, now I spot that your comment was as reply to a series.

Yeah, sorry if it wasn't obvious.

> If your ioctl is supposed to to more than "allocating memory" like
> MAP_POPULATE/MADV_POPULATE+* ... then POPULATE is a suboptimal choice.
> Because for allocating memory, we would want to use fallocate() instead.
> I assume you want to "allocate+copy"?

Yes, the ultimate use case is "allocate+copy".

> I'll note that, as we're moving into the direction of moving
> guest_memfd.c into mm/guestmem.c, we'll likely want to avoid "KVM_*"
> ioctls, and think about something generic.

Good point, thanks. Are we at the stage where some concrete API has
been proposed yet? I might have missed that.

> Any clue how your new ioctl will interact with the WIP to have shared
> memory as part of guest_memfd? For example, could it be reasonable to
> "populate" the shared memory first (via VMA) and then convert that
> "allocated+filled" memory to private?

No, I can't immediately see why it shouldn't work. My main concern
would probably still be the latency of the population stage: I can't
see why it would improve compared to what we have now, because my
feeling is that it is linked to the sharedness property of guest_memfd.

> Cheers,
>
> David / dhildenb
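
For reference, a minimal sketch of how a MAP_PRIVATE vs MAP_SHARED
population timing with madvise(MADV_NOHUGEPAGE) could be set up. This is
not the test used for the numbers above: that setup is not shown in this
message. The sketch assumes anonymous mappings (where MAP_SHARED is
backed by shmem), and populate_ms() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map `len` bytes of anonymous
 * memory with MAP_PRIVATE or MAP_SHARED, write one byte per page so that
 * every page is faulted in, and return the elapsed time in milliseconds.
 * Returns a negative value if the arguments or the mapping are unusable.
 */
static double populate_ms(int sharing_flag, size_t len)
{
    assert(sharing_flag == MAP_PRIVATE || sharing_flag == MAP_SHARED);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1.0;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            sharing_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1.0;

#ifdef MADV_NOHUGEPAGE
    /* Best effort: fails with EINVAL on kernels built without THP,
     * in which case there is nothing to disable anyway. */
    (void)madvise(p, len, MADV_NOHUGEPAGE);
#endif

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* Touch the first byte of each page to trigger one fault per page. */
    for (size_t off = 0; off < len; off += (size_t)page)
        ((volatile unsigned char *)p)[off] = 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(p, len);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling populate_ms(MAP_PRIVATE, len) and populate_ms(MAP_SHARED, len)
with the same len gives two timings that can be compared directly.
Absolute values depend on the machine and kernel configuration.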