From: Nikita Kalyazin
To: Peter Xu
CC: James Houghton
Subject: Re: [RFC PATCH 0/5] KVM: guest_memfd: support for uffd missing
Date: Fri, 14 Mar 2025 17:12:35 +0000
Message-ID: <24528be7-8f7a-4928-8bca-5869cf14eace@amazon.com>

On 13/03/2025 22:38, Peter Xu wrote:
> On Thu, Mar 13, 2025 at 10:13:23PM +0000, Nikita Kalyazin wrote:
>> Yes, that's right, mmap() + memcpy() is functionally sufficient. write() is
>> an optimisation. Most of the pages in guest_memfd are only ever accessed by
>> the vCPU (not userspace) via TDP (stage-2 pagetables) so they don't need
>> userspace pagetables set up. By using write() we can avoid VMA faults,
>> installing corresponding PTEs and double page initialisation we discussed
>> earlier. The optimised path only contains pagecache population via write().
>> Even TDP faults can be avoided if using KVM prefaulting API [1].
>>
>> [1] https://docs.kernel.org/virt/kvm/api.html#kvm-pre-fault-memory
>
> Could you elaborate why VMA faults matters in perf?
In my experiments, populating 3 GiB of guest_memfd takes 980 ms with
write(), while mmap() + memcpy() takes 2140 ms. When profiling the memcpy()
case, I saw ~63% of the time spent in the exception handler, which is what
made me think VMA faults matter.

> If we're talking about postcopy-like migrations on top of KVM guest-memfd,
> IIUC the VMAs can be pre-faulted too just like the TDP pgtables, e.g. with
> MADV_POPULATE_WRITE.

Yes, I was considering MADV_POPULATE_WRITE as well, but AFAIK it isn't
available for guest_memfd, at least with the direct map removed.
MADV_POPULATE_WRITE faults pages in through GUP, and the check_vma_flags()
test at [1], which rejects secretmem VMAs, is extended in [2] to reject
no-direct-map VMAs too:

diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780e..7ddaf93c5b6a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1283,7 +1283,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
-	if (vma_is_secretmem(vma))
+	if (vma_is_secretmem(vma) || vma_is_no_direct_map(vma))
 		return -EFAULT;
 
 	if (write) {

[1] https://elixir.bootlin.com/linux/v6.13.6/source/mm/gup.c#L1286
[2] https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/T/#m05b5c6366be27c98a86baece52b2f408c455e962

> Normally, AFAIU userapp optimizes IOs the other way round.. to change
> write()s into mmap()s, which at least avoids one round of copy.
>
> For postcopy using minor traps (and since guest-memfd is always shared and
> non-private..), it's also possible to feed the mmap()ed VAs to NIC as
> buffers (e.g. in recvmsg(), for example, as part of iovec[]), and as long
> as the mmap()ed ranges are not registered by KVM memslots, there's no
> concern on non-atomic copy.

Yes, I see what you mean. Depending on the setup, that may well be faster if
it removes one copy.

Anyway, it looks like the solution we discussed lets userspace choose
between a memcpy()-only implementation and one that combines memcpy() and
write().
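The following rough C sketch illustrates the population paths discussed in this thread. They are write(), mmap() + memcpy() with and without MADV_POPULATE_WRITE, and recvmsg() directly into the mapping as suggested above. It is only a sketch under stated assumptions, not the series' code:

- An ordinary memfd_create() file stands in for guest_memfd. guest_memfd can only be created through KVM, and MADV_POPULATE_WRITE is not expected to work on it once the direct map is removed.
- An AF_UNIX socket stands in for the migration stream or NIC.
- make_file() and file_matches() are hypothetical helpers.

```c
/*
 * Sketch only. Assumptions: an ordinary memfd stands in for guest_memfd,
 * and an AF_UNIX socket stands in for the migration stream / NIC.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* asm-generic value, Linux >= 5.14 */
#endif

/* Hypothetical helper: a memfd of len bytes standing in for guest_memfd. */
static int make_file(const char *name, size_t len)
{
	int fd = memfd_create(name, 0);

	if (fd >= 0 && ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		fd = -1;
	}
	return fd;
}

/* write() path: fills the page cache directly. The destination is never
 * mapped, so there are no VMA faults and no PTEs to install. */
static int populate_with_write(int fd, const char *src, size_t len)
{
	for (size_t done = 0; done < len;) {
		ssize_t n = pwrite(fd, src + done, len - done, (off_t)done);

		if (n < 0)
			return -1;
		done += (size_t)n;
	}
	return 0;
}

/* mmap() + memcpy() path: the first touch of each page faults to allocate
 * it and install a PTE, unless MADV_POPULATE_WRITE prefaulted the range. */
static int populate_with_memcpy(int fd, const char *src, size_t len,
				int prefault)
{
	char *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (dst == MAP_FAILED)
		return -1;
	if (prefault) /* best effort: older kernels reject this with EINVAL */
		(void)madvise(dst, len, MADV_POPULATE_WRITE);
	memcpy(dst, src, len);
	return munmap(dst, len);
}

/* recvmsg() path: socket data lands straight in the mapping, i.e. one copy
 * instead of recv() into a bounce buffer followed by write(). */
static int populate_with_recvmsg(int fd, int sock, size_t len)
{
	char *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (dst == MAP_FAILED)
		return -1;
	for (size_t done = 0; done < len;) {
		struct iovec iov = { .iov_base = dst + done, .iov_len = len - done };
		struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
		ssize_t n = recvmsg(sock, &msg, 0);

		if (n <= 0) {
			munmap(dst, len);
			return -1;
		}
		done += (size_t)n;
	}
	return munmap(dst, len);
}

/* Hypothetical helper: returns 1 if the first len bytes of fd equal src. */
static int file_matches(int fd, const char *src, size_t len)
{
	char *buf = malloc(len);
	int ok = buf && pread(fd, buf, len, 0) == (ssize_t)len &&
		 memcmp(buf, src, len) == 0;

	free(buf);
	return ok;
}
```

Timing each path with clock_gettime(CLOCK_MONOTONIC) over a multi-GiB file would give a rough local analogue of the numbers above. Results will depend on the kernel, page size and hardware.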

I'm going to work on the next version of the series, which will include the
MINOR trap and avoid the KVM dependency in mm by calling vm_ops->fault() in
UFFDIO_CONTINUE.

> Thanks,
>
> --
> Peter Xu
>