Message-ID: <08aeaf6e-dc89-413a-86a6-b9772c9b2faf@amazon.com>
Date: Wed, 20 Nov 2024 12:09:05 +0000
From: Nikita Kalyazin
Subject: Re: [RFC PATCH 0/4] KVM: ioctl for populating guest_memfd
CC: "David Hildenbrand", Sean Christopherson
References: <20241024095429.54052-1-kalyazin@amazon.com>
In-Reply-To: <20241024095429.54052-1-kalyazin@amazon.com>

On 24/10/2024 10:54, Nikita
Kalyazin wrote:
> [2] proposes an alternative to
> UserfaultFD for intercepting stage-2 faults, while this series
> conceptually complements it with the ability to populate guest memory
> backed by guest_memfd for `KVM_X86_SW_PROTECTED_VM` VMs.

+David
+Sean
+mm

While measuring memory population performance of guest_memfd using this
series, I noticed that guest_memfd population takes longer than my
baseline, which is filling anonymous private memory via UFFDIO_COPY.

I am using x86_64 for my measurements and a 3 GiB memory region:
- anon/private UFFDIO_COPY: 940 ms
- guest_memfd: 1371 ms (+46%)

It turns out that the effect is observable not only for guest_memfd, but
also for any type of shared memory, e.g. memfd or anonymous memory mapped
as shared. Below are measurements of a plain mmap(MAP_POPULATE) operation:

    mmap(NULL, 3ll * (1 << 30), PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

vs

    mmap(NULL, 3ll * (1 << 30), PROT_READ | PROT_WRITE,
         MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

Results:
- MAP_PRIVATE: 968 ms
- MAP_SHARED: 1646 ms

I am seeing this effect on a range of kernels. The oldest I used was
5.10, the newest is the current kvm-next (for-linus-2590-gd96c77bd4eeb).

When profiling with perf, I observe the following hottest operations
(kvm-next). The full call trees are attached at the end of the email.

MAP_PRIVATE:
- 19.72% clear_page_erms, rep stos %al,%es:(%rdi)

MAP_SHARED:
- 43.94% shmem_get_folio_gfp, lock orb $0x8,(%rdi), which is the atomic
  setting of the PG_uptodate bit
- 10.98% clear_page_erms, rep stos %al,%es:(%rdi)

Note that MAP_PRIVATE/do_anonymous_page calls __folio_mark_uptodate, which
sets the PG_uptodate bit non-atomically, while
MAP_SHARED/shmem_get_folio_gfp calls folio_mark_uptodate, which sets the
PG_uptodate bit atomically.

While this logic is intuitive, its performance effect is more significant
than I would expect.

The questions are:
- Is this a well-known behaviour?
- Is there a way to mitigate that, i.e. make shared memory (including
  guest_memfd) population faster/comparable to private memory?

Nikita

Appendix: full call trees obtained via perf

MAP_PRIVATE:

- 87.97% __mmap
     entry_SYSCALL_64_after_hwframe
     do_syscall_64
     vm_mmap_pgoff
     __mm_populate
     populate_vma_page_range
   - __get_user_pages
      - 77.94% handle_mm_fault
         - 76.90% __handle_mm_fault
            - 72.70% do_anonymous_page
               - 31.92% vma_alloc_folio_noprof
                  - 30.74% alloc_pages_mpol_noprof
                     - 29.60% __alloc_pages_noprof
                        - 28.40% get_page_from_freelist
                             19.72% clear_page_erms
                           - 3.00% __rmqueue_pcplist
                                __mod_zone_page_state
                             1.18% _raw_spin_trylock
               - 20.03% __pte_offset_map_lock
                  - 15.96% _raw_spin_lock
                       1.50% preempt_count_add
                  - 2.27% __pte_offset_map
                       __rcu_read_lock
               - 7.22% __folio_batch_add_and_move
                  - 4.68% folio_batch_move_lru
                     - 3.77% lru_add
                        + 0.95% __mod_zone_page_state
                          0.86% __mod_node_page_state
                       0.84% folios_put_refs
                    0.55% check_preemption_disabled
               - 2.85% folio_add_new_anon_rmap
                  - __folio_mod_stat
                       __mod_node_page_state
            - 1.15% pte_offset_map_nolock
                 __pte_offset_map
      - 7.59% follow_page_pte
         - 4.56% __pte_offset_map_lock
            - 2.27% _raw_spin_lock
                 preempt_count_add
              1.13% __pte_offset_map
           0.75% folio_mark_accessed

MAP_SHARED:

- 77.89% __mmap
     entry_SYSCALL_64_after_hwframe
     do_syscall_64
     vm_mmap_pgoff
     __mm_populate
     populate_vma_page_range
   - __get_user_pages
      - 72.11% handle_mm_fault
         - 71.67% __handle_mm_fault
            - 69.62% do_fault
               - 44.61% __do_fault
                  - shmem_fault
                     - 43.94% shmem_get_folio_gfp
                        - 17.20% shmem_alloc_and_add_folio.constprop.0
                           - 5.10% shmem_alloc_folio
                              - 4.58% folio_alloc_mpol_noprof
                                 - alloc_pages_mpol_noprof
                                    - 4.00% __alloc_pages_noprof
                                       - 3.31% get_page_from_freelist
                                            1.24% __rmqueue_pcplist
                           - 5.07% shmem_add_to_page_cache
                              - 1.44% __mod_node_page_state
                                   0.61% check_preemption_disabled
                                0.78% xas_store
                                0.74% xas_find_conflict
                                0.66% _raw_spin_lock_irq
                           - 3.96% __folio_batch_add_and_move
                              - 2.41% folio_batch_move_lru
                                   1.88% lru_add
                           - 1.56% shmem_inode_acct_blocks
                              - 1.24% __dquot_alloc_space
                                 - 0.77% inode_add_bytes
                                      _raw_spin_lock
                           - 0.77% shmem_recalc_inode
                                _raw_spin_lock
                          10.98% clear_page_erms
                        - 1.17% filemap_get_entry
                             0.78% xas_load
               - 20.26% filemap_map_pages
                  - 12.23% next_uptodate_folio
                     - 1.27% xas_find
                          xas_load
                  - 1.16% __pte_offset_map_lock
                       0.59% _raw_spin_lock
               - 3.48% finish_fault
                  - 1.28% set_pte_range
                       0.96% folio_add_file_rmap_ptes
                  - 0.91% __pte_offset_map_lock
                       0.54% _raw_spin_lock
                    0.57% pte_offset_map_nolock
      - 4.11% follow_page_pte
         - 2.36% __pte_offset_map_lock
            - 1.32% _raw_spin_lock
                 preempt_count_add
              0.54% __pte_offset_map
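
For anyone who wants to reproduce the mmap(MAP_POPULATE) comparison, a
minimal sketch of a timing helper follows. The helper name populate_ms and
the choice of clock_gettime(CLOCK_MONOTONIC) are assumptions made for
illustration and are not necessarily the harness behind the numbers above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Map `len` bytes of anonymous memory with MAP_POPULATE and return how long
 * the mmap() call took, in milliseconds, or -1.0 if the mapping failed.
 * `sharing` must be MAP_PRIVATE or MAP_SHARED.
 * populate_ms is a hypothetical helper name, not an existing API.
 */
double populate_ms(int sharing, size_t len)
{
    struct timespec start, end;
    void *p;

    assert(sharing == MAP_PRIVATE || sharing == MAP_SHARED);

    clock_gettime(CLOCK_MONOTONIC, &start);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             sharing | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (p == MAP_FAILED)
        return -1.0;

    /* Teardown happens after the second timestamp, so it is not timed. */
    munmap(p, len);

    return (double)(end.tv_sec - start.tv_sec) * 1e3 +
           (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Calling populate_ms(MAP_PRIVATE, 3ull << 30) and
populate_ms(MAP_SHARED, 3ull << 30) from a small main() and comparing the
two return values mirrors the measurement above. Absolute numbers will
depend on the machine and the kernel.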