Date: Wed, 02 Apr 2025 15:25:26 -0700
In-Reply-To: <20250328153133.3504118-4-tabba@google.com> (message from Fuad Tabba on Fri, 28 Mar 2025 15:31:29 +0000)
Subject: Re: [PATCH v7 3/7] KVM: guest_memfd: Track folio sharing within a struct kvm_gmem_private
From: Ackerley Tng
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
	anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com,
	michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
	jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
	jthoughton@google.com, peterx@redhat.com, pankaj.gupta@amd.com,
	tabba@google.com

Fuad Tabba writes:

> From: Ackerley Tng
>
> Track guest_memfd folio sharing
> state within the inode, since it is a
> property of the guest_memfd's memory contents.
>
> The guest_memfd PRIVATE memory attribute is not used for two reasons. It
> reflects the userspace expectation for the memory state, and therefore
> can be toggled by userspace. Also, although each guest_memfd file has a
> 1:1 binding with a KVM instance, the plan is to allow multiple files per
> inode, e.g. to allow intra-host migration to a new KVM instance, without
> destroying guest_memfd.
>
> Signed-off-by: Ackerley Tng
> Co-developed-by: Vishal Annapurve
> Signed-off-by: Vishal Annapurve
> Co-developed-by: Fuad Tabba
> Signed-off-by: Fuad Tabba
> ---
>  virt/kvm/guest_memfd.c | 58 ++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 53 insertions(+), 5 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index ac6b8853699d..cde16ed3b230 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -17,6 +17,18 @@ struct kvm_gmem {
>  	struct list_head entry;
>  };
>
> +struct kvm_gmem_inode_private {
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +	struct xarray shared_offsets;
> +	rwlock_t offsets_lock;

Either this lock doesn't work, or it can't be held while faulting:
holding this lock means we can't sleep, and the fault path needs to
sleep to allocate.
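To make the constraint concrete, here is a small userspace model of it.
This is only a sketch, not kernel code: every model_* name is made up for
illustration, the counter merely stands in for the preempt count, and
malloc() stands in for an allocation that may sleep. It shows why
allocating with a spinning lock held trips a might_sleep()-style check,
and that allocating before taking the lock does not. Whether such a
reordering (or a switch to a sleeping lock) is acceptable for guest_memfd
is a separate question that this sketch does not answer.

```c
/* Userspace model only: none of the model_* names are kernel API. */
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the per-task preempt count. */
static int model_preempt_count;
/* Number of sleeping calls made while in (modelled) atomic context. */
static int model_atomic_sleep_bugs;

/* Models taking a spinning reader lock: preemption gets disabled. */
static void model_read_lock(void)
{
	model_preempt_count++;
}

static void model_read_unlock(void)
{
	model_preempt_count--;
}

/* Models a might_sleep()-style check: count calls made while atomic. */
static void model_might_sleep(void)
{
	if (model_preempt_count != 0)
		model_atomic_sleep_bugs++;
}

/* Models an allocation that is allowed to sleep. */
static void *model_sleeping_alloc(size_t size)
{
	model_might_sleep();
	return malloc(size);
}

/* Pattern from the report: allocate while the spinning lock is held. */
static bool model_fault_alloc_under_lock(void)
{
	void *folio;

	model_read_lock();
	folio = model_sleeping_alloc(4096);
	model_read_unlock();

	free(folio);
	return folio != NULL;
}

/* One possible reordering: allocate first, then take the lock briefly. */
static bool model_fault_alloc_before_lock(void)
{
	void *folio = model_sleeping_alloc(4096);

	if (!folio)
		return false;

	model_read_lock();
	/* Only non-sleeping work (e.g. a state lookup) would go here. */
	model_read_unlock();

	free(folio);
	return true;
}
```

Calling model_fault_alloc_under_lock() bumps model_atomic_sleep_bugs,
while model_fault_alloc_before_lock() leaves it untouched.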

One of these config options must have helped throw a BUG:

CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RAW_LOCK_NESTING=y
CONFIG_PROVE_RCU=y
CONFIG_RCU_CPU_STALL_CPUTIME=y
CONFIG_TRACE_IRQFLAGS_NMI=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_UNINLINE_SPIN_UNLOCK=y

with ./guest_memfd_test

[ 161.255012] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[ 161.257350] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 254, name: guest_memfd_tes
[ 161.259662] preempt_count: 1, expected: 0
[ 161.260884] RCU nest depth: 0, expected: 0
[ 161.262119] 3 locks held by guest_memfd_tes/254:
[ 161.263470] #0: ffff8883064c3c80 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x29/0x140
[ 161.265932] #1: ffff88830dedbc10 (mapping.invalidate_lock#4){++++}-{4:4}, at: kvm_gmem_fault+0x3d/0x1f0
[ 161.268507] #2: ffff88830d510d30 (&private->offsets_lock){.+.+}-{3:3}, at: kvm_gmem_fault+0x45/0x1f0
[ 161.270992] CPU: 2 UID: 0 PID: 254 Comm: guest_memfd_tes Tainted: G W 6.14.0-rc7-00016-g174a15c15f96 #1
[ 161.270995] Tainted: [W]=WARN
[ 161.270996] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 161.270997] Call Trace:
[ 161.270998]
[ 161.271000] dump_stack_lvl+0xa7/0x100
[ 161.271005] __might_resched+0x261/0x280
[ 161.271009] prepare_alloc_pages+0xe5/0x1e0
[ 161.271013] __alloc_frozen_pages_noprof+0xbc/0x2a0
[ 161.271019] alloc_pages_mpol+0x111/0x1d0
[ 161.271025] alloc_pages_noprof+0x7e/0x120
[ 161.271028] folio_alloc_noprof+0x14/0x30
[ 161.271030] __filemap_get_folio+0x189/0x380
[ 161.271036] kvm_gmem_fault+0x5e/0x1f0
[ 161.271041] __do_fault+0x42/0xc0
[ 161.271045] handle_mm_fault+0xf37/0x1c90
[ 161.271047] ? handle_mm_fault+0x3c/0x1c90
[ 161.271053] ? mt_find+0x208/0x2a0
[ 161.271088] do_user_addr_fault+0x3c0/0x740
[ 161.271095] exc_page_fault+0x69/0x110
[ 161.271099] asm_exc_page_fault+0x26/0x30
[ 161.271102] RIP: 0033:0x419fb0
[ 161.271104] Code: 48 8d 3c 17 48 89 c1 48 85 d2 74 2e 48 89 fa 48 29 c2 83 e2 01 74 13 48 8d 48 01 40 88 71 ff 48 39 cf 74 17 66 0f 1f 44 00 00 <44> 88 01 48 83 c1 02 44 88 41 ff 48 39 cf
[ 161.271105] RSP: 002b:00007fffd695b568 EFLAGS: 00010246
[ 161.271107] RAX: 00007f4f395a0000 RBX: 00007fffd695b590 RCX: 00007f4f395a0000
[ 161.271108] RDX: 0000000000000000 RSI: 00000000ffffffaa RDI: 00007f4f395a4000
[ 161.271109] RBP: 0000000000000005 R08: 00000000ffffffaa R09: 0000000000000000
[ 161.271109] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000001000
[ 161.271110] R13: 000000002e03ab40 R14: 00007f4f395a0000 R15: 0000000000004000
[ 161.271119]

> +#endif
> +};
> +
> +static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
> +{
> +	return inode->i_mapping->i_private_data;
> +}
> +
>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>  void kvm_gmem_handle_folio_put(struct folio *folio)
>  {
> @@ -324,8 +336,28 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
>
> +static void kvm_gmem_evict_inode(struct inode *inode)
> +{
> +	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
> +
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +	/*
> +	 * .evict_inode can be called before private data is set up if there are
> +	 * issues during inode creation.
> +	 */
> +	if (private)
> +		xa_destroy(&private->shared_offsets);
> +#endif
> +
> +	truncate_inode_pages_final(inode->i_mapping);
> +
> +	kfree(private);
> +	clear_inode(inode);
> +}
> +
>  static const struct super_operations kvm_gmem_super_operations = {
> -	.statfs = simple_statfs,
> +	.statfs		= simple_statfs,
> +	.evict_inode	= kvm_gmem_evict_inode,
>  };
>
>  static int kvm_gmem_init_fs_context(struct fs_context *fc)
> @@ -553,6 +585,7 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
>  						      loff_t size, u64 flags)
>  {
>  	const struct qstr qname = QSTR_INIT(name, strlen(name));
> +	struct kvm_gmem_inode_private *private;
>  	struct inode *inode;
>  	int err;
>
> @@ -561,10 +594,20 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
>  		return inode;
>
>  	err = security_inode_init_security_anon(inode, &qname, NULL);
> -	if (err) {
> -		iput(inode);
> -		return ERR_PTR(err);
> -	}
> +	if (err)
> +		goto out;
> +
> +	err = -ENOMEM;
> +	private = kzalloc(sizeof(*private), GFP_KERNEL);
> +	if (!private)
> +		goto out;
> +
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +	xa_init(&private->shared_offsets);
> +	rwlock_init(&private->offsets_lock);
> +#endif
> +
> +	inode->i_mapping->i_private_data = private;
>
>  	inode->i_private = (void *)(unsigned long)flags;
>  	inode->i_op = &kvm_gmem_iops;
> @@ -577,6 +620,11 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
>  	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>
>  	return inode;
> +
> +out:
> +	iput(inode);
> +
> +	return ERR_PTR(err);
>  }
>
>  static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,