From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F41E7C5AD49 for ; Tue, 3 Jun 2025 01:05:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 76F086B037D; Mon, 2 Jun 2025 21:05:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 71FC46B037E; Mon, 2 Jun 2025 21:05:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5C1846B037F; Mon, 2 Jun 2025 21:05:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 384386B037D for ; Mon, 2 Jun 2025 21:05:49 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id A4C09BF363 for ; Tue, 3 Jun 2025 01:05:48 +0000 (UTC) X-FDA: 83512297176.26.C127A96 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) by imf09.hostedemail.com (Postfix) with ESMTP id B0C30140006 for ; Tue, 3 Jun 2025 01:05:46 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=07Kk15BJ; spf=pass (imf09.hostedemail.com: domain of vannapurve@google.com designates 209.85.214.170 as permitted sender) smtp.mailfrom=vannapurve@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748912746; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3zA8i3SvMx3WliW0WUY0TLhornaCo9nNs4r4P4DLJp8=; b=0mNkazz7CJzb8dxx9BvZpovNWnoFAVDQ5MMf+mPOxuNXGblHPJa9OWswZ8Thx4PJE4+o+b 2KoclkVDAfGN42c8+sxVoH7lCRIB52J6PKu7Ug4Cw+OPYpk8/3G1vNJhXbFRFBPkSsDgdC 7mtx/98e8ZoJdm/GOcCowdplmQ13e54= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=07Kk15BJ; spf=pass (imf09.hostedemail.com: domain of vannapurve@google.com designates 209.85.214.170 as permitted sender) smtp.mailfrom=vannapurve@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748912746; a=rsa-sha256; cv=none; b=BnYN1tjA4fNagQUEi6lrBIMBGTYibEX4MlHDY/fInRCmd3JUvbbaMn05NbQ/4anwnasKyI Cc1uKZryWTBoY4iD8KKsCjKsChh+dKobgSr2T2Wv5lUV99K4esGxEaQDLv4XNzeedSyKTg oD3LoIbun1idXP/RgffhTZRH8uvv7dg= Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-2348ac8e0b4so63645ad.1 for ; Mon, 02 Jun 2025 18:05:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1748912745; x=1749517545; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=3zA8i3SvMx3WliW0WUY0TLhornaCo9nNs4r4P4DLJp8=; b=07Kk15BJub/gRL+fCuLy+AKQAtWHAwkc9+VrkwaLW4Ey1RXgYjxTSZuKCg9j4BL56R qi2ev26+b8pQfVbZLbvJgf91NodFltfhmZdV8/4BNreYhcvn3WgF3eVd3ZWR+1hhIyM6 N+PIriUffZmReFsMIcg/iYokVHUoD6+fxhhygJt1NvgXWMjcxLsdw+2nmtovYWZhmwi0 3uZTP73WgCnC8Q9hauvFS2ESYSF6ENPJHDXMknhl/ZB+nUl3rVAiVuEzS2nPo/v7hbls u5pTMEQvqlrBQu/b+SvjC42M60PPocF2o3FP5RTN3S0umqk7mDL8e6/8jnev+X5j5sva scgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748912745; x=1749517545; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3zA8i3SvMx3WliW0WUY0TLhornaCo9nNs4r4P4DLJp8=; b=Pqdn3my93+TwBUY8xTbpzYHShffRkqnDOErJVXXepOckAqhsjpSTiDygT2MUrGm5rR EaCyJDucvYEVTUaMx+LGXsiCg8ygFhxN4m680bvhQnLUPHNZebl2dIrDc4n03AjegIiY +x4uGsKOvIUfezZkfavoV/g+FvpFZafkXJ/tM+Cn7ttD+IDaBjFbTzu0FsJmI3ey9G8U Y+M8b6fBXi4g25RsTldHqo+Ew6qONQM1EbCSHzxTv6WpvEUI1k5rO7lACP6SPPKKbgS0 tQ5yll3jky5xAQt5w7Mdrd75RKWjPUKin7XcPMmc6jJorCaTJCOOY9AlPYCP+QTOoq2V RnWQ== X-Forwarded-Encrypted: i=1; AJvYcCVX9fhCFzWZBuCN5t8q5F7ioEVtPSshuiNELyoyX9TXl1BBnmBdLjxEFcULH2jD5PmQaMPHak8kdA==@kvack.org X-Gm-Message-State: AOJu0Yz9q4Oec5jPSyW62/2/S2aT1I6+dX5b7YDSVkezr9DKm+EGBbnX 6YwDAlBvv1qurin92sQ+TWbB7Pg4xQWQiuW/119jodsjbnndyKYlBbWu0R8LdobwGi0PZAl/3cw TGWliGVbXcnREJLXU6VG6F2pFSXSx9Wy+GVc/IXB3 X-Gm-Gg: ASbGncsR9sxGEhprEknTVqmIanRqRYAIjTMFK8TpgjQhGXDYh7R76iNHiiV9XFwMhV0 a6TmoU6Talh+jKN7ey6lKh+pPZneoP5a5ts48+bDtqeK7v41BwzX2AENJTpndKgWug8xf/O59Nh EfOS4+zFmA8k8+5JXay0Yse1iyzpXGCK5S/2byJiRCz67qE/8YFEwiPrc8C/3ul7lc+a/u2fgYZ Q== X-Google-Smtp-Source: AGHT+IF8Jp/U2+xc+07146dXlpTbASvoteEnNJEFuewV+ucbySXUxWAeNEdlDnnAVbVMzXVfjIYziZx78rbAmXvzJ98= X-Received: by 2002:a17:902:ec90:b0:234:c2e7:a0e7 with SMTP id d9443c01a7336-235c83a1796mr1458355ad.4.1748912745102; Mon, 02 Jun 2025 18:05:45 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Vishal Annapurve Date: Mon, 2 Jun 2025 18:05:32 -0700 X-Gm-Features: AX0GCFtnJZatjpGKUUpgyUTxkmXtEoTmNaLkinCtwBCaHY9YrfWtQsfvQRQkcEw Message-ID: Subject: Re: [PATCH 3/5] KVM: gmem: Hold filemap invalidate lock while allocating/preparing folios To: Yan Zhao Cc: Ackerley Tng , michael.roth@amd.com, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, jroedel@suse.de, thomas.lendacky@amd.com, pbonzini@redhat.com, seanjc@google.com, vbabka@suse.cz, amit.shah@amd.com, pratikrajesh.sampat@amd.com, ashish.kalra@amd.com, liam.merwick@oracle.com, david@redhat.com, quic_eberman@quicinc.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Stat-Signature: z969nb84oszwsbfj17fmsosfw5d5mxkp X-Rspamd-Queue-Id: B0C30140006 X-Rspamd-Server: rspam11 X-HE-Tag: 1748912746-870293 X-HE-Meta: U2FsdGVkX19IBtqy5kiJD/KO+H4t5Gd5q4qwWDtUYntccAYBO7FgwwlbLmusY/BAT62mL2/eoZ0LerYowwg1oPg7DLJf7IRQ9oJFr4LOywGSwXAlaGTegklLpR6g02f4JOxYX0ZDxELpypL/8/Or6LWoYLlTXzZYttddanNFTdDsqUgg2XNPC9gWHpkKR/SkfZdohHurCrFDmJlZ4z8Stsr9PuFc1z2QSylJ3Mrri7TbALmMm2xW1+0qYvm3FrxMHMu0urpzVY7B4IDEnS8jok//0UiqDiB3T4HAL0LQjuPdQp7gJDMc+nqw3Ty+B5L063EUsEbIeOeMkjHlT2lfSV1k8CsPF7SyCEVNqThQGGE5jkQKVIqd766/fbkIHtMPbXbrqjooBTmqw2WaBm2tApnlJr5MNsNGK8iJAehA1aZE6YhYEFvT+e0xRyMwJr4/LWfuIU2N315AB9uyAswOKQvFkKBTuHlj+6f3rhq2/4jiHyFWegB3OqLcaTK5jzeJ9jOvNZVdm1swjBBWejdgF+XIeMJZwMS66zXJXLK4cnFOmNHjAdD5BEIKNgp98zKAjI3Ao5AJ022Lhs6h5qMkkX8oV9NlD4zl6QKcVDP2NfXm6uS6URTBd+B0gbMt83PWBgYe3Ubx+nqpMHCvoF1nyiC3cLLWwIm2LFnv59cMd0vlzQLwlVny6z+hD3D+JxcOp66oJUBWXgE5qgJcaTsVEEYGaZGbpfWN0W8dTkVrBcUmRvKCEFj/I1wpdRlupVKOsYsjDOzUhimvdFVR+3OXH1lWo19NQc0MVTam5MfrlhccaiuZUAjexlddzAp8K+YVuKtgLWF95UbTVVWrnZAqbv5tS03sU2yxSW3w8RhDWnV5LfuLm71adhKW4BkScuZVjOBdf8KC9Xc93TTS/Wf4B+7F9bo6iwuZjLFAFdoxoJSm13nlJpK4/MwWEDXW7YoMK6qgyM7kBR7+BSuZbqJ Dlh2QtBk UpG+etOOnNhp/TLsrHaKLT5CjUNCeja59gEuDp/0h3wUpdMCkPJyAFCKV7xGv/GlQXMqTjceqVzDUDeLm+wc1IetL2mClgsCLj73G2rtcePt3B1RY2s0luBM25vqP2pWszF561ArLkVW/SHexNmRU8PqQEYrrwEEG87jYridsNNiMyQ+4K/a3QARFQuyleqcKvaiPa1wT12NZPmliQ1/8g7hUnqBTLB2JEe/5EGzcS5J2PNavtKhkdQoIChXc0mvcntcqPEjIdKanqnM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, May 20, 2025 at 11:49=E2=80=AFPM Yan Zhao wr= ote: > > On Mon, May 19, 2025 at 10:04:45AM -0700, Ackerley Tng wrote: > > Ackerley Tng writes: > > > > > Yan Zhao writes: > > > > > >> On Fri, Mar 14, 2025 at 05:20:21PM +0800, Yan Zhao wrote: > > >>> This patch would cause host deadlock when booting up a TDX VM even = if huge page > > >>> is turned off. I currently reverted this patch. No further debug ye= t. > > >> This is because kvm_gmem_populate() takes filemap invalidation lock,= and for > > >> TDX, kvm_gmem_populate() further invokes kvm_gmem_get_pfn(), causing= deadlock. > > >> > > >> kvm_gmem_populate > > >> filemap_invalidate_lock > > >> post_populate > > >> tdx_gmem_post_populate > > >> kvm_tdp_map_page > > >> kvm_mmu_do_page_fault > > >> kvm_tdp_page_fault > > >> kvm_tdp_mmu_page_fault > > >> kvm_mmu_faultin_pfn > > >> __kvm_mmu_faultin_pfn > > >> kvm_mmu_faultin_pfn_private > > >> kvm_gmem_get_pfn > > >> filemap_invalidate_lock_shared > > >> > > >> Though, kvm_gmem_populate() is able to take shared filemap invalidat= ion lock, > > >> (then no deadlock), lockdep would still warn "Possible unsafe lockin= g scenario: > > >> ...DEADLOCK" due to the recursive shared lock, since commit e9181886= 11f0 > > >> ("locking: More accurate annotations for read_lock()"). > > >> > > > > > > Thank you for investigating. This should be fixed in the next revisio= n. > > > > > > > This was not fixed in v2 [1], I misunderstood this locking issue. > > > > IIUC kvm_gmem_populate() gets a pfn via __kvm_gmem_get_pfn(), then call= s > > part of the KVM fault handler to map the pfn into secure EPTs, then > > calls the TDX module for the copy+encrypt. > > > > Regarding this lock, seems like KVM'S MMU lock is already held while TD= X > > does the copy+encrypt. Why must the filemap_invalidate_lock() also be > > held throughout the process? > If kvm_gmem_populate() does not hold filemap invalidate lock around all > requested pages, what value should it return after kvm_gmem_punch_hole() = zaps a > mapping it just successfully installed? > > TDX currently only holds the read kvm->mmu_lock in tdx_gmem_post_populate= () when > CONFIG_KVM_PROVE_MMU is enabled, due to both slots_lock and the filemap > invalidate lock being taken in kvm_gmem_populate(). Does TDX need kvm_gmem_populate path just to ensure SEPT ranges are not zapped during tdh_mem_page_add and tdh_mr_extend operations? Would holding KVM MMU read lock during these operations sufficient to avoid having to do this back and forth between TDX and gmem layers? > > Looks sev_gmem_post_populate() does not take kvm->mmu_lock either. > > I think kvm_gmem_populate() needs to hold the filemap invalidate lock at = least > around each __kvm_gmem_get_pfn(), post_populate() and kvm_gmem_mark_prepa= red(). > > > If we don't have to hold the filemap_invalidate_lock() throughout, > > > > 1. Would it be possible to call kvm_gmem_get_pfn() to get the pfn > > instead of calling __kvm_gmem_get_pfn() and managing the lock in a > > loop? > > > > 2. Would it be possible to trigger the kvm fault path from > > kvm_gmem_populate() so that we don't rebuild the get_pfn+mapping > > logic and reuse the entire faulting code? That way the > > filemap_invalidate_lock() will only be held while getting a pfn. > The kvm fault path is invoked in TDX's post_populate() callback. > I don't find a good way to move it to kvm_gmem_populate(). > > > [1] https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google= .com/T/ > > > > >>> > @@ -819,12 +827,16 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struc= t kvm_memory_slot *slot, > > >>> > pgoff_t index =3D kvm_gmem_get_index(slot, gfn); > > >>> > struct file *file =3D kvm_gmem_get_file(slot); > > >>> > int max_order_local; > > >>> > + struct address_space *mapping; > > >>> > struct folio *folio; > > >>> > int r =3D 0; > > >>> > > > >>> > if (!file) > > >>> > return -EFAULT; > > >>> > > > >>> > + mapping =3D file->f_inode->i_mapping; > > >>> > + filemap_invalidate_lock_shared(mapping); > > >>> > + > > >>> > /* > > >>> > * The caller might pass a NULL 'max_order', but internal= ly this > > >>> > * function needs to be aware of any order limitations se= t by > > >>> > @@ -838,6 +850,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct = kvm_memory_slot *slot, > > >>> > folio =3D __kvm_gmem_get_pfn(file, slot, index, pfn, &max= _order_local); > > >>> > if (IS_ERR(folio)) { > > >>> > r =3D PTR_ERR(folio); > > >>> > + filemap_invalidate_unlock_shared(mapping); > > >>> > goto out; > > >>> > } > > >>> > > > >>> > @@ -845,6 +858,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct = kvm_memory_slot *slot, > > >>> > r =3D kvm_gmem_prepare_folio(kvm, file, slot, gfn= , folio, max_order_local); > > >>> > > > >>> > folio_unlock(folio); > > >>> > + filemap_invalidate_unlock_shared(mapping); > > >>> > > > >>> > if (!r) > > >>> > *page =3D folio_file_page(folio, index); > > >>> > -- > > >>> > 2.25.1 > > >>> > > > >>> > > >