From: Fuad Tabba
Date: Thu, 29 Feb 2024 19:01:51 +0000
Subject: Re: folio_mmapped
To: David Hildenbrand
Cc: Quentin Perret, Matthew Wilcox, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 brauner@kernel.org, akpm@linux-foundation.org, xiaoyao.li@intel.com,
 yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org,
 amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
 catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
 oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
 keirf@google.com, linux-mm@kvack.org
References: <20240222161047.402609-1-tabba@google.com>
 <20240222141602976-0800.eberman@hu-eberman-lv.qualcomm.com>
 <40a8fb34-868f-4e19-9f98-7516948fc740@redhat.com>
 <20240226105258596-0800.eberman@hu-eberman-lv.qualcomm.com>
 <925f8f5d-c356-4c20-a6a5-dd7efde5ee86@redhat.com>
 <755911e5-8d4a-4e24-89c7-a087a26ec5f6@redhat.com>
 <99a94a42-2781-4d48-8b8c-004e95db6bb5@redhat.com>

Hi David,

...

> >>>> "mmap() the whole thing once and only access what you are supposed to
> >>>> access" sounds reasonable to me. If you don't play by the rules, you get a
> >>>> signal.
> >>>
> >>> "... you get a signal, or maybe you don't". But yes I understand your
> >>> point, and as per the above there are real benefits to this approach so
> >>> why not.
> >>>
> >>> What do we expect userspace to do when a page goes from shared back to
> >>> being guest-private, because e.g. the guest decides to unshare? Use
> >>> munmap() on that page? Or perhaps an madvise() call of some sort? Note
> >>> that this will be needed when starting a guest as well, as userspace
> >>> needs to copy the guest payload in the guestmem file prior to starting
> >>> the protected VM.
> >>
> >> Let's assume we have the whole guest_memfd mapped exactly once in our
> >> process, a single VMA.
> >>
> >> When setting up the VM, we'll write the payload and then fire up the VM.
> >>
> >> That will (I assume) trigger some shared -> private conversion.
> >>
> >> When we want to convert shared -> private in the kernel, we would first
> >> check if the page is currently mapped. If it is, we could try unmapping that
> >> page using an rmap walk.
> >
> > I had not considered that. That would most certainly be slow, but a well
> > behaved userspace process shouldn't hit it so, that's probably not a
> > problem...
>
> If there really only is a single VMA that covers the page (or even mmaps
> the guest_memfd), it should not be too bad. For example, any
> fallocate(PUNCHHOLE) has to do the same, to unmap the page before
> discarding it from the pagecache.

I don't think that we can assume that only a single VMA covers a page.

> But of course, no rmap walk is always better.

We've been thinking some more about how to handle the case where the
host userspace has a mapping of a page that later becomes private.

One idea is to refuse to run the guest (i.e., exit vcpu_run() back to
the host with a meaningful exit reason) until the host unmaps that page,
and to check the refcount of the page as you mentioned earlier. This is
essentially what the RFC I sent does (minus the bugs :) ).

The other idea is to use the rmap walk as you suggested to zap that
page. If the host tries to access that page again, it would get a SIGBUS
on the fault. This has the advantage that, as you'd mentioned, the host
doesn't need to constantly mmap() and munmap() pages. It could
potentially be optimised further as suggested if we have a cooperating
VMM that would issue a MADV_DONTNEED or something like that, but that's
just an optimisation and we would still need to have the option of the
rmap walk. However, I was wondering how practical this idea would be if
more than a single VMA covers a page?

Also, there's the question of what to do if the page is gupped? In this
case I think the only thing we can do is refuse to run the guest until
the gup (and all references) are released, which also brings us back to
the way things (kind of) are...

Thanks,
/fuad
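
For illustration only, the two options discussed above can be modelled
as a toy state machine. This is not kernel code and not what the RFC
implements: every name below (Page, host_mappings, gup_pins,
convert_refuse, convert_zap, host_access, SigBus) is made up for the
sketch, and the counters are simplified stand-ins for the real mapcount
and refcount handling.

```python
from dataclasses import dataclass


class SigBus(Exception):
    """Stands in for the SIGBUS the host would get on a faulting access."""


@dataclass
class Page:
    private: bool = False   # True once the page belongs to the guest
    host_mappings: int = 0  # hypothetical count of host userspace mappings
    gup_pins: int = 0       # hypothetical count of outstanding gup references


def convert_refuse(page: Page) -> bool:
    """Option 1: refuse to run the guest until the host lets go of the page."""
    if page.host_mappings or page.gup_pins:
        return False  # the vcpu run would exit back to the host here
    page.private = True
    return True


def convert_zap(page: Page) -> bool:
    """Option 2: zap host mappings (the rmap walk), but still refuse on gup."""
    if page.gup_pins:
        return False  # pins cannot be revoked, so fall back to refusing
    page.host_mappings = 0  # however many VMAs mapped it, all are zapped
    page.private = True
    return True


def host_access(page: Page) -> None:
    """A host access to a guest-private page faults and raises SigBus."""
    if page.private:
        raise SigBus("host touched a guest-private page")
    # Faulting in a shared page establishes (at least) one host mapping.
    page.host_mappings = max(page.host_mappings, 1)


if __name__ == "__main__":
    page = Page(host_mappings=2)      # mapped through two VMAs
    print(convert_refuse(page))       # option 1: guest cannot run yet
    print(convert_zap(page))          # option 2: mappings zapped, page private
```

In this model the gup case behaves the same under both options, which
matches the point above that a gupped page brings us back to refusing
to run the guest.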