Date: Wed, 3 Apr 2024 17:15:19 -0700
From: Sean Christopherson
To: Will Deacon
Cc: David Hildenbrand, Vishal Annapurve, Quentin Perret, Matthew Wilcox,
    Fuad Tabba, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
    pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
    anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk, brauner@kernel.org,
    akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
    chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
    dmatlack@google.com, yu.c.zhang@linux.intel.com,
    isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
    ackerleytng@google.com, mail@maciej.szmigiero.name,
    michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
    isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
    suzuki.poulose@arm.com, steven.price@arm.com,
    quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
    quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
    quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
    catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
    oliver.upton@linux.dev, maz@kernel.org, keirf@google.com,
    linux-mm@kvack.org
Subject: Re: folio_mmapped
In-Reply-To: <20240327193454.GB11880@willie-the-truck>
References: <7470390a-5a97-475d-aaad-0f6dfb3d26ea@redhat.com>
    <40f82a61-39b0-4dda-ac32-a7b5da2a31e8@redhat.com>
    <20240319143119.GA2736@willie-the-truck>
    <2d6fc3c0-a55b-4316-90b8-deabb065d007@redhat.com>
    <20240327193454.GB11880@willie-the-truck>

On Wed, Mar 27, 2024, Will Deacon wrote:
> Hi again, David,
>
> On Fri, Mar 22, 2024 at 06:52:14PM +0100, David Hildenbrand wrote:
> > On 19.03.24 15:31, Will Deacon wrote:
> > sorry for the late reply!
>
> Bah, you and me both!

Hold my beer ;-)

> > > On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > > > On 19.03.24 01:10, Sean Christopherson wrote:
> > > > > On Mon, Mar 18, 2024, Vishal Annapurve wrote:
> > > > > > On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand wrote:
> > > From the pKVM side, we're working on guest_memfd primarily to avoid
> > > diverging from what other CoCo solutions end up using, but if it gets
> > > de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> > > today with anonymous memory, then it's a really hard sell to switch over
> > > from what we have in production.
> > > We're also hoping that, over time,
> > > guest_memfd will become more closely integrated with the mm subsystem to
> > > enable things like hypervisor-assisted page migration, which we would
> > > love to have.
> >
> > Reading Sean's reply, he has a different view on that. And I think that's
> > the main issue: there are too many different use cases and too many
> > different requirements that could turn guest_memfd into something that maybe
> > it really shouldn't be.
>
> No argument there, and we're certainly not tied to any specific
> mechanism on the pKVM side. Maybe Sean can chime in, but we've
> definitely spoken about migration being a goal in the past, so I guess
> something changed since then on the guest_memfd side.

What's "hypervisor-assisted page migration"? More specifically, what's the
mechanism that drives it?

I am not opposed to page migration itself; what I am opposed to is adding deep
integration with core MM to do some of the fancy/complex things that lead to
page migration.

Another thing I want to avoid is taking a hard dependency on "struct page", so
that we can have line of sight to eliminating "struct page" overhead for
guest_memfd, but that's definitely a more distant future concern.

> > This makes sense: shared memory is neither nasty nor special. You can
> > migrate it, swap it out, map it into page tables, GUP it, ... without any
> > issues.
>
> Slight aside and not wanting to derail the discussion, but we have a few
> different types of sharing which we'll have to consider:
>
> * Memory shared from the host to the guest. This remains owned by the
>   host and the normal mm stuff can be made to work with it.

This seems like it should be !guest_memfd, i.e. can't be converted to guest
private (without first unmapping it from the host, but at that point it's
completely different memory, for all intents and purposes).

> * Memory shared from the guest to the host.
>   This remains owned by the
>   guest, so there's a pin on the pages and the normal mm stuff can't
>   work without co-operation from the guest (see next point).

Do you happen to have a list of exactly what you mean by "normal mm stuff"? I
am not at all opposed to supporting .mmap(), because long term I also want to
use guest_memfd for non-CoCo VMs. But I want to be very conservative with
respect to what is allowed for guest_memfd. E.g. host userspace can map
guest_memfd, and do operations that are directly related to its mapping, but
that's about it.

> * Memory relinquished from the guest to the host. This actually unmaps
>   the pages from the host and transfers ownership back to the host,
>   after which the pin is dropped and the normal mm stuff can work. We
>   use this to implement ballooning.
>
> I suppose the main thing is that the architecture backend can deal with
> these states, so the core code shouldn't really care as long as it's
> aware that shared memory may be pinned.
>
> > So if I would describe some key characteristics of guest_memfd as of today,
> > it would probably be:
> >
> > 1) Memory is unmovable and unswappable. Right from the beginning, it is
> >    allocated as unmovable (e.g., not placed on ZONE_MOVABLE, CMA, ...).
> > 2) Memory is inaccessible. It cannot be read from user space, the
> >    kernel, it cannot be GUP'ed ... only some mechanisms might end up
> >    touching that memory (e.g., hibernation, /proc/kcore) might end up
> >    touching it "by accident", and we usually can handle these cases.
> > 3) Memory can be discarded in page granularity. There should be no cases
> >    where you cannot discard memory to over-allocate memory for private
> >    pages that have been replaced by shared pages otherwise.
> > 4) Page tables are not required (well, it's an memfd), and the fd could
> >    in theory be passed to other processes.

More broadly, no VMAs are required.
The lack of stage-1 page tables is nice to have; the lack of VMAs means that
guest_memfd isn't playing second fiddle, e.g. it's not subject to VMA
protections, isn't restricted to host mapping size, etc.

> > Having "ordinary shared" memory in there implies that 1) and 2) will have to
> > be adjusted for them, which kind-of turns it "partially" into ordinary shmem
> > again.
>
> Yes, and we'd also need a way to establish hugepages (where possible)
> even for the *private* memory so as to reduce the depth of the guest's
> stage-2 walk.

Yeah, hugepage support for guest_memfd is very much a WIP. Getting _something_
is easy; getting the right thing is much harder.
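
To put a rough number on "reduce the depth of the guest's stage-2 walk", here
is a back-of-the-envelope sketch. It assumes a 4 KiB granule, 9 bits of index
per level and a 48-bit input address (i.e. a four-level walk); real stage-2
configurations can differ (smaller IPA sizes, concatenated tables, other
granules). The function name and constants are made up for illustration and
don't correspond to anything in KVM.

```python
PAGE_SHIFT = 12      # assumed 4 KiB base pages
BITS_PER_LEVEL = 9   # 512 eight-byte descriptors per 4 KiB table
INPUT_BITS = 48      # assumed input address size, giving a 4-level walk


def walk_depth(mapping_size: int) -> int:
    """Table lookups needed to reach a leaf entry mapping `mapping_size` bytes."""
    total_levels = (INPUT_BITS - PAGE_SHIFT) // BITS_PER_LEVEL
    shift = mapping_size.bit_length() - 1
    if mapping_size <= 0 or mapping_size != 1 << shift:
        raise ValueError("mapping size must be a power of two")
    skipped, rem = divmod(shift - PAGE_SHIFT, BITS_PER_LEVEL)
    # Only 4 KiB pages and 2 MiB / 1 GiB blocks are modelled here.
    if rem or not 0 <= skipped <= 2:
        raise ValueError("unsupported mapping size for this sketch")
    return total_levels - skipped


for size in (4 << 10, 2 << 20, 1 << 30):
    print(f"{size:>10} bytes -> {walk_depth(size)} lookups")
```

Under those assumptions a 2 MiB mapping saves one lookup per walk and a 1 GiB
mapping saves two, compared with 4 KiB pages.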