Date: Mon, 4 Mar 2024 12:53:36 +0000
From: Quentin Perret
To: David Hildenbrand
Cc: Fuad Tabba, Matthew Wilcox, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
	palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
	brauner@kernel.org, akpm@linux-foundation.org, xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org,
	amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com,
	isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	vannapurve@google.com, ackerleytng@google.com,
	mail@maciej.szmigiero.name, michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com, quic_mnalajal@quicinc.com,
	quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
	quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
	will@kernel.org, keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
References: <925f8f5d-c356-4c20-a6a5-dd7efde5ee86@redhat.com>
	<755911e5-8d4a-4e24-89c7-a087a26ec5f6@redhat.com>
	<99a94a42-2781-4d48-8b8c-004e95db6bb5@redhat.com>
	<20240229114526893-0800.eberman@hu-eberman-lv.qualcomm.com>

On Friday 01 Mar 2024 at 12:16:54 (+0100), David Hildenbrand wrote:
> > > I don't think that we can assume that only a single VMA covers a page.
> > >
> > > > But of course, no rmap walk is always better.
> > >
> > > We've been thinking some more about how to handle the case where the
> > > host userspace has a mapping of a page that later becomes private.
> > >
> > > One idea is to refuse to run the guest (i.e., exit vcpu_run() to back
> > > to the host with a meaningful exit reason) until the host unmaps that
> > > page, and check for the refcount to the page as you mentioned earlier.
> > > This is essentially what the RFC I sent does (minus the bugs :) ) .
> > >
> > > The other idea is to use the rmap walk as you suggested to zap that
> > > page. If the host tries to access that page again, it would get a
> > > SIGBUS on the fault. This has the advantage that, as you'd mentioned,
> > > the host doesn't need to constantly mmap() and munmap() pages. It
> > > could potentially be optimised further as suggested if we have a
> > > cooperating VMM that would issue a MADV_DONTNEED or something like
> > > that, but that's just an optimisation and we would still need to have
> > > the option of the rmap walk.
> > > However, I was wondering how practical
> > > this idea would be if more than a single VMA covers a page?
> > >
> >
> > Agree with all your points here. I changed Gunyah's implementation to do
> > the unmap instead of erroring out. I didn't observe a significant
> > performance difference. However, doing unmap might be a little faster
> > because we can check folio_mapped() before doing the rmap walk. When
> > erroring out at mmap() level, we always have to do the walk.
>
> Right. On the mmap() level you won't really have to walk page tables, as the
> munmap() already zapped the page and removed the "problematic" VMA.
>
> Likely, you really want to avoid repeatedly calling mmap()+munmap() just to
> access shared memory; but that's just my best guess about your user space
> app :)

Ack, and expecting userspace to munmap the pages whenever we hit a valid
mapping in userspace page-tables in the KVM fault path makes for a
somewhat unusual interface IMO. Userspace can munmap, mmap again, and if
it doesn't touch the pages, it can proceed to run the guest just fine,
is that the expectation? If so, it feels like we're 'leaking' internal
kernel state somehow.

The kernel is normally well within its rights to zap userspace mappings
if it wants to, e.g. to swap. (Obviously mlock is a weird case, but even
in that case, IIRC the kernel still has a certain amount of flexibility
and can use compaction and friends.) Similarly, it should be well within
its rights to proactively create them. How would this scheme work if, 10
years from now, something like Speculative Page Faults makes it into the
kernel in a different form?

Not requiring userspace to unmap makes the userspace interface a lot
simpler I think -- once a protected guest starts, you better not touch
its memory if it's not been shared back or you'll get slapped on the
wrist. Whether or not those pages have been accessed beforehand, for
example, is irrelevant.
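
To make the contrast between the two schemes concrete, here is a toy
userspace model in Python. It is only a sketch under loud assumptions:
nothing in it is kernel code, and ToyGuestMemory, Page, HostFault and
both make_private_*() helpers are hypothetical names invented for
illustration. The exception stands in for the SIGBUS, and the
host_mapped flag stands in for whatever folio_mapped() would report.

```python
from dataclasses import dataclass, field


class HostFault(Exception):
    """Stands in for the SIGBUS the host gets when touching a private page."""


@dataclass
class Page:
    private: bool = False      # True once the page belongs to the guest
    host_mapped: bool = False  # True while a host userspace mapping exists


@dataclass
class ToyGuestMemory:
    pages: dict = field(default_factory=dict)

    def _page(self, pfn):
        # Create the page lazily the first time anyone refers to it.
        return self.pages.setdefault(pfn, Page())

    def host_touch(self, pfn):
        # Host access: faults the mapping in, or "SIGBUS" if private.
        page = self._page(pfn)
        if page.private:
            raise HostFault(f"page {pfn} is private to the guest")
        page.host_mapped = True

    def host_munmap(self, pfn):
        # Host drops its mapping explicitly.
        self._page(pfn).host_mapped = False

    def make_private_refuse(self, pfn):
        # Scheme 1: refuse the conversion while a host mapping exists,
        # so the outcome depends on whether the host touched the page.
        page = self._page(pfn)
        if page.host_mapped:
            return False
        page.private = True
        return True

    def make_private_zap(self, pfn):
        # Scheme 2: cheap mapped check first, then zap if needed.
        # The conversion always succeeds, whatever the host did before.
        page = self._page(pfn)
        if page.host_mapped:
            page.host_mapped = False
        page.private = True
        return True
```

In this model, make_private_refuse() succeeds or fails depending on
whether host_touch() happened to run first, which is the state leak
described above. make_private_zap() gives the same result either way,
and only a later host_touch() on the private page raises.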