From: Ackerley Tng
To: quic_eberman@quicinc.com
Cc: akpm@linux-foundation.org, david@redhat.com, kvm@vger.kernel.org,
    linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-mm@kvack.org, maz@kernel.org,
    pbonzini@redhat.com, shuah@kernel.org, tabba@google.com,
    willy@infradead.org, vannapurve@google.com, hch@infradead.org,
    jgg@nvidia.com, rientjes@google.com, seanjc@google.com,
    jhubbard@nvidia.com, qperret@google.com, smostafa@google.com,
    fvdl@google.com, hughd@google.com
Date: Fri, 12 Jul 2024 23:29:37 +0000
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
Message-ID: <20240712232937.2861788-1-ackerleytng@google.com>
In-Reply-To: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>

Here’s an update from the Linux MM Alignment Session on July 10, 2024, 9-10am
PDT:

The current direction is:

+ Allow mmap() of ranges that cover both shared and private memory, but
  disallow faulting in of private pages
  + On access to private pages, userspace will get some error, perhaps SIGBUS
  + On shared to private conversions, unmap the page and decrease refcounts

+ To support huge pages, guest_memfd will take ownership of the hugepages, and
  provide interested parties (userspace, KVM, iommu) with pages to be used.
  + guest_memfd will track usage of (sub)pages, for both private and shared
    memory
  + Pages will be broken into smaller (probably 4K) chunks at creation time to
    simplify implementation (as opposed to splitting at runtime when private
    to shared conversion is requested by the guest)
    + Core MM infrastructure will still be used to track page table mappings
      in mapcounts and other references (refcounts) per subpage
    + HugeTLB vmemmap Optimization (HVO) is lost when pages are broken up - to
      be optimized later. Suggestions:
      + Use a tracking data structure other than struct page
      + Remove the memory for struct pages backing private memory from the
        vmemmap, and re-populate the vmemmap on conversion from private to
        shared
  + Implementation pointers for huge page support
    + Consensus was that getting core MM to do tracking seems wrong
      + Maintaining special page refcounts for guest_memfd pages is difficult
        to get working and requires weird special casing in many places. This
        was tried for FS DAX pages and did not work out: [1]
    + Implementation suggestion: use infrastructure similar to what
      ZONE_DEVICE uses, to provide the huge page to interested parties
      + TBD: how to actually get huge pages into guest_memfd
      + TBD: how to provide/convert the huge pages to ZONE_DEVICE
        + Perhaps reserve them at boot time like in HugeTLB

+ Line of sight to compaction/migration:
  + Compaction here means making memory contiguous
  + Compaction/migration scope:
    + In scope for 4K pages
    + Out of scope for 1G pages and anything managed through ZONE_DEVICE
    + Out of scope for an initial implementation
  + Ideas for future implementations
    + Reuse the non-LRU page migration framework as used by memory ballooning
    + Have userspace drive compaction/migration via ioctls
      + Having line of sight to optimizing lost HVO means avoiding being
        locked in to any implementation requiring struct pages
        + Without struct pages, it is hard to reuse core MM’s
          compaction/migration infrastructure

+ Discuss more details at LPC in Sep 2024, such as how to use huge pages,
  shared/private conversion, huge page splitting

This addresses the prerequisites set out by Fuad and Elliott at the beginning
of the session, which were:

1. Non-destructive shared/private conversion
  + Through having guest_memfd manage and track both shared/private memory
2. Huge page support with the option of converting individual subpages
  + Splitting of pages will be managed by guest_memfd
3. Line of sight to compaction/migration of private memory
  + Possibly driven by userspace using guest_memfd ioctls
4. Loading binaries into guest (private) memory before VM starts
  + This was identified as a special case of (1.) above
5. Non-protected guests in pKVM
  + Not discussed during session, but this is a goal of guest_memfd, for all
    VM types [2]

David Hildenbrand summarized this during the meeting at t=47m25s [3].

[1]: https://lore.kernel.org/linux-mm/cover.66009f59a7fe77320d413011386c3ae5c2ee82eb.1719386613.git-series.apopple@nvidia.com/
[2]: https://lore.kernel.org/lkml/ZnRMn1ObU8TFrms3@google.com/
[3]: https://drive.google.com/file/d/17lruFrde2XWs6B1jaTrAy9gjv08FnJ45/view?t=47m25s&resourcekey=0-LiteoxLd5f4fKoPRMjMTOw
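
As an illustration of prerequisites (1.) and (2.) above, here is a toy
userspace model in Python of per-subpage tracking. It is only a sketch under
stated assumptions, not a proposal for the kernel implementation: the names
(ToyHugePage, Subpage, PrivateAccessError, fault, convert_to_shared,
convert_to_private) are hypothetical and are not guest_memfd or core MM APIs,
a 2M huge page is assumed purely to keep the numbers small, and a Python
exception stands in for whatever error (perhaps SIGBUS) userspace would get.

```python
from dataclasses import dataclass, field

SUBPAGE_SIZE = 4 * 1024            # assumed chunk size ("probably 4K")
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # assumed 2M huge page, illustration only


class PrivateAccessError(Exception):
    """Stands in for the error (perhaps SIGBUS) delivered to userspace."""


@dataclass
class Subpage:
    private: bool = True   # toy default: every chunk starts out private
    mapped: bool = False   # whether userspace currently has it faulted in
    refcount: int = 0      # references held on behalf of userspace mappings


@dataclass
class ToyHugePage:
    # The huge page is broken into 4K chunks at creation time, rather than
    # being split later when a conversion is requested.
    subpages: list = field(default_factory=lambda: [
        Subpage() for _ in range(HUGE_PAGE_SIZE // SUBPAGE_SIZE)
    ])

    def fault(self, index):
        """Model a userspace access to one chunk of an mmap()ed range."""
        sp = self.subpages[index]
        if sp.private:
            # Private pages may be covered by the mapping but never faulted in.
            raise PrivateAccessError(f"subpage {index} is private")
        if not sp.mapped:
            sp.mapped = True
            sp.refcount += 1

    def convert_to_shared(self, index):
        """Private to shared: the chunk becomes faultable by userspace."""
        self.subpages[index].private = False

    def convert_to_private(self, index):
        """Shared to private: unmap the chunk and drop its reference."""
        sp = self.subpages[index]
        if sp.mapped:
            sp.mapped = False
            sp.refcount -= 1
        sp.private = True
```

With this model, a ToyHugePage() holds 512 chunks; fault(0) raises
PrivateAccessError until convert_to_shared(0) is called, and a later
convert_to_private(0) unmaps the chunk and returns its refcount to zero
without touching any other chunk.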