From: Jason Gunthorpe
Date: Sat, 18 Dec 2021 14:42:33 -0400
To: Linus Torvalds
Cc: Nadav Amit, David Hildenbrand, Linux Kernel Mailing List, Andrew Morton, Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard, Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
Message-ID: <20211218184233.GB1432915@nvidia.com>
References: <54c492d7-ddcd-dcd0-7209-efb2847adf7c@redhat.com> <20211217204705.GF6385@nvidia.com> <2E28C79D-F79C-45BE-A16C-43678AD165E9@vmware.com> <20211218030509.GA1432915@nvidia.com> <5C0A673F-8326-4484-B976-DA844298DB29@vmware.com>
Content-Type: text/plain; charset=us-ascii

On Fri, Dec 17, 2021 at 07:38:39PM -0800, Linus Torvalds wrote:
> On Fri, Dec 17, 2021 at 7:30 PM Nadav Amit wrote:
> >
> > In such a case, I do think it makes sense to fail uffd-wp (when
> > page_count() > 1), and in a prototype I am working on I do something
> > like that.
>
> Ack. If uffd-wp finds a page that is pinned, just skip it as not
> write-protectable.
>
> Because some of the pinners might be writing to it, of course - just
> not through the page tables.

That doesn't address the qemu use case though. The RDMA pin is the
'coherent r/o pin' we discussed before, which requires that the pages
remain un-write-protected and the HW DMA is read only.

The VFIO pin will enable dirty page tracking in the system IOMMU so it
gets the same effect from qemu's perspective as the CPU WP is doing.

In these operations every single page of the guest will be pinned, so
skipping pinned pages just means userfaultfd wp doesn't work at all.

Qemu needs some solution to be able to dirty track the CPU memory for
migration.

> So that sounds like the right thing to do. I _think_ we discussed this
> the last time this came up. I have some dim memory of that. Jason,
> ring a bell?

We talked about clear_refs a lot, but the use case was never really
clear, I think. Plus that discussion never finalized into anything.

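To make the pinned-page objection above concrete, here is a minimal
userspace toy model in C. It is not kernel code: struct toy_page, its
fields, and toy_uffd_wp_range() are hypothetical names invented for
illustration, and "refcount > 1" stands in for the page_count() > 1
check proposed in the quoted text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int refcount;          /* 1 = only the mapping holds it, >1 = pinned */
    bool write_protected;  /* set when the toy "uffd-wp" protects it */
};

/*
 * Toy version of the proposed policy: write-protect a range, but skip
 * every page whose refcount is above one. Returns the number of pages
 * that actually ended up write-protected.
 */
size_t toy_uffd_wp_range(struct toy_page *pages, size_t n)
{
    size_t protected_pages = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].refcount > 1)
            continue;  /* pinned: a pinner may write behind the page tables */
        pages[i].write_protected = true;
        protected_pages++;
    }
    return protected_pages;
}
```

In this model, a guest whose every page carries a pin (refcount 2)
gets zero pages write-protected, so no CPU write is ever tracked.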
David's latest summary seems accurate. If I paraphrase at a high
level, Linus's approach always does enough COWs but might do extra,
and David's approach tries to do exactly the right number of COWs.

It looks like, to have the same functionality with Linus's approach,
we need a way for userspace to opt out of COW and work in an
entirely deterministic non-COW world. WP&GUP can never work together
otherwise, which leaves qemu stranded.

Or, we follow David's approach, make COW precise, and accept the
complexity.

Jason
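
The "enough COWs but maybe extra" versus "exactly the right number"
contrast above can be pictured with a toy model. This is a sketch
under stated assumptions, not either author's actual patch: the
struct, both function names, and the reduction of each approach to a
single predicate (total reference count for the conservative rule,
mapping count for the precise one) are simplifications made up for
illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an anonymous page. */
struct toy_anon_page {
    int mapcount;  /* number of page-table mappings of the page */
    int pincount;  /* extra references held by GUP-style pins */
};

/*
 * Conservative rule (assumed model): on a write fault, copy unless the
 * faulting mapping holds the only reference to the page.
 */
bool toy_cow_by_refcount(const struct toy_anon_page *p)
{
    int refcount = p->mapcount + p->pincount;

    return refcount != 1;
}

/*
 * Precise rule (assumed model): copy only when some other mapping
 * really shares the page; pins alone do not force a copy.
 */
bool toy_cow_by_mapcount(const struct toy_anon_page *p)
{
    return p->mapcount > 1;
}
```

For a page mapped once and pinned once, the conservative rule copies
while the precise rule does not; that extra copy is what separates the
pin from the page the process keeps writing to. For an unpinned page
mapped once, or a page mapped twice, the two rules agree.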