References: <20231115163018.1303287-1-ryan.roberts@arm.com> <20231115163018.1303287-15-ryan.roberts@arm.com>
User-agent: mu4e 1.8.13; emacs 29.1
From: Alistair Popple <apopple@nvidia.com>
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland, David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 14/14] arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown
Date: Thu, 23 Nov 2023 16:13:17 +1100
In-reply-to: <20231115163018.1303287-15-ryan.roberts@arm.com>
Message-ID: <87fs0xxd5g.fsf@nvdebian.thelocal>
Ryan Roberts writes:

> ptep_get_and_clear_full() adds a 'full' parameter which is not present
> for the fallback ptep_get_and_clear() function. 'full' is set to 1 when
> a full address space teardown is in progress. We use this information to
> optimize arm64_sys_exit_group() by avoiding unfolding (and therefore
> tlbi) contiguous ranges. Instead we just clear the PTE but allow all the
> contiguous neighbours to keep their contig bit set, because we know we
> are about to clear the rest too.
>
> Before this optimization, the cost of arm64_sys_exit_group() exploded to
> 32x what it was before PTE_CONT support was wired up, when compiling the
> kernel. With this optimization in place, we are back down to the
> original cost.
>
> This approach is not perfect though, as for the duration between
> returning from the first call to ptep_get_and_clear_full() and making
> the final call, the contpte block is in an intermediate state, where
> some ptes are cleared and others are still set with the PTE_CONT bit.
> If any other APIs are called for the ptes in the contpte block during
> that time, we have to be very careful. The core code currently
> interleaves calls to ptep_get_and_clear_full() with ptep_get() and so
> ptep_get() must be careful to ignore the cleared entries when
> accumulating the access and dirty bits - the same goes for
> ptep_get_lockless(). The only other calls we might reasonably expect
> are to set markers in the previously cleared ptes. (We shouldn't see
> valid entries being set until after the tlbi, at which point we are no
> longer in the intermediate state). Since markers are not valid, this is
> safe; set_ptes() will see the old, invalid entry and will not attempt
> to unfold. And the new pte is also invalid so it won't attempt to fold.
> We shouldn't see this for the 'full' case anyway.
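Just to restate the constraint for my own benefit: the interleaving here
is roughly the teardown pattern below. This is paraphrased from memory
rather than quoted from mm/memory.c, with the surrounding declarations
and the rmap/freeing steps elided, so treat it as a sketch of the call
pattern only:

	for (; addr < end; pte++, addr += PAGE_SIZE) {
		pte_t ptent = ptep_get(pte);	/* may see a partially cleared contpte block */

		if (pte_none(ptent))
			continue;

		page = vm_normal_page(vma, addr, ptent);
		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);

		/* returned access/dirty bits only feed back into the page/folio */
		if (page && !PageAnon(page)) {
			if (pte_dirty(ptent))
				set_page_dirty(page);
			if (pte_young(ptent))
				mark_page_accessed(page);
		}
		/* rmap removal and page freeing elided */
	}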
> The last remaining issue is returning the access/dirty bits. That info
> could be present in any of the ptes in the contpte block. ptep_get()
> will gather those bits from across the contpte block. We don't bother
> doing that here, because we know that the information is used by the
> core-mm to mark the underlying folio as accessed/dirty. And since the
> same folio must be underpinning the whole block (that was a requirement
> for folding in the first place), that information will make it to the
> folio eventually once all the ptes have been cleared. This approach
> means we don't have to play games with accumulating and storing the
> bits. It does mean that any interleaved calls to ptep_get() may lack
> correct access/dirty information if we have already cleared the pte
> that happened to store it. The core code does not rely on this though.

Does not *currently* rely on this. I can't help but think it is
potentially something that could change in the future though, which
would lead to some subtle bugs.

Would there be any way of avoiding this? Half-baked thought, but could
you for example copy the access/dirty information to the last (or
perhaps first, most likely invalid) PTE?
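Something like this completely untested sketch is what I was imagining.
__ptep_get(), __ptep_get_and_clear() and CONT_PTES are from your series;
__set_pte() and the PTR_ALIGN_DOWN() arithmetic for locating the last
pte of the block are assumptions on my part, purely to illustrate the
idea:

pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm,
				unsigned long addr, pte_t *ptep)
{
	pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
	/* Assumed way of locating the last pte of the contpte block */
	pte_t *last_ptep = PTR_ALIGN_DOWN(ptep, CONT_PTES * sizeof(*ptep)) +
			   CONT_PTES - 1;

	/*
	 * Push the access/dirty bits we are about to discard into the
	 * last pte of the block, so an interleaved ptep_get() still
	 * sees them. Once the last pte itself is cleared, the bits are
	 * returned to the caller as usual.
	 */
	if (ptep != last_ptep && (pte_dirty(pte) || pte_young(pte))) {
		pte_t last_pte = __ptep_get(last_ptep);

		if (pte_dirty(pte))
			last_pte = pte_mkdirty(last_pte);
		if (pte_young(pte))
			last_pte = pte_mkyoung(last_pte);
		__set_pte(last_ptep, last_pte);
	}

	return pte;
}

Not sure how well that plays with the already-cleared/invalid entries
though, so feel free to shoot it down.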
 - Alistair

> Signed-off-by: Ryan Roberts
> ---
>  arch/arm64/include/asm/pgtable.h | 18 +++++++++--
>  arch/arm64/mm/contpte.c          | 54 ++++++++++++++++++++++++++++++++
>  2 files changed, 70 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 9bd2f57a9e11..ea58a9f4e700 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1145,6 +1145,8 @@ extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte);
>  extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep);
>  extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
>  				pte_t *ptep, pte_t pte, unsigned int nr);
> +extern pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm,
> +				unsigned long addr, pte_t *ptep);
>  extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
>  				unsigned long addr, pte_t *ptep);
>  extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
> @@ -1270,12 +1272,24 @@ static inline void pte_clear(struct mm_struct *mm,
>  	__pte_clear(mm, addr, ptep);
>  }
>
> +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
> +static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
> +				unsigned long addr, pte_t *ptep, int full)
> +{
> +	pte_t orig_pte = __ptep_get(ptep);
> +
> +	if (!pte_valid_cont(orig_pte) || !full) {
> +		contpte_try_unfold(mm, addr, ptep, orig_pte);
> +		return __ptep_get_and_clear(mm, addr, ptep);
> +	} else
> +		return contpte_ptep_get_and_clear_full(mm, addr, ptep);
> +}
> +
>  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  				unsigned long addr, pte_t *ptep)
>  {
> -	contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
> -	return __ptep_get_and_clear(mm, addr, ptep);
> +	return ptep_get_and_clear_full(mm, addr, ptep, 0);
>  }
>
>  #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index 426be9cd4dea..5d1aaed82d32 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -144,6 +144,14 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
>  	for (i = 0; i < CONT_PTES; i++, ptep++) {
>  		pte = __ptep_get(ptep);
>
> +		/*
> +		 * Deal with the partial contpte_ptep_get_and_clear_full() case,
> +		 * where some of the ptes in the range may be cleared but others
> +		 * are still to do. See contpte_ptep_get_and_clear_full().
> +		 */
> +		if (pte_val(pte) == 0)
> +			continue;
> +
>  		if (pte_dirty(pte))
>  			orig_pte = pte_mkdirty(orig_pte);
>
> @@ -256,6 +264,52 @@ void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL(contpte_set_ptes);
>
> +pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm,
> +				unsigned long addr, pte_t *ptep)
> +{
> +	/*
> +	 * When doing a full address space teardown, we can avoid unfolding the
> +	 * contiguous range, and therefore avoid the associated tlbi. Instead,
> +	 * just get and clear the pte. The caller is promising to call us for
> +	 * every pte, so every pte in the range will be cleared by the time the
> +	 * tlbi is issued.
> +	 *
> +	 * This approach is not perfect though, as for the duration between
> +	 * returning from the first call to ptep_get_and_clear_full() and making
> +	 * the final call, the contpte block is in an intermediate state, where
> +	 * some ptes are cleared and others are still set with the PTE_CONT bit.
> +	 * If any other APIs are called for the ptes in the contpte block during
> +	 * that time, we have to be very careful. The core code currently
> +	 * interleaves calls to ptep_get_and_clear_full() with ptep_get() and so
> +	 * ptep_get() must be careful to ignore the cleared entries when
> +	 * accumulating the access and dirty bits - the same goes for
> +	 * ptep_get_lockless(). The only other calls we might reasonably expect
> +	 * are to set markers in the previously cleared ptes. (We shouldn't see
> +	 * valid entries being set until after the tlbi, at which point we are
> +	 * no longer in the intermediate state). Since markers are not valid,
> +	 * this is safe; set_ptes() will see the old, invalid entry and will not
> +	 * attempt to unfold. And the new pte is also invalid so it won't
> +	 * attempt to fold. We shouldn't see this for the 'full' case anyway.
> +	 *
> +	 * The last remaining issue is returning the access/dirty bits. That
> +	 * info could be present in any of the ptes in the contpte block.
> +	 * ptep_get() will gather those bits from across the contpte block. We
> +	 * don't bother doing that here, because we know that the information is
> +	 * used by the core-mm to mark the underlying folio as accessed/dirty.
> +	 * And since the same folio must be underpinning the whole block (that
> +	 * was a requirement for folding in the first place), that information
> +	 * will make it to the folio eventually once all the ptes have been
> +	 * cleared. This approach means we don't have to play games with
> +	 * accumulating and storing the bits. It does mean that any interleaved
> +	 * calls to ptep_get() may lack correct access/dirty information if we
> +	 * have already cleared the pte that happened to store it. The core code
> +	 * does not rely on this though.
> +	 */
> +
> +	return __ptep_get_and_clear(mm, addr, ptep);
> +}
> +EXPORT_SYMBOL(contpte_ptep_get_and_clear_full);
> +
>  int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
>  				unsigned long addr, pte_t *ptep)
>  {