From: Alistair Popple
To: Matthew Wilcox
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, hannes@cmpxchg.org,
    mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz,
    ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com,
    michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com,
    vbabka@suse.cz, minchan@google.com, dave@stgolabs.net,
    punit.agrawal@bytedance.com, lstoakes@gmail.com, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com
Subject: Re: [PATCH 1/1] mm: handle swap page faults if the faulting page can be locked
Date: Mon, 17 Apr 2023 10:49:31 +1000
Message-ID: <87sfczuxkc.fsf@nvidia.com>
References: <20230414180043.1839745-1-surenb@google.com>

Matthew Wilcox writes:

> On Fri, Apr 14, 2023 at 11:00:43AM -0700, Suren Baghdasaryan wrote:
>> When page fault is handled under VMA lock protection, all swap page
>> faults are retried with mmap_lock because folio_lock_or_retry
>> implementation has to drop and reacquire mmap_lock if folio could
>> not be immediately locked.
>> Instead of retrying all swapped page faults, retry only when folio
>> locking fails.
>
> Reviewed-by: Matthew Wilcox (Oracle)
>
> Let's just review what can now be handled under the VMA lock instead of
> the mmap_lock, in case somebody knows better than me that it's not safe.
>
> - We can call migration_entry_wait(). This will wait for PG_locked to
>   become clear (in migration_entry_wait_on_locked()). As previously
>   discussed offline, I think this is safe to do while holding the VMA
>   locked.

Do we even need to be holding the VMA locked while in
migration_entry_wait()? My understanding is we're just waiting for
PG_locked to be cleared so we can return with a reasonable chance the
migration entry is gone. If for example it has been unmapped or
protections downgraded we will simply refault.

> - We can call remove_device_exclusive_entry(). That calls
>   folio_lock_or_retry(), which will fail if it can't get the VMA lock.

Looks ok to me.

> - We can call pgmap->ops->migrate_to_ram(). Perhaps somebody familiar
>   with Nouveau and amdkfd could comment on how safe this is?

Currently this won't work because drivers assume mmap_lock is held
during pgmap->ops->migrate_to_ram(). Primarily this is because
migrate_vma_setup()/migrate_vma_pages() is used to handle the fault and
that asserts mmap_lock is taken in walk_page_range() and also
migrate_vma_insert_page(). So I don't think we can call that case
without mmap_lock.

At a glance it seems it should be relatively easy to move to using
lock_vma_under_rcu().
Drivers will need updating as well though because migrate_vma_setup()
is called outside of fault handling paths so drivers will currently
take mmap_lock rather than vma lock when looking up the vma. See for
example nouveau_svmm_bind().

> - I believe we can't call handle_pte_marker() because we exclude UFFD
>   VMAs earlier.
> - We can call swap_readpage() if we allocate a new folio. I haven't
>   traced through all this code to tell if it's OK.
>
> So ... I believe this is all OK, but we're definitely now willing to
> wait for I/O from the swap device while holding the VMA lock when we
> weren't before. And maybe we should make a bigger deal of it in the
> changelog.
>
> And maybe we shouldn't just be failing the folio_lock_or_retry(),
> maybe we should be waiting for the folio lock with the VMA locked.
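
For anyone following along, the behaviour described in the changelog
(retry only when the folio lock cannot be taken immediately, rather than
retrying every swap fault) can be modelled in userspace roughly as below.
This is only a sketch under stated assumptions: struct model_folio,
model_folio_trylock(), model_swap_fault() and the FAULT_* values are
hypothetical names made up for illustration, not kernel APIs, and an
atomic flag stands in for PG_locked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

struct model_folio {
	atomic_flag locked;	/* stands in for PG_locked */
	bool mapped;		/* set once the modelled swap-in completes */
};

/* Non-blocking lock attempt: true if we took the lock, false if it was held. */
static bool model_folio_trylock(struct model_folio *folio)
{
	return !atomic_flag_test_and_set_explicit(&folio->locked,
						  memory_order_acquire);
}

static void model_folio_unlock(struct model_folio *folio)
{
	atomic_flag_clear_explicit(&folio->locked, memory_order_release);
}

/*
 * Model of a swap fault handled under the per-VMA lock: try the folio
 * lock once. On contention, do not sleep; report FAULT_RETRY so the
 * caller can fall back to the path that holds the coarser lock.
 */
static enum fault_result model_swap_fault(struct model_folio *folio)
{
	if (!model_folio_trylock(folio))
		return FAULT_RETRY;

	folio->mapped = true;
	model_folio_unlock(folio);
	return FAULT_DONE;
}
```

An uncontended folio completes in the fast path, while a folio whose
flag is already set makes model_swap_fault() return FAULT_RETRY without
touching it. Matthew's last point would correspond to replacing the
single trylock with a wait, which this model deliberately leaves out.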