From: John Hubbard
To: Alistair Popple, Andrew Morton
CC: Ralph Campbell, Matthew Wilcox
Date: Wed, 29 Mar 2023 18:44:48 -0700
Subject: Re: [PATCH v2] mm: Take a page reference when removing device exclusive entries
Message-ID: <83040531-ce19-0dca-6e73-ef08407a6669@nvidia.com>
In-Reply-To: <20230330012519.804116-1-apopple@nvidia.com>

On 3/29/23 18:25, Alistair Popple wrote:
> Device exclusive page table entries are used to prevent CPU access to
> a page whilst it is being accessed from a device.
> Typically this is used to implement atomic operations when the
> underlying bus does not support atomic access. When a CPU thread
> encounters a device exclusive entry it locks the page and restores the
> original entry after calling mmu notifiers to signal drivers that
> exclusive access is no longer available.
>
> The device exclusive entry holds a reference to the page making it
> safe to access the struct page whilst the entry is present. However
> the fault handling code does not hold the PTL when taking the page
> lock. This means if there are multiple threads faulting concurrently
> on the device exclusive entry one will remove the entry whilst others
> will wait on the page lock without holding a reference.
>
> This can lead to threads locking or waiting on a folio with a zero
> refcount. Whilst mmap_lock prevents the pages getting freed via
> munmap() they may still be freed by a migration. This leads to
> warnings such as PAGE_FLAGS_CHECK_AT_FREE due to the page being locked
> when the refcount drops to zero.
>
> Fix this by trying to take a reference on the folio before locking
> it. The code already checks the PTE under the PTL and aborts if the
> entry is no longer there. It is also possible the folio has been
> unmapped, freed and re-allocated allowing a reference to be taken on
> an unrelated folio. This case is also detected by the PTE check and
> the folio is unlocked without further changes.
>
> Signed-off-by: Alistair Popple
> Reviewed-by: Ralph Campbell
> Reviewed-by: John Hubbard
> Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
> Cc: stable@vger.kernel.org
>
> ---
>
> Changes for v2:
>
> - Rebased to Linus master
> - Reworded commit message
> - Switched to using folios (thanks Matthew!)
> - Added Reviewed-by's

v2 looks correct to me.

thanks,
--
John Hubbard
NVIDIA

> ---
>  mm/memory.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index f456f3b5049c..01a23ad48a04 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct mmu_notifier_range range;
>
> -	if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
> +	/*
> +	 * We need a reference to lock the folio because we don't hold
> +	 * the PTL so a racing thread can remove the device-exclusive
> +	 * entry and unmap it. If the folio is free the entry must
> +	 * have been removed already. If it happens to have already
> +	 * been re-allocated after being freed all we do is lock and
> +	 * unlock it.
> +	 */
> +	if (!folio_try_get(folio))
> +		return 0;
> +
> +	if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
> +		folio_put(folio);
>  		return VM_FAULT_RETRY;
> +	}
>  	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
>  				vma->vm_mm, vmf->address & PAGE_MASK,
>  				(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
> @@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	folio_unlock(folio);
> +	folio_put(folio);
>
>  	mmu_notifier_invalidate_range_end(&range);
>  	return 0;
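
The ordering in the patch follows the familiar "get unless zero, lock, revalidate" pattern. What follows is a minimal userspace C11 sketch of that control flow only; it is not the kernel code. struct obj, struct slot, obj_try_get(), obj_put(), remove_entry_sketch() and the bit_* locks are hypothetical stand-ins for struct folio, the PTE, folio_try_get(), folio_put(), the folio lock and the PTL, and a failed trylock is assumed here to model folio_lock_or_retry() returning false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio: a refcount plus a lock bit. */
struct obj {
	atomic_int refcount;
	atomic_bool locked;
};

/* Hypothetical stand-in for the PTE slot and its page table lock (PTL). */
struct slot {
	atomic_bool ptl;
	struct obj *entry; /* non-NULL while the device-exclusive entry exists */
};

static bool bit_trylock(atomic_bool *l) { return !atomic_exchange(l, true); }
static void bit_lock(atomic_bool *l) { while (!bit_trylock(l)) { /* spin */ } }
static void bit_unlock(atomic_bool *l) { atomic_store(l, false); }

/* Analogue of folio_try_get(): take a reference only while the count is
 * non-zero, so an object already on its way to being freed is never revived. */
static bool obj_try_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure the CAS reloads 'old' with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0); /* dropping a reference we never took would underflow */
	(void)old;
}

enum { SKIPPED = 0, RESTORED = 1, RETRY = 2 };

/* Hypothetical sketch of the fixed flow: get, lock, recheck the entry
 * under the PTL, then unlock and put on every exit path. */
static int remove_entry_sketch(struct slot *s, struct obj *o)
{
	if (!obj_try_get(o))
		return SKIPPED; /* refcount already zero: the entry is gone */

	if (!bit_trylock(&o->locked)) {
		obj_put(o); /* the retry path must drop the reference too */
		return RETRY;
	}

	int ret = SKIPPED;

	bit_lock(&s->ptl);
	if (s->entry == o) { /* entry still present: safe to restore it */
		s->entry = NULL;
		ret = RESTORED;
	}
	bit_unlock(&s->ptl);
	bit_unlock(&o->locked);
	obj_put(o);
	return ret;
}
```

Every exit after a successful obj_try_get() drops its reference, mirroring the two folio_put() calls the patch adds. A thread that loses the race either sees a zero refcount and backs off without locking, or finds the entry changed under the lock and only locks and unlocks the object.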