From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96DB1C4708D for ; Wed, 7 Dec 2022 20:34:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 328FC8E0003; Wed, 7 Dec 2022 15:34:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2D99F8E0001; Wed, 7 Dec 2022 15:34:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 17A948E0003; Wed, 7 Dec 2022 15:34:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id EB2FA8E0001 for ; Wed, 7 Dec 2022 15:34:51 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 9CD3FA0973 for ; Wed, 7 Dec 2022 20:34:51 +0000 (UTC) X-FDA: 80216663982.02.852D98C Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2051.outbound.protection.outlook.com [40.107.220.51]) by imf05.hostedemail.com (Postfix) with ESMTP id E7C1510000F for ; Wed, 7 Dec 2022 20:34:50 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=Nvidia.com header.s=selector2 header.b=RLxvTvoB; spf=pass (imf05.hostedemail.com: domain of jhubbard@nvidia.com designates 40.107.220.51 as permitted sender) smtp.mailfrom=jhubbard@nvidia.com; arc=pass ("microsoft.com:s=arcselector9901:i=1"); dmarc=pass (policy=reject) header.from=nvidia.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1670445291; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5mFmkA8YTZDg9sjQ/y/TnF8x32rehSzGeEyYjFtmKHU=; b=7qc347coa67wvbCzq6yFrHHpTIAP2DbTa6RlP1g1iPQXi3OQDjjrKUC8/JBj8nflFosY0H jWbonBt7XzYiXXSP/7zBdHoiAS/70N2k8/BWx9yEoz6lJi/Qaf/N+gV77z0AvObg1q6lqt gp2KCQdW4H7ADrBdC+cXlzVtw37pG2E= ARC-Authentication-Results: i=2; imf05.hostedemail.com; dkim=pass header.d=Nvidia.com header.s=selector2 header.b=RLxvTvoB; spf=pass (imf05.hostedemail.com: domain of jhubbard@nvidia.com designates 40.107.220.51 as permitted sender) smtp.mailfrom=jhubbard@nvidia.com; arc=pass ("microsoft.com:s=arcselector9901:i=1"); dmarc=pass (policy=reject) header.from=nvidia.com ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1670445291; a=rsa-sha256; cv=pass; b=NxDTgOHu946FqAYXHuhV2EAFexa7xRu7cyHuzyQzIsu6Nse5BQlZ2wMbz3v2GPOqiG5bli O795LlGGEYF+PDLs3niRBx52iUL9WHY7LjTWtyC+8atv2VkB6wht/qRhundpP3l4TrBkeK mc2Rr3AfS4umxnno421VXu3Q5X/FBPo= ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=WU+okhXuyJwZqduRWYT9B03ryt2wgA4jPTDn+WniZapaTIPHLY5Ce2hDz6Pk+R4dPIfkCME903gwVqYGk82r6ovIss/K1bbV4QxZIkyKnIcK9JlTyhPn400WflsamXguCSWnyl4PnLBN++zvOeE1xba2PBcX+lj6EROAyX4cVK+lN0IMehjFRIi+5b3rxlioSyTNuXVxaBWi6PFzfn/Z/VUKeOcUvbQtMwZ6iafZ1Ew9YPvfoy8ggNdDdL4eQxBxVGygxwergAMCypijcFNl/IkHAJa3PzW6CItbqcZiK8pGWcggwjQ21Kn871Jj3PvuDB/F6fXwKNtorxnzz84pjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5mFmkA8YTZDg9sjQ/y/TnF8x32rehSzGeEyYjFtmKHU=; b=f6JzzMFLoABFpNRmF0vvFVF4sepGunfLbhu2INy8wN7xNE0tlAbQXocdbh0/5Ca8wIK1mc5162bcDlILM0q99t7IHYQlB8w4A7XR5d2tkR3V4hNVY/eOfR8ZCR6eLrtNZPnOC4iINkGatPDNZlnG2ytgECqcrJR5C3hR2oNaXHr4W2BUbC4CMQIFNeXxqo+f9KZjZT5/MEKXppUHqSLWnwjIRqAns1J7vco0gDBMZtIzQYOgynKAmI7+WTkumfN+5qERtiF9x/9mXQjKSZokHbXi1P/CpxbbO8/Y1o6NzDVSUu9+bLnjOoakH4gTOwfuNzWZRA2UHI1lvxPyMjn4iA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.160) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=5mFmkA8YTZDg9sjQ/y/TnF8x32rehSzGeEyYjFtmKHU=; b=RLxvTvoBuKT6aYyjmkUc37i9bObVzSj+iudoPKxiWmgUlvKIzWtdDE8CKPTGR92Gkp+njOL2xdL0bAyxVXUvGX0mo43PAXuELnhSlWUS6XVCdBXpe4q3UuRoYnMi4bUFEdD1GTqksbzYu+70fVcD9n0Ev6DvgqVwnXZIFB/5J0sdLtcjxpaSrRtrLrnAuCFTpetzBF/8Oyg3sqjoBMiA4vp2sli3e+/s1ga4v9RokHIZTfa/OO2waJGAUnPtombtTzRKZ7v1yqVH2gB08wFDvp1SIevN65kaf2Nnwh5pj134xsf3mp+VdfbCm0zUi659pp9Vix/8kQGkoOKDo3QQsQ== Received: from BN9PR03CA0265.namprd03.prod.outlook.com (2603:10b6:408:ff::30) by MN0PR12MB6103.namprd12.prod.outlook.com (2603:10b6:208:3c9::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.10; Wed, 7 Dec 2022 20:34:48 +0000 Received: from BN8NAM11FT115.eop-nam11.prod.protection.outlook.com (2603:10b6:408:ff:cafe::42) by BN9PR03CA0265.outlook.office365.com (2603:10b6:408:ff::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.14 via Frontend Transport; Wed, 7 Dec 2022 20:34:48 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by BN8NAM11FT115.mail.protection.outlook.com (10.13.177.151) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5901.16 via Frontend Transport; Wed, 7 Dec 2022 20:34:48 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Wed, 7 Dec 2022 12:34:35 -0800 Received: from [10.110.48.28] (10.126.231.37) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Wed, 7 Dec 2022 12:34:34 -0800 Message-ID: <7f7682d0-ad8c-30f0-b9ed-9a20041d02ca@nvidia.com> Date: Wed, 7 Dec 2022 12:34:34 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1 Subject: Re: [PATCH v2 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare Content-Language: en-US To: Peter Xu , , CC: Muchun Song , Andrea Arcangeli , James Houghton , Jann Horn , Rik van Riel , Miaohe Lin , Andrew Morton , "Mike Kravetz" , David Hildenbrand , Nadav Amit References: <20221207203034.650899-1-peterx@redhat.com> <20221207203034.650899-9-peterx@redhat.com> From: John Hubbard In-Reply-To: <20221207203034.650899-9-peterx@redhat.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.126.231.37] X-ClientProxiedBy: rnnvmail203.nvidia.com (10.129.68.9) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT115:EE_|MN0PR12MB6103:EE_ X-MS-Office365-Filtering-Correlation-Id: f6a3834d-4973-43ad-e347-08dad8927bf1 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 5rx2nz2rakrRRDYMkgbnxR207i5DklxzhRboc8BE/iLgtNbcRUp62Dko8v39N/34oYlMOief+YnV4GAE+Shhaq+Lqjy2DdZEoGY1XG6MFDVAQwcmDHHUiCXoOQt4fDH8U6yZ5saJ+Cs6MiLDulTcuoLnRtMil0canNfPoHm0TqkZWCf+0wfIT97zexo7bxaI1F1Z0Vk45V56Ae1HspYnmdNP+rBbkP1WqdjKtEDH+Ce2Pjzun/84rX2SQcVqZ9w6Ggdqy7Tbu2QCId8N9At5gAXisTv1itDTFL7Sug/FtZ9BVsCDqeU+RqOqOf+nBx22VIsh9tUIw79I6UcGEGg3rtof3T9zYXzn8bgUfHubM4mejklQrr4OztWh7WzVJNroere6x1V3lh34HjM2lahtkVvjj3CYJPD8vIfHcYhp2DDkKJH1fBskrN7CzC+xfXx0VXPmshLNMotoigLsyifVXjs0TfUZEvggraZiXBCNHxsbTVjPZNKmJk02J3gp9XdSN8RtZ11QpxhNM/Q7nuqWAz68XMTaRqb7VMjf5jewQ7F3e5H9d1O58wjMdntFs4zurZkS16IIWV4oaWXgBglfhy2V2lg20GMQ6hhQFUf53ud6/fRv5LM7OvtG7QZIntSucCzjJEQyGMnYsyJIC+2QV5uE1XtqKrLAQgzD0NX7ACBIMRJORv0rtJKC2eTn98d42iZOTIHTcIBWmLN8YQeikkas2NSQS274mvku+cpt6h6vpvZO/zOQLGwzPBcCzdO+ X-Forefront-Antispam-Report: CIP:216.228.117.160;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge1.nvidia.com;CAT:NONE;SFS:(13230022)(4636009)(376002)(346002)(396003)(39860400002)(136003)(451199015)(46966006)(40470700004)(36840700001)(31686004)(83380400001)(47076005)(426003)(336012)(53546011)(16526019)(478600001)(40460700003)(31696002)(40480700001)(36756003)(2906002)(82740400003)(7636003)(36860700001)(356005)(82310400005)(186003)(86362001)(2616005)(54906003)(16576012)(4326008)(8676002)(26005)(110136005)(70586007)(7416002)(5660300002)(70206006)(316002)(41300700001)(8936002)(43740500002)(2101003);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 07 Dec 2022 20:34:48.0590 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f6a3834d-4973-43ad-e347-08dad8927bf1 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT115.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN0PR12MB6103 X-Stat-Signature: d69j4o3apcrf6yxni47hnjxt1dhmkgw4 X-Rspam-User: X-Spamd-Result: default: False [-5.40 / 9.00]; BAYES_HAM(-6.00)[100.00%]; SUSPICIOUS_RECIPS(1.50)[]; SUBJECT_HAS_UNDERSCORES(1.00)[]; ARC_ALLOW(-1.00)[microsoft.com:s=arcselector9901:i=1]; DMARC_POLICY_ALLOW(-0.50)[nvidia.com,reject]; R_DKIM_ALLOW(-0.20)[Nvidia.com:s=selector2]; R_SPF_ALLOW(-0.20)[+ip4:40.107.0.0/16]; MIME_GOOD(-0.10)[text/plain]; RCVD_NO_TLS_LAST(0.10)[]; RCPT_COUNT_TWELVE(0.00)[13]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; DKIM_TRACE(0.00)[Nvidia.com:+]; TO_MATCH_ENVRCPT_SOME(0.00)[]; ARC_SIGNED(0.00)[hostedemail.com:s=arc-20220608:i=2]; HAS_XOIP(0.00)[]; MID_RHS_MATCH_FROM(0.00)[]; FROM_HAS_DN(0.00)[]; TAGGED_RCPT(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_COUNT_FIVE(0.00)[6] X-Rspamd-Queue-Id: E7C1510000F X-Rspamd-Server: rspam06 X-HE-Tag: 1670445290-420030 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 12/7/22 12:30, Peter Xu wrote: > Since walk_hugetlb_range() walks the pgtable, it needs the vma lock > to make sure the pgtable page will not be freed concurrently. > > Reviewed-by: Mike Kravetz > Signed-off-by: Peter Xu > --- > arch/s390/mm/gmap.c | 2 ++ > fs/proc/task_mmu.c | 2 ++ > include/linux/pagewalk.h | 11 ++++++++++- > mm/hmm.c | 15 ++++++++++++++- > mm/pagewalk.c | 2 ++ > 5 files changed, 30 insertions(+), 2 deletions(-) Reviewed-by: John Hubbard thanks, -- John Hubbard NVIDIA > > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > index 8947451ae021..292a54c490d4 100644 > --- a/arch/s390/mm/gmap.c > +++ b/arch/s390/mm/gmap.c > @@ -2643,7 +2643,9 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, > end = start + HPAGE_SIZE - 1; > __storage_key_init_range(start, end); > set_bit(PG_arch_1, &page->flags); > + hugetlb_vma_unlock_read(walk->vma); > cond_resched(); > + hugetlb_vma_lock_read(walk->vma); > return 0; > } > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index e35a0398db63..cf3887fb2905 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -1613,7 +1613,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, > frame++; > } > > + hugetlb_vma_unlock_read(walk->vma); > cond_resched(); > + hugetlb_vma_lock_read(walk->vma); > > return err; > } > diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h > index 959f52e5867d..27a6df448ee5 100644 > --- a/include/linux/pagewalk.h > +++ b/include/linux/pagewalk.h > @@ -21,7 +21,16 @@ struct mm_walk; > * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. > * Any folded depths (where PTRS_PER_P?D is equal to 1) > * are skipped. > - * @hugetlb_entry: if set, called for each hugetlb entry > + * @hugetlb_entry: if set, called for each hugetlb entry. This hook > + * function is called with the vma lock held, in order to > + * protect against a concurrent freeing of the pte_t* or > + * the ptl. In some cases, the hook function needs to drop > + * and retake the vma lock in order to avoid deadlocks > + * while calling other functions. In such cases the hook > + * function must either refrain from accessing the pte or > + * ptl after dropping the vma lock, or else revalidate > + * those items after re-acquiring the vma lock and before > + * accessing them. > * @test_walk: caller specific callback function to determine whether > * we walk over the current vma or not. Returning 0 means > * "do page table walk over the current vma", returning > diff --git a/mm/hmm.c b/mm/hmm.c > index 3850fb625dda..796de6866089 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, > required_fault = > hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags); > if (required_fault) { > + int ret; > + > spin_unlock(ptl); > - return hmm_vma_fault(addr, end, required_fault, walk); > + hugetlb_vma_unlock_read(vma); > + /* > + * Avoid deadlock: drop the vma lock before calling > + * hmm_vma_fault(), which will itself potentially take and > + * drop the vma lock. This is also correct from a > + * protection point of view, because there is no further > + * use here of either pte or ptl after dropping the vma > + * lock. > + */ > + ret = hmm_vma_fault(addr, end, required_fault, walk); > + hugetlb_vma_lock_read(vma); > + return ret; > } > > pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); > diff --git a/mm/pagewalk.c b/mm/pagewalk.c > index 7f1c9b274906..d98564a7be57 100644 > --- a/mm/pagewalk.c > +++ b/mm/pagewalk.c > @@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, > const struct mm_walk_ops *ops = walk->ops; > int err = 0; > > + hugetlb_vma_lock_read(vma); > do { > next = hugetlb_entry_end(h, addr, end); > pte = huge_pte_offset(walk->mm, addr & hmask, sz); > @@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, > if (err) > break; > } while (addr = next, addr != end); > + hugetlb_vma_unlock_read(vma); > > return err; > }