Date: Tue, 6 Dec 2022 11:45:09 -0500
From: Peter Xu
To: John Hubbard
Cc: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	James Houghton, Jann Horn, Andrew Morton, Andrea Arcangeli,
	Rik van Riel, Nadav Amit, Miaohe Lin, Muchun Song, David Hildenbrand
Subject: Re: [PATCH 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
References: <20221129193526.3588187-1-peterx@redhat.com>
	<20221129193526.3588187-9-peterx@redhat.com>
	<0813b9ed-3c92-088c-4fb9-45fb648c6e73@nvidia.com>
In-Reply-To: <0813b9ed-3c92-088c-4fb9-45fb648c6e73@nvidia.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="cAL/YfRa6VIe+3z7"
Content-Disposition: inline

--cAL/YfRa6VIe+3z7
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Mon, Dec 05, 2022 at 03:52:51PM -0800, John Hubbard wrote:
> On 12/5/22 15:33, Mike Kravetz wrote:
> > On 11/29/22 14:35, Peter Xu wrote:
> > > Since walk_hugetlb_range() walks the pgtable, it needs the vma lock
> > > to make sure the pgtable page will not be freed concurrently.
> > >
> > > Signed-off-by: Peter Xu
> > > ---
> > >  mm/pagewalk.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > > index 7f1c9b274906..d98564a7be57 100644
> > > --- a/mm/pagewalk.c
> > > +++ b/mm/pagewalk.c
> > > @@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> > >  	const struct mm_walk_ops *ops = walk->ops;
> > >  	int err = 0;
> > > +	hugetlb_vma_lock_read(vma);
> > >  	do {
> > >  		next = hugetlb_entry_end(h, addr, end);
> > >  		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
> >
> > For each found pte, we will be calling mm_walk_ops->hugetlb_entry() with
> > the vma_lock held. I looked into the various hugetlb_entry routines, and
> > I am not sure about hmm_vma_walk_hugetlb_entry. It seems like it could
> > possibly call hmm_vma_fault -> handle_mm_fault -> hugetlb_fault. If this
> > can happen, then we may have an issue as hugetlb_fault will also need to
> > acquire the vma_lock in read mode.

Thanks for spotting that, Mike.

I did notice that this path was special, but that was back when I was still
using RCU locks, which don't have this issue. I then overlooked it when
switching over to the vma lock.

> >
> > I do not know the hmm code well enough to know if this may be an actual
> > issue?
>
> Oh, this sounds like a serious concern. If we add a new lock, and hold it
> during callbacks that also need to take it, that's not going to work out,
> right?
>
> And yes, hmm_range_fault() and related things do a good job of revealing
> this kind of deadlock. :)

I've got a fixup attached. John, since this caught your attention, please
have a look as well in case there are further issues.

Thanks,

-- 
Peter Xu

--cAL/YfRa6VIe+3z7
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment;
	filename="0001-fixup-mm-hugetlb-Make-walk_hugetlb_range-safe-to-pmd.patch"

From 9ad1e65a31f51a0dc687cd9d6083b9e920d2da61 Mon Sep 17 00:00:00 2001
From: Peter Xu
Date: Tue, 6 Dec 2022 11:38:47 -0500
Subject: [PATCH] fixup! mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
Content-type: text/plain

Signed-off-by: Peter Xu
---
 arch/s390/mm/gmap.c      | 2 ++
 fs/proc/task_mmu.c       | 2 ++
 include/linux/pagewalk.h | 8 +++++++-
 mm/hmm.c                 | 8 +++++++-
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8947451ae021..292a54c490d4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2643,7 +2643,9 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
 	end = start + HPAGE_SIZE - 1;
 	__storage_key_init_range(start, end);
 	set_bit(PG_arch_1, &page->flags);
+	hugetlb_vma_unlock_read(walk->vma);
 	cond_resched();
+	hugetlb_vma_lock_read(walk->vma);
 	return 0;
 }
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 89338950afd3..d7155f3bb678 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1612,7 +1612,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		frame++;
 	}
 
+	hugetlb_vma_unlock_read(walk->vma);
 	cond_resched();
+	hugetlb_vma_lock_read(walk->vma);
 
 	return err;
 }
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 959f52e5867d..1f7c2011f6cb 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -21,7 +21,13 @@ struct mm_walk;
  *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
  *			Any folded depths (where PTRS_PER_P?D is equal to 1)
  *			are skipped.
- * @hugetlb_entry:	if set, called for each hugetlb entry
+ * @hugetlb_entry:	if set, called for each hugetlb entry. Note that
+ *			currently the hook function is protected by hugetlb
+ *			vma lock to make sure pte_t* and the spinlock is valid
+ *			to access.  If the hook function needs to yield the
+ *			thread or retake the vma lock for some reason, it
+ *			needs to properly release the vma lock manually,
+ *			and retake it before the function returns.
  * @test_walk:		caller specific callback function to determine whether
  *			we walk over the current vma or not. Returning 0 means
  *			"do page table walk over the current vma", returning
diff --git a/mm/hmm.c b/mm/hmm.c
index 3850fb625dda..dcd624f28bcf 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -493,8 +493,14 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
+		int ret;
+
 		spin_unlock(ptl);
-		return hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_unlock_read(vma);
+		/* hmm_vma_fault() can retake the vma lock */
+		ret = hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_lock_read(vma);
+		return ret;
 	}
 
 	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
-- 
2.37.3

--cAL/YfRa6VIe+3z7--
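
The rule the fixup documents for @hugetlb_entry (release the vma lock before
doing anything that may yield or take the lock again, and retake it before
returning) can be sketched in userspace. This is only a toy model under
stated assumptions, not kernel code: struct fake_vma, vma_lock_read(),
vma_unlock_read(), fake_fault(), entry_cb(), walk_entries() and
demo_max_depth() are all hypothetical names, and the "lock" is just a nesting
counter, so a nested read acquisition shows up as a depth of 2 rather than as
a possible deadlock.

```c
#include <assert.h>

/* Hypothetical stand-in for a hugetlb VMA; not a kernel type. */
struct fake_vma {
	int read_depth;	/* current read-lock nesting (toy model of the vma lock) */
	int max_depth;	/* deepest nesting observed so far */
};

/* Toy read-lock: only tracks nesting, it never blocks. */
static void vma_lock_read(struct fake_vma *vma)
{
	if (++vma->read_depth > vma->max_depth)
		vma->max_depth = vma->read_depth;
}

static void vma_unlock_read(struct fake_vma *vma)
{
	assert(vma->read_depth > 0);	/* unlock must pair with a lock */
	vma->read_depth--;
}

/* Models a fault path that takes the read lock on its own. */
static int fake_fault(struct fake_vma *vma)
{
	vma_lock_read(vma);
	vma_unlock_read(vma);
	return 0;
}

/* Callback obeying the rule: drop the lock, do the work, retake it. */
static int entry_cb(struct fake_vma *vma, int needs_fault)
{
	int ret = 0;

	if (needs_fault) {
		vma_unlock_read(vma);
		ret = fake_fault(vma);	/* may take the lock itself */
		vma_lock_read(vma);	/* caller expects it held on return */
	}
	return ret;
}

/* Walker: holds the read lock across every callback invocation. */
static int walk_entries(struct fake_vma *vma, const int *needs_fault, int n)
{
	int err = 0;

	vma_lock_read(vma);
	for (int i = 0; i < n && !err; i++)
		err = entry_cb(vma, needs_fault[i]);
	vma_unlock_read(vma);
	return err;
}

/* Runs one walk and reports the deepest read-lock nesting seen. */
static int demo_max_depth(void)
{
	struct fake_vma vma = { 0, 0 };
	const int needs_fault[] = { 0, 1, 0, 1 };

	walk_entries(&vma, needs_fault, 4);
	assert(vma.read_depth == 0);	/* lock fully released after the walk */
	return vma.max_depth;
}
```

With the drop-and-retake in entry_cb(), demo_max_depth() should report a
nesting depth of 1. Removing the unlock/lock pair there lets fake_fault()
nest inside the walker's lock and the depth reaches 2, which is the situation
the fixup is meant to avoid.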