From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BCE2EC4332F for ; Thu, 8 Dec 2022 21:20:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4C2C68E0003; Thu, 8 Dec 2022 16:20:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 473468E0001; Thu, 8 Dec 2022 16:20:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 33B158E0003; Thu, 8 Dec 2022 16:20:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 25ED58E0001 for ; Thu, 8 Dec 2022 16:20:45 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id F0F311C693C for ; Thu, 8 Dec 2022 21:20:44 +0000 (UTC) X-FDA: 80220408408.16.BA1034A Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf15.hostedemail.com (Postfix) with ESMTP id 14AC8A0005 for ; Thu, 8 Dec 2022 21:20:41 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=erWUfQ9n; spf=pass (imf15.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1670534442; a=rsa-sha256; cv=none; b=10/vvXt5mse/tejLQOIZLdsDRIwD6/XvHCVQo4IgjO3mjEPWa69t6IfnEi0AvJwAuFXP43 gH6YPqQzs8p8r0fgRC0ROlwgoyUI51L/a9EiCyz1Z9z+Ca3BGDtHUXXqY02THJcZrF+LNC WJpNLFEHYlRVoc15rJZgBVYAIL7l9K4= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=erWUfQ9n; spf=pass (imf15.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1670534442; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gdt6sN4IybB7crh5R7stYpZnodJiuuI/ros/sx/Fs0k=; b=mLyxAAcmtTdCn4+2gdGjcj1gdEzPEFY1qsMDCTHF482D4D9ez/RLqHL+DohzL52zfi1Uvt aOvA8o2KIAoIe4xTpIpDmkl4w6Jgk4gaCHEAI1iG6O3nYN/HgNZoqX9+WuRQvuI4x1qgG4 Nm41zVPUui3ex0CFnrZ34ZS/E16+3wE= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670534441; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=gdt6sN4IybB7crh5R7stYpZnodJiuuI/ros/sx/Fs0k=; b=erWUfQ9n2+cCM5TIc4/M0PXMXNfrgDfcl6RslvHuJSAnKBHJLqwHN7L9XWDoF30t8tI6cD q6uK+b+sDyHiJPBFiEHQ/1Iai5PKlFq9i6t7KKNRE8GAoXIQIN14BMFxjMDy6JW+Egtq6g 3FrkvgJ0/vRU+qDjDBubxWLqILBA/Os= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-111-F6nzUmKhNRq0vJt3zkfpkw-1; Thu, 08 Dec 2022 16:20:32 -0500 X-MC-Unique: F6nzUmKhNRq0vJt3zkfpkw-1 Received: by mail-qt1-f200.google.com with SMTP id f4-20020a05622a114400b003a57f828277so2563927qty.22 for ; Thu, 08 Dec 2022 13:20:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=gdt6sN4IybB7crh5R7stYpZnodJiuuI/ros/sx/Fs0k=; b=L+Aa/9Te6nQ8Bv27n64bV9blyIu6mMNMt0a8t+EqDHwC5XCFcNEGMEwtUFahyr/hXp wDb+mylZL5dBp7EwG8GRyV3bzjlY0GNYhlS8AMuZ+l1ht+tknGE8a5a8NT19cpGH180t 4YhoM1mJoMl/Ci+Sn1m5SjDYZx3ngfGz3gnxVV+aeif1CEnMMv8mr3ZzEsPl1llW9Xm4 20gGbTQtQiGt3P4/TE8TdRlqzl3qykGoaGDVVzdfvy+JNEpqyQKfDR82Q3QyCUC1txWi KhjXEd4D9vowN6lrBR/NoIPM9Jih+DqkM4RpyWA6RX8HVoPXESGIMFyicLAXmU/4Utz7 YefQ== X-Gm-Message-State: ANoB5pk4mIS/pQ+fH5ASHRStHB35fH1mEXWsME8nS/iv298NXG9r0pId unzHlUwwhArG7tXBaYhSFxAjudcjT+tI12tpX3otdGrTfg+hNmyvzW6RZxTajhNW2Wu2dvKlLPD la3Mk2VVHSN0= X-Received: by 2002:ac8:5386:0:b0:3a5:7036:b838 with SMTP id x6-20020ac85386000000b003a57036b838mr5168749qtp.33.1670534431146; Thu, 08 Dec 2022 13:20:31 -0800 (PST) X-Google-Smtp-Source: AA0mqf5u4IKVKdDpiMUh91B1+jG0P07DsxqalynZ1vRQCxfobY15hLx4DV38bB3WC0AkLIIUlb+esQ== X-Received: by 2002:ac8:5386:0:b0:3a5:7036:b838 with SMTP id x6-20020ac85386000000b003a57036b838mr5168729qtp.33.1670534430846; Thu, 08 Dec 2022 13:20:30 -0800 (PST) Received: from x1n (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id l15-20020a37f90f000000b006fc447eebe5sm19578767qkj.27.2022.12.08.13.20.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Dec 2022 13:20:29 -0800 (PST) Date: Thu, 8 Dec 2022 16:20:27 -0500 From: Peter Xu To: David Hildenbrand Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song , John Hubbard , Andrea Arcangeli , James Houghton , Jann Horn , Rik van Riel , Miaohe Lin , Andrew Morton , Mike Kravetz , Nadav Amit Subject: Re: [PATCH v2 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare Message-ID: References: <20221207203034.650899-1-peterx@redhat.com> <20221207203034.650899-9-peterx@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 14AC8A0005 X-Stat-Signature: kyeshwkcq9bj16bo858b6p6i9gzhb9qg X-HE-Tag: 1670534441-656199 X-HE-Meta: U2FsdGVkX18/SQ8XViY7eMO2iwrzfzzOTVSYnX5U4TPNi2b4ijmwWGxmniTNPTfn8IKSRyjJvfI96x46QwyKdtNSxdjHD+WNJu9A2HrU26BcBxB7hFP2P1mwJqJA2xhdIITa0kQ4uY9AKCb9Nb3Dkyr0aJ35jI2mez8jFVR4nV8IGB5iDOgUGgeo/gjX4gSXRwElKNV48aPD8IQ1xzTPFxwr6AFmjajsaP9Jz5f8X+y6q7lsC8v6mYk6/BsPEPH+iuXugvyzkyT57YWJuf3HtzqYMMdiF9mXx3yQsoY5pPkEloA5VTCbfDiQTV/4WM2GUr90xvB89t65oyMa17MTGpv/8TwQM3TroA+P9btCzr8yy8RVyu7SwBLQHuHX5/+bHj4XFLJOmdBm/IkT0uVQrypY/B9G4yr4LP1iW4LLBaDCYuxMFe/xdyqcEGouQVZi2CqSanLtmh7nXsWjqBtsRQKC+jyxxzK9g6cFQNXeo0NfuCPC5IDYm+y2U1YL0tgTyn1S4Q7Dh6yrPsNnO5NRDLZq/fjGgC1zIG6//vwxwp1HiulSIGV/6lL0sCNLKnzUHZL+Eaw04ti+v0czzMyMCtjyBFjLS0HQFnxp/3n9yFFT3smzL5Fn+TEP5r+QYuVYcmKQ4r+yWYuS/W+r+gOR+iJxboJCdZGh+aolE3EP/hjga59dP5yl5c5NFA0OVXFbL8kMvNnAlu2qvXVZYmVgxMeBHXI1gU64NsM/GvHvMHis5DIhP4i7v4XKgJPfzeJFb3aMJIpHIP5sLLtBPJiYdyHERcrM7uT/Y7BOyf7HF4hAV4rTcNMrXBdScut3LczWrtGttYfDKPxUqN9i7pHdf9GMtB+Cv4Fya67HhPS5ZYgTdzRXUdipo9hMURt3FxOiX1vkCY0xtr5byW7FnA/L34ZHfNspq4EtQwSSyBVb3Q19LC1I4r0lwdcw3fIz9g5hI82Q8Ls9IJ7e+rBAsGo cD81FwTL 2r81TmWgiIDjPan9By0Ko8JMiQZX+9ifW4QFV68lIFGjgqTZU74OI3e3UcoEi79desEKq4mTFQ1bkRmOfw69BXtX1Lm5f/xrdkfFGElzOBIxEosEeah93orQfK3MwyfxSxf0cGVT5l38pColCZWfAPD7vXfvOhd+sMByY33jYsrzyaR9hgQJjYJepv+bBLjEsIDOKFJPKKCgsWnLptNLlGk5bixnt780oByNO X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Dec 08, 2022 at 03:47:26PM -0500, Peter Xu wrote: > On Thu, Dec 08, 2022 at 02:14:42PM +0100, David Hildenbrand wrote: > > On 07.12.22 21:30, Peter Xu wrote: > > > Since walk_hugetlb_range() walks the pgtable, it needs the vma lock > > > to make sure the pgtable page will not be freed concurrently. > > > > > > Reviewed-by: Mike Kravetz > > > Signed-off-by: Peter Xu > > > --- > > > arch/s390/mm/gmap.c | 2 ++ > > > fs/proc/task_mmu.c | 2 ++ > > > include/linux/pagewalk.h | 11 ++++++++++- > > > mm/hmm.c | 15 ++++++++++++++- > > > mm/pagewalk.c | 2 ++ > > > 5 files changed, 30 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > > > index 8947451ae021..292a54c490d4 100644 > > > --- a/arch/s390/mm/gmap.c > > > +++ b/arch/s390/mm/gmap.c > > > @@ -2643,7 +2643,9 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, > > > end = start + HPAGE_SIZE - 1; > > > __storage_key_init_range(start, end); > > > set_bit(PG_arch_1, &page->flags); > > > + hugetlb_vma_unlock_read(walk->vma); > > > cond_resched(); > > > + hugetlb_vma_lock_read(walk->vma); > > > return 0; > > > } > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > index e35a0398db63..cf3887fb2905 100644 > > > --- a/fs/proc/task_mmu.c > > > +++ b/fs/proc/task_mmu.c > > > @@ -1613,7 +1613,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, > > > frame++; > > > } > > > + hugetlb_vma_unlock_read(walk->vma); > > > cond_resched(); > > > + hugetlb_vma_lock_read(walk->vma); > > > > We already hold the mmap_lock and reschedule. Even without the > > cond_resched() we might happily reschedule on a preemptive kernel. So I'm > > not sure if this optimization is strictly required or even helpful in > > practice here. > > It's just low hanging fruit if we need that complexity anyway. > > That's also why I didn't do that for v1 (where I missed hmm special case, > though..), but I think since we'll need that anyway, we'd better release > the vma lock if we can easily do so. > > mmap_lock is just more special because it needs more work in the caller to > release (e.g. vma invalidations). Otherwise I'm happy dropping that too. > > > > > In the worst case, concurrent unsharing would have to wait. > > For example, s390_enable_skey() is called at most once for a VM, for most > > VMs it gets never even called. > > > > Or am I missing something important? > > Nothing important. I just don't see why we need to strictly follow the > same release rule of mmap_lock here when talking about vma lock. > > In short - if we can drop a lock earlier before sleep, why not? > > I tend to just keep it as-is, but let me know if you have further thoughts > or concerns. One thing I can do better here is: - cond_resched(); + + if (need_resched()) { + hugetlb_vma_unlock_read(walk->vma); + cond_resched(); + hugetlb_vma_lock_read(walk->vma); + } + It's a pity we don't have rwsem version of cond_resched_rwlock_read(), or it'll be even cleaner. -- Peter Xu