Date: Thu, 8 Dec 2022 16:01:48 -0500
From: Peter Xu
To: John Hubbard
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jann Horn,
    Andrea Arcangeli, James Houghton, Rik van Riel, Miaohe Lin,
    Nadav Amit, Mike Kravetz, David Hildenbrand, Andrew Morton,
    Muchun Song
Subject: Re: [PATCH v2 09/10] mm/hugetlb: Introduce hugetlb_walk()
References: <20221207203034.650899-1-peterx@redhat.com>
    <20221207203156.651077-1-peterx@redhat.com>
    <3b5579e8-8e51-a2e2-4c93-6709f8833723@nvidia.com>
In-Reply-To: <3b5579e8-8e51-a2e2-4c93-6709f8833723@nvidia.com>

On Wed, Dec 07, 2022 at 04:12:31PM -0800, John Hubbard wrote:
> On 12/7/22 12:31, Peter Xu wrote:
> > huge_pte_offset() is the main walker function for hugetlb pgtables. The
> > name is not really representing what it does, though.
> >
> > Instead of renaming it, introduce a wrapper function called hugetlb_walk()
> > which will use huge_pte_offset() inside. Assert on the locks when walking
> > the pgtable.
> >
> > Note, the vma lock assertion will be a no-op for private mappings.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  fs/hugetlbfs/inode.c    |  4 +---
> >  fs/userfaultfd.c        |  6 ++----
> >  include/linux/hugetlb.h | 39 +++++++++++++++++++++++++++++++++++++++
> >  mm/hugetlb.c            | 32 +++++++++++++-------------------
> >  mm/page_vma_mapped.c    |  2 +-
> >  mm/pagewalk.c           |  4 +---
> >  6 files changed, 57 insertions(+), 30 deletions(-)
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index fdb16246f46e..48f1a8ad2243 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -388,9 +388,7 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
> >  {
> >  	pte_t *ptep, pte;
> > -	ptep = huge_pte_offset(vma->vm_mm, addr,
> > -			huge_page_size(hstate_vma(vma)));
> > -
> > +	ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma)));
> >  	if (!ptep)
> >  		return false;
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index a602f008dde5..f31fe1a9f4c5 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -237,14 +237,12 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> >  					 unsigned long flags,
> >  					 unsigned long reason)
> >  {
> > -	struct mm_struct *mm = ctx->mm;
> >  	pte_t *ptep, pte;
> >  	bool ret = true;
> > -	mmap_assert_locked(mm);
> > -
> > -	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
> > +	mmap_assert_locked(ctx->mm);
> > +	ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
> >  	if (!ptep)
> >  		goto out;
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 81efd9b9baa2..1c20cbbf3d22 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -2,6 +2,7 @@
> >  #ifndef _LINUX_HUGETLB_H
> >  #define _LINUX_HUGETLB_H
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -196,6 +197,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> >   * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
> >   * Returns the pte_t* if found, or NULL if the address is not mapped.
> >   *
> > + * IMPORTANT: we should normally not directly call this function, instead
> > + * this is only a common interface to implement arch-specific walker.
> > + * Please consider using the hugetlb_walk() helper to make sure of the
> > + * correct locking is satisfied.
>
> Or:
>
> "Please use hugetlb_walk() instead, because that will attempt to verify
> the locking for you."

Ok.

>
> > + *
> >   * Since this function will walk all the pgtable pages (including not only
> >   * high-level pgtable page, but also PUD entry that can be unshared
> >   * concurrently for VM_SHARED), the caller of this function should be
> > @@ -1229,4 +1235,37 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
> >  #define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
> >  #endif
> > +static inline bool
> > +__vma_shareable_flags_pmd(struct vm_area_struct *vma)
> > +{
> > +	return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
> > +		vma->vm_private_data;
> > +}
> > +
> > +/*
> > + * Safe version of huge_pte_offset() to check the locks. See comments
> > + * above huge_pte_offset().
> > + */
>
> It is odd to say that functionA() is a safe version of functionB(), if the
> names are completely different.
>
> At this point, it is very clear that huge_pte_offset() should be renamed.
> I'd suggest something like one of these:
>
>     __hugetlb_walk()
>     hugetlb_walk_raw()

We can. I didn't rename it here, not only because huge_pte_offset() has
been an arch API for years (I didn't want to touch more arch code than
necessary), but also because hugetlb_walk() will be the future interface,
not huge_pte_offset().

Actually it's good if hugetlb_walk() is the only thing people can find by
name when they want to do a huge pgtable walk. :)

So it totally makes sense to rename it, but I don't feel a strong need to
do it in this patchset, if you're okay with that.

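To make the raw-walker vs. checked-wrapper split concrete, here is a
minimal userspace sketch. It is not kernel code: struct toy_vma,
toy_walk_raw(), toy_walk() and the toy_lock_warnings counter are all
hypothetical stand-ins (for the vma, huge_pte_offset(), hugetlb_walk()
and WARN_ON_ONCE() respectively), and the "lock held" flag only models
what lockdep would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma; none of this is a kernel type. */
struct toy_vma {
	bool shareable;		/* models __vma_shareable_flags_pmd() */
	bool lock_held;		/* models "vma lock or i_mmap_rwsem held" */
	int *table;		/* models the page table */
	size_t nr_entries;
};

/* Counts how often a walk happened without the required lock. */
static unsigned int toy_lock_warnings;

/* Raw walker: does no locking checks at all (the huge_pte_offset() role). */
static int *toy_walk_raw(struct toy_vma *vma, size_t idx)
{
	if (idx >= vma->nr_entries)
		return NULL;	/* "address not mapped" */
	return &vma->table[idx];
}

/*
 * Checked wrapper (the hugetlb_walk() role): complain when a shareable
 * mapping is walked without a lock, then delegate to the raw walker.
 * Private (non-shareable) mappings skip the check entirely.
 */
static int *toy_walk(struct toy_vma *vma, size_t idx)
{
	if (vma->shareable && !vma->lock_held)
		toy_lock_warnings++;	/* stands in for WARN_ON_ONCE() */
	return toy_walk_raw(vma, idx);
}
```

Callers would only ever use toy_walk(); the raw function stays around
purely as the backend, which is the same relationship as between
hugetlb_walk() and huge_pte_offset() in the patch.
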
>
> > +static inline pte_t *
> > +hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
> > +{
> > +#if defined(CONFIG_HUGETLB_PAGE) && \
> > +	defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP)
> > +	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
> > +
> > +	/*
> > +	 * If pmd sharing possible, locking needed to safely walk the
> > +	 * hugetlb pgtables. More information can be found at the comment
> > +	 * above huge_pte_offset() in the same file.
> > +	 *
> > +	 * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP.
> > +	 */
> > +	if (__vma_shareable_flags_pmd(vma))
> > +		WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) &&
> > +			     !lockdep_is_held(
> > +				 &vma->vm_file->f_mapping->i_mmap_rwsem));
> > +#endif
> > +	return huge_pte_offset(vma->vm_mm, addr, sz);
> > +}
>
> Let's please not slice up C functions with ifdefs. Instead, stick to the
> standard approach of
>
> #ifdef X
> functionC()
> {
> 	...implementation
> }
> #else
> functionC()
> {
> 	...simpler or shorter or stub implementation
> }

Personally I like the slicing because it clearly shows what the
difference is with and without the macros defined.

Thanks,

-- 
Peter Xu
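
For reference, the two #ifdef styles discussed above can be sketched in
standalone C as below. This is only an illustration under assumptions:
TOY_DEBUG_LOCKS is a hypothetical switch standing in for the CONFIG_*
combination, and toy_warnings, lookup_sliced(), toy_assert_locked() and
lookup_split() are made-up names. The "split" variant here stubs out a
small helper rather than duplicating the whole function, which is one
common way to apply the suggestion.

```c
#include <stdbool.h>

/* Hypothetical config switch; remove this line to compile the checks out. */
#define TOY_DEBUG_LOCKS 1

/* Counts failed lock checks; stands in for WARN_ON_ONCE() output. */
static unsigned int toy_warnings;

/* Style 1: one function body, sliced by #ifdef. */
static int lookup_sliced(bool lock_held, int value)
{
#ifdef TOY_DEBUG_LOCKS
	if (!lock_held)
		toy_warnings++;
#else
	(void)lock_held;	/* unused when the check is compiled out */
#endif
	return value;
}

/* Style 2: the #ifdef picks between a real helper and a stub. */
#ifdef TOY_DEBUG_LOCKS
static void toy_assert_locked(bool lock_held)
{
	if (!lock_held)
		toy_warnings++;
}
#else
static void toy_assert_locked(bool lock_held)
{
	(void)lock_held;	/* stub: nothing to check */
}
#endif

/* The caller stays free of preprocessor conditionals. */
static int lookup_split(bool lock_held, int value)
{
	toy_assert_locked(lock_held);
	return value;
}
```

Both variants behave the same; the trade-off is whether the difference
between the two configurations is visible inside the function (style 1)
or kept out of the function body (style 2).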