From: David Hildenbrand
Organization: Red Hat
Date: Wed, 30 Nov 2022 11:24:34 +0100
Subject: Re: [PATCH 03/10] mm/hugetlb: Document huge_pte_offset usage
To: Peter Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: James Houghton, Jann Horn, Andrew Morton, Andrea Arcangeli,
 Rik van Riel, Nadav Amit, Miaohe Lin, Muchun Song, Mike Kravetz
References: <20221129193526.3588187-1-peterx@redhat.com>
 <20221129193526.3588187-4-peterx@redhat.com>
In-Reply-To: <20221129193526.3588187-4-peterx@redhat.com>

On 29.11.22 20:35, Peter Xu wrote:
> huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
> hugetlb address.
>
> Normally, it's always safe to walk a generic pgtable as long as we're with
> the mmap lock held for either read or write, because that guarantees the
> pgtable pages will always be valid during the process.
>
> But it's not true for hugetlbfs, especially shared: hugetlbfs can have its
> pgtable freed by pmd unsharing, it means that even with mmap lock held for
> current mm, the PMD pgtable page can still go away from under us if pmd
> unsharing is possible during the walk.
>
> So we have two ways to make it safe even for a shared mapping:
>
>   (1) If we're with the hugetlb vma lock held for either read/write, it's
>       okay because pmd unshare cannot happen at all.
>
>   (2) If we're with the i_mmap_rwsem lock held for either read/write, it's
>       okay because even if pmd unshare can happen, the pgtable page cannot
>       be freed from under us.
>
> Document it.
>
> Signed-off-by: Peter Xu
> ---
>  include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 551834cd5299..81efd9b9baa2 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
>
>  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long addr, unsigned long sz);
> +/*
> + * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
> + * Returns the pte_t* if found, or NULL if the address is not mapped.
> + *
> + * Since this function will walk all the pgtable pages (including not only
> + * high-level pgtable page, but also PUD entry that can be unshared
> + * concurrently for VM_SHARED), the caller of this function should be
> + * responsible of its thread safety.  One can follow this rule:
> + *
> + *  (1) For private mappings: pmd unsharing is not possible, so it'll
> + *      always be safe if we're with the mmap sem for either read or write.
> + *      This is normally always the case, IOW we don't need to do anything
> + *      special.

Maybe worth mentioning that hugetlb_vma_lock_read() and friends already
optimize for private mappings, to not take the VMA lock if not required.

Was happy to spot that optimization in there already :)

-- 
Thanks,

David / dhildenb