Date: Wed, 30 Nov 2022 17:11:36 +0100
Subject: Re: [PATCH 03/10] mm/hugetlb: Document huge_pte_offset usage
From: David Hildenbrand
Organization: Red Hat
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton,
    Jann Horn, Andrew Morton, Andrea Arcangeli, Rik van Riel, Nadav Amit,
    Miaohe Lin, Muchun Song, Mike Kravetz
References: <20221129193526.3588187-1-peterx@redhat.com>
    <20221129193526.3588187-4-peterx@redhat.com>

On 30.11.22 17:09, Peter Xu wrote:
> On Wed, Nov 30, 2022 at 11:24:34AM +0100, David Hildenbrand wrote:
>> On 29.11.22 20:35, Peter Xu wrote:
>>> huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
>>> hugetlb address.
>>>
>>> Normally, it's always safe to walk a generic pgtable as long as we're with
>>> the mmap lock held for either read or write, because that guarantees the
>>> pgtable pages will always be valid during the process.
>>>
>>> But it's not true for hugetlbfs, especially shared: hugetlbfs can have its
>>> pgtable freed by pmd unsharing, it means that even with mmap lock held for
>>> current mm, the PMD pgtable page can still go away from under us if pmd
>>> unsharing is possible during the walk.
>>>
>>> So we have two ways to make it safe even for a shared mapping:
>>>
>>>     (1) If we're with the hugetlb vma lock held for either read/write, it's
>>>         okay because pmd unshare cannot happen at all.
>>>
>>>     (2) If we're with the i_mmap_rwsem lock held for either read/write, it's
>>>         okay because even if pmd unshare can happen, the pgtable page cannot
>>>         be freed from under us.
>>>
>>> Document it.
>>>
>>> Signed-off-by: Peter Xu
>>> ---
>>>   include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 32 insertions(+)
>>>
>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>> index 551834cd5299..81efd9b9baa2 100644
>>> --- a/include/linux/hugetlb.h
>>> +++ b/include/linux/hugetlb.h
>>> @@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
>>>   pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>>>               unsigned long addr, unsigned long sz);
>>> +/*
>>> + * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
>>> + * Returns the pte_t* if found, or NULL if the address is not mapped.
>>> + *
>>> + * Since this function will walk all the pgtable pages (including not only
>>> + * high-level pgtable page, but also PUD entry that can be unshared
>>> + * concurrently for VM_SHARED), the caller of this function should be
>>> + * responsible of its thread safety.  One can follow this rule:
>>> + *
>>> + *  (1) For private mappings: pmd unsharing is not possible, so it'll
>>> + *      always be safe if we're with the mmap sem for either read or write.
>>> + *      This is normally always the case, IOW we don't need to do anything
>>> + *      special.
>>
>> Maybe worth mentioning that hugetlb_vma_lock_read() and friends already
>> optimize for private mappings, to not take the VMA lock if not required.
>
> Yes we can.  I assume this is not super urgent so I'll hold a while to see
> whether there's anything else that needs amending for the documents.
>
> Btw, even with hugetlb_vma_lock_read() checking SHARED for a private only
> code path it's still better to not take the lock at all, because that still
> contains a function jump which will be unnecessary.

IMHO it makes coding a lot more consistent and less error-prone when callers
do not have to care about whether to take the lock or not (as an
optimization) and just have this handled "automatically".

Optimizing a jump out would rather smell like a micro-optimization.
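
To illustrate what I mean by "automatically", here is a rough userspace
sketch, not kernel code. All toy_* names are made up for this example, the
"lock" is only a reader counter standing in for the real vma lock, and the
shared check is an assumption that simplifies whatever
hugetlb_vma_lock_read() really tests. The point is only that the helper
decides whether locking is needed, so the caller uses one sequence for
private and shared mappings alike:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hugetlb VMA; not the kernel's struct. */
struct toy_vma {
	bool shared;            /* models a VM_SHARED mapping */
	atomic_int readers;     /* models readers currently holding the vma lock */
	atomic_int lock_calls;  /* demo bookkeeping: times the lock was really taken */
};

static void toy_vma_init(struct toy_vma *vma, bool shared)
{
	vma->shared = shared;
	atomic_init(&vma->readers, 0);
	atomic_init(&vma->lock_calls, 0);
}

/* Take the read side only when pmd sharing is possible at all. */
static void toy_vma_lock_read(struct toy_vma *vma)
{
	if (!vma->shared)
		return;	/* private mapping: nothing to protect against */
	atomic_fetch_add(&vma->readers, 1);
	atomic_fetch_add(&vma->lock_calls, 1);
}

static void toy_vma_unlock_read(struct toy_vma *vma)
{
	if (!vma->shared)
		return;
	atomic_fetch_sub(&vma->readers, 1);
}

/* Caller side: one sequence, no private/shared special-casing. */
static int toy_lookup(struct toy_vma *vma, const int *table, int idx)
{
	int val;

	toy_vma_lock_read(vma);
	/* Shared mappings must be covered by the lock during the walk. */
	assert(!vma->shared || atomic_load(&vma->readers) > 0);
	val = table[idx];	/* stands in for the huge_pte_offset() walk */
	toy_vma_unlock_read(vma);
	return val;
}
```

A caller that open-coded the "is this private?" check would save the call
into toy_vma_lock_read(), but every such caller would then have to get that
check right on its own.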

-- 
Thanks,

David / dhildenb