Message-ID: <4c24b891-04ce-2608-79d2-a75dc236533f@redhat.com>
Date: Tue, 23 Aug 2022 10:29:05 +0200
From: David Hildenbrand
Organization: Red Hat
To: Baolin Wang, akpm@linux-foundation.org, songmuchun@bytedance.com,
 mike.kravetz@oracle.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size hugetlb page
In-Reply-To: <0e5d92da043d147a867f634b17acbcc97a7f0e64.1661240170.git.baolin.wang@linux.alibaba.com>
References: <0e5d92da043d147a867f634b17acbcc97a7f0e64.1661240170.git.baolin.wang@linux.alibaba.com>

On 23.08.22 09:50, Baolin Wang wrote:
> Some architectures (like ARM64) can support CONT-PTE/PMD size hugetlb,
> which means they can support not only PMD/PUD size hugetlb pages
> (2M and 1G), but also CONT-PTE/PMD size pages (64K and 32M) when a 4K
> base page size is used.
>
> So when follow_page() looks up a CONT-PTE size hugetlb page,
> follow_page_pte() uses pte_offset_map_lock() to take the pte entry
> lock for it. However, that pte entry lock is the wrong one for a
> CONT-PTE size hugetlb page: we should use huge_pte_lock() to take the
> correct lock, which is mm->page_table_lock.
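
Just so the locking claim above is easy to follow: unless I misremember
the generic code, huge_pte_lock()/huge_pte_lockptr() in
include/linux/hugetlb.h boil down to roughly the following (paraphrased
sketch from memory, not a verbatim copy):

static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                           struct mm_struct *mm, pte_t *pte)
{
        /* Only PMD-sized hugetlb uses the split PMD lock ... */
        if (huge_page_size(h) == PMD_SIZE)
                return pmd_lockptr(mm, (pmd_t *) pte);
        /*
         * ... everything else, including the CONT-PTE/CONT-PMD sizes,
         * is serialized by mm->page_table_lock.
         */
        return &mm->page_table_lock;
}

static inline spinlock_t *huge_pte_lock(struct hstate *h,
                                        struct mm_struct *mm, pte_t *pte)
{
        spinlock_t *ptl = huge_pte_lockptr(h, mm, pte);

        spin_lock(ptl);
        return ptl;
}

IOW, pte_offset_map_lock() in follow_page_pte() takes the split
PTE-table lock, which is simply not the lock hugetlb code uses for
CONT-PTE entries.
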
> That means the pte entry of a CONT-PTE size hugetlb page is unstable
> under the lock currently taken in follow_page_pte(): migration or
> poisoning of that pte entry can still proceed concurrently, which can
> cause potential races, and the subsequent pte_xxx() checks are likewise
> unstable in follow_page_pte(), even though they run under the 'pte lock'.
>
> Moreover, we should use huge_ptep_get() to read the pte entry value of
> a CONT-PTE size hugetlb page, since it already folds in the subpages'
> dirty and young bits; otherwise we could miss the dirty or young state
> of the CONT-PTE size hugetlb page.
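
For readers who don't have the arm64 bits paged in: huge_ptep_get()
there walks all PTEs backing the contiguous range and folds the
per-subpage dirty/young bits into the value it returns, roughly like
this (quoting arch/arm64/mm/hugetlbpage.c from memory, so details may
be slightly off):

pte_t huge_ptep_get(pte_t *ptep)
{
        int ncontig, i;
        size_t pgsize;
        pte_t orig_pte = ptep_get(ptep);

        /* Non-present or non-contiguous entries need no fixup. */
        if (!pte_present(orig_pte) || !pte_cont(orig_pte))
                return orig_pte;

        /* Collect dirty/young from every PTE of the CONT range. */
        ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
        for (i = 0; i < ncontig; i++, ptep++) {
                pte_t pte = ptep_get(ptep);

                if (pte_dirty(pte))
                        orig_pte = pte_mkdirty(orig_pte);
                if (pte_young(pte))
                        orig_pte = pte_mkyoung(orig_pte);
        }
        return orig_pte;
}

A plain "pte = *ptep" only sees the head PTE and could miss dirty/young
bits that the hardware set on one of the other subpage PTEs.
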
> To fix the above issues, introduce a new helper follow_huge_pte() to
> look up a CONT-PTE size hugetlb page. It uses huge_pte_lock() to take
> the correct pte entry lock so that the pte entry stays stable, and it
> also supports non-present pte handling.
>
> Signed-off-by: Baolin Wang
> ---
>  include/linux/hugetlb.h |  8 ++++++++
>  mm/gup.c                | 11 ++++++++++
>  mm/hugetlb.c            | 53 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 72 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 3ec981a..d491138 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -207,6 +207,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
>  struct page *follow_huge_pd(struct vm_area_struct *vma,
>                              unsigned long address, hugepd_t hpd,
>                              int flags, int pdshift);
> +struct page *follow_huge_pte(struct vm_area_struct *vma, unsigned long address,
> +                             pmd_t *pmd, int flags);
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>                               pmd_t *pmd, int flags);
>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
> @@ -312,6 +314,12 @@ static inline struct page *follow_huge_pd(struct vm_area_struct *vma,
>          return NULL;
>  }
>
> +static inline struct page *follow_huge_pte(struct vm_area_struct *vma,
> +                unsigned long address, pmd_t *pmd, int flags)
> +{
> +        return NULL;
> +}
> +
>  static inline struct page *follow_huge_pmd(struct mm_struct *mm,
>                  unsigned long address, pmd_t *pmd, int flags)
>  {
> diff --git a/mm/gup.c b/mm/gup.c
> index 3b656b7..87a94f5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -534,6 +534,17 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>          if (unlikely(pmd_bad(*pmd)))
>                  return no_page_table(vma, flags);
>
> +        /*
> +         * Considering PTE level hugetlb, like continuous-PTE hugetlb on
> +         * ARM64 architecture.
> +         */
> +        if (is_vm_hugetlb_page(vma)) {
> +                page = follow_huge_pte(vma, address, pmd, flags);
> +                if (page)
> +                        return page;
> +                return no_page_table(vma, flags);
> +        }
> +
>          ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>          pte = *ptep;
>          if (!pte_present(pte)) {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6c00ba1..cf742d1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6981,6 +6981,59 @@ struct page * __weak
>          return NULL;
>  }
>
> +/* Support looking up a CONT-PTE size hugetlb page. */
> +struct page * __weak
> +follow_huge_pte(struct vm_area_struct *vma, unsigned long address,
> +                pmd_t *pmd, int flags)
> +{
> +        struct mm_struct *mm = vma->vm_mm;
> +        struct hstate *hstate = hstate_vma(vma);
> +        unsigned long size = huge_page_size(hstate);
> +        struct page *page = NULL;
> +        spinlock_t *ptl;
> +        pte_t *ptep, pte;
> +
> +        /*
> +         * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> +         * follow_hugetlb_page().
> +         */
> +        if (WARN_ON_ONCE(flags & FOLL_PIN))
> +                return NULL;
> +
> +        ptep = huge_pte_offset(mm, address, size);
> +        if (!ptep)
> +                return NULL;
> +
> +retry:
> +        ptl = huge_pte_lock(hstate, mm, ptep);
> +        pte = huge_ptep_get(ptep);
> +        if (pte_present(pte)) {
> +                page = pte_page(pte);
> +                if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +                        page = NULL;
> +                        goto out;
> +                }
> +        } else {
> +                if (!(flags & FOLL_MIGRATION)) {
> +                        page = NULL;
> +                        goto out;
> +                }
> +
> +                if (is_hugetlb_entry_migration(pte)) {
> +                        spin_unlock(ptl);
> +                        __migration_entry_wait_huge(ptep, ptl);
> +                        goto retry;
> +                }
> +                /*
> +                 * hwpoisoned entry is treated as no_page_table in
> +                 * follow_page_mask().
> +                 */
> +        }
> +out:
> +        spin_unlock(ptl);
> +        return page;
> +}
> +
>  struct page * __weak
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>                  pmd_t *pmd, int flags)

Can someone explain why:

* follow_page() goes via follow_page_mask() for hugetlb
* __get_user_pages() goes via follow_hugetlb_page() and never via
  follow_page_mask() for hugetlb?

(A rough sketch of the two call paths is below, after my signature.)

IOW, why can't we make follow_page_mask() just not handle hugetlb and
route everything via follow_hugetlb_page() -- we primarily only have to
teach it to not trigger faults.

What's the reason that this hugetlb code has to be overly complicated?

-- 
Thanks,

David / dhildenb
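
P.S. To make the two paths I'm asking about concrete, this is roughly
how they look in mm/gup.c -- heavily trimmed and quoted from memory, so
the '...' elisions and exact argument lists are not meant to be
authoritative:

/*
 * Path 1: follow_page() always ends up in follow_page_mask(), which has
 * its own hugetlb handling (follow_huge_pmd()/follow_huge_pud()/...).
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
                         unsigned int foll_flags)
{
        struct follow_page_context ctx = { NULL };

        ...
        return follow_page_mask(vma, address, foll_flags, &ctx);
}

/*
 * Path 2: __get_user_pages() short-circuits hugetlb VMAs long before
 * follow_page_mask() is reached and uses follow_hugetlb_page() instead.
 */
static long __get_user_pages(struct mm_struct *mm, unsigned long start, ...)
{
        ...
        if (is_vm_hugetlb_page(vma)) {
                i = follow_hugetlb_page(mm, vma, pages, vmas, &start,
                                        &nr_pages, i, gup_flags, locked);
                continue;
        }
        ...
        page = follow_page_mask(vma, start, foll_flags, &ctx);
        ...
}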