From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz, Andrew Morton, Mike Rapoport, Andrea Arcangeli, peterx@redhat.com, Axel Rasmussen, Kirill A. Shutemov, Matthew Wilcox
Subject: [PATCH v3 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
Date: Thu, 18 Feb 2021 16:55:55 -0500
Message-Id: <20210218215555.10710-1-peterx@redhat.com>
In-Reply-To: <20210218215434.10203-1-peterx@redhat.com>
References: <20210218215434.10203-1-peterx@redhat.com>

Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
userfaultfd-wp is always based on pgtable entries, so those entries cannot
be shared. Walk the hugetlb range and unshare all such mappings, if any
exist, right before UFFDIO_REGISTER succeeds and returns to userspace.

This pairs with want_pmd_share() in the hugetlb code so that huge pmd
sharing is completely disabled for any userfaultfd-wp registered range.
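For anyone who wants to exercise this path from userspace, a minimal
sketch follows (not part of the patch; it assumes a kernel with this
series applied and hugepages reserved via /proc/sys/vm/nr_hugepages, and
the 2MB MAP_HUGETLB mapping is illustrative only — the new walk will find
nothing to unshare in it, since pmd sharing needs a fully covered
PUD_SIZE span):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

int main(void)
{
	/* 2MB: one x86_64 hugepage; illustrative size only */
	size_t len = 2UL << 20;

	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0) {
		perror("userfaultfd");
		return 1;
	}

	struct uffdio_api api = { .api = UFFD_API };
	if (ioctl(uffd, UFFDIO_API, &api)) {
		perror("UFFDIO_API");
		return 1;
	}

	/* Shared hugetlb memory is the case huge pmd sharing applies to */
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)addr, .len = len },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	/*
	 * With this patch, any pmds shared under [addr, addr + len) have
	 * been unshared by the time this ioctl returns success.
	 */
	if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
		perror("UFFDIO_REGISTER");
		return 1;
	}

	printf("uffd-wp registered on %zu bytes at %p\n", len, addr);
	return 0;
}

On kernels without hugetlb uffd-wp support the UFFDIO_REGISTER above is
expected to fail with EINVAL.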
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c        |  4 ++++
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            | 51 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..e259318fcae1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx.ctx = ctx;
 
+		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
+			hugetlb_unshare_all_pmds(vma);
+
 	skip:
 		prev = vma;
 		start = vma->vm_end;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3b4104021dd3..6437483ad01b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot);
 
 bool is_hugetlb_entry_migration(pte_t pte);
+void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
@@ -369,6 +370,8 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 	return 0;
 }
 
+static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 /*
  * hugepages at page global directory. If arch support
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f53a0b852ed8..fc62932c31cb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5653,6 +5653,57 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
 	}
 }
 
+/*
+ * This function will unconditionally remove all the shared pmd pgtable entries
+ * within the specific vma for a hugetlbfs memory range.
+ */
+void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
+{
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long address, start, end;
+	spinlock_t *ptl;
+	pte_t *ptep;
+
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return;
+
+	start = ALIGN(vma->vm_start, PUD_SIZE);
+	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (start >= end)
+		return;
+
+	/*
+	 * No need to call adjust_range_if_pmd_sharing_possible(), because
+	 * we have already done the PUD_SIZE alignment.
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+				start, end);
+	mmu_notifier_invalidate_range_start(&range);
+	i_mmap_lock_write(vma->vm_file->f_mapping);
+	for (address = start; address < end; address += PUD_SIZE) {
+		unsigned long tmp = address;
+
+		ptep = huge_pte_offset(mm, address, sz);
+		if (!ptep)
+			continue;
+		ptl = huge_pte_lock(h, mm, ptep);
+		/* We don't want 'address' to be changed */
+		huge_pmd_unshare(mm, vma, &tmp, ptep);
+		spin_unlock(ptl);
+	}
+	flush_hugetlb_tlb_range(vma, start, end);
+	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	/*
+	 * No need to call mmu_notifier_invalidate_range(), see
+	 * Documentation/vm/mmu_notifier.rst.
+	 */
+	mmu_notifier_invalidate_range_end(&range);
+}
+
 #ifdef CONFIG_CMA
 static bool cma_reserve_called __initdata;
 
-- 
2.26.2
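One detail in hugetlb_unshare_all_pmds() that is easy to read past is the
PUD_SIZE rounding: shared pmds can only live where the vma fully covers a
pud entry, so vm_start is rounded up and vm_end rounded down, and the walk
bails out early when no fully covered pud remains. A standalone sketch of
that arithmetic (the macros are re-derived here for illustration and the
1GB PUD_SIZE is the x86_64 value; in-kernel the helpers are
ALIGN()/ALIGN_DOWN()):

#include <stdio.h>

#define PUD_SIZE	(1UL << 30)	/* x86_64: one pud maps 1GB */
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	/* hypothetical vma: starts mid-pud, spans 3GB */
	unsigned long vm_start = (1UL << 30) + (512UL << 20);
	unsigned long vm_end = vm_start + (3UL << 30);

	unsigned long start = ALIGN_UP(vm_start, PUD_SIZE);
	unsigned long end = ALIGN_DOWN(vm_end, PUD_SIZE);

	if (start >= end) {
		/* e.g. any vma smaller than one pud lands here */
		printf("no fully covered pud, nothing can be shared\n");
		return 0;
	}
	printf("walk [%#lx, %#lx): %lu pud(s) to check\n",
	       start, end, (end - start) / PUD_SIZE);
	return 0;
}

This also shows why the loop in the patch hands huge_pmd_unshare() a
throwaway copy in 'tmp': huge_pmd_unshare() can move the address it is
given, and the walk wants to keep its own PUD_SIZE stride intact.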