From: Yang Shi
Date: Mon, 28 Nov 2022 11:54:39 -0800
Subject: Re: [PATCH v4 2/3] mm/khugepaged: Fix GUP-fast interaction by sending IPI
To: Jann Horn
Cc: security@kernel.org, Andrew Morton, David Hildenbrand, Peter Xu, John Hubbard, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20221128180252.1684965-1-jannh@google.com> <20221128180252.1684965-2-jannh@google.com>
In-Reply-To: <20221128180252.1684965-2-jannh@google.com>

On Mon, Nov 28, 2022 at 10:03 AM Jann Horn wrote:
>
> Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
> collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
> ensure that the page table was not removed by khugepaged in between.
>
> However, lockless_pages_from_mm() still requires that the page table is not
> concurrently freed or reused to store non-PTE data.
> Otherwise, problems can occur because:
>
>  - deposited page tables can be freed when a THP page somewhere in the
>    mm is removed
>  - some architectures store non-PTE information inside deposited page
>    tables (see radix__pgtable_trans_huge_deposit())
>
> Additionally, lockless_pages_from_mm() is also somewhat brittle with
> regards to page tables being repeatedly moved back and forth, but
> that shouldn't be an issue in practice.
>
> Fix it by sending IPIs (if the architecture uses
> semi-RCU-style page table freeing) before freeing/reusing page tables.
>
> As noted in mm/gup.c, on configs that define CONFIG_HAVE_FAST_GUP,
> there are two possible cases:
>
>  1. CONFIG_MMU_GATHER_RCU_TABLE_FREE is set, causing
>     tlb_remove_table_sync_one() to send an IPI to synchronize with
>     lockless_pages_from_mm().
>  2. CONFIG_MMU_GATHER_RCU_TABLE_FREE is unset, indicating that all
>     TLB flushes are already guaranteed to send IPIs.
>     tlb_remove_table_sync_one() will do nothing, but we've already
>     run pmdp_collapse_flush(), which did a TLB flush, which must have
>     involved IPIs.

I'm trying to catch up with the discussion after the holiday break. I
understand you switched from always allocating a new page table page
(which we decided on before) to sending IPIs to serialize against
fast-GUP; this is fine with me.

So the code now looks like:
    pmdp_collapse_flush()
    sending IPI

But the missing part is how we reached the conclusion that "TLB flushes
are already guaranteed to send IPIs" when
CONFIG_MMU_GATHER_RCU_TABLE_FREE is unset. ARM64 doesn't do that, IIRC.
Or did I miss something?

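For readers following the thread, the two cases from the quoted commit
message and the gap asked about above can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
struct arch_model, its fields, sync_one_sends_ipi() and
fast_gup_serialized() are hypothetical names, not kernel code, and the
model treats "serialized" as "at least one IPI reaches the other CPUs
before the page table is freed or reused".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one architecture configuration; not kernel code. */
struct arch_model {
	bool rcu_table_free;      /* models CONFIG_MMU_GATHER_RCU_TABLE_FREE */
	bool tlb_flush_sends_ipi; /* assumption: do TLB flushes interrupt other CPUs? */
};

/* Models tlb_remove_table_sync_one(): it sends an IPI only in case 1. */
static bool sync_one_sends_ipi(const struct arch_model *arch)
{
	return arch->rcu_table_free;
}

/*
 * Models the collapse path: pmdp_collapse_flush() followed by
 * tlb_remove_table_sync_one(). Lockless walkers run with interrupts
 * disabled, so an IPI from either step is enough to wait them out.
 */
static bool fast_gup_serialized(const struct arch_model *arch)
{
	assert(arch != NULL);

	bool ipi_from_flush = arch->tlb_flush_sends_ipi;
	bool ipi_from_sync = sync_one_sends_ipi(arch);

	return ipi_from_flush || ipi_from_sync;
}
```

In this model the only combination for which fast_gup_serialized()
returns false is rcu_table_free = false together with
tlb_flush_sends_ipi = false, which is the combination the question
above is about.
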
>
> Cc: stable@kernel.org
> Fixes: ba76149f47d8 ("thp: khugepaged")
> Acked-by: David Hildenbrand
> Signed-off-by: Jann Horn
> ---
> v4:
>  - added ack from David Hildenbrand
>  - made commit message more verbose
>
>  include/asm-generic/tlb.h | 4 ++++
>  mm/khugepaged.c           | 2 ++
>  mm/mmu_gather.c           | 4 +---
>  3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 492dce43236ea..cab7cfebf40bd 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -222,12 +222,16 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
>  #define tlb_needs_table_invalidate() (true)
>  #endif
>
> +void tlb_remove_table_sync_one(void);
> +
>  #else
>
>  #ifdef tlb_needs_table_invalidate
>  #error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE
>  #endif
>
> +static inline void tlb_remove_table_sync_one(void) { }
> +
>  #endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
>
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 674b111a24fa7..c3d3ce596bff7 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1057,6 +1057,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>  	_pmd = pmdp_collapse_flush(vma, address, pmd);
>  	spin_unlock(pmd_ptl);
>  	mmu_notifier_invalidate_range_end(&range);
> +	tlb_remove_table_sync_one();
>
>  	spin_lock(pte_ptl);
>  	result = __collapse_huge_page_isolate(vma, address, pte, cc,
> @@ -1415,6 +1416,7 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v
>  	lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
>
>  	pmd = pmdp_collapse_flush(vma, addr, pmdp);
> +	tlb_remove_table_sync_one();
>  	mm_dec_nr_ptes(mm);
>  	page_table_check_pte_clear_range(mm, addr, pmd);
>  	pte_free(mm, pmd_pgtable(pmd));
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index add4244e5790d..3a2c3f8cad2fe 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -153,7 +153,7 @@ static void tlb_remove_table_smp_sync(void *arg)
>  	/* Simply deliver the interrupt */
>  }
>
> -static void tlb_remove_table_sync_one(void)
> +void tlb_remove_table_sync_one(void)
>  {
>  	/*
>  	 * This isn't an RCU grace period and hence the page-tables cannot be
> @@ -177,8 +177,6 @@ static void tlb_remove_table_free(struct mmu_table_batch *batch)
>
>  #else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */
>
> -static void tlb_remove_table_sync_one(void) { }
> -
>  static void tlb_remove_table_free(struct mmu_table_batch *batch)
>  {
>  	__tlb_remove_table_free(batch);
> --
> 2.38.1.584.g0f3c55d4c2-goog
>
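
The header change in the quoted patch follows a common pattern: provide
the real function when the config option is set and an empty
static inline stub otherwise, so that callers such as the khugepaged
paths need no #ifdef. A minimal userspace sketch of that pattern,
assuming a hypothetical DEMO_RCU_TABLE_FREE macro in place of the
Kconfig symbol and a counter standing in for the IPI broadcast (all
demo_* names are illustrative, not kernel identifiers):

```c
/*
 * Hypothetical stand-in for CONFIG_MMU_GATHER_RCU_TABLE_FREE; build with
 * -DDEMO_RCU_TABLE_FREE to select the first branch below.
 */

static int demo_ipi_count; /* counts simulated IPI broadcasts */

#ifdef DEMO_RCU_TABLE_FREE
/* Real implementation: in the kernel this would interrupt other CPUs. */
static void demo_table_sync_one(void)
{
	demo_ipi_count++;
}
#else
/* Empty stub: callers compile unchanged and the call does nothing. */
static inline void demo_table_sync_one(void) { }
#endif

/* Caller modelled on the collapse path; no #ifdef is needed here. */
static int demo_collapse(void)
{
	demo_table_sync_one();
	return demo_ipi_count;
}
```

Built without the macro, demo_collapse() leaves the counter untouched;
built with it, each call adds one simulated broadcast. In the stub
configuration, any synchronization has to come from somewhere else,
which is what the question in this reply is about.
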