From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=vc3H=PU=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0D770C433EF
	for <linux-mm@archiver.kernel.org>; Mon,  1 Nov 2021 15:45:30 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id A240061058
	for <linux-mm@archiver.kernel.org>; Mon,  1 Nov 2021 15:45:29 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A240061058
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org
Received: by kanga.kvack.org (Postfix)
	id 3D26694001D; Mon,  1 Nov 2021 11:45:29 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 381A594000F; Mon,  1 Nov 2021 11:45:29 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 249B794001D; Mon,  1 Nov 2021 11:45:29 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0207.hostedemail.com [216.40.44.207])
	by kanga.kvack.org (Postfix) with ESMTP id 151EC94000F
	for <linux-mm@kvack.org>; Mon,  1 Nov 2021 11:45:29 -0400 (EDT)
Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay05.hostedemail.com (Postfix) with ESMTP id BECB118446BD8
	for <linux-mm@kvack.org>; Mon,  1 Nov 2021 15:45:28 +0000 (UTC)
X-FDA: 78760785936.06.3E8A617
Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169])
	by imf02.hostedemail.com (Postfix) with ESMTP id B86587001700
	for <linux-mm@kvack.org>; Mon,  1 Nov 2021 15:45:23 +0000 (UTC)
Received: by mail-pf1-f169.google.com with SMTP id h74so2738556pfe.0
        for <linux-mm@kvack.org>; Mon, 01 Nov 2021 08:45:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=mime-version:subject:from:in-reply-to:date:cc
         :content-transfer-encoding:message-id:references:to;
        bh=IVZ+3X2Rjxc+ixerGwp7ypsajfDVTMwoHCpW87XMd9Q=;
        b=YZIbU4b61z7pDrPpqweZEq0UqGyo7ETxXPHWJDNP1+FgfC2iDskbORWARv/fI7lOVL
         NR5yknrkpN4/NBnaFhD3pRWNWuchtY2I4tR2MsYzcaf/CcXJsBb/jt/yYG1PAni0D8hj
         JGeeB2JJn9rJvjPAQcL5ejfB57MlOz16KjmNLVGhfBn1C46/pTVwpRzmtIaTY1LXP7Bk
         RieDSgfzQ95Dsi3VJfQ5WC3UdO7N5rzN4qkgqLMqTCJd4GSaFJIzLbCiRB7puq41x0pZ
         2Vh6D3hRQIsCjzukm/WVbGe98o9dEtqAZ35jaLYTdkyzyU5O3hzJZSv3meH+6jqOxKPG
         T8Ww==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc
         :content-transfer-encoding:message-id:references:to;
        bh=IVZ+3X2Rjxc+ixerGwp7ypsajfDVTMwoHCpW87XMd9Q=;
        b=MHrW/A7UPdJ4CcWvauUcZtNbR4wd6SooiwzEHWK4E/l6NIbs7EVfhkr4Ukh6FeVrc0
         PMVHqXtZ8RjQ/lVqResEpyXM/c9X8Z9L0E5dukhEkpxsy2LVhMLAQsYZ5rTRfsuVNq4Y
         Ndq0fCfQyqvzk04q3KJ8VQFcSZUyPmCn44a5H5VYywgQkAf7RxpXCbnmOPLvmx/JHgjR
         N9qr4kn7eqXmeVb89gFX1EeyZ7bfykXUkvHai7WoYpJCNoja/L8KNt5bq7Smp6QrN9Gd
         R65pSD1CQZRSQDobsT5k3UQ5NttJxWAKL1otznZd5iViJww/UpTGZojbCWciVBYolaZb
         oMOQ==
X-Gm-Message-State: AOAM53144Ut8UOlA7S6T5D3mNG5n87S/4bjeOoth6u8sFc2y6fqXmaIL
	u5cXBC+sZpQCst2uxXgB0m8=
X-Google-Smtp-Source: ABdhPJwGqDUkx3ky3N5utKupudoUHdzvzb3peFtCfxnm7yrUouJqtUBw1lGwerJKX1PD3tPI/k59SQ==
X-Received: by 2002:a05:6a00:c81:b029:30e:21bf:4c15 with SMTP id a1-20020a056a000c81b029030e21bf4c15mr29362219pfv.70.1635781527011;
        Mon, 01 Nov 2021 08:45:27 -0700 (PDT)
Received: from smtpclient.apple ([66.170.99.1])
        by smtp.gmail.com with ESMTPSA id n11sm13386812pgm.74.2021.11.01.08.45.25
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 01 Nov 2021 08:45:26 -0700 (PDT)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: [PATCH] mm: use correct VMA flags when freeing page-tables
From: Nadav Amit <nadav.amit@gmail.com>
In-Reply-To: <80283a1e-dfab-c02b-7a6a-424e2f7fda4@google.com>
Date: Mon, 1 Nov 2021 08:45:24 -0700
Cc: Linux-MM <linux-mm@kvack.org>,
 LKML <linux-kernel@vger.kernel.org>,
 Andrea Arcangeli <aarcange@redhat.com>,
 Andrew Cooper <andrew.cooper3@citrix.com>,
 Andrew Morton <akpm@linux-foundation.org>,
 Andy Lutomirski <luto@kernel.org>,
 Dave Hansen <dave.hansen@linux.intel.com>,
 Peter Zijlstra <peterz@infradead.org>,
 Thomas Gleixner <tglx@linutronix.de>,
 Will Deacon <will@kernel.org>,
 Yu Zhao <yuzhao@google.com>,
 Nick Piggin <npiggin@gmail.com>,
 x86@kernel.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <6CAB8AAB-D6E2-47B3-BE00-E3D68310EB2D@gmail.com>
References: <20211021122322.592822-1-namit@vmware.com>
 <80283a1e-dfab-c02b-7a6a-424e2f7fda4@google.com>
To: Hugh Dickins <hughd@google.com>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: B86587001700
X-Stat-Signature: mcbxc1cbgmdn9o9ehb7krnw9us4bkt71
Authentication-Results: imf02.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=YZIbU4b6;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (imf02.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.210.169 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com
X-HE-Tag: 1635781523-444164
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


> On Nov 1, 2021, at 12:28 AM, Hugh Dickins <hughd@google.com> wrote:
>=20
> On Thu, 21 Oct 2021, Nadav Amit wrote:
>=20
>> From: Nadav Amit <namit@vmware.com>
>>=20
>> Consistent use of the mmu_gather interface requires a call to
>> tlb_start_vma() and tlb_end_vma() for each VMA. free_pgtables() does =
not
>> follow this pattern.
>>=20
>> Certain architectures need tlb_start_vma() to be called in order for
>> tlb_update_vma_flags() to update the VMA flags (tlb->vma_exec and
>> tlb->vma_huge), which are later used for the proper TLB flush to be
>> issued. Since tlb_start_vma() is not called, this can lead to the =
wrong
>> VMA flags being used when the flush is performed.
>>=20
>> Specifically, the munmap syscall would call unmap_region(), which =
unmaps
>> the VMAs and then frees the page-tables. A flush is needed after
>> the page-tables are removed to prevent page-walk caches from holding
>> stale entries, but this flush would use the flags of the VMA flags of
>> the last VMA that was flushed. This does not appear to be right.
>>=20
>> Use tlb_start_vma() and tlb_end_vma() to prevent this from happening.
>> This might lead to unnecessary calls to flush_cache_range() on =
certain
>> arch's. If needed, a new flag can be added to mmu_gather to indicate
>> that the flush is not needed.
>>=20
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Nick Piggin <npiggin@gmail.com>
>> Cc: x86@kernel.org
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> mm/memory.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>=20
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 12a7b2094434..056fbfdd3c1f 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -412,6 +412,8 @@ void free_pgtables(struct mmu_gather *tlb, struct =
vm_area_struct *vma,
>> 		unlink_anon_vmas(vma);
>> 		unlink_file_vma(vma);
>>=20
>> +		tlb_start_vma(tlb, vma);
>> +
>> 		if (is_vm_hugetlb_page(vma)) {
>> 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
>> 				floor, next ? next->vm_start : ceiling);
>> @@ -429,6 +431,8 @@ void free_pgtables(struct mmu_gather *tlb, struct =
vm_area_struct *vma,
>> 			free_pgd_range(tlb, addr, vma->vm_end,
>> 				floor, next ? next->vm_start : ceiling);
>> 		}
>> +
>> +		tlb_end_vma(tlb, vma);
>> 		vma =3D next;
>> 	}
>> }
>> --=20
>> 2.25.1
>=20
> No.
>=20
> This is an experiment to see whether reviewers look at a wider context
> than is seen in the patch itself?  Let's take a look:
>=20
> 		tlb_start_vma(tlb, vma);
>=20
> 		if (is_vm_hugetlb_page(vma)) {
> 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
> 				floor, next ? next->vm_start : ceiling);
> 		} else {
> 			/*
> 			 * Optimization: gather nearby vmas into one =
call down
> 			 */
> 			while (next && next->vm_start <=3D vma->vm_end + =
PMD_SIZE
> 			       && !is_vm_hugetlb_page(next)) {
> 				vma =3D next;
> 				next =3D vma->vm_next;
> 				unlink_anon_vmas(vma);
> 				unlink_file_vma(vma);
> 			}
> 			free_pgd_range(tlb, addr, vma->vm_end,
> 				floor, next ? next->vm_start : ceiling);
> 		}
>=20
> 		tlb_end_vma(tlb, vma);
> 		vma =3D next;
>=20
> So, the vma may well have changed in between the new tlb_start_vma()
> and tlb_end_vma(): which defeats the intent of the patch.

Indeed, I made a an embarrassing bug. Having said that, I do not
understand the experiment and whether I conducted it or was the
object of it.

>=20
> And I doubt that optimization should be dropped to suit the patch:
> you admit to limited understanding of those architectures which need
> tlb_start_vma(), me too; but they seem to have survived many years
> without it there in free_pgtables(), and I think that tlb_start_vma()
> is for when freeing pages, not for when freeing page tables.  Surely
> all architectures have to accommodate the fact that a single page
> table can be occupied by many different kinds of vma.

When it comes to TLB flushing, the assumption that if something is
in the code for many years it must be working is questionable, and
I have already encountered (and fixed) many such cases before.

Freeing page-tables, as I mentioned before, is needed for the
invalidation the page-walk caches after the page-tables are dropped,
if a speculative page-walk takes place. I can post v2 and fix my
embarrassing bug. I am not going to force this patch - I just
encountered the issue as I was modifying a different piece of code
and the behavior seems very inconsistent.