From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D770C433EF for ; Mon, 1 Nov 2021 15:45:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A240061058 for ; Mon, 1 Nov 2021 15:45:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A240061058 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 3D26694001D; Mon, 1 Nov 2021 11:45:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 381A594000F; Mon, 1 Nov 2021 11:45:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 249B794001D; Mon, 1 Nov 2021 11:45:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0207.hostedemail.com [216.40.44.207]) by kanga.kvack.org (Postfix) with ESMTP id 151EC94000F for ; Mon, 1 Nov 2021 11:45:29 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id BECB118446BD8 for ; Mon, 1 Nov 2021 15:45:28 +0000 (UTC) X-FDA: 78760785936.06.3E8A617 Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) by imf02.hostedemail.com (Postfix) with ESMTP id B86587001700 for ; Mon, 1 Nov 2021 15:45:23 +0000 (UTC) Received: by mail-pf1-f169.google.com with SMTP id h74so2738556pfe.0 for ; Mon, 01 Nov 2021 08:45:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=IVZ+3X2Rjxc+ixerGwp7ypsajfDVTMwoHCpW87XMd9Q=; b=YZIbU4b61z7pDrPpqweZEq0UqGyo7ETxXPHWJDNP1+FgfC2iDskbORWARv/fI7lOVL NR5yknrkpN4/NBnaFhD3pRWNWuchtY2I4tR2MsYzcaf/CcXJsBb/jt/yYG1PAni0D8hj JGeeB2JJn9rJvjPAQcL5ejfB57MlOz16KjmNLVGhfBn1C46/pTVwpRzmtIaTY1LXP7Bk RieDSgfzQ95Dsi3VJfQ5WC3UdO7N5rzN4qkgqLMqTCJd4GSaFJIzLbCiRB7puq41x0pZ 2Vh6D3hRQIsCjzukm/WVbGe98o9dEtqAZ35jaLYTdkyzyU5O3hzJZSv3meH+6jqOxKPG T8Ww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=IVZ+3X2Rjxc+ixerGwp7ypsajfDVTMwoHCpW87XMd9Q=; b=MHrW/A7UPdJ4CcWvauUcZtNbR4wd6SooiwzEHWK4E/l6NIbs7EVfhkr4Ukh6FeVrc0 PMVHqXtZ8RjQ/lVqResEpyXM/c9X8Z9L0E5dukhEkpxsy2LVhMLAQsYZ5rTRfsuVNq4Y Ndq0fCfQyqvzk04q3KJ8VQFcSZUyPmCn44a5H5VYywgQkAf7RxpXCbnmOPLvmx/JHgjR N9qr4kn7eqXmeVb89gFX1EeyZ7bfykXUkvHai7WoYpJCNoja/L8KNt5bq7Smp6QrN9Gd R65pSD1CQZRSQDobsT5k3UQ5NttJxWAKL1otznZd5iViJww/UpTGZojbCWciVBYolaZb oMOQ== X-Gm-Message-State: AOAM53144Ut8UOlA7S6T5D3mNG5n87S/4bjeOoth6u8sFc2y6fqXmaIL u5cXBC+sZpQCst2uxXgB0m8= X-Google-Smtp-Source: ABdhPJwGqDUkx3ky3N5utKupudoUHdzvzb3peFtCfxnm7yrUouJqtUBw1lGwerJKX1PD3tPI/k59SQ== X-Received: by 2002:a05:6a00:c81:b029:30e:21bf:4c15 with SMTP id a1-20020a056a000c81b029030e21bf4c15mr29362219pfv.70.1635781527011; Mon, 01 Nov 2021 08:45:27 -0700 (PDT) Received: from smtpclient.apple ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id n11sm13386812pgm.74.2021.11.01.08.45.25 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 01 Nov 2021 08:45:26 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Subject: Re: [PATCH] mm: use correct VMA flags when freeing page-tables From: Nadav Amit In-Reply-To: <80283a1e-dfab-c02b-7a6a-424e2f7fda4@google.com> Date: Mon, 1 Nov 2021 08:45:24 -0700 Cc: Linux-MM , LKML , Andrea Arcangeli , Andrew Cooper , Andrew Morton , Andy Lutomirski , Dave Hansen , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , x86@kernel.org Content-Transfer-Encoding: quoted-printable Message-Id: <6CAB8AAB-D6E2-47B3-BE00-E3D68310EB2D@gmail.com> References: <20211021122322.592822-1-namit@vmware.com> <80283a1e-dfab-c02b-7a6a-424e2f7fda4@google.com> To: Hugh Dickins X-Mailer: Apple Mail (2.3654.120.0.1.13) X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: B86587001700 X-Stat-Signature: mcbxc1cbgmdn9o9ehb7krnw9us4bkt71 Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=YZIbU4b6; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.210.169 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com X-HE-Tag: 1635781523-444164 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Nov 1, 2021, at 12:28 AM, Hugh Dickins wrote: >=20 > On Thu, 21 Oct 2021, Nadav Amit wrote: >=20 >> From: Nadav Amit >>=20 >> Consistent use of the mmu_gather interface requires a call to >> tlb_start_vma() and tlb_end_vma() for each VMA. free_pgtables() does = not >> follow this pattern. >>=20 >> Certain architectures need tlb_start_vma() to be called in order for >> tlb_update_vma_flags() to update the VMA flags (tlb->vma_exec and >> tlb->vma_huge), which are later used for the proper TLB flush to be >> issued. Since tlb_start_vma() is not called, this can lead to the = wrong >> VMA flags being used when the flush is performed. >>=20 >> Specifically, the munmap syscall would call unmap_region(), which = unmaps >> the VMAs and then frees the page-tables. A flush is needed after >> the page-tables are removed to prevent page-walk caches from holding >> stale entries, but this flush would use the flags of the VMA flags of >> the last VMA that was flushed. This does not appear to be right. >>=20 >> Use tlb_start_vma() and tlb_end_vma() to prevent this from happening. >> This might lead to unnecessary calls to flush_cache_range() on = certain >> arch's. If needed, a new flag can be added to mmu_gather to indicate >> that the flush is not needed. >>=20 >> Cc: Andrea Arcangeli >> Cc: Andrew Cooper >> Cc: Andrew Morton >> Cc: Andy Lutomirski >> Cc: Dave Hansen >> Cc: Peter Zijlstra >> Cc: Thomas Gleixner >> Cc: Will Deacon >> Cc: Yu Zhao >> Cc: Nick Piggin >> Cc: x86@kernel.org >> Signed-off-by: Nadav Amit >> --- >> mm/memory.c | 4 ++++ >> 1 file changed, 4 insertions(+) >>=20 >> diff --git a/mm/memory.c b/mm/memory.c >> index 12a7b2094434..056fbfdd3c1f 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -412,6 +412,8 @@ void free_pgtables(struct mmu_gather *tlb, struct = vm_area_struct *vma, >> unlink_anon_vmas(vma); >> unlink_file_vma(vma); >>=20 >> + tlb_start_vma(tlb, vma); >> + >> if (is_vm_hugetlb_page(vma)) { >> hugetlb_free_pgd_range(tlb, addr, vma->vm_end, >> floor, next ? next->vm_start : ceiling); >> @@ -429,6 +431,8 @@ void free_pgtables(struct mmu_gather *tlb, struct = vm_area_struct *vma, >> free_pgd_range(tlb, addr, vma->vm_end, >> floor, next ? next->vm_start : ceiling); >> } >> + >> + tlb_end_vma(tlb, vma); >> vma =3D next; >> } >> } >> --=20 >> 2.25.1 >=20 > No. >=20 > This is an experiment to see whether reviewers look at a wider context > than is seen in the patch itself? Let's take a look: >=20 > tlb_start_vma(tlb, vma); >=20 > if (is_vm_hugetlb_page(vma)) { > hugetlb_free_pgd_range(tlb, addr, vma->vm_end, > floor, next ? next->vm_start : ceiling); > } else { > /* > * Optimization: gather nearby vmas into one = call down > */ > while (next && next->vm_start <=3D vma->vm_end + = PMD_SIZE > && !is_vm_hugetlb_page(next)) { > vma =3D next; > next =3D vma->vm_next; > unlink_anon_vmas(vma); > unlink_file_vma(vma); > } > free_pgd_range(tlb, addr, vma->vm_end, > floor, next ? next->vm_start : ceiling); > } >=20 > tlb_end_vma(tlb, vma); > vma =3D next; >=20 > So, the vma may well have changed in between the new tlb_start_vma() > and tlb_end_vma(): which defeats the intent of the patch. Indeed, I made a an embarrassing bug. Having said that, I do not understand the experiment and whether I conducted it or was the object of it. >=20 > And I doubt that optimization should be dropped to suit the patch: > you admit to limited understanding of those architectures which need > tlb_start_vma(), me too; but they seem to have survived many years > without it there in free_pgtables(), and I think that tlb_start_vma() > is for when freeing pages, not for when freeing page tables. Surely > all architectures have to accommodate the fact that a single page > table can be occupied by many different kinds of vma. When it comes to TLB flushing, the assumption that if something is in the code for many years it must be working is questionable, and I have already encountered (and fixed) many such cases before. Freeing page-tables, as I mentioned before, is needed for the invalidation the page-walk caches after the page-tables are dropped, if a speculative page-walk takes place. I can post v2 and fix my embarrassing bug. I am not going to force this patch - I just encountered the issue as I was modifying a different piece of code and the behavior seems very inconsistent.