Date: Thu, 23 Jan 2025 14:26:42 -0800 (PST)
From: Hugh Dickins <hughd@google.com>
To: Roman Gushchin
cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Jann Horn, Peter Zijlstra, Will Deacon, "Aneesh Kumar K.V",
    Nick Piggin, Hugh Dickins, linux-arch@vger.kernel.org
Subject: Re: [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
In-Reply-To: <20250123164358.2384447-1-roman.gushchin@linux.dev>
Message-ID: <32128611-60d5-147c-7e82-7b1dfbe8236b@google.com>
References: <20250123164358.2384447-1-roman.gushchin@linux.dev>
On Thu, 23 Jan 2025, Roman Gushchin wrote:

> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> added a forced tlb flush to tlb_end_vma(), which is required to avoid
> a race between munmap() and unmap_mapping_range(). However, it added
> some overhead to other paths where tlb_end_vma() is used but vmas are
> not removed, e.g. madvise(MADV_DONTNEED).
>
> Fix this by moving the tlb flush out of tlb_end_vma() into
> free_pgtables(), somewhat similar to the stable version of the
> original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> for PFNMAP mappings before unlink_file_vma()").
>
> Note that if tlb->fullmm is set, no flush is required, as the whole
> mm is about to be destroyed.
>
> ---

As Liam said, thanks.

>
> v3:
> - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>
> v2:
> - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> - added comments (by Peter Z.)
> - fixed the vma_pfn flag setting (by Hugh D.)
>
> Suggested-by: Jann Horn
> Signed-off-by: Roman Gushchin
> Cc: Peter Zijlstra
> Cc: Will Deacon
> Cc: "Aneesh Kumar K.V"
> Cc: Andrew Morton
> Cc: Nick Piggin
> Cc: Hugh Dickins
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
>  include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------
>  mm/memory.c               |  7 ++++++
>  2 files changed, 42 insertions(+), 14 deletions(-)

The code looks right to me now, but not the comments (I usually prefer
no comment to a wrong or difficult-to-get-right comment). Except when I
try to write a simple enough correct comment, I find the code has to be
changed, and that then suggests further changes... Sigh.
(We could also go down a path of saying that all of the vma_pfn stuff
would be better under #ifdef CONFIG_MMU_GATHER_MERGE_VMAS; but I think
we shall only confuse ourselves that way - it shouldn't be enough to
matter, so long as it does not add any extra TLB flushes.)

>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 709830274b75..cdc95b69b91d 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -380,8 +380,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>  	tlb->cleared_pmds = 0;
>  	tlb->cleared_puds = 0;
>  	tlb->cleared_p4ds = 0;
> +
> +	/*
> +	 * vma_pfn can only be set in tlb_start_vma(), so let's
> +	 * initialize it here. Also a tlb flush issued by
> +	 * tlb_flush_mmu_pfnmap() will cancel the vma_pfn state,
> +	 * so that unnecessary subsequent flushes are avoided.

No, that misses the point (or misses half of the point): the
tlb_flush_mmu_pfnmap() itself won't need to flush if for other reasons
we've done a TLB flush since the last VM_PFNMAP|VM_MIXEDMAP vma.

What I want to write is:

	 * vma_pfn can only be set in tlb_start_vma(), so initialize it here.
	 * And then any call to tlb_flush_mmu_tlbonly() will reset it again,
	 * so that unnecessary subsequent flushes are avoided.

But once I look at tlb_flush_mmu_tlbonly(), I'm reminded that actually
it does nothing if none of cleared_ptes etc. is set: so it would not
reset vma_pfn in that case; which is okay-ish, but makes writing the
comment hard.

So maybe tlb_flush_mmu_tlbonly() should do an explicit "tlb->vma_pfn = 0"
before returning early; but that then raises the question of whether it
would not be better just to initialize vma_pfn to 0 in __tlb_gather_mmu(),
not touch it in __tlb_reset_range(), but reset it to 0 at the start of
tlb_flush_mmu_tlbonly().

But it also makes me realize that tlb_flush_mmu_tlbonly() avoiding
__tlb_reset_range() when nothing was cleared is not all that good:
because if flushing a small range is better than flushing a large
range, then it would be good to reset the range even when nothing was
cleared (though it looks stupid to reset all the fields just found to
be 0 already). Arrgh.

This is not what you want to hear, to get your slowdown fix in.
Simplest just to ignore the existing range deficiency, I suppose.
But where to reset vma_pfn? I'm torn, have to let you and others decide.
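For concreteness, a sketch of that second alternative (untested, and
assuming __tlb_gather_mmu() gains a one-line "tlb->vma_pfn = 0" at
gather setup, with __tlb_reset_range() then leaving vma_pfn alone):

	static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
	{
		/*
		 * Any call here satisfies a pending VM_PFNMAP/VM_MIXEDMAP
		 * flush requirement: either we flush below, or nothing has
		 * been cleared since the last flush, so nothing is stale.
		 */
		tlb->vma_pfn = 0;

		/*
		 * Anything calling __tlb_adjust_range() also sets at least
		 * one of these bits.
		 */
		if (!(tlb->freed_tables || tlb->cleared_ptes ||
		      tlb->cleared_pmds || tlb->cleared_puds ||
		      tlb->cleared_p4ds))
			return;

		tlb_flush(tlb);
		__tlb_reset_range(tlb);
	}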
Hugh

> +	 */
> +	tlb->vma_pfn = 0;
>  	/*
> -	 * Do not reset mmu_gather::vma_* fields here, we do not
> +	 * Do not reset other mmu_gather::vma_* fields here, we do not
>  	 * call into tlb_start_vma() again to set them if there is an
>  	 * intermediate flush.
>  	 */
> @@ -449,7 +457,14 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  	 */
>  	tlb->vma_huge = is_vm_hugetlb_page(vma);
>  	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> -	tlb->vma_pfn  = !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
> +
> +	/*
> +	 * vma_pfn is checked and cleared by tlb_flush_mmu_pfnmap()
> +	 * for a set of vma's, so it should be set if at least one vma
> +	 * has VM_PFNMAP or VM_MIXEDMAP flags set.
> +	 */
> +	if (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))
> +		tlb->vma_pfn = 1;
>  }
>  
>  static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> @@ -466,6 +481,20 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>  	__tlb_reset_range(tlb);
>  }
>  
> +static inline void tlb_flush_mmu_pfnmap(struct mmu_gather *tlb)
> +{
> +	/*
> +	 * VM_PFNMAP and VM_MIXEDMAP maps are fragile because the core mm
> +	 * doesn't track the page mapcount -- there might not be page-frames
> +	 * for these PFNs after all. Force flush TLBs for such ranges to avoid
> +	 * munmap() vs unmap_mapping_range() races.
> +	 * Ensure we have no stale TLB entries by the time this mapping is
> +	 * removed from the rmap.
> +	 */
> +	if (unlikely(!tlb->fullmm && tlb->vma_pfn))
> +		tlb_flush_mmu_tlbonly(tlb);
> +}
> +
>  static inline void tlb_remove_page_size(struct mmu_gather *tlb,
>  					struct page *page, int page_size)
>  {
> @@ -549,22 +578,14 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *
>  
>  static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  {
> -	if (tlb->fullmm)
> +	if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
>  		return;
>  
>  	/*
> -	 * VM_PFNMAP is more fragile because the core mm will not track the
> -	 * page mapcount -- there might not be page-frames for these PFNs after
> -	 * all. Force flush TLBs for such ranges to avoid munmap() vs
> -	 * unmap_mapping_range() races.
> +	 * Do a TLB flush and reset the range at VMA boundaries; this avoids
> +	 * the ranges growing with the unused space between consecutive VMAs.
>  	 */
> -	if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) {
> -		/*
> -		 * Do a TLB flush and reset the range at VMA boundaries; this avoids
> -		 * the ranges growing with the unused space between consecutive VMAs.
> -		 */
> -		tlb_flush_mmu_tlbonly(tlb);
> -	}
> +	tlb_flush_mmu_tlbonly(tlb);
>  }
>  
>  /*
> diff --git a/mm/memory.c b/mm/memory.c
> index 398c031be9ba..c2a9effb2e32 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -365,6 +365,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
>  {
>  	struct unlink_vma_file_batch vb;
>  
> +	/*
> +	 * VM_PFNMAP and VM_MIXEDMAP maps require a special handling here:
> +	 * force flush TLBs for such ranges to avoid munmap() vs
> +	 * unmap_mapping_range() races.
> +	 */
> +	tlb_flush_mmu_pfnmap(tlb);
> +
>  	do {
>  		unsigned long addr = vma->vm_start;
>  		struct vm_area_struct *next;
> --
> 2.48.1.262.g85cc9f2d1e-goog
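p.s. for anyone following along: the ordering which that mm/memory.c
hunk relies on, as a simplified sketch of the munmap() path (modelled
on unmap_region(); arguments elided, so this is not the literal kernel
code):

	static void unmap_region_sketch(struct mm_struct *mm /* , ... */)
	{
		struct mmu_gather tlb;

		tlb_gather_mmu(&tlb, mm);	/* fullmm == 0, vma_pfn == 0 */

		/*
		 * Clear the ptes; tlb_start_vma() sets tlb.vma_pfn if any
		 * vma in the range is VM_PFNMAP or VM_MIXEDMAP.
		 */
		unmap_vmas(&tlb /* , ... */);

		/*
		 * With this patch, free_pgtables() begins with
		 * tlb_flush_mmu_pfnmap(), so stale PFNMAP/MIXEDMAP TLB
		 * entries are flushed before any vma is unlinked from the
		 * rmap - closing the munmap() vs unmap_mapping_range() race.
		 */
		free_pgtables(&tlb /* , ... */);

		tlb_finish_mmu(&tlb);	/* flush anything still pending */
	}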