Date: Wed, 22 Jan 2025 23:42:13 -0800 (PST)
From: Hugh Dickins
To: Roman Gushchin
cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Jann Horn, Peter Zijlstra, Will Deacon, "Aneesh Kumar K.V",
    Nick Piggin, Hugh Dickins, linux-arch@vger.kernel.org
Subject: Re: [PATCH v2] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
In-Reply-To: <20250122232716.1321171-1-roman.gushchin@linux.dev>
References: <20250122232716.1321171-1-roman.gushchin@linux.dev>

On Wed, 22 Jan 2025, Roman Gushchin wrote:

> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> added a forced TLB flush to tlb_end_vma(), which is required to avoid a
> race between munmap() and unmap_mapping_range(). However, it added some
> overhead to other paths where tlb_end_vma() is used, but vmas are not
> removed, e.g. madvise(MADV_DONTNEED).
>
> Fix this by moving the tlb flush out of tlb_end_vma() into
> free_pgtables(), somewhat similar to the stable version of the
> original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> for PFNMAP mappings before unlink_file_vma()").
>
> Note that if tlb->fullmm is set, no flush is required, as the whole
> mm is about to be destroyed.
>
> v2:
> - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> - added comments (by Peter Z.)
> - fixed the vma_pfn flag setting (by Hugh D.)
>
> Suggested-by: Jann Horn
> Signed-off-by: Roman Gushchin
> Cc: Peter Zijlstra
> Cc: Will Deacon
> Cc: "Aneesh Kumar K.V"
> Cc: Andrew Morton
> Cc: Nick Piggin
> Cc: Hugh Dickins
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
>  include/asm-generic/tlb.h | 41 ++++++++++++++++++++++++++-------------
>  mm/memory.c               |  7 +++++++
>  2 files changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 709830274b75..fbe31f49a5af 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -449,7 +449,14 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  	 */
>  	tlb->vma_huge = is_vm_hugetlb_page(vma);
>  	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> -	tlb->vma_pfn = !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
> +
> +	/*
> +	 * vma_pfn is checked and cleared by tlb_flush_mmu_pfnmap()
> +	 * for a set of vma's, so it should be set if at least one vma
> +	 * has VM_PFNMAP or VM_MIXEDMAP flags set.
> +	 */
> +	if (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))
> +		tlb->vma_pfn = 1;

Okay, but struct mmu_gather is usually on the caller's stack, containing
junk initially, and there's nothing in this patch yet to initialize
tlb->vma_pfn to 0.  __tlb_reset_range() needs to do that, doesn't it?
With some adjustment to its comment about not resetting
mmu_gather::vma_* fields.

Or would it be better to get around that by renaming vma_pfn to, er,
something else - I'd have to understand the essence of Jann's race
better to come up with the right name - untracked_mappings? would that
be right?  I still haven't grasped the essence of that race.

(Panic attack: where is, for example, tlb->need_flush_all initialized
to 0?  Ah, over in mm/mmu_gather.c, __tlb_gather_mmu().  Phew.)
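To make that concrete, here is a minimal, untested sketch against
__tlb_reset_range() in mm/mmu_gather.c - illustrative only: the
surrounding function body is from my reading of the current source,
and the comment wording is just a placeholder:

static void __tlb_reset_range(struct mmu_gather *tlb)
{
	if (tlb->fullmm) {
		tlb->start = tlb->end = ~0;
	} else {
		tlb->start = TASK_SIZE;
		tlb->end = 0;
	}
	tlb->freed_tables = 0;
	tlb->cleared_ptes = 0;
	tlb->cleared_pmds = 0;
	tlb->cleared_puds = 0;
	tlb->cleared_p4ds = 0;

	/*
	 * Do not reset the other mmu_gather::vma_* fields here: there is
	 * no further call into tlb_start_vma() to set them again after an
	 * intermediate flush.  But vma_pfn only matters until the gathered
	 * range has actually been flushed, so clear it here: that gives it
	 * a defined initial value (via __tlb_gather_mmu()), and lets any
	 * intermediate flush cancel the vma_pfn state.
	 */
	tlb->vma_pfn = 0;
}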
And if __tlb_reset_range() resets tlb->vma_pfn to 0, then that has the
side-effect that any TLB flush cancels the vma_pfn state: which is a
desirable side-effect, isn't it? avoiding the possibility of doing an
unnecessary extra TLB flush in free_pgtables(), as I criticized before.

Hugh

>  }
>
>  static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> @@ -466,6 +473,22 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>  	__tlb_reset_range(tlb);
>  }
>
> +static inline void tlb_flush_mmu_pfnmap(struct mmu_gather *tlb)
> +{
> +	/*
> +	 * VM_PFNMAP and VM_MIXEDMAP maps are fragile because the core mm
> +	 * doesn't track the page mapcount -- there might not be page-frames
> +	 * for these PFNs after all. Force flush TLBs for such ranges to avoid
> +	 * munmap() vs unmap_mapping_range() races.
> +	 * Ensure we have no stale TLB entries by the time this mapping is
> +	 * removed from the rmap.
> +	 */
> +	if (unlikely(!tlb->fullmm && tlb->vma_pfn)) {
> +		tlb_flush_mmu_tlbonly(tlb);
> +		tlb->vma_pfn = 0;
> +	}
> +}
> +
>  static inline void tlb_remove_page_size(struct mmu_gather *tlb,
>  					struct page *page, int page_size)
>  {
> @@ -549,22 +572,14 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *
>
>  static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  {
> -	if (tlb->fullmm)
> +	if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
>  		return;
>
>  	/*
> -	 * VM_PFNMAP is more fragile because the core mm will not track the
> -	 * page mapcount -- there might not be page-frames for these PFNs after
> -	 * all. Force flush TLBs for such ranges to avoid munmap() vs
> -	 * unmap_mapping_range() races.
> +	 * Do a TLB flush and reset the range at VMA boundaries; this avoids
> +	 * the ranges growing with the unused space between consecutive VMAs.
>  	 */
> -	if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) {
> -		/*
> -		 * Do a TLB flush and reset the range at VMA boundaries; this avoids
> -		 * the ranges growing with the unused space between consecutive VMAs.
> -		 */
> -		tlb_flush_mmu_tlbonly(tlb);
> -	}
> +	tlb_flush_mmu_tlbonly(tlb);
>  }
>
>  /*
> diff --git a/mm/memory.c b/mm/memory.c
> index 398c031be9ba..c2a9effb2e32 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -365,6 +365,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
>  {
>  	struct unlink_vma_file_batch vb;
>
> +	/*
> +	 * VM_PFNMAP and VM_MIXEDMAP maps require a special handling here:
> +	 * force flush TLBs for such ranges to avoid munmap() vs
> +	 * unmap_mapping_range() races.
> +	 */
> +	tlb_flush_mmu_pfnmap(tlb);
> +
>  	do {
>  		unsigned long addr = vma->vm_start;
>  		struct vm_area_struct *next;
> --
> 2.48.1.262.g85cc9f2d1e-goog