From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9F9BC02181 for ; Fri, 24 Jan 2025 04:44:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4CAB26B009C; Thu, 23 Jan 2025 23:44:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 45488280025; Thu, 23 Jan 2025 23:44:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2CD126B009E; Thu, 23 Jan 2025 23:44:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 02539280025 for ; Thu, 23 Jan 2025 23:44:15 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id ACBB71A0D35 for ; Fri, 24 Jan 2025 04:44:15 +0000 (UTC) X-FDA: 83041103670.03.2568D55 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) by imf24.hostedemail.com (Postfix) with ESMTP id DC096180005 for ; Fri, 24 Jan 2025 04:44:13 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=ssSX0KRK; spf=pass (imf24.hostedemail.com: domain of hughd@google.com designates 209.85.214.178 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1737693854; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uv7cwOTpV0KzNA+qTFqu9tELK30ddmZjfRf/roGfDiI=; b=bcbTC0V7Yktgw5EAHV4BoY8UmpY964B9Xil3TnxehVf2j/FmykqYz4msoFPIXGvzx5ij/s yRfFd3Bt9fr8Kwc4TBqbLT4D5ZlOYCURzeku7hY9vO4+o3uVo5n+p4oEOH2wGuX6E3BIMZ w6gUDh9nfOaiED3poYCYE0hEEGInv8o= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=ssSX0KRK; spf=pass (imf24.hostedemail.com: domain of hughd@google.com designates 209.85.214.178 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1737693854; a=rsa-sha256; cv=none; b=hXrK5LDuYGBOlPI2+1gsq/ABRZ7j6hU7pIzIFsclw/LfmnVzPvjaJLJEIxaG8kk5DFVHqh D+o8hEXTttQkeiHflDM7E60n7I6/CzjhCKvGrXLVX7nbZXj7NY2TDqFqwP2BUspAd8lJvI NWeUfcwKcPwNMMO61Z6Pyiq9YVHRY8U= Received: by mail-pl1-f178.google.com with SMTP id d9443c01a7336-21661be2c2dso29187475ad.1 for ; Thu, 23 Jan 2025 20:44:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1737693852; x=1738298652; darn=kvack.org; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=uv7cwOTpV0KzNA+qTFqu9tELK30ddmZjfRf/roGfDiI=; b=ssSX0KRKQsUoMWCsnMCDJvHeOLzAhVyOB2ijZB5cxGz/5xbms2chuM5FjAaLQs+PKO TFmgDM7SOZ4xx8mOMMvgpHJqxWyFi8NUc+nbK5Ro0GEd0nQB1xzRLN8TGLUB6HUs5Mp8 ERnPQVmtpZq/G5tjoE2Bgq7coRNmNN/PfJo15HY16PxY1I3w0UMOk5lJ4dRm+fwvbJ21 tFVp1fVOHZbMAJPI5qvdFzZfoUfDz9fOtyDqQqnpCJ5DnV+WCdWbnzw1eUxzideOxTcA itylV+zbLSqJpVCR2u6UYLm2nL9/PNCCdNT1bza3L/fkHUsPFBsphPXAW0TTsiehLQ0P 27wg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737693852; x=1738298652; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uv7cwOTpV0KzNA+qTFqu9tELK30ddmZjfRf/roGfDiI=; b=coeHz7CxAmhciF30mJPhozg3q8FKARaVbGmCgrB03CokXqLPIiTEpgtnrRFtCyEcOf zajOG4/yj/P5c2dYP86k77+g1A8hujlSiY+iWEq6Xa948dFoOIEdhs6SyMQijGpmxlWV rppfLttlJLUbXbhD4WciF6a1mdOk8I1pTVct8+/X7iVRjlz7BEaSVhKbs5ITB/+ywfD2 R23NLyv1PasmTJE0lxpeOAxWEohmRGRyBkSfT9Yb0FwP2g/FEosKJhnNQBgBufIvxDZr gssrDKNsxhryQDP7wNBnbk+r4Ym4hx82zwBveowuVXdfqsVtKV3/ff5nVEurpYQK+Kb5 8p+Q== X-Forwarded-Encrypted: i=1; AJvYcCW7tF+9e8bRzr7+3T9Mt+y8GMFUC47Do1HhMlrMhFcwHA154SzFyLVcdFN1DQL63exsVkTQ4m7hbA==@kvack.org X-Gm-Message-State: AOJu0YyfmcSBrN+vTD3i8OIWIDDwyBPWEhhAy8WdR73IuFKnrfvN9HWO OCNYGtYyPmlsV0cJK6+Dlp0vsdxNV32Pfs3ilmYZLgldPghwAK1+eFVbhIpKxw== X-Gm-Gg: ASbGncvTyMFtTk3RIbWs5/RbK16vUOIym4qRGBsufQmaSGpd95SyNyP5DY7pj8BoM6R V1hWWGpi7PC1+eZryKZmDeLZ9YDEgr6SdZeNvG+KKLAiQysZrVsw2Ije80UYiusuLauM07MKzM9 uh0Hl6Nc0YlP2ObQhloilZWProC6WkVj9r0jaJm4OiUEIuuYlrqcPCGz0JcOGMzKu0ENgKd8o5w Tfgvds/3//QsdBxDjGKE6VQzBydS5lRUL3HxfWiSbDRvxl+XenQg+e/QniQ8XVhDMpyxMpTvvcg hlOGlWkeGAm4h0+PeO7aVksSiJVevYZ50fV88fQ1WiC+Uu7VxMZfSPa2JcO3SaZscoVod7wbqIU = X-Google-Smtp-Source: AGHT+IGmuM5xbJ0dRlzaglpzHrCdlVIIELIc/CxP5w9jZ9cZ7t6VCUI3U4eu9nUuB+cgcOXhCnqk6g== X-Received: by 2002:a05:6a20:734e:b0:1e8:bd15:6819 with SMTP id adf61e73a8af0-1eb214ea482mr45448432637.22.1737693852382; Thu, 23 Jan 2025 20:44:12 -0800 (PST) Received: from darker.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-72f8a78e41dsm855895b3a.173.2025.01.23.20.44.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Jan 2025 20:44:11 -0800 (PST) Date: Thu, 23 Jan 2025 20:42:36 -0800 (PST) From: Hugh Dickins To: Jann Horn , Roman Gushchin cc: Peter Zijlstra , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Will Deacon , "Aneesh Kumar K.V" , Nick Piggin , Hugh Dickins , linux-arch@vger.kernel.org Subject: Re: [PATCH v2] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables() In-Reply-To: Message-ID: <26cd41c2-b8b6-0c1d-c36d-28f2f9f369be@google.com> References: <20250122232716.1321171-1-roman.gushchin@linux.dev> <20250123214531.GA969@noisy.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspamd-Queue-Id: DC096180005 X-Stat-Signature: ytjk4f954tnjmsd15juu48s5yjdbkm83 X-Rspamd-Server: rspam08 X-Rspam-User: X-HE-Tag: 1737693853-553342 X-HE-Meta: U2FsdGVkX1+MOkAH2O1WcrzZQOHSa7VutEEtqYOnTAygXIZ9LNZNajvRrNzcaWm0FrkOz+JMHaB8eRERgG/nHwES22BMBxSGSYGupujSgWxQi4LPEUl5NMofXAZUrczyiQgF7PK+7/iAr7uBZwGqMugiIlASTHF7/z0RznktSDquuP0IGQRg34EJejNEf6/+J9IahpggPev4m+Ae894CJ0bkWbrtBFL2awKAwk4xZtEpDs87adpAoizXDE2eGuKR3LtE2Y8DyUCB2cdYw+kSncmmB5z+4PS2zjNJrqYZmGLxqpLqJGGQQUh3IPbgwRG4M9SsoqhxWWJhfdC0QBXZsrbFfkHYruEj+o1EWjbOWE8AeCZcAlMWACAJxU4v9KN+iBjfaca40iM4pUurQiY8m7P9tmY34B2NPY6ZuFjWUSYak6SWxGYvQWVsD9KMgbnfAV2/+xsUT9YXFqV38XF+lb+WaJVoWwtf8BP6IHgYjnTrw5W7376JF1dxE9H6/TXJ5mRfTnZirjwR9iT5gov1hvmEErdrAEXkw4wvs+OLbnCqV0dHvS7qsWKfA+4aJ1LM5Zy0dfKf6HmwqDxrL9Wrvm/AJBedLpHkV1ml5bbkSwxOWqxIN/fNem9lP1oWUvm7p+RdwlC3iHiwDuFQTx63+pB65vgD/z2avHi1/6frJPYXLfGTko9WoYJsYlP0vh7zaErpdnvjOvydnwDBoYCPjBsfHtflmSp/mp2erMX5sc/AaQ2Xe/8JWU4e8gbMYAaNrom4G7Y3z7qgCCiEztGndoMaCbi3I7XvlZuhsJqlq+IN4LPs2x8bW4UrxvxFWsBYYQG7FPBK/6pwVOraNw3zgKxRR+D3mBE6BZMnnK6ckQSYDor6mxtfA4Fb0Q8Hb0E3u0kivzD0CxO5w0tiN5CrBvBcT0sUkfT6Tm62PWZ3lHmIeiA/4ltGRrZQOkuLX0ZYne5iUXAu27dt/fkSg1c udlmSGdd cjAZ7obirkkuesuu7+bTOEvy8UAMa0zuT+YhZwVg18WaGk5sO62ZRvWK3iTiyKUw/XQsPIi1s3pTGNxwmrzUEaDR71FfzVfPkeqwvsYyPch63RcXOoLUEhlrVjp8XH2H5jorvcPHzpZilD0FePIKZGPYOKUXuSxJbUe0Onj0shx8yYAZDUiH+deqJib1Rb/w3sLealyOqYZHHIYhgXhXjFzoX8qIn010C/eXsxjJ+An8zBVcZd0nIJUlB6PUzuEhLcAp3z1SoIECW4lhPCtLPj8r1LPBfJgdVMHvnFA1lvs25rVCkemYxaAu5qv/tqjbCHKFWSTiG/VCGzzQ5/rAePVi9YkkFqTGIuDSwOXaag8W4LyWqndlRudbS0myTnu0G6bquwoK0kugWD2qkkqBEuMrfiQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, 23 Jan 2025, Roman Gushchin wrote: > On Thu, Jan 23, 2025 at 10:45:31PM +0100, Peter Zijlstra wrote: > > On Wed, Jan 22, 2025 at 11:27:16PM +0000, Roman Gushchin wrote: > > > Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas") > > > added a forced tlbflush to tlb_vma_end(), which is required to avoid a > > > race between munmap() and unmap_mapping_range(). However it added some > > > overhead to other paths where tlb_vma_end() is used, but vmas are not > > > removed, e.g. madvise(MADV_DONTNEED). > > > > > > Fix this by moving the tlb flush out of tlb_end_vma() into > > > free_pgtables(), somewhat similar to the stable version of the > > > original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush > > > for PFNMAP mappings before unlink_file_vma()"). > > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 398c031be9ba..c2a9effb2e32 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -365,6 +365,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas, > > > { > > > struct unlink_vma_file_batch vb; > > > > > > + /* > > > + * VM_PFNMAP and VM_MIXEDMAP maps require a special handling here: > > > + * force flush TLBs for such ranges to avoid munmap() vs > > > + * unmap_mapping_range() races. > > > + */ > > > + tlb_flush_mmu_pfnmap(tlb); > > > + > > > do { > > > unsigned long addr = vma->vm_start; > > > struct vm_area_struct *next; > > > > So I'm not sure I muc like this name, it is fairly random and does very > > much not convey the reason we're calling this. > > > > Anyway, going back to reading the original commit (because this > > changelog isn't helping me much), the problem appears to be that > > unlinking the vma will make unmap_mapping_range() skip things (no vma, > > nothing to do, etc) return early and bad things happen. The changelog of commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas") has not helped me either. Nor could I locate any discussion (Jann, Linus, Peter, Will?) that led up to it. I can see that a racing unmap_mapping_range() may complete before the munmap() has done its TLB flush, but (a) so what? and (b) how does VM_PFNMAP or page_mapcount affect that? and (c) did Linus's later delay_rmap changes make any difference to the story here. Jann, please spell it out for us: I think I'm not the only one who fails to understand the race in question. At present I'm hovering between thinking there was no bug to be fixed in the first place, or that a tlb_flush_mmu_tlbonly() has to be done before unlinking vmas in all cases (or in all file cases?). I'm probably wrong in both directions, but cannot see it without help. > > > > So am I right in thinking we need this tlb flush before all those > > unlink_{anon,file}_vma*() calls, and that the whole free_pgd_range() > > that goes and invalidates the page-tables is too late? > > > > So how about we do something like so instead? > > Overall looks good to me, except one question (below). To me, Peter's patch looks much like yours, except wth different names and comments, plus the "vma" error you point out below. > > +static inline void tlb_free_vma(struct mmu_gather *tlb, struct vm_area_struct *vma) > > { > > if (tlb->fullmm) > > return; > > > > /* > > * VM_PFNMAP is more fragile because the core mm will not track the > > - * page mapcount -- there might not be page-frames for these PFNs after > > - * all. Force flush TLBs for such ranges to avoid munmap() vs > > - * unmap_mapping_range() races. > > + * page mapcount -- there might not be page-frames for these PFNs > > + * after all. > > + * > > + * Specifically() there is a race between munmap() and > > + * unmap_mapping_range(), where munmap() will unlink the VMA, such > > + * that unmap_mapping_range() will no longer observe the VMA and > > + * no-op, without observing the TLBI, returning prematurely. > > + * > > + * So if we're about to unlink such a VMA, and we have pending > > + * TLBI for such a vma, flush things now. > > */ > > - if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) { > > - /* > > - * Do a TLB flush and reset the range at VMA boundaries; this avoids > > - * the ranges growing with the unused space between consecutive VMAs. > > - */ > > + if ((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) && tlb->vma_pfn) > > tlb_flush_mmu_tlbonly(tlb); > > - } > > Why do we need to re-check vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP) here? > > In free_pgtables() we're iterating over multiple vma's. What if the first has > no VM_PFNMAP set, but some other do? Idk if it's even possible, but it's not > obvious that it's not possible either. Yes, of course that is possible: the "vma" to tlb_free_vma() is a mistake (hmm, and there's no "free" in it either). Hugh