Date: Sat, 15 Oct 2022 01:23:46 +0300
From: "Kirill A. Shutemov"
To: Jann Horn
Cc: Andy Lutomirski, Linux-MM, Mel Gorman, Rik van Riel, kernel list, Kees Cook, Ingo Molnar, Sasha Levin, Andrew Morton, Will Deacon, Peter Zijlstra, Linus Torvalds
Subject: Re: [BUG?] X86 arch_tlbbatch_flush() seems to be lacking mm_tlb_flush_nested() integration
Message-ID: <20221014222346.n337tvkbyr33dsdx@box.shutemov.name>

On Fri, Oct 14, 2022 at 08:19:42PM +0200, Jann Horn wrote:
> Hi!
>
> I haven't actually managed to reproduce this behavior, so maybe I'm
> just misunderstanding how this works; but I think the
> arch_tlbbatch_flush() path for batched TLB flushing in vmscan ought to
> have some kind of integration with mm_tlb_flush_nested().
>
> I think that currently, the following race could happen:
>
> [initial situation: page P is mapped into a page table of task B, but
> the page is not referenced, the PTE's A/D bits are clear]
> A: vmscan begins
> A: vmscan looks at P and P's PTEs, and concludes that P is not currently in use
> B: reads from P through the PTE, setting the Accessed bit and creating
> a TLB entry
> A: vmscan enters try_to_unmap_one()
> A: try_to_unmap_one() calls should_defer_flush(), which returns true
> A: try_to_unmap_one() removes the PTE and queues a TLB flush
> (arch_tlbbatch_add_mm())
> A: try_to_unmap_one() returns, try_to_unmap() returns to shrink_folio_list()
> B: calls munmap() on the VMA that mapped P
> B: no PTEs are removed, so no TLB flush happens
> B: munmap() returns

I think here we will serialize against anon_vma/i_mmap lock in
__do_munmap() -> unmap_region() -> free_pgtables() that A also holds.

So I believe munmap() is safe, but MADV_DONTNEED (and its flavours) is not.

> [at this point, the TLB entry still exists]
> B: calls mmap(), which reuses the same area that was just unmapped
> B: tries to access the newly created VMA, but instead the access goes
> through the stale TLB entry
> A: shrink_folio_list() calls try_to_unmap_flush(), which removes the
> stale TLB entry
>
> The effect would be that after process B removes a mapping with
> munmap() and creates a new mapping in its place, it would still see
> data from the old mapping when trying to access the new mapping.
>
> Am I missing something that protects against this scenario?
>
> munmap() uses the mmu_gather infrastructure, which tries to protect
> against this kind of correctness bug with multiple racing TLB
> invalidations in tlb_finish_mmu() by blowing away the whole TLB
> whenever one TLB invalidation ends while another is still in progress
> (tested with mm_tlb_flush_nested(tlb->mm)). But mmu_gather doesn't
> seem to be aware of TLB flushes that are batched up in the
> arch_tlbbatch_flush() infrastructure, so that doesn't help here.
>
> I think it might be necessary to add a new global counter of pending
> arch_tlbbatch_flush() flushes, and query that in
> mm_tlb_flush_nested(), or something like that.

-- 
Kiryl Shutsemau / Kirill A. Shutemov
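
For illustration only, the counter idea from the quoted text could be modelled in user space roughly as below. This is a toy sketch, not kernel code and not a proposed patch: every toy_* name is hypothetical, the per-mm count only stands in for "number of mmu_gather invalidations in progress", and treating more than one in-flight gather as "nested" is a modelling assumption. The only point it shows is that a nested check which also consults a global count of queued batched flushes would notice a reclaim-side flush that has been queued but not yet performed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for per-address-space state (NOT the kernel's mm_struct). */
struct toy_mm {
    atomic_int tlb_flush_pending;   /* gather-style invalidations in progress */
};

/* Toy stand-in for one reclaimer's batch of deferred flushes. */
struct toy_batch {
    bool queued;                    /* true once a flush has been deferred */
};

/* HYPOTHETICAL global counter: batches with a deferred, unperformed flush. */
static atomic_int toy_batched_flushes_pending;

static void toy_mm_init(struct toy_mm *mm)
{
    atomic_init(&mm->tlb_flush_pending, 0);
}

/* A gather-style invalidation (e.g. an unmap path) starts / finishes. */
static void toy_gather_begin(struct toy_mm *mm)
{
    atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

static void toy_gather_end(struct toy_mm *mm)
{
    atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}

/* Reclaim side: defer a flush; count the batch once, however many adds. */
static void toy_batch_add(struct toy_batch *batch)
{
    if (!batch->queued) {
        batch->queued = true;
        atomic_fetch_add(&toy_batched_flushes_pending, 1);
    }
}

/* Reclaim side: the deferred flush is finally performed. */
static void toy_batch_flush(struct toy_batch *batch)
{
    if (batch->queued) {
        batch->queued = false;
        atomic_fetch_sub(&toy_batched_flushes_pending, 1);
    }
}

/*
 * Nested check extended with the hypothetical global counter: report
 * "nested" if another gather is in flight OR any batched flush is queued.
 */
static bool toy_flush_nested(struct toy_mm *mm)
{
    return atomic_load(&mm->tlb_flush_pending) > 1 ||
           atomic_load(&toy_batched_flushes_pending) > 0;
}
```

In this model, a caller that begins a gather while a batch is queued sees toy_flush_nested() return true and would fall back to a full flush. A single global counter is deliberately coarse: it would force full flushes for unrelated address spaces too, which is one reason this is only a sketch of the idea.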