From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jann Horn <jannh@google.com>
Date: Fri, 14 Oct 2022 20:19:42 +0200
Subject: [BUG?] x86 arch_tlbbatch_flush() seems to be lacking mm_tlb_flush_nested() integration
To: Andy Lutomirski, Linux-MM, Mel Gorman, Rik van Riel
Cc: kernel list, Kees Cook, Ingo Molnar, Sasha Levin, Andrew Morton, Will Deacon, Peter Zijlstra, Linus Torvalds
Content-Type: text/plain; charset="UTF-8"
Hi!

I haven't actually managed to reproduce this behavior, so maybe I'm just misunderstanding how this works; but I think the arch_tlbbatch_flush() path for batched TLB flushing in vmscan ought to have some kind of integration with mm_tlb_flush_nested(). I think that currently, the following race could happen:

[initial situation: page P is mapped into a page table of task B, but the page is not referenced, the PTE's A/D bits are clear]

A: vmscan begins
A: vmscan looks at P and P's PTEs, and concludes that P is not currently in use
B: reads from P through the PTE, setting the Accessed bit and creating a TLB entry
A: vmscan enters try_to_unmap_one()
A: try_to_unmap_one() calls should_defer_flush(), which returns true
A: try_to_unmap_one() removes the PTE and queues a TLB flush (arch_tlbbatch_add_mm())
A: try_to_unmap_one() returns, try_to_unmap() returns to shrink_folio_list()
B: calls munmap() on the VMA that mapped P
B: no PTEs are removed, so no TLB flush happens
B: munmap() returns
[at this point, the TLB entry still exists]
B: calls mmap(), which reuses the same area that was just unmapped
B: tries to access the newly created VMA, but instead the access goes through the stale TLB entry
A: shrink_folio_list() calls try_to_unmap_flush(), which removes the stale TLB entry

The effect would be that after process B removes a mapping with munmap() and creates a new mapping in its place, it would still see data from the old mapping when trying to access the new mapping.

Am I missing something that protects against this scenario?

munmap() uses the mmu_gather infrastructure, which tries to protect against this kind of correctness bug with multiple racing TLB invalidations in tlb_finish_mmu() by blowing away the whole TLB whenever one TLB invalidation ends while another is still in progress (tested with mm_tlb_flush_nested(tlb->mm)).
But mmu_gather doesn't seem to be aware of TLB flushes that are batched up in the arch_tlbbatch_flush() infrastructure, so that doesn't help here. I think it might be necessary to add a new global counter of pending arch_tlbbatch_flush() flushes, and query that in mm_tlb_flush_nested(), or something like that.