Date: Thu, 31 Aug 2023 15:52:49 +0100
From: Matthew Wilcox
To: Mirsad Todorovac
Cc: linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme@lists.infradead.org
Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
> BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

This one's still niggling at me.  I've trimmed the timestamps and some
of the other irrelevant stuff out of this to make it easier to read.

> value changed: 0x0017ffffc0020001 -> 0x0017ffffc0020004

Notionally I understand this.  This is page->flags and the PG_locked bit
was set initially, but after a short delay PG_locked was cleared and
PG_uptodate was set.
That's _normal_.  For many, many pages, we set the locked bit, initiate
a read; the device does a DMA, sends an interrupt; the interrupt handler
sets the PG_uptodate bit and clears the PG_locked bit to indicate the
page is no longer under I/O.  But what I don't understand is how we see
this for _this_ page.

> write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
> mpage_read_end_io (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 include/linux/page-flags.h:739 fs/mpage.c:55)
> bio_endio (block/bio.c:1617)
> blk_mq_end_request_batch (block/blk-mq.c:850 block/blk-mq.c:1088)
> nvme_pci_complete_batch (drivers/nvme/host/pci.c:986) nvme
> nvme_irq (drivers/nvme/host/pci.c:1086) nvme

This is the interrupt handler.  It's doing what it's supposed to:
marking the page uptodate and unlocking it.

> read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
> folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
> folio_batch_add_and_move (mm/swap.c:235)
> folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
> folio_add_lru_vma (mm/swap.c:538)
> do_anonymous_page (mm/memory.c:4146)

This is the part I don't understand.
The path to calling folio_add_lru_vma() comes directly from
vma_alloc_zeroed_movable_folio():

	folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
	if (!folio)
		goto oom;

	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
		goto oom_free_page;
	folio_throttle_swaprate(folio, GFP_KERNEL);

	__folio_mark_uptodate(folio);

	entry = mk_pte(&folio->page, vma->vm_page_prot);
	entry = pte_sw_mkyoung(entry);
	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry));

	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
			&vmf->ptl);
	if (!vmf->pte)
		goto release;
	if (vmf_pte_changed(vmf)) {
		update_mmu_tlb(vma, vmf->address, vmf->pte);
		goto release;
	}

	ret = check_stable_address_space(vma->vm_mm);
	if (ret)
		goto release;

	/* Deliver the page fault to userland, check inside PT lock */
	if (userfaultfd_missing(vma)) {
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		folio_put(folio);
		return handle_userfault(vmf, VM_UFFD_MISSING);
	}

	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
	folio_add_new_anon_rmap(folio, vma, vmf->address);
	folio_add_lru_vma(folio, vma);

(sorry that's a lot of lines).  But there's _nowhere_ there that sets
PG_locked.  It's a freshly allocated page; all page flags (that are
actually flags; ignore the stuff up at the top) should be clear.  We
even check that with PAGE_FLAGS_CHECK_AT_PREP.

Plus, it doesn't make sense that we'd start I/O; the page is freshly
allocated, full of zeroes; there's no backing store to read the page
from.

It really feels like this page was freed while it was still under I/O
and it's been reallocated to this victim process.

I'm going to try a few things and see if I can figure this out.
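
The "all flags should be clear on a fresh page" argument can be modelled
in userspace.  This is only a sketch of the idea behind a check like
PAGE_FLAGS_CHECK_AT_PREP, not the kernel's definition: DEMO_NR_FLAG_BITS,
DEMO_FLAGS_MASK and demo_flags_clean_at_prep() are hypothetical names,
and the flag count of 24 is a placeholder (in the kernel it depends on
the configuration).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * HYPOTHETICAL: how many low bits of page->flags this model treats as
 * real flags.  Everything above them stands in for "the stuff up at the
 * top" and is ignored.  24 is a placeholder, not a kernel constant.
 */
#define DEMO_NR_FLAG_BITS 24

/* Keep the shift below well defined for a 64-bit value. */
static_assert(DEMO_NR_FLAG_BITS > 0 && DEMO_NR_FLAG_BITS < 64,
	      "flag count must fit in a 64-bit word");

#define DEMO_FLAGS_MASK ((UINT64_C(1) << DEMO_NR_FLAG_BITS) - 1)

/*
 * Model of a "flags are clean at prep" test: returns true when none of
 * the low flag bits is set, so the page looks freshly allocated.
 */
static inline bool demo_flags_clean_at_prep(uint64_t flags)
{
	return (flags & DEMO_FLAGS_MASK) == 0;
}
```

Under these assumptions, the "before" value from the report
(0x0017ffffc0020001) would not pass such a test, because bit 0 is set.
That is consistent with the suspicion that the flag was set by someone
else after the allocation path had already looked at the page.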

> __handle_mm_fault (mm/memory.c:3662 mm/memory.c:4939 mm/memory.c:5079)
> handle_mm_fault (mm/memory.c:5233)
> do_user_addr_fault (arch/x86/mm/fault.c:1392)
> exc_page_fault (./arch/x86/include/asm/paravirt.h:695 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542)
> asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
> copyout (./arch/x86/include/asm/uaccess_64.h:112 ./arch/x86/include/asm/uaccess_64.h:133 lib/iov_iter.c:168)
> _copy_to_iter (lib/iov_iter.c:316 (discriminator 5))
> copy_page_to_iter (lib/iov_iter.c:483 lib/iov_iter.c:468)
> filemap_read (mm/filemap.c:2712)
> blkdev_read_iter (block/fops.c:620)
> vfs_read (./include/linux/fs.h:1871 fs/read_write.c:389 fs/read_write.c:470)
> ksys_read (fs/read_write.c:613)
> __x64_sys_read (fs/read_write.c:621)
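
The "value changed" line quoted near the top of this message can be
decoded mechanically.  A minimal userspace sketch follows, assuming the
bit positions implied by the discussion (PG_locked at bit 0, PG_uptodate
at bit 2).  Those positions have moved between kernel versions, so check
include/linux/page-flags.h in the tree being debugged.  All demo_* and
DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: bit positions as implied by the report, not a kernel header. */
enum demo_pageflags {
	DEMO_PG_LOCKED = 0,
	DEMO_PG_UPTODATE = 2,
};

/* Bits that were 0 in 'before' and are 1 in 'after'. */
static inline uint64_t demo_bits_set(uint64_t before, uint64_t after)
{
	return ~before & after;
}

/* Bits that were 1 in 'before' and are 0 in 'after'. */
static inline uint64_t demo_bits_cleared(uint64_t before, uint64_t after)
{
	return before & ~after;
}

/* Name for the two bits this sketch knows about, empty string otherwise. */
static inline const char *demo_flag_name(unsigned int bit)
{
	switch (bit) {
	case DEMO_PG_LOCKED:
		return " (PG_locked)";
	case DEMO_PG_UPTODATE:
		return " (PG_uptodate)";
	default:
		return "";
	}
}

/* Print one line per bit that differs between the two snapshots. */
static inline void demo_describe_change(uint64_t before, uint64_t after)
{
	uint64_t set = demo_bits_set(before, after);
	uint64_t cleared = demo_bits_cleared(before, after);

	/* A single bit cannot be both set and cleared. */
	assert((set & cleared) == 0);

	for (unsigned int bit = 0; bit < 64; bit++) {
		uint64_t mask = UINT64_C(1) << bit;

		if (cleared & mask)
			printf("bit %u cleared%s\n", bit, demo_flag_name(bit));
		if (set & mask)
			printf("bit %u set%s\n", bit, demo_flag_name(bit));
	}
}
```

Passing the two values from the report to demo_describe_change() lists
bit 0 as cleared and bit 2 as set, and nothing else, since all other
bits are identical in both snapshots.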