linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Keith Busch <kbusch@kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org
Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
Date: Thu, 31 Aug 2023 15:52:49 +0100	[thread overview]
Message-ID: <ZPCpQQVTtlB0FA5A@casper.infradead.org> (raw)
In-Reply-To: <cbb9d596-43ac-bad4-b6f6-8c13f95d244e@alu.unizg.hr>

On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
>  BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

This one's still niggling at me.  I've trimmed the timestamps and some
of the other irrelevant stuff out of this to make it easier to read.

>  value changed: 0x0017ffffc0020001 -> 0x0017ffffc0020004

Notionally I understand this.  This is page->flags and the PG_locked bit
was set initially, but after a short delay PG_locked was cleared and
PG_uptodate was set.  That's _normal_.  For many, many pages, we set the
locked bit, initiate a read; the device does a DMA, sends an interrupt;
the interrupt handler sets the PG_uptodate bit and clears the PG_locked
bit to indicate the page is no longer under I/O.

But what I don't understand is how we see this for _this_ page.

>  write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
>  mpage_read_end_io (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 include/linux/page-flags.h:739 fs/mpage.c:55)
>  bio_endio (block/bio.c:1617)
>  blk_mq_end_request_batch (block/blk-mq.c:850 block/blk-mq.c:1088)
>  nvme_pci_complete_batch (drivers/nvme/host/pci.c:986) nvme
>  nvme_irq (drivers/nvme/host/pci.c:1086) nvme

This is the interrupt handler.  It's doing what it's supposed to;
marking the page uptodate and unlocking it.

>  read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>  folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
>  folio_batch_add_and_move (mm/swap.c:235)
>  folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
>  folio_add_lru_vma (mm/swap.c:538)
>  do_anonymous_page (mm/memory.c:4146)

This is the part I don't understand.  The path to calling
folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():

        folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
        if (!folio)
                goto oom;
        if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
                goto oom_free_page;
        folio_throttle_swaprate(folio, GFP_KERNEL);
        __folio_mark_uptodate(folio);
        entry = mk_pte(&folio->page, vma->vm_page_prot);
        entry = pte_sw_mkyoung(entry);
        if (vma->vm_flags & VM_WRITE)
                entry = pte_mkwrite(pte_mkdirty(entry));
        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
                        &vmf->ptl);
        if (!vmf->pte)
                goto release;
        if (vmf_pte_changed(vmf)) {
                update_mmu_tlb(vma, vmf->address, vmf->pte);
                goto release;
        }
        ret = check_stable_address_space(vma->vm_mm);
        if (ret)
                goto release;
        /* Deliver the page fault to userland, check inside PT lock */
        if (userfaultfd_missing(vma)) {
                pte_unmap_unlock(vmf->pte, vmf->ptl);
                folio_put(folio);
                return handle_userfault(vmf, VM_UFFD_MISSING);
        }
        inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
        folio_add_new_anon_rmap(folio, vma, vmf->address);
        folio_add_lru_vma(folio, vma);

(sorry that's a lot of lines).  But there's _nowhere_ there that sets
PG_locked.  It's a freshly allocated page; all page flags (that are
actually flags; ignore the stuff up at the top) should be clear.  We
even check that with PAGE_FLAGS_CHECK_AT_PREP.  Plus, it doesn't
make sense that we'd start I/O; the page is freshly allocated, full of
zeroes; there's no backing store to read the page from.

It really feels like this page was freed while it was still under I/O
and it's been reallocated to this victim process.

I'm going to try a few things and see if I can figure this out.

>  __handle_mm_fault (mm/memory.c:3662 mm/memory.c:4939 mm/memory.c:5079)
>  handle_mm_fault (mm/memory.c:5233)
>  do_user_addr_fault (arch/x86/mm/fault.c:1392)
>  exc_page_fault (./arch/x86/include/asm/paravirt.h:695 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542)
>  asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>  copyout (./arch/x86/include/asm/uaccess_64.h:112 ./arch/x86/include/asm/uaccess_64.h:133 lib/iov_iter.c:168)
>  _copy_to_iter (lib/iov_iter.c:316 (discriminator 5))
>  copy_page_to_iter (lib/iov_iter.c:483 lib/iov_iter.c:468)
>  filemap_read (mm/filemap.c:2712)
>  blkdev_read_iter (block/fops.c:620)
>  vfs_read (./include/linux/fs.h:1871 fs/read_write.c:389 fs/read_write.c:470)
>  ksys_read (fs/read_write.c:613)
>  __x64_sys_read (fs/read_write.c:621)



  parent reply	other threads:[~2023-08-31 14:53 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-28 21:14 Mirsad Todorovac
2023-08-29 19:13 ` Matthew Wilcox
2023-08-30 11:43   ` Mirsad Todorovac
2023-08-30 13:56     ` Mirsad Todorovac
2023-08-31 14:52 ` Matthew Wilcox [this message]
     [not found]   ` <ZPs8+sLv5oaubrKj@casper.infradead.org>
2023-09-12 16:05     ` Mirsad Todorovac
2023-09-18 12:15     ` Mirsad Todorovac
2023-09-18 14:53       ` Matthew Wilcox
2023-09-19 11:44         ` Mirsad Todorovac
2023-10-03 20:12         ` Mirsad Todorovac

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZPCpQQVTtlB0FA5A@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=mirsad.todorovac@alu.unizg.hr \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox