Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Luis Chamberlain <mcgrof@kernel.org>
To: Jan Kara <jack@suse.cz>, Kefeng Wang <wangkefeng.wang@huawei.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	David Bueso <dave@stgolabs.net>, Tso Ted <tytso@mit.edu>,
	Ritesh Harjani <ritesh.list@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Oliver Sang <oliver.sang@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	linux-mm@kvack.org, Christian Brauner <brauner@kernel.org>,
	Hannes Reinecke <hare@suse.de>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	John Garry <john.g.garry@oracle.com>,
	linux-block@vger.kernel.org, ltp@lists.linux.it,
	Pankaj Raghav <p.raghav@samsung.com>,
	Daniel Gomez <da.gomez@samsung.com>,
	Dave Chinner <david@fromorbit.com>,
	gost.dev@samsung.com
Subject: Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
Date: Fri, 28 Mar 2025 17:08:38 -0700	[thread overview]
Message-ID: <Z-c6BqCSmAnNxb57@bombadil.infradead.org> (raw)
In-Reply-To: <Z-bz0IZuTtwNYPBq@bombadil.infradead.org>

On Fri, Mar 28, 2025 at 12:09:06PM -0700, Luis Chamberlain wrote:
> On Fri, Mar 28, 2025 at 02:48:00AM -0700, Luis Chamberlain wrote:
> > On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> > > Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> > > migrate: support poisoned recover from migrate folio") make the removal
> > > of the spin lock safe now given all the buffers are locked from the
> > > folio? This survives some basic sanity checks on my end with
> > > generic/750 against ext4 and also filling a drive at the same time with
> > > fio. I have a feeling is we are not sure, do we have a reproducer for
> > > the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> > > check race between __find_get_block() and migration")? I suspect the
> > > answer is no.
> 
> Sebastian, David, is there a reason CONFIG_DEBUG_ATOMIC_SLEEP=y won't
> trigger a atomic sleeping context warning when cond_resched() is used?
> Syzbot and 0-day had ways to reproduce it a kernel warning under these
> conditions, but this config didn't, and require dan explicit might_sleep()
> 
> CONFIG_PREEMPT_BUILD=y
> CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> # CONFIG_PREEMPT_LAZY is not set
> # CONFIG_PREEMPT_RT is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
> # CONFIG_PREEMPT_TRACER is not set
> # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
> 
> Are there some preemption configs under which cond_resched() won't
> trigger a kernel splat where expected so the only thing I can think of
> is perhaps some preempt configs don't implicate a sleep? If true,
> instead of adding might_sleep() to one piece of code (in this case
> foio_mc_copy()) I wonder if instead just adding it to cond_resched() may
> be useful.

I think the answer to the above is "no".

And it took me quite some more testing with the below patch to convince myself
of that. Essentially, to trigger the cond_resched() atomic context warning
kernel warning we'd need to be in atomic context, and that today we can get
there through folio_mc_copy() through large folios.

Today the only atomic context we know which would end up in page
migration and folio_mc_copy() would be with buffer-head filesystems
which support large folios and which use buffer_migrate_folio_norefs() for
their migrate_folio() callback. The patch which we added which enabled the
block layer to support large folios did this only for cases where the
block size of the backing device is > PAGE_SIZE. So for instance your
qemu guest would need to have a logical block size larer than 4096 on
x86_64. To be clear, ext4 cannot possibly trigger this. No filesystem
can trigger this *case* other than the block device cache, and that
is only possible if block devices have larger block sizes.

The whole puzzle above about cond_resched() not rigger atomic warning
is because in fact, although buffer_migrate_folio_norefs() *does* always
use atomic context to call filemap_migrate_folio(), in practice I'm not
seeing it, that is, we likley bail before we even call folio_mc_copy().

So for instance we can see:

Mar 28 23:22:04 extra-ext4-4k kernel: __buffer_migrate_folio() in_atomic: 1
Mar 28 23:22:04 extra-ext4-4k kernel: __buffer_migrate_folio() in_atomic: 1
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): 512 in_atomic(): 0
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): in_atomic(): 0 calling cond_resched()
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): in_atomic(): 0 calling cond_resched()

diff --git a/block/bdev.c b/block/bdev.c
index 4844d1e27b6f..1db9edfc4bc1 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -147,6 +147,11 @@ static void set_init_blocksize(struct block_device *bdev)
 			break;
 		bsize <<= 1;
 	}
+
+	if (bsize > PAGE_SIZE)
+		printk("%s: LBS device: mapping_set_folio_min_order(%u): %u\n",
+		       bdev->bd_disk->disk_name, get_order(bsize), bsize);
+
 	BD_INODE(bdev)->i_blkbits = blksize_bits(bsize);
 	mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping,
 				    get_order(bsize));
diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..210df4970573 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -851,6 +851,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 recheck_buffers:
 		busy = false;
 		spin_lock(&mapping->i_private_lock);
+		printk("__buffer_migrate_folio() in_atomic: %d\n", in_atomic());
 		bh = head;
 		do {
 			if (atomic_read(&bh->b_count)) {
@@ -871,6 +872,8 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		}
 	}

+	if (check_refs)
+		printk("__buffer_migrate_folio() calling filemap_migrate_folio() in_atomic: %d\n", in_atomic());
 	rc = filemap_migrate_folio(mapping, dst, src, mode);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		goto unlock_buffers;
diff --git a/mm/util.c b/mm/util.c
index 448117da071f..61c76712d4bb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,11 +735,15 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;

+	if (nr > 1)
+		printk("large folios on folio_mc_copy(): %lu in_atomic(): %d\n", nr, in_atomic());
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;
 		if (++i == nr)
 			break;
+		printk("large folios on folio_mc_copy(): in_atomic(): %d calling cond_resched()\n", in_atomic());
 		cond_resched();
 	}

And so effectively, it is true, cond_resched() is not in atomic context
above, even though  filemap_migrate_folio() is certainly being called
in atomic context. What changes in between is folios likely won't
migrate due to later checks in filemap_migrate_folio() like the new
ref check, and instead we end up with page migraiton later of a huge
page, and *that* is not in atomic context.

So, to be clear, I *still* cannot reproduce the original reports, even
though in theory it is evident how buffer_migrate_folio_norefs() *can*
call filemap_migrate_folio() in atomic context.

How 0-day and syzbot triggered this *without* a large block size block
device is perplexing to me, if it is true that one was not used.

How we still can't reproduce in_atomic() context in folio_mc_copy() is
another fun mystery.

That is to say, I can't see how the existing code could regress here.
Given only the only buffer-head filesystem which enables large folios
is the pseudo block device cache filesystem, and you'll only get LBS
devices if the logical block size > PAGE_SIZE.

Despite all this, we have two separate reports and no clear information
if this was using a large block device enabled or not, and so given the
traces above to help root out more bugs with large folios we should just
proactively add might_sleep() to __migrate_folio(). I'll send a patch
for that, that'll enhance our test coverage.

The reason why we likely are having  hard time to reproduce the issue is
this new check:

	/* Check whether src does not
	have extra refs before we do more work */  
        if (folio_ref_count(src) != expected_count)                              
		return -EAGAIN;    .

So, moving on, I think what's best is to see how we can get __find_get_block()
to not chug on during page migration.

  Luis

next prev parent reply	other threads:[~2025-03-29  0:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <202503101536.27099c77-lkp@intel.com>
     [not found] ` <20250311-testphasen-behelfen-09b950bbecbf@brauner>
     [not found]   ` <Z9kEdPLNT8SOyOQT@xsang-OptiPlex-9020>
2025-03-18  8:15     ` Luis Chamberlain
2025-03-18 14:37       ` Matthew Wilcox
2025-03-18 23:17         ` Luis Chamberlain
2025-03-19  2:58           ` Matthew Wilcox
2025-03-19 16:55             ` Luis Chamberlain
2025-03-19 19:16               ` Luis Chamberlain
2025-03-19 19:24                 ` Matthew Wilcox
2025-03-20 12:11                   ` Luis Chamberlain
2025-03-20 12:18                     ` Luis Chamberlain
2025-03-22 23:14                     ` Johannes Weiner
2025-03-23  1:02                       ` Luis Chamberlain
2025-03-23  7:07                         ` Luis Chamberlain
2025-03-25  6:52                           ` Oliver Sang
2025-03-28  1:44                             ` Luis Chamberlain
2025-03-28  4:21                               ` Luis Chamberlain
2025-03-28  9:47                                 ` Luis Chamberlain
2025-03-28 19:09                                   ` Luis Chamberlain
2025-03-29  0:08                                     ` Luis Chamberlain [this message]
2025-03-29  1:06                                       ` Luis Chamberlain
2025-03-31  7:45                                       ` Sebastian Andrzej Siewior
2025-04-08 16:43                                         ` Darrick J. Wong
2025-04-08 17:06                                           ` Luis Chamberlain
2025-04-08 17:24                                             ` Luis Chamberlain
2025-04-08 17:48                                               ` Darrick J. Wong
2025-04-08 17:51                                                 ` Matthew Wilcox
2025-04-08 18:02                                                   ` Darrick J. Wong
2025-04-08 18:51                                                     ` Matthew Wilcox
2025-04-08 19:13                                                       ` Luis Chamberlain
2025-04-08 19:13                                                       ` Luis Chamberlain
2025-04-08 18:06                                                 ` Luis Chamberlain
2025-03-20  1:24       ` Lai, Yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-c6BqCSmAnNxb57@bombadil.infradead.org \
    --to=mcgrof@kernel.org \
    --cc=apopple@nvidia.com \
    --cc=bigeasy@linutronix.de \
    --cc=brauner@kernel.org \
    --cc=da.gomez@samsung.com \
    --cc=dave@stgolabs.net \
    --cc=david@fromorbit.com \
    --cc=david@redhat.com \
    --cc=gost.dev@samsung.com \
    --cc=hannes@cmpxchg.org \
    --cc=hare@suse.de \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=ltp@lists.linux.it \
    --cc=oe-lkp@lists.linux.dev \
    --cc=oliver.sang@intel.com \
    --cc=p.raghav@samsung.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox