Re: [PATCH v4 05/11] mm: do not split a folio if it has minimum folio order requirement

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Zi Yan <ziy@nvidia.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Sean Christopherson <seanjc@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
	djwong@kernel.org, brauner@kernel.org, david@fromorbit.com,
	chandan.babu@oracle.com, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, hare@suse.de,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-xfs@vger.kernel.org, gost.dev@samsung.com,
	p.raghav@samsung.com
Subject: Re: [PATCH v4 05/11] mm: do not split a folio if it has minimum folio order requirement
Date: Mon, 29 Apr 2024 10:29:29 -0400	[thread overview]
Message-ID: <6799F341-9E37-4F3E-B0D0-B5B2138A5F5F@nvidia.com> (raw)
In-Reply-To: <Zi8aYA92pvjDY7d5@bombadil.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 5564 bytes --]

On 28 Apr 2024, at 23:56, Luis Chamberlain wrote:

> On Sat, Apr 27, 2024 at 05:57:17PM -0700, Luis Chamberlain wrote:
>> On Fri, Apr 26, 2024 at 04:46:11PM -0700, Luis Chamberlain wrote:
>>> On Thu, Apr 25, 2024 at 05:47:28PM -0700, Luis Chamberlain wrote:
>>>> On Thu, Apr 25, 2024 at 09:10:16PM +0100, Matthew Wilcox wrote:
>>>>> On Thu, Apr 25, 2024 at 01:37:40PM +0200, Pankaj Raghav (Samsung) wrote:
>>>>>> From: Pankaj Raghav <p.raghav@samsung.com>
>>>>>>
>>>>>> using that API for LBS is resulting in an NULL ptr dereference
>>>>>> error in the writeback path [1].
>>>>>>
>>>>>> [1] https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
>>>>>
>>>>>  How would I go about reproducing this?
>>
>> Well so the below fixes this but I am not sure if this is correct.
>> folio_mark_dirty() at least says that a folio should not be truncated
>> while its running. I am not sure if we should try to split folios then
>> even though we check for writeback once. truncate_inode_partial_folio()
>> will folio_wait_writeback() but it will split_folio() before checking
>> for claiming to fail to truncate with folio_test_dirty(). But since the
>> folio is locked its not clear why this should be possible.
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 83955362d41c..90195506211a 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3058,7 +3058,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>  	if (new_order >= folio_order(folio))
>>  		return -EINVAL;
>>
>> -	if (folio_test_writeback(folio))
>> +	if (folio_test_dirty(folio) || folio_test_writeback(folio))
>>  		return -EBUSY;
>>
>>  	if (!folio_test_anon(folio)) {
>
> I wondered what code path is causing this and triggering this null
> pointer, so I just sprinkled a check here:
>
> 	VM_BUG_ON_FOLIO(folio_test_dirty(folio), folio);
>
> The answer was:
>
> kcompactd() --> migrate_pages_batch()
>                   --> try_split_folio --> split_folio_to_list() -->
> 		       split_huge_page_to_list_to_order()
>

There are 3 try_split_folio() in migrate_pages_batch(). First one is to
split anonymous large folios that are on deferred split list, so not related;
second one is to split THPs when thp migration is not supported, but
this is compaction, so not related; third one is to split large folios
when there is no same size free page in the system, and this should be
the one.

> Since it took running fstests generic/447 twice to reproduce the crash
> with a minorder and 16k block size, modified generic/447 as followed and
> it helps to speed up the reproducer with just running the test once:
>
> diff --git a/tests/generic/447 b/tests/generic/447
> index 16b814ee7347..43050b58e8ba 100755
> --- a/tests/generic/447
> +++ b/tests/generic/447
> @@ -36,6 +39,15 @@ _scratch_mount >> "$seqres.full" 2>&1
>  testdir="$SCRATCH_MNT/test-$seq"
>  mkdir "$testdir"
>
> +runfile="$tmp.compaction"
> +touch $runfile
> +while [ -e $runfile ]; do
> +	echo 1 > /proc/sys/vm/compact_memory
> +	sleep 10
> +done &
> +compaction_pid=$!
> +
> +
>  # Setup for one million blocks, but we'll accept stress testing down to
>  # 2^17 blocks... that should be plenty for anyone.
>  fnr=20
> @@ -69,6 +81,8 @@ _scratch_cycle_mount
>  echo "Delete file1"
>  rm -rf $testdir/file1
>
> +rm -f $runfile
> +wait > /dev/null 2>&1
>  # success, all done
>  status=0
>  exit
>
> And I verified that moving the check only to the migrate_pages_batch()
> path also fixes the crash:
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 73a052a382f1..83b528eb7100 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1484,7 +1484,12 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
>  	int rc;
>
>  	folio_lock(folio);
> +	if (folio_test_dirty(folio)) {
> +		rc = -EBUSY;
> +		goto out;
> +	}
>  	rc = split_folio_to_list(folio, split_folios);
> +out:
>  	folio_unlock(folio);
>  	if (!rc)
>  		list_move_tail(&folio->lru, split_folios);
>
> However I'd like compaction folks to review this. I see some indications
> in the code that migration can race with truncation but we feel fine by
> it by taking the folio lock. However here we have a case where we see
> the folio clearly locked and the folio is dirty. Other migraiton code
> seems to write back the code and can wait, here we just move on. Further
> reading on commit 0003e2a414687 ("mm: Add AS_UNMOVABLE to mark mapping
> as completely unmovable") seems to hint that migration is safe if the
> mapping either does not exist or the mapping does exist but has
> mapping->a_ops->migrate_folio so I'd like further feedback on this.

During migration, all page table entries pointing to this dirty folio
are invalid, and accesses to this folio will cause page fault and
wait on the migration entry. I am not sure we need to skip dirty folios.

> Another thing which requires review is if we we split a folio but not
> down to order 0 but to the new min order, does the accounting on
> migrate_pages_batch() require changing?  And most puzzling, why do we

What accounting are you referring to? split code should take care of it.

> not see this with regular large folios, but we do see it with minorder ?

I wonder if the split code handles folio->mapping->i_pages properly.
Does the i_pages store just folio pointers or also need all tail page
pointers? I am no expert in fs, thus need help.


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

next prev parent reply	other threads:[~2024-04-29 14:29 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-25 11:37 [PATCH v4 00/11] enable bs > ps in XFS Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 01/11] readahead: rework loop in page_cache_ra_unbounded() Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 02/11] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
2024-04-25 18:07   ` Hannes Reinecke
2024-04-26 15:09   ` Darrick J. Wong
2024-04-25 11:37 ` [PATCH v4 03/11] filemap: allocate mapping_min_order folios in the page cache Pankaj Raghav (Samsung)
2024-04-25 19:04   ` Hannes Reinecke
2024-04-26 15:12   ` Darrick J. Wong
2024-04-28 20:59     ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 04/11] readahead: allocate folios with mapping_min_order in readahead Pankaj Raghav (Samsung)
2024-04-25 18:53   ` Matthew Wilcox
2024-04-25 11:37 ` [PATCH v4 05/11] mm: do not split a folio if it has minimum folio order requirement Pankaj Raghav (Samsung)
2024-04-25 20:10   ` Matthew Wilcox
2024-04-26  0:47     ` Luis Chamberlain
2024-04-26 23:46       ` Luis Chamberlain
2024-04-28  0:57         ` Luis Chamberlain
2024-04-29  3:56           ` Luis Chamberlain
2024-04-29 14:29             ` Zi Yan [this message]
2024-04-30  0:31               ` Luis Chamberlain
2024-04-30  0:49                 ` Luis Chamberlain
2024-04-30  2:43                 ` Zi Yan
2024-04-30 19:27                   ` Luis Chamberlain
2024-05-01  4:13                     ` Matthew Wilcox
2024-05-01 14:28                       ` Matthew Wilcox
2024-04-26 15:49     ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 06/11] filemap: cap PTE range to be created to i_size in folio_map_range() Pankaj Raghav (Samsung)
2024-04-25 20:24   ` Matthew Wilcox
2024-04-26 12:54     ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 07/11] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
2024-04-26  6:22   ` Christoph Hellwig
2024-04-26 11:43     ` Pankaj Raghav (Samsung)
2024-04-27  5:12       ` Christoph Hellwig
2024-04-29 21:02         ` Pankaj Raghav (Samsung)
2024-04-27  3:26     ` Matthew Wilcox
2024-04-27  4:52       ` Christoph Hellwig
2024-04-25 11:37 ` [PATCH v4 08/11] xfs: use kvmalloc for xattr buffers Pankaj Raghav (Samsung)
2024-04-26 15:18   ` Darrick J. Wong
2024-04-28 21:06     ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 09/11] xfs: expose block size in stat Pankaj Raghav (Samsung)
2024-04-26 15:15   ` Darrick J. Wong
2024-04-25 11:37 ` [PATCH v4 10/11] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
2024-04-26 15:16   ` Darrick J. Wong
2024-04-25 11:37 ` [PATCH v4 11/11] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
2024-04-26 15:18   ` Darrick J. Wong
     [not found] ` <87y18zxvpd.fsf@gmail.com>
2024-04-27  5:05   ` [PATCH v4 00/11] enable bs > ps in XFS Darrick J. Wong
2024-04-29 20:39   ` Pankaj Raghav (Samsung)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6799F341-9E37-4F3E-B0D0-B5B2138A5F5F@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hare@suse.de \
    --cc=kernel@pankajraghav.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=seanjc@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox