Re: [ISSUE] split_folio() and dirty IOMAP folios

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	kvm@vger.kernel.org, Zi Yan <ziy@nvidia.com>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Thomas Huth <thuth@redhat.com>
Subject: Re: [ISSUE] split_folio() and dirty IOMAP folios
Date: Fri, 8 Nov 2024 10:11:24 +0100	[thread overview]
Message-ID: <6099e202-ef0a-4d21-958c-2c42db43a5bb@redhat.com> (raw)
In-Reply-To: <Zy0g8DdnuZxQly3b@casper.infradead.org>

On 07.11.24 21:20, Matthew Wilcox wrote:
> On Thu, Nov 07, 2024 at 05:34:40PM +0100, David Hildenbrand wrote:
>> On 07.11.24 17:09, Matthew Wilcox wrote:
>>> On Thu, Nov 07, 2024 at 04:07:08PM +0100, David Hildenbrand wrote:
>>>> I'm debugging an interesting problem: split_folio() will fail on dirty
>>>> folios on XFS, and I am not sure who will trigger the writeback in a timely
>>>> manner so code relying on the split to work at some point (in sane setups
>>>> where page pinning is not applicable) can make progress.
>>>
>>> You could call something like filemap_write_and_wait_range()?
>>
>> Thanks, have to look into some details of that.
>>
>> Looks like the folio_clear_dirty_for_io() is buried in
>> folio_prepare_writeback(), so that part is taken care of.
>>
>> Guess I have to fo from folio to "mapping,lstart,lend" such that
>> __filemap_fdatawrite_range() would look up the folio again. Sounds doable.
>>
>> (I assume I have to drop the folio lock+reference before calling that)
> 
> I was thinking you'd do it higher in the callchain than
> gmap_make_secure().  Presumably userspace says "I want to make this
> 256MB range secure" and we can start by writing back that entire
> 256MB chunk of address space.
> 
> That doesn't prevent anybody from dirtying it in-between, of course,
> so you can still get -EBUSY and have to loop round again.

I'm afraid that won't really work.

On the one hand, we might be allocating these pages (+disk blocks) 
during the unpack operation -- where we essentially trigger page faults 
first using gmap_fault() -- so the pages might not even exist before the 
gmap_make_secure() during unpack. One work around would be to 
preallocate+writeback from user space, but it doesn't sound quite right.

But the bigger problem I see is that the initial "unpack" operation is 
not the only case where we trigger this conversion to "secure" state. 
Once the VM is running, we can see calls on arbitrary guest memory even 
during page faults, when gmap_make_secure() is called via 
gmap_convert_to_secure().

I'm still not sure why we see essentially no progress being made, even 
though we temporarily drop the PTL, mmap lock, folio lock, folio ref ... 
maybe related to us triggering a write fault that somehow ends up 
setting the folio dirty :/ Or because writeback is simply too slow / 
backs off.

I'll play with handling -EBUSY from split_folio() differently: if the 
folio is under writeback, wait on that. If the folio is dirty, trigger 
writeback. And I'll look into whether we really need a writable PTE, I 
suspect not, because we are not actually "modifying" page content.

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2024-11-08  9:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-07 15:07 David Hildenbrand
2024-11-07 16:09 ` Matthew Wilcox
2024-11-07 16:34   ` David Hildenbrand
2024-11-07 20:20     ` Matthew Wilcox
2024-11-08  9:11       ` David Hildenbrand [this message]
2024-11-11 15:19         ` David Hildenbrand
2024-11-21 12:15           ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6099e202-ef0a-4d21-958c-2c42db43a5bb@redhat.com \
    --to=david@redhat.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=brauner@kernel.org \
    --cc=djwong@kernel.org \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=thuth@redhat.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox