From: Alistair Popple <apopple@nvidia.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
linux-mm@kvack.org, jhubbard@nvidia.com, rcampbell@nvidia.com,
willy@infradead.org, jgg@nvidia.com,
linux-fsdevel@vger.kernel.org, jack@suse.cz, djwong@kernel.org,
hch@lst.de, david@redhat.com, ruansy.fnst@fujitsu.com
Subject: Re: ZONE_DEVICE refcounting
Date: Fri, 22 Mar 2024 16:34:33 +1100 [thread overview]
Message-ID: <87v85e6cj2.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <Zfz4dT+YWpx5OYxM@dread.disaster.area>
Dave Chinner <david@fromorbit.com> writes:
> On Fri, Mar 22, 2024 at 11:01:25AM +1100, Alistair Popple wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > Alistair Popple wrote:
>> >>
>> >> Alistair Popple <apopple@nvidia.com> writes:
>> >>
>> >> > Dan Williams <dan.j.williams@intel.com> writes:
>> >> >
>> >> >> Alistair Popple wrote:
>> >> >
>> >> > I also noticed folio_anon() is not safe to call on a FS DAX page due to
>> >> > sharing PAGE_MAPPING_DAX_SHARED.
>> >>
>> >> Also it feels like I could be missing something here. AFAICT the
>> >> page->mapping and page->index fields can't actually be used outside of
>> >> fs/dax because they are overloaded for the shared case. Therefore
>> >> setting/clearing them could be skipped and the only reason for doing so
>> >> is so dax_associate_entry()/dax_disassociate_entry() can generate
>> >> warnings which should never occur anyway. So all that code is
>> >> functionally unnecessary.
>> >
>> > What do you mean outside of fs/dax, do you literally mean outside of
>> > fs/dax.c, or the devdax case (i.e. dax without fs-entanglements)?
>>
>> Only the cases fs dax pages might need it. ie. Not devdax which I
>> haven't looked at closely yet.
>>
>> > Memory
>> > failure needs ->mapping and ->index to rmap dax pages. See
>> > mm/memory-failure.c::__add_to_kill() and
>> > mm/memory-failure.c::__add_to_kill_fsdax() where that latter one is for
>> > cases where the fs needs has signed up to react to dax page failure.
>>
>> How does that work for reflink/shared pages which overwrite
>> page->mapping and page->index?
>
> Via reverse mapping in the *filesystem*, not the mm rmap stuff.
>
> pmem_pagemap_memory_failure()
> dax_holder_notify_failure()
> .notify_failure()
> xfs_dax_notify_failure()
> xfs_dax_notify_ddev_failure()
> xfs_rmap_query_range(xfs_dax_failure_fn)
> xfs_dax_failure_fn(rmap record)
> <grabs inode from cache>
> <converts range to file offset>
> mf_dax_kill_procs(inode->mapping, pgoff)
> collect_procs_fsdax(mapping, page)
> add_to_kill_fsdax(task)
> __add_to_kill(task)
> unmap_and_kill_tasks()
>
> Remember: in FSDAX, the pages are the storage media physically owned
> by the filesystem, not the mm subsystem. Hence answering questions
> like "who owns this page" can only be answered correctly by asking
> the filesystem.
Thanks Dave for writing that up, it really helped solidify my
understanding of how this is all supposed to work.
> We shortcut that for pages that only have one owner - we just store
> the owner information in the page as a {mapping, offset} tuple. But
> when we have multiple owners, the only way to find all the {mapping,
> offset} tuples is to ask the filesystem to find all the owners of
> that page.
>
> Hence the special case values for page->mapping/page->index for
> pages over shared filesystem extents. These shared extents are
> communicated to the fsdax layer via the IOMAP_F_SHARED flag
> in the iomaps returned by the filesystem. This flag is the trigger
> for the special mapping share count behaviour to be used. e.g. see
> dax_insert_entry(iomap_iter) -> dax_associate_entry(shared) ->
> dax_page_share_get()....
>
>> Eg. in __add_to_kill() if *p is a shared fs
>> dax page then p->mapping == PAGE_MAPPING_DAX_SHARED and
>> page_address_in_vma(vma, p) will probably crash.
>
> As per above, we don't get the mapping from the page in those cases.
Yep, that all makes sense and I see where I was getting confsued. It was
because in __add_to_kill() we do actually use page->mapping when
page_address_in_vma(vma, p) is called. And because
folio_test_anon(page_folio(p)) is also true for shared FS DAX pages
(p->mapping == PAGE_MAPPING_DAX_SHARED/PAGE_MAPPING_DAX_SHARED) I
thought things would go bad there.
However after re-reading the page_address_in_vma() implementation I
noticed that isn't the case, because folio_anon_vma(page_folio(p)) will
still return NULL which was the detail I had missed.
So to go back to my original point it seems page->mapping and
page->index is largely unused on fs dax pages, even for memory
failure. Because for memory failure the mapping and fsdax_pgoff comes
from the filesystem not the page.
> If you haven't got access to the page though a filesystem method and
> guaranteed that truncate() can't free it from under you, then you're
> probably doing the wrong thing with fsdax...
>
> -Dave.
next prev parent reply other threads:[~2024-03-22 5:50 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-08 4:24 Alistair Popple
2024-03-08 13:44 ` Jason Gunthorpe
2024-03-13 6:32 ` Dan Williams
2024-03-20 5:20 ` Alistair Popple
2024-03-21 5:26 ` Alistair Popple
2024-03-21 6:03 ` Dan Williams
2024-03-22 0:01 ` Alistair Popple
2024-03-22 3:18 ` Dave Chinner
2024-03-22 5:34 ` Alistair Popple [this message]
2024-03-22 6:58 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v85e6cj2.fsf@nvdebian.thelocal \
--to=apopple@nvidia.com \
--cc=dan.j.williams@intel.com \
--cc=david@fromorbit.com \
--cc=david@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rcampbell@nvidia.com \
--cc=ruansy.fnst@fujitsu.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox