* Re: [PATCH] fix page_count in ->iomap_migrate_page()
[not found] <1544766961-3492-1-git-send-email-openzhangj@gmail.com>
@ 2018-12-14 11:25 ` Richard Weinberger
2018-12-14 12:26 ` Gao Xiang
2018-12-15 10:51 ` Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Richard Weinberger @ 2018-12-14 11:25 UTC (permalink / raw)
To: zhangjun
Cc: Alexander Viro, linux-fsdevel, linux-kernel, Darrick J . Wong,
hch, bfoster
[CC'ing authors of the code plus mm folks]
Am Freitag, 14. Dezember 2018, 06:56:01 CET schrieb zhangjun:
> IOMAP uses PG_private a little different with buffer_head based
> filesystem.
> It uses it as marker and when set, the page counter is not incremented,
> migrate_page_move_mapping() assumes that PG_private indicates a counter
> of +1.
> so, we have to pass a extra count of -1 to migrate_page_move_mapping()
> if the flag is set.
>
> Signed-off-by: zhangjun <openzhangj@gmail.com>
> ---
> fs/iomap.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 64ce240..352e58a 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -544,8 +544,17 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
> struct page *page, enum migrate_mode mode)
> {
> int ret;
> + int extra_count = 0;
>
> - ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
> + /*
> + * IOMAP uses PG_private as marker and does not raise the page counter.
> + * migrate_page_move_mapping() expects a incremented counter if PG_private
> + * is set. Therefore pass -1 as extra_count for this case.
> + */
> + if (page_has_private(page))
> + extra_count = -1;
> + ret = migrate_page_move_mapping(mapping, newpage, page,
> + NULL, mode, extra_count);
> if (ret != MIGRATEPAGE_SUCCESS)
> return ret;
This is the third place which needs this workaround.
UBIFS, F2FS, and now iomap.
I agree with Dave that nobody can assume that PG_private implies an additional
page reference.
But page migration does that. Including parts of the write back code.
Thanks,
//richard
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] fix page_count in ->iomap_migrate_page()
2018-12-14 11:25 ` [PATCH] fix page_count in ->iomap_migrate_page() Richard Weinberger
@ 2018-12-14 12:26 ` Gao Xiang
2018-12-14 13:35 ` Richard Weinberger
2018-12-15 10:51 ` Christoph Hellwig
1 sibling, 1 reply; 6+ messages in thread
From: Gao Xiang @ 2018-12-14 12:26 UTC (permalink / raw)
To: Richard Weinberger
Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
linux, linux-mm, Gao Xiang
Hi Richard,
On 2018/12/14 19:25, Richard Weinberger wrote:
> This is the third place which needs this workaround.
> UBIFS, F2FS, and now iomap.
>
> I agree with Dave that nobody can assume that PG_private implies an additional
> page reference.
> But page migration does that. Including parts of the write back code.
It seems that it's clearly documented in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780
* A pagecache page contains an opaque `private' member, which belongs to the
* page's address_space. Usually, this is the address of a circular list of
* the page's disk buffers. PG_private must be set to tell the VM to call
* into the filesystem to release these pages.
*
* A page may belong to an inode's memory mapping. In this case, page->mapping
* is the pointer to the inode, and page->index is the file offset of the page,
* in units of PAGE_SIZE.
*
* If pagecache pages are not associated with an inode, they are said to be
* anonymous pages. These may become associated with the swapcache, and in that
* case PG_swapcache is set, and page->private is an offset into the swapcache.
*
* In either case (swapcache or inode backed), the pagecache itself holds one
* reference to the page. Setting PG_private should also increment the
* refcount. The each user mapping also has a reference to the page.
and when I looked into that, I found
https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] fix page_count in ->iomap_migrate_page()
2018-12-14 12:26 ` Gao Xiang
@ 2018-12-14 13:35 ` Richard Weinberger
2018-12-14 13:55 ` Gao Xiang
0 siblings, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2018-12-14 13:35 UTC (permalink / raw)
To: Gao Xiang, Artem Bityutskiy
Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
linux, linux-mm, Gao Xiang
Am Freitag, 14. Dezember 2018, 13:26:28 CET schrieb Gao Xiang:
> Hi Richard,
>
> On 2018/12/14 19:25, Richard Weinberger wrote:
> > This is the third place which needs this workaround.
> > UBIFS, F2FS, and now iomap.
> >
> > I agree with Dave that nobody can assume that PG_private implies an additional
> > page reference.
> > But page migration does that. Including parts of the write back code.
>
> It seems that it's clearly documented in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780
>
> * A pagecache page contains an opaque `private' member, which belongs to the
> * page's address_space. Usually, this is the address of a circular list of
> * the page's disk buffers. PG_private must be set to tell the VM to call
> * into the filesystem to release these pages.
> *
> * A page may belong to an inode's memory mapping. In this case, page->mapping
> * is the pointer to the inode, and page->index is the file offset of the page,
> * in units of PAGE_SIZE.
> *
> * If pagecache pages are not associated with an inode, they are said to be
> * anonymous pages. These may become associated with the swapcache, and in that
> * case PG_swapcache is set, and page->private is an offset into the swapcache.
> *
> * In either case (swapcache or inode backed), the pagecache itself holds one
> * reference to the page. Setting PG_private should also increment the
> * refcount. The each user mapping also has a reference to the page.
>
> and when I looked into that, I found
> https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/
Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing
the flag.
I did that now and so far none of my tests exploded.
Artem, do you remember why UBIFS never raised the page counter when setting PG_private?
Thanks,
//richard
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] fix page_count in ->iomap_migrate_page()
2018-12-14 13:35 ` Richard Weinberger
@ 2018-12-14 13:55 ` Gao Xiang
0 siblings, 0 replies; 6+ messages in thread
From: Gao Xiang @ 2018-12-14 13:55 UTC (permalink / raw)
To: Richard Weinberger
Cc: Artem Bityutskiy, zhangjun, Alexander Viro, linux-fsdevel,
linux-kernel, Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
linux, linux-mm, Gao Xiang
Hi Richard,
On 2018/12/14 21:35, Richard Weinberger wrote:
> Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing
> the flag.
> I did that now and so far none of my tests exploded.
Yes, many existed codes are based on this restriction in order to be freeable race-free.
and that's it since PG_Private was once introduced at first by Andrew Morton in 2002
for many Linux versions....and it's not bad I think... :)
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] fix page_count in ->iomap_migrate_page()
2018-12-14 11:25 ` [PATCH] fix page_count in ->iomap_migrate_page() Richard Weinberger
2018-12-14 12:26 ` Gao Xiang
@ 2018-12-15 10:51 ` Christoph Hellwig
2018-12-15 11:17 ` Richard Weinberger
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2018-12-15 10:51 UTC (permalink / raw)
To: Richard Weinberger
Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
Darrick J . Wong, hch, bfoster, Dave Chinner, akpm,
kirill.shutemov, mhocko, n-horiguchi, mgorman, aarcange, willy,
linux, linux-mm, Gao Xiang
FYI, for iomap we got a patch to just increment the page count when
setting the private data, and it finally got merged into mainline after
a while.
Not that it totally makes sense to me, but it is what it is. It would
just be nice if set_page_private took care of it and we had a
clear_page_private to undo it, making the whole scheme at lot more
obvious.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] fix page_count in ->iomap_migrate_page()
2018-12-15 10:51 ` Christoph Hellwig
@ 2018-12-15 11:17 ` Richard Weinberger
0 siblings, 0 replies; 6+ messages in thread
From: Richard Weinberger @ 2018-12-15 11:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: zhangjun, Alexander Viro, linux-fsdevel, linux-kernel,
Darrick J . Wong, bfoster, Dave Chinner, akpm, kirill.shutemov,
mhocko, n-horiguchi, mgorman, aarcange, willy, linux, linux-mm,
Gao Xiang
Am Samstag, 15. Dezember 2018, 11:51:12 CET schrieb Christoph Hellwig:
> FYI, for iomap we got a patch to just increment the page count when
> setting the private data, and it finally got merged into mainline after
> a while.
>
> Not that it totally makes sense to me, but it is what it is. It would
> just be nice if set_page_private took care of it and we had a
> clear_page_private to undo it, making the whole scheme at lot more
> obvious.
Yeah, UBIFS will go the same route. I have already a patch prepared
which increments the page count when UBIFS sets PG_private.
Thanks,
//richard
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-12-15 11:17 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <1544766961-3492-1-git-send-email-openzhangj@gmail.com>
2018-12-14 11:25 ` [PATCH] fix page_count in ->iomap_migrate_page() Richard Weinberger
2018-12-14 12:26 ` Gao Xiang
2018-12-14 13:35 ` Richard Weinberger
2018-12-14 13:55 ` Gao Xiang
2018-12-15 10:51 ` Christoph Hellwig
2018-12-15 11:17 ` Richard Weinberger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox