From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Shakeel Butt <shakeelb@google.com>,
John Hubbard <jhubbard@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Yang Shi <shy828301@gmail.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
Pedro Gomes <pedrodemargomes@gmail.com>,
Oded Gabbay <oded.gabbay@gmail.com>,
linux-mm@kvack.org, Alexander Potapenko <glider@google.com>
Subject: Re: [PATCH v1 10/15] mm/page-flags: reuse PG_slab as PG_anon_exclusive for PageAnon() pages
Date: Wed, 9 Mar 2022 19:00:46 +0100 [thread overview]
Message-ID: <32c4bd91-08cf-7318-02b7-f43452447aef@redhat.com> (raw)
In-Reply-To: <YijmoncSWx/XWkOW@casper.infradead.org>
>> It's making the semantics of PG_slab depend on another bit in the head
>> page. I agree, it's not perfect, but it's not *too* crazy. As raised in
>> the cover letter, not proud of this, but I didn't really find an
>> alternative for the time being.
>>
>>> requirement that SL[AOU]B doesn't use the bottom two bits of
>>
>> I think only the last bit (0x1)
>
> Yeah, OK, they can use three of the four possible combinations of the
> bottom two bits ;-) Still, it's yet more constraints on use of struct
> page, which aren't obvious (and are even upside down for the casual
> observer).
I don't disagree that such constraints are nasty.
Having that said, I'd really like to avoid overloading PG_slab
(especially, such that I don't have to mess with
scripts/crash/makedumpfile). So if we can reuse MappedToDisk, that would
be very nice.
>
>>> I have plans to get rid of PageError and PagePrivate, but those are going
>>> to be too late for you. I don't think mappedtodisk has meaning for anon
>>> pages, even if they're in the swapcache. It would need PG_has_hwpoisoned
>>
>> Are you sure it's not used if the page is in the swapcache? I have no
>> detailed knowledge how file-back swap targets are handled in that
>> regard. So fs experience would be highly appreciated :)
>
> I have two arguments here. One is based on grep:
>
> $ git grep mappedtodisk
> fs/proc/page.c: u |= kpf_copy_bit(k, KPF_MAPPEDTODISK, PG_mappedtodisk);
> include/linux/page-flags.h: PG_mappedtodisk, /* Has blocks allocated on-disk */
> include/linux/page-flags.h: PG_has_hwpoisoned = PG_mappedtodisk,
> include/linux/page-flags.h:PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
> include/trace/events/mmflags.h: {1UL << PG_mappedtodisk, "mappedtodisk" }, \
> include/trace/events/pagemap.h: (folio_test_mappedtodisk(folio) ? PAGEMAP_MAPPEDDISK : 0) | \
> mm/migrate.c: if (folio_test_mappedtodisk(folio))
> mm/migrate.c: folio_set_mappedtodisk(newfolio);
> mm/truncate.c: folio_clear_mappedtodisk(folio);
> tools/vm/page-types.c: [KPF_MAPPEDTODISK] = "d:mappedtodisk",
>
> $ git grep MappedToDisk
> fs/buffer.c: SetPageMappedToDisk(page);
> fs/buffer.c: if (PageMappedToDisk(page))
> fs/buffer.c: SetPageMappedToDisk(page);
> fs/ext4/readpage.c: SetPageMappedToDisk(page);
> fs/f2fs/data.c: SetPageMappedToDisk(page);
> fs/f2fs/file.c: if (PageMappedToDisk(page))
> fs/fuse/dev.c: ClearPageMappedToDisk(newpage);
> fs/mpage.c: SetPageMappedToDisk(page);
> fs/nilfs2/file.c: if (PageMappedToDisk(page))
> fs/nilfs2/file.c: SetPageMappedToDisk(page);
> fs/nilfs2/page.c: ClearPageMappedToDisk(page);
> fs/nilfs2/page.c: SetPageMappedToDisk(dpage);
> fs/nilfs2/page.c: ClearPageMappedToDisk(dpage);
> fs/nilfs2/page.c: if (PageMappedToDisk(src) && !PageMappedToDisk(dst))
> fs/nilfs2/page.c: SetPageMappedToDisk(dst);
> fs/nilfs2/page.c: else if (!PageMappedToDisk(src) && PageMappedToDisk(dst))
> fs/nilfs2/page.c: ClearPageMappedToDisk(dst);
> fs/nilfs2/page.c: ClearPageMappedToDisk(page);
> include/linux/page-flags.h:PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
>
> so you can see it's _rarely_ used, and only by specific filesystems.
Right, but I spot ext4 and fs/buffer.c core functionality. That
naturally makes me nervous :)
>
> The swap code actually bypasses the filesystem (except for network
> filesystems) and submits BIOs directly (see __swap_writepage() and
> swap_readpage()). There's no check for MappedToDisk, and nowhere to
> set it/clear it.
>
> The second argument is that MappedToDisk is used for delayed allocation
> on writes for filesystems. But swap is required to be allocated at
> swapfile setup (so that the swap code can bypass the filesystem ...)
> and so this flag is useless.
I have some faint memory that there are corner cases, but maybe
(hopefully) my memory is wrong.
>
> Of course, I may have missed something. This is complex.
>
Yeah, that's why I was very careful. If any FS would end up setting that
flag we'd be in trouble and would have to blacklist that fs for swapping
or rework it (well, if we're even able to identify such a file system).
I think one sanity check I could add is making sure that
PageAnonExclusive() is never set when getting a page from the swapcache
in do_swap_page(). I think that would take care of the obvious bugs.
--
Thanks,
David / dhildenb
next prev parent reply other threads:[~2022-03-09 18:01 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-08 14:14 [PATCH v1 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 02/15] mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 03/15] mm/memory: slightly simplify copy_present_pte() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 04/15] mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 05/15] mm/rmap: convert RMAP flags to a proper distinct rmap_t type David Hildenbrand
2022-03-08 17:15 ` Nadav Amit
2022-03-08 17:30 ` David Hildenbrand
2022-03-08 18:09 ` Linus Torvalds
2022-03-08 18:24 ` Nadav Amit
2022-03-08 18:42 ` Linus Torvalds
2022-03-08 14:14 ` [PATCH v1 06/15] mm/rmap: remove do_page_add_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 07/15] mm/rmap: pass rmap flags to hugepage_add_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 08/15] mm/rmap: drop "compound" parameter from page_add_new_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 09/15] mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 10/15] mm/page-flags: reuse PG_slab as PG_anon_exclusive for PageAnon() pages David Hildenbrand
2022-03-09 15:47 ` Matthew Wilcox
2022-03-09 16:57 ` David Hildenbrand
2022-03-09 17:40 ` Matthew Wilcox
2022-03-09 18:00 ` David Hildenbrand [this message]
2022-03-11 18:46 ` David Hildenbrand
[not found] ` <CAHk-=wjWx_bPBLB=qMMae8Sy3KrO+Kvaf4juPknO5HX-+Ot0XQ@mail.gmail.com>
2022-03-11 19:36 ` David Hildenbrand
2022-03-11 21:11 ` Matthew Wilcox
2022-03-12 8:11 ` David Hildenbrand
2022-03-11 21:02 ` Matthew Wilcox
2022-03-12 8:26 ` David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 11/15] mm: remember exclusively mapped anonymous pages with PG_anon_exclusive David Hildenbrand
2022-03-11 18:52 ` David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 12/15] mm/gup: disallow follow_page(FOLL_PIN) David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 13/15] mm: support GUP-triggered unsharing of anonymous pages David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 14/15] mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 15/15] mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are exclusive when (un)pinning David Hildenbrand
2022-03-08 14:19 ` [PATCH v1 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-03-08 21:22 ` Linus Torvalds
2022-03-09 8:00 ` David Hildenbrand
2022-03-10 11:13 ` Oded Gabbay
2022-03-10 11:57 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=32c4bd91-08cf-7318-02b7-f43452447aef@redhat.com \
--to=david@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ddutile@redhat.com \
--cc=glider@google.com \
--cc=guro@fb.com \
--cc=hch@lst.de \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=namit@vmware.com \
--cc=oded.gabbay@gmail.com \
--cc=oleg@redhat.com \
--cc=pedrodemargomes@gmail.com \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=rientjes@google.com \
--cc=rppt@linux.ibm.com \
--cc=shakeelb@google.com \
--cc=shy828301@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=zhangliang5@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox