linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Damian Tometzki <dtometzki@fedoraproject.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: Folio discussion recap
Date: Wed, 15 Sep 2021 19:55:01 +0200	[thread overview]
Message-ID: <YUIzdTyFBITDIPnj@fedora.tometzki.de> (raw)
In-Reply-To: <YUIT2/xXwvZ4IErc@cmpxchg.org>

Hello together,

I am an outsider and  following the discussion here on the subject. 
Can we not go upsream with the state of development ? 
Optimizations will always be there and new kernel releases too.

I can not assess the risk but I think a decision must be made. 

Damian
 

On Wed, 15. Sep 11:40, Johannes Weiner wrote:
> On Fri, Sep 10, 2021 at 04:16:28PM -0400, Kent Overstreet wrote:
> > One particularly noteworthy idea was having struct page refer to
> > multiple hardware pages, and using slab/slub for larger
> > alloctions. In my view, the primary reason for making this change
> > isn't the memory overhead to struct page (though reducing that would
> > be nice);
> 
> Don't underestimate this, however.
> 
> Picture the near future Willy describes, where we don't bump struct
> page size yet but serve most cache with compound huge pages.
> 
> On x86, it would mean that the average page cache entry has 512
> mapping pointers, 512 index members, 512 private pointers, 1024 LRU
> list pointers, 512 dirty flags, 512 writeback flags, 512 uptodate
> flags, 512 memcg pointers etc. - you get the idea.
> 
> This is a ton of memory. I think this doesn't get more traction
> because it's memory we've always allocated, and we're simply more
> sensitive to regressions than long-standing pain. But nevertheless
> this is a pretty low-hanging fruit.
> 
> The folio makes a great first step moving those into a separate data
> structure, opening the door to one day realizing these savings. Even
> when some MM folks say this was never the intent behind the patches, I
> think this is going to matter significantly, if not more so, later on.
> 
> > Fortunately, Matthew made a big step in the right direction by making folios a
> > new type. Right now, struct folio is not separately allocated - it's just
> > unionized/overlayed with struct page - but perhaps in the future they could be
> > separately allocated. I don't think that is a remotely realistic goal for _this_
> > patch series given the amount of code that touches struct page (thing: writeback
> > code, LRU list code, page fault handlers!) - but I think that's a goal we could
> > keep in mind going forward.
> 
> Yeah, agreed. Not doable out of the gate, but retaining the ability to
> allocate the "cache entry descriptor" bits - mapping, index etc. -
> on-demand would be a huge benefit down the road for the above reason.
> 
> For that they would have to be in - and stay in - their own type.
> 
> > We should also be clear on what _exactly_ folios are for, so they don't become
> > the new dumping ground for everyone to stash their crap. They're to be a new
> > core abstraction, and we should endeaver to keep our core data structures
> > _small_, and _simple_.
> 
> Right. struct page is a lot of things and anything but simple and
> obvious today. struct folio in its current state does a good job
> separating some of that stuff out.
> 
> However, when we think about *which* of the struct page mess the folio
> wants to address, I think that bias toward recent pain over much
> bigger long-standing pain strikes again.
> 
> The compound page proliferation is new, and we're sensitive to the
> ambiguity it created between head and tail pages. It's added some
> compound_head() in lower-level accessor functions that are not
> necessary for many contexts. The folio type safety will help clean
> that up, and this is great.
> 
> However, there is a much bigger, systematic type ambiguity in the MM
> world that we've just gotten used to over the years: anon vs file vs
> shmem vs slab vs ...
> 
> - Many places rely on context to say "if we get here, it must be
>   anon/file", and then unsafely access overloaded member elements:
>   page->mapping, PG_readahead, PG_swapcache, PG_private
> 
> - On the other hand, we also have low-level accessor functions that
>   disambiguate the type and impose checks on contexts that may or may
>   not actually need them - not unlike compound_head() in PageActive():
> 
>   struct address_space *folio_mapping(struct folio *folio)
>   {
> 	struct address_space *mapping;
> 
> 	/* This happens if someone calls flush_dcache_page on slab page */
> 	if (unlikely(folio_test_slab(folio)))
> 		return NULL;
> 
> 	if (unlikely(folio_test_swapcache(folio)))
> 		return swap_address_space(folio_swap_entry(folio));
> 
> 	mapping = folio->mapping;
> 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
> 		return NULL;
> 
> 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>   }
> 
>   Then we go identify places that say "we know it's at least not a
>   slab page!" and convert them to page_mapping_file() which IS safe to
>   use with anon. Or we say "we know this MUST be a file page" and just
>   access the (unsafe) mapping pointer directly.
> 
> - We have a singular page lock, but what it guards depends on what
>   type of page we're dealing with. For a cache page it protects
>   uptodate and the mapping. For an anon page it protects swap state.
> 
>   A lot of us can remember the rules if we try, but the code doesn't
>   help and it gets really tricky when dealing with multiple types of
>   pages simultaneously. Even mature code like reclaim just serializes
>   the operation instead of protecting data - the writeback checks and
>   the page table reference tests don't seem to need page lock.
> 
>   When the cgroup folks wrote the initial memory controller, they just
>   added their own page-scope lock to protect page->memcg even though
>   the page lock would have covered what it needed.
> 
> - shrink_page_list() uses page_mapping() in the first half of the
>   function to tell whether the page is anon or file, but halfway
>   through we do this:
> 
> 	  /* Adding to swap updated mapping */
>           mapping = page_mapping(page);
> 
>   and then use PageAnon() to disambiguate the page type.
> 
> - At activate_locked:, we check PG_swapcache directly on the page and
>   rely on it doing the right thing for anon, file, and shmem pages.
>   But this flag is PG_owner_priv_1 and actually used by the filesystem
>   for something else. I guess PG_checked pages currently don't make it
>   this far in reclaim, or we'd crash somewhere in try_to_free_swap().
> 
>   I suppose we're also never calling page_mapping() on PageChecked
>   filesystem pages right now, because it would return a swap mapping
>   before testing whether this is a file page. You know, because shmem.
> 
> These are just a few examples from an MM perspective. I'm sure the FS
> folks have their own stories and examples about pitfalls in dealing
> with struct page members.
> 
> We're so used to this that we don't realize how much bigger and
> pervasive this lack of typing is than the compound page thing.
> 
> I'm not saying the compound page mess isn't worth fixing. It is.
> 
> I'm saying if we started with a file page or cache entry abstraction
> we'd solve not only the huge page cache, but also set us up for a MUCH
> more comprehensive cleanup in MM code and MM/FS interaction that makes
> the tailpage cleanup pale in comparison. For the same amount of churn,
> since folio would also touch all of these places.
> 


  reply	other threads:[~2021-09-15 17:55 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-23 19:01 [GIT PULL] Memory folios for v5.15 Matthew Wilcox
2021-08-23 21:26 ` Johannes Weiner
2021-08-23 22:06   ` Linus Torvalds
2021-08-24  2:20     ` Matthew Wilcox
2021-08-24 13:04     ` Matthew Wilcox
2021-08-23 22:15   ` Matthew Wilcox
2021-08-24 18:32     ` Johannes Weiner
2021-08-24 18:59       ` Linus Torvalds
2021-08-25  6:39         ` Christoph Hellwig
2021-08-24 19:44       ` Matthew Wilcox
2021-08-25 15:13         ` Johannes Weiner
2021-08-26  0:45           ` Darrick J. Wong
2021-08-27 14:07             ` Johannes Weiner
2021-08-27 18:44               ` Matthew Wilcox
2021-08-27 21:41                 ` Dan Williams
2021-08-27 21:49                   ` Matthew Wilcox
2021-08-30 17:32                 ` Johannes Weiner
2021-08-30 18:22                   ` Matthew Wilcox
2021-08-30 20:27                     ` Johannes Weiner
2021-08-30 21:38                       ` Matthew Wilcox
2021-08-31 17:40                         ` Vlastimil Babka
2021-09-01 17:43                         ` Johannes Weiner
2021-09-02 15:13                           ` Zi Yan
2021-09-06 14:00                             ` Vlastimil Babka
2021-08-31 18:50                       ` Eric W. Biederman
2021-08-26  8:58         ` David Howells
2021-08-27 10:03           ` Johannes Weiner
2021-08-27 12:05             ` Matthew Wilcox
2021-08-27 10:49           ` David Howells
2021-08-24 15:54   ` David Howells
2021-08-24 17:56     ` Matthew Wilcox
2021-08-24 18:26       ` Linus Torvalds
2021-08-24 18:29         ` Linus Torvalds
2021-08-24 19:26           ` Theodore Ts'o
2021-08-24 19:34           ` David Howells
2021-08-24 20:02             ` Theodore Ts'o
2021-08-24 21:32             ` David Howells
2021-08-25 12:08               ` Jeff Layton
2021-08-24 19:01         ` Matthew Wilcox
2021-08-24 19:11           ` Linus Torvalds
2021-08-24 19:23             ` Matthew Wilcox
2021-08-24 19:44               ` Theodore Ts'o
2021-08-24 20:00                 ` Matthew Wilcox
2021-08-25  6:32                 ` Christoph Hellwig
2021-08-25  9:01                   ` Rasmus Villemoes
2021-08-26  6:32                     ` Amir Goldstein
2021-08-25 12:03                   ` Jeff Layton
2021-08-26  0:59                     ` Darrick J. Wong
2021-08-26  4:02                   ` Nicholas Piggin
2021-09-01 12:58                 ` Mike Rapoport
2021-08-24 19:35             ` David Howells
2021-08-24 20:35               ` Vlastimil Babka
2021-08-24 20:40                 ` Vlastimil Babka
2021-08-24 19:11         ` David Howells
2021-08-24 19:25           ` Linus Torvalds
2021-08-24 19:38             ` Linus Torvalds
2021-08-24 19:48               ` Linus Torvalds
2021-08-26 17:18                 ` Matthew Wilcox
2021-08-24 19:59             ` David Howells
2021-10-05 13:52   ` Matthew Wilcox
2021-10-05 17:29     ` Johannes Weiner
2021-10-05 17:32       ` David Hildenbrand
2021-10-05 18:30       ` Matthew Wilcox
2021-10-05 19:56         ` Jason Gunthorpe
2021-08-28  3:29 ` Matthew Wilcox
2021-09-09 12:43 ` Christoph Hellwig
2021-09-09 13:56   ` Vlastimil Babka
2021-09-09 18:16     ` Johannes Weiner
2021-09-09 18:44       ` Matthew Wilcox
2021-09-09 22:03         ` Johannes Weiner
2021-09-09 22:48           ` Matthew Wilcox
2021-09-09 19:17     ` John Hubbard
2021-09-09 19:23       ` Matthew Wilcox
2021-09-10 20:16 ` Folio discussion recap Kent Overstreet
2021-09-11  1:23   ` Kirill A. Shutemov
2021-09-13 11:32     ` Michal Hocko
2021-09-13 18:12       ` Johannes Weiner
2021-09-15 15:40   ` Johannes Weiner
2021-09-15 17:55     ` Damian Tometzki [this message]
2021-09-16  2:58     ` Darrick J. Wong
2021-09-16 16:54       ` Johannes Weiner
2021-09-17  5:24         ` Dave Chinner
2021-09-17  7:18           ` Christoph Hellwig
2021-09-17 16:31           ` Johannes Weiner
2021-09-17 20:57             ` Kirill A. Shutemov
2021-09-17 21:17               ` Kent Overstreet
2021-09-17 22:02                 ` Kirill A. Shutemov
2021-09-17 22:21                   ` Kent Overstreet
2021-09-17 23:15               ` Johannes Weiner
2021-09-20 10:03                 ` Kirill A. Shutemov
2021-09-17 21:13             ` Kent Overstreet
2021-09-17 22:25               ` Theodore Ts'o
2021-09-17 23:35                 ` Josef Bacik
2021-09-18  1:04             ` Dave Chinner
2021-09-18  4:51               ` Kent Overstreet
2021-09-20  1:04                 ` Dave Chinner
2021-09-16 21:58       ` David Howells
2021-09-20  2:17   ` Matthew Wilcox
2021-09-21 19:47     ` Johannes Weiner
2021-09-21 20:38       ` Matthew Wilcox
2021-09-21 21:11         ` Kent Overstreet
2021-09-21 21:22           ` Folios for 5.15 request - Was: re: Folio discussion recap - Kent Overstreet
2021-09-22 15:08             ` Johannes Weiner
2021-09-22 15:46               ` Kent Overstreet
2021-09-22 16:26                 ` Matthew Wilcox
2021-09-22 16:56                   ` Chris Mason
2021-09-22 19:54                     ` Matthew Wilcox
2021-09-22 20:15                       ` Kent Overstreet
2021-09-22 20:21                       ` Linus Torvalds
2021-09-23  5:42               ` Kent Overstreet
2021-09-23 18:00                 ` Johannes Weiner
2021-09-23 19:31                   ` Matthew Wilcox
2021-09-23 20:20                   ` Kent Overstreet
2021-10-16  3:28               ` Matthew Wilcox
2021-10-18 16:47                 ` Johannes Weiner
2021-10-18 18:12                   ` Kent Overstreet
2021-10-18 20:45                     ` Johannes Weiner
2021-10-19 16:11                       ` Splitting struct page into multiple types " Kent Overstreet
2021-10-19 17:06                         ` Gao Xiang
2021-10-19 17:34                           ` Matthew Wilcox
2021-10-19 17:54                             ` Gao Xiang
2021-10-20 17:46                               ` Kent Overstreet
2021-10-19 17:37                         ` Jason Gunthorpe
2021-10-19 21:14                       ` David Howells
2021-10-18 18:28                   ` Folios for 5.15 request " Matthew Wilcox
2021-10-18 21:56                     ` Johannes Weiner
2021-10-18 23:16                       ` Kirill A. Shutemov
2021-10-19 15:16                         ` Johannes Weiner
2021-10-20  3:19                           ` Matthew Wilcox
2021-10-20  7:50                           ` David Hildenbrand
2021-10-20 17:26                             ` Matthew Wilcox
2021-10-20 18:04                               ` David Hildenbrand
2021-10-21  6:51                                 ` Christoph Hellwig
2021-10-21  7:21                                   ` David Hildenbrand
2021-10-21 12:03                                     ` Kent Overstreet
2021-10-21 12:35                                       ` David Hildenbrand
2021-10-21 12:38                                         ` Christoph Hellwig
2021-10-21 13:00                                           ` David Hildenbrand
2021-10-21 12:41                                         ` Matthew Wilcox
2021-10-20 17:39                           ` Kent Overstreet
2021-10-21 21:37                             ` Johannes Weiner
2021-10-22  1:52                               ` Matthew Wilcox
2021-10-22  7:59                                 ` David Hildenbrand
2021-10-22 13:01                                   ` Matthew Wilcox
2021-10-22 14:40                                     ` David Hildenbrand
2021-10-23  2:22                                       ` Matthew Wilcox
2021-10-23  5:02                                         ` Christoph Hellwig
2021-10-23  9:58                                         ` David Hildenbrand
2021-10-23 16:00                                           ` Kent Overstreet
2021-10-23 21:41                                             ` Matthew Wilcox
2021-10-23 22:23                                               ` Kent Overstreet
2021-10-25 15:35                                 ` Johannes Weiner
2021-10-25 15:52                                   ` Matthew Wilcox
2021-10-25 16:05                                   ` Kent Overstreet
2021-10-16 19:07               ` Matthew Wilcox
2021-10-18 17:25                 ` Johannes Weiner
2021-09-21 22:18           ` Folio discussion recap Matthew Wilcox
2021-09-23  0:45             ` Ira Weiny
2021-09-23  3:41               ` Matthew Wilcox
2021-09-23 22:12                 ` Ira Weiny
2021-09-29 15:24                   ` Matthew Wilcox
2021-09-21 21:59         ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUIzdTyFBITDIPnj@fedora.tometzki.de \
    --to=dtometzki@fedoraproject.org \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox