linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/25] Page folios
Date: Thu, 17 Dec 2020 15:35:58 +0100	[thread overview]
Message-ID: <2afc35eb-4f32-4376-9c2d-b4c411f4cb76@redhat.com> (raw)
In-Reply-To: <20201217135517.GF15600@casper.infradead.org>

On 17.12.20 14:55, Matthew Wilcox wrote:
> On Thu, Dec 17, 2020 at 01:47:57PM +0100, David Hildenbrand wrote:
>> On 16.12.20 19:23, Matthew Wilcox (Oracle) wrote:
>>> One of the great things about compound pages is that when you try to
>>> do various operations on a tail page, it redirects to the head page and
>>> everything Just Works.  One of the awful things is how much we pay for
>>> that simplicity.  Here's an example, end_page_writeback():
>>>
>>>         if (PageReclaim(page)) {
>>>                 ClearPageReclaim(page);
>>>                 rotate_reclaimable_page(page);
>>>         }
>>>         get_page(page);
>>>         if (!test_clear_page_writeback(page))
>>>                 BUG();
>>>
>>>         smp_mb__after_atomic();
>>>         wake_up_page(page, PG_writeback);
>>>         put_page(page);
>>>
>>> That all looks very straightforward, but if you dive into the disassembly,
>>> you see that there are four calls to compound_head() in this function
>>> (PageReclaim(), ClearPageReclaim(), get_page() and put_page()).  It's
>>> all for nothing, because if anyone does call this routine with a tail
>>> page, wake_up_page() will VM_BUG_ON_PGFLAGS(PageTail(page), page).
>>>
>>> I'm not really a CPU person, but I imagine there's some kind of dependency
>>> here that sucks too:
>>>
>>>     1fd7:       48 8b 57 08             mov    0x8(%rdi),%rdx
>>>     1fdb:       48 8d 42 ff             lea    -0x1(%rdx),%rax
>>>     1fdf:       83 e2 01                and    $0x1,%edx
>>>     1fe2:       48 0f 44 c7             cmove  %rdi,%rax
>>>     1fe6:       f0 80 60 02 fb          lock andb $0xfb,0x2(%rax)
>>>
>>> Sure, it's going to be cache hot, but that cmove has to execute before
>>> the lock andb.
>>>
>>> I would like to introduce a new concept that I call a Page Folio.
>>> Or just struct folio to its friends.  Here it is,
>>> struct folio {
>>>         struct page page;
>>> };
>>>
>>> A folio is a struct page which is guaranteed not to be a tail page.
>>> So it's either a head page or a base (order-0) page.  That means
>>> we don't have to call compound_head() on it and we save massively.
>>> end_page_writeback() reduces from four calls to compound_head() to just
>>> one (at the beginning of the function) and it shrinks from 213 bytes
>>> to 126 bytes (using distro kernel config options).  I think even that one
>>> can be eliminated, but I'm going slowly at this point and taking the
>>> safe route of transforming a random struct page pointer into a struct
>>> folio pointer by calling page_folio().  By the end of this exercise,
>>> end_page_writeback() will become end_folio_writeback().
>>>
>>> This is going to be a ton of work, and massively disruptive.  It'll touch
>>> every filesystem, and a good few device drivers!  But I think it's worth
>>> it.  Not every routine benefits as much as end_page_writeback(), but it
>>> makes everything a little better.  At 29 bytes per call to lock_page(),
>>> unlock_page(), put_page() and get_page(), that's on the order of 60kB of
>>> text for allyesconfig.  More when you add on all the PageFoo() calls.
>>> With the small amount of work I've done here, mm/filemap.o shrinks its
>>> text segment by over a kilobyte from 33687 to 32318 bytes (and also 192
>>> bytes of data).
>>
>> Just wondering, as the primary motivation here is "minimizing CPU work",
>> did you run any benchmarks that revealed a visible performance improvement?
>>
>> Otherwise, we're left with a concept that's hard to grasp first (folio -
>> what?!) and "a ton of work, and massively disruptive", saving some kb of
>> code - which does not sound too appealing to me.
>>
>> (I like the idea of abstracting which pages are actually worth looking
>> at directly instead of going via a tail page - tail pages act somewhat
>> like a proxy for the head page when accessing flags)
> 
> My primary motivation here isn't minimising CPU work at all.  It's trying

Ah, okay, reading about disassembly gave me that impression.

> to document which interfaces are expected to operate on an entire
> compound page and which are expected to operate on a PAGE_SIZE page.
> Today, we have a horrible mishmash of
> 
>  - This is a head page, I shall operate on 2MB of data
>  - This is a tail page, I shall operate on 2MB of data
>  - This is not a head page, I shall operate on 4kB of data
>  - This is a head page, I shall operate on 4kB of data
>  - This is a head|tail page, I shall operate on the size of the compound page.
> 
> You might say "Well, why not lead with that?", but I don't know which
> advantages people are going to find most compelling.  Even if someone
> doesn't believe in the advantages of using folios in the page cache,
> looking at the assembler output is, I think, compelling.

Personally, I think the implicit documentation of which type of pages
functions expect is a clear advantage. Having less code is a nice cherry
on top.

-- 
Thanks,

David / dhildenb



      reply	other threads:[~2020-12-17 14:37 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-16 18:23 Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 02/25] mm: Add put_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 03/25] mm: Add get_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 04/25] mm: Create FolioFlags Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 05/25] mm: Add unlock_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 06/25] mm: Add lock_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 07/25] mm: Add lock_folio_killable Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 08/25] mm: Add __alloc_folio_node and alloc_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 09/25] mm: Convert __page_cache_alloc to return a folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 10/25] mm/filemap: Convert end_page_writeback to use " Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 11/25] mm: Convert mapping_get_entry to return " Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 12/25] mm: Add mark_folio_accessed Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 13/25] mm: Add filemap_get_folio and find_get_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 14/25] mm/filemap: Add folio_add_to_page_cache Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 15/25] mm/swap: Convert rotate_reclaimable_page to folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 16/25] mm: Add folio_mapping Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 17/25] mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 18/25] btrfs: Use readahead_batch_length Matthew Wilcox (Oracle)
2020-12-17  9:15   ` John Hubbard
2020-12-17 12:12     ` Matthew Wilcox
2020-12-17 13:42       ` Matthew Wilcox
2020-12-17 19:36         ` John Hubbard
2020-12-16 18:23 ` [PATCH 19/25] fs: Change page refcount rules for readahead Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 20/25] fs: Change readpage to take a folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 21/25] mm: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 22/25] mm: Add wait_on_folio_locked & wait_on_folio_locked_killable Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 23/25] mm: Add flush_dcache_folio Matthew Wilcox (Oracle)
2020-12-16 20:59   ` kernel test robot
2020-12-16 22:01     ` Matthew Wilcox
2020-12-16 18:23 ` [PATCH 24/25] mm: Add read_cache_folio and read_mapping_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 25/25] fs: Convert vfs_dedupe_file_range_compare to folios Matthew Wilcox (Oracle)
2020-12-17 12:47 ` [PATCH 00/25] Page folios David Hildenbrand
2020-12-17 13:55   ` Matthew Wilcox
2020-12-17 14:35     ` David Hildenbrand [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2afc35eb-4f32-4376-9c2d-b4c411f4cb76@redhat.com \
    --to=david@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox