From: Dave Hansen <dave@sr71.net>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Hugh Dickins <hughd@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Hillf Danton <dhillf@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
Date: Thu, 21 Mar 2013 10:11:00 -0700
Message-ID: <514B3F24.3070006@sr71.net>
In-Reply-To: <1363283435-7666-9-git-send-email-kirill.shutemov@linux.intel.com>
On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> For huge page we add to radix tree HPAGE_CACHE_NR pages at once: head
> page for the specified index and HPAGE_CACHE_NR-1 tail pages for
> following indexes.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> mm/filemap.c | 76 ++++++++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2d99191..6bac9e2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> pgoff_t offset, gfp_t gfp_mask)
> {
> int error;
> + int nr = 1;
>
> VM_BUG_ON(!PageLocked(page));
> VM_BUG_ON(PageSwapBacked(page));
> @@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> error = mem_cgroup_cache_charge(page, current->mm,
> gfp_mask & GFP_RECLAIM_MASK);
> if (error)
> - goto out;
> + return error;
>
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> - if (error == 0) {
> - page_cache_get(page);
> - page->mapping = mapping;
> - page->index = offset;
> + if (PageTransHuge(page)) {
> + BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
> + nr = HPAGE_CACHE_NR;
> + }
That seems like a slightly odd place to put a BUILD_BUG_ON(). I guess
it doesn't matter to some degree, but does putting it inside the if()
imply anything?
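
For reference, a minimal userspace sketch of why the placement makes no
functional difference: BUILD_BUG_ON() is checked at compile time, so it
fires (or not) regardless of whether the enclosing branch ever runs. The
macro below is a C11 stand-in for the kernel one, the two constants are
illustrative values rather than the ones from this series, and
nr_to_preload() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's BUILD_BUG_ON(); assumes C11. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), #cond)

/* Illustrative values only (assumption), not taken from the patch. */
#define HPAGE_CACHE_NR        512
#define RADIX_TREE_PRELOAD_NR 512

/* Hypothetical helper: how many radix tree slots to preload. */
static int nr_to_preload(bool huge)
{
	int nr = 1;

	/* Compile-time check: evaluated even if 'huge' is never true. */
	BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);

	if (huge) {
		/* A run-time check, by contrast, only fires when this runs. */
		assert(HPAGE_CACHE_NR <= RADIX_TREE_PRELOAD_NR);
		nr = HPAGE_CACHE_NR;
	}
	return nr;
}
```

So keeping it inside the if() is mostly a hint to the reader about which
case the constraint belongs to.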
> + error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
> + if (error) {
> + mem_cgroup_uncharge_cache_page(page);
> + return error;
> + }
>
> - spin_lock_irq(&mapping->tree_lock);
> - error = radix_tree_insert(&mapping->page_tree, offset, page);
> - if (likely(!error)) {
> - mapping->nrpages++;
> - __inc_zone_page_state(page, NR_FILE_PAGES);
> - spin_unlock_irq(&mapping->tree_lock);
> - trace_mm_filemap_add_to_page_cache(page);
> - } else {
> - page->mapping = NULL;
> - /* Leave page->index set: truncation relies upon it */
> - spin_unlock_irq(&mapping->tree_lock);
> - mem_cgroup_uncharge_cache_page(page);
> - page_cache_release(page);
I do really like how this rewrite de-indents this code. :)
> + page_cache_get(page);
> + spin_lock_irq(&mapping->tree_lock);
> + page->mapping = mapping;
> + page->index = offset;
> + error = radix_tree_insert(&mapping->page_tree, offset, page);
> + if (unlikely(error))
> + goto err;
> + if (PageTransHuge(page)) {
> + int i;
> + for (i = 1; i < HPAGE_CACHE_NR; i++) {
> + page_cache_get(page + i);
> + page[i].index = offset + i;
Is it OK to leave page->mapping unset for these?
> + error = radix_tree_insert(&mapping->page_tree,
> + offset + i, page + i);
> + if (error) {
> + page_cache_release(page + i);
> + break;
> + }
> }
Throughout all this new code, I'd really challenge you to try as much as
possible to minimize the code stuck under "if (PageTransHuge(page))".
For instance, could you change the for() loop a bit and have it shared
between both cases, like:
> + for (i = 0; i < nr; i++) {
> + page_cache_get(page + i);
> + page[i].index = offset + i;
> + error = radix_tree_insert(&mapping->page_tree,
> + offset + i, page + i);
> + if (error) {
> + page_cache_release(page + i);
> + break;
> + }
> }
> - radix_tree_preload_end();
> - } else
> - mem_cgroup_uncharge_cache_page(page);
> -out:
> + if (error) {
> + error = ENOSPC; /* no space for a huge page */
> + for (i--; i > 0; i--) {
> + radix_tree_delete(&mapping->page_tree,
> + offset + i);
> + page_cache_release(page + i);
> + }
> + radix_tree_delete(&mapping->page_tree, offset);
I wonder if this would look any nicer if you just did all the
page_cache_get()s for the entire huge page along with the head page, and
then released them all in one place. I think it might shrink the error
handling paths here.
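
A rough userspace sketch of that idea, under stated assumptions: the
fixed-size slots[] array stands in for mapping->page_tree, struct
toy_page for struct page, and toy_insert()/toy_add_pages() are
hypothetical names. It is not the kernel code, only the shape of the
control flow: take every reference first, insert in one shared loop, and
on failure undo the inserts and drop all references in a single place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 16			/* toy tree capacity (assumption) */

struct toy_page { int refcount; };	/* stand-in for struct page */

static struct toy_page *slots[NR_SLOTS];	/* stand-in for the radix tree */

/* Hypothetical insert: fails if out of range or the slot is occupied. */
static int toy_insert(size_t index, struct toy_page *page)
{
	if (index >= NR_SLOTS)
		return -ENOMEM;
	if (slots[index])
		return -EEXIST;
	slots[index] = page;
	return 0;
}

/* Add nr contiguous pages at offset; all-or-nothing. */
static int toy_add_pages(struct toy_page *page, size_t offset, int nr)
{
	int i, error = 0;

	/* Take every reference up front, head and tails alike. */
	for (i = 0; i < nr; i++)
		page[i].refcount++;

	for (i = 0; i < nr; i++) {
		error = toy_insert(offset + i, &page[i]);
		if (error)
			break;
	}
	if (!error)
		return 0;

	/* Undo only the inserts that succeeded (indexes below i)... */
	while (i-- > 0)
		slots[offset + i] = NULL;
	/* ...then drop all references in one place. */
	for (i = 0; i < nr; i++)
		page[i].refcount--;
	return error;
}
```

With nr == 1 the same path covers the small-page case, so nothing here
needs to be special-cased for huge pages.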
> + goto err;
> + }
> + }
> + __mod_zone_page_state(page_zone(page), NR_FILE_PAGES, nr);
> + mapping->nrpages += nr;
> + spin_unlock_irq(&mapping->tree_lock);
> + trace_mm_filemap_add_to_page_cache(page);
Do we need to change the tracing to make sure it notes that these were
or weren't huge pages?
> + radix_tree_preload_end();
> + return 0;
> +err:
> + page->mapping = NULL;
> + /* Leave page->index set: truncation relies upon it */
> + spin_unlock_irq(&mapping->tree_lock);
> + radix_tree_preload_end();
> + mem_cgroup_uncharge_cache_page(page);
> + page_cache_release(page);
> return error;
> }
> EXPORT_SYMBOL(add_to_page_cache_locked);
Does the cgroup code know how to handle these large pages internally
somehow? It looks like the charge/uncharge is only being done for the
head page.