From: Dave Hansen <dave@sr71.net>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines
Date: Fri, 22 Mar 2013 08:22:34 -0700
Message-ID: <514C773A.6070000@sr71.net>
In-Reply-To: <1363283435-7666-15-git-send-email-kirill.shutemov@linux.intel.com>
On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> For now we still write/read at most PAGE_CACHE_SIZE bytes at a time.
>
> This implementation doesn't cover address spaces with backing store.
...
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1165,12 +1165,23 @@ find_page:
> if (unlikely(page == NULL))
> goto no_cached_page;
> }
> + if (PageTransTail(page)) {
> + page_cache_release(page);
> + page = find_get_page(mapping,
> + index & ~HPAGE_CACHE_INDEX_MASK);
> + if (!PageTransHuge(page)) {
> + page_cache_release(page);
> + goto find_page;
> + }
> + }
So, we're going to do a read of a file, and we pulled a tail page out of
the page cache. Why can't we just deal with the tail page directly?
What prevents this?
Is there something special about THP pages that keeps the head page in
the page cache after the tail has been released? I'd normally be
worried that the find_get_page() here might fail: it would return NULL,
and the PageTransHuge(page) check right after it would dereference that.
It's probably also worth a quick comment like:
/* can't deal with tail pages directly, move to head page */
otherwise the reassignment of "page" starts to seem a bit odd.
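To make the find_get_page() concern concrete, here is a minimal userspace sketch of the tail-to-head lookup with a NULL check ahead of the flag test. It assumes x86-64 sizes (512 small pages per huge page, so HPAGE_CACHE_INDEX_MASK is 511). struct toy_page, toy_find_get_page() and toy_lookup_for_read() are hypothetical stand-ins for struct page, find_get_page() and the patch's inline logic, and reference counting (page_cache_release()) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed x86-64 values: 4 KiB pages, 2 MiB huge pages => 512 subpages. */
#define HPAGE_CACHE_NR         512UL
#define HPAGE_CACHE_INDEX_MASK (HPAGE_CACHE_NR - 1)

/* Toy stand-in for struct page: only the bits this example needs. */
struct toy_page {
	bool present;   /* is anything cached at this index? */
	bool huge_head; /* PageTransHuge() analogue */
	bool huge_tail; /* PageTransTail() analogue */
};

/* Toy stand-in for find_get_page(): NULL if nothing is cached at index. */
static struct toy_page *toy_find_get_page(struct toy_page *cache,
					  size_t nr, unsigned long index)
{
	if (index >= nr || !cache[index].present)
		return NULL;
	return &cache[index];
}

/*
 * Hypothetical helper: return the page a read of "index" should use.
 * Tail pages are not used directly; we move to the head page. The head
 * lookup can fail, so test for NULL before looking at any flags.
 * NULL means "not cached" or "raced, retry" (the patch's goto find_page).
 */
static struct toy_page *toy_lookup_for_read(struct toy_page *cache,
					    size_t nr, unsigned long index)
{
	struct toy_page *page = toy_find_get_page(cache, nr, index);

	if (page && page->huge_tail) {
		/* can't deal with tail pages directly, move to head page */
		page = toy_find_get_page(cache, nr,
					 index & ~HPAGE_CACHE_INDEX_MASK);
		if (!page || !page->huge_head)
			return NULL;
	}
	return page;
}
```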
> if (PageReadahead(page)) {
> + BUG_ON(PageTransHuge(page));
> page_cache_async_readahead(mapping,
> ra, filp, page,
> index, last_index - index);
> }
Is this because we only do readahead for fs's with backing stores?
Could we have a comment to this effect?
> if (!PageUptodate(page)) {
> + BUG_ON(PageTransHuge(page));
> if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
> !mapping->a_ops->is_partially_uptodate)
> goto page_not_up_to_date;
Same question. :)
Since your two-line description covers two topics, it's not immediately
obvious which one this BUG_ON() applies to.
> @@ -1212,18 +1223,25 @@ page_ok:
> }
> nr = nr - offset;
>
> + /* Recalculate offset in page if we've got a huge page */
> + if (PageTransHuge(page)) {
> + offset = (((loff_t)index << PAGE_CACHE_SHIFT) + offset);
> + offset &= ~HPAGE_PMD_MASK;
> + }
Does this need to be done in cases other than the path that goes through
"if (PageTransTail(page))" above? If not, I'd probably move this code
up next to the other part.
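A quick worked version of the recalculation, as a userspace sketch assuming 4 KiB small pages and 2 MiB PMD-sized huge pages (huge_offset() is a hypothetical name). If the head page sits at the huge-page-aligned index, as the find_get_page() call above implies, then an aligned index leaves offset unchanged, so the code only does real work when index pointed into a tail.

```c
#include <assert.h>

/* Assumed x86-64 values. */
#define PAGE_CACHE_SHIFT 12
#define HPAGE_PMD_SIZE   (1UL << 21)
#define HPAGE_PMD_MASK   (~(HPAGE_PMD_SIZE - 1))

/*
 * Mirrors the patch: turn (small-page index, offset in that small page)
 * into an offset inside the 2 MiB huge page that covers it.
 */
static unsigned long huge_offset(unsigned long index, unsigned long offset)
{
	unsigned long long pos;

	/* On entry, offset is within a single small page. */
	assert(offset < (1UL << PAGE_CACHE_SHIFT));
	pos = ((unsigned long long)index << PAGE_CACHE_SHIFT) + offset;
	return (unsigned long)(pos & ~HPAGE_PMD_MASK);
}
```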
> /* If users can be writing to this page using arbitrary
> * virtual addresses, take care about potential aliasing
> * before reading the page on the kernel side.
> */
> if (mapping_writably_mapped(mapping))
> - flush_dcache_page(page);
> + flush_dcache_page(page + (offset >> PAGE_CACHE_SHIFT));
This is another case where I think adding another local variable would
help the code self-document. The way it stands, it's fairly subtle how
(offset >> PAGE_CACHE_SHIFT) works and that it's only nonzero when we're
dealing with a THP.
int tail_page_index = (offset >> PAGE_CACHE_SHIFT);
...
> + flush_dcache_page(page + tail_page_index);
This makes it obvious that we're indexing off something, *and* that it's
only going to be relevant when dealing with tail pages.
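The proposed local variable in isolation, as a sketch assuming 4 KiB pages (tail_page_index() is a hypothetical helper, not kernel API). Whenever offset is below PAGE_CACHE_SIZE, as it always is for a regular page, the result is 0 and page + tail_page_index is just page.

```c
#include <assert.h>

/* Assumed 4 KiB small pages. */
#define PAGE_CACHE_SHIFT 12

/* Which 4 KiB subpage of a (possibly huge) page does this offset land in? */
static unsigned long tail_page_index(unsigned long offset)
{
	return offset >> PAGE_CACHE_SHIFT;
}
```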
> /*
> * When a sequential read accesses a page several times,
> * only mark it as accessed the first time.
> */
> - if (prev_index != index || offset != prev_offset)
> + if (prev_index != index ||
> + (offset & ~PAGE_CACHE_MASK) != prev_offset)
> mark_page_accessed(page);
> prev_index = index;
>
> @@ -1238,8 +1256,9 @@ page_ok:
> * "pos" here (the actor routine has to update the user buffer
> * pointers and the remaining count).
> */
> - ret = file_read_actor(desc, page, offset, nr);
> - offset += ret;
> + ret = file_read_actor(desc, page + (offset >> PAGE_CACHE_SHIFT),
> + offset & ~PAGE_CACHE_MASK, nr);
> + offset = (offset & ~PAGE_CACHE_MASK) + ret;
^^ There's an extra space in that last line.
> index += offset >> PAGE_CACHE_SHIFT;
> offset &= ~PAGE_CACHE_MASK;
> prev_offset = offset;
> @@ -2440,8 +2459,13 @@ again:
> if (mapping_writably_mapped(mapping))
> flush_dcache_page(page);
>
> + if (PageTransHuge(page))
> + offset = pos & ~HPAGE_PMD_MASK;
> +
> pagefault_disable();
> - copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
> + copied = iov_iter_copy_from_user_atomic(
> + page + (offset >> PAGE_CACHE_SHIFT),
> + i, offset & ~PAGE_CACHE_MASK, bytes);
> pagefault_enable();
> flush_dcache_page(page);
>
> @@ -2464,6 +2488,7 @@ again:
> * because not all segments in the iov can be copied at
> * once without a pagefault.
> */
> + offset = pos & ~PAGE_CACHE_MASK;
> bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
> iov_iter_single_seg_count(i));
> goto again;
>
I think the difficulty in this function is that you're now dealing with
two 'struct page's, two offsets, and two indexes. It isn't blindingly
obvious which one should be used in a given situation.
The way you've done it here might just be the best way. I'd *really*
encourage you to test this exhaustively and make sure you hit all the
different paths in that function. I suspect there is still a bug or two
in there outside the diff context.
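To lay out the two offsets and the subpage index side by side, here is a small userspace model of the write-path arithmetic, assuming 4 KiB pages and 2 MiB huge pages; struct write_target and compute_write_target() are hypothetical. The assert() records an invariant the patch relies on: since a huge page is a whole number of small pages, the in-subpage offset derived from the huge-page offset equals pos & ~PAGE_CACHE_MASK, which is what the retry path recomputes before clamping bytes.

```c
#include <assert.h>

/* Assumed x86-64 values. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)
#define PAGE_CACHE_MASK  (~(PAGE_CACHE_SIZE - 1))
#define HPAGE_PMD_SIZE   (1UL << 21)
#define HPAGE_PMD_MASK   (~(HPAGE_PMD_SIZE - 1))

/* The offsets and subpage index the write path juggles, named. */
struct write_target {
	unsigned long huge_off;  /* pos & ~HPAGE_PMD_MASK */
	unsigned long subpage;   /* huge_off >> PAGE_CACHE_SHIFT */
	unsigned long small_off; /* offset inside that 4 KiB subpage */
	unsigned long bytes;     /* copy length, clamped to the subpage end */
};

static struct write_target compute_write_target(unsigned long long pos,
						 unsigned long want)
{
	struct write_target t;
	unsigned long room;

	t.huge_off = (unsigned long)(pos & ~HPAGE_PMD_MASK);
	t.subpage = t.huge_off >> PAGE_CACHE_SHIFT;
	t.small_off = t.huge_off & ~PAGE_CACHE_MASK;
	/* 2 MiB is a multiple of 4 KiB, so both derivations agree. */
	assert(t.small_off == (unsigned long)(pos & ~PAGE_CACHE_MASK));
	/* Never copy past the end of the current 4 KiB subpage. */
	room = PAGE_CACHE_SIZE - t.small_off;
	t.bytes = want < room ? want : room;
	return t;
}
```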
Would it be sane to have a set of variables like:
struct page *thp_tail_page = page + (offset >> PAGE_CACHE_SHIFT);
instead of just open-coding the masks and shifts every time?
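A sketch of that idea, assuming 4 KiB pages and modeling the huge page as an array of 512 toy structs; struct thp_cursor and make_thp_cursor() are hypothetical names, not existing kernel API. Computing the tail page and its offset once keeps the shift and mask in one place instead of repeating them at each call site.

```c
#include <assert.h>

/* Assumed 4 KiB small pages. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)
#define PAGE_CACHE_MASK  (~(PAGE_CACHE_SIZE - 1))

/* Toy stand-in for struct page; a huge page is an array of these. */
struct toy_page {
	unsigned long pfn;
};

/* Hypothetical bundle of "which subpage, and where inside it". */
struct thp_cursor {
	struct toy_page *thp_tail_page; /* page + (offset >> PAGE_CACHE_SHIFT) */
	unsigned long tail_offset;      /* offset & ~PAGE_CACHE_MASK */
};

static struct thp_cursor make_thp_cursor(struct toy_page *page,
					 unsigned long offset)
{
	struct thp_cursor c = {
		.thp_tail_page = page + (offset >> PAGE_CACHE_SHIFT),
		.tail_offset = offset & ~PAGE_CACHE_MASK,
	};
	return c;
}
```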