From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Jan Kara <jack@suse.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Dave Hansen <dave.hansen@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Matthew Wilcox <willy@infradead.org>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-block@vger.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv5 20/36] truncate: make truncate_inode_pages_range() aware about huge pages
Date: Tue, 29 Nov 2016 14:22:48 +0300
Message-ID: <20161129112304.90056-21-kirill.shutemov@linux.intel.com>
In-Reply-To: <20161129112304.90056-1-kirill.shutemov@linux.intel.com>
As with shmem_undo_range(), truncate_inode_pages_range() removes a huge
page if it is fully within the range.
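To make the "fully within the range" test concrete, here is a minimal userspace sketch of the subpage arithmetic used by the PageTransHuge() branches in the diff below. It assumes x86-64 sizes (512 subpages per PMD-sized THP), and the helper name thp_truncate_span() is hypothetical; this is an illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: userspace stand-ins for kernel names, x86-64 sizes
 * (4KB base pages, 2MB THP = 512 subpages). */
#define HPAGE_PMD_NR 512UL
typedef unsigned long pgoff_t;

/*
 * Hypothetical helper: for a THP whose head page sits at page-cache index
 * 'head', compute the inclusive subpage range [*first, *last] covered by
 * the truncation range [start, end). Returns true when the whole THP is
 * covered (it can be dropped), false when only part must be zeroed.
 */
static bool thp_truncate_span(pgoff_t head, pgoff_t start, pgoff_t end,
			      unsigned long *first, unsigned long *last)
{
	assert(head % HPAGE_PMD_NR == 0);	/* THPs are naturally aligned */
	assert(start < head + HPAGE_PMD_NR && end > head); /* ranges overlap */

	*first = 0;
	*last = HPAGE_PMD_NR - 1;
	if (start > head)			/* range starts inside the THP */
		*first = start & (HPAGE_PMD_NR - 1);
	if (end < head + HPAGE_PMD_NR)		/* range ends inside the THP */
		*last = (end - 1) & (HPAGE_PMD_NR - 1);

	return *first == 0 && *last == HPAGE_PMD_NR - 1;
}
```

The end check is written here as `end < head + HPAGE_PMD_NR`; for a naturally aligned THP that overlaps the range, this is equivalent to the patch's `index == round_down(end, HPAGE_PMD_NR)` test.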
A partial truncate of a huge page zeroes out the truncated part of the THP.
Unlike with shmem, this doesn't prevent us from having holes in the middle
of a huge page: we can still skip writeback of untouched buffers.
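Untouched buffers survive because only the truncated byte range is invalidated. The following sketch, again assuming x86-64 sizes and using hypothetical names (struct inval_range, subpage_span_to_bytes(), partial_to_bytes()), shows how the patch turns subpage indices into the offset and length passed to do_invalidatepage(), relative to the head page.

```c
#include <assert.h>

/* Assumption: 4KB base pages and 512-subpage (2MB) THPs, as on x86-64. */
#define PAGE_SIZE	4096UL
#define HPAGE_PMD_NR	512UL
#define HPAGE_SIZE	(HPAGE_PMD_NR * PAGE_SIZE)

/* Hypothetical: byte range handed to do_invalidatepage(), head-relative. */
struct inval_range {
	unsigned long off;
	unsigned long len;
};

/* Whole subpages [first, last] of a partially truncated THP. */
static struct inval_range subpage_span_to_bytes(unsigned long first,
						unsigned long last)
{
	struct inval_range r;

	assert(first <= last && last < HPAGE_PMD_NR);
	r.off = first * PAGE_SIZE;
	r.len = (last + 1) * PAGE_SIZE - r.off;
	return r;
}

/*
 * partial_start case: 'sub' is the subpage's position inside the THP
 * (page - compound_head(page) in the patch), and [partial_start, top)
 * is the byte range to drop within that 4KB subpage.
 */
static struct inval_range partial_to_bytes(unsigned long sub,
					   unsigned long partial_start,
					   unsigned long top)
{
	struct inval_range r;

	assert(sub < HPAGE_PMD_NR && partial_start < top && top <= PAGE_SIZE);
	r.off = sub * PAGE_SIZE + partial_start;
	r.len = top - partial_start;
	return r;
}
```

Both results end at or before hpage_size(), which is why the patch relaxes the overflow check in block_invalidatepage() from PAGE_SIZE to hpage_size(page). The partial_end case is analogous, with off = sub * PAGE_SIZE and len = partial_end.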
With memory-mapped I/O we would lose holes in some cases when we have a
THP in the page cache, since we cannot track access at the 4k level in
this case.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
fs/buffer.c | 2 +-
include/linux/mm.h | 9 +++++-
mm/truncate.c | 86 ++++++++++++++++++++++++++++++++++++++++++++----------
3 files changed, 80 insertions(+), 17 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 8e000021513c..24daf7b9bdb0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1534,7 +1534,7 @@ void block_invalidatepage(struct page *page, unsigned int offset,
/*
* Check for overflow
*/
- BUG_ON(stop > PAGE_SIZE || stop < length);
+ BUG_ON(stop > hpage_size(page) || stop < length);
head = page_buffers(page);
bh = head;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 582844ca0b23..59e74dc57359 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1328,8 +1328,15 @@ int get_kernel_page(unsigned long start, int write, struct page **pages);
struct page *get_dump_page(unsigned long addr);
extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
-extern void do_invalidatepage(struct page *page, unsigned int offset,
+extern void __do_invalidatepage(struct page *page, unsigned int offset,
unsigned int length);
+static inline void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
+{
+ if (page_has_private(page))
+ __do_invalidatepage(page, offset, length);
+}
+
int __set_page_dirty_nobuffers(struct page *page);
int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index eb3a3a45feb6..d2d95f283ec3 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -70,12 +70,12 @@ static void clear_exceptional_entry(struct address_space *mapping,
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void do_invalidatepage(struct page *page, unsigned int offset,
+void __do_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
void (*invalidatepage)(struct page *, unsigned int, unsigned int);
- invalidatepage = page->mapping->a_ops->invalidatepage;
+ invalidatepage = page_mapping(page)->a_ops->invalidatepage;
#ifdef CONFIG_BLOCK
if (!invalidatepage)
invalidatepage = block_invalidatepage;
@@ -100,8 +100,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
if (page->mapping != mapping)
return -EIO;
- if (page_has_private(page))
- do_invalidatepage(page, 0, PAGE_SIZE);
+ do_invalidatepage(page, 0, hpage_size(page));
/*
* Some filesystems seem to re-dirty the page even after
@@ -273,13 +272,35 @@ void truncate_inode_pages_range(struct address_space *mapping,
unlock_page(page);
continue;
}
+
+ if (PageTransHuge(page)) {
+ int j, first = 0, last = HPAGE_PMD_NR - 1;
+
+ if (start > page->index)
+ first = start & (HPAGE_PMD_NR - 1);
+ if (index == round_down(end, HPAGE_PMD_NR))
+ last = (end - 1) & (HPAGE_PMD_NR - 1);
+
+ /* Range starts or ends in the middle of THP */
+ if (first != 0 || last != HPAGE_PMD_NR - 1) {
+ int off, len;
+ for (j = first; j <= last; j++)
+ clear_highpage(page + j);
+ off = first * PAGE_SIZE;
+ len = (last + 1) * PAGE_SIZE - off;
+ do_invalidatepage(page, off, len);
+ unlock_page(page);
+ continue;
+ }
+ }
+
truncate_inode_page(mapping, page);
unlock_page(page);
}
pagevec_remove_exceptionals(&pvec);
+ index += pvec.nr ? hpage_nr_pages(pvec.pages[pvec.nr - 1]) : 1;
pagevec_release(&pvec);
cond_resched();
- index++;
}
if (partial_start) {
@@ -294,9 +315,12 @@ void truncate_inode_pages_range(struct address_space *mapping,
wait_on_page_writeback(page);
zero_user_segment(page, partial_start, top);
cleancache_invalidate_page(mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, partial_start,
- top - partial_start);
+ if (page_has_private(page)) {
+ int off = page - compound_head(page);
+ do_invalidatepage(compound_head(page),
+ off * PAGE_SIZE + partial_start,
+ top - partial_start);
+ }
unlock_page(page);
put_page(page);
}
@@ -307,9 +331,12 @@ void truncate_inode_pages_range(struct address_space *mapping,
wait_on_page_writeback(page);
zero_user_segment(page, 0, partial_end);
cleancache_invalidate_page(mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, 0,
- partial_end);
+ if (page_has_private(page)) {
+ int off = page - compound_head(page);
+ do_invalidatepage(compound_head(page),
+ off * PAGE_SIZE,
+ partial_end);
+ }
unlock_page(page);
put_page(page);
}
@@ -323,7 +350,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
index = start;
for ( ; ; ) {
- cond_resched();
+restart: cond_resched();
if (!pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
/* If all gone from start onwards, we're done */
@@ -346,8 +373,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
index = indices[i];
if (index >= end) {
/* Restart punch to make sure all gone */
- index = start - 1;
- break;
+ index = start;
+ goto restart;
}
if (radix_tree_exceptional_entry(page)) {
@@ -358,12 +385,41 @@ void truncate_inode_pages_range(struct address_space *mapping,
lock_page(page);
WARN_ON(page_to_index(page) != index);
wait_on_page_writeback(page);
+
+ if (PageTransHuge(page)) {
+ int j, first = 0, last = HPAGE_PMD_NR - 1;
+
+ if (start > page->index)
+ first = start & (HPAGE_PMD_NR - 1);
+ if (index == round_down(end, HPAGE_PMD_NR))
+ last = (end - 1) & (HPAGE_PMD_NR - 1);
+
+ /*
+ * On partial THP truncate due to 'start' in the
+ * middle of the THP: no need to look at these
+ * pages again on the !pvec.nr restart.
+ */
+ start = page->index + HPAGE_PMD_NR;
+
+ /* Range starts or ends in the middle of THP */
+ if (first != 0 || last != HPAGE_PMD_NR - 1) {
+ int off, len;
+ for (j = first; j <= last; j++)
+ clear_highpage(page + j);
+ off = first * PAGE_SIZE;
+ len = (last + 1) * PAGE_SIZE - off;
+ do_invalidatepage(page, off, len);
+ unlock_page(page);
+ continue;
+ }
+ }
+
truncate_inode_page(mapping, page);
unlock_page(page);
}
pagevec_remove_exceptionals(&pvec);
+ index += pvec.nr ? hpage_nr_pages(pvec.pages[pvec.nr - 1]) : 1;
pagevec_release(&pvec);
- index++;
}
cleancache_invalidate_inode(mapping);
}
--
2.10.2
Thread overview: 40+ messages
2016-11-29 11:22 [PATCHv5 00/36] ext4: support of " Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 01/36] mm, shmem: swich huge tmpfs to multi-order radix-tree entries Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 02/36] Revert "radix-tree: implement radix_tree_maybe_preload_order()" Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 03/36] page-flags: relax page flag policy for few flags Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 04/36] mm, rmap: account file thp pages Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 05/36] thp: try to free page's buffers before attempt split Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 06/36] thp: handle write-protection faults for file THP Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 07/36] filemap: allocate huge page in page_cache_read(), if allowed Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 08/36] filemap: handle huge pages in do_generic_file_read() Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 09/36] filemap: allocate huge page in pagecache_get_page(), if allowed Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 10/36] filemap: handle huge pages in filemap_fdatawait_range() Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 11/36] HACK: readahead: alloc huge pages, if allowed Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 12/36] brd: make it handle huge pages Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 13/36] mm: make write_cache_pages() work on " Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 14/36] thp: introduce hpage_size() and hpage_mask() Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 15/36] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask} Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 16/36] thp: make thp_get_unmapped_area() respect S_HUGE_MODE Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 17/36] fs: make block_read_full_page() be able to read huge page Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 18/36] fs: make block_write_{begin,end}() be able to handle huge pages Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 19/36] fs: make block_page_mkwrite() aware about " Kirill A. Shutemov
2016-11-29 11:22 ` Kirill A. Shutemov [this message]
2016-11-29 11:22 ` [PATCHv5 21/36] truncate: make invalidate_inode_pages2_range() " Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 22/36] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries Kirill A. Shutemov
2016-11-30 9:48 ` Hillf Danton
2016-11-30 13:15 ` Kirill A. Shutemov
2016-12-01 3:10 ` Hillf Danton
2016-11-29 11:22 ` [PATCHv5 23/36] mm: account huge pages to dirty, writaback, reclaimable, etc Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 24/36] ext4: make ext4_mpage_readpages() hugepage-aware Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 25/36] ext4: make ext4_writepage() work on huge pages Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 26/36] ext4: handle huge pages in ext4_page_mkwrite() Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 27/36] ext4: handle huge pages in __ext4_block_zero_page_range() Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 28/36] ext4: make ext4_block_write_begin() aware about huge pages Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 29/36] ext4: handle huge pages in ext4_da_write_end() Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 30/36] ext4: make ext4_da_page_release_reservation() aware about huge pages Kirill A. Shutemov
2016-11-29 11:22 ` [PATCHv5 31/36] ext4: handle writeback with " Kirill A. Shutemov
2016-11-29 11:23 ` [PATCHv5 32/36] ext4: make EXT4_IOC_MOVE_EXT work " Kirill A. Shutemov
2016-11-29 11:23 ` [PATCHv5 33/36] ext4: fix SEEK_DATA/SEEK_HOLE for " Kirill A. Shutemov
2016-11-29 11:23 ` [PATCHv5 34/36] ext4: make fallocate() operations work with " Kirill A. Shutemov
2016-11-29 11:23 ` [PATCHv5 35/36] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff() Kirill A. Shutemov
2016-11-29 11:23 ` [PATCHv5 36/36] ext4, vfs: add huge= mount option Kirill A. Shutemov