linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@dilger.ca>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Jan Kara <jack@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-block@vger.kernel.org
Subject: Re: [PATCHv2, 00/41] ext4: support of huge pages
Date: Sun, 14 Aug 2016 01:20:12 -0600	[thread overview]
Message-ID: <638E01BE-FD45-465C-8464-2E2D96ED6787@dilger.ca> (raw)
In-Reply-To: <1471027104-115213-1-git-send-email-kirill.shutemov@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 2963 bytes --]

On Aug 12, 2016, at 12:37 PM, Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> 
> Here's stabilized version of my patchset which intended to bring huge pages
> to ext4.
> 
> The basics are the same as with tmpfs[1] which is in Linus' tree now and
> ext4 built on top of it. The main difference is that we need to handle
> read out from and write-back to backing storage.
> 
> Head page links buffers for whole huge page. Dirty/writeback tracking
> happens on per-hugepage level.
> 
> We read out whole huge page at once. It required bumping BIO_MAX_PAGES to
> not less than HPAGE_PMD_NR. I defined BIO_MAX_PAGES to HPAGE_PMD_NR if
> huge pagecache enabled.
> 
> On split_huge_page() we need to free buffers before splitting the page.
> Page buffers takes additional pin on the page and can be a vector to mess
> with the page during split. We want to avoid this.
> If try_to_free_buffers() fails, split_huge_page() would return -EBUSY.
> 
> Readahead doesn't play with huge pages well: 128k max readahead window,
> assumption on page size, PageReadahead() to track hit/miss.  I've got it
> to allocate huge pages, but it doesn't provide any readahead as such.
> I don't know how to do this right. It's not clear at this point if we
> really need readahead with huge pages. I guess it's good enough for now.

Typically read-ahead is a loss if you are able to get large allocations on
disk, since you can get at least seek_rate * chunk_size throughput from the
disks even with random IO at that size.  With 1MB allocations and 7200 RPM drives this works out to be about 150MB/s, which is close to the throughput
of these drive already.

Cheers, Andreas

> Shadow entries ignored on allocation -- recently evicted page is not
> promoted to active list. Not sure if current workingset logic is adequate
> for huge pages. On eviction, we split the huge page and setup 4k shadow
> entries as usual.
> 
> Unlike tmpfs, ext4 makes use of tags in radix-tree. The approach I used
> for tmpfs -- 512 entries in radix-tree per-hugepages -- doesn't work well
> if we want to have coherent view on tags. So the first 8 patches of the
> patchset converts tmpfs to use multi-order entries in radix-tree.
> The same infrastructure used for ext4.
> 
> Encryption doesn't handle huge pages yet. To avoid regressions we just
> disable huge pages for the inode if it has EXT4_INODE_ENCRYPT.
> 
> With this version I don't see any xfstests regressions with huge pages enabled.
> Patch with new configurations for xfstests-bld is below.
> 
> Tested with 4k, 1k, encryption and bigalloc. All with and without
> huge=always. I think it's reasonable coverage.
> 
> The patchset is also in git:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugeext4/v2
> 
> Please review and consider applying.
> 
> [1] http://lkml.kernel.org/r/1465222029-45942-1-git-send-email-kirill.shutemov@linux.intel.com


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2016-08-14  7:20 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-12 18:37 Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 01/41] tools: Add WARN_ON_ONCE Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 02/41] radix tree test suite: Allow GFP_ATOMIC allocations to fail Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 03/41] radix-tree: Add radix_tree_join Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 04/41] radix-tree: Add radix_tree_split Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 05/41] radix-tree: Add radix_tree_split_preload() Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 06/41] radix-tree: Handle multiorder entries being deleted by replace_clear_tags Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 07/41] mm, shmem: swich huge tmpfs to multi-order radix-tree entries Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 08/41] Revert "radix-tree: implement radix_tree_maybe_preload_order()" Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 09/41] page-flags: relax page flag policy for few flags Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 10/41] mm, rmap: account file thp pages Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 11/41] thp: try to free page's buffers before attempt split Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 12/41] thp: handle write-protection faults for file THP Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 13/41] truncate: make sure invalidate_mapping_pages() can discard huge pages Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 14/41] filemap: allocate huge page in page_cache_read(), if allowed Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 15/41] filemap: handle huge pages in do_generic_file_read() Kirill A. Shutemov
2016-08-12 18:37 ` [PATCHv2 16/41] filemap: allocate huge page in pagecache_get_page(), if allowed Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 17/41] filemap: handle huge pages in filemap_fdatawait_range() Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 18/41] HACK: readahead: alloc huge pages, if allowed Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 19/41] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 20/41] mm: make write_cache_pages() work on huge pages Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 21/41] thp: introduce hpage_size() and hpage_mask() Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 22/41] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask} Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 23/41] fs: make block_read_full_page() be able to read huge page Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 24/41] fs: make block_write_{begin,end}() be able to handle huge pages Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 25/41] fs: make block_page_mkwrite() aware about " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 26/41] truncate: make truncate_inode_pages_range() " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 27/41] truncate: make invalidate_inode_pages2_range() " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 28/41] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 29/41] ext4: make ext4_mpage_readpages() hugepage-aware Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 30/41] ext4: make ext4_writepage() work on huge pages Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 31/41] ext4: handle huge pages in ext4_page_mkwrite() Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 32/41] ext4: handle huge pages in __ext4_block_zero_page_range() Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 33/41] ext4: make ext4_block_write_begin() aware about huge pages Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 34/41] ext4: handle huge pages in ext4_da_write_end() Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 35/41] ext4: make ext4_da_page_release_reservation() aware about huge pages Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 36/41] ext4: handle writeback with " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 37/41] ext4: make EXT4_IOC_MOVE_EXT work " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 38/41] ext4: fix SEEK_DATA/SEEK_HOLE for " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 39/41] ext4: make fallocate() operations work with " Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 40/41] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff() Kirill A. Shutemov
2016-08-12 18:38 ` [PATCHv2 41/41] ext4, vfs: add huge= mount option Kirill A. Shutemov
2016-08-12 20:34 ` [PATCHv2, 00/41] ext4: support of huge pages Theodore Ts'o
2016-08-12 23:19   ` Kirill A. Shutemov
2016-08-14  7:20 ` Andreas Dilger [this message]
2016-08-14 12:40   ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=638E01BE-FD45-465C-8464-2E2D96ED6787@dilger.ca \
    --to=adilger@dilger.ca \
    --cc=aarcange@redhat.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=hughd@google.com \
    --cc=jack@suse.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox