linux-mm.kvack.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ning Qu <quning@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 11/24] huge tmpfs: shrinker to migrate and free underused holes
Date: Sun, 22 Mar 2015 21:40:02 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.1503222046510.5278@eggly.anvils>
In-Reply-To: <550AFFD5.40607@yandex-team.ru>

On Thu, 19 Mar 2015, Konstantin Khlebnikov wrote:
> On 21.02.2015 07:09, Hugh Dickins wrote:
> > 
> > The "team_usage" field added to struct page (in union with "private")
> > is somewhat vaguely named: because while the huge page is sparsely
> > occupied, it counts the occupancy; but once the huge page is fully
> > occupied, it will come to be used differently in a later patch, as
> > the huge mapcount (offset by the HPAGE_PMD_NR occupancy) - it is
> > never possible to map a sparsely occupied huge page, because that
> > would expose stale data to the user.
> 
> That might be a problem if this approach is supposed to be used for
> normal filesystems.

Yes, most filesystems have their own use for page->private.
My concern at this stage has just been to have a good implementation
for tmpfs, but Kirill and others are certainly interested in looking
beyond that.

> Instead of adding a dedicated counter, shmem could
> detect a partially occupied page by scanning through all tail pages and
> checking PageUptodate(), and bump mapcount for all tail pages to prevent
> races between mmap and truncate. Overhead shouldn't be that big; also
> we can add a fastpath - mark a completely uptodate page with one of the
> unused page flags (PG_private or something).

I do already use PageChecked (PG_owner_priv_1) for just that purpose:
noting all subpages Uptodate (and marked Dirty) when first mapping by
pmd (in 12/24).

But I don't bump mapcount on the subpages, just on the head: I don't mind
doing a pass down the subpages when it's first hugely mapped, but prefer
to avoid such a pass on every huge map and unmap - seems unnecessary.
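
As an illustration only, the "scan once, then remember" idea can be modelled in
userspace C roughly as below. This is a sketch under assumptions, not code from
the patch series: struct page_model, the PG_* bit values, and the function
names are all hypothetical stand-ins, and the real code works on struct page
with the kernel's own page-flag helpers. The head's "checked" bit plays the
role described above for PageChecked: once set, later huge maps skip the pass
over the 512 subpages and only touch the head's mapcount.

```c
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512	/* 4k subpages in one 2M huge page */

/* Hypothetical flag bits for this model; not the kernel's values. */
enum {
	PG_UPTODATE = 1 << 0,
	PG_DIRTY    = 1 << 1,
	PG_CHECKED  = 1 << 2,	/* "all subpages seen Uptodate, marked Dirty" */
};

struct page_model {
	unsigned int flags;
	int mapcount;
};

/*
 * Map the team by pmd.  team[0] is the head.  *scans counts how many
 * times the full pass over the subpages was actually needed.
 */
static bool team_map_huge(struct page_model *team, unsigned int *scans)
{
	struct page_model *head = &team[0];
	size_t i;

	if (!(head->flags & PG_CHECKED)) {
		(*scans)++;
		/* First huge map: every subpage must already be Uptodate. */
		for (i = 0; i < HPAGE_PMD_NR; i++)
			if (!(team[i].flags & PG_UPTODATE))
				return false;
		for (i = 0; i < HPAGE_PMD_NR; i++)
			team[i].flags |= PG_DIRTY;
		head->flags |= PG_CHECKED;	/* remember the result */
	}
	head->mapcount++;	/* only the head is counted, never the tails */
	return true;
}

/* Unmapping never needs a pass over the subpages in this model. */
static void team_unmap_huge(struct page_model *team)
{
	team[0].mapcount--;
}
```

The tails' mapcounts are deliberately left alone, so map and unmap stay O(1)
after the first successful huge map.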

The team_usage (== private) field ends up with three or four separate
counts (and an mlocked flag) packed into it: I expect we could trade
some of those counts for scans down the 512 subpages when necessary,
but I doubt it's a good tradeoff; and keeping atomicity would be
difficult (I've never wanted to have to take page_lock or somesuch
on every page in zap_pte_range).  Without atomicity the stats go wrong
(I think Kirill has a problem of that kind in his page_remove_rmap scan).
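
To make the atomicity point concrete, here is a minimal userspace sketch of
packing a count and a flag into one word and updating it with C11 atomics.
The layout is an assumption made for illustration (bit 0 as an mlocked flag,
a single count above it); the real team_usage layout is not given in this
mail and holds more counts than this. The count follows the scheme from the
patch description: occupancy up to HPAGE_PMD_NR, and anything above that is
the huge mapcount. Because both live in one word, "refuse a huge map while
sparsely occupied" is a single compare-and-exchange, with no page lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define HPAGE_PMD_NR	512L	/* 4k subpages in one 2M huge page */
#define TEAM_MLOCKED	1L	/* hypothetical: flag kept in bit 0 */
#define TEAM_COUNT_ONE	2L	/* hypothetical: count kept above the flag */

struct team_head {
	atomic_long team_usage;	/* models the field in union with "private" */
};

/* Count portion of the packed word (the flag bit is divided away). */
static long team_count(struct team_head *head)
{
	return atomic_load(&head->team_usage) / TEAM_COUNT_ONE;
}

/* One more subpage of the team has been instantiated. */
static void team_occupy(struct team_head *head)
{
	long old = atomic_fetch_add(&head->team_usage, TEAM_COUNT_ONE);

	assert(old / TEAM_COUNT_ONE < HPAGE_PMD_NR);
}

/* Take a huge mapping, but only if every subpage is occupied. */
static bool team_huge_map(struct team_head *head)
{
	long old = atomic_load(&head->team_usage);

	do {
		if (old / TEAM_COUNT_ONE < HPAGE_PMD_NR)
			return false;	/* sparse: would expose stale data */
	} while (!atomic_compare_exchange_weak(&head->team_usage, &old,
					       old + TEAM_COUNT_ONE));
	return true;
}

/* Huge mapcount is the count's excess over full occupancy. */
static long team_huge_mapcount(struct team_head *head)
{
	long count = team_count(head);

	return count > HPAGE_PMD_NR ? count - HPAGE_PMD_NR : 0;
}

static void team_set_mlocked(struct team_head *head)
{
	atomic_fetch_or(&head->team_usage, TEAM_MLOCKED);
}

static bool team_mlocked(struct team_head *head)
{
	return atomic_load(&head->team_usage) & TEAM_MLOCKED;
}
```

Replacing the packed count with a scan of the subpages would need some lock
held across the scan to give the same guarantee, which is the tradeoff
doubted above.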

It will be interesting to see what Kirill does to maintain the stats
for huge pagecache: but he will have no difficulty in finding fields
to store counts, because he's got lots of spare fields in those 511
tail pages - that's a useful benefit of the compound page, but does
prevent the tails from being used in ordinary ways.  (I did try using
team_head[1].team_usage for more, but atomicity needs prevented it.)

> 
> Another (strange) idea is adding separate array of struct huge_page
> into each zone. They will work as headers for huge pages and hold
> that kind of fields. Pageblock flags also could be stored here.

It's not such a strange idea, it is a definite possibility.  Though
I've tended to think of them more as a separate array of struct pages,
one for each of the hugepages.
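
Purely as a sketch of what such a per-zone array could look like, with every
name below hypothetical (struct huge_page and its fields, struct zone_model,
pfn_to_huge_page) and assuming the zone starts on a huge page boundary, the
lookup from a pfn to its header is just a shift:

```c
#include <stddef.h>

#define HPAGE_PMD_ORDER	9	/* 2M huge page = 2^9 4k pages */

/* Hypothetical per-hugepage header; the fields are illustrative only. */
struct huge_page {
	unsigned long flags;
	long mapcount;
	long usage;
};

/* Minimal stand-in for the parts of a zone this sketch needs. */
struct zone_model {
	unsigned long zone_start_pfn;	/* assumed huge page aligned */
	unsigned long spanned_pages;
	struct huge_page *huge_pages;	/* one entry per 2M extent */
};

/* Find the header covering pfn, or NULL if pfn is outside the zone. */
static struct huge_page *pfn_to_huge_page(struct zone_model *zone,
					  unsigned long pfn)
{
	unsigned long offset;

	if (pfn < zone->zone_start_pfn)
		return NULL;
	offset = pfn - zone->zone_start_pfn;
	if (offset >= zone->spanned_pages)
		return NULL;
	return &zone->huge_pages[offset >> HPAGE_PMD_ORDER];
}
```

With headers held apart like this, the 4k head page would no longer have to
carry state for the 2M page as well.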

It's a complication I'll keep away from as long as I can, but something
like that will probably have to come.  Consider the ambiguity of the
head page, whose flags and counts may represent the 4k page mapped
by pte and the 2M page mapped by pmd: there's an absurdity to that,
one that I can live with for now, but expect some nasty case to demand
a change (the way I have it at present, just mlocking the 4k head is
enough to hold the 2M hugepage in memory: that's not good, but should
be quite easily fixed without needing the hugepage array itself).

And I think ideas may emerge from the persistent memory struct page
discussions, which feed in here.  One reason to hold back for now.

Hugh

