From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andres Lagar-Cavilla <andreslc@google.com>,
Yang Shi <yang.shi@linaro.org>, Ning Qu <quning@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 15/31] huge tmpfs: fix Mapped meminfo, track huge & unhuge mappings
Date: Tue, 5 Apr 2016 14:37:03 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.1604051435460.5965@eggly.anvils>
In-Reply-To: <alpine.LSU.2.11.1604051403210.5965@eggly.anvils>

Maintaining Mlocked was the difficult one, but now that it is correctly
tracked, without duplication between the 4kB and 2MB amounts, I think
we have to make a similar effort with Mapped.

But whereas mlock and munlock were already rare and slow operations,
to which we could fairly add a little more overhead in the huge tmpfs
case, ordinary mmap is not something we want to slow down further,
relative to hugetlbfs.

In the Mapped case, I think we can take small or misaligned mmaps of
huge tmpfs files as the exceptional operation, and add a little more
overhead to those, by maintaining another count for them in the head;
and by keeping both hugely and unhugely mapped counts in the one long,
we can rely on cmpxchg to manage their racing transitions atomically.

That's good on 64-bit, but there are not enough free bits in a 32-bit
atomic_long_t team_usage to support this: I think we should continue
to permit huge tmpfs on 32-bit, but accept that Mapped may be doubly
counted there. (A more serious problem on 32-bit is that it would,
I think, be possible to overflow the huge mapping counter: protection
against that will need to be added.)
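
To make that concrete, here is a minimal userspace sketch of the idea,
not the kernel code: the field sizes, the sketch_* names and the C11
<stdatomic.h> calls are all stand-ins for team_usage and the kernel's
atomic_long_cmpxchg(). The point is that the pte bump and the "already
covered by a huge pmd mapping?" decision are taken against one
atomically observed value:

#include <stdatomic.h>
#include <stdio.h>

#define PTE_SHIFT	0			/* low 10 bits: pte mappings */
#define PMD_SHIFT	10
#define PMD_MASK	(0x3ffL << PMD_SHIFT)	/* next 10 bits: pmd mappings */

static atomic_long team_usage_sketch;	/* stands in for head->team_usage */

/*
 * Returns true if this pte mapping should be counted in Mapped,
 * i.e. no huge pmd mapping of the team already covers the page.
 */
static int sketch_inc_pte_mapped(void)
{
	long old = atomic_load(&team_usage_sketch);
	long new;

	do {
		new = old + (1L << PTE_SHIFT);	/* one more pte mapping */
	} while (!atomic_compare_exchange_weak(&team_usage_sketch, &old, new));

	return (old & PMD_MASK) == 0;
}

int main(void)
{
	printf("count in Mapped? %d\n", sketch_inc_pte_mapped());	/* 1 */
	atomic_fetch_add(&team_usage_sketch, 1L << PMD_SHIFT);	/* pmd maps team */
	printf("count in Mapped? %d\n", sketch_inc_pte_mapped());	/* 0 */
	return 0;
}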

Now that we are maintaining NR_FILE_MAPPED correctly for huge
tmpfs, adjust vmscan's zone_unmapped_file_pages() to exclude
NR_SHMEM_PMDMAPPED, which it clearly would not want included.
As for minimum_image_size() in kernel/power/snapshot.c: I have
not grasped the basis for that calculation, so I am leaving it
untouched.
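
And to spell out the Mapped arithmetic aimed at here, a toy model,
again not kernel code: HPAGE_PMD_NR is taken as 512 (x86_64 with 4kB
pages), and the plain variables merely stand in for the per-zone
NR_FILE_MAPPED and NR_SHMEM_PMDMAPPED counters:

#include <stdio.h>

#define HPAGE_PMD_ORDER	9
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)

int main(void)
{
	long file_mapped = 0;	/* stands in for NR_FILE_MAPPED, in small pages */
	long pmdmapped = 0;	/* stands in for NR_SHMEM_PMDMAPPED, in teams */
	int pte_mapped = 3;	/* team pages individually mapped by pte */

	file_mapped += pte_mapped;

	/* First huge pmd mapping: add only the pages not already counted */
	file_mapped += HPAGE_PMD_NR - pte_mapped;
	pmdmapped++;
	printf("Mapped: %ld pages\n", file_mapped);		/* 512, not 515 */

	/* vmscan's view: mapped file pages not covered by a huge mapping */
	printf("beyond pmd coverage: %ld pages\n",
	       file_mapped - (pmdmapped << HPAGE_PMD_ORDER));	/* 0 */

	/* Last huge pmd unmapping: the pte-mapped pages remain counted */
	file_mapped -= HPAGE_PMD_NR - pte_mapped;
	pmdmapped--;
	printf("Mapped: %ld pages\n", file_mapped);		/* back to 3 */
	return 0;
}
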
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/memcontrol.h |   5 +
 include/linux/pageteam.h   | 144 ++++++++++++++++++++++++++++++++---
 mm/huge_memory.c           |  34 +++++++-
 mm/rmap.c                  |  10 +-
 mm/vmscan.c                |   6 +
 5 files changed, 180 insertions(+), 19 deletions(-)
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -700,6 +700,11 @@ static inline bool mem_cgroup_oom_synchr
return false;
}
+static inline void mem_cgroup_update_page_stat(struct page *page,
+ enum mem_cgroup_stat_index idx, int val)
+{
+}
+
static inline void mem_cgroup_inc_page_stat(struct page *page,
enum mem_cgroup_stat_index idx)
{
--- a/include/linux/pageteam.h
+++ b/include/linux/pageteam.h
@@ -30,6 +30,30 @@ static inline struct page *team_head(str
}
/*
+ * Layout of team head's page->team_usage field, as on x86_64 and arm64_4K:
+ *
+ *  63        32 31          22 21      12     11        10      9         0
+ * +------------+--------------+----------+----------+---------+------------+
+ * | pmd_mapped & instantiated |pte_mapped| reserved | mlocked | lru_weight |
+ * |   42 bits       10 bits   | 10 bits  |  1 bit   |  1 bit  |  10 bits   |
+ * +------------+--------------+----------+----------+---------+------------+
+ *
+ *  TEAM_LRU_WEIGHT_ONE          1  (1<<0)
+ *  TEAM_LRU_WEIGHT_MASK       3ff  (1<<10)-1
+ *  TEAM_PMD_MLOCKED           400  (1<<10)
+ *  TEAM_RESERVED_FLAG         800  (1<<11)
+ *  TEAM_PTE_COUNTER          1000  (1<<12)
+ *  TEAM_PTE_MASK           3ff000  (1<<22)-(1<<12)
+ *  TEAM_PAGE_COUNTER       400000  (1<<22)
+ *  TEAM_COMPLETE         80000000  (1<<31)
+ *  TEAM_MAPPING_COUNTER    400000  (1<<22)
+ *  TEAM_PMD_MAPPED       80400000  (1<<31)+(1<<22)
+ *
+ * The upper bits count up to TEAM_COMPLETE as pages are instantiated,
+ * and then, above TEAM_COMPLETE, they count huge mappings of the team.
+ * Team tails have team_usage either 1 (lru_weight 1) or 0 (lru_weight 0).
+ */
+/*
* Mask for lower bits of team_usage, giving the weight 0..HPAGE_PMD_NR of the
* page on its LRU: normal pages have weight 1, tails held unevictable until
* head is evicted have weight 0, and the head gathers weight 1..HPAGE_PMD_NR.
@@ -42,8 +66,22 @@ static inline struct page *team_head(str
*/
#define TEAM_PMD_MLOCKED (1L << (HPAGE_PMD_ORDER + 1))
#define TEAM_RESERVED_FLAG (1L << (HPAGE_PMD_ORDER + 2))
-
+#ifdef CONFIG_64BIT
+/*
+ * Count how many pages of team are individually mapped into userspace.
+ */
+#define TEAM_PTE_COUNTER (1L << (HPAGE_PMD_ORDER + 3))
+#define TEAM_HIGH_COUNTER (1L << (2*HPAGE_PMD_ORDER + 4))
+#define TEAM_PTE_MASK (TEAM_HIGH_COUNTER - TEAM_PTE_COUNTER)
+#define team_pte_count(usage) (((usage) & TEAM_PTE_MASK) / TEAM_PTE_COUNTER)
+#else /* 32-bit */
+/*
+ * Not enough bits in atomic_long_t: we prefer not to bloat struct page just to
+ * avoid duplication in Mapped, when a page is mapped both hugely and unhugely.
+ */
#define TEAM_HIGH_COUNTER (1L << (HPAGE_PMD_ORDER + 3))
+#define team_pte_count(usage) 1 /* allows for the extra page_add_file_rmap */
+#endif /* CONFIG_64BIT */
/*
* Count how many pages of team are instantiated, as it is built up.
*/
@@ -66,22 +104,110 @@ static inline bool team_pmd_mapped(struc
/*
* Returns true if this was the first mapping by pmd, whereupon mapped stats
- * need to be updated.
+ * need to be updated. Together with the number of pages which then need
+ * to be accounted (can be ignored when false returned): because some team
+ * members may have been mapped unhugely by pte, so already counted as Mapped.
*/
-static inline bool inc_team_pmd_mapped(struct page *head)
+static inline bool inc_team_pmd_mapped(struct page *head, int *nr_pages)
{
- return atomic_long_add_return(TEAM_MAPPING_COUNTER, &head->team_usage)
- < TEAM_PMD_MAPPED + TEAM_MAPPING_COUNTER;
+ long team_usage;
+
+ team_usage = atomic_long_add_return(TEAM_MAPPING_COUNTER,
+ &head->team_usage);
+ *nr_pages = HPAGE_PMD_NR - team_pte_count(team_usage);
+ return team_usage < TEAM_PMD_MAPPED + TEAM_MAPPING_COUNTER;
}
/*
* Returns true if this was the last mapping by pmd, whereupon mapped stats
- * need to be updated.
+ * need to be updated. Together with the number of pages which then need
+ * to be accounted (can be ignored when false returned): because some team
+ * members may still be mapped unhugely by pte, so remain counted as Mapped.
+ */
+static inline bool dec_team_pmd_mapped(struct page *head, int *nr_pages)
+{
+ long team_usage;
+
+ team_usage = atomic_long_sub_return(TEAM_MAPPING_COUNTER,
+ &head->team_usage);
+ *nr_pages = HPAGE_PMD_NR - team_pte_count(team_usage);
+ return team_usage < TEAM_PMD_MAPPED;
+}
+
+/*
+ * Returns true if this pte mapping is of a non-team page, or of a team page not
+ * covered by an existing huge pmd mapping: whereupon stats need to be updated.
+ * Only called when mapcount goes up from 0 to 1 i.e. _mapcount from -1 to 0.
+ */
+static inline bool inc_team_pte_mapped(struct page *page)
+{
+#ifdef CONFIG_64BIT
+ struct page *head;
+ long team_usage;
+ long old;
+
+ if (likely(!PageTeam(page)))
+ return true;
+ head = team_head(page);
+ team_usage = atomic_long_read(&head->team_usage);
+ for (;;) {
+ /* Is team now being disbanded? Stop once team_usage is reset */
+ if (unlikely(!PageTeam(head) ||
+ team_usage / TEAM_PAGE_COUNTER == 0))
+ return true;
+ /*
+ * XXX: but despite the impressive-looking cmpxchg, gthelen
+ * points out that head might be freed and reused and assigned
+ * a matching value in ->private now: tiny chance, must revisit.
+ */
+ old = atomic_long_cmpxchg(&head->team_usage,
+ team_usage, team_usage + TEAM_PTE_COUNTER);
+ if (likely(old == team_usage))
+ break;
+ team_usage = old;
+ }
+ return team_usage < TEAM_PMD_MAPPED;
+#else /* 32-bit */
+ return true;
+#endif
+}
+
+/*
+ * Returns true if this pte mapping is of a non-team page, or of a team page not
+ * covered by a remaining huge pmd mapping: whereupon stats need to be updated.
+ * Only called when mapcount goes down from 1 to 0 i.e. _mapcount from 0 to -1.
*/
-static inline bool dec_team_pmd_mapped(struct page *head)
+static inline bool dec_team_pte_mapped(struct page *page)
{
- return atomic_long_sub_return(TEAM_MAPPING_COUNTER, &head->team_usage)
- < TEAM_PMD_MAPPED;
+#ifdef CONFIG_64BIT
+ struct page *head;
+ long team_usage;
+ long old;
+
+ if (likely(!PageTeam(page)))
+ return true;
+ head = team_head(page);
+ team_usage = atomic_long_read(&head->team_usage);
+ for (;;) {
+ /* Is team now being disbanded? Stop once team_usage is reset */
+ if (unlikely(!PageTeam(head) ||
+ team_usage / TEAM_PAGE_COUNTER == 0))
+ return true;
+ /*
+ * XXX: but despite the impressive-looking cmpxchg, gthelen
+ * points out that head might be freed and reused and assigned
+ * a matching value in ->private now: tiny chance, must revisit.
+ */
+ old = atomic_long_cmpxchg(&head->team_usage,
+ team_usage, team_usage - TEAM_PTE_COUNTER);
+ if (likely(old == team_usage))
+ break;
+ team_usage = old;
+ }
+ return team_usage < TEAM_PMD_MAPPED;
+#else /* 32-bit */
+ return true;
+#endif
}
static inline void inc_lru_weight(struct page *head)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1130,9 +1130,11 @@ int copy_huge_pmd(struct mm_struct *dst_
pmdp_set_wrprotect(src_mm, addr, src_pmd);
pmd = pmd_wrprotect(pmd);
} else {
+ int nr_pages; /* not interesting here */
+
VM_BUG_ON_PAGE(!PageTeam(src_page), src_page);
page_dup_rmap(src_page, false);
- inc_team_pmd_mapped(src_page);
+ inc_team_pmd_mapped(src_page, &nr_pages);
}
add_mm_counter(dst_mm, mm_counter(src_page), HPAGE_PMD_NR);
atomic_long_inc(&dst_mm->nr_ptes);
@@ -3499,18 +3501,40 @@ late_initcall(split_huge_pages_debugfs);
static void page_add_team_rmap(struct page *page)
{
+ int nr_pages;
+
VM_BUG_ON_PAGE(PageAnon(page), page);
VM_BUG_ON_PAGE(!PageTeam(page), page);
- if (inc_team_pmd_mapped(page))
- __inc_zone_page_state(page, NR_SHMEM_PMDMAPPED);
+
+ lock_page_memcg(page);
+ if (inc_team_pmd_mapped(page, &nr_pages)) {
+ struct zone *zone = page_zone(page);
+
+ __inc_zone_state(zone, NR_SHMEM_PMDMAPPED);
+ __mod_zone_page_state(zone, NR_FILE_MAPPED, nr_pages);
+ mem_cgroup_update_page_stat(page,
+ MEM_CGROUP_STAT_FILE_MAPPED, nr_pages);
+ }
+ unlock_page_memcg(page);
}
static void page_remove_team_rmap(struct page *page)
{
+ int nr_pages;
+
VM_BUG_ON_PAGE(PageAnon(page), page);
VM_BUG_ON_PAGE(!PageTeam(page), page);
- if (dec_team_pmd_mapped(page))
- __dec_zone_page_state(page, NR_SHMEM_PMDMAPPED);
+
+ lock_page_memcg(page);
+ if (dec_team_pmd_mapped(page, &nr_pages)) {
+ struct zone *zone = page_zone(page);
+
+ __dec_zone_state(zone, NR_SHMEM_PMDMAPPED);
+ __mod_zone_page_state(zone, NR_FILE_MAPPED, -nr_pages);
+ mem_cgroup_update_page_stat(page,
+ MEM_CGROUP_STAT_FILE_MAPPED, -nr_pages);
+ }
+ unlock_page_memcg(page);
}
int map_team_by_pmd(struct vm_area_struct *vma, unsigned long addr,
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1272,7 +1272,8 @@ void page_add_new_anon_rmap(struct page
void page_add_file_rmap(struct page *page)
{
lock_page_memcg(page);
- if (atomic_inc_and_test(&page->_mapcount)) {
+ if (atomic_inc_and_test(&page->_mapcount) &&
+ inc_team_pte_mapped(page)) {
__inc_zone_page_state(page, NR_FILE_MAPPED);
mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
}
@@ -1299,9 +1300,10 @@ static void page_remove_file_rmap(struct
* these counters are not modified in interrupt context, and
* pte lock(a spinlock) is held, which implies preemption disabled.
*/
- __dec_zone_page_state(page, NR_FILE_MAPPED);
- mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
-
+ if (dec_team_pte_mapped(page)) {
+ __dec_zone_page_state(page, NR_FILE_MAPPED);
+ mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
+ }
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);
out:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3685,8 +3685,12 @@ static inline unsigned long zone_unmappe
/*
* It's possible for there to be more file mapped pages than
* accounted for by the pages on the file LRU lists because
- * tmpfs pages accounted for as ANON can also be FILE_MAPPED
+ * tmpfs pages accounted for as ANON can also be FILE_MAPPED.
+ * We don't know how many, beyond the PMDMAPPED excluded below.
*/
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+ file_mapped -= zone_page_state(zone, NR_SHMEM_PMDMAPPED) <<
+ HPAGE_PMD_ORDER;
return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
}