From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andres Lagar-Cavilla <andreslc@google.com>,
Yang Shi <yang.shi@linaro.org>, Ning Qu <quning@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 20/31] huge tmpfs: mem_cgroup shmem_hugepages accounting
Date: Tue, 5 Apr 2016 14:47:58 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.1604051446131.5965@eggly.anvils>
In-Reply-To: <alpine.LSU.2.11.1604051403210.5965@eggly.anvils>
From: Andres Lagar-Cavilla <andreslc@google.com>

Keep track of all hugepages, not just those mapped.

This has gone through several anguished iterations, memcg stats being
harder to protect against mem_cgroup_move_account() than you might
expect.  Abandon the pretence that miscellaneous stats can all be
protected by the same lock_page_memcg(), unlock_page_memcg() scheme:
add mem_cgroup_update_page_stat_treelocked(), using mapping->tree_lock
for safe updates of MEM_CGROUP_STAT_SHMEM_HUGEPAGES (where tree_lock
is already held, but nests inside, not outside, memcg->move_lock).

Nowadays, when mem_cgroup_move_account() takes page lock, and is only
called when immigrating pages found in page tables, it almost seems as
if this reliance on tree_lock is unnecessary.  But consider the case
when the team head is pte-mapped, and being migrated to a new memcg,
racing with the last page of the team being instantiated: the page
lock is held on the page being instantiated, not on the team head,
so we do still need the tree_lock to serialize them.

Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
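The lock nesting described in the changelog can be modelled in userspace
as below. This is only a sketch, not kernel code: pthread mutexes stand in
for memcg->move_lock and mapping->tree_lock, a plain long stands in for the
per-cpu stat counter, irq disabling is ignored, and every sketch_* name is
made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_HPAGE_PMD_NR 512	/* assumed: 2MB huge page, 4kB base page */

struct sketch_memcg {
	pthread_mutex_t move_lock;	/* outer lock: models memcg->move_lock */
	long shmem_hugepages;		/* models MEM_CGROUP_STAT_SHMEM_HUGEPAGES */
};

struct sketch_team_head {
	pthread_mutex_t *tree_lock;	/* inner lock: models mapping->tree_lock */
	struct sketch_memcg *memcg;	/* models page->mem_cgroup */
	bool team_complete;		/* models team_usage >= TEAM_COMPLETE */
};

/*
 * Models the last page of a team being instantiated: this path holds
 * only tree_lock (and the page lock of the new page, not of the head).
 */
static void sketch_team_completed(struct sketch_team_head *head)
{
	pthread_mutex_lock(head->tree_lock);
	head->team_complete = true;
	if (head->memcg)
		head->memcg->shmem_hugepages += SKETCH_HPAGE_PMD_NR;
	pthread_mutex_unlock(head->tree_lock);
}

/*
 * Models the team head part of mem_cgroup_move_account(): move_lock is
 * taken first, tree_lock nests inside it, and the owner is switched
 * before tree_lock is dropped so the other path never sees a stale memcg.
 */
static void sketch_move_account(struct sketch_team_head *head,
				struct sketch_memcg *from,
				struct sketch_memcg *to)
{
	assert(head->memcg == from);

	pthread_mutex_lock(&from->move_lock);
	pthread_mutex_lock(head->tree_lock);
	if (head->team_complete) {
		from->shmem_hugepages -= SKETCH_HPAGE_PMD_NR;
		to->shmem_hugepages += SKETCH_HPAGE_PMD_NR;
	}
	head->memcg = to;
	pthread_mutex_unlock(head->tree_lock);
	pthread_mutex_unlock(&from->move_lock);
}
```

Whichever of the two paths wins tree_lock first, the count ends up in
the memcg that finally owns the head: either the team completes first and
the move transfers the count, or the move happens first and completion
charges the new owner.
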
Documentation/cgroup-v1/memory.txt | 2 +
Documentation/filesystems/tmpfs.txt | 8 ++++
include/linux/memcontrol.h | 10 +++++
include/linux/pageteam.h | 3 +
mm/memcontrol.c | 47 ++++++++++++++++++++++----
mm/shmem.c | 4 ++
6 files changed, 66 insertions(+), 8 deletions(-)
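
A note on units, since the stat is kept in pages but memory.stat is
documented in bytes: the example values in the tmpfs.txt hunk are
consistent with 13 completed and 6 pmd-mapped huge pages, assuming 4kB
base pages and 2MB huge pages (x86_64 defaults, which the patch does not
state). The helper names below are hypothetical and exist only to show
the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry; the patch does not state it. */
#define SKETCH_PAGE_SIZE	4096ULL
#define SKETCH_HPAGE_PMD_NR	512ULL

static_assert(SKETCH_PAGE_SIZE * SKETCH_HPAGE_PMD_NR == 2097152ULL,
	      "one huge page is 2MB under the assumed geometry");

/* Stat counters are kept in base pages; memory.stat reports bytes. */
static uint64_t sketch_stat_pages_to_bytes(uint64_t nr_pages)
{
	return nr_pages * SKETCH_PAGE_SIZE;
}

/* Each completed team adds HPAGE_PMD_NR base pages to the counter. */
static uint64_t sketch_hugepages_to_bytes(uint64_t nr_hugepages)
{
	return sketch_stat_pages_to_bytes(nr_hugepages * SKETCH_HPAGE_PMD_NR);
}
```

With these assumptions, sketch_hugepages_to_bytes(13) gives the
shmem_hugepages example value and sketch_hugepages_to_bytes(6) gives the
shmem_pmdmapped one.
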
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -487,6 +487,8 @@ rss - # of bytes of anonymous and swap
transparent hugepages).
rss_huge - # of bytes of anonymous transparent hugepages.
mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
+shmem_hugepages - # of bytes of tmpfs huge pages completed (subset of cache)
+shmem_pmdmapped - # of bytes of tmpfs huge mapped huge (subset of mapped_file)
pgpgin - # of charging events to the memory cgroup. The charging
event happens each time a page is accounted as either mapped
anon page(RSS) or cache page(Page Cache) to the cgroup.
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -200,6 +200,14 @@ nr_shmem_hugepages 13 tmpfs huge
nr_shmem_pmdmapped 6 tmpfs hugepages with huge mappings in userspace
nr_shmem_freeholes 167861 pages reserved for team but available to shrinker
+/sys/fs/cgroup/memory/<cgroup>/memory.stat shows:
+
+shmem_hugepages 27262976 bytes tmpfs hugepage completed (subset of cache)
+shmem_pmdmapped 12582912 bytes tmpfs huge mapped huge (subset of mapped_file)
+
+Note: the individual pages of a huge team might be charged to different
+memcgs, but these counts assume that they are all charged to the same as head.
+
Author:
Christoph Rohland <cr@sap.com>, 1.12.01
Updated:
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -50,6 +50,8 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_DIRTY, /* # of dirty pages in page cache */
MEM_CGROUP_STAT_WRITEBACK, /* # of pages under writeback */
MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
+ /* # of pages charged as non-disbanded huge teams */
+ MEM_CGROUP_STAT_SHMEM_HUGEPAGES,
/* # of pages charged as hugely mapped teams */
MEM_CGROUP_STAT_SHMEM_PMDMAPPED,
MEM_CGROUP_STAT_NSTATS,
@@ -491,6 +493,9 @@ static inline void mem_cgroup_update_pag
this_cpu_add(page->mem_cgroup->stat->count[idx], val);
}
+void mem_cgroup_update_page_stat_treelocked(struct page *page,
+ enum mem_cgroup_stat_index idx, int val);
+
static inline void mem_cgroup_inc_page_stat(struct page *page,
enum mem_cgroup_stat_index idx)
{
@@ -706,6 +711,11 @@ static inline void mem_cgroup_update_pag
enum mem_cgroup_stat_index idx, int val)
{
}
+
+static inline void mem_cgroup_update_page_stat_treelocked(struct page *page,
+ enum mem_cgroup_stat_index idx, int val)
+{
+}
static inline void mem_cgroup_inc_page_stat(struct page *page,
enum mem_cgroup_stat_index idx)
--- a/include/linux/pageteam.h
+++ b/include/linux/pageteam.h
@@ -139,12 +139,13 @@ static inline bool dec_team_pmd_mapped(s
* needs to maintain memcg's huge tmpfs stats correctly.
*/
static inline void count_team_pmd_mapped(struct page *head, int *file_mapped,
- bool *pmd_mapped)
+ bool *pmd_mapped, bool *team_complete)
{
long team_usage;
*file_mapped = 1;
team_usage = atomic_long_read(&head->team_usage);
+ *team_complete = team_usage >= TEAM_COMPLETE;
*pmd_mapped = team_usage >= TEAM_PMD_MAPPED;
if (*pmd_mapped)
*file_mapped = HPAGE_PMD_NR - team_pte_count(team_usage);
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -107,6 +107,7 @@ static const char * const mem_cgroup_sta
"dirty",
"writeback",
"swap",
+ "shmem_hugepages",
"shmem_pmdmapped",
};
@@ -4431,6 +4432,17 @@ static struct page *mc_handle_file_pte(s
return page;
}
+void mem_cgroup_update_page_stat_treelocked(struct page *page,
+ enum mem_cgroup_stat_index idx, int val)
+{
+ /* Update this VM_BUG_ON if other cases are added */
+ VM_BUG_ON(idx != MEM_CGROUP_STAT_SHMEM_HUGEPAGES);
+ lockdep_assert_held(&page->mapping->tree_lock);
+
+ if (page->mem_cgroup)
+ __this_cpu_add(page->mem_cgroup->stat->count[idx], val);
+}
+
/**
* mem_cgroup_move_account - move account of the page
* @page: the page
@@ -4448,6 +4460,7 @@ static int mem_cgroup_move_account(struc
struct mem_cgroup *from,
struct mem_cgroup *to)
{
+ spinlock_t *tree_lock = NULL;
unsigned long flags;
int nr_pages = compound ? hpage_nr_pages(page) : 1;
int file_mapped = 1;
@@ -4487,9 +4500,9 @@ static int mem_cgroup_move_account(struc
* So mapping should be stable for dirty pages.
*/
if (!anon && PageDirty(page)) {
- struct address_space *mapping = page_mapping(page);
+ struct address_space *mapping = page->mapping;
- if (mapping_cap_account_dirty(mapping)) {
+ if (mapping && mapping_cap_account_dirty(mapping)) {
__this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_DIRTY],
nr_pages);
__this_cpu_add(to->stat->count[MEM_CGROUP_STAT_DIRTY],
@@ -4498,10 +4511,28 @@ static int mem_cgroup_move_account(struc
}
if (!anon && PageTeam(page)) {
- if (page == team_head(page)) {
- bool pmd_mapped;
+ struct address_space *mapping = page->mapping;
- count_team_pmd_mapped(page, &file_mapped, &pmd_mapped);
+ if (mapping && page == team_head(page)) {
+ bool pmd_mapped, team_complete;
+ /*
+ * We avoided taking mapping->tree_lock unnecessarily.
+ * Is it safe to take mapping->tree_lock below? Was it
+ * safe to peek at PageTeam above, without tree_lock?
+ * Yes, this is a team head, just now taken from its
+ * lru: PageTeam must already be set. And we took
+ * page lock above, so page->mapping is stable.
+ */
+ tree_lock = &mapping->tree_lock;
+ spin_lock(tree_lock);
+ count_team_pmd_mapped(page, &file_mapped, &pmd_mapped,
+ &team_complete);
+ if (team_complete) {
+ __this_cpu_sub(from->stat->count[
+ MEM_CGROUP_STAT_SHMEM_HUGEPAGES], HPAGE_PMD_NR);
+ __this_cpu_add(to->stat->count[
+ MEM_CGROUP_STAT_SHMEM_HUGEPAGES], HPAGE_PMD_NR);
+ }
if (pmd_mapped) {
__this_cpu_sub(from->stat->count[
MEM_CGROUP_STAT_SHMEM_PMDMAPPED], HPAGE_PMD_NR);
@@ -4522,10 +4553,12 @@ static int mem_cgroup_move_account(struc
* It is safe to change page->mem_cgroup here because the page
* is referenced, charged, and isolated - we can't race with
* uncharging, charging, migration, or LRU putback.
+ * Caller should have done css_get.
*/
-
- /* caller should have done css_get */
page->mem_cgroup = to;
+
+ if (tree_lock)
+ spin_unlock(tree_lock);
spin_unlock_irqrestore(&from->move_lock, flags);
ret = 0;
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -413,6 +413,8 @@ static void shmem_added_to_hugeteam(stru
&head->team_usage) >= TEAM_COMPLETE) {
shmem_clear_tag_hugehole(mapping, head->index);
__inc_zone_state(zone, NR_SHMEM_HUGEPAGES);
+ mem_cgroup_update_page_stat_treelocked(head,
+ MEM_CGROUP_STAT_SHMEM_HUGEPAGES, HPAGE_PMD_NR);
}
__dec_zone_state(zone, NR_SHMEM_FREEHOLES);
}
@@ -523,6 +525,8 @@ again2:
if (nr >= HPAGE_PMD_NR) {
ClearPageChecked(head);
__dec_zone_state(zone, NR_SHMEM_HUGEPAGES);
+ mem_cgroup_update_page_stat_treelocked(head,
+ MEM_CGROUP_STAT_SHMEM_HUGEPAGES, -HPAGE_PMD_NR);
VM_BUG_ON(nr != HPAGE_PMD_NR);
} else if (nr) {
shmem_clear_tag_hugehole(mapping, head->index);