From: Andrew Morton <akpm@linux-foundation.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: ktkhai@virtuozzo.com, mhocko@suse.com, hannes@cmpxchg.org,
vdavydov.dev@gmail.com, mgorman@techsingularity.net,
linux-mm@kvack.org
Subject: Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
Date: Fri, 21 Jun 2019 20:30:14 -0700
Message-ID: <20190621203014.fff2b968b6f9c2e23ebf4eef@linux-foundation.org>
In-Reply-To: <1561112086-6169-3-git-send-email-laoar.shao@gmail.com>
On Fri, 21 Jun 2019 18:14:46 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
> There're six different reclaim paths by now,
> - kswapd reclaim path
> - node reclaim path
> - hibernate preallocate memory reclaim path
> - direct reclaim path
> - memcg reclaim path
> - memcg softlimit reclaim path
>
> The slab caches reclaimed in these paths are only calculated in the first
> three paths listed above.
>
> There're some drawbacks if we don't calculate the reclaimed slab caches.
> - The sc->nr_reclaimed isn't correct if there're some slab caches
> reclaimed in this path.
> - The slab caches may be reclaimed thoroughly if there're lots of
> reclaimable slab caches and few page caches.
> Let's take an easy example for this case.
> If one memcg is full of slab caches and the limit of it is 512M, in
> other words there're approximately 512M slab caches in this memcg.
> Then the limit of the memcg is reached and the memcg reclaim begins,
> and then in this memcg reclaim path it will continuously reclaim the
> slab caches until the sc->priority drops to 0.
> After this reclaim stops, you will find there're few slab caches left,
> which is less than 20M in my test case.
> While after this patch is applied the number is greater than 300M and
> the sc->priority only drops to 3.
I got a bit exhausted checking that none of these six callsites can
scribble on some caller's value of current->reclaim_state.
How about we do it at runtime?
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/vmscan.c: add checks for incorrect handling of current->reclaim_state
Six sites are presently altering current->reclaim_state. There is a risk
that one function stomps on a caller's value. Use a helper function to
catch such errors.
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)
--- a/mm/vmscan.c~mm-vmscanc-add-checks-for-incorrect-handling-of-current-reclaim_state
+++ a/mm/vmscan.c
@@ -177,6 +177,18 @@ unsigned long vm_total_pages;
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
+static void set_task_reclaim_state(struct task_struct *task,
+ struct reclaim_state *rs)
+{
+ /* Check for an overwrite */
+ WARN_ON_ONCE(rs && task->reclaim_state);
+
+ /* Check for the nulling of an already-nulled member */
+ WARN_ON_ONCE(!rs && !task->reclaim_state);
+
+ task->reclaim_state = rs;
+}
+
#ifdef CONFIG_MEMCG_KMEM
/*
@@ -3194,13 +3206,13 @@ unsigned long try_to_free_pages(struct z
if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
return 1;
- current->reclaim_state = &sc.reclaim_state;
+ set_task_reclaim_state(current, &sc.reclaim_state);
trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
- current->reclaim_state = NULL;
+ set_task_reclaim_state(current, NULL);
return nr_reclaimed;
}
@@ -3223,7 +3235,7 @@ unsigned long mem_cgroup_shrink_node(str
};
unsigned long lru_pages;
- current->reclaim_state = &sc.reclaim_state;
+ set_task_reclaim_state(current, &sc.reclaim_state);
sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -3245,7 +3257,7 @@ unsigned long mem_cgroup_shrink_node(str
cgroup_ino(memcg->css.cgroup),
sc.nr_reclaimed);
- current->reclaim_state = NULL;
+ set_task_reclaim_state(current, NULL);
*nr_scanned = sc.nr_scanned;
return sc.nr_reclaimed;
@@ -3274,7 +3286,7 @@ unsigned long try_to_free_mem_cgroup_pag
.may_shrinkslab = 1,
};
- current->reclaim_state = &sc.reclaim_state;
+ set_task_reclaim_state(current, &sc.reclaim_state);
/*
* Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
* take care of from where we get pages. So the node where we start the
@@ -3299,7 +3311,7 @@ unsigned long try_to_free_mem_cgroup_pag
trace_mm_vmscan_memcg_reclaim_end(
cgroup_ino(memcg->css.cgroup),
nr_reclaimed);
- current->reclaim_state = NULL;
+ set_task_reclaim_state(current, NULL);
return nr_reclaimed;
}
@@ -3501,7 +3513,7 @@ static int balance_pgdat(pg_data_t *pgda
.may_unmap = 1,
};
- current->reclaim_state = &sc.reclaim_state;
+ set_task_reclaim_state(current, &sc.reclaim_state);
psi_memstall_enter(&pflags);
__fs_reclaim_acquire();
@@ -3683,7 +3695,7 @@ out:
snapshot_refaults(NULL, pgdat);
__fs_reclaim_release();
psi_memstall_leave(&pflags);
- current->reclaim_state = NULL;
+ set_task_reclaim_state(current, NULL);
/*
* Return the order kswapd stopped reclaiming at as
@@ -3945,17 +3957,16 @@ unsigned long shrink_all_memory(unsigned
.hibernation_mode = 1,
};
struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
- struct task_struct *p = current;
unsigned long nr_reclaimed;
unsigned int noreclaim_flag;
fs_reclaim_acquire(sc.gfp_mask);
noreclaim_flag = memalloc_noreclaim_save();
- p->reclaim_state = &sc.reclaim_state;
+ set_task_reclaim_state(current, &sc.reclaim_state);
nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
- p->reclaim_state = NULL;
+ set_task_reclaim_state(current, NULL);
memalloc_noreclaim_restore(noreclaim_flag);
fs_reclaim_release(sc.gfp_mask);
@@ -4144,7 +4155,7 @@ static int __node_reclaim(struct pglist_
*/
noreclaim_flag = memalloc_noreclaim_save();
p->flags |= PF_SWAPWRITE;
- p->reclaim_state = &sc.reclaim_state;
+ set_task_reclaim_state(p, &sc.reclaim_state);
if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
/*
@@ -4156,7 +4167,7 @@ static int __node_reclaim(struct pglist_
} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
}
- p->reclaim_state = NULL;
+ set_task_reclaim_state(p, NULL);
current->flags &= ~PF_SWAPWRITE;
memalloc_noreclaim_restore(noreclaim_flag);
fs_reclaim_release(sc.gfp_mask);
_
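
To make the two checks in set_task_reclaim_state() concrete, here is a
minimal userspace sketch of the same logic. Everything in it is a
hypothetical stand-in rather than kernel code: struct task, the warnings
counter, and the model_* functions are made up for illustration. Note one
difference: the counter bumps on every hit, whereas the real
WARN_ON_ONCE() fires at most once per call site.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumptions). */
struct reclaim_state { unsigned long reclaimed_slab; };
struct task { struct reclaim_state *reclaim_state; };

/* Counts would-be warnings; unlike WARN_ON_ONCE() it counts every hit. */
static int warnings;

/* Userspace model of the helper added by the patch. */
static void model_set_task_reclaim_state(struct task *t,
					 struct reclaim_state *rs)
{
	/* Overwrite: installing a state while one is already installed. */
	if (rs && t->reclaim_state)
		warnings++;

	/* Nulling a member that is already NULL. */
	if (!rs && !t->reclaim_state)
		warnings++;

	t->reclaim_state = rs;
}

/* A well-behaved reclaim path: set once, clear once. */
static int model_balanced_reclaim(void)
{
	struct task t = { NULL };
	struct reclaim_state rs = { 0 };

	warnings = 0;
	model_set_task_reclaim_state(&t, &rs);
	model_set_task_reclaim_state(&t, NULL);
	return warnings;
}

/* A callee stomping on its caller's value, then both clearing it. */
static int model_nested_reclaim(void)
{
	struct task t = { NULL };
	struct reclaim_state outer = { 0 }, inner = { 0 };

	warnings = 0;
	model_set_task_reclaim_state(&t, &outer); /* caller sets: fine */
	model_set_task_reclaim_state(&t, &inner); /* callee overwrites */
	model_set_task_reclaim_state(&t, NULL);   /* callee clears: fine */
	model_set_task_reclaim_state(&t, NULL);   /* caller clears NULL */
	return warnings;
}
```

In this model the balanced path trips neither check, while the nested
path trips both: once on the overwrite and once when the outer caller
nulls a pointer the callee already cleared.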