From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, Vladimir Davydov <vdavydov@parallels.com>,
Greg Thelen <gthelen@google.com>, Dave Hansen <dave@sr71.net>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] mm: memcontrol: lockless page counters
Date: Tue, 7 Oct 2014 17:15:43 +0200 [thread overview]
Message-ID: <20141007151543.GE14243@dhcp22.suse.cz> (raw)
In-Reply-To: <1411573390-9601-2-git-send-email-hannes@cmpxchg.org>
On Wed 24-09-14 11:43:08, Johannes Weiner wrote:
> Memory is internally accounted in bytes, using spinlock-protected
> 64-bit counters, even though the smallest accounting delta is a page.
> The counter interface is also convoluted and does too many things.
>
> Introduce a new lockless word-sized page counter API, then change all
> memory accounting over to it and remove the old one. The translation
> from and to bytes then only happens when interfacing with userspace.
>
> Aside from the locking costs, this gets rid of the icky unsigned long
> long types in the very heart of memcg, which is great for 32 bit and
> also makes the code a lot more readable.
>
I only now got to the res_counter -> page_counter change. It looks
correct to me. Some really minor comments below.
[...]
> static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
> unsigned long amt)
> {
> - res_counter_uncharge(&prot->memory_allocated, amt << PAGE_SHIFT);
> -}
> -
> -static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
> -{
> - u64 ret;
> - ret = res_counter_read_u64(&prot->memory_allocated, RES_USAGE);
> - return ret >> PAGE_SHIFT;
> + page_counter_uncharge(&prot->memory_allocated, amt);
> }
There is only one caller of memcg_memory_allocated_sub and the caller
can use the counter directly same as memcg_memory_allocated_read
original users.
[...]
> diff --git a/init/Kconfig b/init/Kconfig
> index ed4f42d79bd1..88b56940cb9e 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -983,9 +983,12 @@ config RESOURCE_COUNTERS
> This option enables controller independent resource accounting
> infrastructure that works with cgroups.
>
> +config PAGE_COUNTER
> + bool
> +
> config MEMCG
> bool "Memory Resource Controller for Control Groups"
> - depends on RESOURCE_COUNTERS
> + select PAGE_COUNTER
> select EVENTFD
> help
> Provides a memory resource controller that manages both anonymous
Thanks for this!
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c2c75262a209..52c24119be69 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -1490,12 +1495,23 @@ int mem_cgroup_inactive_anon_is_low(struct lruvec *lruvec)
> */
> static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> {
> - unsigned long long margin;
> + unsigned long margin = 0;
> + unsigned long count;
> + unsigned long limit;
>
> - margin = res_counter_margin(&memcg->res);
> - if (do_swap_account)
> - margin = min(margin, res_counter_margin(&memcg->memsw));
> - return margin >> PAGE_SHIFT;
> + count = page_counter_read(&memcg->memory);
> + limit = ACCESS_ONCE(memcg->memory.limit);
> + if (count < limit)
> + margin = limit - count;
> +
> + if (do_swap_account) {
> + count = page_counter_read(&memcg->memsw);
> + limit = ACCESS_ONCE(memcg->memsw.limit);
> + if (count < limit)
I guess you wanted (count <= limit) here?
> + margin = min(margin, limit - count);
> + }
> +
> + return margin;
> }
>
> int mem_cgroup_swappiness(struct mem_cgroup *memcg)
[...]
> @@ -2293,33 +2295,31 @@ static DEFINE_MUTEX(percpu_charge_mutex);
> static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> {
> struct memcg_stock_pcp *stock;
> - bool ret = true;
> + bool ret = false;
>
> if (nr_pages > CHARGE_BATCH)
> - return false;
> + return ret;
>
> stock = &get_cpu_var(memcg_stock);
> - if (memcg == stock->cached && stock->nr_pages >= nr_pages)
> + if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
> stock->nr_pages -= nr_pages;
> - else /* need to call res_counter_charge */
> - ret = false;
> + ret = true;
> + }
> put_cpu_var(memcg_stock);
> return ret;
This change is not really needed but at least it woke me up after some
monotonic and mechanical changes...
[...]
> @@ -4389,14 +4346,16 @@ static int memcg_stat_show(struct seq_file *m, void *v)
> mem_cgroup_nr_lru_pages(memcg, BIT(i)) * PAGE_SIZE);
>
> /* Hierarchical information */
> - {
> - unsigned long long limit, memsw_limit;
> - memcg_get_hierarchical_limit(memcg, &limit, &memsw_limit);
> - seq_printf(m, "hierarchical_memory_limit %llu\n", limit);
> - if (do_swap_account)
> - seq_printf(m, "hierarchical_memsw_limit %llu\n",
> - memsw_limit);
> + memory = memsw = PAGE_COUNTER_MAX;
> + for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) {
> + memory = min(memory, mi->memory.limit);
> + memsw = min(memsw, mi->memsw.limit);
> }
This looks much better!
> + seq_printf(m, "hierarchical_memory_limit %llu\n",
> + (u64)memory * PAGE_SIZE);
> + if (do_swap_account)
> + seq_printf(m, "hierarchical_memsw_limit %llu\n",
> + (u64)memsw * PAGE_SIZE);
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> long long val = 0;
[...]
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-10-07 15:15 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-24 15:43 [patch 0/3] mm: memcontrol: lockless page counters v2 Johannes Weiner
2014-09-24 15:43 ` [patch 1/3] mm: memcontrol: lockless page counters Johannes Weiner
2014-09-26 10:31 ` Vladimir Davydov
2014-10-02 12:07 ` Johannes Weiner
2014-10-03 15:36 ` Vladimir Davydov
2014-10-03 15:41 ` Michal Hocko
2014-10-06 6:38 ` Vladimir Davydov
2014-09-30 11:06 ` Michal Hocko
2014-10-02 15:01 ` Johannes Weiner
2014-10-02 19:52 ` Johannes Weiner
2014-10-03 15:44 ` Michal Hocko
2014-10-03 14:50 ` Michal Hocko
2014-10-07 15:15 ` Michal Hocko [this message]
2014-10-08 12:31 ` Johannes Weiner
2014-09-24 15:43 ` [patch 2/3] mm: hugetlb_controller: convert to " Johannes Weiner
2014-09-26 11:25 ` Vladimir Davydov
2014-10-07 15:21 ` Michal Hocko
2014-10-08 12:39 ` Johannes Weiner
2014-09-24 15:43 ` [patch 3/3] kernel: res_counter: remove the unused API Johannes Weiner
2014-09-26 11:27 ` Vladimir Davydov
2014-10-07 15:26 ` Michal Hocko
2014-10-14 1:46 [patch 0/3] mm: memcontrol: lockless page counters v3 Johannes Weiner
2014-10-14 1:46 ` [patch 1/3] mm: memcontrol: lockless page counters Johannes Weiner
2014-10-14 15:56 ` Michal Hocko
2014-10-14 16:33 ` Johannes Weiner
2014-10-15 9:40 ` Michal Hocko
2014-10-17 7:47 ` Vladimir Davydov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141007151543.GE14243@dhcp22.suse.cz \
--to=mhocko@suse.cz \
--cc=cgroups@vger.kernel.org \
--cc=dave@sr71.net \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox