From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Johannes Weiner <hannes@cmpxchg.org>, linux-mm@kvack.org
Cc: Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
Dave Hansen <dave@sr71.net>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] mm: memcontrol: lockless page counters
Date: Tue, 23 Sep 2014 16:46:50 +0900 [thread overview]
Message-ID: <5421256A.8080708@jp.fujitsu.com> (raw)
In-Reply-To: <1411132928-16143-1-git-send-email-hannes@cmpxchg.org>
(2014/09/19 22:22), Johannes Weiner wrote:
> Memory is internally accounted in bytes, using spinlock-protected
> 64-bit counters, even though the smallest accounting delta is a page.
> The counter interface is also convoluted and does too many things.
>
> Introduce a new lockless word-sized page counter API, then change all
> memory accounting over to it and remove the old one. The translation
> from and to bytes then only happens when interfacing with userspace.
>
> Aside from the locking costs, this gets rid of the icky unsigned long
> long types in the very heart of memcg, which is great for 32 bit and
> also makes the code a lot more readable.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
I like this patch because I hate res_counter very much.
a few nitpick comments..
<snip>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 19df5d857411..bf8fb1a05597 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -54,6 +54,38 @@ struct mem_cgroup_reclaim_cookie {
> };
>
> #ifdef CONFIG_MEMCG
> +
> +struct page_counter {
> + atomic_long_t count;
> + unsigned long limit;
> + struct page_counter *parent;
> +
> + /* legacy */
> + unsigned long watermark;
> + unsigned long limited;
> +};
I guees all attributes should be on the same cache line. How about align this to cache ?
And legacy values are not very important to be atomic by design, right ?
> +
> +#if BITS_PER_LONG == 32
> +#define PAGE_COUNTER_MAX ULONG_MAX
> +#else
> +#define PAGE_COUNTER_MAX (ULONG_MAX / PAGE_SIZE)
> +#endif
> +
<snip>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e2def11f1ec1..dfd3b15a57e8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -25,7 +25,6 @@
> * GNU General Public License for more details.
> */
>
> -#include <linux/res_counter.h>
> #include <linux/memcontrol.h>
> #include <linux/cgroup.h>
> #include <linux/mm.h>
> @@ -66,6 +65,117 @@
>
> #include <trace/events/vmscan.h>
>
> +int page_counter_cancel(struct page_counter *counter, unsigned long nr_pages)
> +{
> + long new;
> +
> + new = atomic_long_sub_return(nr_pages, &counter->count);
> +
> + if (WARN_ON(unlikely(new < 0)))
> + atomic_long_set(&counter->count, 0);
WARN_ON_ONCE() ?
Or I prefer atomic_add(&counter->count, nr_pages) rather than set to 0
because if a buggy call's "nr_pages" is enough big, following calls to
page_counter_cacnel() will show more logs.
> +
> + return new > 1;
> +}
> +
> +int page_counter_charge(struct page_counter *counter, unsigned long nr_pages,
> + struct page_counter **fail)
> +{
> + struct page_counter *c;
> +
> + for (c = counter; c; c = c->parent) {
> + for (;;) {
> + unsigned long count;
> + unsigned long new;
> +
> + count = atomic_long_read(&c->count);
> +
> + new = count + nr_pages;
> + if (new > c->limit) {
> + c->limited++;
> + if (fail) {
> + *fail = c;
> + goto failed;
> + }
seeing res_counter(), c ret code for this case should be -ENOMEM.
> + }
> +
> + if (atomic_long_cmpxchg(&c->count, count, new) != count)
> + continue;
> +
> + if (new > c->watermark)
> + c->watermark = new;
> +
> + break;
> + }
> + }
> + return 0;
> +
> +failed:
> + for (c = counter; c != *fail; c = c->parent)
> + page_counter_cancel(c, nr_pages);
> +
> + return -ENOMEM;
> +}
> +
> +int page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages)
> +{
> + struct page_counter *c;
> + int ret = 1;
> +
> + for (c = counter; c; c = c->parent) {
> + int remainder;
> +
> + remainder = page_counter_cancel(c, nr_pages);
> + if (c == counter && !remainder)
> + ret = 0;
> + }
> +
> + return ret;
> +}
> +
> +int page_counter_limit(struct page_counter *counter, unsigned long limit)
> +{
> + for (;;) {
> + unsigned long count;
> + unsigned long old;
> +
> + count = atomic_long_read(&counter->count);
> +
> + old = xchg(&counter->limit, limit);
> +
> + if (atomic_long_read(&counter->count) != count) {
> + counter->limit = old;
> + continue;
> + }
> +
> + if (count > limit) {
> + counter->limit = old;
> + return -EBUSY;
> + }
> +
> + return 0;
> + }
> +}
I think the whole "updating limit" ops should be mutual exclusive. It seems
there will be trouble if multiple updater comes at once.
So, "xchg" isn't required. calllers should have their own locks.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2014-09-23 7:48 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-19 13:22 Johannes Weiner
2014-09-19 13:29 ` Johannes Weiner
2014-09-22 14:41 ` Vladimir Davydov
2014-09-22 18:57 ` Johannes Weiner
2014-09-23 11:06 ` Vladimir Davydov
2014-09-23 13:28 ` Johannes Weiner
2014-09-23 15:21 ` Vladimir Davydov
2014-09-23 17:05 ` Johannes Weiner
2014-09-24 8:02 ` Vladimir Davydov
2014-09-24 13:33 ` Michal Hocko
2014-09-24 16:51 ` Johannes Weiner
2014-09-24 14:16 ` Michal Hocko
2014-09-24 17:00 ` Johannes Weiner
2014-09-25 13:07 ` Michal Hocko
2014-09-22 14:44 ` Michal Hocko
2014-09-22 15:50 ` Johannes Weiner
2014-09-22 17:28 ` Michal Hocko
2014-09-22 19:58 ` Johannes Weiner
2014-09-23 13:25 ` Michal Hocko
2014-09-23 14:05 ` Johannes Weiner
2014-09-23 14:28 ` Michal Hocko
2014-09-23 22:33 ` David Rientjes
2014-09-23 7:46 ` Kamezawa Hiroyuki [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5421256A.8080708@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=cgroups@vger.kernel.org \
--cc=dave@sr71.net \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox