Re: [PATCH v2 3/5] memcg: make it suck faster

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Glauber Costa <glommer@parallels.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	kamezawa.hiroyu@jp.fujitsu.com, handai.szj@gmail.com,
	anton.vorontsov@linaro.org, Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH v2 3/5] memcg: make it suck faster
Date: Wed, 20 Mar 2013 11:00:03 +0400	[thread overview]
Message-ID: <51495E73.8090409@parallels.com> (raw)
In-Reply-To: <20130319135821.GG7869@dhcp22.suse.cz>

Sorry all for taking a lot of time to reply to this. I've been really busy.

On 03/19/2013 05:58 PM, Michal Hocko wrote:
> On Tue 05-03-13 17:10:56, Glauber Costa wrote:
>> It is an accepted fact that memcg sucks. But can it suck faster?  Or in
>> a more fair statement, can it at least stop draining everyone's
>> performance when it is not in use?
>>
>> This experimental and slightly crude patch demonstrates that we can do
>> that by using static branches to patch it out until the first memcg
>> comes to life. There are edges to be trimmed, and I appreciate comments
>> for direction. In particular, the events in the root are not fired, but
>> I believe this can be done without further problems by calling a
>> specialized event check from mem_cgroup_newpage_charge().
>>
>> My goal was to have enough numbers to demonstrate the performance gain
>> that can come from it. I tested it in a 24-way 2-socket Intel box, 24 Gb
>> mem. I used Mel Gorman's pft test, that he used to demonstrate this
>> problem back in the Kernel Summit. There are three kernels:
>>
>> nomemcg  : memcg compile disabled.
>> base     : memcg enabled, patch not applied.
>> bypassed : memcg enabled, with patch applied.
>>
>>                 base    bypassed
>> User          109.12      105.64
>> System       1646.84     1597.98
>> Elapsed       229.56      215.76
>>
>>              nomemcg    bypassed
>> User          104.35      105.64
>> System       1578.19     1597.98
>> Elapsed       212.33      215.76
> 
> Do you have profiles for where we spend the time?
> 

I don't *have* in the sense that I never saved them, but it is easy to
grab. I've just run Mel's pft test with perf top -a in parallel, and
that was mostly the charge and uncharge functions being run.

>>  #ifdef CONFIG_MEMCG
>> +extern struct static_key memcg_in_use_key;
>> +
>> +static inline bool mem_cgroup_subsys_disabled(void)
>> +{
>> +	return !!mem_cgroup_subsys.disabled;
>> +}
>> +
>> +static inline bool mem_cgroup_disabled(void)
>> +{
>> +	/*
>> +	 * Will always be false if subsys is disabled, because we have no one
>> +	 * to bump it up. So the test suffices and we don't have to test the
>> +	 * subsystem as well
>> +	 */
> 
> but static_key_false adds an atomic read here which is more costly so I
> am not sure you are optimizing much.
> 

No it doesn't. You're missing the point of static branches: The code is
*patched out* until it is not used. So it adds a predictable
deterministic jump instruction to the false statement, and that's it
(hence their previous name 'jump label').

>> +
>> +extern int __mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>> +				     gfp_t gfp_mask);
>> +static inline int
>> +mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
>> +{
>> +	if (mem_cgroup_disabled())
>> +		return 0;
>> +
>> +	return __mem_cgroup_cache_charge(page, mm, gfp_mask);
>> +}
> 
> Are there any reasons to not get down to __mem_cgroup_try_charge? We
> will not be perfect, all right, because some wrappers already do some
> work but we should at least cover most of them.
> 
> I am also thinking whether this stab at charging path is not just an
> overkill. Wouldn't it suffice to do something like:

I don't know. I could test. I just see no reason for that. Being able to
patch out code in the caller level means we'll not incur even a function
call. That's a generally accepted good thing to do in hot paths.
Specially given the fact that the memcg overhead seems not to be
concentrated in one single place, but as Christoph Lameter defined,
"death by a thousand cuts", I'd much rather not even pay the function
calls if I can avoid. If I introducing great complexity for that, fine,
I could trade off. But honestly, the patch gets bigger but that's it.

>>  struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
>>  struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index bfbf1c2..45c1886 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
> [...]
>> @@ -1335,6 +1345,20 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct zone *zone)
>>  	memcg = pc->mem_cgroup;
> 
> I would expect that you want to prevent lookup as well if there are no
> other groups.
>
well, that function in specific seems to be mostly called during
reclaim, where I wasn't terribly concerned about optimizations, unlike
the steady state functions.

>>  	/*
>> +	 * Because we lazily enable memcg only after first child group is
>> +	 * created, we can have memcg == 0. Because page cgroup is created with
>> +	 * GFP_ZERO, and after charging, all page cgroups will have a non-zero
>> +	 * cgroup attached (even if root), we can be sure that this is a
>> +	 * used-but-not-accounted page. (due to lazyness). We could get around
>> +	 * that by scanning all pages on cgroup init is too expensive. We can
>> +	 * ultimately pay, but prefer to just to defer the update until we get
>> +	 * here. We could take the opportunity to set PageCgroupUsed, but it
>> +	 * won't be that important for the root cgroup.
>> +	 */
>> +	if (!memcg && PageLRU(page))
>> +		pc->mem_cgroup = memcg = root_mem_cgroup;
> 
> Why not return page_cgroup_zoneinfo(root_mem_cgroup, page);
> This would require messing up with __mem_cgroup_uncharge_common but that
> doesn't sound incredibly crazy (to the local standard of course ;)).
> 

Could you clarify?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-03-20  6:59 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-05 13:10 [PATCH v2 0/5] bypass root memcg charges if no memcgs are possible Glauber Costa
2013-03-05 13:10 ` [PATCH v2 1/5] memcg: make nocpu_base available for non hotplug Glauber Costa
2013-03-06  0:04   ` Kamezawa Hiroyuki
2013-03-19 11:07   ` Michal Hocko
2013-03-05 13:10 ` [PATCH v2 2/5] memcg: provide root figures from system totals Glauber Costa
2013-03-06  0:27   ` Kamezawa Hiroyuki
2013-03-06  8:30     ` Glauber Costa
2013-03-06 10:45       ` Kamezawa Hiroyuki
2013-03-06 10:52         ` Glauber Costa
2013-03-06 10:59           ` Kamezawa Hiroyuki
2013-03-13  6:58             ` Sha Zhengju
2013-03-13  9:15               ` Kamezawa Hiroyuki
2013-03-13  9:59                 ` Sha Zhengju
2013-03-14  0:03                   ` Kamezawa Hiroyuki
2013-03-06 10:50       ` Kamezawa Hiroyuki
2013-03-19 12:46   ` Michal Hocko
2013-03-19 12:55     ` Michal Hocko
2013-03-20  7:03       ` Glauber Costa
2013-03-20  8:03         ` Michal Hocko
2013-03-20  8:08           ` Glauber Costa
2013-03-20  8:18             ` Michal Hocko
2013-03-20  8:34               ` Glauber Costa
2013-03-20  8:58                 ` Michal Hocko
2013-03-20  9:30                   ` Glauber Costa
2013-03-21  6:08                     ` Kamezawa Hiroyuki
2013-03-20 16:40             ` Anton Vorontsov
2013-03-20  7:04     ` Glauber Costa
2013-03-05 13:10 ` [PATCH v2 3/5] memcg: make it suck faster Glauber Costa
2013-03-06  0:46   ` Kamezawa Hiroyuki
2013-03-06  8:38     ` Glauber Costa
2013-03-06 10:54       ` Kamezawa Hiroyuki
2013-03-13  8:08   ` Sha Zhengju
2013-03-20  7:13     ` Glauber Costa
2013-03-19 13:58   ` Michal Hocko
2013-03-20  7:00     ` Glauber Costa [this message]
2013-03-20  8:13       ` Michal Hocko
2013-03-05 13:10 ` [PATCH v2 4/5] memcg: do not call page_cgroup_init at system_boot Glauber Costa
2013-03-06  1:07   ` Kamezawa Hiroyuki
2013-03-06  8:22     ` Glauber Costa
2013-03-19 14:06   ` Michal Hocko
2013-03-05 13:10 ` [PATCH v2 5/5] memcg: do not walk all the way to the root for memcg Glauber Costa
2013-03-06  1:08   ` Kamezawa Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51495E73.8090409@parallels.com \
    --to=glommer@parallels.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton.vorontsov@linaro.org \
    --cc=cgroups@vger.kernel.org \
    --cc=handai.szj@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox