Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size

linux-mm.kvack.org archive mirror

From: Honglei Wang <honglei.wang@oracle.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, vdavydov.dev@gmail.com, hannes@cmpxchg.org
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size
Date: Tue, 8 Oct 2019 17:34:03 +0800
Message-ID: <991b4719-a2a0-9efe-de02-56a928752fe3@oracle.com>
In-Reply-To: <20191007142805.GM2381@dhcp22.suse.cz>



On 10/7/19 10:28 PM, Michal Hocko wrote:
> On Thu 05-09-19 15:10:34, Honglei Wang wrote:
>> lruvec_lru_size() invokes lruvec_page_state_local() to get the
>> lru_size in the current code. It is based on lruvec_stat_local.count[]
>> of mem_cgroup_per_node. This counter is updated in batches. It is not
>> charged if the number of incoming pages does not exceed
>> MEMCG_CHARGE_BATCH, which is currently defined as 32.
>>
>> The LTP test case madvise09[1] fails because a small block of memory
>> is not charged. It creates a new memcgroup and sets up 32 MADV_FREE
>> pages. Then it forks a child that introduces memory pressure in the
>> memcgroup. The MADV_FREE pages are expected to be released under the
>> pressure, but 32 is not more than MEMCG_CHARGE_BATCH, so these pages
>> won't be charged in lruvec_stat_local.count[] until more pages come
>> in to satisfy the needs of batch charging. So these MADV_FREE pages
>> can't be freed under memory pressure, which conflicts somewhat with
>> the definition of MADV_FREE.
> 
> The test case is simply wrong. The caching and the batch size are
> internal implementation details. Moreover MADV_FREE is a _hint_ so all
> you can say is that those pages will get freed at some point in time but
> you cannot make any assumptions about when that moment happens.
> 

This is a corner case: the test creates extreme memory pressure, which
gives the group no chance to accumulate enough pages to satisfy the
batch operation. The chance of hitting this problem in a real workload
is probably small, since 128K of memory is tiny by current standards.
I understand your point. The batch size is an internal implementation
detail, and this *test case* just happens to hit it as a black box.
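
To make the corner case concrete, here is a minimal user-space sketch
of a batched counter. It is a toy model of the batching as the patch
description explains it, not kernel code; toy_counter, toy_counter_mod()
and toy_counter_read() are hypothetical names, and TOY_CHARGE_BATCH
only mirrors the value 32 quoted for MEMCG_CHARGE_BATCH.

```c
/* User-space toy model of a batched counter; NOT kernel code. */
#include <assert.h>
#include <stdlib.h>

#define TOY_CHARGE_BATCH 32	/* mirrors the value quoted for MEMCG_CHARGE_BATCH */

struct toy_counter {
	long shared;	/* value a reader of the batched counter sees */
	long pending;	/* local delta not yet folded into 'shared' */
};

/* Add 'val' pages; fold the pending delta only once it exceeds the batch. */
static void toy_counter_mod(struct toy_counter *c, long val)
{
	long x = c->pending + val;

	if (labs(x) > TOY_CHARGE_BATCH) {
		c->shared += x;
		x = 0;
	}
	c->pending = x;
}

/* Readers only see what has already been folded. */
static long toy_counter_read(const struct toy_counter *c)
{
	return c->shared;
}
```

In this model, adding 32 pages leaves toy_counter_read() at 0 because
32 is not more than the batch size. Only a 33rd page makes the pending
delta visible to the reader.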

>> Getting lru_size based on lru_zone_size of mem_cgroup_per_node, which
>> is not updated in batches, makes it a bit more accurate in similar
>> scenarios.
> 
> What does that mean? It would be more helpful to describe the code path
> which will use this more precise value and what is the effect of that.
> 

How about we describe it like this:

Getting the lru_size from lru_zone_size of mem_cgroup_per_node, which
is not updated in batches, gives every code path that uses it a more
precise LRU size in the memcg case. As a result, memory reclaim no
longer ignores small amounts of memory (say, fewer than
MEMCG_CHARGE_BATCH pages) on the LRU list.

For this specific MADV_FREE case, the more precise LRU size lets
reclaim release a small number of pages (fewer than 32) as expected.
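
As a rough illustration of the idea, the user-space sketch below keeps
an exact per-zone size for each LRU list and sums it on read, in the
same way the patched lruvec_lru_size() loops over the zones. It is not
kernel code: toy_lruvec and the toy_* helpers are hypothetical, and
TOY_MAX_NR_ZONES and TOY_NR_LRU_LISTS are assumed values picked for the
example only.

```c
/* User-space toy model of exact per-zone LRU sizes; NOT kernel code. */
#include <assert.h>

#define TOY_MAX_NR_ZONES 4	/* assumed; the real MAX_NR_ZONES depends on config */
#define TOY_NR_LRU_LISTS 5	/* assumed number of LRU lists */

struct toy_lruvec {
	unsigned long lru_zone_size[TOY_MAX_NR_ZONES][TOY_NR_LRU_LISTS];
};

/* Every update lands in the per-zone size immediately, with no batching. */
static void toy_update_lru_size(struct toy_lruvec *lv, int lru, int zid,
				long nr_pages)
{
	assert(zid >= 0 && zid < TOY_MAX_NR_ZONES);
	assert(lru >= 0 && lru < TOY_NR_LRU_LISTS);
	lv->lru_zone_size[zid][lru] += nr_pages;
}

/* Sum the exact per-zone sizes for one LRU list. */
static unsigned long toy_lruvec_lru_size(const struct toy_lruvec *lv, int lru)
{
	unsigned long size = 0;
	int zid;

	for (zid = 0; zid < TOY_MAX_NR_ZONES; zid++)
		size += lv->lru_zone_size[zid][lru];
	return size;
}
```

With this model, 32 pages added to one zone are visible to the reader
right away, so a reclaim pass that asks for the list size would not see
zero. The cost is one extra loop over the zones on every read.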

Thanks,
Honglei

> As I've said in the previous version, I do not object to the patch
> because a more precise lruvec_lru_size sounds like a nice thing as long
> as we are not paying a high price for that. Just look at the global case
> for mem_cgroup_disabled(). It uses node_page_state and that one is using
> per-cpu accounting with regular global value refreshing IIRC.
> 
>> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
>>
>> Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
>> ---
>>   mm/vmscan.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c77d1e3761a7..c28672460868 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>>    */
>>   unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>>   {
>> -	unsigned long lru_size;
>> +	unsigned long lru_size = 0;
>>   	int zid;
>>   
>> -	if (!mem_cgroup_disabled())
>> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
>> -	else
>> +	if (!mem_cgroup_disabled()) {
>> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
>> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
>> +	} else
>>   		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>>   
>>   	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
>> -- 
>> 2.17.0
>

