From: Evgeniy Ivanov <lolkaantimat@gmail.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org
Subject: Re: Question about cgroup hierarchy and reducing memory limit
Date: Thu, 25 Nov 2010 13:51:06 +0300
Message-ID: <AANLkTinQ_sqpEc=-vcCQvpp98ny5HSDVvqD_R6_YE3-C@mail.gmail.com>
In-Reply-To: <20101125100428.24920cd3.kamezawa.hiroyu@jp.fujitsu.com>

Thank you very much for the very detailed explanation.
On Thu, Nov 25, 2010 at 4:04 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Thanks.
>
> On Wed, 24 Nov 2010 15:17:38 +0300
> Evgeniy Ivanov <lolkaantimat@gmail.com> wrote:
>> > What kind of error? Do you have swap? What is the kernel version?
>>
>> Kernel is 2.6.31-5 from SLES-SP1 (my build, but without extra patches).
>> I have 2 GB of swap and only 40 MB of it is used. The machine has 3 GB
>> RAM and no load (neither memory nor CPU).
>>
> Hmm, maybe I should look at 2.6.32.
Oh, yes, you're right, I took the wrong version, but my 2.6.31-5 is
still from Novell. I can use 2.6.32 as well (I see it in the packages).
>> The error is "-bash: echo: write error: Device or resource busy" when I
>> write to memory.limit_in_bytes.
>>
> Ok.
>
>> > It's designed to allow "shrink at once", but that means releasing memory
>> > and doing forced writeback. To release memory, it may have to write pages
>> > back to swap. If tasks in "A" and "B" are too busy and touch tons of memory
>> > while shrinking, it may fail.
>>
>> Well, in the test I have a process which uses 30M of memory and, in a
>> loop, dirties every page (just a single byte per page), then sleeps 5
>> seconds before the next iteration.
>>
>> > It may be a regression. Kernel version is important.
>> >
>> > Could you show the memory.stat file when you shrink "A" and "B"?
>> > And what happens with:
>> > # sync
>> > # sync
>> > # sync
>> > # reduce memory A
>> > # reduce memory B
>>
>> Sync doesn't help. Here is the memory.stat log, just for the group I tried to shrink:
>>
>> ivanoe:/cgroups/root# cat C/memory.stat
>> cache 0
>> rss 90222592
>
> Hmm, the memcg is filled with 86MB of anon pages... So all "pageout" in this
> group will go to swap.
>
>> mapped_file 0
>> pgpgin 1212770
>> pgpgout 1190743
>> inactive_anon 45338624
>> active_anon 44883968
>
> (Off topic) IIUC, this active/inactive ratio handling has been modified in
> recent kernels, so swapout may behave differently there.
>
>> inactive_file 0
>> active_file 0
>> unevictable 0
>> hierarchical_memory_limit 94371840
>> hierarchical_memsw_limit 9223372036854775807
>> total_cache 0
>> total_rss 90222592
>> total_mapped_file 0
>> total_pgpgin 1212770
>> total_pgpgout 1190743
>> total_inactive_anon 45338624
>> total_active_anon 44883968
>> total_inactive_file 0
>> total_active_file 0
>> total_unevictable 0
>> ivanoe:/cgroups/root# echo 30M > C/memory.limit_in_bytes
>> -bash: echo: write error: Device or resource busy
>> ivanoe:/cgroups/root# echo 30M > C/memory.limit_in_bytes
>> -bash: echo: write error: Device or resource busy
>> ivanoe:/cgroups/root# echo 30M > C/memory.limit_in_bytes
>> -bash: echo: write error: Device or resource busy
>> ivanoe:/cgroups/root# echo 30M > C/memory.limit_in_bytes
>
>
> So this means reducing the limit from 90M to 30M, and a failure
> to swap out about 56MB (86MB of usage down to 30MB).
>
>> ivanoe:/cgroups/root# cat memory.limit_in_bytes
>> 125829120
>> ivanoe:/cgroups/root# cat B/memory.limit_in_bytes
>> 62914560
>> ivanoe:/cgroups/root# cat A/memory.limit_in_bytes
>> 20971520
>>
>
> Ah... I have to explain this.
>
> (root) limited to 120MB
> (A) limited to 20MB and this is a child of (root)
> (B) limited to 60MB and this is a child of (root)
> (C) limited to 90MB (now) and this is a child of (root)
>
> And now, you want to set limit of (C) to 30MB.
>
> First, memory cgroup has 2 modes. Do you know the memory.use_hierarchy file?
>
> If memory.use_hierarchy == 0, all cgroups under the cgroup are flat.
> In the above, if root/memory.use_hierarchy == 0, A, B, C and (root) are
> all independent of each other.
>
> If memory.use_hierarchy == 1, all cgroups under the cgroup are in a tree.
> In the above, if root/memory.use_hierarchy == 1, A, B and C work as children
> of (root) and the usage of A+B+C is limited by (root).
>
> If you use root/memory.use_hierarchy==0, changing the limit of C doesn't affect
> (root), (root/A) or (root/B). All work will be done in C and you can set
> an arbitrary limit.
>
> Even if you use root/memory.use_hierarchy==1, changing the limit of C will not
> affect (root), (root/A) or (root/B). All pageout will be done in C,
> but you can't set a limit larger than (root)'s.
Thank you for the explanation. I use memory.use_hierarchy=1. I don't want
all pageout done in C; that's why I originally tried to change the
limits of A and B before adding C (same problem as with changing the
limit of C).
>
> (Off topic) If you use root/memory.use_hierarchy==1, changing the limit of (root)
> will affect (A), (B) and (C). Memory is then reclaimed from (A), (B)
> and (C), because (root) is their parent.
>
>
>
> So, in this case, only "C" is the problem.
Kind of; A and B are not good either. I guess it's related to decreasing
the memory limit of any group.
> And at swapout, slow swap may be a problem.
>
> The logic of pageout (swapout) when shrinking is:
>
> 0. retry_count=5
> 1. usage = current_usage
> 2. limit = new limit.
> 3. if (usage < limit) => goto end(success)
> 4. try to reclaim memory.
> 5. new_usage = current_usage
> 6. if (new_usage >= usage) retry_count--
> 7. if (retry_count < 0) goto end(-EBUSY)
> 8. goto 1
>
> So whether it succeeds depends on the workload (swapin) and the speed of swapout.
> It seems pagein in "C" is faster than swapout in the shrinking iteration.
>
> So the reason you succeed in reducing the limit by 1MB is probably that pagein
> is blocked by hitting the memory limit, so shrinking usage can succeed.
I see, that makes sense.
> To make the success rate higher, it seems either
> 1) memory cgroup should retry harder.
>    The difficulty with this is that we have no guarantee of success.
> or
> 2) memory cgroup should block pagein.
>    The difficulty with this is that tasks may stop for too long (if the new limit is bad).
> I may not be able to give you good advice about SLES.
> I'll think about it and write a patch. Thank you for reporting.
> I hope my patch can be backported.
That would be great, thanks!
For now we have decided either to decrease limits from a script with a
timeout, or to control the limit only through the root group.
--
Evgeniy Ivanov