From: Glauber Costa <glommer@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@openvz.org>,
linux-fsdevel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
Dave Chinner <david@fromorbit.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
hughd@google.com, Greg Thelen <gthelen@google.com>,
Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH v10 29/35] memcg: per-memcg kmem shrinking
Date: Fri, 7 Jun 2013 10:10:16 +0400
Message-ID: <51B17948.1000204@parallels.com>
In-Reply-To: <20130606152315.69603127cca33e54b1ed428e@linux-foundation.org>

On 06/07/2013 02:23 AM, Andrew Morton wrote:
> On Thu, 6 Jun 2013 16:09:16 +0400 Glauber Costa <glommer@parallels.com> wrote:
>
>>>>> then waiting for it to complete is equivalent to calling it directly.
>>>>>
>>>> Not in this case. We are in wait-capable context (we check for this
>>>> right before we reach this), but we are not in fs capable context.
>>>>
>>>> So the reason we do this, which I tried to cover in the changelog, is
>>>> to escape from the GFP_FS limitation that our call chain has, not the
>>>> wait limitation.
>>>
>>> But that's equivalent to calling the code directly. Look:
>>>
>>> some_fs_function()
>>> {
>>>         lock(some-fs-lock);
>>>         ...
>>> }
>>>
>>> some_other_fs_function()
>>> {
>>>         lock(some-fs-lock);
>>>         alloc_pages(GFP_NOFS);
>>>           ->...
>>>             ->schedule_work(some_fs_function);
>>>               flush_scheduled_work();
>>> }
>>>
>>> that flush_scheduled_work() won't complete until some_fs_function() has
>>> completed. But some_fs_function() won't complete, because we're
>>> holding some-fs-lock.
>>>
>>
>> In my experience during this series, most of the kmem allocation here
>
> "most"?
>
Yes: dentries, inodes, buffer_heads. They constitute the bulk of kmem
allocations. (Please note that I am talking about kmem allocations only.)
>> will be filesystem related. This means that we will allocate that with
>> GFP_FS on.
>
> eh? filesystems do a tremendous amount of GFP_NOFS allocation.
>
> akpm3:/usr/src/25> grep GFP_NOFS fs/*/*.c|wc -l
> 898
>
My bad: I thought one thing and wrote another. I meant GFP_FS off.
>> If we don't do anything like that, reclaim is almost
>> pointless since it will never free anything (only once here and there
>> when the allocation is not from fs).
>
> It depends what you mean by "reclaim". There are a lot of things which
> vmscan can do for a GFP_NOFS allocation. Scraping clean pagecache,
> clean swapcache, well-behaved (ahem) shrinkable caches.
I mean exclusively shrinkable caches. This code is executed only when we
reach the kernel memory limit. Therefore, we know that depleting user
pages won't help. And now that we have targeted shrinking, we shrink
just the caches.
>
>> It tends to work just fine like this. That may very well be because fs
>> people just mark everything as NOFS out of safety and we aren't *really*
>> holding any locks in common situations, but it will blow up in our faces
>> in a subtle way (which none of us want).
>>
>> That said, suggestions are more than welcome.
>
> At a minimum we should remove all the schedule_work() stuff, call the
> callback function synchronously and add
>
> /* This code is full of deadlocks */
>
>
> Sorry, this part of the patchset is busted and needs a fundamental
> rethink.
>
Okay, I will go back to it soon.
I suspect we may have no choice but to just let the shrinkers run
asynchronously. That will fail this allocation, but at least it will free
memory in time for the next one.
Dave Shrinkers, would you be so kind as to look at this problem from the
top of your mighty filesystem knowledge and see if you have a better
suggestion?