From: Greg Thelen <gthelen@google.com>
To: Glauber Costa <glommer@parallels.com>
Cc: Glauber Costa <glommer@openvz.org>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Dave Chinner <david@fromorbit.com>,
Anton Vorontsov <anton.vorontsov@linaro.org>,
John Stultz <john.stultz@linaro.org>,
Joonsoo Kim <js1304@gmail.com>, Michal Hocko <mhocko@suse.cz>,
Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 1/2] vmpressure: in-kernel notifications
Date: Wed, 24 Apr 2013 12:35:21 -0700 [thread overview]
Message-ID: <xr93k3nrhgcm.fsf@gthelen.mtv.corp.google.com> (raw)
In-Reply-To: <51779970.4010101@parallels.com> (Glauber Costa's message of "Wed, 24 Apr 2013 12:36:00 +0400")
On Wed, Apr 24 2013, Glauber Costa wrote:
> On 04/24/2013 11:21 AM, Greg Thelen wrote:
>> On Tue, Apr 23 2013, Glauber Costa wrote:
>>
>>> From: Glauber Costa <glommer@parallels.com>
>>>
>>> During the past weeks, it became clear to us that the shrinker interface
>>> we have right now works very well for some particular types of users,
>>> but not that well for others. The later are usually people interested in
>>> one-shot notifications, that were forced to adapt themselves to the
>>> count+scan behavior of shrinkers. To do so, they had no choice than to
>>> greatly abuse the shrinker interface producing little monsters all over.
>>>
>>> During LSF/MM, one of the proposals that popped out during our session
>>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>>> for userspace consumption, but also provide a well-stablished,
>>> cgroup-aware entry point for notifications.
>>>
>>> This patch extends that to also support in-kernel users. Events that
>>> should be generated for in-kernel consumption will be marked as such,
>>> and for those, we will call a registered function instead of triggering
>>> an eventfd notification.
>>>
>>> Please note that due to my lack of understanding of each shrinker user,
>>> I will stay away from converting the actual users, you are all welcome
>>> to do so.
>>>
>>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>>> Cc: Dave Chinner <david@fromorbit.com>
>>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>>> Cc: John Stultz <john.stultz@linaro.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Michal Hocko <mhocko@suse.cz>
>>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> ---
>>> include/linux/vmpressure.h | 6 ++++++
>>> mm/vmpressure.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
>>> 2 files changed, 50 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>>> index 76be077..1862012 100644
>>> --- a/include/linux/vmpressure.h
>>> +++ b/include/linux/vmpressure.h
>>> @@ -19,6 +19,9 @@ struct vmpressure {
>>> /* Have to grab the lock on events traversal or modifications. */
>>> struct mutex events_lock;
>>>
>>> + /* false if only kernel users want to be notified, true otherwise */
>>> + bool notify_userspace;
>>> +
>>> struct work_struct work;
>>> };
>>>
>>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>>> extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>> struct eventfd_ctx *eventfd,
>>> const char *args);
>>> +
>>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>>> + void (*fn)(void));
>>> extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>>> struct eventfd_ctx *eventfd);
>>> #else
>>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>>> index 736a601..8d77ad0 100644
>>> --- a/mm/vmpressure.c
>>> +++ b/mm/vmpressure.c
>>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>>> }
>>>
>>> struct vmpressure_event {
>>> - struct eventfd_ctx *efd;
>>> + union {
>>> + struct eventfd_ctx *efd;
>>> + void (*fn)(void);
>>> + };
>>> enum vmpressure_levels level;
>>> + bool kernel_event;
>>> struct list_head node;
>>> };
>>>
>>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>>> mutex_lock(&vmpr->events_lock);
>>>
>>> list_for_each_entry(ev, &vmpr->events, node) {
>>> - if (level >= ev->level) {
>>> + if (ev->kernel_event)
>>> + ev->fn();
>>> + else if (vmpr->notify_userspace && (level >= ev->level)) {
>>> eventfd_signal(ev->efd, 1);
>>> signalled = true;
>>> }
>>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>> * we account it too.
>>> */
>>> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>>> - return;
>>> + goto schedule;
>>>
>>> /*
>>> * If we got here with no pages scanned, then that is an indicator
>>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>> * through vmpressure_prio(). But so far, keep calm.
>>> */
>>> if (!scanned)
>>> - return;
>>> + goto schedule;
>>
>> If goto schedule is taken here then scanned==0. Then
>> scanned<vmpressure_win (below), so this function would always simply
>> return. So this change seems like a no-op. Should the schedule: below
>> be just before schedule_work(&vmpr->work)? But this wouldn't do much
>> either because vmpressure_work_fn() would immediately return if
>> vmpr->scanned==0. Presumable the idea is to avoid notifying user space
>> or kernel callbacks if lru pages are not scanned - at least until
>> vmpressure_prio() is called with a priority more desperate than
>> vmpressure_level_critical_prio at which time this function's scanned!=0.
>>
>
> Yes, the idea is to avoid calling the callbacks. I can just return at
> this point if you prefer. I figured that jumping to the common entry
> point would be more consistent, only that. I don't care either way.
Leave it as is. I don't really care either way.
>>> mutex_lock(&vmpr->sr_lock);
>>> vmpr->scanned += scanned;
>>> vmpr->reclaimed += reclaimed;
>>> + vmpr->notify_userspace = true;
>>> scanned = vmpr->scanned;
>>> mutex_unlock(&vmpr->sr_lock);
>>>
>>> +schedule:
>>> if (scanned < vmpressure_win || work_pending(&vmpr->work))
>>> return;
>>> schedule_work(&vmpr->work);
>>> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>> }
>>>
>>> /**
>>> + * vmpressure_register_kernel_event() - Register kernel-side notification
>>> + * @cg: cgroup that is interested in vmpressure notifications
>>> + * @fn: function to be called when pressure happens
>>> + *
>>> + * This function register in-kernel users interested in receiving notifications
>>> + * about pressure conditions. Pressure notifications will be triggered at the
>>> + * same time as userspace notifications (with no particular ordering relative
>>> + * to it).
>>> + *
>>> + * Pressure notifications are a alternative method to shrinkers and will serve
>>> + * well users that are interested in a one-shot notification, with a
>>> + * well-defined cgroup aware interface.
>>> + */
>>> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
>>
>> It seems useful to include the "struct cgroup *" as a parameter to fn.
>> This would allow for fn to shrink objects it's caching in the cgroup.
>>
>> Also, why not allow level specification for kernel events?
>>
> Because I don't want to overdesign. This is a in-kernel API, so we can
> change it if we want to. There is only one user, and that is called from
> the root cgroup, without level distinction.
>
> The cgroup argument makes sense, but I would rather leave it as is for
> now. As for levels, it might make sense as well, but I would much rather
> leave the implementation to someone actually using them - specially
> since this is not a simple parameter passing.
If there's going to be a single global listener for now, then I agree.
Leave your change as-is.
>> It might be neat if vmpressure_register_event() used
>> vmpressure_register_kernel_event() with a callback function calls
>> eventfd_signal(). This would allow for a uniform event notification
>> type which is agnostic of user vs kernel. However, as proposed there
>> are different signaling conditions. So I'm not sure it's worth the time
>> to combine the even types. So feel free to ignore this paragraph.
>>
>
> I don't think it is worth it.
Fine with me.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-04-24 19:35 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-23 8:22 [PATCH 0/2] reuse vmpressure for in-kernel events Glauber Costa
2013-04-23 8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
2013-04-23 17:11 ` Anton Vorontsov
2013-04-23 18:17 ` Glauber Costa
2013-04-23 19:13 ` Pekka Enberg
2013-04-23 20:24 ` Anton Vorontsov
2013-04-23 21:01 ` Anton Vorontsov
2013-04-24 6:26 ` Glauber Costa
2013-04-24 11:20 ` Glauber Costa
2013-04-24 7:21 ` Greg Thelen
2013-04-24 8:36 ` Glauber Costa
2013-04-24 19:35 ` Greg Thelen [this message]
2013-04-24 19:42 ` Greg Thelen
2013-04-24 20:04 ` Glauber Costa
2013-04-25 10:50 ` Glauber Costa
2013-04-25 18:34 ` Greg Thelen
2013-04-23 8:22 ` [PATCH 2/2] memcg: reap dead memcgs under pressure Glauber Costa
2013-04-25 12:50 ` Li Zefan
2013-04-26 7:38 ` Glauber Costa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xr93k3nrhgcm.fsf@gthelen.mtv.corp.google.com \
--to=gthelen@google.com \
--cc=akpm@linux-foundation.org \
--cc=anton.vorontsov@linaro.org \
--cc=cgroups@vger.kernel.org \
--cc=david@fromorbit.com \
--cc=glommer@openvz.org \
--cc=glommer@parallels.com \
--cc=hannes@cmpxchg.org \
--cc=john.stultz@linaro.org \
--cc=js1304@gmail.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox