From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>, Li Zefan <lizf@cn.fujitsu.com>,
containers@lists.linux-foundation.org,
Andrew Morton <akpm@linux-foundation.org>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Pavel Emelyanov <xemul@openvz.org>,
Dan Malek <dan@embeddedalley.com>,
Vladislav Buzov <vbuzov@embeddedalley.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH RFC v2 1/4] cgroup: implement eventfd-based generic API for notifications
Date: Tue, 15 Dec 2009 12:30:27 +0200
Message-ID: <cc557aab0912150230g54863bb8rabc8b8c1c58d5a55@mail.gmail.com>
In-Reply-To: <20091215183533.1a1e87d9.kamezawa.hiroyu@jp.fujitsu.com>
On Tue, Dec 15, 2009 at 11:35 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 15 Dec 2009 11:11:16 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
>> Could anybody review the patch?
>>
>> Thank you.
>
> some nitpicks.
>
>>
>> On Sat, Dec 12, 2009 at 12:59 AM, Kirill A. Shutemov
>> <kirill@shutemov.name> wrote:
>
>> > + /*
>> > + * Unregister events and notify userspace.
>> > + * FIXME: How to avoid race with cgroup_event_remove_work()
>> > + * which runs from workqueue?
>> > + */
>> > + mutex_lock(&cgrp->event_list_mutex);
>> > + list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
>> > + cgroup_event_remove(event);
>> > + eventfd_signal(event->eventfd, 1);
>> > + }
>> > + mutex_unlock(&cgrp->event_list_mutex);
>> > +
>> > +out:
>> > return ret;
>> > }
>
> How critical is this FIXME?
There is a potential race, although I have never seen it happen. When
userspace closes the eventfd associated with a cgroup event,
cgroup_event_remove() is not called immediately; it is called later
from a workqueue. If somebody removes the cgroup before the workqueue
calls cgroup_event_remove(), we will have a problem.
It's unlikely, but theoretically possible.
> But Hmm..can't we use RCU ?
I'll play with it.
>> >
>> > @@ -1136,6 +1187,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
>> > INIT_LIST_HEAD(&cgrp->release_list);
>> > INIT_LIST_HEAD(&cgrp->pidlists);
>> > mutex_init(&cgrp->pidlist_mutex);
>> > + INIT_LIST_HEAD(&cgrp->event_list);
>> > + mutex_init(&cgrp->event_list_mutex);
>> > }
>> >
>> > static void init_cgroup_root(struct cgroupfs_root *root)
>> > @@ -1935,6 +1988,16 @@ static const struct inode_operations cgroup_dir_inode_operations = {
>> > .rename = cgroup_rename,
>> > };
>> >
>> > +/*
>> > + * Check if a file is a control file
>> > + */
>> > +static inline struct cftype *__file_cft(struct file *file)
>> > +{
>> > + if (file->f_dentry->d_inode->i_fop != &cgroup_file_operations)
>> > + return ERR_PTR(-EINVAL);
>> > + return __d_cft(file->f_dentry);
>> > +}
>> > +
>> > static int cgroup_create_file(struct dentry *dentry, mode_t mode,
>> > struct super_block *sb)
>> > {
>> > @@ -2789,6 +2852,151 @@ static int cgroup_write_notify_on_release(struct cgroup *cgrp,
>> > return 0;
>> > }
>> >
>> > +static inline void cgroup_event_remove(struct cgroup_event *event)
>> > +{
>> > + struct cgroup *cgrp = event->cgrp;
>> > +
>> > + BUG_ON(event->cft->unregister_event(cgrp, event->cft, event->eventfd));
>
> Hmm? BUG? If this really is a bug, please add documentation or a comment.
I'll remove it, since we check it in cgroup_write_event_control().
>> > + eventfd_ctx_put(event->eventfd);
>> > + remove_wait_queue(event->wqh, &event->wait);
>> > + list_del(&event->list);
>
> please add comment as /* event_list_mutex must be held */
Ok.
>> > + kfree(event);
>> > +}
>> > +
>> > +static void cgroup_event_remove_work(struct work_struct *work)
>> > +{
>> > + struct cgroup_event *event = container_of(work, struct cgroup_event,
>> > + remove);
>> > + struct cgroup *cgrp = event->cgrp;
>> > +
>> > + mutex_lock(&cgrp->event_list_mutex);
>> > + cgroup_event_remove(event);
>> > + mutex_unlock(&cgrp->event_list_mutex);
>> > +}
>> > +
>> > +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
>> > + int sync, void *key)
>> > +{
>> > + struct cgroup_event *event = container_of(wait,
>> > + struct cgroup_event, wait);
>> > + unsigned long flags = (unsigned long)key;
>> > +
>> > + if (flags & POLLHUP)
>> > + /*
>> > + * This function called with spinlock taken, but
>> > + * cgroup_event_remove() may sleep, so we have
>> > + * to run it in a workqueue.
>> > + */
>> > + schedule_work(&event->remove);
>> > +
>> > + return 0;
>> > +}
>
>> > +
>> > +static void cgroup_event_ptable_queue_proc(struct file *file,
>> > + wait_queue_head_t *wqh, poll_table *pt)
>> > +{
>> > + struct cgroup_event *event = container_of(pt,
>> > + struct cgroup_event, pt);
>> > +
>> > + event->wqh = wqh;
>> > + add_wait_queue(wqh, &event->wait);
>> > +}
>> > +
>> > +static int cgroup_write_event_control(struct cgroup *cont, struct cftype *cft,
>> > + const char *buffer)
>> > +{
>> > + struct cgroup_event *event = NULL;
>> > + unsigned int efd, cfd;
>> > + struct file *efile = NULL;
>> > + struct file *cfile = NULL;
>> > + char *endp;
>> > + int ret;
>> > +
>> > + efd = simple_strtoul(buffer, &endp, 10);
>> > + if (*endp != ' ')
>> > + return -EINVAL;
>> > + buffer = endp + 1;
>> > +
>> > + cfd = simple_strtoul(buffer, &endp, 10);
>> > + if ((*endp != ' ') && (*endp != '\0'))
>> > + return -EINVAL;
>> > + buffer = endp + 1;
>> > +
>> > + event = kzalloc(sizeof(*event), GFP_KERNEL);
>> > + if (!event)
>> > + return -ENOMEM;
>> > + event->cgrp = cont;
>> > + INIT_LIST_HEAD(&event->list);
>> > + init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
>> > + init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
>> > + INIT_WORK(&event->remove, cgroup_event_remove_work);
>> > +
>> > + efile = eventfd_fget(efd);
>> > + if (IS_ERR(efile)) {
>> > + ret = PTR_ERR(efile);
>> > + goto fail;
>> > + }
>> > +
>> > + event->eventfd = eventfd_ctx_fileget(efile);
>> > + if (IS_ERR(event->eventfd)) {
>> > + ret = PTR_ERR(event->eventfd);
>> > + goto fail;
>> > + }
>> > +
>> > + cfile = fget(cfd);
>> > + if (!cfile) {
>> > + ret = -EBADF;
>> > + goto fail;
>> > + }
>> > +
>> > + /* the process need read permission on control file */
>> > + ret = file_permission(cfile, MAY_READ);
>> > + if (ret < 0)
>> > + goto fail;
>> > +
>> > + event->cft = __file_cft(cfile);
>> > + if (IS_ERR(event->cft)) {
>> > + ret = PTR_ERR(event->cft);
>> > + goto fail;
>> > + }
>> > +
>> > + if (!event->cft->register_event || !event->cft->unregister_event) {
>> > + ret = -EINVAL;
>> > + goto fail;
>> > + }
>> > +
>> > + ret = event->cft->register_event(cont, event->cft,
>> > + event->eventfd, buffer);
>> > + if (ret)
>> > + goto fail;
>> > +
>> > + efile->f_op->poll(efile, &event->pt);
>
> Not necessary to check return value ?
You are right. We need to check the return value for POLLHUP.
Thanks!
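
For reference, a minimal userspace sketch of the string format that
cgroup_write_event_control() parses in the patch above:
"<event_fd> <control_fd> [args]". The name parse_event_control() is
hypothetical, and strtoul() stands in for the kernel's simple_strtoul(),
so whitespace handling is only approximately the same. One assumption
differs from the quoted code: when no arguments follow the second
number, the sketch leaves the args pointer on the terminator instead of
advancing past it.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace mirror of the parsing done in cgroup_write_event_control().
 * Hypothetical helper, not part of the patch.
 *
 * On success returns 0 and fills in both descriptors plus a pointer to
 * the (possibly empty) argument string. Returns -EINVAL on bad input.
 */
static int parse_event_control(const char *buffer, unsigned int *efd,
			       unsigned int *cfd, const char **args)
{
	char *endp;

	/* First number: the eventfd descriptor, followed by one space. */
	*efd = strtoul(buffer, &endp, 10);
	if (*endp != ' ')
		return -EINVAL;
	buffer = endp + 1;

	/* Second number: the control file descriptor. */
	*cfd = strtoul(buffer, &endp, 10);
	if (*endp != ' ' && *endp != '\0')
		return -EINVAL;

	/* Assumption: do not step past the terminator if no args follow. */
	*args = (*endp == ' ') ? endp + 1 : endp;
	return 0;
}
```

With the input "5 7 1048576" this yields efd 5, cfd 7 and the argument
string "1048576", which would then be handed to the control file's
register_event() callback.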