From: Marcelo Tosatti <marcelo@kvack.org>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Marcelo Tosatti" <marcelo@kvack.org>,
linux-mm@kvack.org, "Daniel Spång" <daniel.spang@gmail.com>,
"Rik van Riel" <riel@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [RFC][patch 1/2] mem notifications v3 improvement for large system
Date: Thu, 27 Dec 2007 16:04:56 -0500
Message-ID: <20071227210456.GB14823@dmt>
In-Reply-To: <20071225182144.D26D.KOSAKI.MOTOHIRO@jp.fujitsu.com>
On Tue, Dec 25, 2007 at 07:31:14PM +0900, KOSAKI Motohiro wrote:
> Hi
>
> I tried to resolve the too-few-notifications problem.
>
> The mem_notify_status global variable means only 1 process is woken up.
> That is too few.
>
> improvement step 1:
> - add a read method and wake up all processes.
>
> 1. run >10000 process test
> console1# LANG=C; while [ 1 ] ;do sleep 1; date; vmstat 1 1 -S M -a; done
> console2# sh m.sh 12500
>
> result:
> - waking up all processes caused neither a thundering herd nor a soft lockup.
> - no swap-out occurred.
> - but too much memory was freed ;-)
> in my test case, over 5GB was freed.
>
>
> Wed Dec 26 03:19:20 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 7 0 0 605 209 12778 0 0 143 11 1458 183 14 10 76 1 0
> Wed Dec 26 03:19:21 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 6 0 0 2687 209 10769 0 0 142 11 1459 188 14 10 75 1 0
> Wed Dec 26 03:19:22 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 2 0 0 4560 209 8968 0 0 142 11 1459 191 14 10 75 1 0
> Wed Dec 26 03:19:23 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5857 209 7724 0 0 142 11 1457 192 14 10 75 1 0
> Wed Dec 26 03:19:24 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5872 209 7724 0 0 141 11 1454 192 14 10 75 1 0
> Wed Dec 26 03:19:25 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5884 209 7724 0 0 141 11 1451 192 14 10 75 1 0
> Wed Dec 26 03:19:26 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5895 209 7724 0 0 140 11 1448 191 14 10 75 1 0
> Wed Dec 26 03:19:27 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5904 209 7724 0 0 140 11 1445 191 14 10 75 1 0
> Wed Dec 26 03:19:28 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5912 209 7724 0 0 140 11 1442 190 13 10 75 1 0
> Wed Dec 26 03:19:29 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5920 209 7724 0 0 139 11 1439 190 13 10 75 1 0
> Wed Dec 26 03:19:30 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 1 0 5929 209 7724 0 0 139 11 1436 189 13 10 75 1 0
> Wed Dec 26 03:19:32 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5935 209 7724 0 0 139 11 1433 189 13 10 75 1 0
> Wed Dec 26 03:19:33 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 1 0 0 5940 209 7724 0 0 138 11 1430 188 13 10 75 1 0
> Wed Dec 26 03:19:34 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 2 1 0 5948 209 7725 0 0 138 11 1427 188 13 10 75 1 0
> Wed Dec 26 03:19:35 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 0 0 0 5676 209 8005 0 0 138 11 1425 188 13 10 75 1 0
> Wed Dec 26 03:19:36 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free inact active si so bi bo in cs us sy id wa st
> 0 1 0 5676 209 8006 0 0 137 11 1422 188 13 10 75 1 0
>
>
> Index: linux-2.6.23-mem_notify_v3/mm/mem_notify.c
> ===================================================================
> --- linux-2.6.23-mem_notify_v3.orig/mm/mem_notify.c
> +++ linux-2.6.23-mem_notify_v3/mm/mem_notify.c
> @@ -13,7 +13,11 @@
> #include <linux/percpu.h>
> #include <linux/timer.h>
>
> -static unsigned long mem_notify_status = 0;
> +struct mem_notify_file_info {
> + long last_event;
> +};
> +
> +atomic_t mem_notify_event = ATOMIC_INIT(0);
>
> static DECLARE_WAIT_QUEUE_HEAD(mem_wait);
> static DEFINE_PER_CPU(unsigned long, last_mem_notify) = INITIAL_JIFFIES;
> @@ -28,53 +32,81 @@ void mem_notify_userspace(void)
>
> if (time_after(now, target)) {
> __get_cpu_var(last_mem_notify) = now;
> - mem_notify_status = 1;
> + atomic_inc(&mem_notify_event);
> wake_up(&mem_wait);
> }
> }
>
> static int mem_notify_open(struct inode *inode, struct file *file)
> {
> - return 0;
> + struct mem_notify_file_info *ptr;
> + int err = 0;
> +
> + ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
> + if (!ptr) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + ptr->last_event = atomic_read(&mem_notify_event);
> + file->private_data = ptr;
> +
> +out:
> + return err;
> }
>
> static int mem_notify_release(struct inode *inode, struct file *file)
> {
> + kfree(file->private_data);
> +
> return 0;
> }
>
> static unsigned int mem_notify_poll(struct file *file, poll_table *wait)
> {
> unsigned int val = 0;
> + struct zone *zone;
> + int pages_high, pages_free, pages_reserve;
> + struct mem_notify_file_info *file_info = file->private_data;
>
> poll_wait(file, &mem_wait, wait);
>
> - if (mem_notify_status) {
> - struct zone *zone;
> - int pages_high, pages_free, pages_reserve;
> -
> - mem_notify_status = 0;

Clearing mem_notify_status here keeps other processes that go through
mem_notify_poll() from getting a spurious notification when memory is
not exhausted. So the real point of notification is the call from
shrink_zone() (or shrink_active_list()), not the
pages_free < (pages_high+pages_reserve)*2 formula.

So something like the following sounds better:

- have your poll_wait_exclusive() patch in place
- pass a "status" parameter to mem_notify_userspace() and have it clear
  mem_notify_status when status is zero, so as to stop sending POLLIN to
  processes.
- call mem_notify_userspace(0) from mm/vmscan.c when ZONE_NORMAL
  reclaim_mapped is false (that seems a good indication that the VM is
  out of trouble).
- test for mem_notify_status in mem_notify_poll(), but do not clear it.
- at mem_notify_userspace(), use wake_up_nr(number of mem_notify users/10)
  (dividing by 10 so that only a small percentage of registered users
  is woken).

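To make the list above concrete, here is a rough userspace model of the
flag handling, not kernel code. Plain variables stand in for kernel
state. mem_notify_users, waiters_to_wake() and the model_* names are
hypothetical, and waking at least one waiter when users/10 rounds down
to zero is my own assumption.

```c
#include <stdbool.h>

#define MODEL_POLLIN 1	/* stand-in for POLLIN in this model */

/* Userspace model only: plain variables replace kernel state. */
bool mem_notify_status;		/* true while the VM reports pressure */
unsigned int mem_notify_users;	/* hypothetical count of open files */

/*
 * Hypothetical helper: number of waiters to hand to wake_up_nr().
 * Roughly 10% of registered users; waking at least one when anyone is
 * registered is an assumption, not something stated above.
 */
unsigned int waiters_to_wake(unsigned int users)
{
	unsigned int nr = users / 10;

	if (users > 0 && nr == 0)
		nr = 1;
	return nr;
}

/*
 * Model of mem_notify_userspace(status). A zero status clears the flag
 * and wakes nobody; a nonzero status sets it and returns how many
 * waiters would be woken.
 */
unsigned int model_notify_userspace(int status)
{
	if (!status) {
		mem_notify_status = false;
		return 0;
	}
	mem_notify_status = true;
	return waiters_to_wake(mem_notify_users);
}

/* Model of mem_notify_poll(): tests the flag but never clears it. */
int model_poll(void)
{
	return mem_notify_status ? MODEL_POLLIN : 0;
}
```

With this shape, every poller keeps seeing POLLIN until the reclaim
path calls the notify function with status 0, so no single poll() call
consumes the notification on behalf of the others.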
> -
> - /* check if its not a spurious/stale notification */
> - pages_high = pages_free = pages_reserve = 0;
> - for_each_zone(zone) {
> - if (!populated_zone(zone) || is_highmem(zone))
> - continue;
> - pages_high += zone->pages_high;
> - pages_free += zone_page_state(zone, NR_FREE_PAGES);
> - pages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
> - }
> + if (file_info->last_event == atomic_read(&mem_notify_event))
> + goto out;
What exactly are you trying to deal with by using last_event?
>
> - if (pages_free < (pages_high+pages_reserve)*2)
> - val = POLLIN;
> + /* check if its not a spurious/stale notification */
> + pages_high = pages_free = pages_reserve = 0;
> + for_each_zone(zone) {
> + if (!populated_zone(zone) || is_highmem(zone))
> + continue;
> + pages_high += zone->pages_high;
> + pages_free += zone_page_state(zone, NR_FREE_PAGES);
> + pages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
> }
> -
> +
> + if (pages_free < (pages_high+pages_reserve)*2)
> + val = POLLIN;
> +
> +out:
> return val;
> }
>
> +static ssize_t mem_notify_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct mem_notify_file_info *file_info = file->private_data;
> + if (!file_info)
> + return -EINVAL;
> +
> + file_info->last_event = atomic_read(&mem_notify_event);
> +
> + return 0;
> +}
You're then using read() to affect poll() behavior. I don't like it.
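To illustrate the coupling, here is a small userspace model of the
quoted poll()/read() pair, not kernel code. The model_* names are
hypothetical, and the free-page check is reduced to a memory_low flag.

```c
#define MODEL_POLLIN 1	/* stand-in for POLLIN in this model */

struct model_file_info {
	long last_event;	/* event count seen at open() or last read() */
};

long model_event;	/* stands in for atomic_t mem_notify_event */

/*
 * Mirrors the quoted mem_notify_poll(): with no new event since the
 * last acknowledgement nothing is reported; otherwise the free-page
 * check (reduced here to a flag) decides.
 */
int model_last_event_poll(const struct model_file_info *fi, int memory_low)
{
	if (fi->last_event == model_event)
		return 0;
	return memory_low ? MODEL_POLLIN : 0;
}

/* Mirrors the quoted mem_notify_read(): it only acknowledges the event. */
void model_last_event_read(struct model_file_info *fi)
{
	fi->last_event = model_event;
}
```

In this model, once an event has been counted and memory stays low,
the poll function reports readiness on every call until the read
function is called on the same file, even though read() transfers no
data.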
> +
> struct file_operations mem_notify_fops = {
> .open = mem_notify_open,
> .release = mem_notify_release,
> .poll = mem_notify_poll,
> + .read = mem_notify_read,
> };
> EXPORT_SYMBOL(mem_notify_fops);
>
>
>
>
>
>
> /kosaki