linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo@kvack.org>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Marcelo Tosatti" <marcelo@kvack.org>,
	linux-mm@kvack.org, "Daniel Sp蚣g" <daniel.spang@gmail.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [RFC][patch 1/2] mem notifications v3 improvement for large system
Date: Thu, 27 Dec 2007 16:04:56 -0500	[thread overview]
Message-ID: <20071227210456.GB14823@dmt> (raw)
In-Reply-To: <20071225182144.D26D.KOSAKI.MOTOHIRO@jp.fujitsu.com>

On Tue, Dec 25, 2007 at 07:31:14PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> I tried resolve too few notification problem.
> 
> mem_notify_status global variable mean wakeup 1 process.
> it is too few.
> 
> improvement step1:
> - add read method and wake up all process.
> 
> 1. run >10000 process test
>    console1# LANG=C; while [ 1 ] ;do sleep 1; date; vmstat 1 1 -S M -a; done
>    console2# sh m.sh 12500
> 
> result:
>  - wakeup all unoccur neither thundering herd nor soft lock-up.
>  - no swap out occured.
>  - but too much free ;-)
>    in my test-case, over 5GB freed.
> 
> 
> Wed Dec 26 03:19:20 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  7  0      0    605    209  12778    0    0   143    11 1458  183 14 10 76  1  0
> Wed Dec 26 03:19:21 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  6  0      0   2687    209  10769    0    0   142    11 1459  188 14 10 75  1  0
> Wed Dec 26 03:19:22 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  2  0      0   4560    209   8968    0    0   142    11 1459  191 14 10 75  1  0
> Wed Dec 26 03:19:23 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5857    209   7724    0    0   142    11 1457  192 14 10 75  1  0
> Wed Dec 26 03:19:24 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5872    209   7724    0    0   141    11 1454  192 14 10 75  1  0
> Wed Dec 26 03:19:25 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5884    209   7724    0    0   141    11 1451  192 14 10 75  1  0
> Wed Dec 26 03:19:26 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5895    209   7724    0    0   140    11 1448  191 14 10 75  1  0
> Wed Dec 26 03:19:27 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5904    209   7724    0    0   140    11 1445  191 14 10 75  1  0
> Wed Dec 26 03:19:28 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5912    209   7724    0    0   140    11 1442  190 13 10 75  1  0
> Wed Dec 26 03:19:29 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5920    209   7724    0    0   139    11 1439  190 13 10 75  1  0
> Wed Dec 26 03:19:30 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  1      0   5929    209   7724    0    0   139    11 1436  189 13 10 75  1  0
> Wed Dec 26 03:19:32 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5935    209   7724    0    0   139    11 1433  189 13 10 75  1  0
> Wed Dec 26 03:19:33 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5940    209   7724    0    0   138    11 1430  188 13 10 75  1  0
> Wed Dec 26 03:19:34 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  2  1      0   5948    209   7725    0    0   138    11 1427  188 13 10 75  1  0
> Wed Dec 26 03:19:35 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  0  0      0   5676    209   8005    0    0   138    11 1425  188 13 10 75  1  0
> Wed Dec 26 03:19:36 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  0  1      0   5676    209   8006    0    0   137    11 1422  188 13 10 75  1  0
> 
> 
> Index: linux-2.6.23-mem_notify_v3/mm/mem_notify.c
> ===================================================================
> --- linux-2.6.23-mem_notify_v3.orig/mm/mem_notify.c
> +++ linux-2.6.23-mem_notify_v3/mm/mem_notify.c
> @@ -13,7 +13,11 @@
>  #include <linux/percpu.h>
>  #include <linux/timer.h>
> 
> -static unsigned long mem_notify_status = 0;
> +struct mem_notify_file_info {
> +        long          last_event;
> +};
> +
> +atomic_t mem_notify_event = ATOMIC_INIT(0);
> 
>  static DECLARE_WAIT_QUEUE_HEAD(mem_wait);
>  static DEFINE_PER_CPU(unsigned long, last_mem_notify) = INITIAL_JIFFIES;
> @@ -28,53 +32,81 @@ void mem_notify_userspace(void)
> 
>         if (time_after(now, target)) {
>                 __get_cpu_var(last_mem_notify) = now;
> -               mem_notify_status = 1;
> +               atomic_inc(&mem_notify_event);
>                 wake_up(&mem_wait);
>         }
>  }
> 
>  static int mem_notify_open(struct inode *inode, struct file *file)
>  {
> -       return 0;
> +        struct mem_notify_file_info *ptr;
> +        int    err = 0;
> +
> +        ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
> +        if (!ptr) {
> +                err = -ENOMEM;
> +                goto out;
> +        }
> +
> +        ptr->last_event = atomic_read(&mem_notify_event);
> +        file->private_data = ptr;
> +
> +out:
> +        return err;
>  }
> 
>  static int mem_notify_release(struct inode *inode, struct file *file)
>  {
> +        kfree(file->private_data);
> +
>         return 0;
>  }
> 
>  static unsigned int mem_notify_poll(struct file *file, poll_table *wait)
>  {
>         unsigned int val = 0;
> +       struct zone *zone;
> +       int pages_high, pages_free, pages_reserve;
> +        struct mem_notify_file_info *file_info = file->private_data;
> 
>         poll_wait(file, &mem_wait, wait);
> 
> -       if (mem_notify_status) {
> -               struct zone *zone;
> -               int pages_high, pages_free, pages_reserve;
> -
> -               mem_notify_status = 0;

By clearing mem_notify_status you avoid other processes going
through mem_notify_poll() from having a spurious notification if
memory is not exhausted. So the real point of notification is the
call from shrink_zone() (or shrink_active_list()), not the pages <
(pages_high+pages_reserve)*2 formula.

So something like the following sounds better:

- have your poll_wait_exclusive() patch in place
- pass a "status" parameter to mem_notify_userspace() and have it clear
mem_notify_status in case status is zero, so to stop sending POLLIN to processes.
- call mem_notify_userspace(0) from mm/vmscan.c when ZONE_NORMAL reclaim_mapped 
is false (that seems a good indication that VM is out of trouble).
- test for mem_notify_status in mem_notify_poll(), but do not clear it.
- at mem_notify_userspace(), use wake_up_nr(number of mem_notify users/10) (10
meaning a small percentage of registered users).


> -
> -               /* check if its not a spurious/stale notification */
> -               pages_high = pages_free = pages_reserve = 0;
> -               for_each_zone(zone) {
> -                       if (!populated_zone(zone) || is_highmem(zone))
> -                               continue;
> -                       pages_high += zone->pages_high;
> -                       pages_free += zone_page_state(zone, NR_FREE_PAGES);
> -                       pages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
> -               }
> +        if (file_info->last_event == atomic_read(&mem_notify_event))
> +                goto out;

What exactly are you trying to deal with by using last_event?

> 
> -               if (pages_free < (pages_high+pages_reserve)*2)
> -                       val = POLLIN;
> +       /* check if its not a spurious/stale notification */
> +       pages_high = pages_free = pages_reserve = 0;
> +       for_each_zone(zone) {
> +               if (!populated_zone(zone) || is_highmem(zone))
> +                       continue;
> +               pages_high += zone->pages_high;
> +               pages_free += zone_page_state(zone, NR_FREE_PAGES);
> +               pages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
>         }
> -
> +
> +       if (pages_free < (pages_high+pages_reserve)*2)
> +               val = POLLIN;
> +
> +out:
>         return val;
>  }
> 
> +static ssize_t mem_notify_read(struct file *file, char __user *buf,
> +                               size_t count, loff_t *ppos)
> +{
> +        struct mem_notify_file_info *file_info = file->private_data;
> +        if (!file_info)
> +                return -EINVAL;
> +
> +        file_info->last_event = atomic_read(&mem_notify_event);
> +
> +        return 0;
> +}

Your then using read() to affect poll() behavior. I don't like it.

> +
>  struct file_operations mem_notify_fops = {
>         .open = mem_notify_open,
>         .release = mem_notify_release,
>         .poll = mem_notify_poll,
> +        .read = mem_notify_read,
>  };
>  EXPORT_SYMBOL(mem_notify_fops);
> 
> 
> 
> 
> 
> 
> /kosaki

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-12-27 21:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-24 20:32 [PATCH] mem notifications v3 Marcelo Tosatti
2007-12-25  3:47 ` KOSAKI Motohiro
2007-12-25  4:56   ` [RFC] add poll_wait_exclusive() API KOSAKI Motohiro
2007-12-27 21:05     ` Marcelo Tosatti
2007-12-25  8:31   ` [PATCH] mem notifications v3 KOSAKI Motohiro
2007-12-25 10:31     ` [RFC][patch 1/2] mem notifications v3 improvement for large system KOSAKI Motohiro
2007-12-27 21:04       ` Marcelo Tosatti [this message]
2007-12-28  0:38         ` KOSAKI Motohiro
2007-12-25 10:31     ` [RFC][patch 2/2] " KOSAKI Motohiro
2007-12-25 10:41       ` KOSAKI Motohiro
2007-12-27  4:49       ` [RFC][patch] mem_notify more faster reduce load average KOSAKI Motohiro
2007-12-27 20:13   ` [PATCH] mem notifications v3 Marcelo Tosatti
2007-12-28  1:44     ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071227210456.GB14823@dmt \
    --to=marcelo@kvack.org \
    --cc=akpm@linux-foundation.org \
    --cc=daniel.spang@gmail.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox