linux-mm.kvack.org archive mirror
From: Andrew Bresticker <abrestic@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"bsingharora@gmail.com" <bsingharora@gmail.com>,
	Michal Hocko <mhocko@suse.cz>, Ying Han <yinghan@google.com>
Subject: Re: [PATCH v2] memcg: add vmscan_stat
Date: Fri, 15 Jul 2011 13:28:08 -0700	[thread overview]
Message-ID: <CAL1qeaHfP4p_ahuLRF5jfRa0Ttsj=PdS0bxiZVJiXeSi+Z3L3w@mail.gmail.com> (raw)
In-Reply-To: <CAL1qeaEoH_o4YfwnT5jsz3tyd=ZiGtUqMHQy7BMfLznjKvSZ_g@mail.gmail.com>


And this one tracks the number of pages unmapped:
--

From: Andrew Bresticker <abrestic@google.com>
Date: Fri, 15 Jul 2011 11:46:40 -0700
Subject: [PATCH] vmscan: Track pages unmapped during page reclaim.

Record the number of pages unmapped during page reclaim in
memory.vmscan_stat.  Counters are broken down by page type (anon/file)
and reclaim context, like the other stats in memory.vmscan_stat.

Sample output:
$ mkdir /dev/cgroup/memory/1
$ echo 512m > /dev/cgroup/memory/1/memory.limit_in_bytes
$ echo $$ > /dev/cgroup/memory/1/tasks
$ pft -m 512m
$ cat /dev/cgroup/memory/1/memory.vmscan_stat
...
unmapped_pages_by_limit 67
unmapped_anon_pages_by_limit 0
unmapped_file_pages_by_limit 67
...
unmapped_pages_by_limit_under_hierarchy 67
unmapped_anon_pages_by_limit_under_hierarchy 0
unmapped_file_pages_by_limit_under_hierarchy 67
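
Since memory.vmscan_stat can be reset by writing 0 to it, per-run deltas for
the new counters can also be sampled directly; a quick sketch using the same
cgroup as above:

$ echo 0 > /dev/cgroup/memory/1/memory.vmscan_stat
$ pft -m 512m
$ grep ^unmapped /dev/cgroup/memory/1/memory.vmscan_stat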

Signed-off-by: Andrew Bresticker <abrestic@google.com>
---
 include/linux/memcontrol.h |    1 +
 mm/memcontrol.c            |   12 ++++++++++++
 mm/vmscan.c                |    8 ++++++--
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4be907e..8d65b55 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -47,6 +47,7 @@ struct memcg_scanrecord {
  unsigned long nr_rotated[2]; /* the number of rotated pages */
  unsigned long nr_freed[2]; /* the number of freed pages */
  unsigned long nr_written[2]; /* the number of pages written back */
+ unsigned long nr_unmapped[2]; /* the number of pages unmapped */
  unsigned long elapsed; /* nsec of time elapsed while scanning */
 };

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5ec2aa3..6b4fbbd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -224,6 +224,9 @@ enum {
  WRITTEN,
  WRITTEN_ANON,
  WRITTEN_FILE,
+ UNMAPPED,
+ UNMAPPED_ANON,
+ UNMAPPED_FILE,
  ELAPSED,
  NR_SCANSTATS,
 };
@@ -247,6 +250,9 @@ const char *scanstat_string[NR_SCANSTATS] = {
  "written_pages",
  "written_anon_pages",
  "written_file_pages",
+ "unmapped_pages",
+ "unmapped_anon_pages",
+ "unmapped_file_pages",
  "elapsed_ns",
 };
 #define SCANSTAT_WORD_LIMIT    "_by_limit"
@@ -1692,6 +1698,10 @@ static void __mem_cgroup_record_scanstat(unsigned long *stats,
  stats[WRITTEN_ANON] += rec->nr_written[0];
  stats[WRITTEN_FILE] += rec->nr_written[1];

+ stats[UNMAPPED] += rec->nr_unmapped[0] + rec->nr_unmapped[1];
+ stats[UNMAPPED_ANON] += rec->nr_unmapped[0];
+ stats[UNMAPPED_FILE] += rec->nr_unmapped[1];
+
         stats[ELAPSED] += rec->elapsed;
 }

@@ -1806,6 +1816,8 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
                 rec.nr_freed[1] = 0;
  rec.nr_written[0] = 0;
  rec.nr_written[1] = 0;
+ rec.nr_unmapped[0] = 0;
+ rec.nr_unmapped[1] = 0;
                 rec.elapsed = 0;
  /* we use swappiness of local cgroup */
  if (check_soft) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f73b96e..2d2bc99 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -728,6 +728,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
  unsigned long nr_congested = 0;
  unsigned long nr_reclaimed = 0;
  unsigned long nr_written = 0;
+ unsigned long nr_unmapped = 0;

  cond_resched();

@@ -819,7 +820,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
  case SWAP_MLOCK:
  goto cull_mlocked;
  case SWAP_SUCCESS:
- ; /* try to free the page below */
+ /* try to free the page below */
+ nr_unmapped++;
  }
  }

@@ -960,8 +962,10 @@ keep_lumpy:
  free_page_list(&free_pages);

  list_splice(&ret_pages, page_list);
- if (!scanning_global_lru(sc))
+ if (!scanning_global_lru(sc)) {
  sc->memcg_record->nr_written[file] += nr_written;
+ sc->memcg_record->nr_unmapped[file] += nr_unmapped;
+ }
  count_vm_events(PGACTIVATE, pgactivate);
  return nr_reclaimed;
 }
-- 
1.7.3.1

Thanks,
Andrew

On Fri, Jul 15, 2011 at 11:34 AM, Andrew Bresticker <abrestic@google.com> wrote:

> I've extended your patch to track write-back during page reclaim:
> ---
>
> From: Andrew Bresticker <abrestic@google.com>
> Date: Thu, 14 Jul 2011 17:56:48 -0700
> Subject: [PATCH] vmscan: Track number of pages written back during page
> reclaim.
>
> This tracks pages written out during page reclaim in memory.vmscan_stat
> and breaks it down by file vs. anon and context (like "scanned_pages",
> "rotated_pages", etc.).
>
> Example output:
> $ mkdir /dev/cgroup/memory/1
> $ echo 8m > /dev/cgroup/memory/1/memory.limit_in_bytes
> $ echo $$ > /dev/cgroup/memory/1/tasks
> $ dd if=/dev/urandom of=file_20g bs=4096 count=524288
> $ cat /dev/cgroup/memory/1/memory.vmscan_stat
> ...
> written_pages_by_limit 36
> written_anon_pages_by_limit 0
> written_file_pages_by_limit 36
> ...
> written_pages_by_limit_under_hierarchy 28
> written_anon_pages_by_limit_under_hierarchy 0
> written_file_pages_by_limit_under_hierarchy 28
>
> Signed-off-by: Andrew Bresticker <abrestic@google.com>
> ---
>  include/linux/memcontrol.h |    1 +
>  mm/memcontrol.c            |   12 ++++++++++++
>  mm/vmscan.c                |   10 +++++++---
>  3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4b49edf..4be907e 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -46,6 +46,7 @@ struct memcg_scanrecord {
>   unsigned long nr_scanned[2]; /* the number of scanned pages */
>   unsigned long nr_rotated[2]; /* the number of rotated pages */
>   unsigned long nr_freed[2]; /* the number of freed pages */
> + unsigned long nr_written[2]; /* the number of pages written back */
>   unsigned long elapsed; /* nsec of time elapsed while scanning */
>  };
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9bb6e93..5ec2aa3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -221,6 +221,9 @@ enum {
>   FREED,
>   FREED_ANON,
>   FREED_FILE,
> + WRITTEN,
> + WRITTEN_ANON,
> + WRITTEN_FILE,
>   ELAPSED,
>   NR_SCANSTATS,
>  };
> @@ -241,6 +244,9 @@ const char *scanstat_string[NR_SCANSTATS] = {
>   "freed_pages",
>   "freed_anon_pages",
>   "freed_file_pages",
> + "written_pages",
> + "written_anon_pages",
> + "written_file_pages",
>   "elapsed_ns",
>  };
>  #define SCANSTAT_WORD_LIMIT    "_by_limit"
> @@ -1682,6 +1688,10 @@ static void __mem_cgroup_record_scanstat(unsigned long *stats,
>          stats[FREED_ANON] += rec->nr_freed[0];
>          stats[FREED_FILE] += rec->nr_freed[1];
>
> + stats[WRITTEN] += rec->nr_written[0] + rec->nr_written[1];
> + stats[WRITTEN_ANON] += rec->nr_written[0];
> + stats[WRITTEN_FILE] += rec->nr_written[1];
> +
>          stats[ELAPSED] += rec->elapsed;
>  }
>
> @@ -1794,6 +1804,8 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>                  rec.nr_rotated[1] = 0;
>                  rec.nr_freed[0] = 0;
>                  rec.nr_freed[1] = 0;
> + rec.nr_written[0] = 0;
> + rec.nr_written[1] = 0;
>                  rec.elapsed = 0;
>   /* we use swappiness of local cgroup */
>   if (check_soft) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8fb1abd..f73b96e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -719,7 +719,7 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
>   */
>  static unsigned long shrink_page_list(struct list_head *page_list,
>        struct zone *zone,
> -      struct scan_control *sc)
> +      struct scan_control *sc, int file)
>  {
>   LIST_HEAD(ret_pages);
>   LIST_HEAD(free_pages);
> @@ -727,6 +727,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>   unsigned long nr_dirty = 0;
>   unsigned long nr_congested = 0;
>   unsigned long nr_reclaimed = 0;
> + unsigned long nr_written = 0;
>
>   cond_resched();
>
> @@ -840,6 +841,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>   case PAGE_ACTIVATE:
>   goto activate_locked;
>   case PAGE_SUCCESS:
> + nr_written++;
>   if (PageWriteback(page))
>   goto keep_lumpy;
>   if (PageDirty(page))
> @@ -958,6 +960,8 @@ keep_lumpy:
>   free_page_list(&free_pages);
>
>   list_splice(&ret_pages, page_list);
> + if (!scanning_global_lru(sc))
> + sc->memcg_record->nr_written[file] += nr_written;
>   count_vm_events(PGACTIVATE, pgactivate);
>   return nr_reclaimed;
>  }
> @@ -1463,7 +1467,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>
>   spin_unlock_irq(&zone->lru_lock);
>
> - nr_reclaimed = shrink_page_list(&page_list, zone, sc);
> + nr_reclaimed = shrink_page_list(&page_list, zone, sc, file);
>
>   if (!scanning_global_lru(sc))
>   sc->memcg_record->nr_freed[file] += nr_reclaimed;
> @@ -1471,7 +1475,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>   /* Check if we should syncronously wait for writeback */
>   if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>   set_reclaim_mode(priority, sc, true);
> - nr_reclaimed += shrink_page_list(&page_list, zone, sc);
> + nr_reclaimed += shrink_page_list(&page_list, zone, sc, file);
>   }
>
>   local_irq_disable();
> --
> 1.7.3.1
>
> Thanks,
> Andrew
>
> On Wed, Jul 13, 2011 at 5:02 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>> On Tue, 12 Jul 2011 16:02:02 -0700
>> Andrew Bresticker <abrestic@google.com> wrote:
>>
>> > On Mon, Jul 11, 2011 at 3:30 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >
>> > >
>> > > This patch is onto mmotm-0710... got bigger than expected ;(
>> > > ==
>> > > [PATCH] add memory.vmscan_stat
>> > >
>> > > The commit log of commit 0ae5e89 "memcg: count the soft_limit reclaim in..."
>> > > says it adds scanning stats to the memory.stat file, but it doesn't, because
>> > > we considered that we needed to reach a consensus on such new APIs.
>> > >
>> > > This patch is an attempt to add memory.scan_stat. It shows
>> > >  - the number of scanned pages (total, anon, file)
>> > >  - the number of rotated pages (total, anon, file)
>> > >  - the number of freed pages (total, anon, file)
>> > >  - the elapsed time (including sleep/pause time)
>> > >
>> > >  for both direct and soft reclaim.
>> > >
>> > > The biggest difference from Ying's original version is that this file
>> > > can be reset by a write, as
>> > >
>> > >  # echo 0 ...../memory.scan_stat
>> > >
>> > > An example of the output is below. This is the result after running
>> > > make -j 6 on a kernel build under a 300M limit.
>> > >
>> > > [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
>> > > [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
>> > > scanned_pages_by_limit 9471864
>> > > scanned_anon_pages_by_limit 6640629
>> > > scanned_file_pages_by_limit 2831235
>> > > rotated_pages_by_limit 4243974
>> > > rotated_anon_pages_by_limit 3971968
>> > > rotated_file_pages_by_limit 272006
>> > > freed_pages_by_limit 2318492
>> > > freed_anon_pages_by_limit 962052
>> > > freed_file_pages_by_limit 1356440
>> > > elapsed_ns_by_limit 351386416101
>> > > scanned_pages_by_system 0
>> > > scanned_anon_pages_by_system 0
>> > > scanned_file_pages_by_system 0
>> > > rotated_pages_by_system 0
>> > > rotated_anon_pages_by_system 0
>> > > rotated_file_pages_by_system 0
>> > > freed_pages_by_system 0
>> > > freed_anon_pages_by_system 0
>> > > freed_file_pages_by_system 0
>> > > elapsed_ns_by_system 0
>> > > scanned_pages_by_limit_under_hierarchy 9471864
>> > > scanned_anon_pages_by_limit_under_hierarchy 6640629
>> > > scanned_file_pages_by_limit_under_hierarchy 2831235
>> > > rotated_pages_by_limit_under_hierarchy 4243974
>> > > rotated_anon_pages_by_limit_under_hierarchy 3971968
>> > > rotated_file_pages_by_limit_under_hierarchy 272006
>> > > freed_pages_by_limit_under_hierarchy 2318492
>> > > freed_anon_pages_by_limit_under_hierarchy 962052
>> > > freed_file_pages_by_limit_under_hierarchy 1356440
>> > > elapsed_ns_by_limit_under_hierarchy 351386416101
>> > > scanned_pages_by_system_under_hierarchy 0
>> > > scanned_anon_pages_by_system_under_hierarchy 0
>> > > scanned_file_pages_by_system_under_hierarchy 0
>> > > rotated_pages_by_system_under_hierarchy 0
>> > > rotated_anon_pages_by_system_under_hierarchy 0
>> > > rotated_file_pages_by_system_under_hierarchy 0
>> > > freed_pages_by_system_under_hierarchy 0
>> > > freed_anon_pages_by_system_under_hierarchy 0
>> > > freed_file_pages_by_system_under_hierarchy 0
>> > > elapsed_ns_by_system_under_hierarchy 0
>> > >
>> > >
>> > > total_xxxx is for hierarchy management.
>> > >
>> > > This will be useful for further memcg development and needs to be in
>> > > place before we do more complicated rework of LRU/softlimit
>> > > management.
>> > >
>> > > This patch adds a new struct memcg_scanrecord to struct scan_control.
>> > > sc->nr_scanned et al. are not designed for exporting information. For
>> > > example, nr_scanned is reset frequently and incremented by +2 when
>> > > scanning mapped pages.
>> > >
>> > > To avoid complexity, I added a new parameter to scan_control which is
>> > > used for exporting the scanning statistics.
>> > >
>> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > >
>> > > Changelog:
>> > >  - renamed as vmscan_stat
>> > >  - handle file/anon
>> > >  - added "rotated"
>> > >  - changed names of param in vmscan_stat.
>> > > ---
>> > >  Documentation/cgroups/memory.txt |   85 +++++++++++++++++++
>> > >  include/linux/memcontrol.h       |   19 ++++
>> > >  include/linux/swap.h             |    6 -
>> > >  mm/memcontrol.c                  |  172
>> > > +++++++++++++++++++++++++++++++++++++--
>> > >  mm/vmscan.c                      |   39 +++++++-
>> > >  5 files changed, 303 insertions(+), 18 deletions(-)
>> > >
>> > > Index: mmotm-0710/Documentation/cgroups/memory.txt
>> > > ===================================================================
>> > > --- mmotm-0710.orig/Documentation/cgroups/memory.txt
>> > > +++ mmotm-0710/Documentation/cgroups/memory.txt
>> > > @@ -380,7 +380,7 @@ will be charged as a new owner of it.
>> > >
>> > >  5.2 stat file
>> > >
>> > > -memory.stat file includes following statistics
>> > > +5.2.1 memory.stat file includes following statistics
>> > >
>> > >  # per-memory cgroup local status
>> > >  cache          - # of bytes of page cache memory.
>> > > @@ -438,6 +438,89 @@ Note:
>> > >         file_mapped is accounted only when the memory cgroup is owner
>> of
>> > > page
>> > >         cache.)
>> > >
>> > > +5.2.2 memory.vmscan_stat
>> > > +
>> > > +memory.vmscan_stat includes statistics information for memory scanning,
>> > > +freeing, and reclaiming. The statistics show memory scanning information
>> > > +since memory cgroup creation and can be reset to 0 by writing 0, as
>> > > +
>> > > + #echo 0 > ../memory.vmscan_stat
>> > > +
>> > > +This file contains following statistics.
>> > > +
>> > > +[param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy]
>> > > +[param]_elapsed_ns_by_[reason]_[under_hierarchy]
>> > > +
>> > > +For example,
>> > > +
>> > > +  scanned_file_pages_by_limit indicates the number of scanned
>> > > +  file pages at vmscan.
>> > > +
>> > > +Now, 3 parameters are supported
>> > > +
>> > > +  scanned - the number of pages scanned by vmscan
>> > > +  rotated - the number of pages activated at vmscan
>> > > +  freed   - the number of pages freed by vmscan
>> > > +
>> > > +If "rotated" is high against scanned/freed, the memcg seems busy.
>> > > +
>> > > +Now, 2 reasons are supported
>> > > +
>> > > +  limit - the memory cgroup's limit
>> > > +  system - global memory pressure + softlimit
>> > > +           (global memory pressure not under softlimit is not handled now)
>> > > +
>> > > +When under_hierarchy is appended to the name, the number indicates the
>> > > +total scan count for the memcg itself and all of its children.
>> > > +
>> > > +elapsed_ns is the elapsed time in nanoseconds. This may include sleep
>> > > +time and does not indicate CPU usage, so please take this as just showing
>> > > +latency.
>> > > +
>> > > +Here is an example.
>> > > +
>> > > +# cat /cgroup/memory/A/memory.vmscan_stat
>> > > +scanned_pages_by_limit 9471864
>> > > +scanned_anon_pages_by_limit 6640629
>> > > +scanned_file_pages_by_limit 2831235
>> > > +rotated_pages_by_limit 4243974
>> > > +rotated_anon_pages_by_limit 3971968
>> > > +rotated_file_pages_by_limit 272006
>> > > +freed_pages_by_limit 2318492
>> > > +freed_anon_pages_by_limit 962052
>> > > +freed_file_pages_by_limit 1356440
>> > > +elapsed_ns_by_limit 351386416101
>> > > +scanned_pages_by_system 0
>> > > +scanned_anon_pages_by_system 0
>> > > +scanned_file_pages_by_system 0
>> > > +rotated_pages_by_system 0
>> > > +rotated_anon_pages_by_system 0
>> > > +rotated_file_pages_by_system 0
>> > > +freed_pages_by_system 0
>> > > +freed_anon_pages_by_system 0
>> > > +freed_file_pages_by_system 0
>> > > +elapsed_ns_by_system 0
>> > > +scanned_pages_by_limit_under_hierarchy 9471864
>> > > +scanned_anon_pages_by_limit_under_hierarchy 6640629
>> > > +scanned_file_pages_by_limit_under_hierarchy 2831235
>> > > +rotated_pages_by_limit_under_hierarchy 4243974
>> > > +rotated_anon_pages_by_limit_under_hierarchy 3971968
>> > > +rotated_file_pages_by_limit_under_hierarchy 272006
>> > > +freed_pages_by_limit_under_hierarchy 2318492
>> > > +freed_anon_pages_by_limit_under_hierarchy 962052
>> > > +freed_file_pages_by_limit_under_hierarchy 1356440
>> > > +elapsed_ns_by_limit_under_hierarchy 351386416101
>> > > +scanned_pages_by_system_under_hierarchy 0
>> > > +scanned_anon_pages_by_system_under_hierarchy 0
>> > > +scanned_file_pages_by_system_under_hierarchy 0
>> > > +rotated_pages_by_system_under_hierarchy 0
>> > > +rotated_anon_pages_by_system_under_hierarchy 0
>> > > +rotated_file_pages_by_system_under_hierarchy 0
>> > > +freed_pages_by_system_under_hierarchy 0
>> > > +freed_anon_pages_by_system_under_hierarchy 0
>> > > +freed_file_pages_by_system_under_hierarchy 0
>> > > +elapsed_ns_by_system_under_hierarchy 0
>> > > +
>> > >  5.3 swappiness
>> > >
>> > >  Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of
>> groups
>> > > only.
>> > > Index: mmotm-0710/include/linux/memcontrol.h
>> > > ===================================================================
>> > > --- mmotm-0710.orig/include/linux/memcontrol.h
>> > > +++ mmotm-0710/include/linux/memcontrol.h
>> > > @@ -39,6 +39,16 @@ extern unsigned long mem_cgroup_isolate_
>> > >                                        struct mem_cgroup *mem_cont,
>> > >                                        int active, int file);
>> > >
>> > > +struct memcg_scanrecord {
>> > > +       struct mem_cgroup *mem; /* scanend memory cgroup */
>> > > +       struct mem_cgroup *root; /* scan target hierarchy root */
>> > > +       int context;            /* scanning context (see memcontrol.c)
>> */
>> > > +       unsigned long nr_scanned[2]; /* the number of scanned pages */
>> > > +       unsigned long nr_rotated[2]; /* the number of rotated pages */
>> > > +       unsigned long nr_freed[2]; /* the number of freed pages */
>> > > +       unsigned long elapsed; /* nsec of time elapsed while scanning
>> */
>> > > +};
>> > > +
>> > >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> > >  /*
>> > >  * All "charge" functions with gfp_mask should use GFP_KERNEL or
>> > > @@ -117,6 +127,15 @@ mem_cgroup_get_reclaim_stat_from_page(st
>> > >  extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
>> > >                                        struct task_struct *p);
>> > >
>> > > +extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup
>> *mem,
>> > > +                                                 gfp_t gfp_mask, bool
>> > > noswap,
>> > > +                                                 struct
>> memcg_scanrecord
>> > > *rec);
>> > > +extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup
>> *mem,
>> > > +                                               gfp_t gfp_mask, bool
>> > > noswap,
>> > > +                                               struct zone *zone,
>> > > +                                               struct
>> memcg_scanrecord
>> > > *rec,
>> > > +                                               unsigned long
>> *nr_scanned);
>> > > +
>> > >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>> > >  extern int do_swap_account;
>> > >  #endif
>> > > Index: mmotm-0710/include/linux/swap.h
>> > > ===================================================================
>> > > --- mmotm-0710.orig/include/linux/swap.h
>> > > +++ mmotm-0710/include/linux/swap.h
>> > > @@ -253,12 +253,6 @@ static inline void lru_cache_add_file(st
>> > >  /* linux/mm/vmscan.c */
>> > >  extern unsigned long try_to_free_pages(struct zonelist *zonelist, int
>> > > order,
>> > >                                        gfp_t gfp_mask, nodemask_t
>> *mask);
>> > > -extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup
>> *mem,
>> > > -                                                 gfp_t gfp_mask, bool
>> > > noswap);
>> > > -extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup
>> *mem,
>> > > -                                               gfp_t gfp_mask, bool
>> > > noswap,
>> > > -                                               struct zone *zone,
>> > > -                                               unsigned long
>> *nr_scanned);
>> > >  extern int __isolate_lru_page(struct page *page, int mode, int file);
>> > >  extern unsigned long shrink_all_memory(unsigned long nr_pages);
>> > >  extern int vm_swappiness;
>> > > Index: mmotm-0710/mm/memcontrol.c
>> > > ===================================================================
>> > > --- mmotm-0710.orig/mm/memcontrol.c
>> > > +++ mmotm-0710/mm/memcontrol.c
>> > > @@ -204,6 +204,50 @@ struct mem_cgroup_eventfd_list {
>> > >  static void mem_cgroup_threshold(struct mem_cgroup *mem);
>> > >  static void mem_cgroup_oom_notify(struct mem_cgroup *mem);
>> > >
>> > > +enum {
>> > > +       SCAN_BY_LIMIT,
>> > > +       SCAN_BY_SYSTEM,
>> > > +       NR_SCAN_CONTEXT,
>> > > +       SCAN_BY_SHRINK, /* not recorded now */
>> > > +};
>> > > +
>> > > +enum {
>> > > +       SCAN,
>> > > +       SCAN_ANON,
>> > > +       SCAN_FILE,
>> > > +       ROTATE,
>> > > +       ROTATE_ANON,
>> > > +       ROTATE_FILE,
>> > > +       FREED,
>> > > +       FREED_ANON,
>> > > +       FREED_FILE,
>> > > +       ELAPSED,
>> > > +       NR_SCANSTATS,
>> > > +};
>> > > +
>> > > +struct scanstat {
>> > > +       spinlock_t      lock;
>> > > +       unsigned long   stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
>> > > +       unsigned long   rootstats[NR_SCAN_CONTEXT][NR_SCANSTATS];
>> > > +};
>> > >
>> >
>> > I'm working on a similar effort with Ying here at Google, and so far we've
>> > been using per-cpu counters for these statistics instead of spin-lock
>> > protected counters.  Clearly the spin-lock protected counters have less
>> > memory overhead and make reading the stat file faster, but our concern is
>> > that this method is inconsistent with the other memory stat files such as
>> > /proc/vmstat and /dev/cgroup/memory/.../memory.stat.  Is there any
>> > particular reason you chose to use spin-lock protected counters instead of
>> > per-cpu counters?
>> >
>>
>> In my experience, if we batch enough, it always works better than a percpu
>> counter; a percpu counter is effective when batching is difficult. This
>> patch's implementation does enough batching, and it is much more coarsely
>> grained than a percpu counter, so this patch is better than percpu.
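>>
>> A rough sketch of the batching idea, in simplified userspace C rather than
>> the real kernel code (in the patch itself, the per-pass record is struct
>> memcg_scanrecord and the fold is done by __mem_cgroup_record_scanstat()
>> under the scanstat spinlock):
>>
>> #include <pthread.h>
>>
>> /* Shared, lock-protected totals (stands in for struct scanstat). */
>> struct scanstats {
>>         pthread_mutex_t lock;
>>         unsigned long scanned, freed;
>> };
>>
>> /* Per-reclaim-pass record, updated with plain increments, no locking. */
>> struct scanrecord {
>>         unsigned long scanned, freed;
>> };
>>
>> /* One lock round-trip per reclaim pass, not one per page or per event. */
>> static void record_pass(struct scanstats *s, const struct scanrecord *r)
>> {
>>         pthread_mutex_lock(&s->lock);
>>         s->scanned += r->scanned;
>>         s->freed   += r->freed;
>>         pthread_mutex_unlock(&s->lock);
>> }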
>>
>>
>> > I've also modified your patch to use per-cpu counters instead of spin-lock
>> > protected counters.  I tested it by doing streaming I/O from a ramdisk:
>> >
>> > $ mke2fs /dev/ram1
>> > $ mkdir /tmp/swapram
>> > $ mkdir /tmp/swapram/ram1
>> > $ mount -t ext2 /dev/ram1 /tmp/swapram/ram1
>> > $ dd if=/dev/urandom of=/tmp/swapram/ram1/file_16m bs=4096 count=4096
>> > $ mkdir /dev/cgroup/memory/1
>> > $ echo 8m > /dev/cgroup/memory/1/memory.limit_in_bytes
>> > $ ./ramdisk_load.sh 7
>> > $ echo $$ > /dev/cgroup/memory/1/tasks
>> > $ time for ((z=0; z<=2000; z++)); do cat /tmp/swapram/ram1/file_16m >
>> > /dev/zero; done
>> >
>> > Where ramdisk_load.sh is:
>> > for ((i=0; i<=$1; i++))
>> > do
>> >   echo $$ >/dev/cgroup/memory/1/tasks
>> >   for ((z=0; z<=2000; z++)); do cat /tmp/swapram/ram1/file_16m >
>> /dev/zero;
>> > done &
>> > done
>> >
>> > Surprisingly, the per-cpu counters perform worse than the spin-lock
>> > protected counters.  Over 10 runs of the test above, the per-cpu counters
>> > were 1.60% slower in both real time and sys time.  I'm wondering if you have
>> > any insight as to why this is.  I can provide my diff against your patch if
>> > necessary.
>> >
>>
>> The percpu counter works effectively only when we use +1/-1 at each change
>> of the counters. It uses "batch" to decide when to merge the per-cpu value
>> into the global counter. I think you used the default "batch" value, but the
>> scan/rotate/free/elapsed values are always larger than "batch", so you just
>> added memory overhead and an extra "if" on top of pure spinlock counters.
>>
>> Determining this "batch" threshold for a percpu counter is difficult.
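>>
>> For reference, this is roughly what the generic percpu counter does (a
>> simplified paraphrase of lib/percpu_counter.c, not the exact code):
>>
>> void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
>> {
>>         s64 count;
>>
>>         preempt_disable();
>>         count = __this_cpu_read(*fbc->counters) + amount;
>>         if (count >= batch || count <= -batch) {
>>                 /* A delta of "batch" or more spills immediately, so the
>>                  * spinlock is taken on every call anyway. */
>>                 spin_lock(&fbc->lock);
>>                 fbc->count += count;
>>                 __this_cpu_write(*fbc->counters, 0);
>>                 spin_unlock(&fbc->lock);
>>         } else {
>>                 __this_cpu_write(*fbc->counters, count);
>>         }
>>         preempt_enable();
>> }
>>
>> With per-pass deltas that already exceed "batch", the else branch is never
>> taken, so every update still pays for fbc->lock plus the per-cpu bookkeeping.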
>>
>> Thanks,
>> -Kame
>>
>>
>>
>

[-- Attachment #2: Type: text/html, Size: 37915 bytes --]

Thread overview: 9+ messages
2011-07-11 10:30 KAMEZAWA Hiroyuki
2011-07-12 23:02 ` Andrew Bresticker
2011-07-14  0:02   ` KAMEZAWA Hiroyuki
2011-07-15 18:34     ` Andrew Bresticker
2011-07-15 20:28       ` Andrew Bresticker [this message]
2011-07-20  6:00         ` KAMEZAWA Hiroyuki
2011-07-20  5:58       ` KAMEZAWA Hiroyuki
2011-07-18 21:00 ` Andrew Bresticker
2011-07-20  6:03   ` KAMEZAWA Hiroyuki
