From: Andy Whitcroft <apw@shadowen.org>
To: KUROSAWA Takahiro <kurosawa@valinux.co.jp>
Cc: ckrm-tech@lists.sourceforge.net, linux-mm@kvack.org
Subject: Re: [PATCH 1/2] Add the pzone
Date: Thu, 19 Jan 2006 18:04:43 +0000 [thread overview]
Message-ID: <43CFD4BB.4070704@shadowen.org> (raw)
In-Reply-To: <20060119080413.24736.27946.sendpatchset@debian>
KUROSAWA Takahiro wrote:
> This patch implements the pzone (pseudo zone). A pzone can be used
> for reserving pages in a zone. Pzones are implemented by extending
> the zone structure and act almost the same as the conventional zones;
> we can specify pzones in a zonelist for __alloc_pages() and the vmscan
> code works on pzones with few modifications.
>
> Signed-off-by: KUROSAWA Takahiro <kurosawa@valinux.co.jp>
[...]
> -/* Page flags: | [SECTION] | [NODE] | ZONE | ... | FLAGS | */
> -#define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH)
> +/* Page flags: | [PZONE] | [SECTION] | [NODE] | ZONE | ... | FLAGS | */
> +#define PZONE_BIT_PGOFF ((sizeof(unsigned long)*8) - PZONE_BIT_WIDTH)
> +#define SECTIONS_PGOFF (PZONE_BIT_PGOFF - SECTIONS_WIDTH)
> #define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH)
> #define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH)
In general this PZONE bit is really a part of the zone number. Much of
the order of these bits is chosen to obtain the cheapest extraction of
the most used bits, particularly the node/zone conbination or section
number on the left. I would say put the PZONE_BIT next to ZONE
probabally to the right of it? [See below for more reasons to put it
there.]
> @@ -431,6 +438,7 @@ void put_page(struct page *page);
> * sections we define the shift as 0; that plus a 0 mask ensures
> * the compiler will optimise away reference to them.
> */
> +#define PZONE_BIT_PGSHIFT (PZONE_BIT_PGOFF * (PZONE_BIT_WIDTH != 0))
> #define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0))
> #define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0))
> #define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0))
> @@ -443,10 +451,11 @@ void put_page(struct page *page);
> #endif
> #define ZONETABLE_PGSHIFT ZONES_PGSHIFT
>
> -#if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
> -#error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
> +#if PZONE_BIT_WIDTH+SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
> +#error PZONE_BIT_WIDTH+SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
> #endif
Do we have any bits left in the reserve on 32 bit machines? The reserve
at last look was only 8 bits and there was little if any headroom in the
rest of the flags word to extend it; if memory serves at least 22 of the
24 remaining bits was accounted for. Has this been tested on any such
machines?
> +#define PZONE_BIT_MASK ((1UL << PZONE_BIT_WIDTH) - 1)
> #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1)
> #define NODES_MASK ((1UL << NODES_WIDTH) - 1)
> #define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1)
[...]
> +#ifdef CONFIG_PSEUDO_ZONE
> +static inline int page_in_pzone(struct page *page)
> +{
> + return (page->flags >> PZONE_BIT_PGSHIFT) & PZONE_BIT_MASK;
> +}
> +
> +static inline struct zone *page_zone(struct page *page)
> +{
> + int idx;
> +
> + idx = (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
> + if (page_in_pzone(page))
> + return pzone_table[idx].zone;
> + return zone_table[idx];
> +}
Could we not do this all without changing the structure of the zone
table at all by placing the PZONE_BIT either to the left or right of the
ZONE in the flags. Then ZONETABLE_MASK could be extended to cover it
when pzone's are enabled and the pzone's could be added to zone_table
instead of their own pzone_table? This would mean the code above could
be unmodified and much simpler.
> +
> +static inline unsigned long page_to_nid(struct page *page)
> +{
> + return page_zone(page)->zone_pgdat->node_id;
> +}
[...]
> +#ifdef CONFIG_PSEUDO_ZONE
> +#define MAX_NR_PZONES 1024
You seem to be allowing for 1024 pzone's here? But in
pzone_setup_page_flags() you place the pzone_idx (an offset into the
pzone_table) into the ZONE field of the page flags. This field is
typically only two bits wide? I don't see this being increased in this
patch, nor is there space for it generally to get much bigger not on 32
bit kernels anyhow (see comments about bits earlier)?
#define ZONES_SHIFT 2 /* ceil(log2(MAX_NR_ZONES)) */
Cheers.
-apw
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-01-19 18:04 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-01-19 8:04 [PATCH 0/2] Pzone based CKRM memory resource controller KUROSAWA Takahiro
2006-01-19 8:04 ` [PATCH 1/2] Add the pzone KUROSAWA Takahiro
2006-01-19 18:04 ` Andy Whitcroft [this message]
2006-01-19 23:42 ` KUROSAWA Takahiro
2006-01-20 9:17 ` Andy Whitcroft
2006-01-20 7:08 ` KAMEZAWA Hiroyuki
2006-01-20 8:22 ` KUROSAWA Takahiro
2006-01-20 8:30 ` KAMEZAWA Hiroyuki
2006-01-19 8:04 ` [PATCH 2/2] Add CKRM memory resource controller using pzones KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 0/8] Pzone based CKRM memory resource controller KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 1/8] Add the __GFP_NOLRU flag KUROSAWA Takahiro
2006-01-31 18:18 ` [ckrm-tech] " Dave Hansen
2006-02-01 5:06 ` KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 2/8] Keep the number of zones while zone iterator loop KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 3/8] Add for_each_zone_in_node macro KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 4/8] Extract zone specific routines as functions KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 5/8] Add the pzone_create() function KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 6/8] Add the pzone_destroy() function KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 7/8] Make the number of pages in pzones resizable KUROSAWA Takahiro
2006-01-31 2:30 ` [PATCH 8/8] Add a CKRM memory resource controller using pzones KUROSAWA Takahiro
2006-02-01 2:58 ` [ckrm-tech] [PATCH 0/8] Pzone based CKRM memory resource controller chandra seetharaman
2006-02-01 5:39 ` KUROSAWA Takahiro
2006-02-01 6:16 ` Hirokazu Takahashi
2006-02-02 1:26 ` chandra seetharaman
2006-02-02 3:54 ` KUROSAWA Takahiro
2006-02-03 0:37 ` chandra seetharaman
2006-02-03 0:51 ` KUROSAWA Takahiro
2006-02-03 1:01 ` chandra seetharaman
2006-02-01 3:07 ` chandra seetharaman
2006-02-01 5:54 ` KUROSAWA Takahiro
2006-02-03 1:33 ` KUROSAWA Takahiro
2006-02-03 9:37 ` KUROSAWA Takahiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43CFD4BB.4070704@shadowen.org \
--to=apw@shadowen.org \
--cc=ckrm-tech@lists.sourceforge.net \
--cc=kurosawa@valinux.co.jp \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox