From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Nitin Gupta <ngupta@vflare.org>,
hongshin@gmail.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 7/9] swap_info: swap count continuations
Date: Fri, 16 Oct 2009 00:53:36 +0100 (BST) [thread overview]
Message-ID: <Pine.LNX.4.64.0910160016160.11643@sister.anvils> (raw)
In-Reply-To: <20091015123024.21ca3ef7.kamezawa.hiroyu@jp.fujitsu.com>
On Thu, 15 Oct 2009, KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Oct 2009 01:56:01 +0100 (BST)
> Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
>
> > This patch implements swap count continuations: when the count overflows,
> > a continuation page is allocated and linked to the original vmalloc'ed
> > map page, and this used to hold the continuation counts for that entry
> > and its neighbours. These continuation pages are seldom referenced:
> > the common paths all work on the original swap_map, only referring to
> > a continuation page when the low "digit" of a count is incremented or
> > decremented through SWAP_MAP_MAX.
>
> Hmm...maybe I don't understand the benefit of this style of data structure.
I can see that what I have there is not entirely transparent!
>
> Do we need fine grain chain ?
> Is array of "unsigned long" counter is bad ? (too big?)
I'll admit that that design just happens to be what first sprang
to my mind. It was only later, while implementing it, that I
wondered, hey, wouldn't it be a lot simpler just to have an
extension array of full counts?
It seemed to me (I'm not certain) that the char arrays I was
implementing were better suited to (use less memory in) a "normal"
workload in which the basic swap_map counts might overflow (but
I wonder how normal is any workload in which they overflow).
Whereas the array of full counts would be better suited to an
"aberrant" workload in which a mischievous user is actually
trying to maximize those counts. I decided to carry on with
the better solution for the (more) normal workload, the solution
less likely to gobble up more memory there than we've used before.
While I agree that the full count implementation would be simpler
and more obviously correct, I thought it was still going to involve
a linked list of pages (but "parallel" rather than "serial": each
of the pages assigned to one range of the base page).
Looking at what you propose below, maybe I'm not getting the details
right, but it looks as if you're having to do an order 2 or order 3
page allocation? Attempted with GFP_ATOMIC? I'd much rather stick
with order 0 pages, even if we do have to chain them to the base.
(Order 3 on 64-bit? A side issue which deterred me from the full
count approach, was the argumentation we'd get into over how big a
full count needs to be. I think, for so long as we have atomic_t
page count and page mapcount, an int is big enough for swap count.
But switching them to atomic_long_t may already be overdue.
Anyway, I liked how the char continuations avoided that issue.)
I'm reluctant to depart from what I have, now that it's tested;
but yes, we could perfectly well replace it by a different design,
it is very self-contained. The demands on this code are unusually
simple: it only has to manage counting up and counting down;
so it is very easily tested.
(The part I found difficult was getting rid of the __GFP_ZERO
I was allocating with originally.)
Hugh
>
> ==
> #define EXTENTION_OFFSET_INDEX(offset) (((offset) & PAGE_MASK)
> #define EXTENTION_OFFSET_MASK (~(PAGE_SIZE/sizeof(long) - 1))
> struct swapcount_extention_array {
> unsigned long *map[EXTEND_MAP_SIZE];
> };
>
> At adding continuation.
>
> int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
> {
> struct page *page;
> unsigned long *newmap, *map;
> struct swapcount_extention_array *array;
>
> newmap = __get_free_page(mask);
> si = swap_info_get(entry);
> array = kmalloc(sizeof(swapcount_extention_array);
>
> ....
> (If overflow)
> page = vmalloc_to_page(si->swap_map + offset);
> if (!PagePrivate(page)) {
> page->praivate = array;
> } else
> kfree(array);
>
> index = EXTENTION_OFFSET_INDEX(offset);
> pos = EXTENTION_OFFSET_MASK(offset);
>
> array = page->private;
> if (!array->map[index]) {
> array->map[index] = newmap;
> } else
> free_page(newmap);
> map = array->map[index];
> map[pos] += 1;
> mappage = vaddr_to_page(map);
> get_page(mappage); # increment page->count of array.
> ==
>
> Hmm? maybe I just don't like chain...
>
> Regards,
> -Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-10-15 23:53 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-15 0:44 [PATCH 0/9] swap_info and swap_map patches Hugh Dickins
2009-10-15 0:46 ` [PATCH 1/9] swap_info: private to swapfile.c Hugh Dickins
2009-10-15 14:57 ` Rik van Riel
2009-10-15 23:10 ` Nigel Cunningham
2009-10-16 0:28 ` Hugh Dickins
2009-10-15 0:48 ` [PATCH 2/9] swap_info: change to array of pointers Hugh Dickins
2009-10-15 2:11 ` KAMEZAWA Hiroyuki
2009-10-15 22:41 ` Hugh Dickins
2009-10-15 23:04 ` Hugh Dickins
2009-10-15 23:47 ` KAMEZAWA Hiroyuki
2009-10-15 23:46 ` KAMEZAWA Hiroyuki
2009-10-15 15:02 ` Rik van Riel
2009-10-15 0:49 ` [PATCH 3/9] swap_info: include first_swap_extent Hugh Dickins
2009-10-15 0:50 ` [PATCH 4/9] swap_info: miscellaneous minor cleanups Hugh Dickins
2009-10-15 2:19 ` KAMEZAWA Hiroyuki
2009-10-15 22:01 ` Hugh Dickins
2009-10-16 0:41 ` [PATCH 4/9 v2] " Hugh Dickins
2009-10-15 0:52 ` [PATCH 5/9] swap_info: SWAP_HAS_CACHE cleanups Hugh Dickins
2009-10-15 2:37 ` KAMEZAWA Hiroyuki
2009-10-15 22:08 ` Hugh Dickins
2009-10-15 0:53 ` [PATCH 6/9] swap_info: swap_map of chars not shorts Hugh Dickins
2009-10-15 2:44 ` KAMEZAWA Hiroyuki
2009-10-15 22:17 ` Hugh Dickins
2009-10-15 23:52 ` KAMEZAWA Hiroyuki
2009-10-15 0:56 ` [PATCH 7/9] swap_info: swap count continuations Hugh Dickins
2009-10-15 3:30 ` KAMEZAWA Hiroyuki
2009-10-15 19:45 ` Andrew Morton
2009-10-15 21:17 ` David Rientjes
2009-10-16 0:21 ` Hugh Dickins
2009-10-15 23:53 ` Hugh Dickins [this message]
2009-10-16 1:29 ` KAMEZAWA Hiroyuki
2009-10-16 2:24 ` Hugh Dickins
2009-10-16 4:06 ` KAMEZAWA Hiroyuki
2009-10-16 4:49 ` Nitin Gupta
2009-10-16 6:30 ` [PATCH] mm: call pte_unmap() against a proper pte (Re: [PATCH 7/9] swap_info: swap count continuations) Daisuke Nishimura
2009-10-16 8:01 ` KAMEZAWA Hiroyuki
2009-10-15 0:57 ` [PATCH 8/9] swap_info: note SWAP_MAP_SHMEM Hugh Dickins
2009-10-15 3:32 ` KAMEZAWA Hiroyuki
2009-10-15 22:23 ` Hugh Dickins
2009-10-16 0:04 ` KAMEZAWA Hiroyuki
2009-10-15 0:58 ` [PATCH 9/9] swap_info: reorder its fields Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0910160016160.11643@sister.anvils \
--to=hugh.dickins@tiscali.co.uk \
--cc=akpm@linux-foundation.org \
--cc=hongshin@gmail.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ngupta@vflare.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox