From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Nitin Gupta <ngupta@vflare.org>,
hongshin@gmail.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 7/9] swap_info: swap count continuations
Date: Fri, 16 Oct 2009 10:29:51 +0900 [thread overview]
Message-ID: <20091016102951.a4f66a19.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0910160016160.11643@sister.anvils>
On Fri, 16 Oct 2009 00:53:36 +0100 (BST)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
> On Thu, 15 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > On Thu, 15 Oct 2009 01:56:01 +0100 (BST)
> > Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
> >
> > > This patch implements swap count continuations: when the count overflows,
> > > a continuation page is allocated and linked to the original vmalloc'ed
> > > map page, and this used to hold the continuation counts for that entry
> > > and its neighbours. These continuation pages are seldom referenced:
> > > the common paths all work on the original swap_map, only referring to
> > > a continuation page when the low "digit" of a count is incremented or
> > > decremented through SWAP_MAP_MAX.
> >
> > Hmm...maybe I don't understand the benefit of this style of data structure.
>
> I can see that what I have there is not entirely transparent!
>
> >
> > Do we need fine grain chain ?
> > Is array of "unsigned long" counter is bad ? (too big?)
>
> I'll admit that that design just happens to be what first sprang
> to my mind. It was only later, while implementing it, that I
> wondered, hey, wouldn't it be a lot simpler just to have an
> extension array of full counts?
>
> It seemed to me (I'm not certain) that the char arrays I was
> implementing were better suited to (use less memory in) a "normal"
> workload in which the basic swap_map counts might overflow (but
> I wonder how normal is any workload in which they overflow).
> Whereas the array of full counts would be better suited to an
> "aberrant" workload in which a mischievous user is actually
> trying to maximize those counts. I decided to carry on with
> the better solution for the (more) normal workload, the solution
> less likely to gobble up more memory there than we've used before.
>
> While I agree that the full count implementation would be simpler
> and more obviously correct, I thought it was still going to involve
> a linked list of pages (but "parallel" rather than "serial": each
> of the pages assigned to one range of the base page).
>
> Looking at what you propose below, maybe I'm not getting the details
> right, but it looks as if you're having to do an order 2 or order 3
> page allocation? Attempted with GFP_ATOMIC? I'd much rather stick
> with order 0 pages, even if we do have to chain them to the base.
>
order-0 allocation per array entry.
1st leve map 2nd level map
map -> array[0] -> map => PAGE_SIZE map.
[1] -> map => PAGE_SIZE map.
...
[7] -> map == NULL if not used.
> (Order 3 on 64-bit? A side issue which deterred me from the full
> count approach, was the argumentation we'd get into over how big a
> full count needs to be. I think, for so long as we have atomic_t
> page count and page mapcount, an int is big enough for swap count.
I see.
> But switching them to atomic_long_t may already be overdue.
> Anyway, I liked how the char continuations avoided that issue.)
>
My concern is that small numbers of swap_map[] which has too much refcnt
can consume too much pages.
If an entry is shared by 65535, 65535/128 = 512 page will be used.
(I'm sorry if I don't undestand implementation correctly.)
> I'm reluctant to depart from what I have, now that it's tested;
> but yes, we could perfectly well replace it by a different design,
> it is very self-contained. The demands on this code are unusually
> simple: it only has to manage counting up and counting down;
> so it is very easily tested.
>
Okay, let's start with this.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-10-16 1:37 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-15 0:44 [PATCH 0/9] swap_info and swap_map patches Hugh Dickins
2009-10-15 0:46 ` [PATCH 1/9] swap_info: private to swapfile.c Hugh Dickins
2009-10-15 14:57 ` Rik van Riel
2009-10-15 23:10 ` Nigel Cunningham
2009-10-16 0:28 ` Hugh Dickins
2009-10-15 0:48 ` [PATCH 2/9] swap_info: change to array of pointers Hugh Dickins
2009-10-15 2:11 ` KAMEZAWA Hiroyuki
2009-10-15 22:41 ` Hugh Dickins
2009-10-15 23:04 ` Hugh Dickins
2009-10-15 23:47 ` KAMEZAWA Hiroyuki
2009-10-15 23:46 ` KAMEZAWA Hiroyuki
2009-10-15 15:02 ` Rik van Riel
2009-10-15 0:49 ` [PATCH 3/9] swap_info: include first_swap_extent Hugh Dickins
2009-10-15 0:50 ` [PATCH 4/9] swap_info: miscellaneous minor cleanups Hugh Dickins
2009-10-15 2:19 ` KAMEZAWA Hiroyuki
2009-10-15 22:01 ` Hugh Dickins
2009-10-16 0:41 ` [PATCH 4/9 v2] " Hugh Dickins
2009-10-15 0:52 ` [PATCH 5/9] swap_info: SWAP_HAS_CACHE cleanups Hugh Dickins
2009-10-15 2:37 ` KAMEZAWA Hiroyuki
2009-10-15 22:08 ` Hugh Dickins
2009-10-15 0:53 ` [PATCH 6/9] swap_info: swap_map of chars not shorts Hugh Dickins
2009-10-15 2:44 ` KAMEZAWA Hiroyuki
2009-10-15 22:17 ` Hugh Dickins
2009-10-15 23:52 ` KAMEZAWA Hiroyuki
2009-10-15 0:56 ` [PATCH 7/9] swap_info: swap count continuations Hugh Dickins
2009-10-15 3:30 ` KAMEZAWA Hiroyuki
2009-10-15 19:45 ` Andrew Morton
2009-10-15 21:17 ` David Rientjes
2009-10-16 0:21 ` Hugh Dickins
2009-10-15 23:53 ` Hugh Dickins
2009-10-16 1:29 ` KAMEZAWA Hiroyuki [this message]
2009-10-16 2:24 ` Hugh Dickins
2009-10-16 4:06 ` KAMEZAWA Hiroyuki
2009-10-16 4:49 ` Nitin Gupta
2009-10-16 6:30 ` [PATCH] mm: call pte_unmap() against a proper pte (Re: [PATCH 7/9] swap_info: swap count continuations) Daisuke Nishimura
2009-10-16 8:01 ` KAMEZAWA Hiroyuki
2009-10-15 0:57 ` [PATCH 8/9] swap_info: note SWAP_MAP_SHMEM Hugh Dickins
2009-10-15 3:32 ` KAMEZAWA Hiroyuki
2009-10-15 22:23 ` Hugh Dickins
2009-10-16 0:04 ` KAMEZAWA Hiroyuki
2009-10-15 0:58 ` [PATCH 9/9] swap_info: reorder its fields Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091016102951.a4f66a19.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=hongshin@gmail.com \
--cc=hugh.dickins@tiscali.co.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ngupta@vflare.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox