From: Nitin Gupta <ngupta@vflare.org>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-mm-cc@laptop.org
Subject: Re: [PATCH 1/4] compcache: xvmalloc memory allocator
Date: Tue, 25 Aug 2009 20:22:21 +0530
Message-ID: <4A93FAA5.5000001@vflare.org>
In-Reply-To: <Pine.LNX.4.64.0908242224530.10534@sister.anvils>
On 08/25/2009 03:16 AM, Hugh Dickins wrote:
> On Tue, 25 Aug 2009, Nitin Gupta wrote:
>> On 08/25/2009 02:09 AM, Hugh Dickins wrote:
>>> On Tue, 25 Aug 2009, Nitin Gupta wrote:
>>>> On 08/24/2009 11:03 PM, Pekka Enberg wrote:
>>>>>
>>>>> What's the purpose of passing PFNs around? There's quite a lot of PFN
>>>>> to struct page conversion going on because of it. Wouldn't it make
>>>>> more sense to return (and pass) a pointer to struct page instead?
>>>>
>>>> PFNs are 32-bit on all archs
>>>
>>> Are you sure? If it happens to be so for all machines built today,
>>> I think it can easily change tomorrow. We consistently use unsigned long
>>> for pfn (there, now I've said that, I bet you'll find somewhere we don't!)
>>>
>>> x86_64 says MAX_PHYSMEM_BITS 46 and ia64 says MAX_PHYSMEM_BITS 50 and
>>> mm/sparse.c says
>>> unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
>>>
>>
>> For PFN to exceed 32-bit we need to have physical memory > 16TB (2^32 * 4KB).
>> So, maybe I can simply add a check in ramzswap module load to make sure that
>> RAM is indeed < 16TB and then safely use 32-bit for PFN?
>
> Others know much more about it, but I believe that with sparsemem you
> may be handling vast holes in physical memory: so a relatively small
> amount of physical memory might in part be mapped with gigantic pfns.
>
> So if you go that route, I think you'd rather have to refuse pages
> with oversized pfns (or refuse configurations with any oversized pfns),
> than base it upon the quantity of physical memory in the machine.
>
> Seems ugly to me, as it did to Pekka; but I can understand that you're
> very much in the business of saving memory, so doubling the size of some
> of your tables (I may be oversimplifying) would be repugnant to you.
>
> You could add a CONFIG option, rather like CONFIG_LBDAF, to switch on
> u64-sized pfns; but you'd still have to handle what happens when the
> pfn is too big to fit in u32 without that option; and if distros always
> switch the option on, to accommodate the larger machines, then there may
> have been no point to adding it.
>
Thanks for these details.
Now I understand that using 32-bit PFNs on 64-bit archs is unsafe. So,
there is no option but to include extra bits for PFNs or use struct page.
* Solution for ramzswap block device:
Use 40-bit PFNs (32 + 8) and have a compile-time error to make sure that
MAX_PHYSMEM_BITS is < 40 + PAGE_SHIFT. The ramzswap table can accommodate
40 bits without any increase in table size.
--- ramzswap_new.h	2009-08-25 20:10:38.054033804 +0530
+++ ramzswap.h	2009-08-25 20:09:28.386069100 +0530
@@ -110,9 +110,9 @@
 /* Indexed by page no. */
 struct table {
-	u32 pagenum_1;
+	u32 pagenum;
 	u16 offset;
-	u8 pagenum_2;
+	u8 count;	/* object ref count (not yet used) */
 	u8 flags;
 };
(Removing the 'count' field will hurt later, when we implement
memory defragmentation support.)
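A minimal userspace sketch of the split-PFN idea, only to illustrate it. The
PAGE_SHIFT and MAX_PHYSMEM_BITS values are placeholders (in the kernel they
come from arch headers), the helper names table_set_pfn() and table_get_pfn()
are hypothetical, and the choice of pagenum_1 for the low 32 bits and
pagenum_2 for the high 8 bits is an assumption. The check uses <=, which is
slightly looser than the strict < above.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration; the kernel gets these from arch headers. */
#define PAGE_SHIFT        12
#define MAX_PHYSMEM_BITS  46   /* the x86_64 figure quoted earlier in the thread */
#define TABLE_PFN_BITS    40   /* 32 bits in pagenum_1 + 8 bits in pagenum_2 */

/* Fail the build if a valid PFN could need more than 40 bits. */
_Static_assert(MAX_PHYSMEM_BITS - PAGE_SHIFT <= TABLE_PFN_BITS,
	       "PFN does not fit in the 40 bits available in struct table");

/* Same field sizes as the table entry in the diff above: still 8 bytes. */
struct table {
	uint32_t pagenum_1;	/* assumed: low 32 bits of the PFN */
	uint16_t offset;
	uint8_t  pagenum_2;	/* assumed: high 8 bits of the PFN */
	uint8_t  flags;
};

/* Hypothetical helper: split a PFN across the two fields. */
static inline void table_set_pfn(struct table *t, uint64_t pfn)
{
	t->pagenum_1 = (uint32_t)pfn;
	t->pagenum_2 = (uint8_t)(pfn >> 32);
}

/* Hypothetical helper: reassemble the PFN from the two fields. */
static inline uint64_t table_get_pfn(const struct table *t)
{
	return ((uint64_t)t->pagenum_2 << 32) | t->pagenum_1;
}
```

Every table access then has to go through the two helpers, which is the
price paid for keeping the entry at 8 bytes.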
* Solution for allocator:
Use struct page instead of PFN. This is better than always using 64-bit PFNs
since we get rid of all the casts. Splitting the PFN across 32 + 8 bit fields,
as above, would create too much mess here. However, using struct page increases
per-pool overhead by 4K on 64-bit systems. This should be okay.
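A rough sketch of where an overhead of that size could come from. Everything
here is an assumption made for illustration: the two struct layouts, their
names, and the figure of 512 free-list heads per pool are not taken from the
patch. The point is only that replacing a u32 page number with a pointer
grows each such entry from 8 to 16 bytes on an LP64 system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct page;			/* opaque here; defined by the kernel */

#define NUM_FREE_LISTS 512	/* assumed count, for illustration only */

/* Hypothetical free-list head that identifies the page by PFN. */
struct freelist_entry_pfn {
	uint32_t pagenum;
	uint16_t offset;
	uint16_t pad;
};

/* Hypothetical free-list head that identifies the page by struct page. */
struct freelist_entry_page {
	struct page *page;
	uint16_t offset;
	uint16_t pad;
};

/* Extra bytes per pool when every head switches from PFN to pointer. */
static inline size_t extra_pool_bytes(void)
{
	return NUM_FREE_LISTS * (sizeof(struct freelist_entry_page) -
				 sizeof(struct freelist_entry_pfn));
}
```

On a typical LP64 build this comes to 512 * 8 bytes, i.e. one 4K page per
pool; on 32-bit systems the two layouts have the same size.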
Please let me know if you have any comments. I will make these changes in the
next revision.
There is still some problem with the memory allocator naming. It's no longer a
separate module, the symbols are not exported, and it's now compiled with the
ramzswap block driver itself. So, I am hoping xv_malloc() does not cause any
confusion with any existing name now. It really should not cause any confusion.
I would love to retain this name for the allocator.
Thanks,
Nitin