From: Gavin Shan <shangw@linux.vnet.ibm.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH v2] MM: check limit while deallocating bootmem node
Date: Thu, 3 May 2012 16:35:06 +0800 [thread overview]
Message-ID: <20120503083506.GA19924@shangw> (raw)
In-Reply-To: <20120503074708.GA31780@cmpxchg.org>
>> For the particular bootmem node, the minimal and maximal PFN (
>> Page Frame Number) have been traced in the instance of "struct
>> bootmem_data_t". On current implementation, the maximal PFN isn't
>> checked while deallocating a bunch (BITS_PER_LONG) of page frames.
>> So the current implementation won't work if the maximal PFN isn't
>> aligned with BITS_PER_LONG.
>
>That's not true, given how the bitmap works, see my previous mail.
>
>> The patch will check the maximal PFN of the given bootmem node.
>> Also, we needn't check all the bits map when the starting PFN isn't
>> BITS_PER_LONG aligned.
>
>Actually, it's musn't. I just realized that this code is totally
>buggy :(
>
>vec is an aligned chunk of memory that start is pointing into. If
>start is not aligned, we check the bitmap at the wrong offset
>throughout the loop.
>
>Your skipping the unaligned bits is not an optimization, it's a
>bugfix.
>
>> Actually, we should start from the offset
>> of the bits map, which indicated by the starting PFN. By the way,
>> V2 patch removed the duplicate check according to comments from
>> Johannes Weiner.
>>
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>> mm/bootmem.c | 7 +++++--
>> 1 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/bootmem.c b/mm/bootmem.c
>> index 5a04536..b4f3ba5 100644
>> --- a/mm/bootmem.c
>> +++ b/mm/bootmem.c
>> @@ -201,9 +201,11 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
>> count += BITS_PER_LONG;
>> start += BITS_PER_LONG;
>> } else {
>> - unsigned long off = 0;
>> + unsigned long cursor = start;
>> + unsigned long off = cursor & (BITS_PER_LONG - 1);
>>
>> - while (vec && off < BITS_PER_LONG) {
>> + vec >>= off;
>> + while (vec) {
>> if (vec & 1) {
>> page = pfn_to_page(start + off);
>
>I don't understand this.
>
>start + (start & (BITS_PER_LONG - 1)) ?
>
Yeah, It has calculated the PFN wrongly here. Actually, "cursor"
should be used here.
>> __free_pages_bootmem(page, 0);
>> @@ -211,6 +213,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
>> }
>> vec >>= 1;
>> off++;
>> + cursor++;
>
>cursor is not really used anywhere.
>
>> start = ALIGN(start + 1, BITS_PER_LONG);
>> }
>> --
>
>I think all we need to add is
>
> vec >>= start & (BITS_PER_LONG - 1);
>
>before the loop, and call the patch a bugfix rather than an
>optimization.
>
>And removing the off < BITS_PER_LONG check should probably be a
>separate change with its own explanation then, to not have unrelated
>cleanup and error potential in a bugfix patch.
>
>How about this:
>
>---
>From: Gavin Shan <shangw@linux.vnet.ibm.com>
>Subject: [patch 1/2] mm: bootmem: fix checking the bitmap when finally
> freeing bootmem
>
>When bootmem releases an unaligned chunk of memory at the beginning of
>a node to the page allocator, it iterates from that unaligned PFN but
>checks an aligned word of the page bitmap. The checked bits do not
>correspond to the PFNs and, as a result, reserved pages can be freed.
>
>Properly shift the bitmap word so that the lowest bit corresponds to
>the starting PFN before entering the freeing loop.
>
Thanks for changing it correctly, Johannes ;-)
>Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>---
> mm/bootmem.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/mm/bootmem.c b/mm/bootmem.c
>index 0131170..67872fc 100644
>--- a/mm/bootmem.c
>+++ b/mm/bootmem.c
>@@ -203,6 +203,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
> } else {
> unsigned long off = 0;
>
>+ vec >>= start & (BITS_PER_LONG - 1);
> while (vec && off < BITS_PER_LONG) {
I think it can be changed to "while (vec) {" since it's duplicate
check. "vec" has no chance to have more bits than BITS_PER_LONG here.
Others look good. Need I change it accordingly and send it out
again?
> if (vec & 1) {
> page = pfn_to_page(start + off);
Thanks,
Gavin
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-05-03 8:35 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-03 1:31 Gavin Shan
2012-05-03 7:47 ` Johannes Weiner
2012-05-03 8:35 ` Gavin Shan [this message]
2012-05-03 8:50 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120503083506.GA19924@shangw \
--to=shangw@linux.vnet.ibm.com \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox