linux-mm.kvack.org archive mirror
* Inconsistency (bug) of vm_insert_page with high order allocations
@ 2009-05-28  5:07 Alexey Korolev
       [not found] ` <20090528143524.e8a2cde7.kamezawa.hiroyu@jp.fujitsu.com>
  2009-05-28  9:59 ` Mel Gorman
  0 siblings, 2 replies; 8+ messages in thread
From: Alexey Korolev @ 2009-05-28  5:07 UTC (permalink / raw)
  To: linux-mm; +Cc: greg, vijaykumar

Hi,
I have the following issue: I need to allocate a big chunk of
contiguous memory and then hand it to user-mode applications so that
they can operate on the buffers.

To allocate the memory I use the standard function alloc_pages(gfp_mask,
order), which asks the buddy allocator for a chunk of memory of the given
"order".
The allocator returns the first page and sets its page count to 1, but it
does so only for that first page of the high-order block. I.e. pages 2, 3,
etc. inside the high-order allocation have page->_count == 0.
If I try to mmap the allocated area to user space, vm_insert_page returns
an error because pages 2, 3, etc. are not refcounted.
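
A minimal userspace sketch of the behaviour described above. It is a toy
model, not kernel code: struct toy_page, toy_alloc_pages(),
toy_vm_insert_page() and toy_count_rejected() are hypothetical names, and
the -1 return value only stands in for whatever error the kernel reports.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
    int count;
};

/*
 * Model of a high-order allocation as described in the message: only the
 * first page gets a reference, the other (1 << order) - 1 pages stay at 0.
 */
static void toy_alloc_pages(struct toy_page *pages, unsigned int order)
{
    size_t nr = (size_t)1 << order;

    for (size_t i = 0; i < nr; i++)
        pages[i].count = 0;
    pages[0].count = 1;
}

/* Model of the check: a page with a zero count is rejected. */
static int toy_vm_insert_page(struct toy_page *page)
{
    if (page->count == 0)
        return -1;      /* stands in for the kernel's error code */
    page->count++;      /* the new mapping holds its own reference */
    return 0;
}

/* Try to insert every page of the block; return how many were rejected. */
static size_t toy_count_rejected(unsigned int order)
{
    struct toy_page pages[64];
    size_t nr = (size_t)1 << order;
    size_t rejected = 0;

    assert(nr <= 64);   /* keep the toy array small */
    toy_alloc_pages(pages, order);
    for (size_t i = 0; i < nr; i++)
        if (toy_vm_insert_page(&pages[i]) != 0)
            rejected++;
    return rejected;
}
```

In this model an order-0 allocation maps cleanly, while an order-2
allocation has every page except the first rejected.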

The issue can be worked around by setting the refcount to 1 manually for
each page. But this workaround is not very good, because the page refcount
is meant to be used only inside the mm subsystem.

While searching the kernel tree for a driver with a similar approach, I
found one that suffers from exactly the same problem ("poch"). So it is
not an isolated problem.

What could you suggest to work around the problem, other than hacks with the page count?
Maybe it makes sense to introduce a vm_insert_pages function?

In this case users would have the following picture:
zero order page: alloc_page <-> vm_insert_page
non zero order  : alloc_pages(..., order) <-> vm_insert_pages(...., order)

Thanks,
Alexey

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
       [not found] ` <20090528143524.e8a2cde7.kamezawa.hiroyu@jp.fujitsu.com>
@ 2009-05-28  7:02   ` Alexey Korolev
       [not found]     ` <20090528162108.a6adcc36.kamezawa.hiroyu@jp.fujitsu.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Korolev @ 2009-05-28  7:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, greg, vijaykumar

Hi

>> Hi,
>> ...
>> What could you suggest to work around the problem, other than hacks with the page count?
>> Maybe it makes sense to introduce a vm_insert_pages function?
>>
>
>
> Maybe the following are meant for drivers?
>
> void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
> void free_pages_exact(void *virt, size_t size)
>
Hmm. These functions were developed for video drivers, to avoid
allocating extra memory; page splitting is a side effect of using
them.
They should be OK for the UMA case.
The only problem is that the driver I'm writing now also has to support
NUMA node selection. In that case alloc_pages_exact won't help :(.
What would be the best way to resolve the existing inconsistency? Any ideas?
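
A small userspace sketch of the arithmetic behind an "exact" allocation:
round the size up to a power-of-two order, split the block, and hand back
the pages past the requested size. This is an illustration only. The
4096-byte page size and the names toy_get_order(), toy_pages_needed() and
toy_pages_returned() are assumptions, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL    /* assumed page size for the example */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned int toy_get_order(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while ((TOY_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Pages actually needed for size bytes, rounded up to whole pages. */
static size_t toy_pages_needed(size_t size)
{
    return (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Pages that can be given back once the high-order block is split. */
static size_t toy_pages_returned(size_t size)
{
    size_t allocated = (size_t)1 << toy_get_order(size);

    return allocated - toy_pages_needed(size);
}
```

For a 5-page request the model allocates an order-3 block (8 pages) and
returns 3 of them; the returned pages can only be freed one at a time
because the block was split first.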

Thanks,
Alexey


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
  2009-05-28  5:07 Inconsistency (bug) of vm_insert_page with high order allocations Alexey Korolev
       [not found] ` <20090528143524.e8a2cde7.kamezawa.hiroyu@jp.fujitsu.com>
@ 2009-05-28  9:59 ` Mel Gorman
  2009-05-30  5:27   ` Alexey Korolev
  1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2009-05-28  9:59 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: linux-mm, greg, vijaykumar

On Thu, May 28, 2009 at 05:07:01PM +1200, Alexey Korolev wrote:
> Hi,
> I have the following issue. I need to allocate a big chunk of
> contiguous memory and then transfer it to user mode applications to
> let them operate with given buffers.
> 
> To allocate memory I use standard function alloc_pages(gfp_mask,
> order) which asks buddy allocator to give a chunk of memory of given
> "order".
> Allocator returns page and also sets page count to 1 but for page of
> high order. I.e. pages 2,3 etc inside high order allocation will have
> page->_count==0.
> If I try to mmap allocated area to user space vm_insert_page will
> return error as pages 2,3, etc are not refcounted.
> 

page = alloc_pages(gfp_mask, high_order);
split_page(page, high_order);

That will fix up the ref-counting of each of the individual pages. You are
then responsible for freeing them individually. As you are inserting these
into userspace, I suspect that's ok.
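
A minimal userspace sketch of the suggestion above: after the split every
page carries its own reference and is released on its own. It is a toy
model under stated assumptions, not the kernel implementation; struct
toy_page and all toy_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
    int count;      /* reference count */
    int freed;      /* set once the page has gone back to the allocator */
};

/* High-order allocation: only the first page is referenced. */
static void toy_alloc_pages(struct toy_page *pages, unsigned int order)
{
    size_t nr = (size_t)1 << order;

    for (size_t i = 0; i < nr; i++) {
        pages[i].count = 0;
        pages[i].freed = 0;
    }
    pages[0].count = 1;
}

/* Model of the split: give every remaining page its own reference. */
static void toy_split_page(struct toy_page *pages, unsigned int order)
{
    size_t nr = (size_t)1 << order;

    assert(pages[0].count == 1);
    for (size_t i = 1; i < nr; i++)
        pages[i].count = 1;
}

/* Drop one reference; the page is freed when the count reaches zero. */
static void toy_put_page(struct toy_page *page)
{
    assert(page->count > 0);
    if (--page->count == 0)
        page->freed = 1;
}

/* Allocate, split, then free page by page; return how many were freed. */
static size_t toy_split_and_free(unsigned int order)
{
    struct toy_page pages[64];
    size_t nr = (size_t)1 << order;
    size_t freed = 0;

    assert(nr <= 64);   /* keep the toy array small */
    toy_alloc_pages(pages, order);
    toy_split_page(pages, order);
    for (size_t i = 0; i < nr; i++)
        toy_put_page(&pages[i]);
    for (size_t i = 0; i < nr; i++)
        freed += (size_t)pages[i].freed;
    return freed;
}
```

The point of the model is the ownership change: once split, the block is
no longer freed as one order-N unit but as 1 << order separate pages.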

> The issue could be workaround if to set-up refcount to 1 manually for
> each page. But this workaround is not very good, because page refcount
> is used inside mm subsystem only.
> 

And you would have reimplemented split_page().

> While searching a driver with the similar solutions in kernel tree it
> was found a driver which suffers from exactly the same
> problem("poch"). So it is not single problem.
> 
> What you could suggest to workaround the problem except hacks with page count?
> Maybe it makes sense to introduce a vm_insert_pages function?
> 
> In this case users would have the following picture:
> zero order page: alloc_page <-> vm_insert_page
> non zero order  : alloc_pages(..., order) <-> vm_insert_pages(...., order)
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
  2009-05-28  9:59 ` Mel Gorman
@ 2009-05-30  5:27   ` Alexey Korolev
  2009-06-02  8:38     ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Korolev @ 2009-05-30  5:27 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, greg, vijaykumar

Hi,
>> To allocate memory I use standard function alloc_pages(gfp_mask,
>> order) which asks buddy allocator to give a chunk of memory of given
>> "order".
>> Allocator returns page and also sets page count to 1 but for page of
>> high order. I.e. pages 2,3 etc inside high order allocation will have
>> page->_count==0.
>> If I try to mmap allocated area to user space vm_insert_page will
>> return error as pages 2,3, etc are not refcounted.
>>
>
> page = alloc_pages(high_order);
> split_page(page, high_order);
>
> That will fix up the ref-counting of each of the individual pages. You are
> then responsible for freeing them individually. As you are inserting these
> into userspace, I suspect that's ok.

It seems this is the only option I have for now. It is not very elegant, but it should work.
Thanks for the good advice.

BTW: just out of curiosity, what limits mapping high-order pages into
user space? I tried to find anything other than the check in
vm_insert_page but failed. Are these checks there because of possible swapping?

Thanks,
Alexey


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
       [not found]     ` <20090528162108.a6adcc36.kamezawa.hiroyu@jp.fujitsu.com>
@ 2009-05-30  5:42       ` Alexey Korolev
  2009-06-01 23:53         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Korolev @ 2009-05-30  5:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, greg, vijaykumar

Kame San,

Thank you for your answers. I've decided to use the split_page function.
>
>  - write a patch for adding alloc_page_exact_nodemask()  // this is not difficult.
>  - explain why you need this.
>  - discuss.
>
Writing the patch is not difficult, but it will be hard to explain why
it is needed in the kernel...
> IMHO, considering other mmap/munmap/zap_pte, etc... page_count() and page_mapcount()
> should be controlled per pte. Then, you'll have to map pages one by one.
>
This is quite interesting. I tried to understand this code, but it is
quite complicated. I clearly understand why pages have to be mapped one
by one, but I don't understand how the counters relate to this. (It is
just a curiosity question; I won't be upset if no one answers it.)

Thanks,
Alexey


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
  2009-05-30  5:42       ` Alexey Korolev
@ 2009-06-01 23:53         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-01 23:53 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: linux-mm, greg, vijaykumar

On Sat, 30 May 2009 17:42:35 +1200
Alexey Korolev <akorolex@gmail.com> wrote:

> Kame San,
> 
> Thank you for your answers. I've decided to use the split_page function.
> >
> >  - write a patch for adding alloc_page_exact_nodemask()  // this is not difficult.
> >  - explain why you need this.
> >  - discuss.
> >
> Writing the patch is not difficult, but it will be hard to explain why
> it is needed in the kernel...
> > IMHO, considering other mmap/munmap/zap_pte, etc... page_count() and page_mapcount()
> > should be controlled per pte. Then, you'll have to map pages one by one.
> >
> This is quite interesting. I tried to understand this code but it is
> much complicated. I clearly understand why pages have to be mapped one
> by one. By I don't understand how counters relate to this. (it is just
> a curiosity question - I won't be upset if no one answer it)
> 
The kernel/CPU cannot change multiple PTEs/TLB entries at once. So if
mapcount/count were not kept per PTE, there would be a racy "partially
mapped/unmapped" state. That is bad, and we would need a complicated
synchronization technique to map/unmap multiple PTEs/TLB entries at once.
It seems impossible (and not worth trying).

Thanks
-Kame


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
  2009-05-30  5:27   ` Alexey Korolev
@ 2009-06-02  8:38     ` Mel Gorman
  2009-06-03  5:58       ` Alexey Korolev
  0 siblings, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2009-06-02  8:38 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: linux-mm, greg, vijaykumar

On Sat, May 30, 2009 at 05:27:15PM +1200, Alexey Korolev wrote:
> Hi,
> >> To allocate memory I use standard function alloc_pages(gfp_mask,
> >> order) which asks buddy allocator to give a chunk of memory of given
> >> "order".
> >> Allocator returns page and also sets page count to 1 but for page of
> >> high order. I.e. pages 2,3 etc inside high order allocation will have
> >> page->_count==0.
> >> If I try to mmap allocated area to user space vm_insert_page will
> >> return error as pages 2,3, etc are not refcounted.
> >>
> >
> > page = alloc_pages(high_order);
> > split_page(page, high_order);
> >
> > That will fix up the ref-counting of each of the individual pages. You are
> > then responsible for freeing them individually. As you are inserting these
> > into userspace, I suspect that's ok.
> 
> It seems it is the only way I have now. It is not so elegant - but should work.
> Thanks for good advise.
> 
> BTW: Just out of curiosity what limits mapping high ordered pages into
> user space. I tried to find any except the check in vm_insert but
> failed. Is this checks caused by possible swapping?
> 

Nothing limits it as such other than it's usually not required. There is
nothing really that special about high-order pages other than they are
physically contiguous. The expectation is normally that userspace does
not care about physical contiguity.

There is expected to be a 1 to 1 mapping of PTE to ref-counted pages so that
they get freed at the right times so it's not just about swapping.
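
A minimal userspace sketch of that 1-to-1 rule: each PTE holds one
reference on the single page it maps, so a page is freed exactly when its
last user goes away, independently of its neighbours. It is a toy model,
not kernel code; struct toy_page, struct toy_pte and the toy_* functions
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
    int count;      /* reference count */
    int freed;      /* set when the last reference is dropped */
};

/* A toy PTE either points at one page or is empty. */
struct toy_pte {
    struct toy_page *page;
};

static void toy_put_page(struct toy_page *page)
{
    assert(page->count > 0);
    if (--page->count == 0)
        page->freed = 1;
}

/* Mapping takes one reference on behalf of the PTE. */
static int toy_map(struct toy_pte *pte, struct toy_page *page)
{
    if (page->count == 0)
        return -1;      /* unreferenced pages cannot be mapped */
    page->count++;
    pte->page = page;
    return 0;
}

/* Unmapping drops exactly the reference that the PTE held. */
static void toy_unmap(struct toy_pte *pte)
{
    if (pte->page != NULL) {
        toy_put_page(pte->page);
        pte->page = NULL;
    }
}

/*
 * nr split pages are mapped one PTE each, the driver then drops its own
 * references, and the first `unmapped` PTEs are torn down. Returns the
 * number of pages freed at that point.
 */
static size_t toy_freed_after_unmap(size_t nr, size_t unmapped)
{
    struct toy_page pages[16];
    struct toy_pte ptes[16];
    size_t freed = 0;

    assert(nr <= 16 && unmapped <= nr);
    for (size_t i = 0; i < nr; i++) {
        pages[i].count = 1;     /* driver's reference after the split */
        pages[i].freed = 0;
        ptes[i].page = NULL;
        if (toy_map(&ptes[i], &pages[i]) != 0)
            return (size_t)-1;
    }
    for (size_t i = 0; i < nr; i++)
        toy_put_page(&pages[i]);    /* driver lets go; PTE refs remain */
    for (size_t i = 0; i < unmapped; i++)
        toy_unmap(&ptes[i]);
    for (size_t i = 0; i < nr; i++)
        freed += (size_t)pages[i].freed;
    return freed;
}
```

With four pages mapped, unmapping one PTE frees exactly one page in the
model; the other three stay alive until their own PTEs go away.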

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: Inconsistency (bug) of vm_insert_page with high order allocations
  2009-06-02  8:38     ` Mel Gorman
@ 2009-06-03  5:58       ` Alexey Korolev
  0 siblings, 0 replies; 8+ messages in thread
From: Alexey Korolev @ 2009-06-03  5:58 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, greg, vijaykumar

Mel, Kame San,

Thanks a lot for your answers and good advice. It is now more or less
clear why the counting needs to be per page.
The code which splits pages works fine, no issues.

