* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-29 23:39 ` Andrew Morton
@ 2012-11-30 0:04 ` Zach Brown
2012-11-30 3:39 ` Lin Feng
2012-11-30 3:42 ` Lin Feng
2012-11-30 10:57 ` Mel Gorman
2 siblings, 1 reply; 19+ messages in thread
From: Zach Brown @ 2012-11-30 0:04 UTC (permalink / raw)
To: Andrew Morton
Cc: Lin Feng, viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl,
mgorman, minchan, isimatu.yasuaki, laijs, wency, tangchen,
linux-fsdevel, linux-aio, linux-mm, linux-kernel
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
In the aio case it seems like it could be taught to populate the mapping
with non-movable pages to begin with. It's calling get_user_pages() a
few lines after instantiating the mapping itself with do_mmap_pgoff().
- z
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 0:04 ` Zach Brown
@ 2012-11-30 3:39 ` Lin Feng
0 siblings, 0 replies; 19+ messages in thread
From: Lin Feng @ 2012-11-30 3:39 UTC (permalink / raw)
To: Zach Brown
Cc: Andrew Morton, viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl,
mgorman, minchan, isimatu.yasuaki, laijs, wency, tangchen,
linux-fsdevel, linux-aio, linux-mm, linux-kernel
Hi Zach,
Thanks for your advice. Agreed, I will look into making aio
use non-movable pages.
Thanks,
linfeng
On 11/30/2012 08:04 AM, Zach Brown wrote:
>> The best I can think of is to make changes in or around
>> get_user_pages(), to steal the pages from userspace and replace them
>> with non-movable ones before pinning them. The performance cost of
>> something like this would surely be unacceptable for direct-io, but
>> maybe OK for the aio ring and futexes.
>
> In the aio case it seems like it could be taught to populate the mapping
> with non-movable pages to begin with. It's calling get_user_pages() a
> few lines after instantiating the mapping itself with do_mmap_pgoff().
>
> - z
>
--
--------------------------------------------------
Lin Feng
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road,
Nanjing, 210012, China
PHONE:+86-25-86630566-8557
COINS:7998-8557
FAX:+86-25-83317685
MAIL:linfeng@cn.fujitsu.com
--------------------------------------------------
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-29 23:39 ` Andrew Morton
2012-11-30 0:04 ` Zach Brown
@ 2012-11-30 3:42 ` Lin Feng
2012-11-30 5:57 ` Andrew Morton
2012-11-30 10:57 ` Mel Gorman
2 siblings, 1 reply; 19+ messages in thread
From: Lin Feng @ 2012-11-30 3:42 UTC (permalink / raw)
To: Andrew Morton
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
hi Andrew,
On 11/30/2012 07:39 AM, Andrew Morton wrote:
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
The offline retry timeout is 2 minutes, so O_DIRECT pages
are probably not a problem for the moment.
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
Thanks for your advice.
I want to keep the impact as small as possible. As mentioned above,
direct-io does not seem to be a problem, so we needn't touch it. Maybe we
can just change how get_user_pages() is used (in or around it) for cases
such as the aio ring pages. I will try to find a way to do this.
Thanks,
linfeng
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 3:42 ` Lin Feng
@ 2012-11-30 5:57 ` Andrew Morton
2012-11-30 7:01 ` Lin Feng
2012-11-30 7:13 ` Kamezawa Hiroyuki
0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2012-11-30 5:57 UTC (permalink / raw)
To: Lin Feng
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
> hi Andrew,
>
> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > Tricky.
> >
> > I expect the same problem would occur with pages which are under
> > O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > periods, but the durations could still be lengthy (seconds).
> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
> seem maybe not a problem for the moment.
> >
> > Worse is a futex page, which could easily remain pinned indefinitely.
> >
> > The best I can think of is to make changes in or around
> > get_user_pages(), to steal the pages from userspace and replace them
> > with non-movable ones before pinning them. The performance cost of
> > something like this would surely be unacceptable for direct-io, but
> > maybe OK for the aio ring and futexes.
> thanks for your advice.
> I want to limit the impact as little as possible, as mentioned above,
> direct-io seems not a problem, we needn't touch them. Maybe we can
> just change the use of get_user_pages()(in or around) such as aio
> ring pages. I will try to find a way to do this.
What about futexes?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 5:57 ` Andrew Morton
@ 2012-11-30 7:01 ` Lin Feng
2012-11-30 7:55 ` Andrew Morton
2012-11-30 7:13 ` Kamezawa Hiroyuki
1 sibling, 1 reply; 19+ messages in thread
From: Lin Feng @ 2012-11-30 7:01 UTC (permalink / raw)
To: Andrew Morton
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On 11/30/2012 01:57 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
>> seem maybe not a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them. The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> thanks for your advice.
>> I want to limit the impact as little as possible, as mentioned above,
>> direct-io seems not a problem, we needn't touch them. Maybe we can
>> just change the use of get_user_pages()(in or around) such as aio
>> ring pages. I will try to find a way to do this.
>
> What about futexes?
hi Andrew,
Yes, it would be better to find an approach that solves them all.
But I'm worried that if we simply restrict get_user_pages() to
non-movable pages, it will soon drain the non-movable pages, because
there are many callers of get_user_pages(), such as some drivers.
IMHO in most cases get_user_pages() callers should release the pages soon,
so pages allocated from the movable zone should be OK. But I'm not sure
whether any such rule is imposed on get_user_pages() callers.
In the other cases we would tell get_user_pages() to allocate pages from
a non-movable zone.
So could we add a zone-alloc flag when we call get_user_pages()?
Thanks,
linfeng
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 7:01 ` Lin Feng
@ 2012-11-30 7:55 ` Andrew Morton
2012-11-30 10:29 ` Lin Feng
2012-11-30 11:00 ` Mel Gorman
0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2012-11-30 7:55 UTC (permalink / raw)
To: Lin Feng
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
>
> On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
> >
> >> hi Andrew,
> >>
> >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> >>> Tricky.
> >>>
> >>> I expect the same problem would occur with pages which are under
> >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> >>> periods, but the durations could still be lengthy (seconds).
> >> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
> >> seem maybe not a problem for the moment.
> >>>
> >>> Worse is a futex page, which could easily remain pinned indefinitely.
> >>>
> >>> The best I can think of is to make changes in or around
> >>> get_user_pages(), to steal the pages from userspace and replace them
> >>> with non-movable ones before pinning them. The performance cost of
> >>> something like this would surely be unacceptable for direct-io, but
> >>> maybe OK for the aio ring and futexes.
> >> thanks for your advice.
> >> I want to limit the impact as little as possible, as mentioned above,
> >> direct-io seems not a problem, we needn't touch them. Maybe we can
> >> just change the use of get_user_pages()(in or around) such as aio
> >> ring pages. I will try to find a way to do this.
> >
> > What about futexes?
> hi Andrew,
>
> Yes, better to find an approach to solve them all.
>
> But I'm worried about that if we just confine get_user_pages() to use
> none-movable pages, it will drain the none-movable pages soon. Because
> there are many places using get_user_pages() such as some drivers.
Obviously we shouldn't change get_user_pages() for all callers.
> IMHO in most cases get_user_pages() callers should release the pages soon,
> so pages allocated from movable zone should be OK. But I'm not sure if
> we get such rule upon get_user_pages().
> And in other cases we specify get_user_pages() to allocate pages from
> none-movable zone.
>
> So could we add a zone-alloc flags when we call get_user_pages()?
Well, that's a fairly low-level implementation detail. A more typical
approach would be to add a new get_user_pages_non_movable() or such.
That would probably have the same signature as get_user_pages(), with
one additional argument. Then get_user_pages() becomes a one-line
wrapper which passes in a particular value of that argument.
But that means we'd also have to add get_user_pages_fast_non_movable()
and things might become a bit stupid. A better approach might be to
add a new library function which callers can use before (or after?)
calling get_user_pages[_fast]().
Unsure. It's the sort of thing where one has to dive in and try a few
things.
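Editor's sketch of the wrapper arrangement described above, written as a small userspace model rather than kernel code. Everything prefixed with toy_ is hypothetical and exists only for illustration: the real get_user_pages() takes different arguments, and "replacing" a page is reduced here to flipping a flag. The point is only the shape: one function carries the extra argument, and the plain variant is a one-line wrapper that passes a fixed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; tracks only what the sketch needs. */
struct toy_page {
	bool movable;	/* true if the page currently sits in a movable zone */
	int refcount;	/* number of pins held on the page */
};

/*
 * Hypothetical general variant: same arguments as the plain one plus a
 * single extra argument selecting the non-movable behaviour.
 */
static long toy_get_user_pages_flagged(struct toy_page **pages, size_t nr,
				       bool non_movable)
{
	long pinned = 0;

	for (size_t i = 0; i < nr; i++) {
		assert(pages[i] != NULL);
		/* Models swapping in a non-movable page before pinning. */
		if (non_movable && pages[i]->movable)
			pages[i]->movable = false;
		pages[i]->refcount++;
		pinned++;
	}
	return pinned;
}

/* The plain variant becomes a one-line wrapper around the general one. */
static long toy_get_user_pages(struct toy_page **pages, size_t nr)
{
	return toy_get_user_pages_flagged(pages, nr, false);
}
```

The same shape would have to be repeated for a _fast variant, which is the duplication the message worries about.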
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 7:55 ` Andrew Morton
@ 2012-11-30 10:29 ` Lin Feng
2012-11-30 10:47 ` Andrew Morton
2012-11-30 11:00 ` Mel Gorman
1 sibling, 1 reply; 19+ messages in thread
From: Lin Feng @ 2012-11-30 10:29 UTC (permalink / raw)
To: Andrew Morton
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel, Lin Feng
On 11/30/2012 03:55 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
>>
>>
>> On 11/30/2012 01:57 PM, Andrew Morton wrote:
>>> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>>>
>>>> hi Andrew,
>>>>
>>>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>>>> Tricky.
>>>>>
>>>>> I expect the same problem would occur with pages which are under
>>>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>>>> periods, but the durations could still be lengthy (seconds).
>>>> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
>>>> seem maybe not a problem for the moment.
>>>>>
>>>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>>>
>>>>> The best I can think of is to make changes in or around
>>>>> get_user_pages(), to steal the pages from userspace and replace them
>>>>> with non-movable ones before pinning them. The performance cost of
>>>>> something like this would surely be unacceptable for direct-io, but
>>>>> maybe OK for the aio ring and futexes.
>>>> thanks for your advice.
>>>> I want to limit the impact as little as possible, as mentioned above,
>>>> direct-io seems not a problem, we needn't touch them. Maybe we can
>>>> just change the use of get_user_pages()(in or around) such as aio
>>>> ring pages. I will try to find a way to do this.
>>>
>>> What about futexes?
>> hi Andrew,
>>
>> Yes, better to find an approach to solve them all.
>>
>> But I'm worried about that if we just confine get_user_pages() to use
>> none-movable pages, it will drain the none-movable pages soon. Because
>> there are many places using get_user_pages() such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
>> IMHO in most cases get_user_pages() callers should release the pages soon,
>> so pages allocated from movable zone should be OK. But I'm not sure if
>> we get such rule upon get_user_pages().
>> And in other cases we specify get_user_pages() to allocate pages from
>> none-movable zone.
>>
>> So could we add a zone-alloc flags when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>
> But that means we'd also have to add get_user_pages_fast_non_movable()
> and things might become a bit stupid. A better approach might be to
hi Andrew,
Thanks for your patient reply.
What I can think of is something like the following:
inline int generic_get_user_pages(..., int movable_flag)
{
	if (0 == movable_flag)
		return get_user_pages();
	else if (1 == movable_flag)
		return get_user_pages_non_movable();
}
Yes, that seems to add a lot of duplicated code.
> add a new library function which callers can use before (or after?)
> calling get_user_pages[_fast]().
Sorry, I don't quite understand what "library function" means.
Does it mean a function that assists get_user_pages(), one that totally
wraps/replaces get_user_pages(), or none of the above?
Thanks,
linfeng
>
> Unsure. It's the sort of thing where one has to dive in and try a few
> things.
Ah, this may be more complicated than I expected.
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 10:29 ` Lin Feng
@ 2012-11-30 10:47 ` Andrew Morton
2012-12-03 3:00 ` Lin Feng
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2012-11-30 10:47 UTC (permalink / raw)
To: Lin Feng
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
> > add a new library function which callers can use before (or after?)
> > calling get_user_pages[_fast]().
> Sorry, I'm not quite understand what "library function" function means..
> Does it means a function aids get_user_pages() or totally wraps/replaces
> get_user_pages(), or none of above?
"library function" is terminology for a general facility which
the core kernel makes available to other parts of the kernel.
get_user_pages() is a library function, as are the functions in lib/,
etc. "grep EXPORT_SYMBOL ./*/*.c"
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 10:47 ` Andrew Morton
@ 2012-12-03 3:00 ` Lin Feng
0 siblings, 0 replies; 19+ messages in thread
From: Lin Feng @ 2012-12-03 3:00 UTC (permalink / raw)
To: Andrew Morton
Cc: viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel, Lin Feng
On 11/30/2012 06:47 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
>>> add a new library function which callers can use before (or after?)
>>> calling get_user_pages[_fast]().
>> Sorry, I'm not quite understand what "library function" function means..
>> Does it means a function aids get_user_pages() or totally wraps/replaces
>> get_user_pages(), or none of above?
>
> "library function" is terminology for a general facility which
> the core kernel makes available to other parts of the kernel.
> get_user_pages() is a library function, as are the functions in lib/,
> etc. "grep EXPORT_SYMBOL ./*/*.c"
hi Andrew,
Thanks for your explanation, and sorry for my ignorant question :)
As Mel said, I still can't find a way to make everyone happy.
Thanks,
linfeng
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 7:55 ` Andrew Morton
2012-11-30 10:29 ` Lin Feng
@ 2012-11-30 11:00 ` Mel Gorman
2012-12-03 2:52 ` Lin Feng
1 sibling, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2012-11-30 11:00 UTC (permalink / raw)
To: Andrew Morton
Cc: Lin Feng, viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl,
minchan, isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Thu, Nov 29, 2012 at 11:55:02PM -0800, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
> >
> >
> > On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
> > >
> > >> hi Andrew,
> > >>
> > >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > >>> Tricky.
> > >>>
> > >>> I expect the same problem would occur with pages which are under
> > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > >>> periods, but the durations could still be lengthy (seconds).
> > >> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
> > >> seem maybe not a problem for the moment.
> > >>>
> > >>> Worse is a futex page, which could easily remain pinned indefinitely.
> > >>>
> > >>> The best I can think of is to make changes in or around
> > >>> get_user_pages(), to steal the pages from userspace and replace them
> > >>> with non-movable ones before pinning them. The performance cost of
> > >>> something like this would surely be unacceptable for direct-io, but
> > >>> maybe OK for the aio ring and futexes.
> > >> thanks for your advice.
> > >> I want to limit the impact as little as possible, as mentioned above,
> > >> direct-io seems not a problem, we needn't touch them. Maybe we can
> > >> just change the use of get_user_pages()(in or around) such as aio
> > >> ring pages. I will try to find a way to do this.
> > >
> > > What about futexes?
> > hi Andrew,
> >
> > Yes, better to find an approach to solve them all.
> >
> > But I'm worried about that if we just confine get_user_pages() to use
> > none-movable pages, it will drain the none-movable pages soon. Because
> > there are many places using get_user_pages() such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
> > IMHO in most cases get_user_pages() callers should release the pages soon,
> > so pages allocated from movable zone should be OK. But I'm not sure if
> > we get such rule upon get_user_pages().
> > And in other cases we specify get_user_pages() to allocate pages from
> > none-movable zone.
> >
> > So could we add a zone-alloc flags when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>
That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
allocations. That will impact THP availability by increasing the number
of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
not just those that care about ZONE_MOVABLE.
I'm likely to NAK such a patch if it's only about node hot-remove because
it's much more of a corner case than wanting to use THP.
I would prefer if get_user_pages() checked if the page it was about to
pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
if that's what they want. The CMA people might also want to take
advantage of this if the page happened to be in the MIGRATE_CMA
pageblock.
--
Mel Gorman
SUSE Labs
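Editor's sketch of the "check the zone and migrate before pinning" idea from the message above, as a userspace model only. All toy_ names are hypothetical, and migration is reduced to copying the contents into a caller-supplied spare page outside the movable zone; real page migration involves page tables, locking and failure handling that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

/* Toy stand-in for struct page. */
struct toy_page {
	enum toy_zone zone;
	int refcount;
	int data;	/* stands in for the page contents */
};

/* Copy the page into a spare located outside the movable zone. */
static struct toy_page *toy_migrate(const struct toy_page *old,
				    struct toy_page *spare)
{
	spare->zone = TOY_ZONE_NORMAL;
	spare->refcount = old->refcount;
	spare->data = old->data;
	return spare;
}

/*
 * Pin a page. If it lives in the movable zone, migrate it first so the
 * pin never lands on a page that memory offlining has to move.
 * Returns the page that actually ended up pinned.
 */
static struct toy_page *toy_pin(struct toy_page *page, struct toy_page *spare,
				int *migrations)
{
	assert(page != NULL && spare != NULL && migrations != NULL);
	if (page->zone == TOY_ZONE_MOVABLE) {
		page = toy_migrate(page, spare);
		(*migrations)++;
	}
	page->refcount++;
	return page;
}
```

The migrations counter is there only to make the cost visible: every pin of a movable-zone page pays for one migration, which is the expense the message acknowledges.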
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 11:00 ` Mel Gorman
@ 2012-12-03 2:52 ` Lin Feng
2012-12-03 11:37 ` Mel Gorman
0 siblings, 1 reply; 19+ messages in thread
From: Lin Feng @ 2012-12-03 2:52 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl,
minchan, isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On 11/30/2012 07:00 PM, Mel Gorman wrote:
>>
>> Well, that's a fairly low-level implementation detail. A more typical
>> approach would be to add a new get_user_pages_non_movable() or such.
>> That would probably have the same signature as get_user_pages(), with
>> one additional argument. Then get_user_pages() becomes a one-line
>> wrapper which passes in a particular value of that argument.
>>
>
> That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
> allocations. That will impact THP availability by increasing the number
> of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
> not just those that care about ZONE_MOVABLE.
>
> I'm likely to NAK such a patch if it's only about node hot-remove because
> it's much more of a corner case than wanting to use THP.
>
> I would prefer if get_user_pages() checked if the page it was about to
> pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
> pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
> if that's what they want. The CMA people might also want to take
> advantage of this if the page happened to be in the MIGRATE_CMA
> pageblock.
>
hi Mel,
Thanks for your suggestion.
My initial idea was also to keep the impact as small as possible, so
migrating only the pages we need to fits that.
But even among such "about to be pinned" pages, most are going to be
released soon, so dealing with them all in the same way is really *expensive*.
Maybe we do have to find another way that makes everybody happy :)
Thanks,
linfeng
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-12-03 2:52 ` Lin Feng
@ 2012-12-03 11:37 ` Mel Gorman
0 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2012-12-03 11:37 UTC (permalink / raw)
To: Lin Feng
Cc: Andrew Morton, viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl,
minchan, isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Mon, Dec 03, 2012 at 10:52:27AM +0800, Lin Feng wrote:
>
>
> On 11/30/2012 07:00 PM, Mel Gorman wrote:
> >>
> >> Well, that's a fairly low-level implementation detail. A more typical
> >> approach would be to add a new get_user_pages_non_movable() or such.
> >> That would probably have the same signature as get_user_pages(), with
> >> one additional argument. Then get_user_pages() becomes a one-line
> >> wrapper which passes in a particular value of that argument.
> >>
> >
> > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
> > allocations. That will impact THP availability by increasing the number
> > of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
> > not just those that care about ZONE_MOVABLE.
> >
> > I'm likely to NAK such a patch if it's only about node hot-remove because
> > it's much more of a corner case than wanting to use THP.
> >
> > I would prefer if get_user_pages() checked if the page it was about to
> > pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
> > pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
> > if that's what they want. The CMA people might also want to take
> > advantage of this if the page happened to be in the MIGRATE_CMA
> > pageblock.
> >
> hi Mel,
>
> Thanks for your suggestion.
> My initial idea is also to restrict the impact as little as possible so
> migrate such pages as we need.
> But even to such "going to pin pages", most of them are going to be released
> soon, so deal with them all in the same way is really *expensive*.
>
Then you need to somehow distinguish between short-lived pins and
long-lived pins and only migrate the long-lived pins. I didn't research
how this could be implemented.
--
Mel Gorman
SUSE Labs
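Editor's sketch of the distinction asked for above, again as a userspace model. The thread does not say how a caller would declare the lifetime of a pin, so the toy_pin_kind argument is purely an assumption made for illustration; the sketch only shows the resulting decision rule.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

/* Hypothetical: the caller states up front how long it will hold the pin. */
enum toy_pin_kind { TOY_PIN_SHORT, TOY_PIN_LONG };

/*
 * Migrate only when both conditions hold: the page sits in the movable
 * zone and the pin is expected to be long-lived. Short-lived pins are
 * left alone, since memory offlining can simply retry until they drop.
 */
static bool toy_should_migrate_before_pin(enum toy_zone zone,
					  enum toy_pin_kind kind)
{
	return zone == TOY_ZONE_MOVABLE && kind == TOY_PIN_LONG;
}
```

Under this rule an O_DIRECT-style pin would pass TOY_PIN_SHORT and pay nothing, while something like the aio ring would pass TOY_PIN_LONG and pay for one migration at setup time.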
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 5:57 ` Andrew Morton
2012-11-30 7:01 ` Lin Feng
@ 2012-11-30 7:13 ` Kamezawa Hiroyuki
2012-11-30 8:00 ` Andrew Morton
1 sibling, 1 reply; 19+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-30 7:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Lin Feng, viro, bcrl, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
(2012/11/30 14:57), Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
>> seem maybe not a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them. The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> thanks for your advice.
>> I want to limit the impact as little as possible, as mentioned above,
>> direct-io seems not a problem, we needn't touch them. Maybe we can
>> just change the use of get_user_pages()(in or around) such as aio
>> ring pages. I will try to find a way to do this.
>
> What about futexes?
>
IIUC, a futex's key is now a pair of (mm, address) or (inode, pgoff).
So the page taken via get_user_pages[_fast]() in futex.c is released again
by put_page(); 'struct page' is only touched by get_futex_key() to obtain
the page->mapping info.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-30 7:13 ` Kamezawa Hiroyuki
@ 2012-11-30 8:00 ` Andrew Morton
0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2012-11-30 8:00 UTC (permalink / raw)
To: Kamezawa Hiroyuki
Cc: Lin Feng, viro, bcrl, mhocko, hughd, cl, mgorman, minchan,
isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Fri, 30 Nov 2012 16:13:16 +0900 Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > What about futexes?
> >
>
> IIUC, futex's key is now a pair of (mm,address) or (inode, pgoff).
> Then, get_user_page() in futex.c will release the page by put_page().
> 'struct page' is just touched by get_futex_key() to obtain page->mapping info.
Ah yes, that page is unpinned before syscall return.
grep -rl get_user_pages .
Gad.
These should be audited. The great majority will be simple and OK,
but drivers/media, drivers/infiniband and net/rds could be problematic.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
2012-11-29 23:39 ` Andrew Morton
2012-11-30 0:04 ` Zach Brown
2012-11-30 3:42 ` Lin Feng
@ 2012-11-30 10:57 ` Mel Gorman
2 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2012-11-30 10:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Lin Feng, viro, bcrl, kamezawa.hiroyu, mhocko, hughd, cl,
minchan, isimatu.yasuaki, laijs, wency, tangchen, linux-fsdevel,
linux-aio, linux-mm, linux-kernel
On Thu, Nov 29, 2012 at 03:39:30PM -0800, Andrew Morton wrote:
> On Thu, 29 Nov 2012 14:54:58 +0800
> Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
> > Hi all,
> >
> > We encountered a "Resource temporarily unavailable" failure while trying
> > to offline a memory section in a movable zone. We found that
> > some pages can't be migrated. The offline operation fails in
> > migrate_page_move_mapping(), which returns -EAGAIN until timeout because
> > the 'page_count(page) != 1' check triggers.
> > I wonder, in the case 'page_count(page) != 1', should we always wait
> > (return -EAGAIN)? Or in other words, can we do something here for
> > migration if we know where the pages come from?
> >
> > We finally found that such pages are used by /sbin/multipathd in the form
> > of aio ring_pages. Besides the one increment introduced by the offline call
> > chain, another increment is added by aio_setup_ring() via calling
> > get_user_pages(); it won't be dropped until we call aio_free_ring().
> >
> > The dump_page info in the offline context is shown below:
> > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> >
> > multipathd seems never to release the ring_pages until we reboot the box.
> > Furthermore, someone may write an app which only calls io_setup() but never
> > calls io_destroy(), because he has to keep the context for a long time,
> > or just forgets to, or even does so on purpose; we can't predict that.
> > So I think the mm-hotplug framework should be able to deal with such
> > situations. And should we consider adding migration support for such pages?
> >
> > However I don't know if there are any other kinds of such special pages in
> > the current kernel/Linux system. If unluckily there are many, it is hard to
> > handle them all, and just adding migration support for aio ring_pages is insufficient.
> >
> > But if we are lucky, can we use the private field of the page struct to track the
> > ring_pages[] pointer so that we can retrieve the user when migrating?
> > Doing so raises another problem: how to distinguish such special pages?
> > Using a page flag may affect the current pageflag layout, and adding a new
> > pageflag item also seems to be impossible.
> >
> > I'm not sure which approach is right, so I'm seeking help.
> > Any comments are extremely needed, thanks :)
>
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
>
If this happens then it would be preferred if it only happened for
ZONE_MOVABLE. If it happens generally, it means we're going to have a lot
more MIGRATE_UNMOVABLE pageblocks and a lot more fragmentation, leading
to lower THP availability. For THP, we're ok if some pageblocks are
temporarily unavailable or even unavailable for long periods of time;
we can cope with that. But we (or I at least) do not want to lower THP
availability on systems that do not care about ZONE_MOVABLE or node hot-plug.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
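The refcount check from the original report and the ZONE_MOVABLE-only suggestion above can be modelled together in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel implementation: every name ending in _model, and should_replace_before_pin() itself, is hypothetical, and the long_lived_pin flag is an assumption standing in for callers such as the aio ring setup (as opposed to direct-io, where the thread considers the replacement cost unacceptable).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical two-zone model; the real kernel has more zones. */
enum zone_model { ZONE_NORMAL_MODEL, ZONE_MOVABLE_MODEL };

/* Toy stand-in for struct page; only the fields used here. */
struct ring_page_model {
    int count;              /* reference count */
    enum zone_model zone;   /* zone the page was allocated from */
};

/*
 * Model of the check described in the report: migration proceeds only
 * when the page holds exactly the expected number of references.
 * An extra long-lived pin makes every attempt return -EAGAIN.
 */
static int try_migrate_model(const struct ring_page_model *p,
                             int expected_count)
{
    assert(expected_count > 0);
    if (p->count != expected_count)
        return -EAGAIN;
    return 0;
}

/*
 * Policy sketch: replace a page with a non-movable one before pinning
 * only when the pin is long-lived AND the page sits in ZONE_MOVABLE,
 * so systems that do not use ZONE_MOVABLE pay no extra cost.
 */
static bool should_replace_before_pin(const struct ring_page_model *p,
                                      bool long_lived_pin)
{
    return long_lived_pin && p->zone == ZONE_MOVABLE_MODEL;
}
```

With a page whose count is 2 where 1 is expected, as in the dump_page output quoted above, try_migrate_model() keeps returning -EAGAIN; limiting the replacement to ZONE_MOVABLE pages is what keeps the number of additional unmovable pages small on other systems.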