* [RFC] per thread page reservation patch [not found] ` <20050107144644.GA9606@infradead.org> @ 2005-01-07 17:16 ` Vladimir Saveliev 2005-01-07 18:48 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Vladimir Saveliev @ 2005-01-07 17:16 UTC (permalink / raw) To: linux-mm; +Cc: Andrew Morton, linux-kernel, Christoph Hellwig [-- Attachment #1: Type: text/plain, Size: 900 bytes --] Hello On Fri, 2005-01-07 at 17:46, Christoph Hellwig wrote: > > > reiser4-perthread-pages.patch > > this one I don't particularly object, but I'm not sure it's really > the right thing to do. Can you post it with a detailed description > to linux-mm so we can kick off discussion? > This patch adds an API to reserve some number of pages for exclusive use of calling thread. This is necessary to support operations that cannot tolerate memory allocation failures in the middle (reiser4 uses this API to not have to undo complex uncompleted (due to ENOMEM) filesystem tree modifications). Implementation is simple: reserved pages are stored in a list hanging off calling task_struct. Caller is to manually release unused pages. This API is supposed to be used for small reservation (no more than few dozens of pages). Looking forward for advices on how to do that better/properly Thanks [-- Attachment #2: reiser4-perthread-pages.patch --] [-- Type: text/plain, Size: 6484 bytes --] This patch adds an API to reserve some number of pages for exclusive use of calling thread. This is necessary to support operations that cannot tolerate memory allocation failures in the middle. Implementation is very simplistic and would only work efficiently for small reservations (no more than few dozens of pages). Reserved pages are stored in a list hanging off calling task_struct. This list is only accessed by the current thread, hence, no locking is required. Note that this makes use of reservation from interrupt handlers impossible without some external synchronization. Caller is responsible for manually releasing unused pages by calling perthread_pages_release(int nrpages) with proper argument. Signed-off-by: Andrew Morton <akpm@osdl.org> --- diff -puN include/linux/gfp.h~reiser4-perthread-pages include/linux/gfp.h include/linux/gfp.h | 4 + include/linux/init_task.h | 4 - include/linux/sched.h | 3 + kernel/fork.c | 4 + mm/page_alloc.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 110 insertions(+), 2 deletions(-) diff -puN include/linux/gfp.h~reiser4-perthread-pages include/linux/gfp.h --- linux-2.6.10-rc3/include/linux/gfp.h~reiser4-perthread-pages 2004-12-22 20:09:44.153164276 +0300 +++ linux-2.6.10-rc3-vs/include/linux/gfp.h 2004-12-22 20:09:44.199167794 +0300 @@ -134,4 +134,8 @@ extern void FASTCALL(free_cold_page(stru void page_alloc_init(void); +int perthread_pages_reserve(int nrpages, int gfp); +void perthread_pages_release(int nrpages); +int perthread_pages_count(void); + #endif /* __LINUX_GFP_H */ diff -puN include/linux/init_task.h~reiser4-perthread-pages include/linux/init_task.h --- linux-2.6.10-rc3/include/linux/init_task.h~reiser4-perthread-pages 2004-12-22 20:09:44.160164812 +0300 +++ linux-2.6.10-rc3-vs/include/linux/init_task.h 2004-12-22 20:09:44.201167947 +0300 @@ -112,8 +112,8 @@ extern struct group_info init_groups; .proc_lock = SPIN_LOCK_UNLOCKED, \ .switch_lock = SPIN_LOCK_UNLOCKED, \ .journal_info = NULL, \ + .private_pages = LIST_HEAD_INIT(tsk.private_pages), \ + .private_pages_count = 0, \ } - - #endif diff -puN include/linux/sched.h~reiser4-perthread-pages include/linux/sched.h --- linux-2.6.10-rc3/include/linux/sched.h~reiser4-perthread-pages 2004-12-22 20:09:44.170165576 +0300 +++ linux-2.6.10-rc3-vs/include/linux/sched.h 2004-12-22 20:09:44.204168176 +0300 @@ -691,6 +691,9 @@ struct task_struct { nodemask_t mems_allowed; int cpuset_mems_generation; #endif + + struct list_head private_pages; /* per-process private pages */ + int private_pages_count; }; static inline pid_t process_group(struct task_struct *tsk) diff -puN kernel/fork.c~reiser4-perthread-pages kernel/fork.c --- linux-2.6.10-rc3/kernel/fork.c~reiser4-perthread-pages 2004-12-22 20:09:44.179166264 +0300 +++ linux-2.6.10-rc3-vs/kernel/fork.c 2004-12-22 20:09:44.215169017 +0300 @@ -154,6 +154,10 @@ static struct task_struct *dup_task_stru tsk->thread_info = ti; ti->task = tsk; + /* initialize list of pages privately reserved by process */ + INIT_LIST_HEAD(&tsk->private_pages); + tsk->private_pages_count = 0; + /* One for us, one for whoever does the "release_task()" (usually parent) */ atomic_set(&tsk->usage,2); return tsk; diff -puN mm/page_alloc.c~reiser4-perthread-pages mm/page_alloc.c --- linux-2.6.10-rc3/mm/page_alloc.c~reiser4-perthread-pages 2004-12-22 20:09:44.187166876 +0300 +++ linux-2.6.10-rc3-vs/mm/page_alloc.c 2004-12-22 20:09:44.219169323 +0300 @@ -551,6 +551,97 @@ void fastcall free_cold_page(struct page free_hot_cold_page(page, 1); } +static inline struct list_head *get_per_thread_pages(void) +{ + return ¤t->private_pages; +} + +int perthread_pages_reserve(int nrpages, int gfp) +{ + int i; + struct list_head accumulator; + struct list_head *per_thread; + + per_thread = get_per_thread_pages(); + INIT_LIST_HEAD(&accumulator); + list_splice_init(per_thread, &accumulator); + for (i = 0; i < nrpages; ++i) { + struct page *page; + + page = alloc_page(gfp); + if (page != NULL) + list_add(&page->lru, &accumulator); + else { + for (; i > 0; --i) { + page = list_entry(accumulator.next, struct page, lru); + list_del(&page->lru); + page_cache_release(page); + } + return -ENOMEM; + } + } + /* + * Q: why @accumulator is used, instead of directly adding pages to + * the get_per_thread_pages()? + * + * A: because after first page is added to the get_per_thread_pages(), + * next call to the alloc_page() (on the next loop iteration), will + * re-use it. + */ + list_splice(&accumulator, per_thread); + current->private_pages_count += nrpages; + return 0; +} +EXPORT_SYMBOL(perthread_pages_reserve); + +void perthread_pages_release(int nrpages) +{ + struct list_head *per_thread; + + current->private_pages_count -= nrpages; + per_thread = get_per_thread_pages(); + for (; nrpages != 0; --nrpages) { + struct page *page; + + BUG_ON(list_empty(per_thread)); + page = list_entry(per_thread->next, struct page, lru); + list_del(&page->lru); + page_cache_release(page); + } +} +EXPORT_SYMBOL(perthread_pages_release); + +int perthread_pages_count(void) +{ + return current->private_pages_count; +} +EXPORT_SYMBOL(perthread_pages_count); + +static inline struct page * +perthread_pages_alloc(void) +{ + struct list_head *perthread_pages; + + /* + * try to allocate pages from the per-thread private_pages pool. No + * locking is needed: this list can only be modified by the thread + * itself, and not by interrupts or other threads. + */ + perthread_pages = get_per_thread_pages(); + if (!in_interrupt() && !list_empty(perthread_pages)) { + struct page *page; + + page = list_entry(perthread_pages->next, struct page, lru); + list_del(&page->lru); + current->private_pages_count--; + /* + * per-thread page is already initialized, just return it. + */ + return page; + } else + return NULL; +} + /* * Really, prep_compound_page() should be called from __rmqueue_bulk(). But * we cheat by calling it from here, in the order > 0 path. Saves a branch @@ -667,6 +758,12 @@ __alloc_pages(unsigned int gfp_mask, uns */ can_try_harder = (unlikely(rt_task(p)) && !in_interrupt()) || !wait; + if (order == 0) { + page = perthread_pages_alloc(); + if (page != NULL) + return page; + } + zones = zonelist->zones; /* the list of zones suitable for gfp_mask */ if (unlikely(zones[0] == NULL)) { _ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 17:16 ` [RFC] per thread page reservation patch Vladimir Saveliev @ 2005-01-07 18:48 ` Andrew Morton 2005-01-07 20:21 ` Nikita Danilov 2005-01-07 19:05 ` Christoph Hellwig 2005-01-07 19:14 ` Paulo Marques 2 siblings, 1 reply; 27+ messages in thread From: Andrew Morton @ 2005-01-07 18:48 UTC (permalink / raw) To: Vladimir Saveliev; +Cc: linux-mm, linux-kernel, hch Vladimir Saveliev <vs@namesys.com> wrote: > > +int perthread_pages_reserve(int nrpages, int gfp) > +{ > + int i; > + struct list_head accumulator; > + struct list_head *per_thread; > + > + per_thread = get_per_thread_pages(); > + INIT_LIST_HEAD(&accumulator); > + list_splice_init(per_thread, &accumulator); > + for (i = 0; i < nrpages; ++i) { This will end up reserving more pages than were asked for, if current->private_pages_count is non-zero. Deliberate? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 18:48 ` Andrew Morton @ 2005-01-07 20:21 ` Nikita Danilov 0 siblings, 0 replies; 27+ messages in thread From: Nikita Danilov @ 2005-01-07 20:21 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm, linux-kernel, hch Andrew Morton <akpm@osdl.org> writes: > Vladimir Saveliev <vs@namesys.com> wrote: >> >> +int perthread_pages_reserve(int nrpages, int gfp) >> +{ >> + int i; >> + struct list_head accumulator; >> + struct list_head *per_thread; >> + >> + per_thread = get_per_thread_pages(); >> + INIT_LIST_HEAD(&accumulator); >> + list_splice_init(per_thread, &accumulator); >> + for (i = 0; i < nrpages; ++i) { > > This will end up reserving more pages than were asked for, if > current->private_pages_count is non-zero. Deliberate? Yes. This is to make modular usage possible, so that perthread_pages_reserve(nrpages, gfp_mask); /* call some other code... */ perthread_pages_release(unused_pages); works correctly if "some other code" does per-thread reservations too. Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 17:16 ` [RFC] per thread page reservation patch Vladimir Saveliev 2005-01-07 18:48 ` Andrew Morton @ 2005-01-07 19:05 ` Christoph Hellwig 2005-01-07 19:12 ` Christoph Hellwig ` (2 more replies) 2005-01-07 19:14 ` Paulo Marques 2 siblings, 3 replies; 27+ messages in thread From: Christoph Hellwig @ 2005-01-07 19:05 UTC (permalink / raw) To: Vladimir Saveliev; +Cc: linux-mm, Andrew Morton, linux-kernel > diff -puN include/linux/gfp.h~reiser4-perthread-pages include/linux/gfp.h > --- linux-2.6.10-rc3/include/linux/gfp.h~reiser4-perthread-pages 2004-12-22 20:09:44.153164276 +0300 > +++ linux-2.6.10-rc3-vs/include/linux/gfp.h 2004-12-22 20:09:44.199167794 +0300 > @@ -134,4 +134,8 @@ extern void FASTCALL(free_cold_page(stru > > void page_alloc_init(void); > > +int perthread_pages_reserve(int nrpages, int gfp); > +void perthread_pages_release(int nrpages); > +int perthread_pages_count(void); > + > #endif /* __LINUX_GFP_H */ > diff -puN include/linux/init_task.h~reiser4-perthread-pages include/linux/init_task.h > --- linux-2.6.10-rc3/include/linux/init_task.h~reiser4-perthread-pages 2004-12-22 20:09:44.160164812 +0300 > +++ linux-2.6.10-rc3-vs/include/linux/init_task.h 2004-12-22 20:09:44.201167947 +0300 > @@ -112,8 +112,8 @@ extern struct group_info init_groups; > .proc_lock = SPIN_LOCK_UNLOCKED, \ > .switch_lock = SPIN_LOCK_UNLOCKED, \ > .journal_info = NULL, \ > + .private_pages = LIST_HEAD_INIT(tsk.private_pages), \ > + .private_pages_count = 0, \ > } > > - > - > #endif > diff -puN include/linux/sched.h~reiser4-perthread-pages include/linux/sched.h > --- linux-2.6.10-rc3/include/linux/sched.h~reiser4-perthread-pages 2004-12-22 20:09:44.170165576 +0300 > +++ linux-2.6.10-rc3-vs/include/linux/sched.h 2004-12-22 20:09:44.204168176 +0300 > @@ -691,6 +691,9 @@ struct task_struct { > nodemask_t mems_allowed; > int cpuset_mems_generation; > #endif > + > + struct list_head private_pages; /* per-process private pages */ > + int private_pages_count; > }; > > static inline pid_t process_group(struct task_struct *tsk) > diff -puN kernel/fork.c~reiser4-perthread-pages kernel/fork.c > --- linux-2.6.10-rc3/kernel/fork.c~reiser4-perthread-pages 2004-12-22 20:09:44.179166264 +0300 > +++ linux-2.6.10-rc3-vs/kernel/fork.c 2004-12-22 20:09:44.215169017 +0300 > @@ -154,6 +154,10 @@ static struct task_struct *dup_task_stru > tsk->thread_info = ti; > ti->task = tsk; > > + /* initialize list of pages privately reserved by process */ > + INIT_LIST_HEAD(&tsk->private_pages); > + tsk->private_pages_count = 0; > + > /* One for us, one for whoever does the "release_task()" (usually parent) */ > atomic_set(&tsk->usage,2); > return tsk; > diff -puN mm/page_alloc.c~reiser4-perthread-pages mm/page_alloc.c > --- linux-2.6.10-rc3/mm/page_alloc.c~reiser4-perthread-pages 2004-12-22 20:09:44.187166876 +0300 > +++ linux-2.6.10-rc3-vs/mm/page_alloc.c 2004-12-22 20:09:44.219169323 +0300 > @@ -551,6 +551,97 @@ void fastcall free_cold_page(struct page > free_hot_cold_page(page, 1); > } > > +static inline struct list_head *get_per_thread_pages(void) > +{ > + return ¤t->private_pages; > +} this is a completely useless wrapper that doesn't buy anything. > +int perthread_pages_reserve(int nrpages, int gfp) > +{ > + int i; > + struct list_head accumulator; > + struct list_head *per_thread; > + > + per_thread = get_per_thread_pages(); > + INIT_LIST_HEAD(&accumulator); > + list_splice_init(per_thread, &accumulator); Can probably be written much simpler as: int perthread_pages_reserve(int nrpages, int gfp) { LIST_HEAD(accumulator); int i; list_splice_init(¤t->private_pages, &accumulator); Now the big question is, what's synchronizing access to current->private_pages? > + for (i = 0; i < nrpages; ++i) { > + struct page *page; > + > + page = alloc_page(gfp); > + if (page != NULL) > + list_add(&page->lru, &accumulator); > + else { > + for (; i > 0; --i) { > + page = list_entry(accumulator.next, struct page, lru); > + list_del(&page->lru); > + page_cache_release(page); > + } > + return -ENOMEM; > + } > + } Would be much more readable with the error condition handling at the end of the function and a goto. > +int perthread_pages_count(void) > +{ > + return current->private_pages_count; > +} > +EXPORT_SYMBOL(perthread_pages_count); Again a completely useless wrapper. > +static inline struct page * > +perthread_pages_alloc(void) > +{ > + struct list_head *perthread_pages; > + > + /* > + * try to allocate pages from the per-thread private_pages pool. No > + * locking is needed: this list can only be modified by the thread > + * itself, and not by interrupts or other threads. > + */ > + perthread_pages = get_per_thread_pages(); > + if (!in_interrupt() && !list_empty(perthread_pages)) { > + struct page *page; > + > + page = list_entry(perthread_pages->next, struct page, lru); > + list_del(&page->lru); > + current->private_pages_count--; > + /* > + * per-thread page is already initialized, just return it. > + */ > + return page; > + } else > + return NULL; > +} Would be written much cleaner with a single return statement and the comment above the function, ala: /* * try to allocate pages from the per-thread private_pages pool. No * locking is needed: this list can only be modified by the thread * itself, and not by interrupts or other threads. */ static inline struct page *perthread_pages_alloc(void) { struct list_head *perthread_pages = ¤t->private_pages; struct page *page = NULL; if (!in_interrupt() && !list_empty(perthread_pages)) { page = list_entry(perthread_pages->next, struct page, lru); list_del(&page->lru); current->private_pages_count--; } return page; } > + if (order == 0) { > + page = perthread_pages_alloc(); > + if (page != NULL) > + return page; > + } > + You should at least move the !in_interrupt() condition out here. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 19:05 ` Christoph Hellwig @ 2005-01-07 19:12 ` Christoph Hellwig 2005-01-07 19:21 ` Robert Love 2005-01-07 20:48 ` Nikita Danilov 2 siblings, 0 replies; 27+ messages in thread From: Christoph Hellwig @ 2005-01-07 19:12 UTC (permalink / raw) To: Christoph Hellwig, Vladimir Saveliev, linux-mm, Andrew Morton, linux-kernel > Now the big question is, what's synchronizing access to > current->private_pages? Umm, it's obviously correct as we're thread-local. Objection taken back :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 19:05 ` Christoph Hellwig 2005-01-07 19:12 ` Christoph Hellwig @ 2005-01-07 19:21 ` Robert Love 2005-01-07 20:48 ` Nikita Danilov 2 siblings, 0 replies; 27+ messages in thread From: Robert Love @ 2005-01-07 19:21 UTC (permalink / raw) To: Christoph Hellwig Cc: Vladimir Saveliev, linux-mm, Andrew Morton, linux-kernel On Fri, 2005-01-07 at 19:05 +0000, Christoph Hellwig wrote: > int perthread_pages_reserve(int nrpages, int gfp) > { > LIST_HEAD(accumulator); > int i; > > list_splice_init(¤t->private_pages, &accumulator); > > Now the big question is, what's synchronizing access to > current->private_pages? Safe without locks so long as there is no other way to get at another process's private_pages. ;) Best, Robert Love -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 19:05 ` Christoph Hellwig 2005-01-07 19:12 ` Christoph Hellwig 2005-01-07 19:21 ` Robert Love @ 2005-01-07 20:48 ` Nikita Danilov 2005-01-07 20:54 ` Christoph Hellwig 2 siblings, 1 reply; 27+ messages in thread From: Nikita Danilov @ 2005-01-07 20:48 UTC (permalink / raw) To: Christoph Hellwig Cc: Vladimir Saveliev, linux-mm, Andrew Morton, linux-kernel Christoph Hellwig <hch@infradead.org> writes: >> diff -puN include/linux/gfp.h~reiser4-perthread-pages include/linux/gfp.h >> --- linux-2.6.10-rc3/include/linux/gfp.h~reiser4-perthread-pages 2004-12-22 20:09:44.153164276 +0300 [...] > >> +int perthread_pages_count(void) >> +{ >> + return current->private_pages_count; >> +} >> +EXPORT_SYMBOL(perthread_pages_count); > > Again a completely useless wrapper. I disagree. Patch introduces explicit API int perthread_pages_reserve(int nrpages, int gfp); void perthread_pages_release(int nrpages); int perthread_pages_count(void); sufficient to create and use per-thread reservations. Using current->private_pages_count directly - makes API less uniform, not contained within single namespace (perthread_pages_*), and worse, - exhibits internal implementation detail to the user. > [...] Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 20:48 ` Nikita Danilov @ 2005-01-07 20:54 ` Christoph Hellwig 2005-01-07 21:00 ` Nikita Danilov 0 siblings, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2005-01-07 20:54 UTC (permalink / raw) To: Nikita Danilov; +Cc: Vladimir Saveliev, Andrew Morton, linux-kernel, linux-mm On Fri, Jan 07, 2005 at 11:48:58PM +0300, Nikita Danilov wrote: > sufficient to create and use per-thread reservations. Using > current->private_pages_count directly > > - makes API less uniform, not contained within single namespace > (perthread_pages_*), and worse, > > - exhibits internal implementation detail to the user. Completely disagreed, hiding all the details doesn't help for such trivial access, it's just obsfucating things. But looking at it the API doesn't make sense at all. Number of pages in the thread-pool is an internal implementation detail and no caller must look at it - think about two callers, e.g. filesystem and iscsi initiator using it in the same thread. Here's an updated patch with my suggestions implemented and other goodies such a kerneldoc comments (but completely untested so far): --- 1.20/include/linux/gfp.h 2005-01-05 18:30:39 +01:00 +++ edited/include/linux/gfp.h 2005-01-07 20:30:20 +01:00 @@ -130,6 +130,9 @@ #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr),0) +extern int perthread_pages_reserve(unsigned int nrpages, unsigned int gfp_mask); +extern void perthread_pages_release(unsigned int nrpages); + void page_alloc_init(void); #endif /* __LINUX_GFP_H */ --- 1.33/include/linux/init_task.h 2005-01-05 03:48:20 +01:00 +++ edited/include/linux/init_task.h 2005-01-07 20:33:25 +01:00 @@ -112,6 +112,7 @@ .proc_lock = SPIN_LOCK_UNLOCKED, \ .switch_lock = SPIN_LOCK_UNLOCKED, \ .journal_info = NULL, \ + .private_pages = LIST_HEAD_INIT(tsk.private_pages), \ } --- 1.280/include/linux/sched.h 2005-01-05 03:48:20 +01:00 +++ edited/include/linux/sched.h 2005-01-07 20:32:44 +01:00 @@ -658,6 +658,7 @@ /* VM state */ struct reclaim_state *reclaim_state; + struct list_head private_pages; /* per-process private pages */ struct dentry *proc_dentry; struct backing_dev_info *backing_dev_info; --- 1.229/kernel/fork.c 2005-01-05 03:48:20 +01:00 +++ edited/kernel/fork.c 2005-01-07 20:34:04 +01:00 @@ -153,6 +153,9 @@ tsk->thread_info = ti; ti->task = tsk; + /* initialize list of pages privately reserved by process */ + INIT_LIST_HEAD(&tsk->private_pages); + /* One for us, one for whoever does the "release_task()" (usually parent) */ atomic_set(&tsk->usage,2); return tsk; --- 1.249/mm/page_alloc.c 2005-01-05 18:32:52 +01:00 +++ edited/mm/page_alloc.c 2005-01-07 20:36:34 +01:00 @@ -641,6 +641,86 @@ return 1; } +/** + * perthread_pages_reserve -- reserve some pages for this thread + * @nrpages: number of pages to reserve + * @gfp_mask: constaints passed to alloc_page() + * + * This functions allocates @nrpages into a thread-local pool that + * __alloc_pages() will use. Used to support operations that cannot + * tolerate memory allocation failures. + */ +int perthread_pages_reserve(unsigned int nrpages, unsigned int gfp_mask) +{ + LIST_HEAD(accumulator); + int i; + + list_splice_init(¤t->private_pages, &accumulator); + + for (i = 0; i < nrpages; ++i) { + struct page *page = alloc_page(gfp_mask); + + if (!page) + goto fail_free_pages; + list_add(&page->lru, &accumulator); + } + + /* + * Q: why @accumulator is used, instead of directly adding pages to + * current->private_pages? + * + * A: because after first page is added to the current->private_pages + * next call to the alloc_page() (on the next loop iteration), will + * re-use it. + */ + list_splice(&accumulator, ¤t->private_pages); + return 0; + fail_free_pages: + perthread_pages_release(i); + return -ENOMEM; +} +EXPORT_SYMBOL_GPL(perthread_pages_reserve); + +/** + * perthread_pages_release -- release pages from thread-local pool + * @nrpages: number of pages to release + * + * This functions release @nrpages that we allocated to a thread-local + * pool using perthread_pages_reserve(). + */ +void perthread_pages_release(unsigned int nrpages) +{ + struct list_head *perthread_pages = ¤t->private_pages; + + for (; nrpages; --nrpages) { + struct page *page; + + BUG_ON(list_empty(perthread_pages)); + page = list_entry(perthread_pages->next, struct page, lru); + list_del(&page->lru); + page_cache_release(page); + } +} +EXPORT_SYMBOL_GPL(perthread_pages_release); + +/* + * Try to allocate pages from the per-thread private_pages pool. No + * locking is needed: this list can only be modified by the thread + * itself, and not by interrupts or other threads. + */ +static inline struct page *__alloc_perthread_page(void) +{ + struct list_head *perthread_pages = ¤t->private_pages; + struct page *page = NULL; + + if (!list_empty(perthread_pages)) { + page = list_entry(perthread_pages->next, struct page, lru); + list_del(&page->lru); + } + + return page; +} + /* * This is the 'heart' of the zoned buddy allocator. * @@ -672,6 +752,12 @@ int can_try_harder; might_sleep_if(wait); + + if (order == 0 && !in_interrupt()) { + page = __alloc_perthread_page(); + if (page) + return page; + } /* * The caller may dip into page reserves a bit more if the caller -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 20:54 ` Christoph Hellwig @ 2005-01-07 21:00 ` Nikita Danilov 2005-01-07 21:07 ` Christoph Hellwig 0 siblings, 1 reply; 27+ messages in thread From: Nikita Danilov @ 2005-01-07 21:00 UTC (permalink / raw) To: Christoph Hellwig Cc: Vladimir Saveliev, Andrew Morton, linux-kernel, linux-mm Christoph Hellwig <hch@infradead.org> writes: > On Fri, Jan 07, 2005 at 11:48:58PM +0300, Nikita Danilov wrote: >> sufficient to create and use per-thread reservations. Using >> current->private_pages_count directly >> >> - makes API less uniform, not contained within single namespace >> (perthread_pages_*), and worse, >> >> - exhibits internal implementation detail to the user. > > Completely disagreed, hiding all the details doesn't help for such > trivial access, it's just obsfucating things. > > But looking at it the API doesn't make sense at all. Number of pages > in the thread-pool is an internal implementation detail and no caller > must look at it - think about two callers, e.g. filesystem and iscsi > initiator using it in the same thread. > > Here's an updated patch with my suggestions implemented and other goodies > such a kerneldoc comments (but completely untested so far): > > > --- 1.20/include/linux/gfp.h 2005-01-05 18:30:39 +01:00 > +++ edited/include/linux/gfp.h 2005-01-07 20:30:20 +01:00 > @@ -130,6 +130,9 @@ > #define __free_page(page) __free_pages((page), 0) > #define free_page(addr) free_pages((addr),0) > > +extern int perthread_pages_reserve(unsigned int nrpages, unsigned int gfp_mask); > +extern void perthread_pages_release(unsigned int nrpages); I don't see how this can work. /* make conservative estimation... */ perthread_pages_reserve(100, GFP_KERNEL); /* actually, use only 10 pages during atomic operation */ /* want to release remaining 90 pages... */ perthread_pages_release(???); Can you elaborate how to calculate number of pages that has to be released? Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 21:00 ` Nikita Danilov @ 2005-01-07 21:07 ` Christoph Hellwig 0 siblings, 0 replies; 27+ messages in thread From: Christoph Hellwig @ 2005-01-07 21:07 UTC (permalink / raw) To: Nikita Danilov Cc: Christoph Hellwig, Vladimir Saveliev, Andrew Morton, linux-kernel, linux-mm On Sat, Jan 08, 2005 at 12:00:00AM +0300, Nikita Danilov wrote: > I don't see how this can work. > > /* make conservative estimation... */ > perthread_pages_reserve(100, GFP_KERNEL); > > /* actually, use only 10 pages during atomic operation */ > > /* want to release remaining 90 pages... */ > perthread_pages_release(???); > > Can you elaborate how to calculate number of pages that has to be > released? That's a good question actually, and it's true for the original version aswell if we want (and to make the API useful we must) make it useable for multiple callers in the same thread. Assuming the callers strictly nest we can indeed just go back to the orininal count, maybe with an API like: int perthread_pages_reserve(unsigned int nrpages, unsigned int gfp_mask); void perthread_pages_release(unsigned int nrpages_goal); unsigned int old_nr_pages; ... old_nr_pages = perthread_pages_reserve(100, GFP_KERNEL); ... perthread_pages_release(old_nr_pages); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 17:16 ` [RFC] per thread page reservation patch Vladimir Saveliev 2005-01-07 18:48 ` Andrew Morton 2005-01-07 19:05 ` Christoph Hellwig @ 2005-01-07 19:14 ` Paulo Marques 2005-01-07 19:32 ` Christoph Hellwig 2005-01-07 20:55 ` Nikita Danilov 2 siblings, 2 replies; 27+ messages in thread From: Paulo Marques @ 2005-01-07 19:14 UTC (permalink / raw) To: Vladimir Saveliev Cc: linux-mm, Andrew Morton, linux-kernel, Christoph Hellwig Vladimir Saveliev wrote: > [...] > + if (order == 0) { > + page = perthread_pages_alloc(); > + if (page != NULL) > + return page; > + } I hope this has not been extensively discussed yet, and I missed the thread but, does everybody think this is a good thing? This seems like a very asymmetrical behavior. If the code explicitly reserves pages, it should explicitly use them, or it will become impossible to track down who is using what (not to mention that this will slow down every regular user of __alloc_pages, even if it is just for a quick test). Why are there specialized functions to reserve the pages, but then they are used through the standard __alloc_pages interface? At the very least this test should be moved to the very beginning of the function. It is of no use to calculate "can_try_harder" before running this code if it will use a reserved page. Having a specialized function to get the reserved pages, would also make the logic in "perthread_pages_reserve" more clear (i.e., that comment would become unnecessary), and lose the test to "in_interrupt()" in "perthread_pages_alloc", if I'm reading this correctly. -- Paulo Marques - www.grupopie.com "A journey of a thousand miles begins with a single step." Lao-tzu, The Way of Lao-tzu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 19:14 ` Paulo Marques @ 2005-01-07 19:32 ` Christoph Hellwig 2005-01-07 19:42 ` Andi Kleen 2005-01-07 20:55 ` Nikita Danilov 1 sibling, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2005-01-07 19:32 UTC (permalink / raw) To: Paulo Marques Cc: Vladimir Saveliev, linux-mm, Andrew Morton, linux-kernel, Christoph Hellwig On Fri, Jan 07, 2005 at 07:14:15PM +0000, Paulo Marques wrote: > This seems like a very asymmetrical behavior. If the code explicitly > reserves pages, it should explicitly use them, or it will become > impossible to track down who is using what (not to mention that this > will slow down every regular user of __alloc_pages, even if it is just > for a quick test). > > Why are there specialized functions to reserve the pages, but then they > are used through the standard __alloc_pages interface? That seems to be the whole point of the patch, as that way it'll serve all sub-allocators or kernel function called by the user. Without this behaviour the caller could have simply used a mempool. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 19:32 ` Christoph Hellwig @ 2005-01-07 19:42 ` Andi Kleen 0 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2005-01-07 19:42 UTC (permalink / raw) To: Christoph Hellwig, Paulo Marques, Vladimir Saveliev, linux-mm, Andrew Morton, linux-kernel On Fri, Jan 07, 2005 at 07:32:40PM +0000, Christoph Hellwig wrote: > On Fri, Jan 07, 2005 at 07:14:15PM +0000, Paulo Marques wrote: > > This seems like a very asymmetrical behavior. If the code explicitly > > reserves pages, it should explicitly use them, or it will become > > impossible to track down who is using what (not to mention that this > > will slow down every regular user of __alloc_pages, even if it is just > > for a quick test). > > > > Why are there specialized functions to reserve the pages, but then they > > are used through the standard __alloc_pages interface? > > That seems to be the whole point of the patch, as that way it'll serve > all sub-allocators or kernel function called by the user. Without this > behaviour the caller could have simply used a mempool. I still don't get it why reiserfs can't put the preallocated pool somewhere private like all other users who do similar things do too (and yes there are quite a lot of subsystems who do preallocation) Why pollute task_struct for this? -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 19:14 ` Paulo Marques 2005-01-07 19:32 ` Christoph Hellwig @ 2005-01-07 20:55 ` Nikita Danilov 2005-01-07 21:24 ` Andrew Morton 1 sibling, 1 reply; 27+ messages in thread From: Nikita Danilov @ 2005-01-07 20:55 UTC (permalink / raw) To: Paulo Marques; +Cc: linux-mm, Andrew Morton, linux-kernel, Christoph Hellwig Paulo Marques <pmarques@grupopie.com> writes: [...] > > This seems like a very asymmetrical behavior. If the code explicitly > reserves pages, it should explicitly use them, or it will become > impossible to track down who is using what (not to mention that this > will slow down every regular user of __alloc_pages, even if it is just > for a quick test). > > Why are there specialized functions to reserve the pages, but then they > are used through the standard __alloc_pages interface? That's the whole idea behind this patch: at the beginning of "atomic" operation, some number of pages is reserved. As these pages are available through page allocator, _all_ allocations done by atomic operation will use reserved pages transparently. For example: perthread_pages_reserve(nr, GFP_KERNEL); foo()-> bar()-> page = find_or_create_page(some_mapping, ...); perthread_pages_release(unused_pages); find_or_create() pages will use pages reserved for this thread and, hence, is guaranteed to succeed (given correct reservation). Alternative is to pass some sort of handle all the way down to actual calls to allocator, and to modify all generic code to use reservations. Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 20:55 ` Nikita Danilov @ 2005-01-07 21:24 ` Andrew Morton 2005-01-07 21:24 ` Andi Kleen ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Andrew Morton @ 2005-01-07 21:24 UTC (permalink / raw) To: Nikita Danilov; +Cc: pmarques, linux-mm, linux-kernel, hch Nikita Danilov <nikita@clusterfs.com> wrote: > > That's the whole idea behind this patch: at the beginning of "atomic" > operation, some number of pages is reserved. As these pages are > available through page allocator, _all_ allocations done by atomic > operation will use reserved pages transparently. For example: > > perthread_pages_reserve(nr, GFP_KERNEL); > > foo()-> > > bar()-> > > page = find_or_create_page(some_mapping, ...); > > perthread_pages_release(unused_pages); > > find_or_create() pages will use pages reserved for this thread and, > hence, is guaranteed to succeed (given correct reservation). > > Alternative is to pass some sort of handle all the way down to actual > calls to allocator, and to modify all generic code to use reservations. Maybe I'm being thick, but I don't see how you can protect the reservation of an outer reserver in the above way: perthread_pages_reserve(10); ... /* current->private_pages_count = 10 */ perthread_pages_reserve(10) /* private_pages_count = 20 */ use 5 pages /* private_pages_count = 15 */ perthread_pages_release(5); But how does the caller come up with the final "5"? Seems better to me if prethread_pages_reserve() were to return the initial value of private_pages_count, so the caller can do: old = perthread_pages_reserve(10); use 5 pages perthread_pages_release(old); or whatever. That kinda stinks too in a way, because both the outer and the inner callers need to overallocate pages on behalf of the worst case user in some deep call stack. And the whole idea is pretty flaky really - how can one precalculate how much memory an arbitrary md-on-dm-on-loop-on-md-on-NBD stack will want to use? It really would be better if we could drop the whole patch and make reiser4 behave more sanely when its writepage is called with for_reclaim=1. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 21:24 ` Andrew Morton @ 2005-01-07 21:24 ` Andi Kleen 2005-01-07 22:12 ` Nikita Danilov 2005-01-25 16:39 ` reiser4 core patches: [Was: [RFC] per thread page reservation patch] Vladimir Saveliev 2 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2005-01-07 21:24 UTC (permalink / raw) To: Andrew Morton; +Cc: Nikita Danilov, pmarques, linux-mm, linux-kernel, hch > And the whole idea is pretty flaky really - how can one precalculate how > much memory an arbitrary md-on-dm-on-loop-on-md-on-NBD stack will want to > use? It really would be better if we could drop the whole patch and make > reiser4 behave more sanely when its writepage is called with for_reclaim=1. Or just preallocate into a private array. -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 21:24 ` Andrew Morton 2005-01-07 21:24 ` Andi Kleen @ 2005-01-07 22:12 ` Nikita Danilov 2005-01-07 23:03 ` Andrew Morton 2005-01-25 16:39 ` reiser4 core patches: [Was: [RFC] per thread page reservation patch] Vladimir Saveliev 2 siblings, 1 reply; 27+ messages in thread From: Nikita Danilov @ 2005-01-07 22:12 UTC (permalink / raw) To: Andrew Morton; +Cc: pmarques, linux-mm, linux-kernel, hch Andrew Morton <akpm@osdl.org> writes: [...] > > Maybe I'm being thick, but I don't see how you can protect the reservation > of an outer reserver in the above way: > > perthread_pages_reserve(10); > ... /* current->private_pages_count = 10 */ > perthread_pages_reserve(10) /* private_pages_count = 20 */ > use 5 pages /* private_pages_count = 15 */ > perthread_pages_release(5); > > But how does the caller come up with the final "5"? wasreserved = perthread_pages_count(); result = perthread_pages_reserve(estimate_reserve(), GFP_KERNEL); if (result != 0) return result; /* do something that consumes reservation */ perthread_pages_release(perthread_pages_count() - wasreserved); > > Seems better to me if prethread_pages_reserve() were to return the initial > value of private_pages_count, so the caller can do: > > old = perthread_pages_reserve(10); > use 5 pages > perthread_pages_release(old); > > or whatever. > > That kinda stinks too in a way, because both the outer and the inner > callers need to overallocate pages on behalf of the worst case user in some > deep call stack. > > And the whole idea is pretty flaky really - how can one precalculate how > much memory an arbitrary md-on-dm-on-loop-on-md-on-NBD stack will want to > use? It really would be better if we could drop the whole patch and make > reiser4 behave more sanely when its writepage is called with for_reclaim=1. Reiser4 doesn't use this for ->writepage(), by the way. This is used by tree balancing code to assure that balancing cannot get -ENOMEM in the middle of tree modification, because undo is _so_ very complicated. Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 22:12 ` Nikita Danilov @ 2005-01-07 23:03 ` Andrew Morton 2005-01-07 23:17 ` Nikita Danilov 0 siblings, 1 reply; 27+ messages in thread From: Andrew Morton @ 2005-01-07 23:03 UTC (permalink / raw) To: Nikita Danilov; +Cc: pmarques, linux-mm, linux-kernel, hch Nikita Danilov <nikita@clusterfs.com> wrote: > > > And the whole idea is pretty flaky really - how can one precalculate how > > much memory an arbitrary md-on-dm-on-loop-on-md-on-NBD stack will want to > > use? It really would be better if we could drop the whole patch and make > > reiser4 behave more sanely when its writepage is called with for_reclaim=1. > > Reiser4 doesn't use this for ->writepage(), by the way. This is used by > tree balancing code to assure that balancing cannot get -ENOMEM in the > middle of tree modification, because undo is _so_ very complicated. Oh. And that involves performing I/O, yes? Why does the filesystem risk going oom during the rebalance anyway? Is it doing atomic allocations? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 23:03 ` Andrew Morton @ 2005-01-07 23:17 ` Nikita Danilov 2005-01-07 23:43 ` Andrew Morton 0 siblings, 1 reply; 27+ messages in thread From: Nikita Danilov @ 2005-01-07 23:17 UTC (permalink / raw) To: Andrew Morton; +Cc: pmarques, linux-mm, linux-kernel, hch Andrew Morton <akpm@osdl.org> writes: > Nikita Danilov <nikita@clusterfs.com> wrote: >> >> > And the whole idea is pretty flaky really - how can one precalculate how >> > much memory an arbitrary md-on-dm-on-loop-on-md-on-NBD stack will want to >> > use? It really would be better if we could drop the whole patch and make >> > reiser4 behave more sanely when its writepage is called with for_reclaim=1. >> >> Reiser4 doesn't use this for ->writepage(), by the way. This is used by >> tree balancing code to assure that balancing cannot get -ENOMEM in the >> middle of tree modification, because undo is _so_ very complicated. > > Oh. And that involves performing I/O, yes? Yes, balancing may read tree or bitmap nodes from the disk. > > Why does the filesystem risk going oom during the rebalance anyway? Is it > doing atomic allocations? No, just __alloc_pages(GFP_KERNEL, 0, ...) returns NULL. When this happens, the only thing balancing can do is to panic. Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 23:17 ` Nikita Danilov @ 2005-01-07 23:43 ` Andrew Morton 2005-01-08 12:44 ` Nikita Danilov 2005-01-09 11:35 ` Marcelo Tosatti 0 siblings, 2 replies; 27+ messages in thread From: Andrew Morton @ 2005-01-07 23:43 UTC (permalink / raw) To: Nikita Danilov; +Cc: pmarques, linux-mm, linux-kernel, hch Nikita Danilov <nikita@clusterfs.com> wrote: > > > > > Why does the filesystem risk going oom during the rebalance anyway? Is it > > doing atomic allocations? > > No, just __alloc_pages(GFP_KERNEL, 0, ...) returns NULL. When this > happens, the only thing balancing can do is to panic. __alloc_pages(GFP_KERNEL, ...) doesn't return NULL. It'll either succeed or never return ;) That behaviour may change at any time of course, but it does make me wonder why we're bothering with this at all. Maybe it's because of the possibility of a GFP_IO failure under your feet or something? What happens if reiser4 simply doesn't use this code? If we introduce this mechanism, people will end up using it all over the place. Probably we could remove radix_tree_preload(), which is the only similar code I can I can immediately think of. Page reservation is not a bad thing per-se, but it does need serious thought. How does reiser4 end up deciding how many pages to reserve? Gross overkill? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 23:43 ` Andrew Morton @ 2005-01-08 12:44 ` Nikita Danilov 2005-01-08 13:43 ` Hugh Dickins 2005-01-09 11:35 ` Marcelo Tosatti 1 sibling, 1 reply; 27+ messages in thread From: Nikita Danilov @ 2005-01-08 12:44 UTC (permalink / raw) To: Andrew Morton; +Cc: pmarques, linux-mm, linux-kernel, hch Andrew Morton <akpm@osdl.org> writes: > Nikita Danilov <nikita@clusterfs.com> wrote: >> >> > >> > Why does the filesystem risk going oom during the rebalance anyway? Is it >> > doing atomic allocations? >> >> No, just __alloc_pages(GFP_KERNEL, 0, ...) returns NULL. When this >> happens, the only thing balancing can do is to panic. > > __alloc_pages(GFP_KERNEL, ...) doesn't return NULL. It'll either succeed > or never return ;) That behaviour may change at any time of course, but it Hmm... it used to, when I wrote that code. > does make me wonder why we're bothering with this at all. Maybe it's > because of the possibility of a GFP_IO failure under your feet or > something? This is what happens: - we start inserting new item into balanced tree, - lock nodes on the leaf level and modify them - go to the parent level - lock nodes on the parent level and modify them. This may require allocating new nodes. If allocation fails---we have to panic, because tree is in inconsistent state and there is no roll-back; if allocation hangs forever---deadlock is on its way, because we are still keeping locks on nodes on the leaf level. > > What happens if reiser4 simply doesn't use this code? At the time I tested it, it panicked after getting NULL from __alloc_pages(). With current `do_retry' logic in __alloc_pages() it will deadlock, I guess. > > > If we introduce this mechanism, people will end up using it all over the > place. Probably we could remove radix_tree_preload(), which is the only > similar code I can I can immediately think of. > > Page reservation is not a bad thing per-se, but it does need serious > thought. > > How does reiser4 end up deciding how many pages to reserve? Gross > overkill? Worst-case behavior of tree algorithms is well-known. Yes it's overkill. Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-08 12:44 ` Nikita Danilov @ 2005-01-08 13:43 ` Hugh Dickins 0 siblings, 0 replies; 27+ messages in thread From: Hugh Dickins @ 2005-01-08 13:43 UTC (permalink / raw) To: Nikita Danilov; +Cc: Andrew Morton, pmarques, linux-mm, linux-kernel, hch On Sat, 8 Jan 2005, Nikita Danilov wrote: > Andrew Morton <akpm@osdl.org> writes: > > > > __alloc_pages(GFP_KERNEL, ...) doesn't return NULL. It'll either succeed > > or never return ;) That behaviour may change at any time of course, but it > > Hmm... it used to, when I wrote that code. And still does, if OOM decides to kill _your_ task: OOM sets PF_MEMDIE, and then you don't get to go the retry route at all: /* This allocation should allow future memory freeing. */ if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) { /* go through the zonelist yet again, ignoring mins */ for (i = 0; (z = zones[i]) != NULL; i++) { page = buffered_rmqueue(z, order, gfp_mask); if (page) goto got_pg; } goto nopage; } .... nopage: .... return NULL; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-07 23:43 ` Andrew Morton 2005-01-08 12:44 ` Nikita Danilov @ 2005-01-09 11:35 ` Marcelo Tosatti 2005-01-09 18:16 ` Nikita Danilov 1 sibling, 1 reply; 27+ messages in thread From: Marcelo Tosatti @ 2005-01-09 11:35 UTC (permalink / raw) To: Andrew Morton; +Cc: Nikita Danilov, pmarques, linux-mm, linux-kernel, hch On Fri, Jan 07, 2005 at 03:43:05PM -0800, Andrew Morton wrote: > Nikita Danilov <nikita@clusterfs.com> wrote: > > > > > > > > Why does the filesystem risk going oom during the rebalance anyway? Is it > > > doing atomic allocations? > > > > No, just __alloc_pages(GFP_KERNEL, 0, ...) returns NULL. When this > > happens, the only thing balancing can do is to panic. > > __alloc_pages(GFP_KERNEL, ...) doesn't return NULL. It'll either succeed > or never return ;) That behaviour may change at any time of course, but it > does make me wonder why we're bothering with this at all. Maybe it's > because of the possibility of a GFP_IO failure under your feet or > something? > > What happens if reiser4 simply doesn't use this code? > > > If we introduce this mechanism, people will end up using it all over the > place. Probably we could remove radix_tree_preload(), which is the only > similar code I can I can immediately think of. > > Page reservation is not a bad thing per-se, but it does need serious > thought. Whenever scheme comes up I dont think the current check in __alloc_pages() is any good: if (order == 0) { page = perthread_pages_alloc(); if (page != NULL) return page; } Two things: - all instances of an allocator from the current thread will eat from the perthread reserves, you probably want only a few special allocations to eat from the reserves? Thing is its not really a reservation intended for emergency situations, rather a "generic per-thread pool" the way things are now. - its a real fast path, we're adding quite some instructions there which are only used by reiserfs now. I think in a "final" implementation emergency allocations should be explicitly stated as such by the callers ? > How does reiser4 end up deciding how many pages to reserve? Gross > overkill? > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] per thread page reservation patch 2005-01-09 11:35 ` Marcelo Tosatti @ 2005-01-09 18:16 ` Nikita Danilov 0 siblings, 0 replies; 27+ messages in thread From: Nikita Danilov @ 2005-01-09 18:16 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Andrew Morton, pmarques, linux-mm, linux-kernel, hch Marcelo Tosatti <marcelo.tosatti@cyclades.com> writes: [...] > > - all instances of an allocator from the current thread will eat from the perthread > reserves, you probably want only a few special allocations to eat from the reserves? > Thing is its not really a reservation intended for emergency situations, > rather a "generic per-thread pool" the way things are now. Per-thread reservations are only useful when they are transparent. If users have to call special function, or to pass special flag to __alloc_pages() they might just use mempool as well. Idea is to reserve some number of pages before starting complex operation so that generic function (like find_get_page()) called as part of this operation are guaranteed to succeed. > > - its a real fast path, we're adding quite some instructions there which are only > used by reiserfs now. Yes, this worries me too. Possibly we should move this check below in __alloc_pages(), so that per-thread reservation is only checked if fast-path allocation failed. Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* reiser4 core patches: [Was: [RFC] per thread page reservation patch] 2005-01-07 21:24 ` Andrew Morton 2005-01-07 21:24 ` Andi Kleen 2005-01-07 22:12 ` Nikita Danilov @ 2005-01-25 16:39 ` Vladimir Saveliev 2005-01-27 10:37 ` Adrian Bunk 2005-01-27 11:01 ` Christoph Hellwig 2 siblings, 2 replies; 27+ messages in thread From: Vladimir Saveliev @ 2005-01-25 16:39 UTC (permalink / raw) To: Andrew Morton, Christoph Hellwig; +Cc: Nikita Danilov, linux-mm, linux-kernel Hello [per thread page reservation discussion is snipped] > And the whole idea is pretty flaky really - how can one precalculate how > much memory an arbitrary md-on-dm-on-loop-on-md-on-NBD stack will want to > use? It really would be better if we could drop the whole patch and make > reiser4 behave more sanely when its writepage is called with for_reclaim=1. ok, we will change reiser4 to keep its pool of preallocated pages privately as Andi suggested. This will require to have reiser4_find_or_create_page which will do what find_or_create_page does plus get a page from the list of preallocated pages if alloc_page returns NULL. So, currently, reiser4 depends on the core patches listed below. Would you please look over them and let us know which look reasonable and which are to be eliminated. reiser4-sb_sync_inodes.patch This patch adds new operation (sync_inodes) to struct super_operations. This operation allows a filesystem to writeout dirty pages not necessarily on per-inode basis. Default implementation of this operation is sync_sb_inodes. reiser4-allow-drop_inode-implementation.patch This EXPORT_SYMBOL-s inodes_stat, generic_forget_inode, destroy_inode and wake_up_inode which are needed to implement drop_inode. reiser4 implements function similar to generic_delete_inode to be able to truncate inode pages together with metadata destroying in reiser4_delete_inode whereas generic_delete_inode first truncates pages and then calls foofs_delete_inode. reiser4-truncate_inode_pages_range.patch This patch makes truncate_inode_pages_range from truncate_inode_pages. truncate_inode_pages_range can truncate only pages which fall into specified range. truncate_inode_pages which trucates all pages starting from specified offset is made a one liner which calls truncate_inode_pages_range. reiser4-rcu-barrier.patch This patch introduces a new interface - rcu_barrier() which waits until all the RCUs queued until this call have been completed. This patch is by Dipankar Sarma <dipankar@in.ibm.com> reiser4-reget-page-mapping.patch This patch allows to remove page from page cache in foofs_releasepage. reiser4-radix_tree_lookup_slot.patch This patch extents radxi tree API with a function which returns pointer to found item within the tree. reiser4-export-remove_from_page_cache.patch reiser4-export-page_cache_readahead.patch reiser4-export-pagevec-funcs.patch reiser4-export-radix_tree_preload.patch reiser4-export-find_get_pages.patch reiser4-export-generic_sync_sb_inodes.patch The above patches EXPORT_SYMBOL several functions. Others may find it useful if they were exported. reiser4-export-inode_lock.patch Reiser4 used to manipulate with super block inode lists so it needs inode_lock exported. We are working now to not need this. But quite many things are based on it. Is there any chance to have it included? Thanks -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: reiser4 core patches: [Was: [RFC] per thread page reservation patch] 2005-01-25 16:39 ` reiser4 core patches: [Was: [RFC] per thread page reservation patch] Vladimir Saveliev @ 2005-01-27 10:37 ` Adrian Bunk 2005-01-27 11:01 ` Christoph Hellwig 1 sibling, 0 replies; 27+ messages in thread From: Adrian Bunk @ 2005-01-27 10:37 UTC (permalink / raw) To: Vladimir Saveliev Cc: Andrew Morton, Christoph Hellwig, Nikita Danilov, linux-mm, linux-kernel Hi Vladimir, a general request: If you add new EXPORT_SYMBOL's, I'd strongly prefer them being EXPORT_SYMBOL_GPL's unless there's a very good reason for an EXPORT_SYMBOL. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: reiser4 core patches: [Was: [RFC] per thread page reservation patch] 2005-01-25 16:39 ` reiser4 core patches: [Was: [RFC] per thread page reservation patch] Vladimir Saveliev 2005-01-27 10:37 ` Adrian Bunk @ 2005-01-27 11:01 ` Christoph Hellwig 1 sibling, 0 replies; 27+ messages in thread From: Christoph Hellwig @ 2005-01-27 11:01 UTC (permalink / raw) To: Vladimir Saveliev; +Cc: Andrew Morton, Nikita Danilov, linux-mm, linux-kernel > So, currently, reiser4 depends on the core patches listed below. Would > you please look over them and let us know which look reasonable and > which are to be eliminated. > reiser4-sb_sync_inodes.patch > This patch adds new operation (sync_inodes) to struct super_operations. > This operation allows a filesystem to writeout dirty pages not > necessarily on per-inode basis. Default implementation of this operation > is sync_sb_inodes. ok. > reiser4-allow-drop_inode-implementation.patch > This EXPORT_SYMBOL-s inodes_stat, generic_forget_inode, destroy_inode > and wake_up_inode which are needed to implement drop_inode. > reiser4 implements function similar to generic_delete_inode to be able > to truncate inode pages together with metadata destroying in > reiser4_delete_inode whereas generic_delete_inode first truncates pages > and then calls foofs_delete_inode. Not okay. I though I explained you how to do it instead already? > reiser4-truncate_inode_pages_range.patch > This patch makes truncate_inode_pages_range from truncate_inode_pages. > truncate_inode_pages_range can truncate only pages which fall into > specified range. truncate_inode_pages which trucates all pages starting > from specified offset is made a one liner which calls > truncate_inode_pages_range. Ok. > reiser4-rcu-barrier.patch > This patch introduces a new interface - rcu_barrier() which waits until > all the RCUs queued until this call have been completed. > This patch is by Dipankar Sarma <dipankar@in.ibm.com> No idea, but if Dipankar things it's okay it probably is. > reiser4-reget-page-mapping.patch > This patch allows to remove page from page cache in foofs_releasepage. probably ok, Andrew? > reiser4-radix_tree_lookup_slot.patch > This patch extents radxi tree API with a function which returns pointer > to found item within the tree. Idea is okay, implementation needs work: __lookup_slot should be called radix_tree_lookup_slot and exported direcly. radix_tree_lookup should be an inlined wrapper in the header, and kill the != NULL comparims, it's superflous: static inline void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index) { void **slot = radix_tree_lookup_slot(root, index); return slot ? *slot : NULL; } > reiser4-export-remove_from_page_cache.patch Probably okay, but why do you need both remove_from_page_cache and __remove_from_page_cache? > reiser4-export-page_cache_readahead.patch ok > reiser4-export-pagevec-funcs.patch ok > reiser4-export-radix_tree_preload.patch this one is nasty. not your fault though, but why do we even have global radix-tree preloads instead of pools. What do you need it for? > reiser4-export-find_get_pages.patch Why don't you use pagevecs instead? > reiser4-export-generic_sync_sb_inodes.patch Can't find this one in current -mm > reiser4-export-inode_lock.patch > Reiser4 used to manipulate with super block inode lists so it needs > inode_lock exported. > We are working now to not need this. But quite many things are based on > it. Is there any chance to have it included? No. Modules must not look at inode_lock, it's far too much of an implementation detail. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2005-01-27 11:01 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20050103011113.6f6c8f44.akpm@osdl.org>
[not found] ` <20050103114854.GA18408@infradead.org>
[not found] ` <41DC2386.9010701@namesys.com>
[not found] ` <1105019521.7074.79.camel@tribesman.namesys.com>
[not found] ` <20050107144644.GA9606@infradead.org>
2005-01-07 17:16 ` [RFC] per thread page reservation patch Vladimir Saveliev
2005-01-07 18:48 ` Andrew Morton
2005-01-07 20:21 ` Nikita Danilov
2005-01-07 19:05 ` Christoph Hellwig
2005-01-07 19:12 ` Christoph Hellwig
2005-01-07 19:21 ` Robert Love
2005-01-07 20:48 ` Nikita Danilov
2005-01-07 20:54 ` Christoph Hellwig
2005-01-07 21:00 ` Nikita Danilov
2005-01-07 21:07 ` Christoph Hellwig
2005-01-07 19:14 ` Paulo Marques
2005-01-07 19:32 ` Christoph Hellwig
2005-01-07 19:42 ` Andi Kleen
2005-01-07 20:55 ` Nikita Danilov
2005-01-07 21:24 ` Andrew Morton
2005-01-07 21:24 ` Andi Kleen
2005-01-07 22:12 ` Nikita Danilov
2005-01-07 23:03 ` Andrew Morton
2005-01-07 23:17 ` Nikita Danilov
2005-01-07 23:43 ` Andrew Morton
2005-01-08 12:44 ` Nikita Danilov
2005-01-08 13:43 ` Hugh Dickins
2005-01-09 11:35 ` Marcelo Tosatti
2005-01-09 18:16 ` Nikita Danilov
2005-01-25 16:39 ` reiser4 core patches: [Was: [RFC] per thread page reservation patch] Vladimir Saveliev
2005-01-27 10:37 ` Adrian Bunk
2005-01-27 11:01 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox