* Re: [PATCH 8/8] Per-container pages reclamation
       [not found] ` <461A397A.8080609@sw.ru>
@ 2007-05-17 11:31   ` Balbir Singh
  2007-05-21 15:15     ` Pavel Emelianov
From: Balbir Singh @ 2007-05-17 11:31 UTC
To: Pavel Emelianov
Cc: Andrew Morton, Paul Menage, Srivatsa Vaddagiri, Balbir Singh, devel, Linux Kernel Mailing List, Kirill Korotaev, Chandra Seetharaman, Cedric Le Goater, Eric W. Biederman, Rohit Seth, Linux Containers, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 2703 bytes --]

Pavel Emelianov wrote:
> Implement try_to_free_pages_in_container() to free the
> pages in container that has run out of memory.
>
> The scan_control->isolate_pages() function isolates the
> container pages only.
>
>

Hi, Pavel/Andrew,

I've started running some basic tests like lmbench and LTP vm stress
on the RSS controller.

With the controller rss_limit set to 256 MB, I saw the following panic
on a machine

Unable to handle kernel NULL pointer dereference at 000000000000001c RIP:
 [<ffffffff80328581>] _raw_spin_lock+0xd/0xf6
PGD 3c841067 PUD 5d5d067 PMD 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: ipv6 hidp rfcomm l2cap bluetooth sunrpc video button battery asus_acpi backlight ac lp parport_pc parport nvram pcspkr amd_rng rng_core i2c_amd756 i2c_core
Pid: 13581, comm: mtest01 Not tainted 2.6.20-autokern1 #1
RIP: 0010:[<ffffffff80328581>]  [<ffffffff80328581>] _raw_spin_lock+0xd/0xf6
RSP: 0000:ffff81003e6c9ce8  EFLAGS: 00010096
RAX: ffffffff8087f720 RBX: 0000000000000018 RCX: ffff81003f36f9d0
RDX: ffff8100807bb040 RSI: 0000000000000001 RDI: 0000000000000018
RBP: 0000000000000000 R08: ffff81003e6c8000 R09: 0000000000000002
R10: ffff810001021da8 R11: ffffffff8044658f R12: ffff81000c861e01
R13: 0000000000000018 R14: ffff81000c861eb8 R15: ffff810032d34138
FS:  00002abf7a1961e0(0000) GS:ffff81003edb94c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000001c CR3: 000000002ba6e000 CR4: 00000000000006e0
Process mtest01 (pid: 13581, threadinfo ffff81003e6c8000, task ffff81003d8ec040)
Stack:  ffff810001003638 ffff810014a8c2c0 0000000000000000 ffff81000c861e01
 0000000000000018 ffffffff80287166 ffff81000c861eb8 ffff81000000bac0
 ffff81003f36f9a0 ffff81000c861e40 ffff81001d4b6a20 ffffffff8026a92e
Call Trace:
 [<ffffffff80287166>] container_rss_move_lists+0x3b/0xaf
 [<ffffffff8026a92e>] activate_page+0xc1/0xd0
 [<ffffffff80245f15>] wake_bit_function+0x0/0x23
 [<ffffffff8026ab34>] mark_page_accessed+0x1b/0x2f
 [<ffffffff80265d25>] filemap_nopage+0x180/0x338
 [<ffffffff80270474>] __handle_mm_fault+0x1f2/0xa81
 [<ffffffff804c58ef>] do_page_fault+0x42b/0x7b3
 [<ffffffff802484c4>] hrtimer_cancel+0xc/0x16
 [<ffffffff804c2a89>] do_nanosleep+0x47/0x70
 [<ffffffff802485f4>] hrtimer_nanosleep+0x58/0x119
 [<ffffffff8023bc1f>] sys_sysinfo+0x15b/0x173
 [<ffffffff804c3d3d>] error_exit+0x0/0x84

On analyzing the code, I found that the page is mapped (we have a page_mapped() check in
container_rss_move_lists()), but the page_container is invalid. Please review the fix
attached (we reset the page's container pointer to NULL when a page is completely unmapped)

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

[-- Attachment #2: rss-fix-lru-race.patch --]
[-- Type: text/x-patch, Size: 2194 bytes --]

Index: linux-2.6.20/mm/rss_container.c
===================================================================
--- linux-2.6.20.orig/mm/rss_container.c	2007-05-15 05:13:46.000000000 -0700
+++ linux-2.6.20/mm/rss_container.c	2007-05-16 20:45:45.000000000 -0700
@@ -212,6 +212,7 @@ void container_rss_del(struct page_conta
 
 	css_put(&rss->css);
 	kfree(pc);
+	init_page_container(page);
 }
 
 static void rss_move_task(struct container_subsys *ss,
Index: linux-2.6.20/mm/page_alloc.c
===================================================================
--- linux-2.6.20.orig/mm/page_alloc.c	2007-05-16 10:30:10.000000000 -0700
+++ linux-2.6.20/mm/page_alloc.c	2007-05-16 20:45:24.000000000 -0700
@@ -41,6 +41,7 @@
 #include <linux/pfn.h>
 #include <linux/backing-dev.h>
 #include <linux/fault-inject.h>
+#include <linux/rss_container.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1977,6 +1978,7 @@ void __meminit memmap_init_zone(unsigned
 		set_page_links(page, zone, nid, pfn);
 		init_page_count(page);
 		reset_page_mapcount(page);
+		init_page_container(page);
 		SetPageReserved(page);
 		INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
Index: linux-2.6.20/include/linux/rss_container.h
===================================================================
--- linux-2.6.20.orig/include/linux/rss_container.h	2007-05-16 10:31:04.000000000 -0700
+++ linux-2.6.20/include/linux/rss_container.h	2007-05-16 10:32:14.000000000 -0700
@@ -28,6 +28,11 @@ void container_rss_move_lists(struct pag
 unsigned long isolate_pages_in_container(unsigned long nr_to_scan,
 		struct list_head *dst, unsigned long *scanned,
 		struct zone *zone, struct rss_container *, int active);
+static inline void init_page_container(struct page *page)
+{
+	page_container(page) = NULL;
+}
+
 #else
 static inline int container_rss_prepare(struct page *pg,
 		struct vm_area_struct *vma, struct page_container **pc)
@@ -56,6 +61,10 @@ static inline void mm_free_container(str
 {
 }
 
+static inline void init_page_container(struct page *page)
+{
+}
+
 #define isolate_container_pages(nr, dst, scanned, rss, act, zone) ({ BUG(); 0;})
 #define container_rss_move_lists(pg, active) do { } while (0)
 #endif
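In the oops, the faulting address (CR2) is 0x1c while RDI, which carries the first argument on x86-64, holds 0x18. One way such small addresses can arise is when the address of a field is computed from a NULL structure pointer, so that the resulting "pointer" is just the field's offset. The userspace C sketch below only reproduces that arithmetic. The names and layouts (fake_lock, fake_owner) are hypothetical, picked to match the numbers in the trace; they are not taken from the RSS container patches or from the kernel's spinlock definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock layout: a lock word followed by a debug field. */
struct fake_lock {
	uint32_t slock;		/* offset 0 */
	uint32_t magic;		/* offset 4 */
};

/* Hypothetical owner structure with 24 bytes ahead of its lock. */
struct fake_owner {
	uint64_t other_fields[3];	/* offsets 0x00 to 0x17 */
	struct fake_lock lock;		/* offset 0x18 */
};

/* Value that "&owner->lock" would have if owner were NULL. */
uintptr_t lock_address_for_null_owner(void)
{
	return (uintptr_t)offsetof(struct fake_owner, lock);
}

/* Address touched if code then reads lock->magic through that value. */
uintptr_t fault_address_for_null_owner(void)
{
	/* Sanity check on the assumed layout of the hypothetical lock. */
	assert(offsetof(struct fake_lock, magic) == 4);
	return lock_address_for_null_owner() +
	       offsetof(struct fake_lock, magic);
}
```

Under these assumed layouts, lock_address_for_null_owner() yields 0x18 and fault_address_for_null_owner() yields 0x1c. That is only consistent with the trace; it does not by itself identify which pointer was NULL.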
* Re: [PATCH 8/8] Per-container pages reclamation
  2007-05-17 11:31   ` [PATCH 8/8] Per-container pages reclamation Balbir Singh
@ 2007-05-21 15:15     ` Pavel Emelianov
  2007-05-24  7:59       ` Balbir Singh
From: Pavel Emelianov @ 2007-05-21 15:15 UTC
To: balbir
Cc: Andrew Morton, Paul Menage, Srivatsa Vaddagiri, Balbir Singh, devel, Linux Kernel Mailing List, Kirill Korotaev, Chandra Seetharaman, Cedric Le Goater, Eric W. Biederman, Rohit Seth, Linux Containers, Linux Memory Management List

Balbir Singh wrote:
> Pavel Emelianov wrote:
>> Implement try_to_free_pages_in_container() to free the
>> pages in container that has run out of memory.
>>
>> The scan_control->isolate_pages() function isolates the
>> container pages only.

Sorry for the late answer, but I have only just managed to get to the patches.
One comment is below.

>>
>
> Hi, Pavel/Andrew,
>
> I've started running some basic tests like lmbench and LTP vm stress
> on the RSS controller.
>
> With the controller rss_limit set to 256 MB, I saw the following panic
> on a machine
>
> Unable to handle kernel NULL pointer dereference at 000000000000001c RIP:
>  [<ffffffff80328581>] _raw_spin_lock+0xd/0xf6
> PGD 3c841067 PUD 5d5d067 PMD 0
> Oops: 0000 [1] SMP
> CPU 2
> Modules linked in: ipv6 hidp rfcomm l2cap bluetooth sunrpc video button battery asus_acpi backlight ac lp parport_pc parport nvram pcspkr amd_rng rng_core i2c_amd756 i2c_core
> Pid: 13581, comm: mtest01 Not tainted 2.6.20-autokern1 #1
> RIP: 0010:[<ffffffff80328581>]  [<ffffffff80328581>] _raw_spin_lock+0xd/0xf6
> RSP: 0000:ffff81003e6c9ce8  EFLAGS: 00010096
> RAX: ffffffff8087f720 RBX: 0000000000000018 RCX: ffff81003f36f9d0
> RDX: ffff8100807bb040 RSI: 0000000000000001 RDI: 0000000000000018
> RBP: 0000000000000000 R08: ffff81003e6c8000 R09: 0000000000000002
> R10: ffff810001021da8 R11: ffffffff8044658f R12: ffff81000c861e01
> R13: 0000000000000018 R14: ffff81000c861eb8 R15: ffff810032d34138
> FS:  00002abf7a1961e0(0000) GS:ffff81003edb94c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000000001c CR3: 000000002ba6e000 CR4: 00000000000006e0
> Process mtest01 (pid: 13581, threadinfo ffff81003e6c8000, task ffff81003d8ec040)
> Stack:  ffff810001003638 ffff810014a8c2c0 0000000000000000 ffff81000c861e01
>  0000000000000018 ffffffff80287166 ffff81000c861eb8 ffff81000000bac0
>  ffff81003f36f9a0 ffff81000c861e40 ffff81001d4b6a20 ffffffff8026a92e
> Call Trace:
>  [<ffffffff80287166>] container_rss_move_lists+0x3b/0xaf
>  [<ffffffff8026a92e>] activate_page+0xc1/0xd0
>  [<ffffffff80245f15>] wake_bit_function+0x0/0x23
>  [<ffffffff8026ab34>] mark_page_accessed+0x1b/0x2f
>  [<ffffffff80265d25>] filemap_nopage+0x180/0x338
>  [<ffffffff80270474>] __handle_mm_fault+0x1f2/0xa81
>  [<ffffffff804c58ef>] do_page_fault+0x42b/0x7b3
>  [<ffffffff802484c4>] hrtimer_cancel+0xc/0x16
>  [<ffffffff804c2a89>] do_nanosleep+0x47/0x70
>  [<ffffffff802485f4>] hrtimer_nanosleep+0x58/0x119
>  [<ffffffff8023bc1f>] sys_sysinfo+0x15b/0x173
>  [<ffffffff804c3d3d>] error_exit+0x0/0x84
>
> On analyzing the code, I found that the page is mapped (we have a page_mapped() check in
> container_rss_move_lists()), but the page_container is invalid. Please review the fix
> attached (we reset the page's container pointer to NULL when a page is completely unmapped)
>
>
>
> ------------------------------------------------------------------------
>
> Index: linux-2.6.20/mm/rss_container.c
> ===================================================================
> --- linux-2.6.20.orig/mm/rss_container.c	2007-05-15 05:13:46.000000000 -0700
> +++ linux-2.6.20/mm/rss_container.c	2007-05-16 20:45:45.000000000 -0700
> @@ -212,6 +212,7 @@ void container_rss_del(struct page_conta
>
> 	css_put(&rss->css);
> 	kfree(pc);
> +	init_page_container(page);

This hunk is bad.
See, when the page drops its mapcount to 0 it may be reused right
after this if it belongs to a file map - another CPU can touch it.
Thus you risk resetting the wrong container.

The main idea of the accounting is that you cannot trust the
page_container(page) value after the page's mapcount has become 0.

> }
>
> static void rss_move_task(struct container_subsys *ss,
> Index: linux-2.6.20/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.20.orig/mm/page_alloc.c	2007-05-16 10:30:10.000000000 -0700
> +++ linux-2.6.20/mm/page_alloc.c	2007-05-16 20:45:24.000000000 -0700
> @@ -41,6 +41,7 @@
>  #include <linux/pfn.h>
>  #include <linux/backing-dev.h>
>  #include <linux/fault-inject.h>
> +#include <linux/rss_container.h>
>
>  #include <asm/tlbflush.h>
>  #include <asm/div64.h>
> @@ -1977,6 +1978,7 @@ void __meminit memmap_init_zone(unsigned
> 		set_page_links(page, zone, nid, pfn);
> 		init_page_count(page);
> 		reset_page_mapcount(page);
> +		init_page_container(page);
> 		SetPageReserved(page);
> 		INIT_LIST_HEAD(&page->lru);
>  #ifdef WANT_PAGE_VIRTUAL
> Index: linux-2.6.20/include/linux/rss_container.h
> ===================================================================
> --- linux-2.6.20.orig/include/linux/rss_container.h	2007-05-16 10:31:04.000000000 -0700
> +++ linux-2.6.20/include/linux/rss_container.h	2007-05-16 10:32:14.000000000 -0700
> @@ -28,6 +28,11 @@ void container_rss_move_lists(struct pag
>  unsigned long isolate_pages_in_container(unsigned long nr_to_scan,
> 		struct list_head *dst, unsigned long *scanned,
> 		struct zone *zone, struct rss_container *, int active);
> +static inline void init_page_container(struct page *page)
> +{
> +	page_container(page) = NULL;
> +}
> +
>  #else
>  static inline int container_rss_prepare(struct page *pg,
> 		struct vm_area_struct *vma, struct page_container **pc)
> @@ -56,6 +61,10 @@ static inline void mm_free_container(str
>  {
>  }
>
> +static inline void init_page_container(struct page *page)
> +{
> +}
> +
>  #define isolate_container_pages(nr, dst, scanned, rss, act, zone) ({ BUG(); 0;})
>  #define container_rss_move_lists(pg, active) do { } while (0)
>  #endif
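The objection to the container_rss_del() hunk can be replayed as a small single-threaded C model. This is a sketch only: struct page_model, struct owner and the helper names are hypothetical userspace stand-ins, and the ordering of the three steps is an assumed interleaving based on the description in the review, not a trace of the kernel code paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RSS container. */
struct owner {
	int id;
};

/* Hypothetical stand-in for struct page; container models page_container(). */
struct page_model {
	int mapcount;
	struct owner *container;
};

/* Unmap, step 1: drop the mapping. From here on the page may be reused. */
static void unmap_drop(struct page_model *pg)
{
	assert(pg->mapcount > 0);
	pg->mapcount--;
}

/* Unmap, step 2 (as in the first patch): clear the pointer afterwards. */
static void unmap_reset(struct page_model *pg)
{
	pg->container = NULL;
}

/* Another CPU maps the page again and charges it to a new owner. */
static void remap(struct page_model *pg, struct owner *o)
{
	pg->mapcount++;
	pg->container = o;
}

/* Returns 1 if the page ends up mapped but with its owner pointer wiped. */
int replay_unsafe_interleaving(void)
{
	struct owner a = { 1 }, b = { 2 };
	struct page_model pg = { 1, &a };

	unmap_drop(&pg);	/* CPU 0: mapcount 1 -> 0 */
	remap(&pg, &b);		/* CPU 1: reuses the page for owner b */
	unmap_reset(&pg);	/* CPU 0: late reset clobbers b's pointer */

	return pg.mapcount == 1 && pg.container == NULL;
}
```

In this model replay_unsafe_interleaving() returns 1: the page is mapped again, yet its owner pointer has been cleared by the CPU that unmapped it earlier.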
* Re: [PATCH 8/8] Per-container pages reclamation
  2007-05-21 15:15     ` Pavel Emelianov
@ 2007-05-24  7:59       ` Balbir Singh
From: Balbir Singh @ 2007-05-24 7:59 UTC
To: Pavel Emelianov
Cc: Andrew Morton, Paul Menage, Srivatsa Vaddagiri, Balbir Singh, devel, Linux Kernel Mailing List, Kirill Korotaev, Chandra Seetharaman, Cedric Le Goater, Eric W. Biederman, Rohit Seth, Linux Containers, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]

Pavel Emelianov wrote:
>> Index: linux-2.6.20/mm/rss_container.c
>> ===================================================================
>> --- linux-2.6.20.orig/mm/rss_container.c	2007-05-15 05:13:46.000000000 -0700
>> +++ linux-2.6.20/mm/rss_container.c	2007-05-16 20:45:45.000000000 -0700
>> @@ -212,6 +212,7 @@ void container_rss_del(struct page_conta
>>
>> 	css_put(&rss->css);
>> 	kfree(pc);
>> +	init_page_container(page);
>
> This hunk is bad.
> See, when the page drops its mapcount to 0 it may be reused right
> after this if it belongs to a file map - another CPU can touch it.
> Thus you risk resetting the wrong container.
>
> The main idea of the accounting is that you cannot trust the
> page_container(page) value after the page's mapcount has become 0.
>

Good catch, I'll move the initialization to free_hot_cold_page().
I'm attaching a new patch. I've also gotten rid of the unused
variable page in container_rss_del().

I've compile- and boot-tested the fix.

-- 
	Thanks,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

[-- Attachment #2: rss-fix-lru-race.patch --]
[-- Type: text/x-patch, Size: 2617 bytes --]

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

Index: linux-2.6.20/mm/page_alloc.c
===================================================================
--- linux-2.6.20.orig/mm/page_alloc.c	2007-05-16 10:30:10.000000000 -0700
+++ linux-2.6.20/mm/page_alloc.c	2007-05-24 00:41:00.000000000 -0700
@@ -41,6 +41,7 @@
 #include <linux/pfn.h>
 #include <linux/backing-dev.h>
 #include <linux/fault-inject.h>
+#include <linux/rss_container.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -791,6 +792,7 @@ static void fastcall free_hot_cold_page(
 
 	if (!PageHighMem(page))
 		debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
+	init_page_container(page);
 	arch_free_page(page, 0);
 	kernel_map_pages(page, 1, 0);
 
@@ -1977,6 +1979,7 @@ void __meminit memmap_init_zone(unsigned
 		set_page_links(page, zone, nid, pfn);
 		init_page_count(page);
 		reset_page_mapcount(page);
+		init_page_container(page);
 		SetPageReserved(page);
 		INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
Index: linux-2.6.20/include/linux/rss_container.h
===================================================================
--- linux-2.6.20.orig/include/linux/rss_container.h	2007-05-16 10:31:04.000000000 -0700
+++ linux-2.6.20/include/linux/rss_container.h	2007-05-16 10:32:14.000000000 -0700
@@ -28,6 +28,11 @@ void container_rss_move_lists(struct pag
 unsigned long isolate_pages_in_container(unsigned long nr_to_scan,
 		struct list_head *dst, unsigned long *scanned,
 		struct zone *zone, struct rss_container *, int active);
+static inline void init_page_container(struct page *page)
+{
+	page_container(page) = NULL;
+}
+
 #else
 static inline int container_rss_prepare(struct page *pg,
 		struct vm_area_struct *vma, struct page_container **pc)
@@ -56,6 +61,10 @@ static inline void mm_free_container(str
 {
 }
 
+static inline void init_page_container(struct page *page)
+{
+}
+
 #define isolate_container_pages(nr, dst, scanned, rss, act, zone) ({ BUG(); 0;})
 #define container_rss_move_lists(pg, active) do { } while (0)
 #endif
Index: linux-2.6.20/mm/rss_container.c
===================================================================
--- linux-2.6.20.orig/mm/rss_container.c	2007-05-15 05:13:46.000000000 -0700
+++ linux-2.6.20/mm/rss_container.c	2007-05-24 00:58:43.000000000 -0700
@@ -199,12 +199,9 @@ void container_rss_add(struct page_conta
 
 void container_rss_del(struct page_container *pc)
 {
-	struct page *page;
 	struct rss_container *rss;
 
-	page = pc->page;
 	rss = pc->cnt;
-
 	spin_lock_irq(&rss->res.lock);
 	list_del(&pc->list);
 	res_counter_uncharge_locked(&rss->res, 1);
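The revised approach, clearing the pointer only when the page is freed, can be illustrated with the same kind of userspace model. Again this is a sketch under assumptions: struct page_model, struct owner and the helpers are hypothetical, free_page_model() merely stands in for the spot in free_hot_cold_page() where the patch calls init_page_container(), and the model assumes a page being freed has no remaining mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RSS container. */
struct owner {
	int id;
};

/* Hypothetical stand-in for struct page; container models page_container(). */
struct page_model {
	int mapcount;
	struct owner *container;
};

/* The first mapper of an unmapped page is recorded as its owner. */
static void map_page(struct page_model *pg, struct owner *o)
{
	if (pg->mapcount++ == 0)
		pg->container = o;
}

/* Unmapping only drops the count; the owner pointer is left alone. */
static void unmap_page(struct page_model *pg)
{
	assert(pg->mapcount > 0);
	pg->mapcount--;
}

/* Freeing is modelled as the only place where the pointer is cleared. */
static void free_page_model(struct page_model *pg)
{
	assert(pg->mapcount == 0);	/* assumption: no mappings remain */
	pg->container = NULL;
}

/* Returns 1 if a remap keeps its owner and the free path clears it. */
int replay_reset_at_free(void)
{
	struct owner a = { 1 }, b = { 2 };
	struct page_model pg = { 0, NULL };
	int remap_kept_owner;

	map_page(&pg, &a);
	unmap_page(&pg);	/* CPU 0: mapcount 1 -> 0, no reset here */
	map_page(&pg, &b);	/* CPU 1: reuses the page for owner b */
	remap_kept_owner = (pg.container == &b);

	unmap_page(&pg);
	free_page_model(&pg);	/* pointer cleared once, at free time */

	return remap_kept_owner && pg.container == NULL;
}
```

Here replay_reset_at_free() returns 1: nothing on the unmap path writes the pointer, so a concurrent remap cannot have its owner wiped, and the next user of the freed page starts from NULL.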
Thread overview: 3+ messages
[not found] <461A3010.90403@sw.ru>
[not found] ` <461A397A.8080609@sw.ru>
2007-05-17 11:31 ` [PATCH 8/8] Per-container pages reclamation Balbir Singh
2007-05-21 15:15 ` Pavel Emelianov
2007-05-24 7:59 ` Balbir Singh