* OOMs on PS3 since kernel 6.9-rc4
From: Damian Dudycz @ 2024-09-24 20:52 UTC
To: linux-mm; +Cc: sam, holger, kernel, hannes

[-- Attachment #1: Type: text/plain, Size: 2480 bytes --]

I'm running Gentoo on the PlayStation 3 console (PPC64BE CPU), using the
OtherOS++ feature of a custom firmware.

Upgrading from 6.6 to 6.10, I noticed that OOM kills started during long
and intense processes, like compiling code or extracting a large archive.

The OOM usually occurs after about 10-20 minutes of, for example,
compiling the gentoo-kernel package.

This system has a limited amount of RAM (256MB), and another 256MB of
VRAM can be used as a fast swap device. Besides that, a standard 4GB
swap partition is also enabled.

I bisected with vanilla upstream sources, with the exception of some
irrelevant patches mentioned at the end. The bisection showed that the
issue first appeared in commit c0cd6f557b9090525d288806cccbc73440ac235a
(build 6.9.0-rc4-test), titled "page_alloc: fix freelist movement during
block conversion".

https://github.com/torvalds/linux/commit/c0cd6f557b9090525d288806cccbc73440ac235a

Unfortunately, it doesn't revert cleanly on 6.11, so I couldn't test
that.

# Files and directories:

- patches: patches applied to the kernel when preparing a test build.
  These work with version 6.9.
- config: kernel config used.
- bisect.txt: log from the bisecting process.
- dmesg.txt: log from dmesg after the issue occurred.
- c0cd6f557b9090525d288806cccbc73440ac235a.patch: diff of the commit
  that introduced the issue.
- proc: collection of files from /proc, taken before, during and after
  the test ("during" was always taken 5 minutes after the test started).
- 6.9.0-rc3-test-dirty: working version, the issue didn't occur.
- 6.9.0-rc4-test-00116-gc0cd6f557b90-dirty: commit that introduced the
  issue, OOM occurred.
- 6.11.0-test-dirty: newer kernel version, OOM still occurred.

# Patches:

For the kernel to work on the PS3 using OtherOS++, some patches are
required. During testing I reduced them to only the ones essential to
boot correctly. The patches I used are in the "patches" directory. They
mainly let Linux use the disk regions reserved for it, and I doubt they
have any impact on the issue, but I'm including them in case this needs
verification.

There are also 2 disabled patches related to page allocation that I left
there. They were not used in the tests, as they don't affect the result
in this situation; I'm leaving them just in case.

The mentioned logs and files are in the attached tarball.

[-- Attachment #2: files-linux-6.9.0-rc4-test.tar.xz --]
[-- Type: application/x-xz, Size: 308736 bytes --]
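The bisection described above is, in essence, a binary search over the commits between a known-good and a known-bad kernel. The Python sketch below is illustrative only. It assumes a linear history (`git bisect` also handles merges) and a hypothetical `is_bad` oracle standing in for "build, boot, and run the workload until the OOM does or does not appear". It also assumes every verdict is reliable: with an OOM that takes 10-20 minutes to show up, a run stopped too early could mark a bad commit as good and send the search the wrong way.

```python
from typing import Callable, Tuple


def find_first_bad(n: int, is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary-search a linear history of n commits for the first bad one.

    Assumes commit 0 is known good, commit n - 1 is known bad, and the
    failure is monotonic (every commit from the culprit onward reproduces it).
    Returns (index of the first bad commit, number of commits tested).
    """
    good, bad = 0, n - 1  # invariant: `good` is good, `bad` is bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # each call is one build/boot/test cycle
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad, tests


if __name__ == "__main__":
    culprit = 37  # hypothetical: regression introduced at index 37
    found, tests = find_first_bad(100, lambda i: i >= culprit)
    print(f"first bad commit: {found} after {tests} tests")
```

Because each step halves the remaining range, 100 candidate commits need at most 7 test runs, which matters when every run is a long compile.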

* Re: OOMs on PS3 since kernel 6.9-rc4
From: Johannes Weiner @ 2024-09-25 17:20 UTC
To: Damian Dudycz; +Cc: linux-mm, sam, holger, kernel, Michal Hocko

Hi Damian,

On Tue, Sep 24, 2024 at 10:52:28PM +0200, Damian Dudycz wrote:
> I'm running Gentoo on the PlayStation 3 console (PPC64BE CPU), using custom
> firmware (OtherOS++) feature.
>
> Upgrading from 6.6 to 6.10, I have noticed that OOM kills started during long
> and intense processes, like compiling code or extracting a large archive.
>
> The OOM usually occurs after about 10-20 minutes of for example
> compiling the gentoo-kernel package.

Thanks for your excellent and detailed report, and sorry about the
breakage.

While going through the dmesg, I'm noticing the following:

[ 719.989545] configure invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=2, oom_score_adj=0
[ 719.989607] COMPACTION is disabled!!!
[ 719.989633] CPU: 1 PID: 4701 Comm: configure Not tainted 6.9.0-rc4-test-00116-gc0cd6f557b90-dirty #1
[ 719.989665] Hardware name: SonyPS3 Cell Broadband Engine 0x702100 PS3
[ 719.989688] Call Trace:
[ 719.989708] [c00000000a5834a0] [c000000000662e9c] .dump_stack_lvl+0xb0/0x100 (unreliable)
[ 719.989777] [c00000000a583530] [c00000000013e43c] .dump_header+0x5c/0x414
[ 719.989835] [c00000000a583600] [c00000000013ec38] .oom_kill_process+0xcc/0x598
[ 719.989888] [c00000000a5836f0] [c00000000013f6f0] .out_of_memory+0x3d0/0x3f0
[ 719.989939] [c00000000a5837a0] [c00000000018f28c] .__alloc_pages_slowpath.constprop.0+0x540/0x6b0
[ 719.989987] [c00000000a5838f0] [c00000000018f4f4] .__alloc_pages_noprof+0xf8/0x1c0
[ 719.990031] [c00000000a5839c0] [c0000000000505d0] .copy_process+0x1d4/0x1bf0
[ 719.990085] [c00000000a583b40] [c000000000052144] .kernel_clone+0xcc/0x3f0
[ 719.990136] [c00000000a583c50] [c0000000000524d4] .__do_sys_clone+0x6c/0x90
[ 719.990188] [c00000000a583d80] [c00000000001f600] .system_call_exception+0x1f4/0x260
[ 719.990246] [c00000000a583e10] [c00000000000b2d4] system_call_common+0xf4/0x258

This is clone() trying to allocate a thread stack, which is a request
for 4 physically contiguous pages (order=2 -> 2^2 pages).

The second line warns that you don't have CONFIG_COMPACTION enabled,
which is the kernel's facility to assemble such contiguous page
blocks. (God bless you, Michal Hocko, for adding this warning.)

This is not a common configuration anymore, as we have since removed
various other mechanisms from the MM code to support higher order
allocations. So I think you may have gotten lucky in the past.

Can you please try with CONFIG_COMPACTION=y?

[ I think what likely happened is that, before my patch, an unmovable
  request falling back to a movable block would have stolen the rest
  of its free pages even if it hadn't claimed the block as unmovable.
  Now it doesn't anymore, and the block, already dominated by cache
  and anon, will continue to fill up with cache and anon. Not an issue
  with compaction - and better for long-term defragmentation
  prospects; but without compaction, you just get a bit less lucky
  specifically with those higher-order kernel requests. ]

Thanks
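To make the order arithmetic and the contiguity requirement concrete, here is a toy model (not kernel code). It assumes 4 KiB base pages, which the report does not state, and uses a made-up free-page map. The point it illustrates: memory can have plenty of free pages in total and still fail an order-2 request if no naturally aligned run of 4 free pages exists.

```python
from typing import Sequence

# Assumption for illustration: 4 KiB base pages. The real page size depends
# on the kernel configuration and is not given in the report.
PAGE_SIZE = 4096


def order_to_pages(order: int) -> int:
    """An allocation of order n asks for 2**n physically contiguous pages."""
    return 1 << order


def can_satisfy(free: Sequence[bool], order: int) -> bool:
    """Toy check: does the free-page map contain a naturally aligned run of
    2**order free pages? In a buddy allocator, an order-n block starts at a
    page index that is a multiple of 2**n, so only aligned runs count."""
    size = order_to_pages(order)
    return any(
        all(free[start:start + size])
        for start in range(0, len(free) - size + 1, size)
    )


if __name__ == "__main__":
    print(order_to_pages(2), order_to_pages(2) * PAGE_SIZE)  # 4 pages, 16384 bytes
    # Hypothetical fragmented zone: 16 pages, every other page in use.
    free = [i % 2 == 0 for i in range(16)]
    print(sum(free), can_satisfy(free, 0), can_satisfy(free, 2))
```

With every other page in use, 8 of 16 pages are free, so order-0 requests succeed while order-2 requests fail; this is the kind of situation compaction exists to repair.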

* Re: OOMs on PS3 since kernel 6.9-rc4
From: Damian Dudycz @ 2024-09-25 17:43 UTC
To: Johannes Weiner; +Cc: linux-mm, sam, holger, kernel, Michal Hocko

Thank you for the response, Johannes.

I'll test this and get back with the results.
I should also mention that Holger suggested enabling LRU, and this also
seems to help with this issue, but I still thought I should report the
behavior when it's not enabled.
I'll see if compaction helps and let you know.

Regards,
Damian.

* Re: OOMs on PS3 since kernel 6.9-rc4
From: Damian Dudycz @ 2024-09-26 7:00 UTC
To: Johannes Weiner; +Cc: linux-mm, sam, holger, kernel, Michal Hocko

Johannes,

I have tested this with compaction enabled, and it seems to be working
fine now. In that case, I think this should be enabled in ps3_defconfig
by default.

As for not having compaction in previous versions: I have been using
this setup for a pretty long time, and I'm pretty sure it used to work
fine without it. Still, I understand that it should have been enabled;
I'm just mentioning that it really did work without it before that
version.

I'll let the ps3_defconfig maintainer know that compaction is missing
from ps3_defconfig, or send a patch for that config myself.

Thank you all for your help with this.

Regards,
Damian.
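Before and after a defconfig change like the one proposed here, it can help to check how a given config file sets CONFIG_COMPACTION. A minimal Python sketch, assuming the usual .config conventions (`CONFIG_X=y` when a bool option is enabled, `# CONFIG_X is not set` when it is disabled) and that a minimal defconfig simply omits options left at their Kconfig default. The helper name `config_state` and the sample text are hypothetical; they are not the actual contents of arch/powerpc/configs/ps3_defconfig.

```python
from typing import Optional


def config_state(text: str, option: str) -> Optional[bool]:
    """Report how a Kconfig-style file sets a bool `option`.

    Returns True for "CONFIG_X=y", False for "# CONFIG_X is not set" (or any
    other assigned value), and None if the option is not mentioned at all; in
    a minimal defconfig, None means the Kconfig default applies.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return False
        # Matching on "NAME=" avoids hits on longer names sharing the prefix.
        if line.startswith(option + "="):
            return line[len(option) + 1:] == "y"
    return None


if __name__ == "__main__":
    # Made-up excerpt of a config file with compaction explicitly disabled.
    sample = "CONFIG_MMU=y\n# CONFIG_COMPACTION is not set\nCONFIG_MIGRATION=y\n"
    print(config_state(sample, "CONFIG_COMPACTION"))
```

On a running kernel built with CONFIG_IKCONFIG_PROC, the same helper could be fed the decompressed contents of /proc/config.gz, e.g. `gzip.open("/proc/config.gz", "rt").read()`.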

* Re: OOMs on PS3 since kernel 6.9-rc4
From: Johannes Weiner @ 2024-09-26 10:34 UTC
To: Damian Dudycz; +Cc: linux-mm, sam, holger, kernel, Michal Hocko

Hello Damian,

On Thu, Sep 26, 2024 at 09:00:25AM +0200, Damian Dudycz wrote:
> Johannes,
>
> I have tested this with compaction enabled and it seems to be working fine now.
> I think, in that case, this should be enabled in ps3_defconfig by default.

I'm glad to hear it's working again!

> As for not having compaction in previous versions - I have been using this for pretty long
> time and Im pretty sure it used to work fine without it. Still I understand, that it should have been
> used, just mentioning that it really did work without this before that version.

Yes, it's a real regression and I believe you that it has worked until
now. My comment about luck was more in reference to the level of
support, testing and attention this configuration is getting:

config COMPACTION
        bool "Allow for memory compaction"
        default y
        select MIGRATION
        depends on MMU
        help
          Compaction is the only memory management component to form
          high order (larger physically contiguous) memory blocks
          reliably. The page allocator relies on compaction heavily and
          the lack of the feature can lead to unexpected OOM killer
          invocations for high order memory requests. You shouldn't
          disable this option unless there really is a strong reason for
          it and then we would be really interested to hear about that at
          linux-mm@kvack.org.

So I definitely agree that the ps3_defconfig should be fixed.

> I'll let ps3_defconfig maintainer know about compaction missing in ps3_defconfig
> or send patch for that config myself.

Thanks, yes this makes sense. This should be a good list of pointers:

hannes@column ~/src/linux/linux $ ./scripts/get_maintainer.pl -f arch/powerpc/configs/ps3_defconfig
Michael Ellerman <mpe@ellerman.id.au> (supporter:LINUX FOR POWERPC (32-BIT AND 64-BIT),commit_signer:2/2=100%)
Nicholas Piggin <npiggin@gmail.com> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
Christophe Leroy <christophe.leroy@csgroup.eu> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
Naveen N Rao <naveen@kernel.org> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT))
Geoff Levand <geoff@infradead.org> (commit_signer:2/2=100%,authored:2/2=100%,added_lines:1/1=100%,removed_lines:1/1=100%)
linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT))
linux-kernel@vger.kernel.org (open list)

Johannes
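The Kconfig help text quoted above calls compaction the only component that forms high-order blocks reliably. Below is a toy model of the idea, a simplification rather than the kernel's implementation: a migration scanner walks up from the low end looking for movable pages, a free scanner walks down from the high end looking for free pages, and each movable page found is "migrated" into a free slot until the scanners meet. The page labels and memory layout are made up for illustration. Unmovable pages never move, which is why the allocator tries to keep unmovable allocations grouped together.

```python
from typing import List, Optional

# Toy physical memory: each slot is one page holding None (free), "M"
# (movable, e.g. page cache or anon) or "U" (unmovable kernel memory).
Memory = List[Optional[str]]


def longest_free_run(mem: Memory) -> int:
    """Length of the longest run of consecutive free pages."""
    best = cur = 0
    for page in mem:
        cur = cur + 1 if page is None else 0
        best = max(best, cur)
    return best


def compact(mem: Memory) -> Memory:
    """Toy compaction: returns a new layout, leaving `mem` untouched."""
    out = list(mem)
    lo, hi = 0, len(out) - 1
    while True:
        # Migration scanner: next movable page, searching upward.
        while lo < hi and out[lo] != "M":
            lo += 1
        # Free scanner: next free page, searching downward.
        while lo < hi and out[hi] is not None:
            hi -= 1
        if lo >= hi:  # scanners met: nothing left to migrate
            return out
        out[hi], out[lo] = "M", None  # "migrate" the movable page upward


if __name__ == "__main__":
    before: Memory = ["M", None, "M", None, "U", None, "M", None]
    after = compact(before)
    print(longest_free_run(before), longest_free_run(after), after)
```

In this layout half the pages are free but the longest free run is a single page; after the toy compaction, the four free pages sit together at the low end while the unmovable page stays where it was.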

Thread overview: 5+ messages
2024-09-24 20:52 ` OOMs on PS3 since kernel 6.9-rc4 Damian Dudycz
2024-09-25 17:20 ` Johannes Weiner
2024-09-25 17:43 ` Damian Dudycz
2024-09-26 7:00 ` Damian Dudycz
2024-09-26 10:34 ` Johannes Weiner