* [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
@ 2004-02-04 23:17 Martin J. Bligh
2004-02-04 23:58 ` Linus Torvalds
0 siblings, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-04 23:17 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm mailing list, kmannth
Summary: Bug from the mm subsystem involving X
Kernel Version: kernel.org 2.6.2
Status: NEW
Severity: normal
Owner: mm_numa-discontigmem@kernel-bugs.osdl.org
Submitter: kmannth@us.ibm.com
Distribution: Red Hat Enterprise Linux AS release 3 (Taroon Update 1)
Hardware Environment: IBM x445 16-way 64gig of ram
Software Environment: AS3.0 update 1 with stock 2.6.2
Problem Description: The X server and the kernel do not play well.
Steps to reproduce: Load AS3.0 (any flavor), install a v2.6 kernel, and
start X on boot.
So there have been a lot of X issues with Red Hat and 2.6 kernels. I managed to
get the system to panic and I decided it was time to open this bug. I got this
on boot up.
NET: Registered protocol family 17
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 296k freed
???????
Red Hat Enterprise Linux AS release 3 (Taroon Update 1)
Kernel 2.6.2 on an i686
elm3a80 login: Unable to handle kernel paging request at virtual address 0264d000
printing eip:
c0147af4
*pde = 00000000
Oops: 0000 [#1]
CPU: 7
EIP: 0060:[<c0147af4>] Not tainted
EFLAGS: 00013206
EIP is at remap_page_range+0x193/0x26c
eax: 0264d000 ebx: 000f5200 ecx: 00000001 edx: dad0fa80
esi: 001fe000 edi: d87c9ff0 ebp: f5200000 esp: d8835ee4
ds: 007b es: 007b ss: 0068
Process X (pid: 1285, threadinfo=d8834000 task=d9474ce0)
Stack: d961d580 001ff000 001ff000 40000000 f5002000 001fe000 d9578000 d961d580
401ff000 d9576508 00000000 f5200000 d961d580 00000001 c0247055 d87d62c0
401fe000 b5002000 00001000 00000027 d9388e80 00001000 c014a7fd d9388e80
Call Trace:
[<c0247055>] mmap_mem+0x71/0xd4
[<c014a7fd>] do_mmap_pgoff+0x362/0x70d
[<c0156f65>] filp_open+0x67/0x69
[<c0111c4d>] sys_mmap2+0x7a/0xaa
[<c010aced>] sysenter_past_esp+0x52/0x71
Code: 8b 00 a9 00 08 00 00 74 10 89 d8 8b 54 24 4c c1 e8 14 09 ea
<6>note: X[1285] exited with preempt_count 1
bad: scheduling while atomic!
Call Trace:
[<c011da0a>] schedule+0x6d0/0x6d5
[<c0122357>] __call_console_drivers+0x5b/0x5d
[<c0122449>] call_console_drivers+0x69/0x11f
[<c0223ffb>] rwsem_down_read_failed+0xa7/0x15a
[<c012513f>] .text.lock.exit+0xeb/0x18c
[<c010be11>] do_divide_error+0x0/0xfb
[<c011a06f>] do_page_fault+0x1f8/0x561
[<c0138b93>] find_get_page+0x3d/0x7a
[<c0139db6>] filemap_nopage+0x287/0x378
[<c013b166>] generic_file_aio_write+0x78/0xa2
[<c0119e77>] do_page_fault+0x0/0x561
[<c010b7a9>] error_code+0x2d/0x38
[<c0147af4>] remap_page_range+0x193/0x26c
[<c0247055>] mmap_mem+0x71/0xd4
[<c014a7fd>] do_mmap_pgoff+0x362/0x70d
[<c0156f65>] filp_open+0x67/0x69
[<c0111c4d>] sys_mmap2+0x7a/0xaa
[<c010aced>] sysenter_past_esp+0x52/0x71
Red Hat Enterprise Linux AS release 3 (Taroon Update 1)
Kernel 2.6.2 on an i686
My X version is XFree86-4.3.0-44.EL
Also, if I do anything proc-related on the pid (ps, top, ...), I hang the login
session (strace shows that I never return from a read on what I suppose is the X pid).
Any thoughts, comments or suggestions are welcome.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-04 23:17 [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd) Martin J. Bligh
@ 2004-02-04 23:58 ` Linus Torvalds
2004-02-05 0:12 ` Martin J. Bligh
0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2004-02-04 23:58 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, linux-mm mailing list, kmannth

On Wed, 4 Feb 2004, Martin J. Bligh wrote:
>
> So there have been a lot of X issues with Red Hat and 2.6 kernels. I managed to
> get the system to panic and I decided it was time to open this bug. I got this
> on boot up.

Hmm. Compiler? Why would AS-3 in particular have problems?

> Unable to handle kernel paging request at virtual address 0264d000
> printing eip:
> c0147af4
> *pde = 00000000
> Oops: 0000 [#1]
> CPU: 7
> EIP: 0060:[<c0147af4>] Not tainted
> EFLAGS: 00013206
> EIP is at remap_page_range+0x193/0x26c
> eax: 0264d000 ebx: 000f5200 ecx: 00000001 edx: dad0fa80
> esi: 001fe000 edi: d87c9ff0 ebp: f5200000 esp: d8835ee4
> ds: 007b es: 007b ss: 0068
> Process X (pid: 1285, threadinfo=d8834000 task=d9474ce0)
> Stack: d961d580 001ff000 001ff000 40000000 f5002000 001fe000 d9578000 d961d580
> 401ff000 d9576508 00000000 f5200000 d961d580 00000001 c0247055 d87d62c0
> 401fe000 b5002000 00001000 00000027 d9388e80 00001000 c014a7fd d9388e80
> Call Trace:
> [<c0247055>] mmap_mem+0x71/0xd4
> [<c014a7fd>] do_mmap_pgoff+0x362/0x70d
> [<c0156f65>] filp_open+0x67/0x69
> [<c0111c4d>] sys_mmap2+0x7a/0xaa
> [<c010aced>] sysenter_past_esp+0x52/0x71
>
> Code: 8b 00 a9 00 08 00 00 74 10 89 d8 8b 54 24 4c c1 e8 14 09 ea

This _seems_ to be the code

        ...
        if (!pfn_valid(pfn) || PageReserved(pfn_to_page(pfn)))
                set_pte(pte, pfn_pte(pfn, prot));
        ...

in particular, it disassembles to

        0x8048490 <insn>:     mov    (%eax),%eax
        0x8048492 <insn+2>:   test   $0x800,%eax
        0x8048497 <insn+7>:   je     0x80484a9
        0x8048499 <insn+9>:   mov    %ebx,%eax
        0x804849b <insn+11>:  mov    0x4c(%esp,1),%edx
        0x804849f <insn+15>:  shr    $0x14,%eax

which seems to be the "PageReserved(pfn_to_page(pfn))" test.

This implies that you have either:
 - a buggy "pfn_valid()" macro (do you use CONFIG_DISCONTIGMEM?)
 - or a buggy compiler (it sure ain't the compiler I use, since that one
   will generate a "testb $8,%ah" instead)

It might help if you disassembled the code (in your kernel) around that
point, since that might give a clue about it.

Quite honestly, to me it looks like the address being remapped is likely
in %ebp (0xf5200000), and that pfn is in %ebx (0x000f5200), and that your
pfn_valid() is buggered, causing a totally bogus "struct page *" from
"pfn_to_page()".

                Linus
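
The register reading in the message above can be checked with a little arithmetic. The following is a user-space sketch, not kernel code. It assumes 4 KiB pages (PAGE_SHIFT of 12) and assumes the reserved flag sits at bit 11, which is what the "test $0x800,%eax" suggests. The helper names phys_to_pfn, old_pfn_valid and check_oops_values are hypothetical, and the 64 GiB page count is only an approximation of num_physpages on the reported machine.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT        12                    /* assumption: 4 KiB pages */
#define PG_RESERVED_MASK  (UINT32_C(1) << 11)   /* assumption: matches test $0x800 */

/* 64 GiB / 4096 bytes per page; stands in for num_physpages here. */
#define PAGES_IN_64GB     UINT32_C(16777216)

/* Physical address -> page frame number. */
static inline uint32_t phys_to_pfn(uint32_t phys)
{
        return phys >> PAGE_SHIFT;
}

/* Shape of the bound-only check: nothing but an upper limit. */
static inline int old_pfn_valid(uint32_t pfn, uint32_t num_physpages)
{
        return pfn < num_physpages;
}

/* Sanity checks using the register values from the oops. */
static inline void check_oops_values(void)
{
        /* %ebp holds the address, %ebx the pfn derived from it. */
        assert(phys_to_pfn(UINT32_C(0xf5200000)) == UINT32_C(0x000f5200));
        /* test $0x800 on a 32-bit word is a test of bit 11. */
        assert(PG_RESERVED_MASK == UINT32_C(0x800));
        /* The pfn is far below the limit, so the bound check accepts it. */
        assert(old_pfn_valid(UINT32_C(0x000f5200), PAGES_IN_64GB));
}
```

Under these assumptions the pfn passes a pure upper-bound check whether or not a struct page actually backs it, which is consistent with the bogus pointer Linus suspects.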

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-04 23:58 ` Linus Torvalds
@ 2004-02-05 0:12 ` Martin J. Bligh
2004-02-05 0:36 ` Martin J. Bligh
0 siblings, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-05 0:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, linux-mm mailing list, kmannth

>> So there have been a lot of X issues with Red Hat and 2.6 kernels. I managed to
>> get the system to panic and I decided it was time to open this bug. I got this
>> on boot up.
>
> Hmm. Compiler? Why would AS-3 in particular have problems?

I think it's more likely the combination of NUMA and X. People hardly
ever run X on the big servers ... Keith is just odd ;-)

>> Unable to handle kernel paging request at virtual address 0264d000
>> printing eip:
>> c0147af4
>> *pde = 00000000
>> Oops: 0000 [#1]
>> CPU: 7
>> EIP: 0060:[<c0147af4>] Not tainted
>> EFLAGS: 00013206
>> EIP is at remap_page_range+0x193/0x26c
>> eax: 0264d000 ebx: 000f5200 ecx: 00000001 edx: dad0fa80
>> esi: 001fe000 edi: d87c9ff0 ebp: f5200000 esp: d8835ee4
>> ds: 007b es: 007b ss: 0068
>> Process X (pid: 1285, threadinfo=d8834000 task=d9474ce0)
>> Stack: d961d580 001ff000 001ff000 40000000 f5002000 001fe000 d9578000 d961d580
>> 401ff000 d9576508 00000000 f5200000 d961d580 00000001 c0247055 d87d62c0
>> 401fe000 b5002000 00001000 00000027 d9388e80 00001000 c014a7fd d9388e80
>> Call Trace:
>> [<c0247055>] mmap_mem+0x71/0xd4
>> [<c014a7fd>] do_mmap_pgoff+0x362/0x70d
>> [<c0156f65>] filp_open+0x67/0x69
>> [<c0111c4d>] sys_mmap2+0x7a/0xaa
>> [<c010aced>] sysenter_past_esp+0x52/0x71
>>
>> Code: 8b 00 a9 00 08 00 00 74 10 89 d8 8b 54 24 4c c1 e8 14 09 ea
>
> This _seems_ to be the code
>
>         ...
>         if (!pfn_valid(pfn) || PageReserved(pfn_to_page(pfn)))
>                 set_pte(pte, pfn_pte(pfn, prot));
>         ...
>
> in particular, it disassembles to
>
>         0x8048490 <insn>:     mov    (%eax),%eax
>         0x8048492 <insn+2>:   test   $0x800,%eax
>         0x8048497 <insn+7>:   je     0x80484a9
>         0x8048499 <insn+9>:   mov    %ebx,%eax
>         0x804849b <insn+11>:  mov    0x4c(%esp,1),%edx
>         0x804849f <insn+15>:  shr    $0x14,%eax
>
> which seems to be the "PageReserved(pfn_to_page(pfn))" test.
>
> This implies that you have either:
>  - a buggy "pfn_valid()" macro (do you use CONFIG_DISCONTIGMEM?)

Yup.

#define pfn_valid(pfn) ((pfn) < num_physpages)

Which is wrong. There's even a comment above it that says:

/*
 * pfn_valid should be made as fast as possible, and the current definition
 * is valid for machines that are NUMA, but still contiguous, which is what
 * is currently supported. A more generalised, but slower definition would
 * be something like this - mbligh:
 * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
 */

;-)

Which I still don't think is correct, as there's a hole in the middle of
node 0 ... I'll make a new patch up somehow and give to Keith to test ;-)

Thanks,

M.

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 0:12 ` Martin J. Bligh
@ 2004-02-05 0:36 ` Martin J. Bligh
2004-02-05 0:43 ` Linus Torvalds
0 siblings, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-05 0:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-mm mailing list, kmannth, Andrew Morton

>> which seems to be the "PageReserved(pfn_to_page(pfn))" test.
>>
>> This implies that you have either:
>>  - a buggy "pfn_valid()" macro (do you use CONFIG_DISCONTIGMEM?)
>
> Yup.
>
> #define pfn_valid(pfn) ((pfn) < num_physpages)
>
> Which is wrong. There's even a comment above it that says:
>
> /*
>  * pfn_valid should be made as fast as possible, and the current definition
>  * is valid for machines that are NUMA, but still contiguous, which is what
>  * is currently supported. A more generalised, but slower definition would
>  * be something like this - mbligh:
>  * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
>  */
>
> ;-)
>
> Which I still don't think is correct, as there's a hole in the middle of
> node 0 ... I'll make a new patch up somehow and give to Keith to test ;-)

Oh hell ... I remember what's wrong with this whole bit. pfn_valid is
used inconsistently in different places, IIRC. Linus / Andrew ... what
do you actually want it to mean? Some things seem to use it to say
"the memory here is valid accessible RAM", some things "there is a
valid struct page for this pfn". I was aiming for the latter, but a
few other arches seemed to disagree.

Could I get a ruling on this? ;-)

Thanks,

M.
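
Martin's worry about a hole in the middle of node 0 can be illustrated with a toy model. This is a user-space sketch under invented assumptions: the toy_ranges layout, the struct pfn_range type and all function names are hypothetical and do not describe the x445 or any kernel data structure. It compares three shapes of check: the bound-only macro, a "below the end of the owning node" test like the one in the quoted comment, and a test against each populated span.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one populated span of page frames. */
struct pfn_range {
        unsigned long start_pfn;        /* first pfn in the span */
        unsigned long end_pfn;          /* one past the last pfn */
        int nid;                        /* owning node */
};

/* Invented layout: node 0 has a hole between its two spans. */
static const struct pfn_range toy_ranges[] = {
        { 0x000000UL, 0x0e0000UL, 0 },
        { 0x100000UL, 0x400000UL, 0 },
        { 0x400000UL, 0x800000UL, 1 },
};
#define TOY_NR_RANGES (sizeof(toy_ranges) / sizeof(toy_ranges[0]))

/* Shape of the original macro: an upper bound only. */
static inline int valid_by_bound(unsigned long pfn, unsigned long num_physpages)
{
        return pfn < num_physpages;
}

/* Highest end_pfn among the spans of one node, or 0 if the node has none. */
static inline unsigned long toy_node_end_pfn(int nid)
{
        unsigned long end = 0;
        size_t i;

        for (i = 0; i < TOY_NR_RANGES; i++)
                if (toy_ranges[i].nid == nid && toy_ranges[i].end_pfn > end)
                        end = toy_ranges[i].end_pfn;
        return end;
}

/* Shape of the node-end test: still blind to holes inside a node. */
static inline int valid_by_node_end(unsigned long pfn, int nid)
{
        return pfn < toy_node_end_pfn(nid);
}

/* Hole-aware test: the pfn must fall inside some populated span. */
static inline int valid_by_range(unsigned long pfn)
{
        size_t i;

        for (i = 0; i < TOY_NR_RANGES; i++)
                if (pfn >= toy_ranges[i].start_pfn && pfn < toy_ranges[i].end_pfn)
                        return 1;
        return 0;
}

/* A pfn inside the invented hole passes the first two tests only. */
static inline void check_hole(void)
{
        assert(valid_by_bound(0xf5200UL, 0x800000UL));
        assert(valid_by_node_end(0xf5200UL, 0));
        assert(!valid_by_range(0xf5200UL));
}
```

The per-span loop is the slow, obviously correct form. It is here only to make the difference visible, not as a suggestion for a hot path.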

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 0:36 ` Martin J. Bligh
@ 2004-02-05 0:43 ` Linus Torvalds
2004-02-05 0:56 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2004-02-05 0:43 UTC (permalink / raw)
To: Martin J. Bligh
Cc: linux-kernel, linux-mm mailing list, kmannth, Andrew Morton

On Wed, 4 Feb 2004, Martin J. Bligh wrote:
>
> Oh hell ... I remember what's wrong with this whole bit. pfn_valid is
> used inconsistently in different places, IIRC. Linus / Andrew ... what
> do you actually want it to mean? Some things seem to use it to say
> "the memory here is valid accessible RAM", some things "there is a
> valid struct page for this pfn". I was aiming for the latter, but a
> few other arches seemed to disagree.
>
> Could I get a ruling on this? ;-)

It _definitely_ means "there is a valid 'struct page' for this pfn".

To test for "there is RAM" here, you need to first check that the pfn is
valid, and then you can check what the page type is (usually that would be
PageReserved(), but it could be a highmem check or something like that
too).

                Linus
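
The ordering in that ruling (first "is there a struct page", only then "what kind of page is it") can be sketched in user space. Everything here is a hypothetical stand-in: struct toy_page, toy_mem_map, TOY_PG_RESERVED and classify_pfn are invented for illustration, and the four-entry table is not a model of any real memory map.

```c
#include <assert.h>

#define TOY_PG_RESERVED (1UL << 11)     /* assumption: reserved flag at bit 11 */
#define TOY_NR_PAGES    4

/* Minimal stand-in for struct page: just a flags word. */
struct toy_page {
        unsigned long flags;
};

/* Invented memory map: only entry 1 is marked reserved. */
static const struct toy_page toy_mem_map[TOY_NR_PAGES] = {
        { 0 }, { TOY_PG_RESERVED }, { 0 }, { 0 },
};

enum pfn_kind {
        PFN_NO_STRUCT_PAGE,     /* nothing to look at, do not dereference */
        PFN_RESERVED,           /* struct page exists and is reserved */
        PFN_ORDINARY_RAM        /* struct page exists, ordinary page */
};

/* "There is a valid struct page for this pfn", in the toy model. */
static inline int toy_pfn_valid(unsigned long pfn)
{
        return pfn < TOY_NR_PAGES;
}

/* Validity first; the page flags are only read once that has passed. */
static inline enum pfn_kind classify_pfn(unsigned long pfn)
{
        if (!toy_pfn_valid(pfn))
                return PFN_NO_STRUCT_PAGE;
        if (toy_mem_map[pfn].flags & TOY_PG_RESERVED)
                return PFN_RESERVED;
        return PFN_ORDINARY_RAM;
}

static inline void check_classification(void)
{
        assert(classify_pfn(0) == PFN_ORDINARY_RAM);
        assert(classify_pfn(1) == PFN_RESERVED);
        assert(classify_pfn(TOY_NR_PAGES) == PFN_NO_STRUCT_PAGE);
}
```

If the validity test wrongly says yes, the second step reads flags from memory that is not a struct page at all, which is the failure mode discussed in this thread.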

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 0:43 ` Linus Torvalds
@ 2004-02-05 0:56 ` Andrew Morton
2004-02-05 1:29 ` Linus Torvalds
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2004-02-05 0:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mbligh, linux-kernel, linux-mm, kmannth

Linus Torvalds <torvalds@osdl.org> wrote:
>
>
>
> On Wed, 4 Feb 2004, Martin J. Bligh wrote:
> >
> > Oh hell ... I remember what's wrong with this whole bit. pfn_valid is
> > used inconsistently in different places, IIRC. Linus / Andrew ... what
> > do you actually want it to mean? Some things seem to use it to say
> > "the memory here is valid accessible RAM", some things "there is a
> > valid struct page for this pfn". I was aiming for the latter, but a
> > few other arches seemed to disagree.
> >
> > Could I get a ruling on this? ;-)
>
> It _definitely_ means "there is a valid 'struct page' for this pfn".
>
> To test for "there is RAM" here, you need to first check that the pfn is
> valid, and then you can check what the page type is (usually that would be
> PageReserved(), but it could be a highmem check or something like that
> too).

pfn_valid() could become quite expensive indeed, and it lies on super-duper
hotpaths.

An alternative which is less conceptually clean but should work in this
case is to mark all vma's which were created by /dev/mem mappings as VM_IO,
and test that in remap_page_range().

The marking of mmap_mem() vma's as VM_IO has been in -mm for four months.
But I didn't changelog it at the time and I've forgotten why I wrote it
(really). It's something to do with get_user_pages() against a mapping of
/dev/mem :(

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.2-rc3/2.6.2-rc3-mm1/broken-out/get_user_pages-handle-VM_IO.patch

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 0:56 ` Andrew Morton
@ 2004-02-05 1:29 ` Linus Torvalds
2004-02-05 1:56 ` Keith Mannthey
0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2004-02-05 1:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: mbligh, linux-kernel, linux-mm, kmannth

On Wed, 4 Feb 2004, Andrew Morton wrote:
>
> pfn_valid() could become quite expensive indeed, and it lies on super-duper
> hotpaths.

Yes. However, sometimes it is the only choice.

So it does need to be fixed, and if it ends up being a noticeable
performance problem, then we can look at the hot-paths one by one and see
if we can avoid using it. We probably can, most of the time.

> An alternative which is less conceptually clean but should work in this
> case is to mark all vma's which were created by /dev/mem mappings as VM_IO,
> and test that in remap_page_range().

Hmm.. Grepping for "pfn_valid()", I'm starting to suspect that yes, with a
VM_IO approach and a fixed virt_addr_valid(), there really aren't any
other uses.

(virt_addr_valid() is useful for debugging and for validation of untrusted
pointers, but pfn_valid() just isn't very good for it. Never really was:
it started out as an ugly hack, and it never got cleaned up. It should be
easily fixable with something _proper_).

                Linus

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 1:29 ` Linus Torvalds
@ 2004-02-05 1:56 ` Keith Mannthey
2004-02-05 2:04 ` Linus Torvalds
0 siblings, 1 reply; 19+ messages in thread
From: Keith Mannthey @ 2004-02-05 1:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

On Wed, 2004-02-04 at 17:29, Linus Torvalds wrote:
>
> So it does need to be fixed, and if it ends up being a noticeable
> performance problem, then we can look at the hot-paths one by one and see
> if we can avoid using it. We probably can, most of the time.
>

Martin sent me a patch that fixed the X panics (NUMA and DISCONTIG
enabled). (Thanks Martin!) I don't have the same X panics and issues I
had before. I don't know if this will work for the generic case. It
compiles with a simple memory situation just fine but I didn't boot it.

diff -purN -X /home/mbligh/.diff.exclude virgin/include/asm-i386/mmzone.h pfn_valid/include/asm-i386/mmzone.h
--- virgin/include/asm-i386/mmzone.h 2003-10-01 11:48:22.000000000 -0700
+++ pfn_valid/include/asm-i386/mmzone.h 2004-02-04 16:39:12.000000000 -0800
@@ -84,14 +84,8 @@ extern struct pglist_data *node_data[];
         + __zone->zone_start_pfn; \
 })
 #define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
-/*
- * pfn_valid should be made as fast as possible, and the current definition
- * is valid for machines that are NUMA, but still contiguous, which is what
- * is currently supported. A more generalised, but slower definition would
- * be something like this - mbligh:
- * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
- */
-#define pfn_valid(pfn) ((pfn) < num_physpages)
+
+#define pfn_valid(pfn) ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
 
 /*
  * generic node memory support, the following assumptions apply:

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 1:56 ` Keith Mannthey
@ 2004-02-05 2:04 ` Linus Torvalds
2004-02-05 2:33 ` Keith Mannthey
2004-02-06 7:17 ` Martin J. Bligh
0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2004-02-05 2:04 UTC (permalink / raw)
To: Keith Mannthey; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

On Wed, 4 Feb 2004, Keith Mannthey wrote:
>
> Martin sent me a patch that fixed the X panics (NUMA and DISCONTIG
> enabled). (Thanks Martin!) I don't have the same X panics and issues I
> had before. I don't know if this will work for the generic case. It
> compiles with a simple memory situation just fine but I didn't boot it.

Looks ok, but the thing should be made a function (possibly inline,
depending on how big the code generated ends up being). As it is, it now
uses its arguments several times, and while I don't see anything where
that could screw up, it's just a tad scary.

Also, related to this whole mess, what the _heck_ is this in mm/rmap.c:

        if (!pfn_valid(page_to_pfn(page)) || PageReserved(page))
                return pte_chain;

that "pfn_valid(page_to_pfn(page))" just looks totally nonsensical. Can
somebody really pass in random page pointers to this thing, and if so, are
they guaranteed to be "not-random enough" to not cause bogus behaviour
when the "page_to_pfn()" happens to be valid..

If VM_IO gets rid of this, then we should immediately apply the patch.

                Linus
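
The "uses its arguments several times" concern is the usual function-like macro hazard: an argument with a side effect is evaluated once per appearance in the expansion. A minimal user-space sketch follows. All names (TOY_PFN_VALID_MACRO, toy_pfn_valid, next_pfn and the two predicate helpers) are hypothetical, and the toy macro mentions its argument only twice, with a sequence point between the two uses, so that the demonstration stays well defined.

```c
#include <assert.h>

#define TOY_END_PFN 200UL       /* invented limit for the toy checks */

static inline int toy_node_known(unsigned long pfn)
{
        return pfn < TOY_END_PFN;
}

static inline int toy_below_end(unsigned long pfn)
{
        return pfn < TOY_END_PFN;
}

/* Macro form: the argument expression is expanded, and evaluated, twice. */
#define TOY_PFN_VALID_MACRO(pfn) (toy_node_known(pfn) && toy_below_end(pfn))

/* Function form: the argument is evaluated once, at the call. */
static inline int toy_pfn_valid(unsigned long pfn)
{
        return toy_node_known(pfn) && toy_below_end(pfn);
}

/* An argument with a side effect: returns the cursor, then advances it. */
static inline unsigned long next_pfn(unsigned long *cursor)
{
        return (*cursor)++;
}

static inline void check_evaluation_counts(void)
{
        unsigned long cursor = 10;
        int ok;

        ok = TOY_PFN_VALID_MACRO(next_pfn(&cursor));
        assert(ok);
        assert(cursor == 12);   /* advanced twice: two different pfns were tested */

        cursor = 10;
        ok = toy_pfn_valid(next_pfn(&cursor));
        assert(ok);
        assert(cursor == 11);   /* advanced once */
}
```

With the macro, the two halves of the check even look at different pfns, which is the kind of surprise an inline function rules out.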

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 2:04 ` Linus Torvalds
@ 2004-02-05 2:33 ` Keith Mannthey
2004-02-05 2:47 ` Linus Torvalds
2004-02-06 7:17 ` Martin J. Bligh
1 sibling, 1 reply; 19+ messages in thread
From: Keith Mannthey @ 2004-02-05 2:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

On Wed, 2004-02-04 at 18:04, Linus Torvalds wrote:

> If VM_IO gets rid of this, then we should immediately apply the patch.

I tried Andrew's VM_IO patch earlier today but it didn't fix the
problem.

Keith

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 2:33 ` Keith Mannthey
@ 2004-02-05 2:47 ` Linus Torvalds
0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2004-02-05 2:47 UTC (permalink / raw)
To: Keith Mannthey; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

On Wed, 4 Feb 2004, Keith Mannthey wrote:
>
> I tried Andrew's VM_IO patch earlier today but it didn't fix the
> problem.

Yeah, that patch is not actually converting the pfn_valid() users to only
trust VM_IO, it only does a few special cases (notably the follow_pages()
thing, which wasn't the issue here).

So the patch would have to be expanded to cover _all_ of the page table
following functions. It probably isn't that much, just looking for code
that checks for PageReserved() will pinpoint the needed users pretty well.

So I think the VM_IO approach could fix this, but it would need to be
fleshed out more. In the meantime, fixing pfn_valid() is definitely the
right thing to do.

                Linus

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-05 2:04 ` Linus Torvalds
2004-02-05 2:33 ` Keith Mannthey
@ 2004-02-06 7:17 ` Martin J. Bligh
2004-02-06 7:19 ` Martin J. Bligh
2004-02-06 9:57 ` Dave Hansen
1 sibling, 2 replies; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-06 7:17 UTC (permalink / raw)
To: Linus Torvalds, Keith Mannthey; +Cc: Andrew Morton, linux-kernel, linux-mm

>> Martin sent me a patch that fixed the X panics (NUMA and DISCONTIG
>> enabled). (Thanks Martin!) I don't have the same X panics and issues I
>> had before. I don't know if this will work for the generic case. It
>> compiles with a simple memory situation just fine but I didn't boot it.
>
> Looks ok, but the thing should be made a function (possibly inline,
> depending on how big the code generated ends up being). As it is, it now
> uses its arguments several times, and while I don't see anything where
> that could screw up, it's just a tad scary.

Yup, sorry about that. Unfortunately fixing that gets into a small problem
with the definition of pfn_to_nid. I've had a small patch pending for ages
to clean up that mess anyway, so now is probably the right time to push it.
pfn_to_nid patch follows, and I'll send the (rejigged) original patch in
a follow-up email. Andrew - I'm pretty sure this works fine but could you
possibly test it in -mm for a bit?

Thanks,

M.

--------------------------------

Makes sure pfn_to_nid is defined for all combinations of subarches,
and that it's defined before it's used so we don't run into implicit
declaration problems.

diff -aurpN -X /home/fletch/.diff.exclude virgin/include/asm-i386/mmzone.h pfn_to_nid/include/asm-i386/mmzone.h
--- virgin/include/asm-i386/mmzone.h Mon Nov 17 18:28:57 2003
+++ pfn_to_nid/include/asm-i386/mmzone.h Thu Feb 5 20:58:00 2004
@@ -10,7 +10,49 @@
 
 #ifdef CONFIG_DISCONTIGMEM
 
+#ifdef CONFIG_NUMA
+ #ifdef CONFIG_X86_NUMAQ
+ #include <asm/numaq.h>
+ #else /* summit or generic arch */
+ #include <asm/srat.h>
+ #endif
+#else /* !CONFIG_NUMA */
+ #define get_memcfg_numa get_memcfg_numa_flat
+ #define get_zholes_size(n) (0)
+#endif /* CONFIG_NUMA */
+
 extern struct pglist_data *node_data[];
+#define NODE_DATA(nid) (node_data[nid])
+
+/*
+ * generic node memory support, the following assumptions apply:
+ *
+ * 1) memory comes in 256Mb contigious chunks which are either present or not
+ * 2) we will not have more than 64Gb in total
+ *
+ * for now assume that 64Gb is max amount of RAM for whole system
+ * 64Gb / 4096bytes/page = 16777216 pages
+ */
+#define MAX_NR_PAGES 16777216
+#define MAX_ELEMENTS 256
+#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
+
+extern u8 physnode_map[];
+
+static inline int pfn_to_nid(unsigned long pfn)
+{
+#ifdef CONFIG_NUMA
+        return(physnode_map[(pfn) / PAGES_PER_ELEMENT]);
+#else
+        return 0;
+#endif
+}
+
+static inline struct pglist_data *pfn_to_pgdat(unsigned long pfn)
+{
+        return(NODE_DATA(pfn_to_nid(pfn)));
+}
+
 
 /*
  * Following are macros that are specific to this numa platform.
@@ -43,11 +85,6 @@ extern struct pglist_data *node_data[];
  */
 #define kvaddr_to_nid(kaddr) pfn_to_nid(__pa(kaddr) >> PAGE_SHIFT)
 
-/*
- * Return a pointer to the node data for node n.
- */
-#define NODE_DATA(nid) (node_data[nid])
-
 #define node_mem_map(nid) (NODE_DATA(nid)->node_mem_map)
 #define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
 #define node_end_pfn(nid) \
@@ -92,40 +129,6 @@ extern struct pglist_data *node_data[];
  * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
  */
 #define pfn_valid(pfn) ((pfn) < num_physpages)
-
-/*
- * generic node memory support, the following assumptions apply:
- *
- * 1) memory comes in 256Mb contigious chunks which are either present or not
- * 2) we will not have more than 64Gb in total
- *
- * for now assume that 64Gb is max amount of RAM for whole system
- * 64Gb / 4096bytes/page = 16777216 pages
- */
-#define MAX_NR_PAGES 16777216
-#define MAX_ELEMENTS 256
-#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
-
-extern u8 physnode_map[];
-
-static inline int pfn_to_nid(unsigned long pfn)
-{
-        return(physnode_map[(pfn) / PAGES_PER_ELEMENT]);
-}
-static inline struct pglist_data *pfn_to_pgdat(unsigned long pfn)
-{
-        return(NODE_DATA(pfn_to_nid(pfn)));
-}
-
-#ifdef CONFIG_X86_NUMAQ
-#include <asm/numaq.h>
-#elif CONFIG_ACPI_SRAT
-#include <asm/srat.h>
-#elif CONFIG_X86_PC
-#define get_zholes_size(n) (0)
-#else
-#define pfn_to_nid(pfn) (0)
-#endif /* CONFIG_X86_NUMAQ */
 extern int get_memcfg_numa_flat(void );
 
 /*
diff -aurpN -X /home/fletch/.diff.exclude virgin/include/linux/mmzone.h pfn_to_nid/include/linux/mmzone.h
--- virgin/include/linux/mmzone.h Wed Feb 4 23:03:38 2004
+++ pfn_to_nid/include/linux/mmzone.h Thu Feb 5 21:01:05 2004
@@ -311,6 +311,7 @@ extern struct pglist_data contig_page_da
 #define NODE_DATA(nid) (&contig_page_data)
 #define NODE_MEM_MAP(nid) mem_map
 #define MAX_NODES_SHIFT 1
+#define pfn_to_nid(pfn) (0)
 
 #else /* CONFIG_DISCONTIGMEM */
 
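
The physnode_map lookup in the patch above is a plain table index: with 16777216 pages split into 256 elements, each element covers 65536 pages, which is 256 MiB of 4 KiB pages. The sketch below reproduces that arithmetic in user space. The three constants come from the patch; the table contents (four nodes of 16 GiB each) and the toy_ names are invented for illustration and say nothing about the real x445 layout.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as given in the patch. */
#define MAX_NR_PAGES      16777216UL
#define MAX_ELEMENTS      256
#define PAGES_PER_ELEMENT (MAX_NR_PAGES / MAX_ELEMENTS)

/* Toy stand-in for physnode_map: one node id per 256 MiB element. */
static uint8_t toy_physnode_map[MAX_ELEMENTS];

/* Invented layout: four nodes of 16 GiB each, i.e. 64 elements per node. */
static inline void toy_fill_physnode_map(void)
{
        int i;

        for (i = 0; i < MAX_ELEMENTS; i++)
                toy_physnode_map[i] = (uint8_t)(i / 64);
}

/* Same indexing as pfn_to_nid() in the patch, with a bounds check added. */
static inline int toy_pfn_to_nid(unsigned long pfn)
{
        assert(pfn < MAX_NR_PAGES);     /* the table has no entry beyond 64 GiB */
        return toy_physnode_map[pfn / PAGES_PER_ELEMENT];
}
```

In this model the pfn from the oops, 0xf5200, lands in element 15, which belongs to node 0 under the invented layout. The lookup itself cannot say whether a struct page exists for that pfn; it only names a node.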

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 7:17 ` Martin J. Bligh
@ 2004-02-06 7:19 ` Martin J. Bligh
2004-02-06 9:57 ` Dave Hansen
1 sibling, 0 replies; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-06 7:19 UTC (permalink / raw)
To: Linus Torvalds, Keith Mannthey; +Cc: Andrew Morton, linux-kernel, linux-mm

Fix pfn_valid for architectures with discontiguous memory. This only changes
the NUMA definition, and it leaves the NUMA-Q definition as was, because it's
faster that way, it's in hotpaths, and our memory is always contiguous.

diff -aurpN -X /home/fletch/.diff.exclude pfn_to_nid/include/asm-i386/mmzone.h pfn_valid/include/asm-i386/mmzone.h
--- pfn_to_nid/include/asm-i386/mmzone.h Thu Feb 5 20:58:00 2004
+++ pfn_valid/include/asm-i386/mmzone.h Thu Feb 5 22:08:57 2004
@@ -121,14 +121,19 @@ static inline struct pglist_data *pfn_to
         + __zone->zone_start_pfn; \
 })
 #define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
-/*
- * pfn_valid should be made as fast as possible, and the current definition
- * is valid for machines that are NUMA, but still contiguous, which is what
- * is currently supported. A more generalised, but slower definition would
- * be something like this - mbligh:
- * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
- */
+
+#ifdef CONFIG_X86_NUMAQ /* we have contiguous memory on NUMA-Q */
 #define pfn_valid(pfn) ((pfn) < num_physpages)
+#else
+static inline int pfn_valid(int pfn)
+{
+        int nid = pfn_to_nid(pfn);
+
+        if (nid >= 0)
+                return (pfn < node_end_pfn(nid));
+        return 0;
+}
+#endif
 extern int get_memcfg_numa_flat(void );
 
 /*

* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 7:17 ` Martin J. Bligh
2004-02-06 7:19 ` Martin J. Bligh
@ 2004-02-06 9:57 ` Dave Hansen
2004-02-06 15:49 ` Martin J. Bligh
1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2004-02-06 9:57 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Linus Torvalds, Keith Mannthey, Andrew Morton, linux-kernel, linux-mm

On Thu, 2004-02-05 at 23:17, Martin J. Bligh wrote:
> +#ifdef CONFIG_NUMA
> + #ifdef CONFIG_X86_NUMAQ
> + #include <asm/numaq.h>
> + #else /* summit or generic arch */
> + #include <asm/srat.h>
> + #endif
> +#else /* !CONFIG_NUMA */
> + #define get_memcfg_numa get_memcfg_numa_flat
> + #define get_zholes_size(n) (0)
> +#endif /* CONFIG_NUMA */

We ran into a bug with #ifdefs like this before. It was fixed in some
of the code that you're trying to remove.

It's not safe to assume that NUMA && !NUMAQ means SUMMIT. Remember the
linking errors we got when we turned CONFIG_NUMA on with the regular PC
config? The generic arch wasn't a problem because it sets
CONFIG_X86_SUMMIT and compiles in the summit code, but the regular PC
code doesn't.

Also, I don't think we need the #ifdef CONFIG_NUMA around the whole
block. How about something like this?

#ifdef CONFIG_X86_NUMAQ
#include <asm/numaq.h>
#elif CONFIG_X86_SUMMIT
#include <asm/srat.h>
#else
#define get_memcfg_numa get_memcfg_numa_flat
#define get_zholes_size(n) (0)
#endif /* CONFIG_NUMA */

> +static inline int pfn_to_nid(unsigned long pfn)
> +{
> +#ifdef CONFIG_NUMA
> +        return(physnode_map[(pfn) / PAGES_PER_ELEMENT]);
> +#else
> +        return 0;
> +#endif
> +}

Looks like somebody pasted that in from a macro. "(pfn)" :)

--dave
* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 9:57 ` Dave Hansen
@ 2004-02-06 15:49 ` Martin J. Bligh
2004-02-06 17:22 ` Dave Hansen
0 siblings, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-06 15:49 UTC (permalink / raw)
To: Dave Hansen
Cc: Linus Torvalds, Keith Mannthey, Andrew Morton, linux-kernel, linux-mm

>> +#ifdef CONFIG_NUMA
>> +	#ifdef CONFIG_X86_NUMAQ
>> +		#include <asm/numaq.h>
>> +	#else	/* summit or generic arch */
>> +		#include <asm/srat.h>
>> +	#endif
>> +#else /* !CONFIG_NUMA */
>> +	#define get_memcfg_numa get_memcfg_numa_flat
>> +	#define get_zholes_size(n) (0)
>> +#endif /* CONFIG_NUMA */
>
> We ran into a bug with #ifdefs like this before. It was fixed in some
> of the code that you're trying to remove.

What bug?

> It's not safe to assume that NUMA && !NUMAQ means SUMMIT. Remember the
> linking errors we got when we turned CONFIG_NUMA on with the regular PC
> config? The generic arch wasn't a problem because it sets
> CONFIG_X86_SUMMIT and compiles in the summit code, but the regular PC
> code doesn't.
>
> Also, I don't think we need the #ifdef CONFIG_NUMA around the whole
> block. How about something like this?

If you want to go change it, and test the crap out of it for 3 months on
a variety of platforms, then go for it. What's here works, and is well
tested - I'm sticking with it, unless you can point out a specific case
where it's wrong.

M.
* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 15:49 ` Martin J. Bligh
@ 2004-02-06 17:22 ` Dave Hansen
2004-02-06 19:59 ` Martin J. Bligh
0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2004-02-06 17:22 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Linus Torvalds, Keith Mannthey, Andrew Morton, linux-kernel, linux-mm

On Fri, 2004-02-06 at 07:49, Martin J. Bligh wrote:
> >> +#ifdef CONFIG_NUMA
> >> +	#ifdef CONFIG_X86_NUMAQ
> >> +		#include <asm/numaq.h>
> >> +	#else	/* summit or generic arch */
> >> +		#include <asm/srat.h>
> >> +	#endif
> >> +#else /* !CONFIG_NUMA */
> >> +	#define get_memcfg_numa get_memcfg_numa_flat
> >> +	#define get_zholes_size(n) (0)
> >> +#endif /* CONFIG_NUMA */
> >
> > We ran into a bug with #ifdefs like this before. It was fixed in some
> > of the code that you're trying to remove.
>
> What bug?

With a regular PC config, plus CONFIG_NUMA turned on:

  CC      arch/i386/kernel/process.o
In file included from include/asm/mmzone.h:17,
                 from include/linux/mmzone.h:318,
                 from include/linux/gfp.h:4,
                 from include/linux/slab.h:15,
                 from include/linux/percpu.h:4,
                 from include/linux/sched.h:31,
                 from include/linux/module.h:10,
                 from init/do_mounts.c:1:
include/asm/srat.h:31: #error CONFIG_ACPI_SRAT not defined, and srat.h header has been included
In file included from include/asm/mmzone.h:17,
                 from include/linux/mmzone.h:318,
                 from include/linux/gfp.h:4,
                 from include/linux/slab.h:15,
                 from include/linux/percpu.h:4,
                 from include/linux/rcupdate.h:42,
                 from include/linux/dcache.h:10,
                 from include/linux/fs.h:17,
                 from init/do_mounts_initrd.c:3:

I can post the config if you like. You were the one who made me go fix
it in the first place. That's why I added that #error. :)

--dave
* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 17:22 ` Dave Hansen
@ 2004-02-06 19:59 ` Martin J. Bligh
2004-02-06 20:16 ` Linus Torvalds
0 siblings, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-06 19:59 UTC (permalink / raw)
To: Dave Hansen
Cc: Linus Torvalds, Keith Mannthey, Andrew Morton, linux-kernel, linux-mm, Andi Kleen

--On Friday, February 06, 2004 09:22:49 -0800 Dave Hansen <haveblue@us.ibm.com> wrote:

> On Fri, 2004-02-06 at 07:49, Martin J. Bligh wrote:
>> >> +#ifdef CONFIG_NUMA
>> >> +	#ifdef CONFIG_X86_NUMAQ
>> >> +		#include <asm/numaq.h>
>> >> +	#else	/* summit or generic arch */
>> >> +		#include <asm/srat.h>
>> >> +	#endif
>> >> +#else /* !CONFIG_NUMA */
>> >> +	#define get_memcfg_numa get_memcfg_numa_flat
>> >> +	#define get_zholes_size(n) (0)
>> >> +#endif /* CONFIG_NUMA */
>> >
>> > We ran into a bug with #ifdefs like this before. It was fixed in some
>> > of the code that you're trying to remove.
>>
>> What bug?
>
> With a regular PC config, plus CONFIG_NUMA turned on:

Ah ... that's the problem. That's not a valid config - the correct way to
do that is with generic arch, not the PC one. Somehow we ended up leaving
that as allowable ... I think that was just a communication breakdown
somewhere between you, Andi, and myself (or quite possibly between myself
and myself ;-)).

So ... I still think my original patch is correct (there's some stylistic
stuff we could debate, but it's not a functional problem). Here's an
additional patch that stops people from turning on NUMA for the PC
subarch, which it wasn't designed to work with.

Thanks,

M.

-------------------------------------------------------------

Disallow NUMA on the i386 PC subarch (it doesn't work, nor was it
intended to).

diff -purN -X /home/mbligh/.diff.exclude pfn_to_nid/arch/i386/Kconfig pc_numa/arch/i386/Kconfig
--- pfn_to_nid/arch/i386/Kconfig	2004-02-04 16:23:49.000000000 -0800
+++ pc_numa/arch/i386/Kconfig	2004-02-06 11:16:19.000000000 -0800
@@ -701,7 +701,7 @@ config X86_PAE
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation Support"
-	depends on SMP && HIGHMEM64G && (X86_PC || X86_NUMAQ || X86_GENERICARCH || (X86_SUMMIT && ACPI && !ACPI_HT_ONLY))
+	depends on SMP && HIGHMEM64G && (X86_NUMAQ || X86_GENERICARCH || (X86_SUMMIT && ACPI && !ACPI_HT_ONLY))
 	default n if X86_PC
 	default y if (X86_NUMAQ || X86_SUMMIT)
* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 19:59 ` Martin J. Bligh
@ 2004-02-06 20:16 ` Linus Torvalds
2004-02-06 21:18 ` Martin J. Bligh
0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2004-02-06 20:16 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Dave Hansen, Keith Mannthey, Andrew Morton, linux-kernel, linux-mm, Andi Kleen

On Fri, 6 Feb 2004, Martin J. Bligh wrote:
>
> Ah ... that's the problem. That's not a valid config

It really _should_ be a valid config, though. Otherwise, nobody can ever
test it in any reasonable way on a regular PC.

So why not allow a NuMA config for a PC (and it should end up as being
just one node, of course)?

Linus
* Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
2004-02-06 20:16 ` Linus Torvalds
@ 2004-02-06 21:18 ` Martin J. Bligh
0 siblings, 0 replies; 19+ messages in thread
From: Martin J. Bligh @ 2004-02-06 21:18 UTC (permalink / raw)
To: Linus Torvalds
Cc: Dave Hansen, Keith Mannthey, Andrew Morton, linux-kernel, linux-mm, Andi Kleen

> On Fri, 6 Feb 2004, Martin J. Bligh wrote:
>>
>> Ah ... that's the problem. That's not a valid config
>
> It really _should_ be a valid config, though. Otherwise, nobody can ever
> test it in any reasonable way on a regular PC.
>
> So why not allow a NuMA config for a PC (and it should end up as being
> just one node, of course)?

We have that - it's what the generic arch is. It's also good for distros,
as it'll enable them to build one binary kernel and run it on flat SMP
boxes and the Summit/x440 boxes.

If we really want to do good testing, we should make a fake NUMA config
that can run a 4x SMP box as fake NUMA, with half the memory in each
"node" and half the processors ... but I never got around to coding
that ;-)

M.
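The fake NUMA idea in the last paragraph was never coded in this thread, so the following is purely a hypothetical userspace sketch of what "half the memory in each node and half the processors" could look like as data. Every name here (fake_numa_setup, the fake_* arrays, MAX_ELEMENTS, MAX_CPUS) is invented for illustration, and the sketch assumes the total page count is a multiple of pages_per_element.

```c
#define MAX_ELEMENTS 64UL	/* hypothetical number of map slots */
#define MAX_CPUS     32		/* hypothetical CPU limit */

static signed char fake_physnode_map[MAX_ELEMENTS];
static unsigned long fake_node_start_pfn[2];
static unsigned long fake_node_end_pfn[2];
static int fake_cpu_to_node[MAX_CPUS];

/*
 * Split total_pfns pages and ncpus processors evenly across two
 * pretend nodes. The split point is rounded down to a slot boundary
 * so that no map slot straddles both nodes.
 */
static void fake_numa_setup(unsigned long total_pfns,
			    unsigned long pages_per_element, int ncpus)
{
	unsigned long slots = total_pfns / pages_per_element;
	unsigned long half = (slots / 2) * pages_per_element;
	unsigned long i;
	int cpu;

	for (i = 0; i < MAX_ELEMENTS; i++) {
		if (i >= slots)
			fake_physnode_map[i] = -1;	/* no memory here */
		else if (i * pages_per_element < half)
			fake_physnode_map[i] = 0;
		else
			fake_physnode_map[i] = 1;
	}

	fake_node_start_pfn[0] = 0;
	fake_node_end_pfn[0] = half;
	fake_node_start_pfn[1] = half;
	fake_node_end_pfn[1] = total_pfns;

	/* First half of the CPUs go to node 0, the rest to node 1. */
	for (cpu = 0; cpu < ncpus && cpu < MAX_CPUS; cpu++)
		fake_cpu_to_node[cpu] = (cpu < ncpus / 2) ? 0 : 1;
}
```

Calling fake_numa_setup(1024, 64, 4) in this model gives two nodes of 512 pages each, with CPUs 0 and 1 on node 0 and CPUs 2 and 3 on node 1, which matches the 4x SMP example in the message.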
Thread overview: 19+ messages
2004-02-04 23:17 [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd) Martin J. Bligh
2004-02-04 23:58 ` Linus Torvalds
2004-02-05 0:12 ` Martin J. Bligh
2004-02-05 0:36 ` Martin J. Bligh
2004-02-05 0:43 ` Linus Torvalds
2004-02-05 0:56 ` Andrew Morton
2004-02-05 1:29 ` Linus Torvalds
2004-02-05 1:56 ` Keith Mannthey
2004-02-05 2:04 ` Linus Torvalds
2004-02-05 2:33 ` Keith Mannthey
2004-02-05 2:47 ` Linus Torvalds
2004-02-06 7:17 ` Martin J. Bligh
2004-02-06 7:19 ` Martin J. Bligh
2004-02-06 9:57 ` Dave Hansen
2004-02-06 15:49 ` Martin J. Bligh
2004-02-06 17:22 ` Dave Hansen
2004-02-06 19:59 ` Martin J. Bligh
2004-02-06 20:16 ` Linus Torvalds
2004-02-06 21:18 ` Martin J. Bligh