From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 04 Feb 2004 16:12:38 -0800
From: "Martin J. Bligh"
Subject: Re: [Bugme-new] [Bug 2019] New: Bug from the mm subsystem involving X (fwd)
Message-ID: <60330000.1075939958@flay>
In-Reply-To:
References: <51080000.1075936626@flay>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Linus Torvalds
Cc: linux-kernel, linux-mm mailing list, kmannth@us.ibm.com
List-ID:

>> So there have been alot of X issue with Red Hat and 2.6 kernels. I managed to
>> get the system to panic and I decide it was time to open this bug. I got this
>> on boot up.
>
> Hmm. Compiler? Why would AS-3 in particular have problems?

I think it's more likely the combination of NUMA and X. People hardly
ever run X on the big servers ... Keith is just odd ;-)

>> Unable to handle kernel paging request at virtual address 0264d000
>>  printing eip:
>> c0147af4
>> *pde = 00000000
>> Oops: 0000 [#1]
>> CPU:    7
>> EIP:    0060:[]    Not tainted
>> EFLAGS: 00013206
>> EIP is at remap_page_range+0x193/0x26c
>> eax: 0264d000   ebx: 000f5200   ecx: 00000001   edx: dad0fa80
>> esi: 001fe000   edi: d87c9ff0   ebp: f5200000   esp: d8835ee4
>> ds: 007b   es: 007b   ss: 0068
>> Process X (pid: 1285, threadinfo=d8834000 task=d9474ce0)
>> Stack: d961d580 001ff000 001ff000 40000000 f5002000 001fe000 d9578000 d961d580
>>        401ff000 d9576508 00000000 f5200000 d961d580 00000001 c0247055 d87d62c0
>>        401fe000 b5002000 00001000 00000027 d9388e80 00001000 c014a7fd d9388e80
>> Call Trace:
>>  [] mmap_mem+0x71/0xd4
>>  [] do_mmap_pgoff+0x362/0x70d
>>  [] filp_open+0x67/0x69
>>  [] sys_mmap2+0x7a/0xaa
>>  [] sysenter_past_esp+0x52/0x71
>>
>> Code: 8b 00 a9 00 08 00 00 74 10 89 d8 8b 54 24 4c c1 e8 14 09 ea
>
> This _seems_ to be the code
>
>     ...
>     if (!pfn_valid(pfn) || PageReserved(pfn_to_page(pfn)))
>         set_pte(pte, pfn_pte(pfn, prot));
>     ...
>
> in particular, it disassembles to
>
>     0x8048490 :     mov    (%eax),%eax
>     0x8048492 :     test   $0x800,%eax
>     0x8048497 :     je     0x80484a9
>     0x8048499 :     mov    %ebx,%eax
>     0x804849b :     mov    0x4c(%esp,1),%edx
>     0x804849f :     shr    $0x14,%eax
>
> which seems to be the "PageReserved(pfn_to_page(pfn))" test.
>
> This implies that you have either:
>  - a buggy "pfn_valid()" macro (do you use CONFIG_DISCONTIGMEM?)

Yup.

#define pfn_valid(pfn)          ((pfn) < num_physpages)

Which is wrong. There's even a comment above it that says:

/*
 * pfn_valid should be made as fast as possible, and the current definition
 * is valid for machines that are NUMA, but still contiguous, which is what
 * is currently supported. A more generalised, but slower definition would
 * be something like this - mbligh:
 * ( pfn_to_pgdat(pfn) && ((pfn) < node_end_pfn(pfn_to_nid(pfn))) )
 */

;-)

Which I still don't think is correct, as there's a hole in the middle of
node 0 ... I'll make a new patch up somehow and give to Keith to test ;-)

Thanks,

M.
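
To illustrate why a single upper bound can't cope with a hole, here is a
rough userspace sketch. It is not the kernel patch, and nothing in it is
taken from the real tree: the range table, the pfn values and the helper
names pfn_valid_naive() / pfn_valid_ranges() are all made up, and
num_physpages is assumed here to be one past the highest pfn. The idea is
just that a pfn is valid only if it falls inside some run of pages that is
actually present.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one contiguous run of page frames that really exist. */
struct pfn_range {
	unsigned long start_pfn;	/* first present pfn */
	unsigned long end_pfn;		/* one past the last present pfn */
};

/* Made-up layout for illustration: node 0 has a hole in the middle. */
static const struct pfn_range present_ranges[] = {
	{ 0x000000UL, 0x040000UL },	/* node 0, below the hole */
	{ 0x080000UL, 0x0c0000UL },	/* node 0, above the hole */
	{ 0x100000UL, 0x140000UL },	/* node 1 */
};

/* Assumption for this sketch: one past the highest present pfn. */
static const unsigned long num_physpages = 0x140000UL;

/* Same shape as the current macro: only checks the upper bound. */
static bool pfn_valid_naive(unsigned long pfn)
{
	return pfn < num_physpages;
}

/* Slower, but hole-aware: the pfn must lie inside a present range. */
static bool pfn_valid_ranges(unsigned long pfn)
{
	size_t n = sizeof(present_ranges) / sizeof(present_ranges[0]);

	for (size_t i = 0; i < n; i++) {
		if (pfn >= present_ranges[i].start_pfn &&
		    pfn < present_ranges[i].end_pfn)
			return true;
	}
	return false;
}
```

With that table, a pfn such as 0x50000 sits in the node 0 hole: the naive
check accepts it, the range check rejects it. A linear walk like this is
obviously too slow for a real pfn_valid(), it only shows the shape of the
test.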