Tejun Heo wrote:
> Pekka Enberg wrote:
>> Tejun Heo wrote:
>>> Pekka Enberg wrote:
>>>> On Fri, Sep 18, 2009 at 10:34 PM, Mel Gorman wrote:
>>>>> SLQB used a seemingly nice hack to allocate per-node data for the
>>>>> statically initialised caches. Unfortunately, due to some unknown
>>>>> per-cpu optimisation, these regions are being reused by something
>>>>> else as the per-node data is getting randomly scrambled. This patch
>>>>> fixes the problem but it's not fully understood *why* it fixes the
>>>>> problem at the moment.
>>>>>
>>>> Ouch, that sounds bad. I guess it's an architecture-specific bug as
>>>> x86 works ok? Let's CC Tejun.
>>>>
>>> Is the corruption being seen on ppc or s390?
>>>
>> On ppc.
>>
>
> Can you please post full dmesg showing the corruption? Also, if you
> apply the attached patch, does the added BUG_ON() trigger?
>

I applied the three patches from Mel and the one from Tejun. With these
patches applied, the machine boots past the originally reported SLQB
problem, but then hangs just after printing these messages:

<6>ehea: eth0: Physical port up
<7>irq: irq 33539 on host null mapped to virtual irq 259
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
^^^^^^ Hangs at this point.

Tejun, the above hang looks exactly the same as the one I reported here:

http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html

This particular hang was bisected to the following patch:

    powerpc64: convert to dynamic percpu allocator

This hang can be reproduced without SLQB, so I think it is a different
problem.

I have attached the complete dmesg log.

Thanks
-Sachin

-- 
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India