Tejun Heo wrote:
> Pekka Enberg wrote:
>   
>> Tejun Heo wrote:
>>     
>>> Pekka Enberg wrote:
>>>       
>>>> On Fri, Sep 18, 2009 at 10:34 PM, Mel Gorman <mel@csn.ul.ie> wrote:
>>>>         
>>>>> SLQB used a seemingly nice hack to allocate per-node data for the
>>>>> statically initialised caches. Unfortunately, due to some unknown per-cpu
>>>>> optimisation, these regions are being reused by something else as the
>>>>> per-node data is getting randomly scrambled. This patch fixes the
>>>>> problem but it's not fully understood *why* it fixes the problem at the
>>>>> moment.
>>>>>           
>>>>>           
>>>> Ouch, that sounds bad. I guess it's an architecture-specific bug as x86
>>>> works OK? Let's CC Tejun.
>>>>         
>>> Is the corruption being seen on ppc or s390?
>>>       
>> On ppc.
>>     
>
> Can you please post full dmesg showing the corruption?  Also, if you
> apply the attached patch, does the added BUG_ON() trigger?
>   
I applied the three patches from Mel and the one from Tejun.
With these patches applied, the machine boots past
the originally reported SLQB problem, but then hangs
just after printing these messages:

<6>ehea: eth0: Physical port up
<7>irq: irq 33539 on host null mapped to virtual irq 259
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
^^^^^^ Hangs at this point.

Tejun, the above hang looks exactly the same as the one
I reported here:

http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html

This particular hang was bisected to the following patch:

powerpc64: convert to dynamic percpu allocator

This hang can be recreated without SLQB, so I think this is a different
problem.

I have attached the complete dmesg log here.

Thanks
-Sachin


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------