Hi Martin,

Thanks a lot.

On 2019/3/26 2:31, Martin Blumenstingl wrote:
> Hi Liang,
>
> On Mon, Mar 25, 2019 at 11:03 AM Liang Yang wrote:
>>
>> Hi Martin,
>>
>> On 2019/3/23 5:07, Martin Blumenstingl wrote:
>>> Hi Matthew,
>>>
>>> On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox wrote:
>>>>
>>>> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
>>>>> Hello,
>>>>>
>>>>> I am experiencing the following crash:
>>>>> ------------[ cut here ]------------
>>>>> kernel BUG at mm/slub.c:3950!
>>>>
>>>>         if (unlikely(!PageSlab(page))) {
>>>>                 BUG_ON(!PageCompound(page));
>>>>
>>>> You called kfree() on the address of a page which wasn't allocated by slab.
>>>>
>>>>> I have traced this crash to the kfree() in meson_nfc_read_buf().
>>>>> my observation is as follows:
>>>>> - meson_nfc_read_buf() is called 7 times without any crash, the
>>>>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
>>>>> (physical address)
>>>>> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
>>>>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
>>>>> final kfree() crashes
>>>>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
>>>>> PAGE_SIZE works around that crash
>>>>
>>>> I suspect you're doing something which corrupts memory. Overrunning
>>>> the end of your allocation or something similar. Have you tried KASAN
>>>> or even the various slab debugging (eg redzones)?
>>> KASAN is not available on 32-bit ARM. there was some progress last
>>> year [0] but it didn't make it into mainline. I tried to make the
>>> patches apply again and got it to compile (and my kernel is still
>>> booting) but I have no idea if it's still working.
>>> for anyone interested, my patches are here: [1] (I consider this a
>>> HACK because I don't know anything about the code which is being
>>> touched in the patches, I only made it compile)
>>>
>>> SLAB debugging (redzones) were a great hint, thank you very much for
>>> that Matthew! I enabled:
>>> CONFIG_SLUB_DEBUG=y
>>> CONFIG_SLUB_DEBUG_ON=y
>>> and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
>>> overwritten" (a larger kernel log extract is attached).
>>>
>>> I'm starting to wonder if the NAND controller (hardware) writes more
>>> than 8 bytes.
>>> some context: the "info" buffer allocated in meson_nfc_read_buf is
>>> then passed to the NAND controller IP (after using dma_map_single).
>>>
>>> Liang, how does the NAND controller know that it only has to send
>>> PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
>>> other callers of meson_nfc_dma_buffer_setup (which passes the info
>>> buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
>>> bytes?
>>>
>> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
>> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
>> PER_INFO_BYTE(= 8) bytes for each ecc page.
>> I have never used NFC_CMD_N2M to transfer data before, because it is
>> very low efficient. And I do a experiment with the attachment and find
>> on overwritten on my meson axg platform.
>>
>> Martin, I would appreciate it very much if you would try the attachment
>> on your meson m8b platform.
> thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
> I took the idea from your patch and adapted it so I could print a
> buffer with 256 bytes (which seems to be "big enough" for my board).

It should only need PER_INFO_BYTE (= 8) bytes, because NFC_CMD_N2M does
not set *Pages*. That is unlike CMDRWGEN, which needs
Pages * PER_INFO_BYTE (= 8) bytes when the *Pages* parameter is set.
I have been assuming that NFC_CMD_N2M only occupies PER_INFO_BYTE (= 8)
bytes.
I have also tried not setting the info address at all, but then the
machine crashes.

> see the attached, modified patch
>
> in the output I see that sometimes the first 32 bytes are not touched
> by the controller, but everything beyond 32 bytes is modified in the
> info buffer.
>

It really makes sense that the controller sometimes fills the space
beyond the first 8 bytes. However, I expected the controller to use only
the first 8 bytes when using NFC_CMD_N2M.

> I also tried to increase the buffer size to 512, but that didn't make
> a difference (I never saw any info buffer modification beyond 256
> bytes).
>
> also I just noticed that I didn't give you much details on my NAND chip yet.
> from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have
> eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
> identical):
> m8m2_n200_v1#amlnf chipinfo
> flash info
> name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
> pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
> option:0x8, T_REA:16, T_RHOH:15
> hw controller info
> chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
> ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
> bch_mode:5, user_mode:2, oobavail:32, oobtail:64384
>

I don't think it is caused by a different NAND type, but I have run the
same test on my GXL platform; the result is in the attachment.
By the way, I can't find any information about this in the Meson NFC
datasheet, so I will ask our VLSI team.

Martin, could you reproduce it with the new patch on the meson8b
platform? I need a clearer log that is easier to compare, like gxl.txt.

Thanks.

>
> Regards
>
> Martin
>
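
For reference, the poison-and-scan check discussed above can be sketched in plain userspace C. This is only an illustration of the idea, not the debug patch itself and not driver code: POISON, DEBUG_INFO_LEN, touched_extent() and fake_controller_write() are hypothetical names, and the fake write range (bytes 32..199) is an assumption chosen to mimic the kind of pattern reported, not measured data.

```c
#include <stddef.h>
#include <string.h>

#define PER_INFO_BYTE   8     /* info size expected for one NFC_CMD_N2M transfer */
#define DEBUG_INFO_LEN  256   /* oversized debug buffer (hypothetical name) */
#define POISON          0xA5  /* fill pattern written before the transfer (assumption) */

/*
 * Return the offset one past the last byte that no longer holds POISON,
 * i.e. how far into the buffer something has written. Returns 0 if the
 * buffer is untouched. A write that happens to store POISON itself is
 * not detected, so this gives a lower bound.
 */
static size_t touched_extent(const unsigned char *buf, size_t len)
{
    size_t i = len;

    while (i > 0 && buf[i - 1] == POISON)
        i--;
    return i;
}

/* Nonzero if writes went beyond the PER_INFO_BYTE bytes that were expected. */
static int overruns_info(const unsigned char *buf, size_t len)
{
    return touched_extent(buf, len) > PER_INFO_BYTE;
}

/*
 * Hypothetical stand-in for the controller: leaves the first 32 bytes
 * alone and zeroes bytes 32..199. Real hardware behaviour may differ.
 */
static void fake_controller_write(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 32; i < 200 && i < len; i++)
        buf[i] = 0x00;
}
```

To use it, memset() a DEBUG_INFO_LEN buffer to POISON, let the transfer (here fake_controller_write()) run, and then print touched_extent(). Any value above PER_INFO_BYTE means more than the expected 8 bytes were written.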