Hi Liang,

On Mon, Mar 25, 2019 at 11:03 AM Liang Yang wrote:
>
> Hi Martin,
>
> On 2019/3/23 5:07, Martin Blumenstingl wrote:
> > Hi Matthew,
> >
> > On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox wrote:
> >>
> >> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> >>> Hello,
> >>>
> >>> I am experiencing the following crash:
> >>>    ------------[ cut here ]------------
> >>>    kernel BUG at mm/slub.c:3950!
> >>
> >>         if (unlikely(!PageSlab(page))) {
> >>                 BUG_ON(!PageCompound(page));
> >>
> >> You called kfree() on the address of a page which wasn't allocated by slab.
> >>
> >>> I have traced this crash to the kfree() in meson_nfc_read_buf().
> >>> my observation is as follows:
> >>> - meson_nfc_read_buf() is called 7 times without any crash, the
> >>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> >>> (physical address)
> >>> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
> >>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> >>> final kfree() crashes
> >>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> >>> PAGE_SIZE works around that crash
> >>
> >> I suspect you're doing something which corrupts memory. Overrunning
> >> the end of your allocation or something similar. Have you tried KASAN
> >> or even the various slab debugging (eg redzones)?
> > KASAN is not available on 32-bit ARM. there was some progress last
> > year [0] but it didn't make it into mainline. I tried to make the
> > patches apply again and got it to compile (and my kernel is still
> > booting) but I have no idea if it's still working. for anyone
> > interested, my patches are here: [1] (I consider this a HACK because I
> > don't know anything about the code which is being touched in the
> > patches, I only made it compile)
> >
> > SLAB debugging (redzones) were a great hint, thank you very much for
> > that Matthew!
> > I enabled:
> >   CONFIG_SLUB_DEBUG=y
> >   CONFIG_SLUB_DEBUG_ON=y
> > and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
> > overwritten" (a larger kernel log extract is attached).
> >
> > I'm starting to wonder if the NAND controller (hardware) writes more
> > than 8 bytes.
> > some context: the "info" buffer allocated in meson_nfc_read_buf is
> > then passed to the NAND controller IP (after using dma_map_single).
> >
> > Liang, how does the NAND controller know that it only has to send
> > PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> > other callers of meson_nfc_dma_buffer_setup (which passes the info
> > buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> > bytes?
> >
> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
> PER_INFO_BYTE(= 8) bytes for each ecc page.
> I have never used NFC_CMD_N2M to transfer data before, because it is
> very low efficient. And I do a experiment with the attachment and find
> on overwritten on my meson axg platform.
>
> Martin, I would appreciate it very much if you would try the attachment
> on your meson m8b platform.
thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not
enough. I took the idea from your patch and adapted it so I could
print a buffer with 256 bytes (which seems to be "big enough" for my
board). see the attached, modified patch.

in the output I see that sometimes the first 32 bytes are not touched
by the controller, but everything beyond 32 bytes is modified in the
info buffer.

I also tried to increase the buffer size to 512, but that didn't make
a difference (I never saw any info buffer modification beyond 256
bytes).

also I just noticed that I didn't give you many details on my NAND
chip yet.
from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have eMMC
flash, but I believe the NAND controller on Meson8 to GXBB is
identical):
  m8m2_n200_v1#amlnf chipinfo
  flash info
  name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
  pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
  option:0x8, T_REA:16, T_RHOH:15
  hw controller info
  chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
  ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
  bch_mode:5, user_mode:2, oobavail:32, oobtail:64384


Regards
Martin