Hi Liang,

On Wed, Mar 27, 2019 at 9:52 AM Liang Yang wrote:
>
> Hi Martin,
>
> Thanks a lot.
> On 2019/3/26 2:31, Martin Blumenstingl wrote:
> > Hi Liang,
> >
> > On Mon, Mar 25, 2019 at 11:03 AM Liang Yang wrote:
> >>
> >> Hi Martin,
> >>
> >> On 2019/3/23 5:07, Martin Blumenstingl wrote:
> >>> Hi Matthew,
> >>>
> >>> On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox wrote:
> >>>>
> >>>> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am experiencing the following crash:
> >>>>> ------------[ cut here ]------------
> >>>>> kernel BUG at mm/slub.c:3950!
> >>>>
> >>>>         if (unlikely(!PageSlab(page))) {
> >>>>                 BUG_ON(!PageCompound(page));
> >>>>
> >>>> You called kfree() on the address of a page which wasn't allocated by slab.
> >>>>
> >>>>> I have traced this crash to the kfree() in meson_nfc_read_buf().
> >>>>> my observation is as follows:
> >>>>> - meson_nfc_read_buf() is called 7 times without any crash, the
> >>>>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> >>>>> (physical address)
> >>>>> - the eighth time meson_nfc_read_buf() is called kzalloc() call returns
> >>>>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> >>>>> final kfree() crashes
> >>>>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> >>>>> PAGE_SIZE works around that crash
> >>>>
> >>>> I suspect you're doing something which corrupts memory. Overrunning
> >>>> the end of your allocation or something similar. Have you tried KASAN
> >>>> or even the various slab debugging (eg redzones)?
> >>> KASAN is not available on 32-bit ARM. there was some progress last
> >>> year [0] but it didn't make it into mainline. I tried to make the
> >>> patches apply again and got it to compile (and my kernel is still
> >>> booting) but I have no idea if it's still working.
> >>> for anyone interested, my patches are here: [1] (I consider this a
> >>> HACK because I don't know anything about the code which is being
> >>> touched in the patches, I only made it compile)
> >>>
> >>> SLAB debugging (redzones) was a great hint, thank you very much for
> >>> that Matthew! I enabled:
> >>> CONFIG_SLUB_DEBUG=y
> >>> CONFIG_SLUB_DEBUG_ON=y
> >>> and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
> >>> overwritten" (a larger kernel log extract is attached).
> >>>
> >>> I'm starting to wonder if the NAND controller (hardware) writes more
> >>> than 8 bytes.
> >>> some context: the "info" buffer allocated in meson_nfc_read_buf is
> >>> then passed to the NAND controller IP (after using dma_map_single).
> >>>
> >>> Liang, how does the NAND controller know that it only has to send
> >>> PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> >>> other callers of meson_nfc_dma_buffer_setup (which passes the info
> >>> buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> >>> bytes?
> >>>
> >> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
> >> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
> >> PER_INFO_BYTE(= 8) bytes for each ecc page.
> >> I have never used NFC_CMD_N2M to transfer data before, because it is
> >> very inefficient. And I did an experiment with the attachment and find
> >> on overwritten on my meson axg platform.
> >>
> >> Martin, I would appreciate it very much if you would try the attachment
> >> on your meson m8b platform.
> > thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
> > I took the idea from your patch and adapted it so I could print a
> > buffer with 256 bytes (which seems to be "big enough" for my board).
> it only needs PER_INFO_BYTE (= 8) bytes, because NFC_CMD_N2M doesn't set
> *Pages*, that is not like CMDRWGEN which needs Pages*PER_INFO_BYTE (= 8)
> bytes when setting *Pages* parameter.
> I have been thinking that NFC_CMD_N2M only occupies PER_INFO_BYTE (= 8)
> bytes. And i have tried to not set the info address, the machine would
> crash.
thank you for the explanation. the command is built using:
  cmd = NFC_CMD_N2M | (len & GENMASK(5, 0));

> > see the attached, modified patch
> >
> > in the output I see that sometimes the first 32 bytes are not touched
> > by the controller, but everything beyond 32 bytes is modified in the
> > info buffer.
> >
> it really makes sense that the controller sometimes fills the space
> beyond the first 8 bytes. However i expect the controller should only
> take the first 8 bytes when using NFC_CMD_N2M.
in my tests (see the attached log output) it seems that the info
buffer size has the following constraints:
- use the "len" which is passed to meson_nfc_read_buf
- if "len" is smaller than PER_INFO_BYTE then use PER_INFO_BYTE (= 8)

> > I also tried to increase the buffer size to 512, but that didn't make
> > a difference (I never saw any info buffer modification beyond 256
> > bytes).
> >
> > also I just noticed that I didn't give you much details on my NAND chip yet.
> > from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have
> > eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
> > identical):
> > m8m2_n200_v1#amlnf chipinfo
> > flash info
> > name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
> > pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
> > option:0x8, T_REA:16, T_RHOH:15
> > hw controller info
> > chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
> > ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
> > bch_mode:5, user_mode:2, oobavail:32, oobtail:64384
> >
> I don't think it is caused by a different NAND type, but i have followed
> the same test on my GXL platform. we can see the result from the
> attachment. By the way, i don't find any information about this on meson
> NFC datasheet, so i will ask our VLSI.
> Martin, could you reproduce it with the new patch on meson8b platform? I
> need a clearer, easier-to-compare log like gxl.txt. Thanks.
your gxl.txt is great, finally I can also compare my own results with
something that works for you!
in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
instructions result in a different info buffer output.
does this make any sense to you?


Regards
Martin
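
PS: to make the sizing rule from my tests concrete, here is a minimal
userspace sketch. it is not the driver code: info_buf_len() and
n2m_len_field() are hypothetical helper names, GENMASK32() is only a
stand-in for the kernel's GENMASK(), and the value of NFC_CMD_N2M is left
out on purpose because it is not given in this thread. it only illustrates
the rule I observed and what the (len & GENMASK(5, 0)) part of the command
keeps.

```c
#include <assert.h>
#include <stddef.h>

/* From the thread: one "info" entry is 8 bytes. */
#define PER_INFO_BYTE 8u

/* Userspace stand-in for the kernel's GENMASK(h, l), 32-bit only. */
#define GENMASK32(h, l) ((~0u >> (31 - (h))) & (~0u << (l)))

/*
 * Hypothetical helper: info buffer size according to the observed rule,
 * i.e. use "len", but never less than PER_INFO_BYTE.
 */
static inline size_t info_buf_len(size_t len)
{
	return len < PER_INFO_BYTE ? PER_INFO_BYTE : len;
}

/*
 * Hypothetical helper: the length field that ends up in the command word,
 * i.e. the (len & GENMASK(5, 0)) part. NFC_CMD_N2M itself is not modelled.
 */
static inline unsigned int n2m_len_field(unsigned int len)
{
	unsigned int field = len & GENMASK32(5, 0);

	/* A 6-bit field can never exceed 63. */
	assert(field <= 63);
	return field;
}
```

with this, info_buf_len(1) gives 8 and info_buf_len(256) gives 256, while
n2m_len_field(256) gives 0 because only the low 6 bits of len survive the
mask. I have not checked how the driver handles lengths above 63, so treat
that last point as an observation about the expression only.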