From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.3 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6918C4338F for ; Wed, 25 Aug 2021 11:21:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2D69C6103A for ; Wed, 25 Aug 2021 11:21:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2D69C6103A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 96D866B006C; Wed, 25 Aug 2021 07:21:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F6B26B0071; Wed, 25 Aug 2021 07:21:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 76F6B6B0072; Wed, 25 Aug 2021 07:21:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0063.hostedemail.com [216.40.44.63]) by kanga.kvack.org (Postfix) with ESMTP id 578E36B006C for ; Wed, 25 Aug 2021 07:21:21 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 09F8D181CBDCF for ; Wed, 25 Aug 2021 11:21:21 +0000 (UTC) X-FDA: 78513361962.18.270E96C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf07.hostedemail.com (Postfix) with ESMTP id A36E710000B9 for ; Wed, 25 Aug 2021 11:21:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1629890480; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VJXvNTmBBfDGXjU/7ARTSAdTY6SYM7YFAa0PiR2stTg=; b=EdvfxcLsKbclJ+Z78WyrI5eAAy2bM5CEoTfPKndmAd7poJSNkoA6U0qQm8/JgG7IJQySYr 73QNLR+sIixlzEZnkxCDZhEy0xIf9IHWnAhyDJ9Cp/bodegiwjeCRwEJ7PUff8PxyDHAi0 3fhBW5PUjRTCSIJLXuRZScarvTU1ANk= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-320-UyFgsv2WM5uR1JJZ_mcrUQ-1; Wed, 25 Aug 2021 07:21:18 -0400 X-MC-Unique: UyFgsv2WM5uR1JJZ_mcrUQ-1 Received: by mail-wm1-f72.google.com with SMTP id y188-20020a1c7dc5000000b002e80e0b2f87so2720837wmc.1 for ; Wed, 25 Aug 2021 04:21:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:cc:references:from:organization:subject :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=VJXvNTmBBfDGXjU/7ARTSAdTY6SYM7YFAa0PiR2stTg=; b=R3tqeSWbNgrZJ9M9bCzjke9edNqwuJTyicDLcP5IjbDL4js21/dl57/wPrhDhTGXfh 16IetZ5w4VSBLqV1UEKwAxoUwhtbmXHIqbkID92kt0pJVS81saNsNBlYnVSLSd2apil/ YFlzY25cCKc4RXQGiJ15XW06xh/Wics/QEskNJQA87JcVF4eB86AkHbobwZIPsVLeVtx aYH064hJEzV0XbYIcKuFFT4eMjZhHF9ErXgIIXiqLg9IzkRsm1R6B1If5DBIm3yGc7zX LbmP6JJY8YQBMcS+BD/HmbFCmaX4U9Q27xbdv9/S96wCWRIyc2sOQANDj6c3dw2AGPFN EY4A== X-Gm-Message-State: AOAM532ALhiEscaVqB3OUt1pptwiKOKwD8jDl8u/lycOt2zKGxtPSqiI M+EBXPdwmi3ZX2QiDvrV68N1jJGaT5bE9q0DPhpgTGuabOEq1pCP5427j2BjMbfQDPxTrIMZhQ5 wJZBmXOYPoIM= X-Received: by 2002:a1c:7408:: with SMTP id p8mr8909887wmc.24.1629890477576; Wed, 25 Aug 2021 04:21:17 -0700 (PDT) X-Google-Smtp-Source: ABdhPJymCQUYqkrSezlZuHLWadhqmHeEHEYmq/+7+MpDJj0ehJ32Vl/kM1+t3PD0cJvseIIanu7rDA== X-Received: by 2002:a1c:7408:: with SMTP id p8mr8909856wmc.24.1629890477315; Wed, 25 Aug 2021 04:21:17 -0700 (PDT) Received: from [192.168.3.132] (p4ff23d6b.dip0.t-ipconnect.de. [79.242.61.107]) by smtp.gmail.com with ESMTPSA id z2sm4923306wma.45.2021.08.25.04.21.16 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 25 Aug 2021 04:21:16 -0700 (PDT) To: Robin Murphy , Catalin Marinas Cc: Mike Rapoport , Alex Bee , Will Deacon , Andrew Morton , Anshuman Khandual , Linux Kernel Mailing List , linux-mm@kvack.org, Linux ARM , Christoph Hellwig References: <20210824173741.GC623@arm.com> <0908ce39-7e30-91fa-68ef-11620f9596ae@arm.com> <60a11eba-2910-3b5f-ef96-97d4556c1596@redhat.com> <20210825102044.GA3420@arm.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [BUG 5.14] arm64/mm: dma memory mapping fails (in some cases) Message-ID: Date: Wed, 25 Aug 2021 13:21:16 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=EdvfxcLs; spf=none (imf07.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: A36E710000B9 X-Stat-Signature: jpcto41g8aqgx44e7deobs6b1dok6u9p X-HE-Tag: 1629890480-256075 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 25.08.21 12:58, Robin Murphy wrote: > On 2021-08-25 11:38, David Hildenbrand wrote: >> On 25.08.21 12:20, Catalin Marinas wrote: >>> + hch >>> >>> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote: >>>> On 24.08.21 20:46, Robin Murphy wrote: >>>>> On 2021-08-24 19:28, Mike Rapoport wrote: >>>>>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote: >>>>>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote: >>>>>>>> it seems there is a regression in arm64 memory mapping in 5.14, >>>>>>>> since it >>>>>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with: >>>>>>>> >>>>>>>> =C2=A0 ------------[ cut here ]------------ >>>>>>>> =C2=A0 WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 >>>>>>>> dma_map_resource+0x68/0xc0 >>>>>>>> =C2=A0 Modules linked in: spi_rockchip(+) fuse >>>>>>>> =C2=A0 CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-r= c7 #1 >>>>>>>> =C2=A0 Hardware name: Pine64 Rock64 (DT) >>>>>>>> =C2=A0 pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=3D--) >>>>>>>> =C2=A0 pc : dma_map_resource+0x68/0xc0 >>>>>>>> =C2=A0 lr : pl330_prep_slave_fifo+0x78/0xd0 >>>>>>>> =C2=A0 sp : ffff800012102ae0 >>>>>>>> =C2=A0 x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000= 000000000 >>>>>>>> =C2=A0 x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000= 000000001 >>>>>>>> =C2=A0 x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000= 000000001 >>>>>>>> =C2=A0 x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000= 000000000 >>>>>>>> =C2=A0 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000= 000000000 >>>>>>>> =C2=A0 x14: 0000000000000277 x13: 0000000000000001 x12: 0000000= 000000000 >>>>>>>> =C2=A0 x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800= 012102a80 >>>>>>>> =C2=A0 x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff000= 0fe7b1100 >>>>>>>> =C2=A0 x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000= 000000001 >>>>>>>> =C2=A0 x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000= 000628c00 >>>>>>>> =C2=A0 Call trace: >>>>>>>> =C2=A0=C2=A0=C2=A0 dma_map_resource+0x68/0xc0 >>>>>>>> =C2=A0=C2=A0=C2=A0 pl330_prep_slave_sg+0x58/0x220 >>>>>>>> =C2=A0=C2=A0=C2=A0 rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_roc= kchip] >>>>>>>> =C2=A0=C2=A0=C2=A0 rockchip_spi_transfer_one+0x294/0x3d8 [spi_r= ockchip] >>>>>>> [...] >>>>>>>> Note: This does not relate to the spi driver - when disabling >>>>>>>> this device in >>>>>>>> the device tree it fails for any other (i2s, for instance) which >>>>>>>> uses dma. >>>>>>>> Commenting out the failing check at [1], however, helps and the >>>>>>>> mapping >>>>>>>> works again. >>>>>> >>>>>>> Do you know which address dma_map_resource() is trying to map (ma= ybe >>>>>>> add some printk())? It's not supposed to map RAM, hence the warni= ng. >>>>>>> Random guess, the address is 0xff190800 (based on the x1 above bu= t >>>>>>> the >>>>>>> regs might as well be mangled). >>>>>> >>>>>> 0xff190800 will cause this warning for sure. It has a memory map, >>>>>> but it is >>>>>> not RAM so old version of pfn_valid() would return 0 and the new o= ne >>>>>> returns 1. >>>>> >>>>> How does that happen, though? It's not a memory address, and it's n= ot >>>>> even within the bounds of anywhere there should or could be memory. >>>>> This >>>>> SoC has a simple memory map - everything from 0 to 0xfeffffff goes = to >>>>> the DRAM controller (which may not all be populated, and may have >>>>> pieces >>>>> carved out by secure firmware), while 0xff000000-0xffffffff is MMIO= . >>>>> Why >>>>> do we have pages (or at least the assumption of pages) for somewher= e >>>>> which by all rights should not have them? >>>> >>>> Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) t= o >>>> avoid >>>> any such hacks. If there is a memory hole, it gets a memmap as well. >=20 > Urgh, apologies for being slow. This case isn't a memory hole as such, > but I failed to consider the *ends* of memory not being section-aligned > and leading to an overlap anyway. >=20 >>>> Tricking pfn_valid() into returning "false" where we actually have a >>>> memmap >>>> only makes it look like there is no memmap; but there is one, and >>>> it's PG_reserved. >>> >>> I can see the documentation for pfn_valid() does not claim anything m= ore >>> than the presence of an memmap entry. But I wonder whether the confus= ion >>> is wider-spread than just the DMA code. At a quick grep, try_ram_rema= p() >>> assumes __va() can be used on pfn_valid(), though I suspect it relies= on >>> the calling function to check that the resource was RAM. The arm64 >>> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses >>> standard memcpy on it, which wouldn't work for I/O (should we change >>> this check to pfn_is_map_memory() for arm64?). >>> >> >> kern_addr_valid() checks that there is a direct map entry, and that th= e >> mapped address has a valid mmap. (copied from x86-64) >> >> Would you expect to have a direct map for memory holes and similar (IO= W, >> !System RAM)? >=20 > Probably - we can have no-map regions for firmware-reserved RAM which I > believe end up as PG_reserved if they're small enough, for the same > reasoning as this case. >=20 >>>>>>> Either pfn_valid() gets confused in 5.14 or something is wrong >>>>>>> with the >>>>>>> DT. I have a suspicion it's the former since reverting the above >>>>>>> commit >>>>>>> makes it disappear. >>>>>> >>>>>> I think pfn_valid() actually behaves as expected but the caller is >>>>>> wrong >>>>>> because pfn_valid !=3D RAM (this applies btw to !arm64 as well). >>>>>> >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0/* Don't allow RAM to be mapped */ >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_= addr)))) >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return DMA_MAPPING_ERR= OR; >>>>>> >>>>>> Alex, can you please try this patch: >>>>> >>>>> That will certainly paper over the issue, but it's avoiding the >>>>> question >>>>> of what went wrong with the memory map in the first place. The comm= ent >>>>> is indeed a bit inaccurate, but ultimately dma_map_resource() exist= s >>>>> for >>>>> addresses that would be wrong to pass to dma_map_page(), so I belie= ve >>>>> pfn_valid() is still the correct check. >>>> >>>> If we want to check for RAM, pfn_valid() would be wrong. If we want >>>> to check >>>> for "is there a memmap, for whatever lives or does not live there", >>>> pfn_valid() is the right check. >>> >>> So what should the DMA code use instead? Last time we needed somethin= g >>> similar, the recommendation was to use pfn_to_online_page(). Mike is >>> suggesting memblock_is_memory(). >> >> We use pfn_to_online_page() when we want to know if it's system RAM an= d >> that the memmap actually contains something sane (-> memmap content ha= s >> a well defined state). Sorry that was only partially right: to be more precise=20 pfn_to_online_page() might also succeeds on memory holes and similar=20 (essentially everywhere where we don't have CONFIG_HAVE_ARCH_PFN_VALID),=20 it just means that there is a memmap and that the memmap has a well=20 defined. If it's system RAM, it's online and either getting used or=20 lying around as free in the buddy. For example, pfn_to_online_page() won't succeed on ZONE_DEVICE memory,=20 while pfn_valid() will. >> >> You can have offline memory blocks where pfn_to_online_page() would >> return "false", memblock_is_memory() would return "true". IOW, there i= s >> a memmap, it's System RAM, but the memmap is stale and not trustworthy= . >=20 > That's fine - offline memory is doubly-invalid to map as an MMIO resour= ce :) >=20 >> If you want to make sure no System RAM (online/offline/...) will get >> mapped, memblock_is_memory() should be the right thing to use. I recal= l >> that x86 traverse the resource tree instead to exclude system ram >> regions similarly. >=20 > I'm thinking that "pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))" For offline memory blocks or for ZONE_DEVICE memory where pfn_valid()=20 succeeds, the memmap might be stale and not contain anything reasonable.=20 So the PageReserved() check can be problematic. Also note that PageReserved() has plenty of different semantics (see=20 PG_reserved in include/linux/page-flags.h ) > might be the closest thing to what I'd like to express - does that seem > sensible at all? The one thing I'm not quite sure about is the > interaction with P2P mappings of ZONE_DEVICE, but that's also true of > the previous behaviour, and I'm not aware that the usage model has been > fully nailed down yet, so I suspect we have some wiggle room. At worst, Yes. > false negatives in certain situations wouldn't be the end of the world, > since this is merely a sanity check for something which is expected to > be a no-op the vast majority of the time, so being unobtrusive is more > important than being exhaustive. Right. --=20 Thanks, David / dhildenb