From: Dong Aisheng
Date: Mon, 20 Dec 2021 11:43:48 +0800
Subject: Re: [PATCH 1/2] mm: cma: fix allocation may fail sometimes
To: David Hildenbrand
Cc: Aisheng Dong, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
 "linux-arm-kernel@lists.infradead.org", Jason Liu, Leo Li, Abel Vesa,
 "shawnguo@kernel.org", dl-linux-imx, "akpm@linux-foundation.org",
 "m.szyprowski@samsung.com", "lecopzer.chen@mediatek.com",
 "vbabka@suse.cz", "stable@vger.kernel.org", Shijie Qin
In-Reply-To: <88ce4f53-587b-c18a-9694-a3e173e6e030@redhat.com>

On Fri, Dec 17, 2021 at 8:27 PM David Hildenbrand wrote:
>
> On 17.12.21 04:44, Aisheng Dong wrote:
> >> From: David Hildenbrand
> >> Sent: Thursday, December 16, 2021 6:57 PM
> >>
> >> On 16.12.21 03:54, Aisheng Dong wrote:
> >>>> From: David Hildenbrand
> >>>> Sent: Wednesday, December 15, 2021 8:31 PM
> >>>>
> >>>> On 15.12.21 09:02, Dong Aisheng wrote:
> >>>>> We met dma_alloc_coherent() fail sometimes when doing 8 VPU decoder
> >>>>> test in parallel on a MX6Q SDB board.
> >>>>>
> >>>>> Error log:
> >>>>> cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> >>>>> cma: number of available pages:
> >>>>> 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+
> >>>>> 108@40852+44@41108+20@41196+108@41364+108@41620+108@42900+108@43156+
> >>>>> 483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+
> >>>>> 2304@55040+35@58141+20@58220+20@58284+7188@58348+84@66220+
> >>>>> 7276@66452+227@74525+6371@75549
> >>>>> => 33161 free of 81920 total pages
> >>>>>
> >>>>> When issue happened, we saw there were still 33161 pages (129M) free
> >>>>> CMA memory and a lot available free slots for 148 pages in CMA
> >>>>> bitmap that we want to allocate.
> >>>>>
> >>>>> If dumping memory info, we found that there was also ~342M normal
> >>>>> memory, but only 1352K CMA memory left in buddy system while a lot
> >>>>> of pageblocks were isolated.
> >>>>>
> >>>>> Memory info log:
> >>>>> Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> >>>>> active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> >>>>> unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> >>>>> bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB
> >>>>> lowmem_reserve[]: 0 0 0
> >>>>> Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> >>>>> 36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> >>>>> 8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> >>>>>
> >>>>> The root cause of this issue is that since commit a4efc174b382
> >>>>> ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> >>>>> memory allocation. It's possible that the pageblock process A try to
> >>>>> alloc has already been isolated by the allocation of process B
> >>>>> during memory migration.
> >>>>>
> >>>>> When there're multi process allocating CMA memory in parallel, it's
> >>>>> likely that other the remain pageblocks may have also been isolated,
> >>>>> then CMA alloc fail finally during the first round of scanning of
> >>>>> the whole available CMA bitmap.
> >>>>
> >>>> I already raised in different context that we should most probably
> >>>> convert that -EBUSY to -EAGAIN -- to differentiate an actual
> >>>> migration problem from a simple "concurrent allocations that target the
> >>>> same MAX_ORDER -1 range".
> >>>>
> >>>
> >>> Thanks for the info. Is there a patch under review?
> >>
> >> No, and I was too busy for now to send it out.
> >>
> >>> BTW i wonder that probably makes no much difference for my patch since
> >>> we may prefer retry the next pageblock rather than busy waiting on the
> >>> same isolated pageblock.
> >>
> >> Makes sense.
> >> BUT as of now we isolate not only a pageblock but a
> >> MAX_ORDER -1 page (e.g., 2 pageblocks on x86-64 (!) ). So you'll have the
> >> same issue in that case.
> >
> > Yes, should I change to try next MAX_ORDER_NR_PAGES or keep as it is
> > and let the core to improve it later?
> >
> > I saw there's a patchset under review which is going to remove the
> > MAX_ORDER - 1 alignment requirement for CMA.
> > https://patchwork.kernel.org/project/linux-mm/cover/20211209230414.2766515-1-zi.yan@sent.com/
> >
> > Once it's merged, I guess we can back to align with pageblock rather
> > than MAX_ORDER-1.
>
> While the goal is to get rid of the alignment requirement, we might
> still have to isolate all applicable MAX_ORDER-1 pageblocks. Depends on
> what we can or cannot achieve easily :)
>

OK, got it.
That is a separate issue, and it does not stop us from first fixing the
current kernel problem, where CMA allocation may fail occasionally. As you
pointed out, in the next version I'll change the retries to align with
MAX_ORDER_NR_PAGES.

Do you have any other suggestions for this patchset?

Regards
Aisheng

>
> --
> Thanks,
>
> David / dhildenb
>
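
For illustration only, here is a small userspace model of the retry change
discussed in this thread. It is a sketch under stated assumptions, not the
kernel patch: next_retry_start() and find_range() are hypothetical helpers,
and MAX_ORDER = 11 is assumed, which gives 1024 pages per MAX_ORDER - 1
block (two 512-page pageblocks, as in the x86-64 case mentioned above). The
148-page request and the 81920-page area size come from the error log.

```python
# Userspace model only. MAX_ORDER = 11 is an assumption (typical x86-64
# value at the time); it is not taken from the patch under discussion.
MAX_ORDER = 11
MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1)  # 1024 pages


def next_retry_start(failed_pfn, align=MAX_ORDER_NR_PAGES):
    """Return the first pfn of the next align-sized block after failed_pfn."""
    return (failed_pfn // align + 1) * align


def find_range(busy_blocks, nr_pages, total_pages, align=MAX_ORDER_NR_PAGES):
    """Find a start pfn whose covering blocks are not isolated by others.

    busy_blocks is a set of indices of align-sized blocks that a concurrent
    allocation has already isolated (the -EBUSY case in the thread).
    Returns the start pfn, or None if no candidate range is left.
    """
    start = 0
    while start + nr_pages <= total_pages:
        first = start // align
        last = (start + nr_pages - 1) // align
        if not any(block in busy_blocks for block in range(first, last + 1)):
            return start
        # Skip the whole isolated block instead of retrying inside it.
        start = next_retry_start(start, align)
    return None


if __name__ == "__main__":
    # Blocks 0 and 1 are isolated by another allocation; request 148 pages
    # from an 81920-page area, as in the error log.
    print(find_range({0, 1}, 148, 81920))
```

The point of the model is that a retry which stays inside the same
MAX_ORDER - 1 block would hit the same isolated range again, so the next
attempt starts at the following block boundary.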