From: John Stultz
Date: Wed, 25 Jan 2023 12:32:36 -0800
Subject: Re: [PATCH] dma-buf: system_heap: avoid reclaim for order 4
To: jaewon31.kim@samsung.com
Cc: "T.J. Mercier", "sumit.semwal@linaro.org", "daniel.vetter@ffwll.ch",
    "akpm@linux-foundation.org", "hannes@cmpxchg.org", "mhocko@kernel.org",
    "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
    "jaewon31.kim@gmail.com"

On Wed, Jan 25, 2023 at 2:20 AM Jaewon Kim wrote:
> > > On Tue, Jan 17, 2023 at 10:54 PM John Stultz wrote:
> > > >
> > > > On Tue, Jan 17, 2023 at 12:31 AM Jaewon Kim wrote:
> > > > > > Using order 4 pages would be helpful for many IOMMUs, but it could spend
> > > > > > quite much time in page allocation perspective.
> > > > > >
> > > > > > The order 4 allocation with __GFP_RECLAIM may spend much time in
> > > > > > reclaim and compaction logic. __GFP_NORETRY also may affect. These cause
> > > > > > unpredictable delay.
> > > > > >
> > > > > > To get reasonable allocation speed from dma-buf system heap, use
> > > > > > HIGH_ORDER_GFP for order 4 to avoid reclaim.
> > > >
> > > > Thanks for sharing this!
> > > > The case where the allocation gets stuck behind reclaim under pressure
> > > > does sound undesirable, but I'd be a bit hesitant to tweak numbers
> > > > that have been used for a long while (going back to ion) without a bit
> > > > more data.
> > > >
> > > > It might be good to also better understand the tradeoff of potential
> > > > on-going impact to performance from using low order pages when the
> > > > buffer is used. Do you have any details like or tests that you could
> > > > share to help ensure this won't impact other users?
> > > >
> > > > TJ: Do you have any additional thoughts on this?
> > > >
> > > I don't have any data on how often we hit reclaim for mid order
> > > allocations. That would be interesting to know. However the 70th
> > > percentile of system-wide buffer sizes while running the camera on my
> > > phone is still only 1 page, so it looks like this change would affect
> > > a subset of use-cases.
> > >
> > > Wouldn't this change make it less likely to get an order 4 allocation
> > > (under memory pressure)? The commit message makes me think the goal of
> > > the change is to get more of them.
> >
> > Hello John Stultz
> >
> > I've been waiting for your next reply.

Sorry, I was thinking you were gathering data on the tradeoffs. Sorry
for my confusion.

> > With my commit, we may gather less number of order 4 pages and fill the
> > requested size with more number of order 0 pages. I think, however, stable
> > allocation speed is quite important so that corresponding user space
> > context can move on within a specific time.
> >
> > Not only compaction but reclaim also, I think, would be invoked more if the
> > __GFP_RECLAIM is added on order 4. I expect the reclaim could be decreased
> > if we move to order 0.
> >
>
> Additionally I'd like to say the old legacy ion system heap also used the
> __GFP_RECLAIM only for order 8, not for order 4.
>
> drivers/staging/android/ion/ion_system_heap.c
>
> static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN |
>                                      __GFP_NORETRY) & ~__GFP_RECLAIM;
> static gfp_t low_order_gfp_flags  = GFP_HIGHUSER | __GFP_ZERO;
> static const unsigned int orders[] = {8, 4, 0};
>
> static int ion_system_heap_create_pools(struct ion_page_pool **pools)
> {
>        int i;
>
>        for (i = 0; i < NUM_ORDERS; i++) {
>                struct ion_page_pool *pool;
>                gfp_t gfp_flags = low_order_gfp_flags;
>
>                if (orders[i] > 4)
>                        gfp_flags = high_order_gfp_flags;

This seems a bit backwards from your statement. It's only removing
__GFP_RECLAIM on order 8 (high_order_gfp_flags).

So apologies again, but how is that different from the existing code?

#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
#define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
#define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
                                | __GFP_NORETRY) & ~__GFP_RECLAIM) \
                                | __GFP_COMP)
static gfp_t order_flags[] = {HIGH_ORDER_GFP, MID_ORDER_GFP, LOW_ORDER_GFP};

The main reason we introduced the mid-order flags was to avoid the
warnings on order 4 allocation failures, since we'll fall back to
order 0.

The only substantial difference I see between the old ion code and
what we have now is the __GFP_COMP addition, which is a bit hazy in my
memory. I unfortunately don't have a record of why it was added (don't
have access to my old mail box), so I suspect it was something brought
up in private review. Dropping that from the low order flags probably
makes sense as TJ pointed out, but this isn't what your patch is
changing.

Your patch makes mid-order allocations use the high order flags, so
we won't retry or reclaim, and more of them will fail and fall back
to single page allocations.

This makes sense to make allocation time faster and more deterministic
(I like it!), but potentially has the tradeoff of losing the
performance benefit of using mid order page sizes.

I suspect your change is a net win overall, as the cumulative benefit
of using larger pages probably won't outweigh the cost of long,
nondeterministic allocation times, particularly under pressure.

But because your change is different from what the old ion code did, I
want to be a little cautious. So it would be nice to see some
evaluation of not just the benefits the patch provides you but also of
what negative impact it might have. And so far you haven't provided
any details there.

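To make the flag difference concrete, here is a minimal userspace sketch of the two flag tables. It is only a model under stated assumptions: the bit positions are hypothetical (the real values live in the kernel's gfp headers), `patched_flags` is my reading of the "use HIGH_ORDER_GFP for order 4" change from the commit message, and `flags_for_order()`, `may_reclaim()` and `may_retry()` are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gfp_t;

/* Hypothetical bit positions, chosen only so the masks can be modelled. */
#define __GFP_HIGHMEM        (1u << 0)
#define __GFP_IO             (1u << 1)
#define __GFP_FS             (1u << 2)
#define __GFP_HARDWALL       (1u << 3)
#define __GFP_DIRECT_RECLAIM (1u << 4)
#define __GFP_KSWAPD_RECLAIM (1u << 5)
#define __GFP_ZERO           (1u << 6)
#define __GFP_NOWARN         (1u << 7)
#define __GFP_NORETRY        (1u << 8)
#define __GFP_COMP           (1u << 9)

/* Composite masks, composed the same way the kernel composes them. */
#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_USER      (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
#define GFP_HIGHUSER  (GFP_USER | __GFP_HIGHMEM)

/* Flag definitions as quoted in this mail. */
#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
#define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
#define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
			| __GFP_NORETRY) & ~__GFP_RECLAIM) \
			| __GFP_COMP)

#define NUM_ORDERS 3
static const unsigned int orders[NUM_ORDERS] = {8, 4, 0};

/* Existing table, and the table as the patch would make it (assumption). */
static const gfp_t current_flags[NUM_ORDERS] =
	{HIGH_ORDER_GFP, MID_ORDER_GFP, LOW_ORDER_GFP};
static const gfp_t patched_flags[NUM_ORDERS] =
	{HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};

/* Look up the flags a table assigns to a given order. */
gfp_t flags_for_order(const gfp_t table[NUM_ORDERS], unsigned int order)
{
	size_t i;

	for (i = 0; i < NUM_ORDERS; i++)
		if (orders[i] == order)
			return table[i];
	assert(!"order not present in orders[]");
	return 0;
}

/* True if any reclaim bit (direct or kswapd) is set. */
bool may_reclaim(gfp_t flags)
{
	return (flags & __GFP_RECLAIM) != 0;
}

/* True if __GFP_NORETRY is not set. */
bool may_retry(gfp_t flags)
{
	return (flags & __GFP_NORETRY) == 0;
}
```

In this model, order 4 may reclaim and retry with the existing table and does neither with the patched one, while orders 8 and 0 behave the same in both.
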
As a quick example: for the use case where mid-order allocations are
causing you trouble, you could measure how the performance changes if
you force all mid-order allocations to be single page allocations (so
orders[] = {8, 0, 0};) and compare it with the current code when
there's no memory pressure (right after reboot, when pages haven't
been fragmented) so the mid-order allocations will succeed. That will
let us know the potential downside if brief / transient pressure at
allocation time forces small pages.

Does that make sense?

thanks
-john
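
As a rough illustration of what the orders[] = {8, 0, 0} experiment changes, the sketch below counts how many separate chunks a buffer is built from when the allocator always takes the largest order that still fits. It is a simplified userspace model, not the heap's actual allocation loop: `count_chunks()` is a hypothetical helper, every allocation is assumed to succeed (the no-pressure case), and the table is assumed to end in order 0.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ORDERS 3

/*
 * Count the chunks needed to cover `pages` pages, taking the largest
 * order from `orders` that still fits on each step. Assumes every
 * allocation succeeds and that the table ends in order 0.
 */
unsigned long count_chunks(unsigned long pages,
			   const unsigned int orders[NUM_ORDERS])
{
	unsigned long chunks = 0;

	assert(orders[NUM_ORDERS - 1] == 0);

	while (pages > 0) {
		size_t i;

		for (i = 0; i < NUM_ORDERS; i++) {
			unsigned long chunk_pages = 1ul << orders[i];

			if (pages < chunk_pages)
				continue;	/* chunk too big, try a lower order */
			pages -= chunk_pages;
			chunks++;
			break;
		}
		if (i == NUM_ORDERS)
			break;	/* nothing fits; unreachable with an order 0 entry */
	}
	return chunks;
}
```

In this model a 100-page buffer is made of 10 chunks with {8, 4, 0} (six order 4 chunks plus four single pages) and of 100 chunks with {8, 0, 0}. That chunk count is the quantity the comparison would stress when the buffer is later mapped and used.
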