From: Jaewon Kim <jaewon31.kim@samsung.com>
To: John Stultz <jstultz@google.com>, Jaewon Kim <jaewon31.kim@samsung.com>
Cc: "T.J. Mercier" <tjmercier@google.com>,
"sumit.semwal@linaro.org" <sumit.semwal@linaro.org>,
"daniel.vetter@ffwll.ch" <daniel.vetter@ffwll.ch>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"mhocko@kernel.org" <mhocko@kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"jaewon31.kim@gmail.com" <jaewon31.kim@gmail.com>
Subject: RE: (2) [PATCH] dma-buf: system_heap: avoid reclaim for order 4
Date: Tue, 07 Feb 2023 16:33:35 +0900 [thread overview]
Message-ID: <20230207073335epcms1p15df191db83bec0cb791e6f79dcecb31f@epcms1p1> (raw)
In-Reply-To: <CANDhNCrAMVT3rg0GPJhYKD75EAUn8bsivrp3yMJcsd6bouj1rQ@mail.gmail.com>
>
>--------- Original Message ---------
>Sender : John Stultz <jstultz@google.com>
>Date : 2023-02-07 13:37 (GMT+9)
>Title : Re: (2) [PATCH] dma-buf: system_heap: avoid reclaim for order 4
>
>On Sat, Feb 4, 2023 at 7:02 AM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
>> Hello John Stultz, sorry for the late reply.
>> I had to handle other urgent things, and this test also took some
>> time to finish. Anyway, I hope you are happy with the following test
>> results.
>>
>> 1. system heap modification
>>
>> To rule out any effect of allocating from the pool, all freed dma
>> buffers were passed back to the buddy allocator instead of being kept
>> in the pool. Some trace_printk and order-counting logic were added.
>>
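(For context on the order columns in the results below: allocation in
the system heap walks the supported orders, largest first, and takes
the first one it can get. A simplified sketch of
alloc_largest_available() from drivers/dma-buf/heaps/system_heap.c is
shown here; it is abridged from memory, and the added counting is only
indicated by a comment, so treat it as an illustration rather than the
exact modified code.)

	static gfp_t order_flags[] = {HIGH_ORDER_GFP, LOW_ORDER_GFP, LOW_ORDER_GFP};
	static const unsigned int orders[] = {8, 4, 0};
	#define NUM_ORDERS ARRAY_SIZE(orders)

	static struct page *alloc_largest_available(unsigned long size,
						    unsigned int max_order)
	{
		struct page *page;
		int i;

		for (i = 0; i < NUM_ORDERS; i++) {
			/* skip orders larger than the remaining size */
			if (size < (PAGE_SIZE << orders[i]))
				continue;
			if (max_order < orders[i])
				continue;

			/* order_flags[i] is the gfp mask for this order */
			page = alloc_pages(order_flags[i], orders[i]);
			if (!page)
				continue;
			/* the per-order counters were bumped around here */
			return page;
		}
		return NULL;
	}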
>
>> 2. the test tool
>>
>> To test dma-buf system heap allocation speed, I prepared a userspace
>> test program which requests a specified size from a heap. With the
>> program, I made 16 requests of 10 MB each, with a sleep between
>> requests. No buffer was freed until all 16 allocations had been made.
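(The tool was essentially like the minimal sketch below, built on the
standard dma-heap uapi from <linux/dma-heap.h>. This is an
illustration rather than the exact program; in particular the 1 second
sleep duration and the lack of error reporting are assumptions.)

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/dma-heap.h>

	int main(void)
	{
		int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
		int fds[16];
		int i;

		if (heap_fd < 0)
			return 1;

		for (i = 0; i < 16; i++) {
			struct dma_heap_allocation_data data = {
				.len = 10 * 1024 * 1024,	/* 10 MB per request */
				.fd_flags = O_RDWR | O_CLOEXEC,
			};

			/* each of these calls is what was timed as "diff" */
			if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data) < 0)
				return 1;
			fds[i] = data.fd;
			sleep(1);	/* pause between requests */
		}

		/* buffers are released only after all 16 are held */
		for (i = 0; i < 16; i++)
			close(fds[i]);
		close(heap_fd);
		return 0;
	}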
>
>Oof. I really appreciate all the effort that I'm sure went into
>generating these numbers, but this wasn't quite what I was asking for.
>I know you've been focused on allocation performance under memory
>pressure, but I was hoping to see the impact of __using__ order 0
>pages over order 4 pages in real-world conditions (for camera or video
>recording or other use cases that use large allocations). These
>results still seem to be focused just on the difference in allocation
>performance between order 0 and order 4, with and without contention.
>
>That said, re-reading my email, I probably could have been clearer
>on this aspect.
>
>> 3. the test device
>>
>> The test device has arm64 CPU cores and a v5.15-based kernel.
>> To get stable results, the CPU clock was pinned so it would not
>> change at run time, and the test tool was bound to specific CPU
>> cores running at that fixed clock.
>>
>> 4. test results
>>
>> As expected, when order 4 pages exist in the buddy, the order 8, 4, 0
>> allocation was 1 to 4 times faster than order 8, 0, 0. But order
>> 8, 0, 0 also looks fast enough.
>>
>> Here's the time diff and the number of pages of each order.
>>
>> order 8, 4, 0 in the enough order 4 case
>>
>>     diff          8     4     0
>>     665 usec      0   160     0
>>   1,148 usec      0   160     0
>>   1,089 usec      0   160     0
>>   1,154 usec      0   160     0
>>   1,264 usec      0   160     0
>>   1,414 usec      0   160     0
>>     873 usec      0   160     0
>>   1,148 usec      0   160     0
>>   1,158 usec      0   160     0
>>   1,139 usec      0   160     0
>>   1,169 usec      0   160     0
>>   1,174 usec      0   160     0
>>   1,210 usec      0   160     0
>>     995 usec      0   160     0
>>   1,151 usec      0   160     0
>>     977 usec      0   160     0
>>
>> order 8, 0, 0 in the enough order 4 case
>>
>>     diff          8     4     0
>>     441 usec     10     0     0
>>     747 usec     10     0     0
>>   2,330 usec      2     0  2048
>>   2,469 usec      0     0  2560
>>   2,518 usec      0     0  2560
>>   1,176 usec      0     0  2560
>>   1,487 usec      0     0  2560
>>   1,402 usec      0     0  2560
>>   1,449 usec      0     0  2560
>>   1,330 usec      0     0  2560
>>   1,089 usec      0     0  2560
>>   1,481 usec      0     0  2560
>>   1,326 usec      0     0  2560
>>   3,057 usec      0     0  2560
>>   2,758 usec      0     0  2560
>>   3,271 usec      0     0  2560
>>
>> From the perspective of responsiveness, deterministic memory
>> allocation speed is, I think, quite important. So I also tested
>> a case where free memory is not sufficient.
>>
>> In this test, I ran the set of 16 allocations twice consecutively.
>> The first order 8, 4, 0 set was very slow and varied, but the
>> second set was faster because the high-order pages had already
>> been created.
>>
>> order 8, 4, 0 in low memory (first run)
>>
>>     diff          8     4     0
>>     584 usec      0   160     0
>>  28,428 usec      0   160     0
>> 100,701 usec      0   160     0
>>  76,645 usec      0   160     0
>>  25,522 usec      0   160     0
>>  38,798 usec      0   160     0
>>  89,012 usec      0   160     0
>>  23,015 usec      0   160     0
>>  73,360 usec      0   160     0
>>  76,953 usec      0   160     0
>>  31,492 usec      0   160     0
>>  75,889 usec      0   160     0
>>  84,551 usec      0   160     0
>>  84,352 usec      0   160     0
>>  57,103 usec      0   160     0
>>  93,452 usec      0   160     0
>>
>> order 8, 4, 0 in low memory (second run)
>>
>>     diff          8     4     0
>>     808 usec     10     0     0
>>     778 usec      4    96     0
>>     829 usec      0   160     0
>>     700 usec      0   160     0
>>     937 usec      0   160     0
>>     651 usec      0   160     0
>>     636 usec      0   160     0
>>     811 usec      0   160     0
>>     622 usec      0   160     0
>>     674 usec      0   160     0
>>     677 usec      0   160     0
>>     738 usec      0   160     0
>>   1,130 usec      0   160     0
>>     677 usec      0   160     0
>>     553 usec      0   160     0
>>   1,048 usec      0   160     0
>>
>> order 8, 0, 0 in low memory (first run)
>>
>>     diff          8     4     0
>>   1,699 usec      2     0  2048
>>   2,082 usec      0     0  2560
>>     840 usec      0     0  2560
>>     875 usec      0     0  2560
>>     845 usec      0     0  2560
>>   1,706 usec      0     0  2560
>>     967 usec      0     0  2560
>>   1,000 usec      0     0  2560
>>   1,905 usec      0     0  2560
>>   2,451 usec      0     0  2560
>>   3,384 usec      0     0  2560
>>   2,397 usec      0     0  2560
>>   3,171 usec      0     0  2560
>>   2,376 usec      0     0  2560
>>   3,347 usec      0     0  2560
>>   2,554 usec      0     0  2560
>>
>> order 8, 0, 0 in low memory (second run)
>>
>>     diff          8     4     0
>>   1,409 usec      2     0  2048
>>   1,438 usec      0     0  2560
>>   1,035 usec      0     0  2560
>>   1,108 usec      0     0  2560
>>     825 usec      0     0  2560
>>     927 usec      0     0  2560
>>   1,931 usec      0     0  2560
>>   2,024 usec      0     0  2560
>>   1,884 usec      0     0  2560
>>   1,769 usec      0     0  2560
>>   2,136 usec      0     0  2560
>>   1,738 usec      0     0  2560
>>   1,328 usec      0     0  2560
>>   1,438 usec      0     0  2560
>>   1,972 usec      0     0  2560
>>   2,963 usec      0     0  2560
>
>So, thank you for generating all of this. I think this all looks as
>expected, showing the benefit of your change to allocation under
>contention and showing the potential downside in the non-contention
>case.
>
>I still worry about the performance impact outside of allocation time
>of using the smaller order pages (via map and unmap through iommu to
>devices, etc), so it would still be nice to have some confidence this
>won't introduce other regressions, but I do agree the worst-case
>impact is very bad.
>
>> Finally, if we change order 4 to use HIGH_ORDER_GFP,
>> I expect that we could avoid the very slow cases.
>>
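(To be concrete, the idea is to give order 4 the same no-reclaim flags
that order 8 already uses; HIGH_ORDER_GFP clears __GFP_RECLAIM and adds
__GFP_NORETRY, so a failed order 4 attempt falls back to order 0
instead of reclaiming. The hunk below against
drivers/dma-buf/heaps/system_heap.c is a sketch of that idea, quoted
from memory rather than taken from the exact patch:)

	--- a/drivers/dma-buf/heaps/system_heap.c
	+++ b/drivers/dma-buf/heaps/system_heap.c
	 /* order 8 already fails fast; order 4 can enter direct reclaim */
	-static gfp_t order_flags[] = {HIGH_ORDER_GFP, LOW_ORDER_GFP, LOW_ORDER_GFP};
	+static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
	 static const unsigned int orders[] = {8, 4, 0};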
>
>Yeah. Again, this all aligns with the upside of your changes, which
>I'm eager for. I just want to have a strong sense of any regressions
>it might also cause.
>
>I don't mean to discourage you, especially after all the effort here.
>Do you think evaluating the before-and-after impact on buffer usage
>(not just allocation) would be doable in the near term?
>
Hello, sorry, but I don't have expertise on iommu. Actually, I'm also
wondering whether every IOMMU can make use of order 4 pages when they
are allocated. I'm not sure, but I remember hearing that order 9 (2MB)
could be used; I don't know about order 8 or order 4. I guess the IOMMU
map would follow the same pattern we expect: if order 4 pages are
prepared, it could be 1 to 4 times faster. But I think it should NOT be
that much slower even when the entire free memory is prepared as
order 0 pages.
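For what it's worth, the heap hands each allocated chunk to the device
as one scatterlist entry, so for the same 10 MB buffer there are 160
entries to map with order 4 chunks versus 2560 with order 0. Below is a
simplified excerpt of how the table is built and mapped, abridged from
drivers/dma-buf/heaps/system_heap.c (declarations and error handling
omitted; treat it as an illustration, not the exact code):

	/* one sg entry per chunk, whatever the chunk's order */
	sg = buffer->sg_table.sgl;
	list_for_each_entry_safe(page, tmp_page, &pages, lru) {
		sg_set_page(sg, page, page_size(page), 0);
		sg = sg_next(sg);
	}

	/* mapping for a device (possibly through an IOMMU) walks every
	 * entry, so fewer, larger entries mean less per-map work */
	ret = dma_map_sgtable(attachment->dev, table, direction, 0);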
>
>If you don't think so, given the benefit to allocation under pressure
>is large (and I don't mean to give you hurdles to jump), I'm willing
>to ack your change to get it merged, but if we later see performance
>trouble, I'll be quick to advocate for reverting it. Is that ok?
>
Yes, sure, that is ok with me. I also want to know whether any
performance trouble shows up.
Thank you
>
>thanks
>-john