Subject: RE: (2) [PATCH] dma-buf: system_heap: avoid reclaim for order 4
From: Jaewon Kim
Reply-To: jaewon31.kim@samsung.com
To: John Stultz, Jaewon Kim
CC: "T.J. Mercier", "sumit.semwal@linaro.org", "daniel.vetter@ffwll.ch", "akpm@linux-foundation.org", "hannes@cmpxchg.org", "mhocko@kernel.org", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "jaewon31.kim@gmail.com"
Date: Tue, 07 Feb 2023 16:33:35 +0900
Message-ID: <20230207073335epcms1p15df191db83bec0cb791e6f79dcecb31f@epcms1p1>
>
> >--------- Original Message ---------
> >Sender : John Stultz
> >Date : 2023-02-07 13:37 (GMT+9)
> >Title : Re: (2) [PATCH] dma-buf: system_heap: avoid reclaim for order 4
> >
> >On Sat, Feb 4, 2023 at 7:02 AM Jaewon Kim wrote:
> >> Hello John Stultz, sorry for the late reply.
> >> I had to manage other urgent things and this test also took some time to finish.
> >> Anyway, I hope you will be happy with the following test results.
> >>
> >>
> >> 1. system heap modification
> >>
> >> To avoid the effect of allocation from the pool, all freed dma
> >> buffers were passed to buddy without keeping them in the pool.
> >> Some trace_printk calls and order counting logic were added.
> >>
> >> 2. the test tool
> >>
> >> To test the dma-buf system heap allocation speed, I prepared
> >> a userspace test program which requests a specified size from a heap.
> >> With the program, I requested 10 MB 16 times and
> >> added 1 sleep between each request. No memory was freed
> >> until all 16 requests had been allocated.
> >
> >Oof. I really appreciate all your effort that I'm sure went in to
> >generate these numbers, but this wasn't quite what I was asking for.
> >I know you've been focused on allocation performance under memory
> >pressure, but I was hoping to see the impact of __using__ order 0
> >pages over order 4 pages in real world conditions (for camera or video
> >recording or other use cases that use large allocations). These
> >results seem to be still just focused on the difference in allocation
> >performance between order 0 and order 4 with and without contention.
> >
> >That said, re-reading my email, I probably could have been more clear
> >on this aspect.
> >
> >
> >> 3. the test device
> >>
> >> The test device has arm64 CPU cores and a v5.15 based kernel.
> >> To get stable results, the CPU clock was fixed so that it would not
> >> change at run time, and the test tool was pinned to specific CPU
> >> cores running at the same CPU clock.
> >>
> >> 4. test results
> >>
> >> As we expected, if order 4 pages exist in the buddy, the order 8, 4, 0
> >> allocation was 1 to 4 times faster than the order 8, 0, 0. But
> >> the order 8, 0, 0 also looks fast enough.
> >>
> >> Here are the time diffs and the number of pages of each order.
> >>
> >> order 8, 4, 0 with enough order 4 pages
> >>
> >>         diff   8    4     0
> >>     665 usec   0  160     0
> >>   1,148 usec   0  160     0
> >>   1,089 usec   0  160     0
> >>   1,154 usec   0  160     0
> >>   1,264 usec   0  160     0
> >>   1,414 usec   0  160     0
> >>     873 usec   0  160     0
> >>   1,148 usec   0  160     0
> >>   1,158 usec   0  160     0
> >>   1,139 usec   0  160     0
> >>   1,169 usec   0  160     0
> >>   1,174 usec   0  160     0
> >>   1,210 usec   0  160     0
> >>     995 usec   0  160     0
> >>   1,151 usec   0  160     0
> >>     977 usec   0  160     0
> >>
> >> order 8, 0, 0 with enough order 4 pages
> >>
> >>         diff   8    4     0
> >>     441 usec  10    0     0
> >>     747 usec  10    0     0
> >>   2,330 usec   2    0  2048
> >>   2,469 usec   0    0  2560
> >>   2,518 usec   0    0  2560
> >>   1,176 usec   0    0  2560
> >>   1,487 usec   0    0  2560
> >>   1,402 usec   0    0  2560
> >>   1,449 usec   0    0  2560
> >>   1,330 usec   0    0  2560
> >>   1,089 usec   0    0  2560
> >>   1,481 usec   0    0  2560
> >>   1,326 usec   0    0  2560
> >>   3,057 usec   0    0  2560
> >>   2,758 usec   0    0  2560
> >>   3,271 usec   0    0  2560
> >>
> >> From the perspective of responsiveness, deterministic
> >> memory allocation speed is, I think, quite important. So I
> >> tested another case where free memory is not enough.
> >>
> >> In this test, I ran the set of 16 allocations twice
> >> consecutively. With order 8, 4, 0 the first set
> >> became very slow and variable, but the second set became
> >> faster because the high order pages had already been created.
> >>
> >> order 8, 4, 0 in low memory
> >>
> >>         diff   8    4     0
> >>     584 usec   0  160     0
> >>  28,428 usec   0  160     0
> >> 100,701 usec   0  160     0
> >>  76,645 usec   0  160     0
> >>  25,522 usec   0  160     0
> >>  38,798 usec   0  160     0
> >>  89,012 usec   0  160     0
> >>  23,015 usec   0  160     0
> >>  73,360 usec   0  160     0
> >>  76,953 usec   0  160     0
> >>  31,492 usec   0  160     0
> >>  75,889 usec   0  160     0
> >>  84,551 usec   0  160     0
> >>  84,352 usec   0  160     0
> >>  57,103 usec   0  160     0
> >>  93,452 usec   0  160     0
> >>
> >>         diff   8    4     0
> >>     808 usec  10    0     0
> >>     778 usec   4   96     0
> >>     829 usec   0  160     0
> >>     700 usec   0  160     0
> >>     937 usec   0  160     0
> >>     651 usec   0  160     0
> >>     636 usec   0  160     0
> >>     811 usec   0  160     0
> >>     622 usec   0  160     0
> >>     674 usec   0  160     0
> >>     677 usec   0  160     0
> >>     738 usec   0  160     0
> >>   1,130 usec   0  160     0
> >>     677 usec   0  160     0
> >>     553 usec   0  160     0
> >>   1,048 usec   0  160     0
> >>
> >>
> >> order 8, 0, 0 in low memory
> >>
> >>         diff   8    4     0
> >>   1,699 usec   2    0  2048
> >>   2,082 usec   0    0  2560
> >>     840 usec   0    0  2560
> >>     875 usec   0    0  2560
> >>     845 usec   0    0  2560
> >>   1,706 usec   0    0  2560
> >>     967 usec   0    0  2560
> >>   1,000 usec   0    0  2560
> >>   1,905 usec   0    0  2560
> >>   2,451 usec   0    0  2560
> >>   3,384 usec   0    0  2560
> >>   2,397 usec   0    0  2560
> >>   3,171 usec   0    0  2560
> >>   2,376 usec   0    0  2560
> >>   3,347 usec   0    0  2560
> >>   2,554 usec   0    0  2560
> >>
> >>         diff   8    4     0
> >>   1,409 usec   2    0  2048
> >>   1,438 usec   0    0  2560
> >>   1,035 usec   0    0  2560
> >>   1,108 usec   0    0  2560
> >>     825 usec   0    0  2560
> >>     927 usec   0    0  2560
> >>   1,931 usec   0    0  2560
> >>   2,024 usec   0    0  2560
> >>   1,884 usec   0    0  2560
> >>   1,769 usec   0    0  2560
> >>   2,136 usec   0    0  2560
> >>   1,738 usec   0    0  2560
> >>   1,328 usec   0    0  2560
> >>   1,438 usec   0    0  2560
> >>   1,972 usec   0    0  2560
> >>   2,963 usec   0    0  2560
> >
> >So, thank you for generating all of this. I think this all looks as
> >expected, showing the benefit of your change to allocation under
> >contention and showing the potential downside in the non-contention
> >case.
> >
> >I still worry about the performance impact outside of allocation time
> >of using the smaller order pages (via map and unmap through iommu to
> >devices, etc), so it would still be nice to have some confidence this
> >won't introduce other regressions, but I do agree the worst case
> >impact is very bad.
> >
> >> Finally, if we change order 4 to use HIGH_ORDER_GFP,
> >> I expect that we could avoid the very slow cases.
> >>
> >
> >Yeah. Again, this all aligns with the upside of your changes, which
> >I'm eager for.
> >I just want to have a strong sense of any regressions it might also cause.
> >
> >I don't mean to discourage you, especially after all the effort here.
> >Do you think evaluating the before and after impact to buffer usage
> >(not just allocation) would be doable in the near term?
>
> Hello, sorry, but I don't have expertise on IOMMU. Actually, I'm also
> wondering whether every IOMMU can use order 4 free pages when they are
> allocated. I am not sure, but I remember hearing that order 9 (2MB)
> could be used; I don't know about orders 8 and 4. I guess IOMMU mapping
> would follow the same pattern we expect: if order 4 pages are prepared,
> it could be about 1 to 4 times faster. But I think it should NOT be
> that much slower even if the entire free memory is prepared as order 0
> pages.
>
> >If you don't think so, given the benefit to allocation under pressure
> >is large (and I don't mean to give you hurdles to jump), I'm willing
> >to ack your change to get it merged, but if we later see performance
> >trouble, I'll be quick to advocate for reverting it. Is that ok?
>
> Yes, sure. I would also like to know if it does cause trouble. Thank you.
>
> >thanks
> >-john
> >
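
For anyone re-reading the numbers in this thread, here is a small Python sketch that cross-checks them. It is not part of the original test tool. The function and variable names are made up for illustration, and the 4 KiB base page size is an assumption inferred from the 2560 order 0 pages reported per 10 MB request. It checks that every order mix in the tables adds up to one 10 MB request, and it summarizes the first low-memory set for both configurations.

```python
# Sanity check for the tables in this thread (illustrative sketch only).
# Assumption: 4 KiB base pages, inferred from 2560 order 0 pages per 10 MB.
PAGE_SIZE = 4096
REQUEST_BYTES = 10 * 1024 * 1024  # each test request was 10 MB


def total_bytes(counts):
    """Bytes covered by a mapping of page order -> number of pages."""
    return sum(n * (PAGE_SIZE << order) for order, n in counts.items())


# Every distinct (order 8, order 4, order 0) mix that appears in the tables.
ORDER_MIXES = [
    {8: 0, 4: 160, 0: 0},
    {8: 10, 4: 0, 0: 0},
    {8: 2, 4: 0, 0: 2048},
    {8: 0, 4: 0, 0: 2560},
    {8: 4, 4: 96, 0: 0},
]

# First set of 16 requests in low memory, in usec, copied from the tables.
LOW_MEM_ORDER_8_4_0 = [
    584, 28428, 100701, 76645, 25522, 38798, 89012, 23015,
    73360, 76953, 31492, 75889, 84551, 84352, 57103, 93452,
]
LOW_MEM_ORDER_8_0_0 = [
    1699, 2082, 840, 875, 845, 1706, 967, 1000,
    1905, 2451, 3384, 2397, 3171, 2376, 3347, 2554,
]


def summarize(samples):
    """Return min, max and mean of a list of latencies in usec."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
    }


if __name__ == "__main__":
    for mix in ORDER_MIXES:
        print(mix, "covers 10 MB:", total_bytes(mix) == REQUEST_BYTES)
    print("order 8, 4, 0 low memory:", summarize(LOW_MEM_ORDER_8_4_0))
    print("order 8, 0, 0 low memory:", summarize(LOW_MEM_ORDER_8_0_0))
```

Under these assumptions every order mix covers exactly 10 MB, and the mean of the first low-memory set is roughly 60 ms for order 8, 4, 0 against roughly 2 ms for order 8, 0, 0, which is the gap the patch is meant to close.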