From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <789aba5c-e2dd-4b4c-bfac-8d534c7a9211@huawei.com>
Date: Mon, 21 Oct 2024 14:09:34 +0800
Subject: Re: [PATCH] mm: shmem: convert to use folio_zero_range()
From: Kefeng Wang
To: Barry Song <21cnbao@gmail.com>
CC: Matthew Wilcox, Andrew Morton, Hugh Dickins, David Hildenbrand, Baolin Wang
References: <20241017142504.1170208-1-wangkefeng.wang@huawei.com> <20241017142504.1170208-2-wangkefeng.wang@huawei.com>
Sender: owner-linux-mm@kvack.org

On 2024/10/21 13:38, Barry Song wrote:
> On Mon, Oct 21, 2024 at 6:16 PM Kefeng Wang wrote:
>>
>> On 2024/10/21 12:15, Barry Song wrote:
>>> On Fri, Oct 18, 2024 at 8:48 PM Kefeng Wang wrote:
>>>>
>>>> On 2024/10/18 15:32, Kefeng Wang wrote:
>>>>>
>>>>> On 2024/10/18 13:23, Barry Song wrote:
>>>>>> On Fri, Oct 18, 2024 at 6:20 PM Kefeng Wang wrote:
>>>>>>>
>>>>>>> On 2024/10/17 23:09, Matthew Wilcox wrote:
>>>>>>>> On Thu, Oct 17, 2024 at 10:25:04PM +0800, Kefeng Wang wrote:
>>>>>>>>> Directly use folio_zero_range() to cleanup code.
>>>>>>>>
>>>>>>>> Are you sure there's no performance regression introduced by this?
>>>>>>>> clear_highpage() is often optimised in ways that we can't optimise for
>>>>>>>> a plain memset(). On the other hand, if the folio is large, maybe a
>>>>>>>> modern CPU will be able to do better than clear-one-page-at-a-time.
>>>>>>>>
>>>>>>>
>>>>>>> Right, I missing this, clear_page might be better than memset, I change
>>>>>>> this one when look at the shmem_writepage(), which already convert to
>>>>>>> use folio_zero_range() from clear_highpage(), also I grep
>>>>>>> folio_zero_range(), there are some other to use folio_zero_range().
>>>>>>>
>>>>>>> fs/bcachefs/fs-io-buffered.c: folio_zero_range(folio, 0, folio_size(folio));
>>>>>>> fs/bcachefs/fs-io-buffered.c: folio_zero_range(f, 0, folio_size(f));
>>>>>>> fs/bcachefs/fs-io-buffered.c: folio_zero_range(f, 0, folio_size(f));
>>>>>>> fs/libfs.c: folio_zero_range(folio, 0, folio_size(folio));
>>>>>>> fs/ntfs3/frecord.c: folio_zero_range(folio, 0, folio_size(folio));
>>>>>>> mm/page_io.c: folio_zero_range(folio, 0, folio_size(folio));
>>>>>>> mm/shmem.c: folio_zero_range(folio, 0, folio_size(folio));
>>>>>>>
>>>>>>>> IOW, what performance testing have you done with this patch?
>>>>>>>
>>>>>>> No performance test before, but I write a testcase,
>>>>>>>
>>>>>>> 1) allocate N large folios (folio_alloc(PMD_ORDER))
>>>>>>> 2) then calculate the diff(us) when clear all N folios
>>>>>>>    clear_highpage/folio_zero_range/folio_zero_user
>>>>>>> 3) release N folios
>>>>>>>
>>>>>>> the result(run 5 times) shown below on my machine,
>>>>>>>
>>>>>>> N=1,
>>>>>>>        clear_highpage  folio_zero_range  folio_zero_user
>>>>>>>   1                69                74              177
>>>>>>>   2                57                62              168
>>>>>>>   3                54                58              234
>>>>>>>   4                54                58              157
>>>>>>>   5                56                62              148
>>>>>>> avg                58              62.8            176.8
>>>>>>>
>>>>>>> N=100
>>>>>>>        clear_highpage  folio_zero_range  folio_zero_user
>>>>>>>   1             11015             11309            32833
>>>>>>>   2             10385             11110            49751
>>>>>>>   3             10369             11056            33095
>>>>>>>   4             10332             11017            33106
>>>>>>>   5             10483             11000            49032
>>>>>>> avg           10516.8           11098.4          39563.4
>>>>>>>
>>>>>>> N=512
>>>>>>>        clear_highpage  folio_zero_range  folio_zero_user
>>>>>>>   1             55560             60055           156876
>>>>>>>   2             55485             60024           157132
>>>>>>>   3             55474             60129           156658
>>>>>>>   4             55555             59867           157259
>>>>>>>   5             55528             59932           157108
>>>>>>> avg           55520.4           60001.4         157006.6
>>>>>>>
>>>>>>> folio_zero_user with many cond_resched(), so time fluctuates a lot,
>>>>>>> clear_highpage is better folio_zero_range as you said.
>>>>>>>
>>>>>>> Maybe add a new helper to convert all folio_zero_range(folio, 0,
>>>>>>> folio_size(folio))
>>>>>>> to use clear_highpage + flush_dcache_folio?
>>>>>>
>>>>>> If this also improves performance for other existing callers of
>>>>>> folio_zero_range(), then that's a positive outcome.
>>>>> ...
>>> hi Kefeng,
>>> what's your point? providing a helper like clear_highfolio() or similar?
>>
>> Yes, from above test, using clear_highpage/flush_dcache_folio is better
>> than using folio_zero_range() for folio zero(especially for large
>> folio), so I'd like to add a new helper, maybe name it folio_zero()
>> since it zero the whole folio.
>
> we already have a helper like folio_zero_user()?
> it is not good enough?

Because folio_zero_user() calls cond_resched() so many times, its performance is the worst of the three...
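
To make the helper idea from the thread concrete, here is a minimal user-space sketch, not kernel code. Everything prefixed model_ is a hypothetical stand-in invented for illustration: model_clear_page() plays the role of clear_highpage(), model_flush_dcache_folio() plays the role of flush_dcache_folio(), and struct model_folio is just 2^order contiguous 4096-byte pages. It only shows the shape "clear one page at a time, then flush once" and says nothing about in-kernel performance.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for a folio: 2^order contiguous pages. */
struct model_folio {
	unsigned int order;
	unsigned char *addr;
};

static size_t model_folio_nr_pages(const struct model_folio *folio)
{
	return (size_t)1 << folio->order;
}

/* Stand-in for clear_highpage(): zero exactly one page. */
static void model_clear_page(unsigned char *page)
{
	memset(page, 0, MODEL_PAGE_SIZE);
}

/* Stand-in for flush_dcache_folio(): nothing to do in user space. */
static void model_flush_dcache_folio(struct model_folio *folio)
{
	(void)folio;
}

/* Sketch of the proposed helper: clear page by page, then flush once. */
static void model_folio_zero(struct model_folio *folio)
{
	size_t i, nr = model_folio_nr_pages(folio);

	for (i = 0; i < nr; i++)
		model_clear_page(folio->addr + i * MODEL_PAGE_SIZE);
	model_flush_dcache_folio(folio);
}

/* Allocate a model folio filled with 0xff; returns 0 on success, -1 on failure. */
static int model_folio_alloc(struct model_folio *folio, unsigned int order)
{
	size_t size = ((size_t)1 << order) * MODEL_PAGE_SIZE;

	folio->order = order;
	folio->addr = malloc(size);
	if (!folio->addr)
		return -1;
	memset(folio->addr, 0xff, size);
	return 0;
}

/* Return 1 if every byte of the model folio is zero, else 0. */
static int model_folio_is_zeroed(const struct model_folio *folio)
{
	size_t i, size = model_folio_nr_pages(folio) * MODEL_PAGE_SIZE;

	for (i = 0; i < size; i++)
		if (folio->addr[i] != 0)
			return 0;
	return 1;
}
```

A caller would allocate with model_folio_alloc(), call model_folio_zero(), and could check the result with model_folio_is_zeroed() before freeing addr. Whether a real folio_zero() should loop over clear_highpage() like this is exactly the open question in the thread.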