From: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Fri, 1 Nov 2024 15:43:55 +0800
Message-ID: <848e4b40-f734-475f-9b1e-2f543e622a6c@huawei.com>
Subject: Re: [PATCH v2 1/2] mm: use aligned address in clear_gigantic_page()
To: "Huang, Ying"
CC: David Hildenbrand, Andrew Morton,
 Matthew Wilcox, Muchun Song, linux-mm@kvack.org, Zi Yan
In-Reply-To: <87v7x88y3q.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 2024/10/31 16:39, Huang, Ying wrote:
> Kefeng Wang writes:
>
> [snip]
>>
>>>> 1) Will run some random tests to check the performance difference,
>>>> as David suggested.
>>>>
>>>> 2) Hope LKP can run more tests, since it is very useful (more test
>>>> sets and different machines).
>>> I'm starting to use LKP to test.
>>
>> Great. Sorry for the late reply.
>
> I have run some tests with LKP.
>
> Firstly, there's almost no measurable difference between clearing pages
> from start to end or from end to start on an Intel server CPU. I guess
> that there's a similar optimization for both directions.
>
> For the multi-process (one process per logical CPU)
> vm-scalability/anon-w-seq test case, the benchmark score increases by
> about 22.4%.

So process_huge_page() is better than clear_gigantic_page() on Intel?
Could you test the following case on x86?

echo 10240 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
mkdir -p /hugetlbfs/
mount none /hugetlbfs/ -t hugetlbfs
rm -f /hugetlbfs/test && fallocate -l 20G /hugetlbfs/test && \
	fallocate -d -l 20G /hugetlbfs/test && \
	time taskset -c 10 fallocate -l 20G /hugetlbfs/test

>
> For the multi-process vm-scalability/anon-w-rand test case, there is
> no measurable difference in the benchmark score.
>
> So, the optimization mainly helps sequential workloads.
>
> In summary, on x86, process_huge_page() will not introduce any
> regression, and it helps some workloads.
>
> However, on ARM64, it does introduce some regression for clearing
> pages from end to start. That needs to be addressed. I guess that the
> regression can be resolved by doing more of the clearing from start to
> end (but not all of it). For example, can you take a look at the patch
> below? It uses a similar framework as before, but clears each small
> chunk (mpage) from start to end. You can adjust MPAGE_NRPAGES to check
> at which value the regression goes away.
>
> WARNING: the patch is only build tested.

Base:    baseline
Change1: using clear_gigantic_page() for 2M PMD
Change2: your patch with MPAGE_NRPAGES=16
Change3: Change2 + fix[1]
Change4: your patch with MPAGE_NRPAGES=64 + fix[1]

1. For random write (case-anon-w-rand/case-anon-w-rand-hugetlb):
   no measurable difference.

2. For sequential write:

1) case-anon-w-seq-mt:

Base:
real    0m2.490s   0m2.254s   0m2.272s
user    1m59.980s  2m23.431s  2m18.739s
sys     1m3.675s   1m15.462s  1m15.030s

Change1:
real    0m2.234s   0m2.225s   0m2.159s
user    2m56.105s  2m57.117s  3m0.489s
sys     0m17.064s  0m17.564s  0m16.150s

Change2:
real    0m2.244s   0m2.384s   0m2.370s
user    2m39.413s  2m41.990s  2m42.229s
sys     0m19.826s  0m18.491s  0m18.053s

Change3: // best performance
real    0m2.155s   0m2.204s   0m2.194s
user    3m2.640s   2m55.837s  3m0.902s
sys     0m17.346s  0m17.630s  0m18.197s

Change4:
real    0m2.287s   0m2.377s   0m2.284s
user    2m37.030s  2m52.868s  3m17.593s
sys     0m15.445s  0m34.430s  0m45.224s

2) case-anon-w-seq-hugetlb: very similar to 1); Change4 is slightly
   better than Change3, but not by much.

3) hugetlbfs fallocate 20G:
   Change1 (0m1.136s) = Change3 (0m1.136s) = Change4 (0m1.135s)
   < Change2 (0m1.275s) < Base (0m3.016s)

In summary, Change3 is the best and Change1 is good on my arm64 machine.
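To make the traversal orders behind these numbers concrete, here is a
minimal userspace sketch (an illustration only, not kernel code; the
buffer size, chunk size, and fault offset are made-up values). It
contrasts plain front-to-back clearing, as in clear_gigantic_page()
(Change1), with chunked clearing that leaves the chunk around the
faulting address until last, which is the idea behind the mpage patch
quoted below (Change2/Change3); the full patch additionally walks the
remaining chunks toward the target from both sides.

#include <stdlib.h>
#include <string.h>

#define PAGE_SZ   4096UL
#define CHUNK_PGS 16UL			/* analogous to MPAGE_NRPAGES */
#define CHUNK_SZ  (PAGE_SZ * CHUNK_PGS)

/* Front-to-back clearing, as clear_gigantic_page() does. */
static void clear_forward(char *buf, size_t size)
{
	for (size_t off = 0; off < size; off += PAGE_SZ)
		memset(buf + off, 0, PAGE_SZ);
}

/*
 * Chunked clearing that processes the chunk containing the faulting
 * address last, so its cache lines stay hot; each chunk is still
 * cleared in ascending order internally.
 */
static void clear_target_last(char *buf, size_t size, size_t fault_off)
{
	size_t nr_chunks = size / CHUNK_SZ;
	size_t target = fault_off / CHUNK_SZ;

	for (size_t i = 0; i < nr_chunks; i++) {
		if (i == target)
			continue;		/* cleared last, below */
		memset(buf + i * CHUNK_SZ, 0, CHUNK_SZ);
	}
	memset(buf + target * CHUNK_SZ, 0, CHUNK_SZ);
}

int main(void)
{
	size_t size = 512 * PAGE_SZ;		/* one 2M PMD-sized area */
	char *buf = aligned_alloc(PAGE_SZ, size);

	if (!buf)
		return 1;
	clear_forward(buf, size);
	clear_target_last(buf, size, 300 * PAGE_SZ);	/* arbitrary fault offset */
	free(buf);
	return 0;
}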
>
> Best Regards,
> Huang, Ying
>
> -----------------------------------8<----------------------------------------
> From 406bcd1603987fdd7130d2df6f7d4aee4cc6b978 Mon Sep 17 00:00:00 2001
> From: Huang Ying
> Date: Thu, 31 Oct 2024 11:13:57 +0800
> Subject: [PATCH] mpage clear
>
> ---
>  mm/memory.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 67 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3ccee51adfbb..1fdc548c4275 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6769,6 +6769,68 @@ static inline int process_huge_page(
>  	return 0;
>  }
>  
> +#define MPAGE_NRPAGES (1<<4)
> +#define MPAGE_SIZE (PAGE_SIZE * MPAGE_NRPAGES)
> +static inline int clear_huge_page(
> +	unsigned long addr_hint, unsigned int nr_pages,
> +	int (*process_subpage)(unsigned long addr, int idx, void *arg),
> +	void *arg)
> +{
> +	int i, n, base, l, ret;
> +	unsigned long addr = addr_hint &
> +		~(((unsigned long)nr_pages << PAGE_SHIFT) - 1);
> +	unsigned long nr_mpages = ((unsigned long)nr_pages << PAGE_SHIFT) / MPAGE_SIZE;
> +
> +	/* Process target subpage last to keep its cache lines hot */
> +	might_sleep();
> +	n = (addr_hint - addr) / MPAGE_SIZE;
> +	if (2 * n <= nr_mpages) {
> +		/* If target subpage in first half of huge page */
> +		base = 0;
> +		l = n;
> +		/* Process subpages at the end of huge page */
> +		for (i = nr_mpages - 1; i >= 2 * n; i--) {
> +			cond_resched();
> +			ret = process_subpage(addr + i * MPAGE_SIZE,
> +					      i * MPAGE_NRPAGES, arg);
> +			if (ret)
> +				return ret;
> +		}
> +	} else {
> +		/* If target subpage in second half of huge page */
> +		base = nr_mpages - 2 * (nr_mpages - n);
> +		l = nr_mpages - n;
> +		/* Process subpages at the begin of huge page */
> +		for (i = 0; i < base; i++) {
> +			cond_resched();
> +			ret = process_subpage(addr + i * MPAGE_SIZE,
> +					      i * MPAGE_NRPAGES, arg);
> +			if (ret)
> +				return ret;
> +		}
> +	}
> +	/*
> +	 * Process remaining subpages in left-right-left-right pattern
> +	 * towards the target subpage
> +	 */
> +	for (i = 0; i < l; i++) {
> +		int left_idx = base + i;
> +		int right_idx = base + 2 * l - 1 - i;
> +
> +		cond_resched();
> +		ret = process_subpage(addr + left_idx * MPAGE_SIZE,
> +				      left_idx * MPAGE_NRPAGES, arg);
> +		if (ret)
> +			return ret;
> +		cond_resched();
> +		ret = process_subpage(addr + right_idx * MPAGE_SIZE,
> +				      right_idx * MPAGE_NRPAGES, arg);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
>  static void clear_gigantic_page(struct folio *folio, unsigned long addr,
>  		unsigned int nr_pages)
>  {
> @@ -6784,8 +6846,10 @@ static void clear_gigantic_page(struct folio *folio, unsigned long addr,
>  static int clear_subpage(unsigned long addr, int idx, void *arg)
>  {
>  	struct folio *folio = arg;
> +	int i;
>  
> -	clear_user_highpage(folio_page(folio, idx), addr);
> +	for (i = 0; i < MPAGE_NRPAGES; i++)
> +		clear_user_highpage(folio_page(folio, idx + i), addr + i * PAGE_SIZE);
>  	return 0;
>  }
>  
> @@ -6798,10 +6862,10 @@ void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>  {
>  	unsigned int nr_pages = folio_nr_pages(folio);
>  
> -	if (unlikely(nr_pages > MAX_ORDER_NR_PAGES))
> +	if (unlikely(nr_pages != HPAGE_PMD_NR))
>  		clear_gigantic_page(folio, addr_hint, nr_pages);
>  	else
> -		process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
> +		clear_huge_page(addr_hint, nr_pages, clear_subpage, folio);
>  }
>  
>  static int copy_user_gigantic_page(struct folio *dst, struct folio *src,

[1] fix patch:

diff --git a/mm/memory.c b/mm/memory.c
index b22d4b83295b..aee99ede0c4f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6816,7 +6816,7 @@ static inline int clear_huge_page(
 		base = 0;
 		l = n;
 		/* Process subpages at the end of huge page */
-		for (i = nr_mpages - 1; i >= 2 * n; i--) {
+		for (i = 2 * n; i < nr_mpages; i++) {
 			cond_resched();
 			ret = process_subpage(addr + i * MPAGE_SIZE,
 					      i * MPAGE_NRPAGES, arg);
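For anyone who wants to sanity-check this direction sensitivity outside
the kernel, a micro-benchmark along these lines may help (a sketch
only; the buffer size is arbitrary, timings depend heavily on the CPU
and memory configuration, and an optimizing compiler may merge or
reorder the memset calls):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_SZ 4096UL
#define NPAGES	(128UL * 1024)		/* 512M in total */

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	char *buf = aligned_alloc(PAGE_SZ, NPAGES * PAGE_SZ);
	double t0, t1, t2;

	if (!buf)
		return 1;
	memset(buf, 1, NPAGES * PAGE_SZ);	/* fault all pages in first */

	t0 = now_sec();
	for (size_t i = 0; i < NPAGES; i++)	/* clear from start to end */
		memset(buf + i * PAGE_SZ, 0, PAGE_SZ);
	t1 = now_sec();
	for (size_t i = NPAGES; i-- > 0; )	/* clear from end to start */
		memset(buf + i * PAGE_SZ, 0, PAGE_SZ);
	t2 = now_sec();

	printf("forward:  %.3f s\nbackward: %.3f s\n", t1 - t0, t2 - t1);
	printf("buf[0] = %d\n", buf[0]);	/* keep the stores observable */
	free(buf);
	return 0;
}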