From: Alexander Duyck
Date: Wed, 24 Jul 2024 08:03:23 -0700
Subject: Re: [RFC v11 09/14] mm: page_frag: use __alloc_pages() to replace alloc_pages_node()
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org
References: <20240719093338.55117-1-linyunsheng@huawei.com> <20240719093338.55117-10-linyunsheng@huawei.com>

On Wed, Jul 24, 2024 at 5:55 AM Yunsheng Lin wrote:
>
> On 2024/7/22 5:41, Alexander H Duyck wrote:
>
> ...
>
> >>          if (unlikely(!page)) {
> >> -                page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> >> +                page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
> >>                  if (unlikely(!page)) {
> >>                          memset(nc, 0, sizeof(*nc));
> >>                          return NULL;
> >
> > So if I am understanding correctly this is basically just stripping the
> > checks that were being performed since they aren't really needed to
> > verify the output of numa_mem_id.
> >
> > Rather than changing the code here, it might make more sense to update
> > alloc_pages_node_noprof to move the lines from
> > __alloc_pages_node_noprof into it. Then you could put the VM_BUG_ON and
> > warn_if_node_offline into an else statement which would cause them to
> > be automatically stripped for this and all other callers. The benefit
>
> I suppose you meant something like below:
>
> @@ -290,10 +290,14 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>  static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
>                                                     unsigned int order)
>  {
> -       if (nid == NUMA_NO_NODE)
> +       if (nid == NUMA_NO_NODE) {
>                 nid = numa_mem_id();
> +       } else {
> +               VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> +               warn_if_node_offline(nid, gfp_mask);
> +       }
>
> -       return __alloc_pages_node_noprof(nid, gfp_mask, order);
> +       return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
>  }

Yes, that is more or less what I was thinking.

> > would likely be much more significant and may be worthy of being
> > accepted on its own merit without being a part of this patch set as I
> > would imagine it would show slight gains in terms of performance and
> > binary size by dropping the unnecessary instructions.
>
> Below is the result, it does reduce the binary size for
> __page_frag_alloc_align() significantly as expected, but also
> increase the size for other functions, which seems to be passing
> a runtime nid, so the trick above doesn't work.
> I am not sure if the overall reduction is significant enough to justify
> the change? It seems that depends on how many future callers are passing
> runtime nid to alloc_pages_node() related APIs.
>
> [linyunsheng@localhost net-next]$ ./scripts/bloat-o-meter vmlinux.org vmlinux
> add/remove: 1/2 grow/shrink: 13/8 up/down: 160/-256 (-96)
> Function                                     old     new   delta
> bpf_map_alloc_pages                          708     764     +56
> its_probe_one                               2836    2860     +24
> iommu_dma_alloc                              984    1008     +24
> __iommu_dma_alloc_noncontiguous.constprop   1180    1192     +12
> e843419@0f3f_00011fb1_4348                     -       8      +8
> its_vpe_irq_domain_deactivate                312     316      +4
> its_vpe_irq_domain_alloc                    1492    1496      +4
> its_irq_domain_free                          440     444      +4
> iommu_dma_map_sg                            1328    1332      +4
> dpaa_eth_probe                              5524    5528      +4
> dpaa2_eth_xdp_xmit                           676     680      +4
> dpaa2_eth_open                               564     568      +4
> dma_direct_get_required_mask                 116     120      +4
> __dma_direct_alloc_pages.constprop           656     660      +4
> its_vpe_set_affinity                         928     924      -4
> its_send_single_command                      340     336      -4
> its_alloc_table_entry                        456     452      -4
> dpaa_bp_seed                                 232     228      -4
> arm_64_lpae_alloc_pgtable_s1                 680     676      -4
> __arm_lpae_alloc_pages                       900     896      -4
> e843419@0473_00005079_16ec                     8       -      -8
> e843419@0189_00001c33_1c8                      8       -      -8
> ringbuf_map_alloc                            612     600     -12
> __page_frag_alloc_align                      740     536    -204
> Total: Before=30306836, After=30306740, chg -0.00%

I'm assuming the compiler had uninlined __alloc_pages_node_noprof in the
previous version for the cases where this change now increases the code
size.

One alternative approach we could look at would be to just add the
following to the start of the function:

        if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
                return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);

That should yield the best result: for the constant case it skips over
the problematic code at compile time, and otherwise the check itself
should be fully stripped, so it shouldn't add any additional overhead.
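
For illustration only, here is a minimal userspace sketch of that
pattern. It is not kernel code: every name ending in _sketch is a
hypothetical stand-in (numa_mem_id_sketch() for numa_mem_id(),
alloc_on_node_sketch() for __alloc_pages_noprof(), and an assert() in
place of VM_BUG_ON()), and the "allocation" just returns the node it
would have used. It assumes GCC or Clang, since __builtin_constant_p()
is a compiler builtin.

```c
/* Userspace sketch; all *_sketch names are stand-ins, not kernel APIs. */
#include <assert.h>

#define NUMA_NO_NODE_SKETCH	(-1)
#define MAX_NUMNODES_SKETCH	4

/* Stand-in for numa_mem_id(): pretend the local memory node is 0. */
static int numa_mem_id_sketch(void)
{
	return 0;
}

/* Stand-in for __alloc_pages_noprof(): report the node that was chosen. */
static int alloc_on_node_sketch(int nid)
{
	return nid;
}

/* Stand-in for __alloc_pages_node_noprof(): validate, then allocate. */
static int checked_alloc_sketch(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES_SKETCH);
	return alloc_on_node_sketch(nid);
}

static inline int alloc_pages_node_sketch(int nid)
{
	/*
	 * Fast path: taken only when the compiler can prove at compile
	 * time that the caller passed the "no node" constant. The local
	 * node needs no validation, so the checks are skipped.
	 */
	if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE_SKETCH)
		return alloc_on_node_sketch(numa_mem_id_sketch());

	/* General path: resolve the node at run time and validate it. */
	if (nid == NUMA_NO_NODE_SKETCH)
		nid = numa_mem_id_sketch();

	return checked_alloc_sketch(nid);
}
```

Both paths return the same node, so callers see no behavioural
difference; only the generated code changes. Calling
alloc_pages_node_sketch(NUMA_NO_NODE_SKETCH) can fold down to the fast
path when optimization is enabled, while a call with a run-time value
keeps the validation. Without optimization the builtin typically
evaluates to 0 for a function argument, so the general path is used.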