From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0053AC4345F for ; Wed, 17 Apr 2024 15:11:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3BBDA6B0093; Wed, 17 Apr 2024 11:11:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 36BFA6B0095; Wed, 17 Apr 2024 11:11:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 20C736B0098; Wed, 17 Apr 2024 11:11:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id F38776B0093 for ; Wed, 17 Apr 2024 11:11:28 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 71E3AA1558 for ; Wed, 17 Apr 2024 15:11:28 +0000 (UTC) X-FDA: 82019362656.04.7B362AF Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) by imf20.hostedemail.com (Postfix) with ESMTP id 81E5B1C002B for ; Wed, 17 Apr 2024 15:11:26 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=R2iU5Pdv; spf=pass (imf20.hostedemail.com: domain of alexander.duyck@gmail.com designates 209.85.210.175 as permitted sender) smtp.mailfrom=alexander.duyck@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1713366686; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dpBZLOS2Fuv5Seble3BWO+qJ4xS8G4uoQZKUk1ZQWaE=; b=otnFhhcBau0Y1rFZP984DQyLiRPLwWIzl0jfvCUT0DnQE0bavSpO/B+lcSr1//K74hDg/v h4vNf/JlKh6MgXh8HWuPq2NVKAUh8Jt1mfltG1Bi+0Dv0obnjOrYRflca8Q5Wq6nYLBntD 8OaeaiQatoQQR/Q7TyJXXfnQDWMSDNs= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=R2iU5Pdv; spf=pass (imf20.hostedemail.com: domain of alexander.duyck@gmail.com designates 209.85.210.175 as permitted sender) smtp.mailfrom=alexander.duyck@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1713366686; a=rsa-sha256; cv=none; b=r2iVyIj5A13PJJFBHmba+UPBOzQUHxK22wB4Lz9h390uHCAnWHALgZc3Fa3ifXqUeys9kz euTVq0fRtsU+XMItxXlyICpxVt81g+/XkPUNKfpQb/Mj+vQkbFUZnWIDSrf0k12+gtSICq iPsODMdnf0jL0z40vbtFkYEae/BrAdw= Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-6ed0e9ccca1so5214751b3a.0 for ; Wed, 17 Apr 2024 08:11:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713366685; x=1713971485; darn=kvack.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=dpBZLOS2Fuv5Seble3BWO+qJ4xS8G4uoQZKUk1ZQWaE=; b=R2iU5PdvZKtwBpE9n153Yj+CUlDiRcyKVTHXF8mNfTr7JVHdL613kTf0opZaAneOSb W6XFoIT1zYTyJ0M/1vcRbum8NgljzMZAfe0TmB728mV5Vb+TCQef3lXXAbOigc8UCD4i WGHTDEW2pUPXR5Z3TXmnwOeE4SxpHe/MxTHsF3dOLJtpycg5moNY9TKTD6B66TynxwsE MM6R8Ru5pQ9LaHA6qcAxn998RsRASlGfyNPO7bdfRbipYIMlPTX96k1RGFCCwOFs3AHm MNUB2h0ExdM1P/+ZQ1WcELv6SGO7bbYafBiPnSvG3SAi+vpsxLXCHNDdqQEAIIXoC+uA NYew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713366685; x=1713971485; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=dpBZLOS2Fuv5Seble3BWO+qJ4xS8G4uoQZKUk1ZQWaE=; b=ikLd3qs2k4GER+tlik1o77ofRSAhrghriULFRTnuC/mG534DglhYZakYRMkv2w4TUo romT+I6qAllOOtSNSMeTRLsT5eeSIOUwNvX5Y2zM8wYX2Msz5WzXKHxSKmLuxXzi4BX1 luaL1P0q4S9LGnYMSIR44vFhG6Tv2gk1cJKc2OjAySAHE1k6su+p+/4SF1Giyzcn+NMD AjeHVp1wH1TpdGo4KyY0Qb20l6Wu99ymW7jDQjLECnLcYSt8RFtrYQf1DwqrhigINuyG 6YCPv4gV1VPcli1Qii9rZizGtlotmaFQLeZf5HFFYvfI9LcEYINrh/Si5nsFVUX1h+xE zmyA== X-Forwarded-Encrypted: i=1; AJvYcCXIheiE8HzOjH9ZqzCc6FVNYa6L63qQfns9137ZYRSvZ90n0UWWNR3vBoqs8rlpQRTckNET4ClJY1n8i249dVuSJLE= X-Gm-Message-State: AOJu0YxdlA8aAQQK5iW8GqMqRnC0SfB2sCRPUA6CfSCHNo4Mp8I0vSJ+ +Gp46enw2j9abY51BwxwKhOMBt9XSHG6tpGlG1MSdY/ycBK3Z+h5 X-Google-Smtp-Source: AGHT+IF60I58mUil+yAvrTASGIidkjvHH9ReBEji9jXZNYg8MVFL+W19+1BbY1JcwK3yslsobpUhyg== X-Received: by 2002:a17:90b:388c:b0:2a4:7133:7e02 with SMTP id mu12-20020a17090b388c00b002a471337e02mr14723720pjb.35.1713366685168; Wed, 17 Apr 2024 08:11:25 -0700 (PDT) Received: from ?IPv6:2605:59c8:43f:400:82ee:73ff:fe41:9a02? ([2605:59c8:43f:400:82ee:73ff:fe41:9a02]) by smtp.googlemail.com with ESMTPSA id b9-20020a17090a010900b002a67079c3absm1608837pjb.42.2024.04.17.08.11.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Apr 2024 08:11:24 -0700 (PDT) Message-ID: <17066b6a4f941eea3ef567767450b311096da22b.camel@gmail.com> Subject: Re: [PATCH net-next v2 09/15] mm: page_frag: reuse MSB of 'size' field for pfmemalloc From: Alexander H Duyck To: Yunsheng Lin , davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , linux-mm@kvack.org Date: Wed, 17 Apr 2024 08:11:23 -0700 In-Reply-To: <8b7361c2-6f45-72e8-5aca-92e8a41a7e5e@huawei.com> References: <20240415131941.51153-1-linyunsheng@huawei.com> <20240415131941.51153-10-linyunsheng@huawei.com> <37d012438d4850c3d7090e784e09088d02a2780c.camel@gmail.com> <8b7361c2-6f45-72e8-5aca-92e8a41a7e5e@huawei.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.48.4 (3.48.4-1.fc38) MIME-Version: 1.0 X-Stat-Signature: bch7czx4jzwiciu7cahfioxjzfrm5sqb X-Rspamd-Queue-Id: 81E5B1C002B X-Rspamd-Server: rspam10 X-Rspam-User: X-HE-Tag: 1713366686-781260 X-HE-Meta: U2FsdGVkX19fYIOrdjqKRd0DPRiQmxqf/1FVfuNqAba+nRw73k8WkoBLndVcApZA+HvMXj0kNqm768U/REpwMki4F+j52xiSOIkC2aVgernWArNh7nZOtcouJhBdzajmrAPaTsPJjXZssi9ENvdWGjDB6PSm7HOca8w5AQv1/BkPTYFMk2xzna0kXt0EhbmY5GRmdOY7y5HstBfQ3rE4qxCnlpcQTZcj4Q20Zz74cIykAaMBNxvE8Si6vtgbDCCFcMlr3ZDgEkiT4b6p0OxXUqYGLBJHKPFGUOCY944nWwv0Xlc1EHLqyqKCwBsyYypYcLFRkxuTR3W8H5xnDOeBsDp6ucXjh7/S4Ugsfz7rKSHvwHI7Vg6TprurAv56WraG09wcw+FGCnIxrgPPEzcVR7UqkhRVO/m5I2ysjCRoSo7wIAU6wtOngN/rQ0Do5/fO1y4BqCnS6b711+XwMHzfQvmPnLR4VhC7SRs5TN3RfLmXQ8nPJ+Uamynd6a+RZbV7XgX21LSetVfLK/WuXUBaJRlKX+RQbNMzj1igKDbQHjp/pQlHvJJlN1o12YNgzu7YNArceDW6ItBL+wj+QetXvv40eSOB0b93GCWZPnNmQIoWu1dX3QKWeEfe0lTdl45/C09FhiM+Hk6r/rISkOSDuD6vh5QTDIr9YGKjtn7fnTN6qpS9l4V8+BoACT3mivvKiDqXZHGqgiX2fuBt0C6hMnkQoUg0WsvjAIpMBVagHBLC7OWx3Jxmy0yegTPQSlM2O6MGgDUHpJeItN8KzSKjcqtog/GEN8J76ZmlbAeyD1aVHBo/ljiOsSB428CzDJvnbCRF4pZ01ApP2a/wkAabDJySiv0gtZhxj+e/R2Nf6i+0IXkCoj0ulIT4xIDsS4ZobaLLti8it808bND/lvkUrfL8JMpixjJA5Fwjifd+NBnfm5mKX7xSxlSoE5Gg+g4RBpFEXFfCZ7SalMp3l2m pdZvS8ZV c4glNX+3t4NF9KFpvXH1Y21nDYKjDTStLQrvUDLbKdYctz0vaaomao04Tcc2KsQfMAyu2VNYxJuLZq8MXM2Axtfy4QkJ9Ticawh+Kjx2Do5CRHaBG7CwgV9bGgfVxKjCn3LqDJ8Vbx5op4ei9TvBcvOLxJTmzyNB2PeYWeaf7wRw7jHBuUXm5FrhoNlIvH7N35vnVFeMfxZhEI3Wtlc6/OWLM7+Cv+EiVAlOaKC+xF6F9T+sPj31xu0goRfU0t4sVD/K8T/O/tFWBMMw1d9F1e6bYORfJDe6zroViMuKQQsq76D04Gd9aQrzi4vfp87Nxpd3jZwz9nGYR2Fqrg4STc+0XsQ9kQofvxl/Vx8gYfXGE49pLk0L58ahS7Blc6zhF+oZE0zscDU8x2vetcGL7VKx/CNY9sfOViwBl6H7IC3bncZAaMXb8g6koGDzIpzV4FiDV/5C+j8ByM9vi+Om6XMV+SXQq3wh0qjbq2egXHH3hXFTI40d/j2MRxOHIoTLFBRB+ZYZtfY3dLE4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, 2024-04-17 at 21:19 +0800, Yunsheng Lin wrote: > On 2024/4/17 0:22, Alexander H Duyck wrote: > > On Mon, 2024-04-15 at 21:19 +0800, Yunsheng Lin wrote: > > > The '(PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)' case is for the > > > system with page size less than 32KB, which is 0x8000 bytes > > > requiring 16 bits space, change 'size' to 'size_mask' to avoid > > > using the MSB, and change 'pfmemalloc' field to reuse the that > > > MSB, so that we remove the orginal space needed by 'pfmemalloc'. > > >=20 > > > For another case, the MSB of 'offset' is reused for 'pfmemalloc'. > > >=20 > > > Signed-off-by: Yunsheng Lin > > > --- > > > include/linux/page_frag_cache.h | 13 ++++++++----- > > > mm/page_frag_cache.c | 5 +++-- > > > 2 files changed, 11 insertions(+), 7 deletions(-) > > >=20 > > > diff --git a/include/linux/page_frag_cache.h b/include/linux/page_fra= g_cache.h > > > index fe5faa80b6c3..40a7d6da9ef0 100644 > > > --- a/include/linux/page_frag_cache.h > > > +++ b/include/linux/page_frag_cache.h > > > @@ -12,15 +12,16 @@ struct page_frag_cache { > > > void *va; > > > #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) > > > __u16 offset; > > > - __u16 size; > > > + __u16 size_mask:15; > > > + __u16 pfmemalloc:1; > > > #else > > > - __u32 offset; > > > + __u32 offset:31; > > > + __u32 pfmemalloc:1; > > > #endif > >=20 > > This seems like a really bad idea. Using a bit-field like this seems > > like a waste as it means that all the accesses now have to add > > additional operations to access either offset or size. It wasn't as if > > this is an oversized struct, or one that we are allocating a ton of. As > > such I am not sure why we need to optmize for size like this. >=20 > For the old 'struct page_frag' use case, there is one 'struct page_frag' > for every socket and task_struct, there may be tens of thousands of > them even in a 32 bit sysmem, which might mean a lof of memory even for > a single byte saving, see patch 13. >=20 Yeah, I finally had time to finish getting through the patch set last night. Sorry for quick firing reviews but lately I haven't had much time to work on upstream work, and as you mentioned last time I only got to 3 patches so I was trying to speed through. I get that you are trying to reduce the size but in the next patch you actually end up overshooting that on x86_64 systems. I am assuming that is to try to account for the 32b use case? On 64b I am pretty sure you don't get any gain since a long followed by two u16s and an int will still be 16B. What we probably need to watch out for is the optimization for size versus having to add instructions to extract and insert the data back into the struct. Anyway as far as this layout I am not sure it is the best way to go. You are combining pfmemalloc with either size *OR* offset, and then combining the pagecnt_bias with the va. I'm wondering if it wouldn't make more sense to look at putting together the structure something like: #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) typedef u16 page_frag_bias_t; #else typedef u32 page_frag_bias_t; #endif struct page_frag_cache { /* page address and offset */ void *va; page_frag_bias_t pagecnt_bias; u8 pfmemalloc; u8 page_frag_order; } The basic idea would be that we would be able to replace the size mask with just a shift value representing the page order of the page being fragmented. With that we can reduce the size to just a single byte. In addition we could probably leave it there regardless of build as the order should be initialized to 0 when this is allocated to it would be correct even in the case where it isn't used (and there isn't much we can do about the hole anyway). In addition by combining the virtual address with the offset we can just use the combined result for what we need. The only item that has to be worked out is how to deal with the end of a page in the count up case. However the combination seems the most logical one since they are meant to be combined ultimately anyway. It does put limits on when we can align things as we don't want to align ourselves into the next page, but I think it makes more sense then the other limits that had to be put on allocations and such in order to allow us to squeeze pagecnt_bias into the virtual address. Anyway I pulled in your patches and plan to do a bit of testing, after I figure out what the nvme disk ID regression is I am seeing. My main concern can be summed up as the NIC driver use case (netdev/napi_alloc_frag callers) versus the socket/vhost use case. The main thing in the case of the NIC driver callers is that we have a need for isolation and guarantees that we won't lose cache line alignment. I think those are the callers you are missing in your benchmarks, but arguably that might be something you cannot test as I don't know what NICs you have access to and if you have any that are using those calls.