From: Vlastimil Babka
Date: Tue, 8 Aug 2023 17:08:50 +0200
Subject: Re: [RFC v1 4/5] maple_tree: avoid bulk alloc/free to use percpu array more
To: "Liam R. Howlett", Peng Zhang, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Joonsoo Kim, Pekka Enberg, David Rientjes, Christoph Lameter, Roman Gushchin, linux-mm@kvack.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Matthew Wilcox
References: <20230808095342.12637-7-vbabka@suse.cz> <20230808095342.12637-11-vbabka@suse.cz> <853af8fa-0cef-b00b-3fd6-9780a2008050@bytedance.com> <20230808142945.tulcze5bjg5ciftk@revolver>
In-Reply-To: <20230808142945.tulcze5bjg5ciftk@revolver>

On 8/8/23 16:29, Liam R. Howlett wrote:
> * Peng Zhang [230808 07:17]:
>>
>>
>> On 2023/8/8 17:53, Vlastimil Babka wrote:
>> > Using bulk alloc/free on a cache with percpu array should not be
>> > necessary and the bulk alloc actually bypasses the array (the prefill
>> > functionality currently relies on this).
>> >
>> > The simplest change is just to convert the respective maple tree
>> > wrappers to do a loop of normal alloc/free.
>> > ---
>> >  lib/maple_tree.c | 11 +++++++++--
>> >  1 file changed, 9 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> > index 1196d0a17f03..7a8e7c467d7c 100644
>> > --- a/lib/maple_tree.c
>> > +++ b/lib/maple_tree.c
>> > @@ -161,12 +161,19 @@ static inline struct maple_node *mt_alloc_one(gfp_t gfp)
>> >  static inline int mt_alloc_bulk(gfp_t gfp, size_t size, void **nodes)
>> >  {
>> > -	return kmem_cache_alloc_bulk(maple_node_cache, gfp, size, nodes);
>> > +	int allocated = 0;
>> > +	for (size_t i = 0; i < size; i++) {
>> > +		nodes[i] = kmem_cache_alloc(maple_node_cache, gfp);
>> > +		if (nodes[i])
>> If the i-th allocation fails, node[i] will be NULL. This is wrong. We'd
>> better guarantee that mt_alloc_bulk() allocates completely successfully,
>> or returns 0. The following cases are not allowed:
>> nodes: [addr1][addr2][NULL][addr3].

Thanks, indeed. I guess the loop should just break on the first failure and
return how many allocations succeeded so far.

But note this is really a quick RFC proof-of-concept hack. I'd expect that
if the whole idea is deemed good, the maple tree node handling could be
redesigned (simplified?) around it, and maybe there's no mt_alloc_bulk()
anymore as a result?

> Thanks for pointing this out Peng.
>
> We can handle a lower number than requested being returned, but we
> cannot handle the sparse data.
>
> The kmem_cache_alloc_bulk() can return a failure today - leaving the
> array to be cleaned by the caller, so if this is changed to a full
> success or full fail, then we will also have to change the caller to
> handle whatever state is returned if it differs from
> kmem_cache_alloc_bulk().
>
> It might be best to return the size already allocated when a failure is
> encountered. This will make the caller, mas_alloc_nodes(), request more
> nodes. Only in the case of zero allocations would this be seen as an
> OOM event.
>
> Vlastimil, Is the first kmem_cache_alloc() call failing a possibility?

Sure, if there's no memory, it can fail. In practice, if gfp is one that
allows reclaim, it will ultimately be the "too small to fail" allocation at
the page allocator level. But there are exceptions, like having received a
fatal signal, IIRC :)

> If so, what should be the corrective action?

That depends on your context: whether you can pass -ENOMEM on to the caller,
or need to succeed.

>> > +			allocated++;
>> > +	}
>> > +	return allocated;
>> >  }
>> >  static inline void mt_free_bulk(size_t size, void __rcu **nodes)
>> >  {
>> > -	kmem_cache_free_bulk(maple_node_cache, size, (void **)nodes);
>> > +	for (size_t i = 0; i < size; i++)
>> > +		kmem_cache_free(maple_node_cache, nodes[i]);
>> >  }
>> >  static void mt_free_rcu(struct rcu_head *head)
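
For illustration only, the break-on-failure behaviour discussed in this
thread could be sketched in plain userspace C roughly as below. This is not
the kernel patch: malloc() stands in for the slab allocator, and
fake_cache_alloc(), alloc_budget, alloc_bulk_sketch(), free_bulk_sketch()
and fill_nodes_sketch() are all hypothetical names made up for the example.
The point is only that the array stays dense (no NULL hole followed by valid
pointers) and that a caller can keep asking for the remainder, treating a
call that yields zero nodes as the out-of-memory case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob: how many allocations may still succeed. */
static size_t alloc_budget = 3;

/* Hypothetical stand-in for kmem_cache_alloc(); fails once the budget is spent. */
static void *fake_cache_alloc(void)
{
	if (alloc_budget == 0)
		return NULL;
	alloc_budget--;
	return malloc(64);
}

/*
 * Loop-based bulk allocation: stop at the first failure so that
 * nodes[0..ret-1] are all valid, and return how many succeeded.
 */
static int alloc_bulk_sketch(size_t size, void **nodes)
{
	size_t i;

	for (i = 0; i < size; i++) {
		nodes[i] = fake_cache_alloc();
		if (!nodes[i])
			break;
	}
	return (int)i;
}

/* Free counterpart: a plain loop over the dense prefix. */
static void free_bulk_sketch(size_t size, void **nodes)
{
	for (size_t i = 0; i < size; i++)
		free(nodes[i]);
}

/*
 * Hypothetical caller: keep requesting the remainder until the request is
 * satisfied or a call returns nothing (treated as OOM). Returns the number
 * of nodes obtained; the caller compares it with 'want'.
 */
static size_t fill_nodes_sketch(size_t want, void **nodes)
{
	size_t have = 0;

	assert(nodes != NULL);
	while (have < want) {
		int got = alloc_bulk_sketch(want - have, nodes + have);

		if (got == 0)
			break;
		have += (size_t)got;
	}
	return have;
}
```

Whether a short count should turn into -ENOMEM or a retry is left to the
caller here, which matches the "depends on your context" answer above.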