From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 2 Apr 2026 17:00:26 +0800
From: Hao Li
To: "Harry Yoo (Oracle)"
Cc: hu.shengming@zte.com.cn, vbabka@kernel.org, akpm@linux-foundation.org,
	cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, zhang.run@zte.com.cn,
	xu.xin16@zte.com.cn, yang.tao172@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH v2] mm/slub: skip freelist construction for whole-slab bulk refill
Message-ID:
References: <20260402150310775aOAcX92pJLmjcUIRoWFER@zte.com.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
On Thu, Apr 02, 2026 at 05:12:23PM +0900, Harry Yoo (Oracle) wrote:
> On Thu, Apr 02, 2026 at 03:03:10PM +0800, hu.shengming@zte.com.cn wrote:
> > Harry wrote:
> > > On Wed, Apr 01, 2026 at 02:55:23PM +0800, Hao Li wrote:
> > > > On Wed, Apr 01, 2026 at 12:57:25PM +0800, hu.shengming@zte.com.cn wrote:
> > > > > @@ -4395,6 +4458,48 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > > > >  	return allocated;
> > > > >  }
> > > > > +static unsigned int alloc_whole_from_new_slab(struct kmem_cache *s,
> > > > > +		struct slab *slab, void **p, bool allow_spin)
> > > > > +{
> > > > > +
> > > > > +	unsigned int allocated = 0;
> > > > > +	void *object, *start;
> > > > > +
> > > > > +	if (alloc_whole_from_new_slab_random(s, slab, p, allow_spin,
> > > > > +					     &allocated)) {
> > > > > +		goto done;
> > > > > +	}
> > > > > +
> > > > > +	start = fixup_red_left(s, slab_address(slab));
> > > > > +	object = setup_object(s, start);
> > > > > +
> > > > > +	while (allocated < slab->objects - 1) {
> > > > > +		p[allocated] = object;
> > > > > +		maybe_wipe_obj_freeptr(s, object);
> > > > > +
> > > > > +		allocated++;
> > > > > +		object += s->size;
> > > > > +		object = setup_object(s, object);
> > > > > +	}
> > > >
> > > > Also, I feel the current patch contains some duplicated code like this loop.
> > > >
> > > > Would it make sense to split allocate_slab() into two functions?
> > > >
> > > > For example, the first part could be called allocate_slab_meta_setup()
> > > > (just an example name) and the second part could be
> > > > allocate_slab_objects_setup(), with the core logic being the loop over
> > > > objects. Then allocate_slab_objects_setup() could support two modes:
> > > > one called BUILD_FREELIST, which builds the freelist, and another
> > > > called EMIT_OBJECTS, which skips building the freelist and directly
> > > > places the objects into the target array.
> > >
> > > Something similar, but with a little more thought to unify the code
> > > (**regardless of CONFIG_SLAB_FREELIST_RANDOM**) and avoid treating
> > > "the whole slab->freelist fits into the sheaf" as a special case:
> > >
> > > - allocate_slab() no longer builds the freelist.
> > >   The freelist is built only when there are objects left after
> > >   allocating objects from the new slab.
> > >
> > > - new_slab() allocates a new slab AND builds the freelist
> > >   to keep existing behaviour.
> > >
> > > - refill_objects() allocates a slab using allocate_slab(),
> > >   and passes it to alloc_from_new_slab().
> > >
> > >   alloc_from_new_slab() consumes some objects in random order,
> > >   and then builds the freelist with the objects left (if any exist).
> > >
> > > We could actually abstract "iterating free objects in random order"
> > > into an API, and there would be two users of the API:
> > > - Building the freelist
> > > - Filling objects into the sheaf (without building the freelist!)
> > >
> > > Something like this...
> > > (names here are just examples, I'm not good at naming things!)
> > >
> > > struct freelist_iter {
> > > 	int pos;
> > > 	int freelist_count;
> > > 	int page_limit;
> > > 	void *start;
> > > };
> > >
> > > /* note: handling !allow_spin nicely is tricky :-) */
> > > alloc_from_new_slab(...)
> > > {
> > > 	struct freelist_iter fit;
> > >
> > > 	prep_freelist_iter(s, slab, &fit, allow_spin);
> > > 	while (slab->inuse < min(count, slab->objects)) {
> > > 		p[slab->inuse++] = next_freelist_entry(s, &fit);
> > > 	}
> > >
> > > 	if (slab->inuse < slab->objects)
> > > 		build_freelist(s, slab, &fit);
> > > }
> > >
> > > build_freelist(s, slab, fit) {
> > > 	size = slab->objects - slab->inuse;
> > >
> > > 	cur = next_freelist_entry(s, fit);
> > > 	cur = setup_object(s, cur);
> > > 	slab->freelist = cur;
> > > 	for (i = 1; i < size; i++) {
> > > 		next = next_freelist_entry(s, fit);
> > > 		next = setup_object(s, next);
> > > 		set_freepointer(s, cur, next);
> > > 		cur = next;
> > > 	}
> > > }
> > >
> > > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > > prep_freelist_iter(s, slab, fit, allow_spin) {
> > > 	fit->freelist_count = oo_objects(s->oo);
> > > 	fit->page_limit = slab->objects * s->size;
> > > 	fit->start = fixup_red_left(s, slab_address(slab));
> > >
> > > 	if (slab->objects < 2 || !s->random_seq) {
> > > 		fit->pos = 0;
> > > 	} else if (allow_spin) {
> > > 		fit->pos = get_random_u32_below(freelist_count);
> > > 	} else {
> > > 		struct rnd_state *state;
> > >
> > > 		/*
> > > 		 * An interrupt or NMI handler might interrupt and change
> > > 		 * the state in the middle, but that's safe.
> > > 		 */
> > > 		state = &get_cpu_var(slab_rnd_state);
> > > 		fit->pos = prandom_u32_state(state) % freelist_count;
> > > 		put_cpu_var(slab_rnd_state);
> > > 	}
> > >
> > > 	return;
> > > }
> > >
> > > next_freelist_entry(s, fit) {
> > > 	/*
> > > 	 * If the target page allocation failed, the number of objects on the
> > > 	 * page might be smaller than the usual size defined by the cache.
> > > 	 */
> > > 	do {
> > > 		idx = s->random_seq[fit->pos];
> > > 		fit->pos += 1;
> > > 		if (fit->pos >= freelist_count)
> > > 			fit->pos = 0;
> > > 	} while (unlikely(idx >= page_limit));
> > >
> > > 	return (char *)start + idx;
> > > }
> > > #else
> > > prep_freelist_iter(s, slab, fit, allow_spin) {
> > > 	fit->pos = 0;
> > > 	return;
> > > }
> > >
> > > next_freelist_entry(s, fit) {
> > > 	void *next = fit->start + fit->pos * s->size;
> > >
> > > 	fit->pos++;
> > > 	return next;
> > > }
> > > #endif
> >
> > Hi Harry,
> >
> > Thanks a lot for the detailed suggestion. This is a very good direction for
> > restructuring refill_objects().
> >
> > I agree that abstracting the free-object iteration and making the flow uniform
> > regardless of CONFIG_SLAB_FREELIST_RANDOM is a cleaner approach than keeping
> > the "whole slab fits into the sheaf" case as a special path.
> >
> > Your idea of letting alloc_from_new_slab() consume objects first and only
> > build the freelist for the remainder makes a lot of sense,
>

Hi Harry and Shengming,

I just finished working through the new code skeleton, and the idea looks
really promising. It unifies the handling of the shuffled and non-shuffled
cases, and it also naturally covers the case where a new slab is not fully
consumed. Very nice. Let's see how v3 performs.

> I believe Hao is working on trying to allow consuming all objects (from new
> and partial slabs) to fill sheaves when possible,

Yeah, I have some work going on in that area too, and I'm trying to keep it
from stepping on the changes in the current patch :)

> but it'd still be nice to
> do this as long as it keeps the implementation simple.

Yes, exactly.

> > and should also help reduce the duplicated object-setup logic.
>
> Yeah, less code means less tokens, so better for the environm... oh wait,
> April Fools' Day is over! (just joking).

Oh, I didn't even realize yesterday was April Fools' Day.
I guess my clock must be broken :P

> > I'll rework the patch along these lines, incorporating your and Hao's
> > suggestions, and send a v3.
>
> Thanks for working on this.
>
> > Thanks again for the thoughtful review.
>
> No problem!
>
> --
> Cheers,
> Harry / Hyeonggon

--
Thanks,
Hao