Date: Fri, 25 Feb 2022 09:50:14 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, Roman Gushchin, Andrew Morton, linux-kernel@vger.kernel.org, Joonsoo Kim, David Rientjes, Christoph Lameter, Pekka Enberg
Subject: Re: [PATCH 5/5] mm/slub: Refactor deactivate_slab()
References: <20220221105336.522086-1-42.hyeyoo@gmail.com> <20220221105336.522086-6-42.hyeyoo@gmail.com>

On Fri, Feb 25, 2022 at 09:34:09AM +0000, Hyeonggon Yoo wrote:
> On Thu, Feb 24, 2022 at 07:16:11PM +0100, Vlastimil Babka wrote:
> > On 2/21/22 11:53, Hyeonggon Yoo wrote:
> > > Simply deactivate_slab() by removing variable 'lock' and replacing
> > > 'l' and 'm' with 'mode'. Instead, remove slab from list and unlock
> > > n->list_lock when cmpxchg_double() fails, and then retry.
> > >
> > > One slight functional change is releasing and taking n->list_lock again
> > > when cmpxchg_double() fails. This is not harmful because SLUB avoids
> > > deactivating slabs as much as possible.
> > >
> > > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> >
> > Hm I wonder if we could simplify even a bit more. Do we have to actually
> > place the slab on a partial (full) list before the cmpxchg, only to remove
> > it when cmpxchg fails? Seems it's to avoid anyone else seeing the slab
> > un-frozen, but not on the list, which would be unexpected. However if anyone
> > sees such slab, they have to take the list_lock first to start working with
> > the slab... so this should be safe, because we hold the list_lock here, and
> > will place the slab on the list before we release it. But it thus shouldn't
> > matter if the placement happens before or after a successful cmpxchg, no? So
> > we can only do it once after a successful cmpxchg and need no undo's?
> >
>
> My thought was similar. But after testing I noticed that &n->list_lock prevents
> race between __slab_free() and deactivate_slab().
>
> > Specifically AFAIK the only possible race should be with a __slab_free()
> > which might observe !was_frozen after we succeed an unfreezing cmpxchg and
> > go through the
> > "} else { /* Needs to be taken off a list */"
> > branch but then it takes the list_lock as the first thing, so will be able
> > to proceed only after the slab is actually on the list.
> >
> > Do I miss anything or would you agree?
> >
>
> It's so tricky.
>
> I tried to simplify more as you said. Seeing frozen slab on list was not
> problem. But the problem was that something might interfere between
> cmpxchg_double() and taking spinlock.
>
> This is what I faced:
>
> CPU A                           CPU B
> deactivate_slab() {             __slab_free() {
>     /* slab is full */
>     slab.frozen = 0;
>     cmpxchg_double();
>                                     /* Hmm...
>                                        slab->frozen == 0 &&
>                                        slab->freelist != NULL?
>                                        Oh This must be on the list.. */

Oh, this is wrong. slab->freelist must be NULL because it's a full slab.
It's more complex than I thought...

>                                     spin_lock_irqsave();
>                                     cmpxchg_double();
>                                     /* Corruption: slab
>                                      * was not yet inserted to
>                                      * list but try removing */
>                                     remove_full();
>                                     spin_unlock_irqrestore();
>                                 }
>     spin_lock_irqsave();
>     add_full();
>     spin_unlock_irqrestore();
> }

So it was...

CPU A                           CPU B
deactivate_slab() {             __slab_free() {
    /* slab is full */
    slab.frozen = 0;
    cmpxchg_double();
                                    /* Hmm...
                                       !was_frozen &&
                                       prior == NULL?
                                       Let's freeze this! */
                                    put_cpu_partial();
                                }
    spin_lock_irqsave();
    add_full();
    /* It's now frozen by CPU B and at the same time on full list */
    spin_unlock_irqrestore();

And &n->list_lock prevents such a race.

>
> I think it's quite confusing because it's protecting code, not data.
>
> Would you have an idea to solve it, or should we add a comment for this?
>
> > > ---
> > >  mm/slub.c | 74 +++++++++++++++++++++++++------------------------------
> > >  1 file changed, 33 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index a4964deccb61..2d0663befb9e 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2350,8 +2350,8 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > >  {
> > >  	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
> > >  	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > > -	int lock = 0, free_delta = 0;
> > > -	enum slab_modes l = M_NONE, m = M_NONE;
> > > +	int free_delta = 0;
> > > +	enum slab_modes mode = M_NONE;
> > >  	void *nextfree, *freelist_iter, *freelist_tail;
> > >  	int tail = DEACTIVATE_TO_HEAD;
> > >  	unsigned long flags = 0;
> > > @@ -2420,57 +2420,49 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > >  	new.frozen = 0;
> > >
> > >  	if (!new.inuse && n->nr_partial >= s->min_partial)
> > > -		m = M_FREE;
> > > +		mode = M_FREE;
> > >  	else if (new.freelist) {
> > > -		m = M_PARTIAL;
> > > -		if (!lock) {
> > > -			lock = 1;
> > > -			/*
> > > -			 * Taking the spinlock removes the possibility that
> > > -			 * acquire_slab() will see a slab that is frozen
> > > -			 */
> > > -			spin_lock_irqsave(&n->list_lock, flags);
> > > -		}
> > > -	} else {
> > > -		m = M_FULL;
> > > -		if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> > > -			lock = 1;
> > > -			/*
> > > -			 * This also ensures that the scanning of full
> > > -			 * slabs from diagnostic functions will not see
> > > -			 * any frozen slabs.
> > > -			 */
> > > -			spin_lock_irqsave(&n->list_lock, flags);
> > > -		}
> > > +		mode = M_PARTIAL;
> > > +		/*
> > > +		 * Taking the spinlock removes the possibility that
> > > +		 * acquire_slab() will see a slab that is frozen
> > > +		 */
> > > +		spin_lock_irqsave(&n->list_lock, flags);
> > > +		add_partial(n, slab, tail);
> > > +	} else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> > > +		mode = M_FULL;
> > > +		/*
> > > +		 * This also ensures that the scanning of full
> > > +		 * slabs from diagnostic functions will not see
> > > +		 * any frozen slabs.
> > > +		 */
> > > +		spin_lock_irqsave(&n->list_lock, flags);
> > > +		add_full(s, n, slab);
> > >  	}
> > >
> > > -	if (l != m) {
> > > -		if (l == M_PARTIAL)
> > > -			remove_partial(n, slab);
> > > -		else if (l == M_FULL)
> > > -			remove_full(s, n, slab);
> > >
> > > -		if (m == M_PARTIAL)
> > > -			add_partial(n, slab, tail);
> > > -		else if (m == M_FULL)
> > > -			add_full(s, n, slab);
> > > -	}
> > > -
> > > -	l = m;
> > >  	if (!cmpxchg_double_slab(s, slab,
> > >  				old.freelist, old.counters,
> > >  				new.freelist, new.counters,
> > > -				"unfreezing slab"))
> > > +				"unfreezing slab")) {
> > > +		if (mode == M_PARTIAL) {
> > > +			remove_partial(n, slab);
> > > +			spin_unlock_irqrestore(&n->list_lock, flags);
> > > +		} else if (mode == M_FULL) {
> > > +			remove_full(s, n, slab);
> > > +			spin_unlock_irqrestore(&n->list_lock, flags);
> > > +		}
> > >  		goto redo;
> > > +	}
> > >
> > > -	if (lock)
> > > -		spin_unlock_irqrestore(&n->list_lock, flags);
> > >
> > > -	if (m == M_PARTIAL)
> > > +	if (mode == M_PARTIAL) {
> > > +		spin_unlock_irqrestore(&n->list_lock, flags);
> > >  		stat(s, tail);
> > > -	else if (m == M_FULL)
> > > +	} else if (mode == M_FULL) {
> > > +		spin_unlock_irqrestore(&n->list_lock, flags);
> > >  		stat(s, DEACTIVATE_FULL);
> > > -	else if (m == M_FREE) {
> > > +	} else if (mode == M_FREE) {
> > >  		stat(s, DEACTIVATE_EMPTY);
> > >  		discard_slab(s, slab);
> > >  		stat(s, FREE_SLAB);
> > >
> --
> Thank you, You are awesome!
> Hyeonggon :-)

--
Thank you, You are awesome!
Hyeonggon :-)
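
As a reading aid for the quoted hunk, the mode selection in the patched deactivate_slab() can be restated as a small user-space function. This is only a sketch under stated assumptions: pick_mode(), its parameters and pick_mode_demo() are hypothetical names made up for illustration, not kernel API, and the two booleans stand in for "new.freelist != NULL" and "kmem_cache_debug_flags(s, SLAB_STORE_USER)".

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerators as the enum in the quoted hunk. */
enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };

/*
 * Hypothetical helper: mirrors the if / else-if chain of the quoted patch.
 *   inuse        stands in for new.inuse
 *   has_freelist stands in for new.freelist != NULL
 *   nr_partial   stands in for n->nr_partial
 *   min_partial  stands in for s->min_partial
 *   store_user   stands in for kmem_cache_debug_flags(s, SLAB_STORE_USER)
 */
enum slab_modes pick_mode(unsigned int inuse, bool has_freelist,
			  unsigned long nr_partial, unsigned long min_partial,
			  bool store_user)
{
	if (!inuse && nr_partial >= min_partial)
		return M_FREE;		/* empty slab, node has enough partials */
	if (has_freelist)
		return M_PARTIAL;	/* patch: lock, then add_partial() */
	if (store_user)
		return M_FULL;		/* patch: lock, then add_full() */
	return M_NONE;			/* full slab, no list tracking */
}

/* Hypothetical self-check walking through the four outcomes. */
void pick_mode_demo(void)
{
	assert(pick_mode(0, true, 5, 5, false) == M_FREE);
	assert(pick_mode(0, true, 2, 5, false) == M_PARTIAL);
	assert(pick_mode(3, false, 0, 5, true) == M_FULL);
	assert(pick_mode(3, false, 0, 5, false) == M_NONE);
}
```

Reading only the hunk as quoted, a full slab in a cache without SLAB_STORE_USER keeps mode == M_NONE, so n->list_lock is never taken for it and none of the three stat() branches at the end runs.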