From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58C1FC2BA1E for ; Mon, 6 Apr 2020 12:57:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 013722223F for ; Mon, 6 Apr 2020 12:57:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MlzHeSeJ" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 013722223F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 867E28E0011; Mon, 6 Apr 2020 08:57:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 83EE28E000D; Mon, 6 Apr 2020 08:57:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 72D6B8E0011; Mon, 6 Apr 2020 08:57:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 5D6FE8E000D for ; Mon, 6 Apr 2020 08:57:05 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 20914180AD804 for ; Mon, 6 Apr 2020 12:57:05 +0000 (UTC) X-FDA: 76677430410.15.crate43_780093b74b157 X-HE-Tag: crate43_780093b74b157 X-Filterd-Recvd-Size: 12799 Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Mon, 6 Apr 2020 12:57:04 +0000 (UTC) Received: by mail-lj1-f196.google.com with SMTP id f20so14590311ljm.0 for ; Mon, 06 Apr 2020 05:57:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:date:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=cavs1/zGwwnVa6t26NTgNXxld8dwURYLnlC3kKmNPVE=; b=MlzHeSeJiXddSl5dFJOUI5UdB3m7v87YwkYfneeEzZjv/WNsmfVwt22mKXD8CVaBAo sDBSLjQAhBNPoToh3HSqWvefi7lotRVnZG+aIWHbfCL+B8DBBdDPPBNBs5ziRvLN/bGu FA0isNdqRWYdUpAIiZYApNIf/5vkq7J1FUrrvXmz9pEKmUteXVbwkcjxP5uDn8Gpc2uR bYomTJphL6BKM9dadVqKQ+B4bENIuXUxpfczulp+y53xhCLZzL5/WVRR7j7x78D+7TBh URSgPhx9zB3utPE1Z8G41ixtFkXBuXgFjlEK1q5WSmTAU7M0o485lDjxH9MZVi5NN6EI Rtjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:date:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=cavs1/zGwwnVa6t26NTgNXxld8dwURYLnlC3kKmNPVE=; b=gAwWIGGC6NXjznguFV78JNNzHQYO780Y/AaF4B6ajIip8ZceUn6b3M0TdYcB/Aw0Dq yjymFO9h8Pfh3i+dvrt8Wfmow5pAL33Xqt79IYmZ4LGf19nj2c0WA+GIDqIPzKuPh3fe SOobDaZxqQTIQHjHE9qaA3XBfOpahJbr0nodIzl6at1SQNdyMK0Q0qGhLKojQThf2u/u 8b9eiOrIp5VuSvSMBzBRwk8SMflBYjvkNS40x9nXmXVjr6fNsG7ViHcfV38RqnC4whlf 6AhTXnX63A0ySd2i/PA7StMr3vESdFt0DJ5+0iT+XBRKMF82vFzChrqNn92c5y3zEJ7O PUjw== X-Gm-Message-State: AGi0Pub4uIlJIjDW1bav6v6TBsnpxSsDJDhLgSlZ8xasB4dB8V/MtlGa j4hbiPEXGRQ8vXySEttnEUg= X-Google-Smtp-Source: APiQypLb/wS/+PldeRTi1ZCDnDkWev0Kd4BHF9Rey/aX0YbMPntJYEGMQ0I2w6nPQ238lpBBY+py6g== X-Received: by 2002:a2e:854e:: with SMTP id u14mr11600256ljj.182.1586177822854; Mon, 06 Apr 2020 05:57:02 -0700 (PDT) Received: from pc636 (h5ef52e31.seluork.dyn.perspektivbredband.net. [94.245.46.49]) by smtp.gmail.com with ESMTPSA id y23sm9832096ljh.42.2020.04.06.05.56.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Apr 2020 05:57:01 -0700 (PDT) From: Uladzislau Rezki X-Google-Original-From: Uladzislau Rezki Date: Mon, 6 Apr 2020 14:56:40 +0200 To: Joel Fernandes , "Paul E . McKenney" Cc: Uladzislau Rezki , LKML , "Paul E . McKenney" , RCU , linux-mm@kvack.org, Andrew Morton , Steven Rostedt , Oleksiy Avramchenko Subject: Re: [PATCH 1/1] rcu/tree: add emergency pool for headless case Message-ID: <20200406125640.GA23256@pc636> References: <20200403173051.4081-1-urezki@gmail.com> <20200404195129.GA83565@google.com> <20200405172105.GA7539@pc636> <20200405233028.GC83565@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200405233028.GC83565@google.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hello, Joel. > > > > > > Hi Vlad, > > > > > > One concern I have is this moves the problem a bit further down. My belief is > > > we should avoid the likelihood of even needing an rcu_head allocated for the > > > headless case, to begin with - than trying to do damage-control when it does > > > happen. The only way we would end up needing an rcu_head is if we could not > > > allocate an array. > > > > > Let me share my view on all such caching. I think that now it becomes less as > > the issue, because of we have now https://lkml.org/lkml/2020/4/2/383 patch. > > I see that it does help a lot. I tried to simulate low memory condition and > > apply high memory pressure with that. I did not manage to trigger the > > "synchronize rcu" path at all. It is because of using much more permissive > > parameters when we request a memory from the SLAB(direct reclaim, etc...). > > That's a good sign that we don't hit this path in your tests. > Just one request, of course if you have a time :) Could you please double check on your test environment to stress the system to check if you also can not hit it? How i test it. Please apply below patch: t a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 5e26145e9ead..25f7ac8583e1 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3203,6 +3203,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) if (head) { ptr = (void *) head - (unsigned long) func; + head = NULL; } else { /* * Please note there is a limitation for the head-less @@ -3233,16 +3234,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) * Under high memory pressure GFP_NOWAIT can fail, * in that case the emergency path is maintained. */ - success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr); - if (!success) { + /* success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr); */ + /* if (!success) { */ /* Is headless object? */ if (head == NULL) { /* Drop the lock. */ krc_this_cpu_unlock(krcp, flags); head = attach_rcu_head_to_object(ptr); - if (head == NULL) + if (head == NULL) { + success = false; goto inline_return; + } /* Take it back. */ krcp = krc_this_cpu_lock(&flags); @@ -3267,7 +3270,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) */ expedited_drain = true; success = true; - } + /* } */ WRITE_ONCE(krcp->count, krcp->count + 1); @@ -3297,7 +3300,9 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) if (!rcu_kfree_nowarn) WARN_ON_ONCE(1); debug_rcu_head_unqueue(ptr); - synchronize_rcu(); + /* synchronize_rcu(); */ + printk(KERN_ERR "-> hit synchronize_rcu() path.\n"); + trace_printk("-> hit synchronize_rcu() path.\n"); kvfree(ptr); } } lower the memory size and run kfree rcu tests. It would be appreciated. > > I guess also, with your latest patch on releasing the lock to be in a > non-atomic context, and then doing the allocation, it became even more > permissive? If you drop that patch and tried, do you still not hit the > synchronous path more often? > Yep. If i drop the patch, i can hit it. > > Could you also try introducing memory pressure by reducing your system's > total memory and see how it behaves? > I simulated low memory condition by setting the system memory to 145MB. That was the minimum amount the KVM system was capable of booting. After that i used kfree rcu tests to simulate memory pressure. > > > So instead of adding a pool for rcu_head allocations, how do you feel about > > > pre-allocation of the per-cpu cache array instead, which has the same effect > > > as you are intending? > > > > > In the v2 i have a list of such objects. It is also per-CPU(it is scaled to CPUs), > > but the difference is, those objects require much less memory, it is 8 + sizeof(struct > > rcu_head) bytes comparing to one page. Therefore the memory footprint is lower. > > Yes, true. That is one drawback is it higher memory usage. But if you have at > least 1 kfree_rcu() request an each CPU, then pre-allocation does not > increase memory usage any more than it already has right now. Unless, we > extend my proposal to cache more than 2 pages per-cpu which I think you > mentioned below. > If we cache two pages per-CPU, i think that is fine. When it comes to increasing it, it can be a bit wasting. For example consider 128 CPUs system. > > I have doubts that we would ever hit this emergency list, because of mentioned > > above patch, but from the other hand i can not say and guarantee 100%. Just in > > case, we may keep it. > > You really have to hit OOM in your tests to trigger it I suppose. Basically > the emergency pool improves situation under OOM, but otherwise does not > improve it due to the direct-reclaim that happens as you mentioned. Right? > See above how i simulated it. Direct reclaim is our GFP_KERNEL + other flags helper. If even after reclaim process there is no memory, then emergency list is supposed to be used. But we can drop this patch, i mean "emergency list" if we agree on it. The good point would be if you could stress your system by the i did. See above description :) > > Paul, could you please share your view and opinion? It would be appreciated :) > > > > > This has 3 benefits: > > > 1. It scales with number of CPUs, no configuration needed. > > > 2. It makes the first kfree_rcu() faster and less dependent on an allocation > > > succeeding. > > > 3. Much simpler code, no new structures or special handling. > > > 4. In the future we can extend it to allocate more than 2 pages per CPU using > > > the same caching mechanism. > > > > > > The obvious drawback being its 2 pages per CPU but at least it scales by > > > number of CPUs. Something like the following (just lightly tested): > > > > > > ---8<----------------------- > > > > > > From: "Joel Fernandes (Google)" > > > Subject: [PATCH] rcu/tree: Preallocate the per-cpu cache for kfree_rcu() > > > > > > In recent changes, we have made it possible to use kfree_rcu() without > > > embedding an rcu_head in the object being free'd. This requires dynamic > > > allocation. In case dynamic allocation fails due to memory pressure, we > > > would end up synchronously waiting for an RCU grace period thus hurting > > > kfree_rcu() latency. > > > > > > To make this less probable, let us pre-allocate the per-cpu cache so we > > > depend less on the dynamic allocation succeeding. This also has the > > > effect of making kfree_rcu() slightly faster at run time. > > > > > > Signed-off-by: Joel Fernandes (Google) > > > --- > > > kernel/rcu/tree.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 6172e6296dd7d..9fbdeb4048425 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -4251,6 +4251,11 @@ static void __init kfree_rcu_batch_init(void) > > > krcp->krw_arr[i].krcp = krcp; > > > } > > > > > > + krcp->bkvcache[0] = (struct kvfree_rcu_bulk_data *) > > > + __get_free_page(GFP_NOWAIT | __GFP_NOWARN); > > > + krcp->bkvcache[1] = (struct kvfree_rcu_bulk_data *) > > > + __get_free_page(GFP_NOWAIT | __GFP_NOWARN); > > > + > > > INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); > > > krcp->initialized = true; > > > } > > We pre-allocate it, but differently comparing with your proposal :) I do not see > > how it can improve things. The difference is you do it during initializing or booting > > phase. In case of current code it will pre-allocate and cache one page after first > > calling of the kvfree_call_rcu(), say in one second. So basically both variants are > > the same. > > Well, one proposal is only 5 lines extra ;-). That has got to be at least a > bit appealing ;-) ;-). > :) > > But i think that we should allow to be used two pages as cached ones, no matter > > whether it is vmalloc ptrs. or SLAB ones. So basically, two cached pages can be > > used by vmalloc path and SLAB path. And probably it makes sense because of two > > phases: one is when we collect pointers, second one is memory reclaim path. Thus > > one page per one phase, i.e. it would be paired. > > You are saying this with regard to my proposal right? I agree number of > pages could be increased. The caching mechanism already in-place could be > starting point for that extension. > We already have two pages. What we need is to allow to use them in both paths, vmalloc one and SLAB one, i mean reclaim path. At least that fits well to the collecting/reclaim phases. Thanks for comments! -- Vlad Rezki