From: Sebastian Andrzej Siewior
To: Alexei Starovoitov
Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Vlastimil Babka, Steven Rostedt, Hou Tao, Johannes Weiner, shakeel.butt@linux.dev, Michal Hocko, Matthew Wilcox, Thomas Gleixner, Tejun Heo, linux-mm, Kernel Team
Subject: Re: [PATCH bpf-next v2 2/6] mm, bpf: Introduce free_pages_nolock()
Date: Thu, 12 Dec 2024 15:44:38 +0100
Message-ID: <20241212144438.HDVlGUyA@linutronix.de>
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com> <20241210023936.46871-3-alexei.starovoitov@gmail.com> <20241210083503.zJdPI8s5@linutronix.de>

On 2024-12-10 14:49:14 [-0800], Alexei Starovoitov wrote:
> On Tue, Dec 10, 2024 at 12:35 AM Sebastian Andrzej Siewior wrote:
> >
> > On 2024-12-09 18:39:32 [-0800], Alexei Starovoitov wrote:
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index d511e68903c6..a969a62ec0c3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1251,9 +1254,33 @@ static void free_one_page(struct zone *zone, struct page *page,
> > >  			  unsigned long pfn, unsigned int order,
> > >  			  fpi_t fpi_flags)
> > >  {
> > > +	struct llist_head *llhead;
> > >  	unsigned long flags;
> > >
> > > -	spin_lock_irqsave(&zone->lock, flags);
> > > +	if (!spin_trylock_irqsave(&zone->lock, flags)) {
> > > +		if (unlikely(fpi_flags & FPI_TRYLOCK)) {
> > > +			/* Remember the order */
> > > +			page->order = order;
> > > +			/* Add the page to the free list */
> > > +			llist_add(&page->pcp_llist, &zone->trylock_free_pages);
> > > +			return;
> > > +		}
> > > +		spin_lock_irqsave(&zone->lock, flags);
> > > +	}
> > > +
> > > +	/* The lock succeeded. Process deferred pages. */
> > > +	llhead = &zone->trylock_free_pages;
> > > +	if (unlikely(!llist_empty(llhead))) {
> > > +		struct llist_node *llnode;
> > > +		struct page *p, *tmp;
> > > +
> > > +		llnode = llist_del_all(llhead);
> >
> > Do you really need to turn the list around?
>
> I didn't think LIFO vs FIFO would make a difference.
> Why spend time rotating it?

I'm sorry. I read llist_reverse_order() in there but it is not there. So
it is all good.

> > > +		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
> > > +			unsigned int p_order = p->order;
> > > +			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
> > > +			__count_vm_events(PGFREE, 1 << p_order);
> > > +		}
> >
> > We had something like that (returning memory in IRQ/ irq-off) in RT tree
> > and we got rid of it before posting the needed bits to mm.
> >
> > If we really intend to do something like this, could we please process
> > this list in an explicitly locked section? I mean not in a try-lock
> > fashion which might have originated in an IRQ-off region on PREEMPT_RT
> > but in an explicit locked section which would remain preemptible. This
> > would also avoid the locking problem down the road when
> > shuffle_pick_tail() invokes get_random_u64() which in turn acquires a
> > spinlock_t.
>
> I see.
> So the concern is though spin_lock_irqsave(&zone->lock)
> is sleepable in RT, bpf prog might have been called in the context
> where preemption is disabled and do split_large_buddy() for many
> pages might take too much time?

Yes.

> How about kicking irq_work then? The callback is in kthread in RT.
> We can irq_work_queue() right after llist_add().
>
> Or we can process only N pages at a time in this loop and
> llist_add() leftover back into zone->trylock_free_pages.

It could be simpler to not process the trylock_free_pages list in the
trylock attempt, only in the lock case which is preemptible.

Sebastian
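
To illustrate the last suggestion, here is a rough userspace sketch, not
kernel code and not a proposed patch. Every demo_* name is made up for
this example. An atomic_flag stands in for zone->lock, a C11 atomic
pointer stands in for the llist, and demo_free_locked() stands in for
split_large_buddy() plus the PGFREE accounting. It assumes the trylock
caller is the one that must never touch the backlog, while the regular
caller, which takes the lock unconditionally, drains it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct demo_page {
	struct demo_page *next;	/* link in the deferred list */
	unsigned int order;	/* order remembered at defer time */
};

struct demo_zone {
	atomic_flag lock;			/* stands in for zone->lock */
	_Atomic(struct demo_page *) deferred;	/* stands in for trylock_free_pages */
	unsigned long freed;			/* base pages returned so far */
};

static void demo_zone_init(struct demo_zone *z)
{
	atomic_flag_clear(&z->lock);
	atomic_init(&z->deferred, (struct demo_page *)NULL);
	z->freed = 0;
}

static bool demo_trylock(struct demo_zone *z)
{
	return !atomic_flag_test_and_set_explicit(&z->lock, memory_order_acquire);
}

static void demo_lock(struct demo_zone *z)
{
	while (!demo_trylock(z))
		;	/* busy-wait here; the real lock sleeps on PREEMPT_RT */
}

static void demo_unlock(struct demo_zone *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/* Lock-free push, the same idea as llist_add(). */
static void demo_defer(struct demo_zone *z, struct demo_page *p, unsigned int order)
{
	struct demo_page *head = atomic_load(&z->deferred);

	p->order = order;
	do {
		p->next = head;
	} while (!atomic_compare_exchange_weak(&z->deferred, &head, p));
}

/* Caller holds the lock. Only counts pages instead of really freeing. */
static void demo_free_locked(struct demo_zone *z, unsigned int order)
{
	z->freed += 1UL << order;
}

/* Caller holds the lock. Detach everything at once, like llist_del_all(). */
static void demo_drain_locked(struct demo_zone *z)
{
	struct demo_page *p = atomic_exchange(&z->deferred, (struct demo_page *)NULL);

	while (p) {
		struct demo_page *next = p->next;

		demo_free_locked(z, p->order);
		p = next;
	}
}

static void demo_free_one_page(struct demo_zone *z, struct demo_page *p,
			       unsigned int order, bool trylock_only)
{
	if (trylock_only) {
		if (!demo_trylock(z)) {
			demo_defer(z, p, order);	/* contended: park the page */
			return;
		}
		demo_free_locked(z, order);	/* own page only, backlog untouched */
		demo_unlock(z);
		return;
	}

	demo_lock(z);			/* regular caller, preemptible on RT */
	demo_free_locked(z, order);
	demo_drain_locked(z);		/* backlog is processed only here */
	demo_unlock(z);
}
```

With this split the trylock path does a bounded amount of work, and the
cost of walking the deferred list lands on callers that were allowed to
block on the lock anyway. The downside is that deferred pages sit there
until the next regular free on that zone.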