From: Alexei Starovoitov
Date: Thu, 12 Dec 2024 11:57:03 -0800
Subject: Re: [PATCH bpf-next v2 2/6] mm, bpf: Introduce free_pages_nolock()
To: Sebastian Andrzej Siewior
Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Vlastimil Babka, Steven Rostedt, Hou Tao, Johannes Weiner, shakeel.butt@linux.dev, Michal Hocko, Matthew Wilcox, Thomas Gleixner, Tejun Heo, linux-mm, Kernel Team
In-Reply-To: <20241212144438.HDVlGUyA@linutronix.de>
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com> <20241210023936.46871-3-alexei.starovoitov@gmail.com> <20241210083503.zJdPI8s5@linutronix.de> <20241212144438.HDVlGUyA@linutronix.de>

On Thu, Dec 12, 2024 at 6:44 AM Sebastian Andrzej Siewior wrote:
>
> On 2024-12-10 14:49:14 [-0800], Alexei Starovoitov wrote:
> > On Tue, Dec 10, 2024 at 12:35 AM Sebastian Andrzej Siewior
> > wrote:
> > >
> > > On 2024-12-09 18:39:32 [-0800], Alexei Starovoitov wrote:
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index d511e68903c6..a969a62ec0c3 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1251,9 +1254,33 @@ static void free_one_page(struct zone *zone, struct page *page,
> > > >                           unsigned long pfn, unsigned int order,
> > > >                           fpi_t fpi_flags)
> > > >  {
> > > > +       struct llist_head *llhead;
> > > >         unsigned long flags;
> > > >
> > > > -       spin_lock_irqsave(&zone->lock, flags);
> > > > +       if (!spin_trylock_irqsave(&zone->lock, flags)) {
> > > > +               if (unlikely(fpi_flags & FPI_TRYLOCK)) {
> > > > +                       /* Remember the order */
> > > > +                       page->order = order;
> > > > +                       /* Add the page to the free list */
> > > > +                       llist_add(&page->pcp_llist, &zone->trylock_free_pages);
> > > > +                       return;
> > > > +               }
> > > > +               spin_lock_irqsave(&zone->lock, flags);
> > > > +       }
> > > > +
> > > > +       /* The lock succeeded. Process deferred pages. */
> > > > +       llhead = &zone->trylock_free_pages;
> > > > +       if (unlikely(!llist_empty(llhead))) {
> > > > +               struct llist_node *llnode;
> > > > +               struct page *p, *tmp;
> > > > +
> > > > +               llnode = llist_del_all(llhead);
> > >
> > > Do you really need to turn the list around?
> >
> > I didn't think LIFO vs FIFO would make a difference.
> > Why spend time rotating it?
>
> I'm sorry. I read llist_reverse_order() in there but it is not there. So
> it is all good.
>
> > > > +               llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
> > > > +                       unsigned int p_order = p->order;
> > > > +                       split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
> > > > +                       __count_vm_events(PGFREE, 1 << p_order);
> > > > +               }
> > >
> > > We had something like that (returning memory in IRQ/ irq-off) in RT tree
> > > and we got rid of it before posting the needed bits to mm.
> > >
> > > If we really intend to do something like this, could we please process
> > > this list in an explicitly locked section? I mean not in a try-lock
> > > fashion which might have originated in an IRQ-off region on PREEMPT_RT
> > > but in an explicit locked section which would remain preemptible. This
> > > would also avoid the locking problem down the road when
> > > shuffle_pick_tail() invokes get_random_u64() which in turn acquires a
> > > spinlock_t.
> >
> > I see. So the concern is though spin_lock_irqsave(&zone->lock)
> > is sleepable in RT, bpf prog might have been called in the context
> > where preemption is disabled and do split_large_buddy() for many
> > pages might take too much time?
> Yes.
>
> > How about kicking irq_work then? The callback is in kthread in RT.
> > We can irq_work_queue() right after llist_add().
> >
> > Or we can process only N pages at a time in this loop and
> > llist_add() leftover back into zone->trylock_free_pages.
>
> It could be simpler to not process the trylock_free_pages list in the
> trylock attempt, only in the lock case which is preemptible.

Makes sense. Will change to:

        /* The lock succeeded. Process deferred pages. */
        llhead = &zone->trylock_free_pages;
-       if (unlikely(!llist_empty(llhead))) {
+       if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
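
For anyone following along outside the kernel tree, the agreed behaviour can be modelled in plain user-space C11. This is only a rough sketch under stated assumptions, not the kernel code: struct pool, struct item, SKETCH_TRYLOCK, pool_release() and the helper names are all hypothetical stand-ins (for struct zone, struct page, FPI_TRYLOCK and free_one_page()), an atomic_flag plays the role of zone->lock, and a hand-rolled lock-free stack plays the role of the llist. The freed counter stands in for the actual buddy-allocator work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, standing in for FPI_TRYLOCK in the patch. */
#define SKETCH_TRYLOCK 0x1u

struct item {
	struct item *next;	/* link in the lock-free deferred list */
	unsigned int order;	/* remembered for the deferred release */
};

struct pool {
	atomic_flag lock;			/* stands in for zone->lock */
	_Atomic(struct item *) deferred;	/* stands in for trylock_free_pages */
	unsigned long freed;			/* units released, protected by lock */
};

#define POOL_INIT { ATOMIC_FLAG_INIT, NULL, 0 }

static bool pool_trylock(struct pool *p)
{
	return !atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire);
}

static void pool_lock(struct pool *p)
{
	while (!pool_trylock(p))
		;	/* busy-wait; a real lock would relax or sleep here */
}

static void pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Lock-free push, similar in spirit to llist_add(). */
static void deferred_push(struct pool *p, struct item *it)
{
	struct item *head = atomic_load(&p->deferred);

	do {
		it->next = head;
	} while (!atomic_compare_exchange_weak(&p->deferred, &head, it));
}

/* Detach the whole list at once, similar in spirit to llist_del_all(). */
static struct item *deferred_take_all(struct pool *p)
{
	return atomic_exchange(&p->deferred, (struct item *)NULL);
}

void pool_release(struct pool *p, struct item *it,
		  unsigned int order, unsigned int flags)
{
	assert(order < 8 * sizeof(p->freed));

	if (!pool_trylock(p)) {
		if (flags & SKETCH_TRYLOCK) {
			/* Must not wait: remember the order and defer. */
			it->order = order;
			deferred_push(p, it);
			return;
		}
		pool_lock(p);
	}

	/* Lock held. Drain deferred items only for callers that may block. */
	if (!(flags & SKETCH_TRYLOCK)) {
		struct item *d = deferred_take_all(p);

		while (d) {
			struct item *next = d->next;

			p->freed += 1ul << d->order;
			d = next;
		}
	}

	p->freed += 1ul << order;
	pool_unlock(p);
}
```

In this model, a pool_release() call made with SKETCH_TRYLOCK while the lock is held parks the item on the deferred list and returns immediately. A later SKETCH_TRYLOCK caller that does get the lock releases only its own item, and the parked items are drained by the next caller that comes in without the flag. The list is drained in LIFO order, which matches the point above that no reversal is needed.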