From: Alexei Starovoitov
Date: Tue, 10 Dec 2024 13:53:44 -0800
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation
To: Sebastian Andrzej Siewior
Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Vlastimil Babka, Steven Rostedt, Hou Tao, Johannes Weiner, shakeel.butt@linux.dev, Michal Hocko, Matthew Wilcox, Thomas Gleixner, Tejun Heo, linux-mm, Kernel Team
In-Reply-To: <20241210090136.DGfYLmeo@linutronix.de>
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com> <20241210023936.46871-2-alexei.starovoitov@gmail.com> <20241210090136.DGfYLmeo@linutronix.de>

On Tue, Dec 10, 2024 at 1:01 AM Sebastian Andrzej Siewior wrote:
>
> On 2024-12-09 18:39:31 [-0800], Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > Tracing BPF programs execute from tracepoints and kprobes where running
> > context is unknown, but they need to request additional memory.
> > The prior workarounds were using pre-allocated memory and BPF specific
> > freelists to satisfy such allocation requests. Instead, introduce
> > __GFP_TRYLOCK flag that makes page allocator accessible from any context.
> > It relies on percpu free list of pages that rmqueue_pcplist() should be
> > able to pop the page from. If it fails (due to IRQ re-entrancy or list
> > being empty) then try_alloc_pages() attempts to spin_trylock zone->lock
> > and refill percpu freelist as normal.
> > BPF program may execute with IRQs disabled and zone->lock is sleeping in RT,
> > so trylock is the only option.
>
> The __GFP_TRYLOCK flag looks reasonable given the challenges for BPF
> where it is not known how much memory will be needed and what the
> calling context is.

Exactly.

> I hope it does not spread across the kernel where
> people do ATOMIC in preempt/ IRQ-off on PREEMPT_RT and then once they
> learn that this does not work, add this flag to the mix to make it work
> without spending some time on reworking it.

We can call it __GFP_BPF to discourage any other usage,
but that seems like an odd "solution" to a code review problem.
If people start using __GFP_TRYLOCK just to shut up lockdep splats
they will soon realize that it's an _opportunistic_ allocator.
bpf doesn't need more than a page, and a single page will likely
be found in the percpu free page pool, so this opportunistic approach
will work most of the time for bpf, but is unlikely to work for others
that need order >= PAGE_ALLOC_COSTLY_ORDER (3).

> Side note: I am in the process of hopefully getting rid of the
> preempt_disable() from trace points. What remains then is attaching BPF
> programs to any code/ function with a raw_spinlock_t and I am not yet
> sure what to do here.

That won't help the bpf core.
There are tracepoints that are called after preemption is disabled.
The worst is trace_contention_begin(), and people have good reasons
to attach a bpf prog there to collect contention stats.
In such a case the bpf prog has no idea what kind of spin_lock is contended.
It might have disabled preemption and/or irqs before getting to that tracepoint.
So removal of preempt_disable from the tracepoint handling logic
doesn't help the bpf core. It's a good thing to do anyway.
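For illustration only, here is a minimal userspace model of the allocation path described in the quoted patch: serve a single page from a per-CPU cache, otherwise take the zone lock only if it is free right now, otherwise fail instead of waiting. Everything in it is hypothetical. model_try_alloc(), refill_pcp_locked(), pcp_busy and the batch sizes are illustrative names, not the kernel's rmqueue_pcplist() or try_alloc_pages(). A pthread mutex stands in for zone->lock, and for simplicity only order-0 pages are cached.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: none of these names are kernel APIs. */
#define PCP_CAPACITY 8   /* size of the modeled per-CPU page cache */
#define PCP_BATCH    4   /* pages moved from zone to cache per refill */

static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for zone->lock */
static int zone_free = 64;      /* free pages left in the modeled zone */
static int next_page_id;        /* next page id to hand out */

static int pcp[PCP_CAPACITY];   /* one CPU's cache of single pages */
static int pcp_count;
static bool pcp_busy;           /* true while someone is editing pcp[] */

/* Move up to PCP_BATCH pages into the cache; caller holds zone_lock. */
static void refill_pcp_locked(void)
{
    while (pcp_count < PCP_BATCH && zone_free > 0) {
        pcp[pcp_count++] = next_page_id++;
        zone_free--;
    }
    assert(pcp_count <= PCP_CAPACITY);
}

/*
 * Opportunistic allocation of 2^order pages. Never sleeps or spins:
 * returns false whenever the fast path is unusable and zone_lock is
 * already held. On success *page_id is the id of the first page.
 */
static bool model_try_alloc(unsigned int order, int *page_id)
{
    /* Fast path: single pages come from the per-CPU cache. */
    if (order == 0 && !pcp_busy && pcp_count > 0) {
        pcp_busy = true;
        *page_id = pcp[--pcp_count];
        pcp_busy = false;
        return true;
    }

    /* Slow path: take zone_lock only if nobody holds it. */
    if (pthread_mutex_trylock(&zone_lock) != 0)
        return false;

    bool ok = false;
    int need = 1 << order;
    if (zone_free >= need) {
        *page_id = next_page_id;
        next_page_id += need;
        zone_free -= need;
        ok = true;
        /* Top up the cache so later single-page requests stay fast. */
        if (order == 0 && !pcp_busy)
            refill_pcp_locked();
    }
    pthread_mutex_unlock(&zone_lock);
    return ok;
}
```

Holding zone_lock in the caller mimics a tracepoint firing inside a lock holder. A single page can still come from the cache, but an order-3 request, or an order-0 request that interrupted a cache update (pcp_busy set), returns false instead of deadlocking. That is the sense in which such an allocator is opportunistic, and why it suits one-page bpf requests better than costly-order ones.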