From: Alexei Starovoitov
Date: Tue, 14 Jan 2025 10:29:04 -0800
Subject: Re: [PATCH bpf-next v4 1/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
To: Peter Zijlstra
Cc: Michal Hocko, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Vlastimil Babka, Sebastian Sewior, Steven Rostedt, Hou Tao, Johannes Weiner, Shakeel Butt, Matthew Wilcox, Thomas Gleixner, Jann Horn, Tejun Heo, linux-mm, Kernel Team
In-Reply-To: <20250114103946.GC8362@noisy.programming.kicks-ass.net>
References: <20250114021922.92609-1-alexei.starovoitov@gmail.com>
 <20250114021922.92609-2-alexei.starovoitov@gmail.com>
 <20250114095355.GM5388@noisy.programming.kicks-ass.net>
 <20250114103946.GC8362@noisy.programming.kicks-ass.net>

On Tue, Jan 14, 2025 at 2:39 AM Peter Zijlstra wrote:
>
> On Tue, Jan 14, 2025 at 11:19:41AM +0100, Michal Hocko wrote:
> > On Tue 14-01-25 10:53:55, Peter Zijlstra wrote:
> > > On Mon, Jan 13, 2025 at 06:19:17PM -0800, Alexei Starovoitov wrote:
> > > > From: Alexei Starovoitov
> > > >
> > > > Tracing BPF programs execute from tracepoints and kprobes where
> > > > running context is unknown, but they need to request additional
> > > > memory.
> > >
> > > > The prior workarounds were using pre-allocated memory and
> > > > BPF specific freelists to satisfy such allocation requests.
> > > > Instead, introduce gfpflags_allow_spinning() condition that signals
> > > > to the allocator that running context is unknown.
> > > > Then rely on percpu free list of pages to allocate a page.
> > > > The rmqueue_pcplist() should be able to pop the page from.
> > > > If it fails (due to IRQ re-entrancy or list being empty) then
> > > > try_alloc_pages() attempts to spin_trylock zone->lock
> > > > and refill percpu freelist as normal.
> > >
> > > > BPF program may execute with IRQs disabled and zone->lock is
> > > > sleeping in RT, so trylock is the only option.
> > >
> > > how is spin_trylock() from IRQ context not utterly broken in RT?
> >
> > +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> > +		return NULL;
> >
> > Deals with that, right?
>
> Changelog didn't really mention that, did it? -- it seems to imply quite
> the opposite :/

Hmm. Until you said it I didn't read it as "imply the opposite" :(

The cover letter is pretty clear...
"
- Since spin_trylock() is not safe in RT from hard IRQ and NMI
  disable such usage in lock_trylock and in try_alloc_pages().
"

and the patch 2 commit log is clear too...
"
Since spin_trylock() cannot be used in RT from hard IRQ or NMI
it uses lockless link list...
"

and further in patch 3 commit log...
"
Use spin_trylock in PREEMPT_RT when not in hard IRQ and not in NMI
and fail instantly otherwise, since spin_trylock is not safe from IRQ
due to PI issues.
"

I guess I can reword this particular sentence in patch 1 commit log,
but before jumping to an incorrect conclusion please read the whole set.

> But maybe, I suppose any BPF program needs to expect failure due to this
> being trylock. I just worry some programs will malfunction due to never
> succeeding -- and RT getting blamed for this.
>
> Maybe I worry too much.

Humans will find a way to blame BPF and/or RT for all of their problems anyway.
Just days ago BPF was blamed in RT for causing IPIs during JIT.
Valentin's patches are going to address that but ain't no one has time
to explain that continuously.

Seriously, though, the number of things that still run in hard irq context
in RT is so small that if some tracing BPF prog is attached there
it should be using prealloc mode. Full prealloc is still the default
for bpf hash map.