From: Alexei Starovoitov
Date: Tue, 14 Jan 2025 17:23:20 -0800
Subject: Re: [PATCH bpf-next v4 1/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
To: Michal Hocko
Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Vlastimil Babka, Sebastian Sewior, Steven Rostedt, Hou Tao, Johannes Weiner, Shakeel Butt, Matthew Wilcox, Thomas Gleixner, Jann Horn, Tejun Heo, linux-mm, Kernel Team
References: <20250114021922.92609-1-alexei.starovoitov@gmail.com> <20250114021922.92609-2-alexei.starovoitov@gmail.com>

On Tue, Jan 14, 2025 at 2:31 AM Michal Hocko wrote:
>
> On Mon 13-01-25 18:19:17, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > Tracing BPF programs execute from tracepoints and kprobes where
> > running context is unknown, but they need to request additional
> > memory. The prior workarounds were using pre-allocated memory and
> > BPF specific freelists to satisfy such allocation requests.
> > Instead, introduce gfpflags_allow_spinning() condition that signals
> > to the allocator that running context is unknown.
> > Then rely on percpu free list of pages to allocate a page.
> > The rmqueue_pcplist() should be able to pop the page from.
> > If it fails (due to IRQ re-entrancy or list being empty) then
> > try_alloc_pages() attempts to spin_trylock zone->lock
> > and refill percpu freelist as normal.
> > BPF program may execute with IRQs disabled and zone->lock is
> > sleeping in RT, so trylock is the only option. In theory we can
> > introduce percpu reentrance counter and increment it every time
> > spin_lock_irqsave(&zone->lock, flags) is used, but we cannot rely
> > on it. Even if this cpu is not in page_alloc path the
> > spin_lock_irqsave() is not safe, since BPF prog might be called
> > from tracepoint where preemption is disabled. So trylock only.
> >
> > Note, free_page and memcg are not taught about gfpflags_allow_spinning()
> > condition. The support comes in the next patches.
> >
> > This is a first step towards supporting BPF requirements in SLUB
> > and getting rid of bpf_mem_alloc.
> > That goal was discussed at LSFMM: https://lwn.net/Articles/974138/
> >
> > Signed-off-by: Alexei Starovoitov
>
> LGTM, I am not entirely clear on kmsan_alloc_page part though.

Which part is still confusing?
I hoped the comment below would be enough:
  * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
  * is safe in any context. Also zeroing the page is mandatory for
  * BPF use cases.

and once you zoom into:
void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
{
        bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
...
        if (initialized) {
                __memset(page_address(shadow), 0, PAGE_SIZE * pages);
                __memset(page_address(origin), 0, PAGE_SIZE * pages);
                return;
        }

So it's safe to call it, and it's necessary to call it when KMSAN is on.
This was the easiest code path to analyze from a doesn't-take-spinlocks
point of view.
I feel the comment is enough.
If/when people want to support the !__GFP_ZERO case with KMSAN,
they would need to make stack_depot_save() behave under the
!gfpflags_allow_spinning() condition.
Since __GFP_ZERO is necessary for the BPF use case,
I left all the extra work for future follow-ups.

> As long as that part is correct you can add
> Acked-by: Michal Hocko
>
> Other than that try_alloc_pages_noprof begs some user documentation.
>
> /**
>  * try_alloc_pages_noprof - opportunistic reentrant allocation from any context
>  * @nid - node to allocate from
>  * @order - allocation order size
>  *
>  * Allocates pages of a given order from the given node. This is safe to
>  * call from any context (from atomic, NMI but also reentrant
>  * allocator -> tracepoint -> try_alloc_pages_noprof).
>  * Allocation is best effort and to be expected to fail easily so nobody should
>  * rely on the success. Failures are not reported via warn_alloc().
>  *
>  * Return: allocated page or NULL on failure.
>  */

Nicely worded. Will add.

Thanks for all the reviews. Appreciate it!
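
The trylock-only fallback described in the quoted commit message can be
modeled in user space roughly as follows. This is a minimal sketch under
stated assumptions, not the kernel code: try_alloc_page_model(),
zone_trylock(), zone_unlock() and the PAGE_SZ, ZONE_PAGES and PCP_BATCH
constants are all hypothetical names, and a C11 atomic_flag stands in for
zone->lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* All names and sizes below are hypothetical, chosen only for this model. */
#define PAGE_SZ    4096
#define ZONE_PAGES 4
#define PCP_BATCH  2

/* Stand-ins for zone->lock, the zone free pool and the per-CPU free list. */
static atomic_flag zone_lock = ATOMIC_FLAG_INIT;
static unsigned char zone_mem[ZONE_PAGES][PAGE_SZ];
static int zone_free = ZONE_PAGES;
static void *pcp_list[PCP_BATCH];
static int pcp_count;

/* Returns true only if the lock was free; never spins and never sleeps. */
static bool zone_trylock(void)
{
    return !atomic_flag_test_and_set(&zone_lock);
}

static void zone_unlock(void)
{
    atomic_flag_clear(&zone_lock);
}

/*
 * Opportunistic allocation: pop from the per-CPU list if possible,
 * otherwise refill it under a trylock, otherwise fail with NULL.
 */
static void *try_alloc_page_model(void)
{
    void *page;

    if (pcp_count == 0) {
        if (!zone_trylock())
            return NULL;          /* lock busy: give up instead of waiting */
        while (pcp_count < PCP_BATCH && zone_free > 0)
            pcp_list[pcp_count++] = zone_mem[--zone_free];
        zone_unlock();
    }
    if (pcp_count == 0)
        return NULL;              /* nothing left in the zone pool */

    page = pcp_list[--pcp_count];
    memset(page, 0, PAGE_SZ);     /* mirrors the mandatory __GFP_ZERO */
    return page;
}
```

In this model, calling try_alloc_page_model() while the flag is held and the
per-CPU list is empty returns NULL right away, which matches the best-effort
behavior the proposed kernel-doc comment describes.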