References: <20240209040608.98927-1-alexei.starovoitov@gmail.com> <20240209040608.98927-5-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov
Date: Thu, 15 Feb 2024 12:50:55 -0800
Subject: Re: [PATCH v2 bpf-next 04/20] mm: Expose vmap_pages_range() to the rest of the kernel.
To: Christoph Hellwig
Cc: Linus Torvalds, bpf, Daniel Borkmann, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Eddy Z, Tejun Heo, Barret Rhoden, Johannes Weiner, Lorenzo Stoakes, Andrew Morton, Uladzislau Rezki, linux-mm, Kernel Team

On Wed, Feb 14, 2024 at 10:58 PM Christoph Hellwig wrote:
>
> On Wed, Feb 14, 2024 at 12:53:42PM -0800, Alexei Starovoitov wrote:
> > On Wed, Feb 14, 2024 at 12:36 AM Christoph Hellwig wrote:
> > >
> > > NAK. Please
> >
> > What is the alternative?
> > Remember, maintainers cannot tell developers "go away".
> > They must suggest a different path.
>
> That criteria is something you've made up.

I didn't invent it. I internalized it based on the feedback received.

> Telling that something
> is not ok is the most important job of not just maintainers but all
> developers.

I'm not saying that maintainers should not say "no",
I'm saying that maintainers should say "no",
understand the problem being solved, and suggest an alternative.

> Maybe start with a description of the problem you're
> solving and why you think it matters and needs different APIs.

bpf_arena doesn't need a different api. These 5 api-s below are enough.
I'm saying that vmap_pages_range() is equivalent to apply_to_page_range()
for all practical purposes.
So, since apply_to_page_range() is available to the kernel
(xen, gpu, kasan, etc) then I see no reason why
vmap_pages_range() shouldn't be available as well, since:

struct vmap_ctx {
        struct page **pages;
        int idx;
};

static int __for_each_pte(pte_t *ptep, unsigned long addr, void *data)
{
        struct vmap_ctx *ctx = data;
        struct page *page = ctx->pages[ctx->idx++];

        /* TODO: sanity checks here */
        set_pte_at(&init_mm, addr, ptep, mk_pte(page, PAGE_KERNEL));
        return 0;
}

static int vmap_pages_range_hack(unsigned long addr, unsigned long end,
                                 struct page **pages)
{
        struct vmap_ctx ctx = { .pages = pages };

        return apply_to_page_range(&init_mm, addr, end - addr,
                                   __for_each_pte, &ctx);
}

Anything I miss?

> > . get_vm_area - external
> > . free_vm_area - EXPORT_SYMBOL_GPL
> > . vunmap_range - external
> > . vmalloc_to_page - EXPORT_SYMBOL
> > . apply_to_page_range - EXPORT_SYMBOL_GPL
> >
> > and the last one is pretty much equivalent to vmap_pages_range,
> > hence I'm surprised by push back to make vmap_pages_range available to bpf.
>
> And the last we've been trying to get rid of for ages because we don't
> want random modules to

Get rid of EXPORT_SYMBOL from it? Fine by me.
Or you're saying that you have a plan to replace apply_to_page_range()
with something else? With what?

> > > > For example, there is the public ioremap_page_range(), which is used
> > > > to map device memory into addressable kernel space.
> > >
> > > It's not really public. It's a helper for the ioremap implementation
> > > which really should not be arch specific to start with and are in
> > > the process of being consolidated into common code.
> >
> > Any link to such consolidation of ioremap? I couldn't find one.
>
> Second hit on google:
>
> https://lore.kernel.org/lkml/20230609075528.9390-1-bhe@redhat.com/T/

Thanks.
It sounded like you were referring to some future work.
The series that landed was a good cleanup. No questions about it.

> > I surely don't want bpf_arena to cause headaches to mm folks.
> >
> > Anyway, ioremap_page_range() was just an example.
> > I could have used vmap() as an equivalent example.
> > vmap is EXPORT_SYMBOL, btw.
>
> vmap is a good well defined API. vmap_pages_range is not.

Since vmap() is nothing but get_vm_area() + vmap_pages_range()
and a few checks... I'm missing the point.
Pls elaborate.

> > What bpf_arena needs is pretty much vmap(), but instead of
> > allocating all pages in advance, allocate them and insert on demand.
>
> So propose an API that does that instead of exposing random low-level
> details.

The generic_ioremap_prot() and vmap() APIs make sense for the cases
when phys memory exists with known size. It needs to be vmap-ed and
not touched after.
The bpf_arena use case is similar to kasan, which
reserves a giant virtual memory region, and then
does apply_to_page_range() to populate certain pte-s with pages in that region,
and later apply_to_existing_page_range() to free pages in kasan's region.

bpf_arena is very similar, except it currently calls get_vm_area()
to get a 4Gb+guard_pages region, and then vmap_pages_range() to
populate a page in it, and vunmap_range() to remove a page.

These existing api-s work, so not sure what you're requesting.
I can guess many different things, but pls clarify to reduce
this back and forth.
Are you worried about range checking? That vmap_pages_range()
can accidentally hit an unintended range?

btw the cover letter and patch 5 explain the higher level motivation
from bpf pov in detail.
There was a bunch of feedback on that patch, which was addressed,
and the latest version is here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=arena&id=a752b4122071adb5307d7ab3ae6736a9a0e45317
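
For readers who want to experiment with the "reserve a big virtual range, then
populate and remove single pages on demand" pattern described above, here is a
rough userspace analogy. It is not kernel code and not how bpf_arena is
implemented: it assumes Linux mmap()/mprotect() as loose stand-ins for
get_vm_area(), vmap_pages_range() and vunmap_range(), and the struct arena and
arena_*() names are hypothetical, made up for this sketch. The range check in
arena_populate() illustrates the "unintended range" concern raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for a sparsely populated arena. */
struct arena {
	char *base;     /* start of the reserved virtual range */
	size_t npages;  /* number of pages reserved */
	size_t page;    /* page size in bytes */
};

/* Reserve address space only (loosely like get_vm_area()); nothing is accessible yet. */
static int arena_reserve(struct arena *a, size_t npages)
{
	long ps = sysconf(_SC_PAGESIZE);
	void *p;

	if (ps <= 0 || npages == 0 || npages > SIZE_MAX / (size_t)ps)
		return -1;
	p = mmap(NULL, npages * (size_t)ps, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	a->base = p;
	a->npages = npages;
	a->page = (size_t)ps;
	return 0;
}

/* Make one page usable on demand (loosely like vmap_pages_range() on one page). */
static char *arena_populate(struct arena *a, size_t idx)
{
	char *addr;

	if (idx >= a->npages)   /* range check: never touch memory outside the arena */
		return NULL;
	addr = a->base + idx * a->page;
	if (mprotect(addr, a->page, PROT_READ | PROT_WRITE) != 0)
		return NULL;
	return addr;
}

/* Drop one page but keep the reservation (loosely like vunmap_range() on one page). */
static int arena_depopulate(struct arena *a, size_t idx)
{
	void *p;

	if (idx >= a->npages)
		return -1;
	/* MAP_FIXED replaces the old page with a fresh inaccessible mapping. */
	p = mmap(a->base + idx * a->page, a->page, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}

/* Give the whole range back (loosely like free_vm_area()). */
static void arena_release(struct arena *a)
{
	assert(a->base != NULL);
	munmap(a->base, a->npages * a->page);
	a->base = NULL;
}
```

The point of the analogy is only the shape of the API: one call reserves the
range up front, and the per-page calls are confined to that range by
construction.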