References: <20240220192613.8840-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov
Date: Fri, 23 Feb 2024 16:00:11 -0800
Subject: Re: [PATCH bpf-next] mm: Introduce vm_area_[un]map_pages().
To: Christoph Hellwig
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Linus Torvalds, Barret Rhoden, Johannes Weiner, Lorenzo Stoakes, Andrew Morton, Uladzislau Rezki, linux-mm, Kernel Team

On Thu, Feb 22, 2024 at 3:25 PM Alexei Starovoitov wrote:
> >
> > I can give it a shot.
> >
> > The ugly part is bpf_map_get_memcg() would need to be passed in somehow.
> >
> > Another bpf specific bit is the guard pages before and after 4G range
> > and such vm_area_alloc_pages() would need to skip them.
>
> I've looked at this approach more.
> The somewhat generic-ish api for mm/vmalloc.c may look like:
> struct vm_sparse_struct *area;
>
> area = get_sparse_vm_area(vm_area_size, guard_size,
>                           pgoff_offset, max_pages, memcg, ...);
>
> vm_area_size is what get_vm_area() will reserve out of the kernel
> vmalloc region. For the bpf_arena case it will be 4GB+64k.
> guard_size is the size of the guard area. 64k for bpf_arena.
> pgoff_offset is the offset where pages would need to start allocating
> after the guard area.
> For any normal vma the pgoff==0 is the first page after vma->vm_start.
> bpf_arena is a bpf/user shared sparse region and it needs to keep the lower
> 32 bits of the address that user space received from mmap().
> So that the first allocated page with pgoff=0 will be the first
> page for _user_ vma->vm_start.
> Hence for kernel vmalloc range the page allocator needs that
> pgoff_offset.
> max_pages is easy. It's the max number of pages that
> this sparse_vm_area is allowed to allocate.
> It's also driven by user space. When user does
> mmap(NULL, bpf_arena_size, ..., bpf_arena_map_fd)
> it gets an address and that address determines pgoff_offset
> and arena_size determines the max_pages.
> That arena_size can be 1 page or 1000 pages. Always less than 4GB.
> But vm_area_size will be 4GB+64k regardless.
>
> vm_area_alloc_pages(struct vm_sparse_struct *area, ulong addr,
>                     int page_cnt, int numa_id);
> is semantically similar to user's mmap().
> If addr == 0 the kernel will find a free range after pgoff_offset
> and will allocate page_cnt pages from there and vmap to
> kernel's vm_sparse_struct area.
> If addr is specified it would have to be >= pgoff_offset
> and page_cnt <= max_pages.
> All pages are accounted into memcg specified at vm_sparse_struct
> creation time.
> And it will use a maple tree to track all these range allocations
> within vm_sparse_struct.
>
> So far it looks like the bigger half of kernel/bpf/arena.c
> will migrate to mm/vmalloc.c and will be very bpf specific.
>
> So I don't particularly like this direction. Feels like a burden
> for mm and bpf folks.
>
> btw LWN just posted a nice article describing the motivation
> https://lwn.net/Articles/961941/
>
> So far doing:
>
> +#define VM_BPF 0x00000800 /* bpf_arena pages */
> or VM_SPARSE ?
>
> and enforcing that flag where appropriate in mm/vmalloc.c
> is the easiest for everyone.
> We probably should add
> #define VM_XEN 0x00001000
> and use it in xen use cases to differentiate
> vmalloc vs vmap vs ioremap vs bpf vs xen users.

Here is what I had in mind:
https://lore.kernel.org/bpf/20240223235728.13981-1-alexei.starovoitov@gmail.com/
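For illustration only, here is a rough userspace model of the sparse-area semantics quoted above. It is not kernel code and not the implementation in the linked patch. It rests on several assumptions:

- Pages are 4K.
- A non-zero addr is treated as a user address, in the spirit of "semantically similar to user's mmap()".
- A small bitmap stands in for the maple tree.
- memcg accounting and numa_id are skipped.
- The 0x00000800 flag value floated in the quote is reused without checking it against include/linux/vmalloc.h.

All names (sparse_area_model, sparse_area_init(), sparse_alloc_pages(), sparse_kaddr(), MODEL_*) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only, NOT kernel code. Page size, MODEL_MAX_PAGES and
 * the flag value are assumptions; the flag value is the one floated in the
 * email. A fixed bitmap stands in for the maple tree. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE (1ULL << MODEL_PAGE_SHIFT)
#define MODEL_MAX_PAGES 64
#define MODEL_VM_SPARSE 0x00000800UL

struct sparse_area_model {
	unsigned long flags;        /* allocation requires MODEL_VM_SPARSE */
	uint64_t pgoff_offset;      /* page index of user vm_start in the 4GB window */
	uint64_t max_pages;         /* arena_size / PAGE_SIZE */
	bool used[MODEL_MAX_PAGES]; /* stand-in for the maple tree */
};

/* Loosely mirrors get_sparse_vm_area(): pgoff_offset is derived from the
 * lower 32 bits of the address user space got back from mmap(). */
static int sparse_area_init(struct sparse_area_model *a, unsigned long flags,
			    uint64_t user_vm_start, uint64_t arena_size)
{
	uint64_t pages = arena_size / MODEL_PAGE_SIZE;

	if (pages == 0 || pages > MODEL_MAX_PAGES)
		return -EINVAL;
	a->flags = flags;
	a->pgoff_offset = (uint32_t)user_vm_start >> MODEL_PAGE_SHIFT;
	a->max_pages = pages;
	for (uint64_t i = 0; i < MODEL_MAX_PAGES; i++)
		a->used[i] = false;
	return 0;
}

/* Loosely mirrors vm_area_alloc_pages(): addr == 0 takes the first free
 * run of page_cnt pages; otherwise addr is a user address whose page must
 * be at or after pgoff_offset and fit within max_pages.
 * Returns the pgoff of the first page, or a negative errno. */
static int64_t sparse_alloc_pages(struct sparse_area_model *a, uint64_t addr,
				  uint64_t page_cnt)
{
	uint64_t start, i;

	if (!(a->flags & MODEL_VM_SPARSE))
		return -EINVAL; /* only sparse areas may be populated this way */
	if (page_cnt == 0 || page_cnt > a->max_pages)
		return -EINVAL;
	if (addr) {
		uint64_t page = (uint32_t)addr >> MODEL_PAGE_SHIFT;

		if (page < a->pgoff_offset)
			return -EINVAL;
		start = page - a->pgoff_offset;
		if (start + page_cnt > a->max_pages)
			return -EINVAL;
		for (i = start; i < start + page_cnt; i++)
			if (a->used[i])
				return -EBUSY;
	} else {
		/* first-fit scan for page_cnt consecutive free pages */
		for (start = 0; start + page_cnt <= a->max_pages; start++) {
			for (i = start; i < start + page_cnt && !a->used[i]; i++)
				;
			if (i == start + page_cnt)
				break;
		}
		if (start + page_cnt > a->max_pages)
			return -ENOMEM;
	}
	for (i = start; i < start + page_cnt; i++)
		a->used[i] = true;
	return (int64_t)start;
}

/* Kernel address of a page: start of the usable window (just past the
 * leading guard) plus the same lower 32 bits the user address has. */
static uint64_t sparse_kaddr(const struct sparse_area_model *a,
			     uint64_t kern_window_start, uint64_t pgoff)
{
	assert(pgoff < a->max_pages);
	return kern_window_start + ((a->pgoff_offset + pgoff) << MODEL_PAGE_SHIFT);
}
```

The model can be traced with a user mapping at 0x7f0010000000. The lower 32 bits give a pgoff_offset of 0x10000, so pgoff 0 lines up with the user vma->vm_start. A fixed request at vm_start + 8 pages lands on pgoff 8, while a request one page below vm_start is rejected. Putting the flag check inside the allocator, rather than trusting callers, is one possible reading of "enforcing that flag where appropriate".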