From: Pasha Tatashin
Date: Wed, 6 Mar 2024 16:03:54 -0500
Subject: Re: [PATCH v4 bpf-next 2/2] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
To: Alexei Starovoitov
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 torvalds@linux-foundation.org, brho@google.com, hannes@cmpxchg.org,
 lstoakes@gmail.com, akpm@linux-foundation.org, urezki@gmail.com,
 hch@infradead.org, rppt@kernel.org, boris.ostrovsky@oracle.com,
 sstabellini@kernel.org, jgross@suse.com, linux-mm@kvack.org,
 xen-devel@lists.xenproject.org, kernel-team@fb.com
References: <20240305030516.41519-1-alexei.starovoitov@gmail.com>
 <20240305030516.41519-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20240305030516.41519-3-alexei.starovoitov@gmail.com>

On Mon, Mar 4, 2024 at 10:05 PM Alexei Starovoitov wrote:
>
> From: Alexei Starovoitov
>
> vmap/vmalloc APIs are used to map a set of pages into contiguous kernel
> virtual space.
>
> get_vm_area() with appropriate flag is used to request an area of kernel
> address range. It's used for vmalloc, vmap, ioremap, xen use cases.
> - vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag.
> - the areas created by vmap() function should be tagged with VM_MAP.
> - ioremap areas are tagged with VM_IOREMAP.
>
> BPF would like to extend the vmap API to implement a lazily-populated
> sparse, yet contiguous kernel virtual space. Introduce VM_SPARSE flag
> and vm_area_map_pages(area, start_addr, count, pages) API to map a set
> of pages within a given area.
> It has the same sanity checks as vmap() does.
> It also checks that get_vm_area() was created with VM_SPARSE flag
> which identifies such areas in /proc/vmallocinfo
> and returns zero pages on read through /proc/kcore.
>
> The next commits will introduce bpf_arena which is a sparsely populated
> shared memory region between bpf program and user space process. It will
> map privately-managed pages into a sparse vm area with the following steps:
>
>   // request virtual memory region during bpf prog verification
>   area = get_vm_area(area_size, VM_SPARSE);
>
>   // on demand
>   vm_area_map_pages(area, kaddr, kend, pages);
>   vm_area_unmap_pages(area, kaddr, kend);
>
>   // after bpf program is detached and unloaded
>   free_vm_area(area);
>
> Signed-off-by: Alexei Starovoitov
> ---
>  include/linux/vmalloc.h |  5 ++++
>  mm/vmalloc.c            | 59 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index c720be70c8dd..0f72c85a377b 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -35,6 +35,7 @@ struct iov_iter;        /* in uio.h */
>  #else
>  #define VM_DEFER_KMEMLEAK      0
>  #endif
> +#define VM_SPARSE              0x00001000      /* sparse vm_area. not all pages are present. */
>
>  /* bits [20..32] reserved for arch specific ioremap internals */
>
> @@ -232,6 +233,10 @@ static inline bool is_vm_area_hugepages(const void *addr)
>  }
>
>  #ifdef CONFIG_MMU
> +int vm_area_map_pages(struct vm_struct *area, unsigned long start,
> +                     unsigned long end, struct page **pages);
> +void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
> +                        unsigned long end);
>  void vunmap_range(unsigned long addr, unsigned long end);
>  static inline void set_vm_flush_reset_perms(void *addr)
>  {
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f42f98a127d5..e5b8c70950bc 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -648,6 +648,58 @@ static int vmap_pages_range(unsigned long addr, unsigned long end,
>         return err;
>  }
>
> +static int check_sparse_vm_area(struct vm_struct *area, unsigned long start,
> +                               unsigned long end)
> +{
> +       might_sleep();

This interface, and VM_SPARSE in general, would be useful for
dynamically grown kernel stacks [1]. However, the might_sleep() here
would be a problem: we would need to be able to call
vm_area_map_pages() from an interrupt-disabled context, so it must not
sleep. The caller would need to guarantee that the page tables are
pre-allocated before the mapping.

Pasha

[1] https://lore.kernel.org/all/CA+CK2bBYt9RAVqASB2eLyRQxYT5aiL0fGhUu3TumQCyJCNTWvw@mail.gmail.com
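
To make the pre-allocation idea concrete, here is a minimal userspace
sketch. It is a model, not kernel code and not part of the patch: every
model_* name, the struct layout, the error codes, and the 4096-byte
page size are assumptions made for illustration. It only shows the
split between a step that may allocate (and so, in kernel terms, may
sleep) and a map step that touches nothing but already-allocated
slots.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_VM_SPARSE 0x00001000UL /* same bit value as VM_SPARSE in the quoted hunk */

/* Hypothetical userspace stand-in for a sparse vm area; not struct vm_struct. */
struct model_area {
	unsigned long flags;
	unsigned long start;	/* first address covered by the area */
	unsigned long size;	/* length in bytes, a multiple of MODEL_PAGE_SIZE */
	void **slots;		/* stand-in for page tables; NULL until pre-allocated */
};

/* Sleepable step: allocate the "page tables" for the whole area up front. */
int model_area_prealloc(struct model_area *area)
{
	if (area->slots)
		return 0;
	area->slots = calloc(area->size / MODEL_PAGE_SIZE, sizeof(*area->slots));
	return area->slots ? 0 : -ENOMEM;
}

/* Shared argument checks; returns 0 or a negative errno. */
static int model_check_range(const struct model_area *area,
			     unsigned long start, unsigned long end)
{
	if (!(area->flags & MODEL_VM_SPARSE))
		return -EINVAL;
	if (start >= end || start % MODEL_PAGE_SIZE || end % MODEL_PAGE_SIZE)
		return -EINVAL;
	if (start < area->start || end - area->start > area->size)
		return -ERANGE;
	return 0;
}

/* Non-sleeping step: only fills pre-allocated slots, never allocates. */
int model_area_map_pages(struct model_area *area, unsigned long start,
			 unsigned long end, void **pages)
{
	size_t first, count, i;
	int err = model_check_range(area, start, end);

	if (err)
		return err;
	if (!area->slots)
		return -ENOMEM;	/* caller did not pre-allocate */

	first = (start - area->start) / MODEL_PAGE_SIZE;
	count = (end - start) / MODEL_PAGE_SIZE;
	for (i = 0; i < count; i++)
		if (area->slots[first + i])
			return -EBUSY;	/* refuse to overwrite a live mapping */
	for (i = 0; i < count; i++)
		area->slots[first + i] = pages[i];
	return 0;
}

/* Clears the slots for [start, end); bad ranges are ignored. */
void model_area_unmap_pages(struct model_area *area, unsigned long start,
			    unsigned long end)
{
	size_t first, count, i;

	if (model_check_range(area, start, end) || !area->slots)
		return;
	first = (start - area->start) / MODEL_PAGE_SIZE;
	count = (end - start) / MODEL_PAGE_SIZE;
	for (i = 0; i < count; i++)
		area->slots[first + i] = NULL;
}
```

In this model a caller would run model_area_prealloc() once from a
context that may block, and only then call model_area_map_pages() from
the restricted context. A map attempt without the pre-allocation fails
with -ENOMEM instead of allocating on the spot.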