From: Pasha Tatashin
Date: Wed, 6 Mar 2024 16:46:05 -0500
Subject: Re: [PATCH v4 bpf-next 2/2] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
To: Alexei Starovoitov
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Linus Torvalds, Barret Rhoden, Johannes Weiner, Lorenzo Stoakes, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Mike Rapoport, Boris Ostrovsky, sstabellini@kernel.org, Juergen Gross, linux-mm, xen-devel@lists.xenproject.org, Kernel Team
References: <20240305030516.41519-1-alexei.starovoitov@gmail.com> <20240305030516.41519-3-alexei.starovoitov@gmail.com>

> > This interface and in general VM_SPARSE would be useful for
> > dynamically grown kernel stacks [1]. However, the might_sleep() here
> > would be a problem. We would need to be able to handle
> > vm_area_map_pages() from interrupt disabled context, therefore no
> > sleeping. The caller would need to guarantee that the page tables are
> > pre-allocated before the mapping.
>
> Sounds like we'd need to differentiate two kinds of sparse regions.
> One that is really sparse where page tables are not populated (bpf use case)
> and another where only the pte level might be empty.
> Only the latter one will be usable for such auto-grow stacks.
>
> Months back I played with this idea:
> https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?&id=ce63949a879f2f26c1c1834303e6dfbfb79d1fbd
> that
> "Make vmap_pages_range() allocate page tables down to the last (PTE) level."
> Essentially pass NULL instead of 'pages' into vmap_pages_range()
> and it will populate all levels except the last.

Yes, this is what is needed; however, it can be a little simpler with
kernel stacks: given that the first page in the vm_area is mapped when
the stack is first allocated, and that the VA range is aligned to 16K,
we are actually guaranteed to have all page table levels down to the PTE
pre-allocated during that initial mapping. Therefore, we do not need to
worry about allocating them later during page faults.

> Then the page fault handler can service a fault in auto-growing stack
> area if it has a page stashed in some per-cpu free list.
> I suspect this is something you might need for
> "16k stack that is populated on fault",
> plus a free list of 3 pages per-cpu,
> and set_pte_at() in pf handler.

Yes, what you described is exactly what I am working on: using 3 pages
per-cpu to handle kstack page faults. The only thing that is missing is
that I would like to have the ability to call a non-sleeping version of
vm_area_map_pages().

Pasha
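
For illustration only, a minimal sketch of one way to pre-allocate all
page-table levels down to PTEs for a kernel VA range while sleeping is
still allowed, using the existing apply_to_page_range() helper with a
no-op callback. This is not the approach from the linked commit (which
teaches vmap_pages_range() itself to do it), and the names
prealloc_stack_ptes() / prealloc_pte_cb() are hypothetical:

#include <linux/mm.h>

/* No-op per-PTE callback: we only want the walk to allocate the
 * intermediate page-table levels, not to install any mappings. */
static int prealloc_pte_cb(pte_t *pte, unsigned long addr, void *data)
{
	return 0;
}

/* Call from sleepable context, e.g. right after the stack's vm_area is
 * created, so that later faults find the PTE level already present. */
static int prealloc_stack_ptes(unsigned long addr, unsigned long size)
{
	return apply_to_page_range(&init_mm, addr, size,
				   prealloc_pte_cb, NULL);
}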
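
And a similarly hedged sketch of the fault-handling side discussed above:
a per-CPU stash of three pages for a 16K stack, and an atomic helper that
installs one stashed page with set_pte_at() once all levels down to the
PTE are known to exist. The names (kstack_page_stash,
kstack_map_page_atomic) are illustrative and not from the actual series:

#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/pgtable.h>

/* Three spare pages per CPU, enough to populate the rest of a 16K stack. */
struct kstack_page_stash {
	struct page	*pages[3];
	unsigned int	nr;
};
static DEFINE_PER_CPU(struct kstack_page_stash, kstack_stash);

/* Map one page at addr without sleeping: every page-table level must
 * already be allocated (guaranteed here by the initial stack mapping). */
static int kstack_map_page_atomic(unsigned long addr, struct page *page)
{
	pgd_t *pgd = pgd_offset_k(addr);
	p4d_t *p4d = p4d_offset(pgd, addr);
	pud_t *pud = pud_offset(p4d, addr);
	pmd_t *pmd = pmd_offset(pud, addr);

	/* Nothing below is allowed to allocate memory. */
	if (WARN_ON_ONCE(pgd_none(*pgd) || p4d_none(*p4d) ||
			 pud_none(*pud) || pmd_none(*pmd)))
		return -EFAULT;

	set_pte_at(&init_mm, addr, pte_offset_kernel(pmd, addr),
		   mk_pte(page, PAGE_KERNEL));
	return 0;
}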