From: Pasha Tatashin
Date: Mon, 18 Mar 2024 11:13:30 -0400
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
To: David Laight
Cc: "H. Peter Anvin", Matthew Wilcox, Kent Overstreet,
 "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
 "akpm@linux-foundation.org", "x86@kernel.org", "bp@alien8.de",
 "brauner@kernel.org", "bristot@redhat.com", "bsegall@google.com",
 "dave.hansen@linux.intel.com", "dianders@chromium.org",
 "dietmar.eggemann@arm.com", "eric.devolder@oracle.com",
 "hca@linux.ibm.com", "hch@infradead.org",
 "jacob.jun.pan@linux.intel.com", "jgg@ziepe.ca", "jpoimboe@kernel.org",
 "jroedel@suse.de", "juri.lelli@redhat.com", "kinseyho@google.com",
 "kirill.shutemov@linux.intel.com", "lstoakes@gmail.com",
 "luto@kernel.org", "mgorman@suse.de", "mic@digikod.net",
 "michael.christie@oracle.com", "mingo@redhat.com", "mjguzik@gmail.com",
 "mst@redhat.com", "npiggin@gmail.com", "peterz@infradead.org",
 "pmladek@suse.com", "rick.p.edgecombe@intel.com", "rostedt@goodmis.org",
 "surenb@google.com", "tglx@linutronix.de", "urezki@gmail.com",
 "vincent.guittot@linaro.org", "vschneid@redhat.com"
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
 <2cb8f02d-f21e-45d2-afe2-d1c6225240f3@zytor.com>
 <2qp4uegb4kqkryihqyo6v3fzoc2nysuhltc535kxnh6ozpo5ni@isilzw7nth42>
 <39F17EC4-7844-4111-BF7D-FFC97B05D9FA@zytor.com>

On Mon, Mar 18, 2024 at 11:09 AM Pasha Tatashin wrote:
>
> On Sun, Mar 17, 2024 at 2:58 PM David Laight wrote:
> >
> > From: Pasha Tatashin
> > > Sent: 16 March 2024 19:18
> > ...
> > > Expanding on Matthew's idea of an interface for dynamic kernel stack
> > > sizes, here's what I'm thinking:
> > >
> > > - Kernel Threads: Create all kernel threads with a fully populated
> > > THREAD_SIZE stack. (i.e. 16K)
> > > - User Threads: Create all user threads with THREAD_SIZE kernel stack
> > > but only the top page mapped. (i.e. 4K)
> > > - In enter_from_user_mode(): Expand the thread stack to 16K by mapping
> > > three additional pages from the per-CPU stack cache. This function is
> > > called early in kernel entry points.
> > > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > > the per-CPU cache. This function is called late in the kernel exit
> > > path.
> >
> > Isn't that entirely horrid for TLB use and so will require a lot of IPI?
>
> The TLB load is going to be exactly the same as today, we already use
> small pages for VMA mapped stacks. We won't need to have extra
> flushing either, the mappings are in the kernel space, and once pages
> are removed from the page table, no one is going to access that VA
> space until that thread enters the kernel again. We will need to
> invalidate the VA range only when the pages are mapped, and only on
> the local cpu.

The TLB miss rate will increase, but only very slightly: the stacks are
small (4 pages, of which only 3 are dynamic), so there are at most 2-3
new misses per syscall, and only for the complicated, deep syscalls.
I therefore suspect it won't affect real-world performance.

> > Remember, if a thread sleeps in 'extra stack' and is then rescheduled
> > on a different cpu the extra pages get 'pumped' from one cpu to
> > another.
>
> Yes, the per-cpu cache can get unbalanced this way, we can remember
> the original CPU where we acquired the pages to return to the same
> place.
>
> > I also suspect a stack_probe() is likely to end up being a cache miss
> > and also slow???
>
> Can you please elaborate on this point. I am not aware of
> stack_probe() and how it is used.
>
> > So you wouldn't want one on all calls.
> > I'm not sure you'd want a conditional branch either.
> >
> > The explicit request for 'more stack' can be required to be allowed
> > to sleep - removing a lot of issues.
> > It would also be portable to all architectures.
> > I'd also suspect that any thread that needs extra stack is likely
> > to need to again.
> > So while the memory could be recovered, I'd bet it isn't worth
> > doing except under memory pressure.
> > The call could also return 'no' - perhaps useful for (broken) code
> > that insists on being recursive.
>
> The current approach discussed is somewhat different from explicit
> more stack requests API. I am investigating how feasible it is to use
> kernel stack multiplexing, so the same pages can be re-used by many
> threads when they are actually used. If the multiplexing approach
> won't work, I will come back to the explicit more stack API.
>
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
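
For illustration only, the page accounting in the quoted proposal can be
modelled in a few lines of userspace Python. This is a toy sketch under
stated assumptions, not kernel code and not taken from the patch series:
the names PerCpuStackCache, ToyThread, enter_kernel() and exit_kernel()
are hypothetical, the cache only counts pages, and no mapping or TLB
behaviour is modelled. It shows the "remember the original CPU" idea:
pages taken on one CPU go back to that CPU's cache even if the thread
was rescheduled elsewhere while in the kernel.

```python
from dataclasses import dataclass
from typing import List, Optional

STACK_PAGES = 4     # THREAD_SIZE of 16K as four 4K pages (figures from the thread)
DYNAMIC_PAGES = 3   # pages added on kernel entry and removed on exit


class PerCpuStackCache:
    """Hypothetical per-CPU cache; it only counts free pages."""

    def __init__(self, nr_cpus: int, pages_per_cpu: int) -> None:
        self.free: List[int] = [pages_per_cpu] * nr_cpus

    def take(self, cpu: int, count: int) -> bool:
        """Remove `count` pages from `cpu`'s cache; False if it runs short."""
        if self.free[cpu] < count:
            return False
        self.free[cpu] -= count
        return True

    def give(self, cpu: int, count: int) -> None:
        """Return `count` pages to `cpu`'s cache."""
        self.free[cpu] += count


@dataclass
class ToyThread:
    mapped_pages: int = 1           # only the top page is mapped in user mode
    home_cpu: Optional[int] = None  # CPU whose cache supplied the extra pages


def enter_kernel(cache: PerCpuStackCache, thread: ToyThread, cpu: int) -> bool:
    """Model of the entry step: take three pages from the local CPU's cache."""
    if not cache.take(cpu, DYNAMIC_PAGES):
        return False
    thread.mapped_pages = STACK_PAGES
    thread.home_cpu = cpu           # remember where the pages came from
    return True


def exit_kernel(cache: PerCpuStackCache, thread: ToyThread) -> None:
    """Model of the exit step: give the pages back to the CPU they came from."""
    if thread.home_cpu is None:
        return
    cache.give(thread.home_cpu, DYNAMIC_PAGES)
    thread.mapped_pages = 1
    thread.home_cpu = None


if __name__ == "__main__":
    cache = PerCpuStackCache(nr_cpus=2, pages_per_cpu=6)
    t = ToyThread()
    enter_kernel(cache, t, cpu=0)   # syscall entry on CPU 0
    # The thread may sleep here and resume on CPU 1; the model does not care,
    # because exit_kernel() uses the remembered home_cpu.
    exit_kernel(cache, t)
    print(cache.free)               # both caches are back to 6 pages
```

Without the home_cpu field, the exit step would have to return pages to
whichever CPU the thread happens to be on, which is the imbalance
raised in the quoted discussion.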