Date: Thu, 20 Feb 2025 21:20:48 +0900
From: Byungchul Park
To: Hillf Danton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel_team@skhynix.com
Subject: Re: [RFC PATCH v12 00/26] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Message-ID: <20250220122048.GA8305@system.software.com>
References: <20250220052027.58847-1-byungchul@sk.com> <20250220103223.2360-1-hdanton@sina.com> <20250220114920.2383-1-hdanton@sina.com>
In-Reply-To: <20250220114920.2383-1-hdanton@sina.com>

On Thu, Feb 20, 2025 at 07:49:19PM +0800, Hillf Danton wrote:
> On Thu, 20 Feb 2025 20:09:35 +0900 Byungchul Park wrote:
> > On Thu, Feb 20, 2025 at 06:32:22PM +0800, Hillf Danton wrote:
> > > On Thu, 20 Feb 2025 14:20:01 +0900 Byungchul Park wrote:
> > > > To check luf's stability, I ran a heavy LLM inference workload consuming
> > > > 210GiB over 7 days on a machine with 140GiB memory, and decided it's
> > > > stable enough.
> > > >
> > > > I'm posting the latest version so that anyone can try luf mechanism if
> > > > wanted by any chance. However, I tagged RFC again because there are
> > > > still issues that should be resolved to merge to mainline:
> > > >
> > > > 1. Even though system wide total cpu time for TLB shootdown is
> > > >    reduced over 95%, page allocation paths should take additional cpu
> > > >    time shifted from page reclaim to perform TLB shootdown.
> > > >
> > > > 2. We need luf debug feature to detect when luf goes wrong by any
> > > >    chance. I implemented just a draft version that checks the sanity
> > > >    on mkwrite(), kmap(), and so on. I need to gather better ideas
> > > >    to improve the debug feature.
> > > >
> > > > ---
> > > >
> > > > Hi everyone,
> > > >
> > > > While I'm working with a tiered memory system e.g. CXL memory, I have
> > > > been facing migration overhead esp. tlb shootdown on promotion or
> > > > demotion between different tiers. Yeah.. most tlb shootdowns on
> > > > migration through hinting fault can be avoided thanks to Huang Ying's
> > > > work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
> > > > is inaccessible").
> > > >
> > > > However, it's only for migration through hinting fault. I thought it'd
> > > > be much better if we have a general mechanism to reduce all the tlb
> > > > numbers that we can apply to any unmap code, that we normally believe
> > > > tlb flush should be followed.
> > > >
> > > > I'm suggesting a new mechanism, LUF(Lazy Unmap Flush), that defers tlb
> > > > flush until folios that have been unmapped and freed, eventually get
> > > > allocated again. It's safe for folios that had been mapped read-only
> > > > and were unmapped, as long as the contents of the folios don't change
> > > > while staying in pcp or buddy so we can still read the data through the
> > > > stale tlb entries.
> > > >
> > > Given pcp or buddy, you are opening window for use after free which makes
> > > no sense in 99% cases.
> >
> > Just in case that I don't understand what you meant and for better
> > understanding, can you provide a simple and problematic example from
> > the u-a-f?
> >
> Tell us if it is illegal to commit rape without pregnancy in your home town?

Memory overcommit must also look like cheating to someone like you. You
would surely think it total nonsense that each task believes it can use
its own full virtual address space.

We say a UAF is illegal only when it can cause access to the freed area
without *appropriate permission*.

> PS defering flushing tlb [1,2] is no go.

I will check this shortly.

	Byungchul
>
> Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
> [1] https://lore.kernel.org/lkml/20250127155146.GB25757@willie-the-truck/
> [2] https://lore.kernel.org/lkml/xhsmhwmdwihte.mognet@vschneid-thinkpadt14sgen2i.remote.csb/
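
For readers following the argument, here is a rough toy model of the LUF
idea quoted above: skip the TLB flush when a read-only page is unmapped and
freed, and perform it only when that page is allocated again. It is a
sketch under loud assumptions, not the patch series' implementation. The
class ToyLUF and all of its method and field names are hypothetical, a
plain list stands in for pcp/buddy, and a dict stands in for per-CPU TLBs.

```python
class ToyLUF:
    """Toy model of a deferred TLB flush. Hypothetical, not kernel code."""

    def __init__(self):
        self.memory = {}            # physical page -> contents
        self.tlb = {}               # (cpu, vaddr) -> page (read-only entries)
        self.free_pages = []        # freed pages, standing in for pcp/buddy
        self.pending_flush = set()  # freed pages that still have stale entries
        self.flush_count = 0

    def map_readonly(self, cpu, vaddr, page, data):
        self.memory[page] = data
        self.tlb[(cpu, vaddr)] = page

    def unmap_and_free(self, page):
        # Deferred path: no flush here, only record that one is owed.
        self.free_pages.append(page)
        self.pending_flush.add(page)

    def read(self, cpu, vaddr):
        # A read through a cached (possibly stale) translation.
        return self.memory[self.tlb[(cpu, vaddr)]]

    def flush(self, page):
        # Drop every cached translation that still points at this page.
        for key in [k for k, p in self.tlb.items() if p == page]:
            del self.tlb[key]
        self.pending_flush.discard(page)
        self.flush_count += 1

    def allocate(self, new_data):
        page = self.free_pages.pop()
        if page in self.pending_flush:
            # The owed flush happens before the contents can change.
            self.flush(page)
        self.memory[page] = new_data
        return page


luf = ToyLUF()
luf.map_readonly(cpu=0, vaddr=0x1000, page=7, data="old")
luf.unmap_and_free(7)
stale_read = luf.read(0, 0x1000)   # stale entry, but contents are unchanged
reused = luf.allocate("new")       # flush is paid here, before reuse
```

In this model the stale entry can only ever observe the old, unchanged
contents, because the entry is dropped before the page is rewritten. Whether
the real pcp/buddy paths can guarantee the same property is exactly what the
thread is debating.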