From: Hillf Danton <hdanton@sina.com>
To: Byungchul Park
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v12 00/26] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Date: Thu, 20 Feb 2025 18:32:22 +0800
Message-ID: <20250220103223.2360-1-hdanton@sina.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>

On Thu, 20 Feb 2025 14:20:01 +0900 Byungchul Park wrote:
> To check luf's stability, I ran a heavy LLM inference workload consuming
> 210GiB over 7 days on a machine with 140GiB memory, and decided it's
> stable enough.
>
> I'm posting the latest version so that anyone can try luf mechanism if
> wanted by any chance. However, I tagged RFC again because there are
> still issues that should be resolved to merge to mainline:
>
> 1. Even though system wide total cpu time for TLB shootdown is
>    reduced over 95%, page allocation paths should take additional cpu
>    time shifted from page reclaim to perform TLB shootdown.
>
> 2. We need luf debug feature to detect when luf goes wrong by any
>    chance. I implemented just a draft version that checks the sanity
>    on mkwrite(), kmap(), and so on. I need to gather better ideas
>    to improve the debug feature.
>
> ---
>
> Hi everyone,
>
> While I'm working with a tiered memory system e.g. CXL memory, I have
> been facing migration overhead esp. tlb shootdown on promotion or
> demotion between different tiers. Yeah.. most tlb shootdowns on
> migration through hinting fault can be avoided thanks to Huang Ying's
> work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
> is inaccessible").
>
> However, it's only for migration through hinting fault. I thought it'd
> be much better if we have a general mechanism to reduce all the tlb
> numbers that we can apply to any unmap code, that we normally believe
> tlb flush should be followed.
>
> I'm suggesting a new mechanism, LUF(Lazy Unmap Flush), that defers tlb
> flush until folios that have been unmapped and freed, eventually get
> allocated again. It's safe for folios that had been mapped read-only
> and were unmapped, as long as the contents of the folios don't change
> while staying in pcp or buddy so we can still read the data through the
> stale tlb entries.
>
Given pcp or buddy, you are opening a window for use-after-free, which
makes no sense in 99% of cases.
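
The window in question can be sketched with a toy user-space model in C.
This is only an illustration under simplifying assumptions, not kernel
code and not taken from the LUF series: every name below (toy_page,
toy_tlb, toy_unmap_and_free() and so on) is hypothetical, a single
pointer stands in for one CPU's TLB entry, and a flag stands in for the
page sitting in pcp or buddy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one page and one CPU's cached translation for it. */
struct toy_page {
	int data;            /* page contents */
	bool on_free_list;   /* stands in for "sitting in pcp or buddy" */
	bool flush_pending;  /* unmapped, but the TLB flush was deferred */
};

struct toy_tlb {
	struct toy_page *cached;  /* possibly stale translation, or NULL */
};

/* Unmap a read-only mapping and free the page without flushing. */
static void toy_unmap_and_free(struct toy_page *pg)
{
	pg->flush_pending = true;
	pg->on_free_list = true;
}

/* In this model the deferred flush happens at allocation time. */
static struct toy_page *toy_alloc(struct toy_page *pg, struct toy_tlb *tlb)
{
	if (pg->flush_pending) {
		tlb->cached = NULL;  /* the deferred "shootdown" */
		pg->flush_pending = false;
	}
	pg->on_free_list = false;
	return pg;
}

/* Read through the cached translation; false means no translation. */
static bool toy_read(const struct toy_tlb *tlb, int *out)
{
	if (!tlb->cached)
		return false;
	*out = tlb->cached->data;
	return true;
}

/* Returns 0 if the model behaves as described, nonzero otherwise. */
int toy_window_demo(void)
{
	struct toy_page pg = { .data = 42 };
	struct toy_tlb tlb = { .cached = &pg };
	int v = 0;

	toy_unmap_and_free(&pg);

	/* The window: the page is free, yet still readable via the TLB. */
	if (!(pg.on_free_list && toy_read(&tlb, &v) && v == 42))
		return 1;

	/* Reallocation flushes first, so the new owner's data is not seen. */
	toy_alloc(&pg, &tlb);
	pg.data = 7;
	return toy_read(&tlb, &v) ? 2 : 0;
}
```

Between toy_unmap_and_free() and toy_alloc() the freed page stays
reachable through the stale entry. The model only holds together if
nothing writes to the page during that interval, which is the condition
the quoted text relies on.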