Message-ID: <07686f06-f1a8-4282-bb48-fc4a5b554552@redhat.com>
Date: Tue, 28 May 2024 10:41:54 +0200
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
To: Byungchul Park, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
References: <20240510065206.76078-1-byungchul@sk.com>
From: David Hildenbrand
In-Reply-To: <20240510065206.76078-1-byungchul@sk.com>

On 10.05.24 at 08:51, Byungchul Park wrote:
> Hi everyone,
>
> While I'm working with a tiered memory system e.g. CXL memory, I have
> been facing migration overhead esp.
> tlb shootdown on promotion or
> demotion between different tiers. Yeah.. most tlb shootdowns on
> migration through hinting fault can be avoided thanks to Huang Ying's
> work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
> is inaccessible"). See the following link for more information:
>
> https://lore.kernel.org/lkml/20231115025755.GA29979@system.software.com/
>
> However, it's only for migration through hinting fault. I thought it'd
> be much better if we have a general mechanism to reduce all the tlb
> numbers that we can apply to any unmap code, that we normally believe
> tlb flush should be followed.
>
> I'm suggesting a new mechanism, LUF(Lazy Unmap Flush), defers tlb flush
> until folios that have been unmapped and freed, eventually get allocated
> again. It's safe for folios that had been mapped read-only and were
> unmapped, since the contents of the folios don't change while staying in
> pcp or buddy so we can still read the data through the stale tlb entries.
>
> tlb flush can be defered when folios get unmapped as long as it
> guarantees to perform tlb flush needed, before the folios actually
> become used, of course, only if all the corresponding ptes don't have
> write permission. Otherwise, the system will get messed up.
>
> To achieve that:
>
> 1. For the folios that map only to non-writable tlb entries, prevent
> tlb flush during unmapping but perform it just before the folios
> actually become used, out of buddy or pcp.

Trying to understand the impact:

Effectively, a CPU could still read data from a page that has already
been freed, until that page gets reallocated.

The important parts I can see are:

1) PCP/buddy must not change page content (e.g., poison, init_on_free),
   otherwise an app might read wrong content.
2) If we mess up the flush-before-realloc, an app might observe data
   written by whoever allocated the page.
3) We must reliably detect+handle any read-only PTEs for which we didn't
   flush the TLB yet, otherwise an app could see its memory writes
   getting lost.

I recall that at least uffd-wp might defer TLB flushes (see comment in
do_wp_page()). Not sure about other pte_wrprotect() callers that flush
the TLB after processing multiple page tables, whereby rmap code might
succeed in unmapping a page before the TLB flush happened.

Any other possible issues you stumbled over that are worth mentioning?

--
Thanks,

David / dhildenb
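
As a rough illustration of points 1) and 2) above, the deferred flush can be modeled with a toy, single-TLB simulation in Python. Everything in it (ToyMachine, PageFault, demo, the flush_pending switch) is hypothetical and only mimics the idea; it is not kernel code and says nothing about how the LUF patches are actually implemented. It does not model point 3), i.e. stale writable TLB entries behind an already write-protected PTE.

```python
class PageFault(Exception):
    """Raised when a virtual address has neither a TLB nor a page-table entry."""


class ToyMachine:
    """Toy model: one page table, one TLB, pages identified by number."""

    def __init__(self, num_pages=2):
        self.mem = {p: b"" for p in range(num_pages)}  # page -> content
        self.free = set(range(num_pages))              # free page numbers
        self.page_table = {}  # vaddr -> (page, writable)
        self.tlb = {}         # vaddr -> (page, writable), may be stale
        self.pending = set()  # freed pages whose TLB flush was deferred

    def alloc(self, flush_pending=True):
        page = min(self.free)
        self.free.remove(page)
        if flush_pending and page in self.pending:
            # Flush-before-realloc: drop every TLB entry that still
            # points at this page before handing it to a new owner.
            self.tlb = {va: e for va, e in self.tlb.items() if e[0] != page}
            self.pending.discard(page)
        return page

    def map(self, vaddr, page, writable):
        self.page_table[vaddr] = (page, writable)

    def read(self, vaddr):
        if vaddr not in self.tlb:
            if vaddr not in self.page_table:
                raise PageFault(hex(vaddr))
            self.tlb[vaddr] = self.page_table[vaddr]  # TLB fill
        return self.mem[self.tlb[vaddr][0]]

    def unmap_and_free(self, vaddr):
        page, writable = self.page_table.pop(vaddr)
        if writable:
            self.tlb.pop(vaddr, None)  # writable mapping: flush right away
        else:
            self.pending.add(page)     # read-only mapping: defer the flush
        self.free.add(page)            # allocator leaves the content untouched


def demo(flush_pending):
    """Return (read after free, read after realloc or the string 'fault')."""
    m = ToyMachine()
    page = m.alloc()
    m.mem[page] = b"old"
    m.map(0x1000, page, writable=False)
    m.read(0x1000)                     # populate the TLB
    m.unmap_and_free(0x1000)
    stale = m.read(0x1000)             # served through the stale TLB entry
    new_page = m.alloc(flush_pending=flush_pending)
    m.mem[new_page] = b"new owner's data"
    try:
        after = m.read(0x1000)
    except PageFault:
        after = "fault"
    return stale, after
```

In this model, demo(True) reads the unchanged old content after the free and faults once the page has been reallocated, while demo(False) (the flush is forgotten) lets the old mapping observe the new owner's data, which is the failure described in 2).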