From: Barry Song <21cnbao@gmail.com>
Date: Tue, 31 Dec 2024 09:22:12 +1300
Subject: Re: All MADV_FREE mTHPs are fully subjected to deferred_split_folio()
To: David Hildenbrand
Cc: Lance Yang, Linux-MM, Ryan Roberts, Baolin Wang, Andrew Morton
References: <142a47b6-ac31-465c-917e-7b2e98fddb2f@redhat.com> <8690de27-a1be-4440-a2d6-1a5cc56dcceb@redhat.com>

On Tue, Dec 31, 2024 at 8:32 AM David Hildenbrand wrote:
>
> >>           goto discard;
> >>
> >
> > I agree that this is necessary, but I'm not sure it addresses my
> > concerns.
> > MADV_FREE'ed mTHPs are still being added to `deferred_split`,
> > and this does not resolve the issue of them being partially unmapped
> > though it is definitely better than the existing code, at least folios are
> > not moved back to swap-backed.
> >
> > On the other hand, users might rely on the `deferred_split` counter to
> > assess how aggressively userspace is performing address/size unaligned
> > operations like MADV_DONTNEED or unmapped behavior. However, our
> > debugging shows that the majority of `deferred_split` counter increments
> > result from aligned MADV_FREE operations. This diminishes the counter's
> > usefulness in reflecting unaligned userspace behavior.
>
> Optimizing that is certainly something to look into, but the bigger
> issue you describe arises from bad handling of speculative references.
>
> Just imagine you indeed have a partially-mapped anon folio and the
> remaining pages are MADV_FREE'ed. The problem with the speculative
> reference would still apply.
>
> >
> > If possible, I am still looking for some approach to entirely avoid
> > adding the folio to deferred_split and partially being unmapped.
> >
> > Could the concept be something like this?
>
> Very likely it's wrong, because you really have to assure that that
> folio range is mapped here.
>
> Proper folio PTE batching should be applied here -- folio_pte_batch() etc.
>

I agree that using `folio_pte_batch()` to check if all PTEs are mapped and
determining `any_dirty` for setting swap-backed is the right direction. I'm
just curious if `(!list_empty(&folio->_deferred_list))` or
`folio_test_partially_mapped(folio)` could replace it if we're aiming for a
smaller change :-)

> That can please the counters in many, but not all cases. Again, maybe
> the deferred-split handling should be handled differently, and not
> synchronously from rmap code.
>
> I see 3 different work items
>
> 1) Fix mis-handling of speculative references
>

Agreed, the patch you're sending is absolutely necessary. I'd prefer that it
land sooner in some form. Would you like to post it?

> 2) Perform proper PTE batching during unmap/migration. Will improve
>    performance in any case.

Agreed. I remember discussing this with Ryan in an email thread about a
year ago, even for normal (non-MADV_FREE'ed) folios, but it seems everyone
has been busy with other priorities. This seems like a good time to start
exploring the idea. We could begin with MADV_FREE'ed folios and later extend
it to normal folios, for instance by implementing batched setting of swap
entries.

>
> 3) Try moving deferred-split handling out of rmap code into reclaim/
>    access-bit handling.

I'm not quite sure we still need this after having 1 and 2. With those, we
would be able to operate on the mTHP as a whole. Do we still need to move
deferred_split out of rmap?

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
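
For readers following the thread, the check discussed above (are all PTEs
of the folio mapped in this range, and is any of them dirty) can be modelled
in userspace roughly as follows. This is only a minimal sketch and not kernel
code: `struct model_pte`, `model_pte_batch()`, `model_classify()` and the
`MODEL_*` actions are hypothetical names invented for illustration, and the
real `folio_pte_batch()` works on page tables with a different interface. The
fallback for a partially mapped folio is likewise an assumption, not
something the thread settled.

```c
/*
 * Userspace model only, NOT kernel code. Every type and function below is
 * hypothetical and exists purely to illustrate the batching idea.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pte {
	unsigned long pfn;	/* page frame number this entry points at */
	bool present;		/* entry currently maps a page */
	bool dirty;		/* page was written to after MADV_FREE */
};

/*
 * Count how many consecutive entries, starting at ptes[0], map consecutive
 * PFNs beginning at first_pfn. Stops at the first hole or PFN mismatch and
 * reports whether any entry inside the batch is dirty.
 */
static size_t model_pte_batch(const struct model_pte *ptes, size_t max_nr,
			      unsigned long first_pfn, bool *any_dirty)
{
	size_t nr = 0;

	assert(ptes && any_dirty);
	*any_dirty = false;
	while (nr < max_nr && ptes[nr].present &&
	       ptes[nr].pfn == first_pfn + nr) {
		*any_dirty |= ptes[nr].dirty;
		nr++;
	}
	return nr;
}

enum model_action {
	MODEL_DISCARD,		/* fully mapped and clean: folio can be dropped */
	MODEL_SET_SWAPBACKED,	/* fully mapped but redirtied: keep the folio */
	MODEL_FALLBACK,		/* not fully mapped here: assumed per-PTE path */
};

/* Decide what to do with a folio of folio_nr_pages pages as one unit. */
static enum model_action model_classify(const struct model_pte *ptes,
					size_t folio_nr_pages,
					unsigned long first_pfn)
{
	bool any_dirty;
	size_t nr = model_pte_batch(ptes, folio_nr_pages, first_pfn,
				    &any_dirty);

	if (nr != folio_nr_pages)
		return MODEL_FALLBACK;
	return any_dirty ? MODEL_SET_SWAPBACKED : MODEL_DISCARD;
}
```

The point of the model is that the decision is taken once for the whole
folio instead of once per PTE, so a fully mapped, clean MADV_FREE'ed mTHP
would never pass through a partially-unmapped intermediate state.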