From: "Zach O'Keefe"
Date: Mon, 5 Feb 2024 12:26:31 -0800
Subject: Re: [PATCH 1/1] mm/khugepaged: skip copying lazyfree pages on collapse
To: Yang Shi
Cc: Michal Hocko, Lance Yang, akpm@linux-foundation.org, david@redhat.com,
 songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240201125226.28372-1-ioworker0@gmail.com>

On Mon, Feb 5, 2024 at 11:43 AM Yang Shi wrote:
>
> On Mon, Feb 5, 2024 at 1:45 AM Michal Hocko wrote:
> >
> > On Fri 02-02-24 09:42:27, Yang Shi wrote:
> > > But if the partial range is MADV_FREE, khugepaged won't skip them.
> > > This is what your second test case does.
> > >
> > > Secondly, I think it depends on the semantics of MADV_FREE,
> > > particularly how to treat the redirtied pages. TBH I'm always confused
> > > by the semantics.
> > > For example, the page contained "abcd", then it was
> > > MADV_FREE'ed, then it was written again with "1234" after "abcd". So
> > > the user should expect to see "abcd1234" or "00001234".
> >
> > Correct. You cannot assume the content of the first page as it could
> > have been reclaimed at any time.
> >
> > > I'm supposed it should be "abcd1234" since MADV_FREE pages are still
> > > valid and available, if I'm wrong please feel free to correct me. If
> > > so we should always copy MADV_FREE pages in khugepaged regardless of
> > > whether it is redirtied or not otherwise it may incur data corruption.
> > > If we don't copy, then the follow up redirty after collapse to the
> > > hugepage may return "00001234", right?
> >
> > Right. As pointed above this is a valid outcome if the page has been
> > dropped. User has means to tell that from /proc/vmstat though. Not in a
> > great precision but I think it would be really surprising to not see any
> > pglazyfreed yet the content is gone. I think it would be legit to call
> > it a bug. One could argue the bug would be in the accounting rather than
> > the khugepaged implementation because madvised pages could be dropped at
> > any time. But I think it makes more sense to copy the existing content.

+1. I agree that the content should be dropped iff pglazyfreed is
incremented. Of course, we could do that here, but I don't think there
is a good reason to, and we should just copy the contents.

> Yeah, as long as khugepaged sees the MADV_FREE pages, it means they
> have "NOT" been dropped yet. It may be dropped later if memory
> pressure occurs, but anyway khugepaged wins the race and khugepaged
> can't assume the pages will be dropped before they get redirtied. So
> copying the content does make sense.
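
For illustration only, here is a minimal sketch of the /proc/vmstat
check discussed above: take two snapshots and compare pglazyfreed. The
helper names (read_counter, lazyfreed_delta) and the sample snapshots
are made up for this example; only the pglazyfree/pglazyfreed counter
names come from /proc/vmstat. The counter is system-wide, so a zero
delta says nothing was lazily freed anywhere, while a nonzero delta
does not identify which pages were dropped.

```python
def read_counter(vmstat_text, name="pglazyfreed"):
    """Return the integer value of counter `name` in /proc/vmstat-style text, or None if absent."""
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Each line is "<counter_name> <value>"; match the name exactly so that
        # "pglazyfree" is not confused with "pglazyfreed".
        if len(fields) == 2 and fields[0] == name:
            return int(fields[1])
    return None


def lazyfreed_delta(before_text, after_text):
    """Return how much pglazyfreed grew between two snapshots, or None if the counter is missing."""
    before = read_counter(before_text)
    after = read_counter(after_text)
    if before is None or after is None:
        return None
    return after - before


# Hypothetical snapshots; real input would be the text of /proc/vmstat
# read before and after the interval of interest.
before = "pglazyfree 10\npglazyfreed 3\n"
after = "pglazyfree 12\npglazyfreed 3\n"
print(lazyfreed_delta(before, after))
```

If the delta is zero across the interval and previously written data
in a MADV_FREE'd range reads back as zeros, that is the surprising
case described above.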

Per Lance, I kinda get that this "undermines" MADV_FREE, insofar as,
from the user's perspective, the memory that was intended as a buffer
against OOM kill scenarios is no longer there to reclaim trivially. I
don't have a real world example where this is an issue, however.

Also, not copying the contents doesn't change that fact. The proper
alternative, if you want to make the "undermining" argument, is for
khugepaged to stay away from hugepage regions with any MADV_FREE
pages. I think it's fair to assume MADV_FREE'd memory is likely cold
memory, and therefore not a good hugepage target anyways. However,
it'd be unfortunate if there were a couple MADV_FREE pages in the
middle of an otherwise hot / highly-utilized hugepage region that
would prevent it from being pmd-mapped via khugepaged. But.. this is
exactly-ish what you get when hugepage-aware system/runtime allocators
split THPs to free up internal caches.

Best,
Zach

> > --
> > Michal Hocko
> > SUSE Labs