From: Lance Yang
Date: Tue, 20 Feb 2024 18:15:48 +0800
Subject: Re: [PATCH 1/1] mm/khugepaged: skip copying lazyfree pages on collapse
To: "Zach O'Keefe", Yang Shi, Michal Hocko, David Hildenbrand
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240201125226.28372-1-ioworker0@gmail.com>

Hey Zach, Yang, Michal, and David,

Please accept my sincerest apologies for the delayed response.

Thanks for the replies; they've been very helpful to me!
I also appreciate the valuable information you've shared!

I agree that it's not a good idea to let khugepaged avoid any pages
marked with MADV_FREE.

Thanks again for your time!

Best,
Lance

On Tue, Feb 6, 2024 at 4:27 AM Zach O'Keefe wrote:
>
> On Mon, Feb 5, 2024 at 11:43 AM Yang Shi wrote:
> >
> > On Mon, Feb 5, 2024 at 1:45 AM Michal Hocko wrote:
> > >
> > > On Fri 02-02-24 09:42:27, Yang Shi wrote:
> > > > But if the partial range is MADV_FREE, khugepaged won't skip them.
> > > > This is what your second test case does.
> > > >
> > > > Secondly, I think it depends on the semantics of MADV_FREE,
> > > > particularly how to treat the redirtied pages. TBH I'm always confused
> > > > by the semantics. For example, the page contained "abcd", then it was
> > > > MADV_FREE'ed, then it was written again with "1234" after "abcd". So
> > > > the user should expect to see "abcd1234" or "00001234".
> > >
> > > Correct. You cannot assume the content of the first page as it could
> > > have been reclaimed at any time.
> > >
> > > > I suppose it should be "abcd1234" since MADV_FREE pages are still
> > > > valid and available; if I'm wrong please feel free to correct me. If
> > > > so, we should always copy MADV_FREE pages in khugepaged regardless of
> > > > whether they are redirtied or not, otherwise it may incur data corruption.
> > > > If we don't copy, then the follow-up redirty after collapse to the
> > > > hugepage may return "00001234", right?
> > >
> > > Right. As pointed out above, this is a valid outcome if the page has been
> > > dropped. User has means to tell that from /proc/vmstat though. Not in a
> > > great precision but I think it would be really surprising to not see any
> > > pglazyfreed yet the content is gone. I think it would be legit to call
> > > it a bug. One could argue the bug would be in the accounting rather than
> > > the khugepaged implementation because madvised pages could be dropped at
> > > any time. But I think it makes more sense to copy the existing content.
>
> +1.
> I agree that the content should be dropped iff pglazyfreed is
> incremented. Of course, we could do that here, but I don't think there
> is a good reason to, and we should just copy the contents.
>
> > Yeah, as long as khugepaged sees the MADV_FREE pages, it means they
> > have "NOT" been dropped yet. They may be dropped later if memory
> > pressure occurs, but anyway khugepaged wins the race and khugepaged
> > can't assume the pages will be dropped before they get redirtied. So
> > copying the content does make sense.
>
> Per Lance, I kinda get that this "undermines" MADV_FREE, insofar as,
> from the user's perspective, that memory which was intended as a
> buffer against OOM kill scenarios is no longer there to reclaim trivially. I
> don't have a real world example where this is an issue, however. Also,
> not copying the contents doesn't change that fact.
>
> The proper alternative, if you want to make the "undermining"
> argument, is for khugepaged to stay away from hugepage regions with
> any MADV_FREE pages. I think it's fair to assume MADV_FREE'd memory is
> likely cold memory, and therefore not a good hugepage target anyways.
> However, it'd be unfortunate if there were a couple MADV_FREE pages in
> the middle of an otherwise hot / highly-utilized hugepage region that
> would prevent it from being pmd-mapped via khugepaged. But... this is
> exactly-ish what you get when hugepage-aware system/runtime allocators
> split THPs to free up internal caches.
>
> Best,
> Zach
>
> > > --
> > > Michal Hocko
> > > SUSE Labs
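
For reference, the "abcd" / "1234" scenario from the quoted discussion can be tried from userspace with something like the sketch below. It is only an illustration under a few assumptions: a Linux system whose headers define MADV_FREE, a single anonymous private page, and two helper names (read_vmstat_counter and lazyfree_roundtrip) that are made up for this example and do not come from the patch or the kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return the named /proc/vmstat counter, or -1. */
long long read_vmstat_counter(const char *name)
{
    FILE *f = fopen("/proc/vmstat", "r");
    char key[64];
    long long val, found = -1;

    if (!f)
        return -1;
    /* Each line of /proc/vmstat is "<name> <value>". */
    while (fscanf(f, "%63s %lld", key, &val) == 2) {
        if (strcmp(key, name) == 0) {
            found = val;
            break;
        }
    }
    fclose(f);
    return found;
}

/*
 * Hypothetical helper: write "abcd", MADV_FREE the page, then write
 * "1234" right after it, and copy the first 8 bytes of the page to out.
 * Returns 0 if MADV_FREE was applied, 1 if it was unavailable or
 * refused, and -1 if the mapping could not be created.
 */
int lazyfree_roundtrip(char out[8])
{
    long psz = sysconf(_SC_PAGESIZE);
    int applied = 0;
    char *p;

    if (psz < 8)
        return -1;
    p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, "abcd", 4);
#ifdef MADV_FREE
    /* Mark the page lazily freeable; the kernel may drop it under pressure. */
    applied = (madvise(p, (size_t)psz, MADV_FREE) == 0);
#endif
    /* Redirty the page; after this write it should no longer be dropped. */
    memcpy(p + 4, "1234", 4);
    memcpy(out, p, 8);
    munmap(p, (size_t)psz);
    return applied ? 0 : 1;
}
```

Calling read_vmstat_counter("pglazyfreed") before and after lazyfree_roundtrip() gives a rough view of the accounting mentioned above. The counter is system-wide, so it is not precise for a single page. The 8 bytes in out are expected to be either "abcd1234" (page not reclaimed) or four zero bytes followed by "1234" (page reclaimed before the second write); as noted in the thread, seeing the zeros with no pglazyfreed increment at all would be surprising.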