From: Yang Shi
Date: Fri, 2 Feb 2024 09:42:27 -0800
Subject: Re: [PATCH 1/1] mm/khugepaged: skip copying lazyfree pages on collapse
To: Lance Yang
Cc: Michal Hocko, akpm@linux-foundation.org, zokeefe@google.com,
    david@redhat.com, songmuchun@bytedance.com, peterx@redhat.com,
    minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240201125226.28372-1-ioworker0@gmail.com>

On Fri, Feb 2, 2024 at 6:53 AM Lance Yang wrote:
>
> How about blocking khugepaged from
> collapsing lazyfree pages? This way,
> is it not better to keep the semantics
> of MADV_FREE?
>
> What do you think?

First of all, khugepaged doesn't treat MADV_FREE pages as pte_none IIUC.
khugepaged does skip the 2M block if all the pages in the range are old
and unreferenced: the check is done in hpage_collapse_scan_pmd() and then
repeated in collapse_huge_page(). And MADV_FREE pages are just old and
unreferenced. This is actually what your first test case does. The whole
2M range is a MADV_FREE range, so it is skipped by khugepaged. But if
only part of the range is MADV_FREE, khugepaged won't skip it. This is
what your second test case does.

Secondly, I think it depends on the semantics of MADV_FREE, particularly
how to treat the redirtied pages. TBH I'm always confused by the
semantics. For example, the page contained "abcd", then it was
MADV_FREE'ed, then it was written again with "1234" after "abcd". So
should the user expect to see "abcd1234" or "00001234"?

I suppose it should be "abcd1234" since MADV_FREE pages are still valid
and available; if I'm wrong please feel free to correct me. If so, we
should always copy MADV_FREE pages in khugepaged regardless of whether
they are redirtied or not, otherwise it may incur data corruption. If we
don't copy, then the follow-up redirty after collapse to the hugepage
may return "00001234", right?

The current behavior is copying the page.

>
> Thanks,
> Lance
>
> On Fri, Feb 2, 2024 at 10:42 PM Michal Hocko wrote:
> >
> > On Fri 02-02-24 21:46:45, Lance Yang wrote:
> > > Here is a part from the man page explaining
> > > the MADV_FREE semantics:
> > >
> > > The kernel can thus free these pages, but the
> > > freeing could be delayed until memory pressure
> > > occurs. For each of the pages that has been
> > > marked to be freed but has not yet been freed,
> > > the free operation will be canceled if the caller
> > > writes into the page. If there is no subsequent
> > > write, the kernel can free the pages at any time.
> > >
> > > IIUC, if there is no subsequent write, lazyfree
> > > pages will eventually be reclaimed.
> >
> > If there is no memory pressure then this might not
> > ever happen.
> > User cannot make any assumption about
> > their content once madvise call has been done. The
> > content has to be considered lost. Sure the userspace
> > might have means to tell those pages from zero pages
> > and recheck after the write but that is about it.
> >
> > > khugepaged
> > > treats lazyfree pages the same as pte_none,
> > > avoiding copying them to the new huge page
> > > during collapse. It seems that lazyfree pages
> > > are reclaimed before khugepaged collapses them.
> > > This aligns with user expectations.
> > >
> > > However, IMO, if the content of MADV_FREE pages
> > > remains valid during collapse, then khugepaged
> > > treating lazyfree pages the same as pte_none
> > > might not be suitable.
> >
> > Why?
> >
> > Unless I am missing something (which is possible of
> > course) I do not really see why dropping the content
> > of those pages and replacing them with a THP is any
> > difference from reclaiming those pages and then faulting
> > in a non-THP zero page.
> >
> > Now, if khugepaged reused the original content of MADV_FREE
> > pages that would be a slightly different story. I can
> > see why users would expect zero pages to back madvised
> > area.
> > --
> > Michal Hocko
> > SUSE Labs
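
To make the "abcd"/"1234" question above concrete, here is a minimal
userspace sketch of that sequence on a single anonymous page. It does not
involve khugepaged or THP at all; it only exercises plain MADV_FREE
followed by a write. The helper names render() and madv_free_redirty()
and the DEMO_MAIN build switch are made up for illustration, and showing
NUL bytes as '0' is just a display convention so that a zero-filled page
prints as "0000".

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* Linux value; older libc headers may lack it */
#endif

/* Copy n bytes to out, showing NUL bytes as '0' so they are printable. */
static void render(const char *raw, size_t n, char *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = raw[i] ? raw[i] : '0';
	out[n] = '\0';
}

/*
 * Write "abcd", MADV_FREE the page, then write "1234" right after it.
 * Returns 0 and fills out (9 bytes) on success, -1 if sysconf, mmap or
 * madvise fails (e.g. a kernel without MADV_FREE support).
 */
static int madv_free_redirty(char out[9])
{
	long page = sysconf(_SC_PAGESIZE);
	char *p;

	assert(out != NULL);
	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memcpy(p, "abcd", 4);			/* original content */
	if (madvise(p, (size_t)page, MADV_FREE) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	memcpy(p + 4, "1234", 4);		/* redirty after MADV_FREE */

	render(p, 8, out);
	munmap(p, (size_t)page);
	return 0;
}

#ifdef DEMO_MAIN
int main(void)
{
	char out[9];

	if (madv_free_redirty(out) != 0) {
		perror("madv_free_redirty");
		return 1;
	}
	printf("%s\n", out);
	return 0;
}
#endif
```

Built with something like "cc -DDEMO_MAIN demo.c", this prints whichever
of the two strings the kernel produced. If the page was not reclaimed
before the write, the man page text quoted above suggests the free is
canceled and "abcd" survives; if it was reclaimed first, the write lands
on a fresh zero page. Which one shows up on a given run is not something
the sketch can guarantee.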