From: Muchun Song <songmuchun@bytedance.com>
Date: Sat, 12 Feb 2022 19:11:13 +0800
Subject: Re: [PATCH v5 5/5] mm/page_alloc: reuse tail struct pages for compound devmaps
To: Joao Martins
Cc: Linux Memory Management List <linux-mm@kvack.org>, Dan Williams, Vishal Verma, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Mike Kravetz, Andrew Morton, Jonathan Corbet, Christoph Hellwig, nvdimm@lists.linux.dev, Linux Doc Mailing List
References: <20220210193345.23628-1-joao.m.martins@oracle.com> <20220210193345.23628-6-joao.m.martins@oracle.com>
On Fri, Feb 11, 2022 at 8:48 PM Joao Martins wrote:
>
> On 2/11/22 05:07, Muchun Song wrote:
> > On Fri, Feb 11, 2022 at 3:34 AM Joao Martins wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index cface1d38093..c10df2fd0ec2 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> >>         }
> >>  }
> >>
> >> +/*
> >> + * With compound page geometry and when struct pages are stored in ram most
> >> + * tail pages are reused. Consequently, the amount of unique struct pages to
> >> + * initialize is a lot smaller than the total amount of struct pages being
> >> + * mapped. This is a paired / mild layering violation with explicit knowledge
> >> + * of how the sparse_vmemmap internals handle compound pages in the lack
> >> + * of an altmap. See vmemmap_populate_compound_pages().
> >> + */
> >> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> >> +                                              unsigned long nr_pages)
> >> +{
> >> +       return !altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages;
> >> +}
> >> +
> >
> > This means only the first 2 pages will be modified; the rest (6 or 4094
> > pages) will not. In the HugeTLB case, those tail pages are mapped read-only
> > to catch invalid usage of tail pages (e.g. write operations). Quick
> > question: should we also do similar things on DAX?
> >
> What's sort of in the way of marking deduplicated pages as read-only is one
> particular CONFIG_DEBUG_VM feature, particularly page_init_poison(). HugeTLB
> gets its memory from the page allocator or already has pre-populated (at boot)
> system RAM sections, and needs those to be 'given back' before they can be
> hotunplugged. So I guess it never goes through page_init_poison().
>
> With device-dax, though, the sections are populated and dedicated to
> device-dax when hotplugged, and then again on hotunplug when the last devdax
> user drops the page reference.
>
> So page_init_poison() is called on those two occasions. It actually writes to
> whole sections of memmap, not just one page. So either I gate the read-only
> page protection on CONFIG_DEBUG_VM=n (which feels very wrong), or I detect
> inside page_init_poison() that the caller is trying to init compound devmap
> backed struct pages that were already watermarked (i.e. essentially when the
> pfn offset between the passed page and the head page is bigger than 128).

Got it. I hadn't realized that page_init_poison() would poison the struct
pages. I agree with you that mapping them read-only is wrong. Thanks.