Date: Wed, 12 Jan 2022 08:22:57 -0800
From: Minchan Kim
To: David Hildenbrand
Cc: Andrew Morton, Michal Hocko, linux-mm, LKML, Suren Baghdasaryan, John Dias, huww98@outlook.com, John Hubbard
Subject: Re: [RFC v2] mm: introduce page pin owner
References: <20211228175904.3739751-1-minchan@kernel.org>

On Wed, Jan 12, 2022 at 01:25:04PM +0100, David Hildenbrand wrote:
> On 28.12.21 18:59, Minchan Kim wrote:
> > A Contiguous Memory Allocator(CMA) allocation can fail if any page
> > within the requested range has an elevated refcount(a pinned page).
> >
> > Debugging such failures is difficult, because the struct pages only
> > show a combined refcount, and do not show the callstacks or
> > backtraces of the code that acquired each refcount. So the source
> > of the page pins remains a mystery, at the time of CMA failure.
> >
> > In order to solve this without adding too much overhead, just do
> > nothing most of the time, which is pretty low overhead. However,
> > once a CMA failure occurs, then mark the page (this requires a
> > pointer's worth of space in struct page, but it uses page extensions
> > to get that), and start tracing the subsequent put_page() calls.
> > As the program finishes up, each page pin will be undone, and
> > traced with a backtrace. The programmer reads the trace output and
> > sees the list of all page pinning code paths.
> >
>
> It's worth noting that this is a pure debug feature, right?

Sure.

>
>
> I like the general approach, however, IMHO the current naming is a bit
> sub-optimal and misleading. All you're doing is flagging pages that
> should result in a tracepoint when unref'ed.
>
> "page pinners" makes it somewhat sound like you're targeting FOLL_PIN,
> not simply any references.
>
> "owner" is misleading IMHO as well.
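As a rough illustration of the mechanism summarized above (flag a page once its migration fails, then emit a backtrace on every later unref), here is a toy userspace model in Python. It is not kernel code and not the patch itself: `Page`, its `trace_unref` field, `on_migration_failure()` and `trace_log` are hypothetical stand-ins for struct page, a page_ext flag bit such as the proposed PAGE_EXT_TRACE_UNREF, the migration-failure hook and the tracepoint.

```python
import traceback
from dataclasses import dataclass

# Toy userspace model, NOT kernel code. Page, trace_unref,
# on_migration_failure() and trace_log are hypothetical names.


@dataclass
class Page:
    pfn: int
    refcount: int = 1
    trace_unref: bool = False  # stands in for a page_ext flag bit


trace_log = []  # (pfn, refcount after the put, backtrace) per traced unref


def get_page(page: Page) -> None:
    page.refcount += 1


def put_page(page: Page) -> None:
    page.refcount -= 1
    # Common case: a single flag test and nothing else.
    if page.trace_unref:
        # Flagged page: record which code path dropped the reference.
        stack = "".join(traceback.format_stack(limit=5))
        trace_log.append((page.pfn, page.refcount, stack))


def on_migration_failure(page: Page) -> None:
    # Called when migrating the page fails; from now on every
    # put_page() on it is reported with a backtrace.
    page.trace_unref = True
```

If some code path still holds an extra reference when migration fails, its later put_page() lands in `trace_log` together with its backtrace, while unrefs on unflagged pages pay only for the flag test.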
>
>
> What about something like:
>
> "mm: selective tracing of page reference holders on unref"
>
> PAGE_EXT_PIN_OWNER -> PAGE_EXT_TRACE_UNREF
>
> $whatever feature/user can then set the bit, for example, when migration
> fails.

I can't think of a case where put_page() tracking would be generally
useful other than migration failure. Do you have a reasonable use case in
mind that would justify making the feature general? Otherwise, I'd prefer
a higher-level feature name that represents page migration failure and
the subsequent tracking of unrefs on the page. In that sense, PagePinOwner,
which John suggested, was a good candidate (my original name, PagePinner,
was even worse), since I had trouble summarizing the feature in a short
word.

If we name it after "what the feature does" rather than "what the
feature's goal is" (I feel your suggestion is closer to what the feature
does), I'd like to express "unref on a page that failed migration", i.e.
PAGE_EXT_UNMIGRATED_UNREF. (However, I prefer naming the feature after
"what we want to achieve".)

>
> I somewhat dislike that it's implicitly activated by failed page
> migration. At least the current naming doesn't reflect that.
> >
> > This will consume an additional 8 bytes per 4KB page, or an
> > additional 0.2% of RAM. In addition to the storage space, it will
> > have some performance cost, due to increasing the size of struct
> > page so that it is greater than the cacheline size (or multiples
> > thereof) of popular (x86, ...) CPUs.
>
> I think I might be missing something. Aren't you simply reusing
> &page_ext->flags ? I mean, the "overhead" is just ordinary page_ext
> overhead ... and whee exactly are you changing "struct page" layout? Is
> this description outdated?

The feature enables page_ext, which adds 8 bytes per 4KB page, and every
put operation needs to access the additional flag in page_ext, which
affects performance since put_page() is a common operation. Yeah, the
struct size part of the wording is rather misleading.
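The 0.2% figure follows directly from the 8-bytes-per-page cost: 8 / 4096 is about 0.195%. A minimal sketch of that arithmetic, assuming 4KB base pages and taking the 8-byte page_ext figure from this thread as given (the actual per-page page_ext size may depend on which page_ext users are configured). `page_ext_overhead()` is a hypothetical helper for illustration only.

```python
PAGE_SIZE = 4096      # bytes per base page ("4KB" in the thread)
PAGE_EXT_BYTES = 8    # per-page page_ext cost quoted in the thread (assumed)


def page_ext_overhead(ram_bytes: int,
                      page_size: int = PAGE_SIZE,
                      ext_bytes: int = PAGE_EXT_BYTES):
    """Return (extra bytes, fraction of RAM) for one page_ext entry per page."""
    nr_pages = ram_bytes // page_size
    extra = nr_pages * ext_bytes
    return extra, extra / ram_bytes


if __name__ == "__main__":
    extra, frac = page_ext_overhead(8 << 30)  # e.g. an 8 GiB machine
    print(f"{extra >> 20} MiB of page_ext, {frac:.3%} of RAM")
    # prints: 16 MiB of page_ext, 0.195% of RAM
```

The fraction is independent of RAM size; only the absolute amount grows with memory. The runtime cost, the flag test on every put_page(), is separate and not captured by this arithmetic.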
Let me change the wording to something like this:

  This will consume an additional 8 bytes per 4KB page, or an
  additional 0.2% of RAM. In addition to the storage space, it will
  have some performance cost, due to checking an additional flag on
  every put_page() operation.

>
> >
> > The idea can apply every user of migrate_pages as well as CMA to
> > know the reason why the page migration failed. To support it,
> > the implementation takes "enum migrate_reason" string as filter
> > of the tracepoint(see below).
> >
>
> I wonder if we could achieve the same thing for debugging by
>
> a) Tracing the PFN when migration fails
> b) Tracing any unref of any PFN
>
> User space can then combine both information to achieve the same result.
> I assume one would need a big trace buffer, but maybe for a debug
> feature good enough?

I did try that for CMA allocation failures, but it generated enormous
output (keep in mind that we also need a stack trace) because put_page()
and compaction are so frequent. Even when I filtered the events to track
only CMA pages, the output was still huge, since the CMA area is 1/8 of
system memory. Even after I increased the buffer size a lot, the buffer
was easily overwritten. Moreover, even though it's a debug feature, we
need to ship it to dogfooders to catch real problems in the field, so the
memory consumption and the backtrace overhead on every put_page() make
that approach hard to use there.
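For comparison, the userspace half of the (a)/(b) alternative amounts to joining two event streams, as in the sketch below. The tuple layouts and the `correlate()` helper are hypothetical and do not correspond to any real tracepoint format. What the sketch makes visible is the volume problem described above: the unref stream, each event with a stack, must be captured in full before filtering, because it is not known in advance which PFNs will fail migration.

```python
from collections import defaultdict

# Hypothetical record layouts (not a real tracepoint format):
#   migration failure: (timestamp, pfn)
#   unref event:       (timestamp, pfn, stack)


def correlate(migration_failures, unref_events):
    """Return {pfn: [stack, ...]} for unrefs seen at or after that PFN's
    first recorded migration failure."""
    first_fail = {}
    for ts, pfn in migration_failures:
        if pfn not in first_fail or ts < first_fail[pfn]:
            first_fail[pfn] = ts

    stacks = defaultdict(list)
    for ts, pfn, stack in unref_events:
        t0 = first_fail.get(pfn)
        # Keep only unrefs that happen once the PFN is known to have failed.
        if t0 is not None and ts >= t0:
            stacks[pfn].append(stack)
    return dict(stacks)
```

This join only works if the trace buffer has not wrapped before the relevant unrefs are read out, which is exactly what failed in practice.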