From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE984C433F5 for ; Wed, 12 Jan 2022 20:41:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 45CCC6B0248; Wed, 12 Jan 2022 15:41:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3E4CD6B0249; Wed, 12 Jan 2022 15:41:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 25E8D6B024A; Wed, 12 Jan 2022 15:41:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0033.hostedemail.com [216.40.44.33]) by kanga.kvack.org (Postfix) with ESMTP id 10B2D6B0248 for ; Wed, 12 Jan 2022 15:41:10 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id CA5D08249980 for ; Wed, 12 Jan 2022 20:41:09 +0000 (UTC) X-FDA: 79022804658.10.94CE0B4 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) by imf27.hostedemail.com (Postfix) with ESMTP id 540EA40013 for ; Wed, 12 Jan 2022 20:41:09 +0000 (UTC) Received: by mail-pj1-f43.google.com with SMTP id hv15so7385469pjb.5 for ; Wed, 12 Jan 2022 12:41:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=yw5pOG+8EF6dNFjVo0umaJQRgBhhV+jKo/RFqA+BDRQ=; b=ADTpEsXWEei+8iad7W7IC4k1Ri30WifPSOBeSIdU2MRdcvMgKVe91BluRfdWSZdAbN OGB+fwu3Hc8d4pRWiSVDaSqeuUzciUIodwVrDu50noVKq290lmlqAof6EsCfSXuJOFA0 U8UsUjJi/e77brE7pzrx1Bt9xrW1EbvYB/NiH3cTqarFCSRJ/HNcuQdx3g8lOZZTEZvD t4U2rghSDx3U6U8+C8Samxw2BvUANRLGfQJ2XLMpsnjUTcUWBUJ5zDeogKnY0WrqGdYJ k1AcsH5J3S3prOry+WtC5TRzNDD0NGF4qrAA5WZGyZMCa3YS0slRlPlSEunJcUrgqbIL ph6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=yw5pOG+8EF6dNFjVo0umaJQRgBhhV+jKo/RFqA+BDRQ=; b=iUGcpbVqtId81tiD/K8/4n4G5JeVioHrJnsRvAhzd/45ifdpZG1RnsWcSRLBNXYmLW Kgl9t23OHm4qPkldwXm3rBrramYzhxG++oWt1fmyt9WLNfu+AuqKzW/L6onuMk7z41d8 bhPdC7EG5PWkF7HTI6rMIUKGx8vm6i3TK/iiOUGbg6zdWOWL6EwNzmY3rhUdMaU0nUQ5 kmFfo9x/8B6L9QJDwPlNLpYm2vqhvMQ5j6QFdmfoIKlHZtbpng+NqUprbYgOa7kGmzhS ObIanNb9DQxHC38fZNRnWbihPu7QGuUacsHHUTr9olTO147Otg7BqfdgI8qqppn8RiVg 1ZTw== X-Gm-Message-State: AOAM5304OT139H7X3OwFFUK2GOj8/5teqFdmDeCF/pamXU6lTseJh1ab I0Mb6XSBgpqNCoI9N92aHiM= X-Google-Smtp-Source: ABdhPJyGjGP0AEPX6zhDmLCbOIauT7pTOFdHB4rayIkA3lGU1KREdgFdhMXsdQSoJxPThoAQGxkwBg== X-Received: by 2002:a63:b57:: with SMTP id a23mr591594pgl.443.1642020068127; Wed, 12 Jan 2022 12:41:08 -0800 (PST) Received: from google.com ([2620:15c:211:201:b6c7:c163:623d:56bc]) by smtp.gmail.com with ESMTPSA id x25sm447462pfu.113.2022.01.12.12.41.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Jan 2022 12:41:07 -0800 (PST) Date: Wed, 12 Jan 2022 12:41:05 -0800 From: Minchan Kim To: David Hildenbrand Cc: Andrew Morton , Michal Hocko , linux-mm , LKML , Suren Baghdasaryan , John Dias , huww98@outlook.com, John Hubbard Subject: Re: [RFC v2] mm: introduce page pin owner Message-ID: References: <20211228175904.3739751-1-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 540EA40013 X-Stat-Signature: xewcnxjepmou3rqp579mra7swkiuxbbw Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=ADTpEsXW; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=kernel.org (policy=none); spf=pass (imf27.hostedemail.com: domain of minchan.kim@gmail.com designates 209.85.216.43 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com X-Rspamd-Server: rspam06 X-HE-Tag: 1642020069-449325 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jan 12, 2022 at 06:42:21PM +0100, David Hildenbrand wrote: > >> > >> What about something like: > >> > >> "mm: selective tracing of page reference holders on unref" > >> > >> PAGE_EXT_PIN_OWNER -> PAGE_EXT_TRACE_UNREF > >> > >> $whatever feature/user can then set the bit, for example, when migration > >> fails. > > > > I couldn't imagine put_page tracking is generally useful except > > migration failure. Do you have reasonable usecase in your mind > > to make the feature general to be used? > > HWpoison etc. purposes maybe? Trace who still held a reference a page > that was poisoned and couldn't be removed? Or in general, tracking I am not familiar with hwpoison so here dumb question goes: Is that different one with __soft_offline_page? It uses migrate_pages so current interface supports it with filter. echo "memory_failure" > $trace_dir/events/page_pin_owner/report_page_pinners/filter > references to something that should have a refcount of 0 because it > should have been freed, but for some reason there are still references > around? Sounds like you are talking about memory leak? What's the purpose with trace, not using other existing tool to find memory leak? > > > Otherwise, I'd like to have feature naming more higher level > > to represent page migration failure and then tracking unref of > > the page. In the sense, PagePinOwner John suggested was good > > candidate(Even, my original naming PagePinner was worse) since > > Personally, I dislike both variants. > > > I was trouble to abstract the feature with short word. > > If we approach "what feature is doing" rather than "what's > > the feature's goal"(I feel the your suggestion would be close > > to what feature is doing), I'd like to express "unreference on > > migraiton failed page" so PAGE_EXT_UNMIGRATED_UNREF > > (However, I prefer the feature naming more "what we want to achieve") > > > E.g., PAGE_EXT_TRACE_UNREF will trace unref to the page once the bit is > set. The functionality itself is completely independent of migration > failures. That's just the code that sets it to enable the underlying > tracing for that specific page. I agree that make something general is great but I also want to avoid create something too big from the beginning with just imagination. So, I'd like to hear more concrete and appealing usecases and then we could think over this trace approach is really the best one to achieve the goal. Once it's agreed, the naming you suggested would make sense. > > >> > >> I somewhat dislike that it's implicitly activated by failed page > >> migration. At least the current naming doesn't reflect that. > >> > >> > >>> This will consume an additional 8 bytes per 4KB page, or an > >>> additional 0.2% of RAM. In addition to the storage space, it will > >>> have some performance cost, due to increasing the size of struct > >>> page so that it is greater than the cacheline size (or multiples > >>> thereof) of popular (x86, ...) CPUs. > >> > >> I think I might be missing something. Aren't you simply reusing > >> &page_ext->flags ? I mean, the "overhead" is just ordinary page_ext > >> overhead ... and whee exactly are you changing "struct page" layout? Is > >> this description outdated? > > > > The feature enables page_ext which adds up 8 bytes per 4KB and on every > > put operation, it need to access the additional flag on page_ext which > > affects performance since page_put is the common operation. > > Yeah, the struct size stuff in the wording is rather misleading. > > Let me change the workding something like this: > > > > This will consume an additional 8 bytes per 4KB page, or an > > additional 0.2% of RAM. In addition to the storage space, it will > > have some performance cost, due to checking additional flag on > > every put_page opeartion. > > I'd adjust to > > As this feature depends on page_ext->flags, it will consume an > additional 8 bytes per 4KB page to enable page_ext if not already > enabled. ... NP. > > > > >> > >>> > >>> The idea can apply every user of migrate_pages as well as CMA to > >>> know the reason why the page migration failed. To support it, > >>> the implementation takes "enum migrate_reason" string as filter > >>> of the tracepoint(see below). > >>> > >> > >> I wonder if we could achieve the same thing for debugging by > >> > >> a) Tracing the PFN when migration fails > >> b) Tracing any unref of any PFN > >> > >> User space can then combine both information to achieve the same result. > >> I assume one would need a big trace buffer, but maybe for a debug > >> feature good enough? > > > > I definitely tried it for cma allocation failure but it generated > > enormous output(Please keep it in mind that we also need stacktrace) > > due to too frequent put_page and compaction operation(Even, I filter > > them out to track only cma pages but it was still huge since the CMA > > size is 1/8 of the system). Even though I increased the buffer size > > a lot, the buffer was easily overwritten. Moreover, even though it's > > debug feature, we need to release the feature into dogfooder to catch > > the real problem in the field so consuming too much memory as well as > > backtrace operhead on every put page are tough to be used in field. > > Makes sense, I was expecting the output to be large, but possible it's > going to be way too large. > > Would it also make sense to track for a flagged page new taken > references, such that you can differentiate between new (e.g., > temporary) ones and previous ones? Feels like a reasonable addition. I actually tried it and it showed 2x times bigger output. For me to debug CMA alloation failure, the new get_page callstack after migration failure were waste since they repeated from lru adding, isolate from the LRU something. Rather than get callsite, I needed only put call sites since I could deduce where the pair-get came from.