From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0F64C433EF for ; Fri, 14 Jan 2022 16:51:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DEA026B0071; Fri, 14 Jan 2022 11:51:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D98C26B0073; Fri, 14 Jan 2022 11:51:29 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C12D96B0074; Fri, 14 Jan 2022 11:51:29 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0027.hostedemail.com [216.40.44.27]) by kanga.kvack.org (Postfix) with ESMTP id B0F4B6B0071 for ; Fri, 14 Jan 2022 11:51:29 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 6ED1B95AF6 for ; Fri, 14 Jan 2022 16:51:29 +0000 (UTC) X-FDA: 79029483498.18.0622B9C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf10.hostedemail.com (Postfix) with ESMTP id E4838C0002 for ; Fri, 14 Jan 2022 16:51:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1642179088; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NTinIqj672rVWBI2bzwduD2LLaFLaP9MEgcSai0QIrM=; b=IOx+kbtG1d1mTZjPjBRHsg5pWILaNIe5xUzNrOSeX5su1A7Frwb5pjdjHtBqjijy0dGX4H eR+4E56Kq6hscipcvKOojBmpxJoAXb/UZlIDlqQk6JEqwqdZC+RL5HvGlev1sg5D4DJJq3 k8PNN1T3sX5ZEuOlb0ii/oxpyz2tn60= Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-210-On5RJSHnPSWkPPhoMlZJnQ-1; Fri, 14 Jan 2022 11:51:27 -0500 X-MC-Unique: On5RJSHnPSWkPPhoMlZJnQ-1 Received: by mail-ed1-f72.google.com with SMTP id q15-20020a056402518f00b003f87abf9c37so8597405edd.15 for ; Fri, 14 Jan 2022 08:51:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:organization:in-reply-to :content-transfer-encoding; bh=NTinIqj672rVWBI2bzwduD2LLaFLaP9MEgcSai0QIrM=; b=o2TRcGrUljZ/4Sj/0uX1+Ob8oXjRT/B8z3sbsjRvq23O2mu2CQJNSr7PMUYpAn0Aqs fyirL/k68s8P7dSVhWGNXlVY5RnN62qfuSNMfVF4XzRS0CLwef6EcjGYpcZ67b2aAjOT UarA8nVzeMN+J3hmjjZMg3jFCYKg9X5iqZIFLQ3SOSQ7StBwWW8386Ls5PDShxtt6KLY 93Dkp4Q/8td6gZN+I5386Znm9EjLreuk8DcVBIrKE3YJ3cqgxLNKuxECnmoO/9WKpjxU MxJf9lR8kMq6RwsFwpn14kMc0/2aK+4Gu8NFcl5PdoCblsMfqEaExDymhFKlSYHaDhkT HjTw== X-Gm-Message-State: AOAM533vQg24cycRKlTPNMOjDnbl6c+sxuCgh1JAUNCEDBskgwyYYDSO OtIM3CTxXgsip5W/Bh0KlEbv01o9PBJqpW6GqS7gPEyWgncRDq9EOHYav4ESW7rPBQ7PxKi4Uh0 /jTi1kH8Hb9k= X-Received: by 2002:a17:907:6e0c:: with SMTP id sd12mr8084687ejc.536.1642179086111; Fri, 14 Jan 2022 08:51:26 -0800 (PST) X-Google-Smtp-Source: ABdhPJx5Y3VQCzdrrXi69B9FQRczjUJwIwieT4Ls4ORNDY9QKjxRD49sSUiW5Q6K2fDho+MJLg1oug== X-Received: by 2002:a17:907:6e0c:: with SMTP id sd12mr8084668ejc.536.1642179085764; Fri, 14 Jan 2022 08:51:25 -0800 (PST) Received: from ?IPV6:2003:cb:c701:9d00:ff87:1c9b:108a:9702? (p200300cbc7019d00ff871c9b108a9702.dip0.t-ipconnect.de. [2003:cb:c701:9d00:ff87:1c9b:108a:9702]) by smtp.gmail.com with ESMTPSA id z16sm1962496eji.153.2022.01.14.08.51.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 14 Jan 2022 08:51:25 -0800 (PST) Message-ID: <8f02e71b-5de7-4342-7371-a7fe19b114b5@redhat.com> Date: Fri, 14 Jan 2022 17:51:24 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0 Subject: Re: [RFC v2] mm: introduce page pin owner To: Minchan Kim Cc: Andrew Morton , Michal Hocko , linux-mm , LKML , Suren Baghdasaryan , John Dias , huww98@outlook.com, John Hubbard References: <20211228175904.3739751-1-minchan@kernel.org> <35d7f27c-44e3-3921-c8d3-b6dee3b01238@redhat.com> From: David Hildenbrand Organization: Red Hat In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=IOx+kbtG; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf10.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com X-Stat-Signature: h9j954igkk6mt1734p4ou45jmhe3py4x X-Rspamd-Queue-Id: E4838C0002 X-Rspamd-Server: rspam12 X-HE-Tag: 1642179088-572285 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 14.01.22 17:39, Minchan Kim wrote: > On Fri, Jan 14, 2022 at 02:31:43PM +0100, David Hildenbrand wrote: >> On 12.01.22 21:41, Minchan Kim wrote: >>> On Wed, Jan 12, 2022 at 06:42:21PM +0100, David Hildenbrand wrote: >>>>>> >>>>>> What about something like: >>>>>> >>>>>> "mm: selective tracing of page reference holders on unref" >>>>>> >>>>>> PAGE_EXT_PIN_OWNER -> PAGE_EXT_TRACE_UNREF >>>>>> >>>>>> $whatever feature/user can then set the bit, for example, when migration >>>>>> fails. >>>>> >>>>> I couldn't imagine put_page tracking is generally useful except >>>>> migration failure. Do you have reasonable usecase in your mind >>>>> to make the feature general to be used? >>>> >>>> HWpoison etc. purposes maybe? Trace who still held a reference a page >>>> that was poisoned and couldn't be removed? Or in general, tracking >>> >>> I am not familiar with hwpoison so here dumb question goes: >>> Is that different one with __soft_offline_page? >>> It uses migrate_pages so current interface supports it with filter. >> >> __soft_offline_page() won't kill the target and try to migrate because >> the pages are about to be damaged and we can still access them. >> >> ordinary memory errors mean we kill the target because we cannot access >> the page anymore without triggering MCEs (or worse IIUC) again. >> >> So in my thinking, after memory_failure(), it could eventually be >> helpful to figure out who still has a (temporary) reference to such a >> broken page, even after killing the process. But that's just one idea I >> quickly came up with. > > > Thanks for the clarification. Is the trace best fit in the case? > Presumably you know the broken page, can't you find who owns the page > using /proc/pid/pagemap? > Furthermore, page_get/put operations commonly could happen in > different contexts regardless of page's owner so the trace from > different context is still helpful? > > If it's helpful, could you tell what you want to make the interface to > set the bit of broken page? For example, as-is case for page migration, > report_page_pinners is called to mark failed page at unmap_and_move. > Do you want to add something similar(maybe, set_page_ref_track under > rename) in memory-failure.c? > > It would be very helpful to design the feature's interface(surely, > naming as well) and write description to convince others "yeah, > sounds like so useful for the case and that's best fit than other way". I currently don't have time to explore this further, it was just a random thought how else this might be useful. >> >>> >>> echo "memory_failure" > $trace_dir/events/page_pin_owner/report_page_pinners/filter >>> >>>> references to something that should have a refcount of 0 because it >>>> should have been freed, but for some reason there are still references >>>> around? >>> >>> Sounds like you are talking about memory leak? What's the purpose >>> with trace, not using other existing tool to find memory leak? >>> >> >> IIRC, kmemleak can find objects that are no longer referenced, and we >> cannot track buddy allocations, but only kmalloc and friends. > > PageOwner is your good buddy. Page owner tracks who owns a page but not who else holds a reference (some buffer, gup, whatsoever), right? > >> >>>> >>>>> Otherwise, I'd like to have feature naming more higher level >>>>> to represent page migration failure and then tracking unref of >>>>> the page. In the sense, PagePinOwner John suggested was good >>>>> candidate(Even, my original naming PagePinner was worse) since >>>> >>>> Personally, I dislike both variants. >>>> >>>>> I was trouble to abstract the feature with short word. >>>>> If we approach "what feature is doing" rather than "what's >>>>> the feature's goal"(I feel the your suggestion would be close >>>>> to what feature is doing), I'd like to express "unreference on >>>>> migraiton failed page" so PAGE_EXT_UNMIGRATED_UNREF >>>>> (However, I prefer the feature naming more "what we want to achieve") >>>>> >>>> E.g., PAGE_EXT_TRACE_UNREF will trace unref to the page once the bit is >>>> set. The functionality itself is completely independent of migration >>>> failures. That's just the code that sets it to enable the underlying >>>> tracing for that specific page. >>> >>> I agree that make something general is great but I also want to avoid >>> create something too big from the beginning with just imagination. >>> So, I'd like to hear more concrete and appealing usecases and then >>> we could think over this trace approach is really the best one to >>> achieve the goal. Once it's agreed, the naming you suggested would >>> make sense. >> >> At least for me it's a lot cleaner if a feature clearly expresses what >> it actually does. Staring at PAGE_EXT_PIN_OWNER I initially had no clue. >> I was assuming we would actually track (not trace!) all active FOLL_PIN >> (not unref callers!). Maybe that makes it clearer why I'd prefer a >> clearer name. > > I totally agree PagePinOwner is not 100% straightforward. I'm open for > other better name. Currently we are discussing how we could generalize > and whether it's useful or not. Depending on the discussion, the design/ > interface as well as naming could be changed. No problem. PagePinOwner is just highly misleading. Because that's not what the feature does. Having that said, i hope we'll get other opinions as well. >> >>>> >>>> Makes sense, I was expecting the output to be large, but possible it's >>>> going to be way too large. >>>> >>>> Would it also make sense to track for a flagged page new taken >>>> references, such that you can differentiate between new (e.g., >>>> temporary) ones and previous ones? Feels like a reasonable addition. >>> >>> I actually tried it and it showed 2x times bigger output. >> >> Is 2x that bad? Or would it be worth making it configurable? > > For my goal, 2x was bad because I need to minimize the trace buffer. > Furthermore, the new get operation was not helpful but just noisy. > If some usecase is not enough with only put callsite, we add get > easily. Implementation is not hard. The matter is how it's useful > in real practice since we would expose the interface to the user, > I guess. Right. but it's a debugging feature after all, so I asume we care little about KABI. We can always add it on top, I agree. > >> >>> For me to debug CMA alloation failure, the new get_page callstack >>> after migration failure were waste since they repeated from lru >>> adding, isolate from the LRU something. Rather than get callsite, >>> I needed only put call sites since I could deduce where the pair-get >>> came from. >> >> Could maybe some filters help that exclude such LRU activity? > > For my case, the filter wouldn't be helpful because I shouldn't > miss the LRU activity since sometime they makes the allocation > failure. Thus, I could deduce the lru's get site from put callback. > get callsite is not helpful for my case but just wasted memory > and perf. > Makes sense. -- Thanks, David / dhildenb