From: Marco Elver <elver@google.com>
Date: Tue, 13 Feb 2024 10:21:14 +0100
Subject: Re: [PATCH v8 2/5] mm,page_owner: Implement the tracking of the stacks count
To: Vlastimil Babka
Cc: Oscar Salvador, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko, Andrey Konovalov, Alexander Potapenko
References: <20240212223029.30769-1-osalvador@suse.de> <20240212223029.30769-3-osalvador@suse.de>

On Tue, 13 Feb 2024 at 10:16, Vlastimil Babka wrote:
>
> On 2/12/24 23:30, Oscar Salvador wrote:
> > page_owner needs to increment a stack_record refcount when a new allocation
> > occurs, and decrement it on a free operation.
> > In order to do that, we need to have a way to get a stack_record from a
> > handle.
> > Implement __stack_depot_get_stack_record() which just does that, and make
> > it public so page_owner can use it.
> >
> > Also implement {inc,dec}_stack_record_count(), which increments
> > or decrements on respective allocation and free operations, via
> > __reset_page_owner() (free operation) and __set_page_owner() (alloc
> > operation).
> >
> > Traversing all stackdepot buckets comes with its own complexity,
> > plus we would have to implement a way to mark only those stack_records
> > that originated from page_owner, as those are the ones we are
> > interested in.
> > For that reason, page_owner maintains its own list of stack_records,
> > because traversing that list is faster than traversing all buckets
> > while keeping the complexity low at the same time.
> > inc_stack_record_count() is responsible for adding new stack_records
> > to the stack_list.
> >
> > Modifications on the list are protected via a spinlock with irqs
> > disabled, since this code can also be reached from IRQ context.
> >
> > Signed-off-by: Oscar Salvador
> > ---
> >  include/linux/stackdepot.h |  9 +++++
> >  lib/stackdepot.c           |  8 +++++
> >  mm/page_owner.c            | 73 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 90 insertions(+)
> > ...
> >
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -36,6 +36,14 @@ struct page_owner {
> >  	pid_t free_tgid;
> >  };
> >
> > +struct stack {
> > +	struct stack_record *stack_record;
> > +	struct stack *next;
> > +};
> > +
> > +static struct stack *stack_list;
> > +static DEFINE_SPINLOCK(stack_list_lock);
> > +
> >  static bool page_owner_enabled __initdata;
> >  DEFINE_STATIC_KEY_FALSE(page_owner_inited);
> >
> > @@ -61,6 +69,57 @@ static __init bool need_page_owner(void)
> >  	return page_owner_enabled;
> >  }
> >
> > +static void add_stack_record_to_list(struct stack_record *stack_record)
> > +{
> > +	unsigned long flags;
> > +	struct stack *stack;
> > +
> > +	stack = kmalloc(sizeof(*stack), GFP_KERNEL);
>
> I doubt you can use GFP_KERNEL unconditionally? Think you need to pass down
> the gfp flags from __set_page_owner() here?
> And what about the alloc failure case, this will just leave the stack record
> unlinked forever? Can we somehow know which ones we failed to link, and try
> next time? Probably easier by not recording the stack for the page at all in
> that case, so the next attempt with the same stack looks like the very first
> again and attempts the add to list.
>
> Still not happy that these extra tracking objects are needed, but I guess
> the per-user stack depot instances I suggested would be a major change.
>
> > +	if (stack) {
> > +		stack->stack_record = stack_record;
> > +		stack->next = NULL;
> > +
> > +		spin_lock_irqsave(&stack_list_lock, flags);
> > +		if (!stack_list) {
> > +			stack_list = stack;
> > +		} else {
> > +			stack->next = stack_list;
> > +			stack_list = stack;
> > +		}
> > +		spin_unlock_irqrestore(&stack_list_lock, flags);
> > +	}
> > +}
> > +
> > +static void inc_stack_record_count(depot_stack_handle_t handle)
> > +{
> > +	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
> > +
> > +	if (stack_record) {
> > +		/*
> > +		 * New stack_records that do not use STACK_DEPOT_FLAG_GET start
> > +		 * with REFCOUNT_SATURATED to catch spurious increments of their
> > +		 * refcount.
> > +		 * Since we do not use the STACK_DEPOT_FLAG_{GET,PUT} API, let us
> > +		 * set a refcount of 1 ourselves.
> > +		 */
> > +		if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
> > +			refcount_set(&stack_record->count, 1);
>
> Isn't this racy? Shouldn't we use some atomic cmpxchg operation to change
> from REFCOUNT_SATURATED to 1?

If 2 threads race here, both will want to add it to the list as well and
take the lock. So this could just be solved with double-checked locking:

	if (count == REFCOUNT_SATURATED) {
		spin_lock(...);
		if (count == REFCOUNT_SATURATED) {
			refcount_set(.., 1);
			.. add to list ...
		}
		spin_unlock(..);
	}