From: Marco Elver <elver@google.com>
Date: Tue, 13 Feb 2024 10:21:14 +0100
Subject: Re: [PATCH v8 2/5] mm,page_owner: Implement the tracking of the stacks count
To: Vlastimil Babka
Cc: Oscar Salvador, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko, Andrey Konovalov, Alexander Potapenko
References: <20240212223029.30769-1-osalvador@suse.de> <20240212223029.30769-3-osalvador@suse.de>

On Tue, 13 Feb 2024 at 10:16, Vlastimil Babka wrote:
>
> On 2/12/24 23:30, Oscar Salvador wrote:
> > page_owner needs to increment a stack_record refcount when a new allocation
> > occurs, and decrement it on a free operation.
> > In order to do that, we need to have a way to get a stack_record from a
> > handle.
> > Implement __stack_depot_get_stack_record() which just does that, and make
> > it public so page_owner can use it.
> >
> > Also implement {inc,dec}_stack_record_count(), which increments
> > or decrements on respective allocation and free operations, via
> > __reset_page_owner() (free operation) and __set_page_owner() (alloc
> > operation).
> >
> > Traversing all stackdepot buckets comes with its own complexity,
> > plus we would have to implement a way to mark only those stack_records
> > that originated from page_owner, as those are the ones we are
> > interested in.
> > For that reason, page_owner maintains its own list of stack_records,
> > because traversing that list is faster than traversing all buckets
> > while keeping the complexity low at the same time.
> > inc_stack_record_count() is responsible for adding new stack_records
> > to the stack_list.
> >
> > Modifications on the list are protected via a spinlock with irqs
> > disabled, since this code can also be reached from IRQ context.
> >
> > Signed-off-by: Oscar Salvador
> > ---
> >  include/linux/stackdepot.h |  9 +++++
> >  lib/stackdepot.c           |  8 +++++
> >  mm/page_owner.c            | 73 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 90 insertions(+)
> > ...
> >
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -36,6 +36,14 @@ struct page_owner {
> >  	pid_t free_tgid;
> >  };
> >
> > +struct stack {
> > +	struct stack_record *stack_record;
> > +	struct stack *next;
> > +};
> > +
> > +static struct stack *stack_list;
> > +static DEFINE_SPINLOCK(stack_list_lock);
> > +
> >  static bool page_owner_enabled __initdata;
> >  DEFINE_STATIC_KEY_FALSE(page_owner_inited);
> >
> > @@ -61,6 +69,57 @@ static __init bool need_page_owner(void)
> >  	return page_owner_enabled;
> >  }
> >
> > +static void add_stack_record_to_list(struct stack_record *stack_record)
> > +{
> > +	unsigned long flags;
> > +	struct stack *stack;
> > +
> > +	stack = kmalloc(sizeof(*stack), GFP_KERNEL);
>
> I doubt you can use GFP_KERNEL unconditionally? Think you need to pass down
> the gfp flags from __set_page_owner() here?
> And what about the alloc failure case, this will just leave the stack record
> unlinked forever? Can we somehow know which ones we failed to link, and try
> next time? Probably easier by not recording the stack for the page at all in
> that case, so the next attempt with the same stack looks like the very first
> again and attempts the add to list.
>
> Still not happy that these extra tracking objects are needed, but I guess
> the per-user stack depot instances I suggested would be a major change.
>
> > +	if (stack) {
> > +		stack->stack_record = stack_record;
> > +		stack->next = NULL;
> > +
> > +		spin_lock_irqsave(&stack_list_lock, flags);
> > +		if (!stack_list) {
> > +			stack_list = stack;
> > +		} else {
> > +			stack->next = stack_list;
> > +			stack_list = stack;
> > +		}
> > +		spin_unlock_irqrestore(&stack_list_lock, flags);
> > +	}
> > +}
> > +
> > +static void inc_stack_record_count(depot_stack_handle_t handle)
> > +{
> > +	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
> > +
> > +	if (stack_record) {
> > +		/*
> > +		 * New stack_records that do not use STACK_DEPOT_FLAG_GET start
> > +		 * with REFCOUNT_SATURATED to catch spurious increments of their
> > +		 * refcount.
> > +		 * Since we do not use the STACK_DEPOT_FLAG_{GET,PUT} API, let us
> > +		 * set a refcount of 1 ourselves.
> > +		 */
> > +		if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
> > +			refcount_set(&stack_record->count, 1);
>
> Isn't this racy? Shouldn't we use some atomic cmpxchg operation to change
> from REFCOUNT_SATURATED to 1?

If 2 threads race here, both will want to add it to the list as well and
take the lock. So this could just be solved with double-checked locking:

	if (count == REFCOUNT_SATURATED) {
		spin_lock(...);
		if (count == REFCOUNT_SATURATED) {
			refcount_set(.., 1);
			.. add to list ...
		}
		spin_unlock(..);
	}