From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13419C32789 for ; Tue, 23 Aug 2022 16:39:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 42700940008; Tue, 23 Aug 2022 12:39:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3B117940007; Tue, 23 Aug 2022 12:39:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 229B1940008; Tue, 23 Aug 2022 12:39:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 0E69E940007 for ; Tue, 23 Aug 2022 12:39:22 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id CEC19815D6 for ; Tue, 23 Aug 2022 16:39:21 +0000 (UTC) X-FDA: 79831417722.14.FD55317 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf31.hostedemail.com (Postfix) with ESMTP id 51DD620035 for ; Tue, 23 Aug 2022 16:39:21 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E32C11F8CE; Tue, 23 Aug 2022 16:39:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1661272759; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=APF4S/U/sEkDmpY003a4wVnaH+ckSeuJHRoeO0TDF2k=; b=fH6HIcA5srJh5hliTl8beWCa/PuXGo+na3R/57GeaMcr5z5iuQxTJYdCWpans9gKqTKQBT jOCpOtyxbuIcMxUbjJYNDmgNUnijAi53TLznzoK+NJDGT9Bs5PUvgJSI5LquaFZFNh6No+ CrymUhhIeNQmigIFonq9Ya7kwTT9vmY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1661272759; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=APF4S/U/sEkDmpY003a4wVnaH+ckSeuJHRoeO0TDF2k=; b=+T0hfBttzpWjKXwdaGgsSMQJ92LlQPVoXXA8y10RPEM4ZlzSE7mVS73H5ldYhld5HiXbXZ MCwJpQcdYmk9gNCg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id AF15413AB7; Tue, 23 Aug 2022 16:39:19 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id yGwGKrcCBWOTOQAAMHmgww (envelope-from ); Tue, 23 Aug 2022 16:39:19 +0000 Message-ID: <9674adec-7973-398b-6367-8f9100782eb0@suse.cz> Date: Tue, 23 Aug 2022 18:39:19 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.1.2 Subject: Re: [PATCH 2/5] mm/slub: restrict sysfs validation to debug caches and make it safe Content-Language: en-US To: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Rongwei Wang , Christoph Lameter , Joonsoo Kim , David Rientjes , Pekka Enberg , Roman Gushchin , linux-mm@kvack.org, Sebastian Andrzej Siewior , Thomas Gleixner , Mike Galbraith References: <20220812091426.18418-1-vbabka@suse.cz> <20220812091426.18418-3-vbabka@suse.cz> From: Vlastimil Babka In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1661272761; a=rsa-sha256; cv=none; b=DkU15pg2nMBk8w/r1dHfx4dAp9lQTK0BObRKjC5w9bQCB/wOmQtw37bn5orZQ97IZlThG9 WLzloDejkG3E7SqY6vcNdcX6YKWcnVXdC5cNO+J42burTSw0VU9ZcivRWNwClMkd+9Fnqy Eajtg9QbvKF2EG1w5IY+fDYa9jDyE+I= ARC-Authentication-Results: i=1; imf31.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=fH6HIcA5; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=+T0hfBtt; spf=pass (imf31.hostedemail.com: domain of vbabka@suse.cz designates 195.135.220.29 as permitted sender) smtp.mailfrom=vbabka@suse.cz; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661272761; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=APF4S/U/sEkDmpY003a4wVnaH+ckSeuJHRoeO0TDF2k=; b=K2L4IyYpqoHbhwaWXoPGBs0/fPEieD5R9kVjbvy8/rhtrgLtYmBadOvdUTnjWIJxHRYhqv IlT1pj+wQUOc4DY3np64Sk4B8WBnSPdSHP3f8eJEAK+56x4AtVqDqCiUt104B+pNzFkRNV jCuPNKe6qfFtSyOUhA2pQtrZuv1W5V0= X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 51DD620035 Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=fH6HIcA5; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=+T0hfBtt; spf=pass (imf31.hostedemail.com: domain of vbabka@suse.cz designates 195.135.220.29 as permitted sender) smtp.mailfrom=vbabka@suse.cz; dmarc=none X-Stat-Signature: jucozoqexjmj6s4de7m5uwsryuhtw63g X-HE-Tag: 1661272761-554103 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 8/14/22 16:39, Hyeonggon Yoo wrote: > On Fri, Aug 12, 2022 at 11:14:23AM +0200, Vlastimil Babka wrote: >> Rongwei Wang reports [1] that cache validation triggered by writing to >> /sys/kernel/slab//validate is racy against normal cache >> operations (e.g. freeing) in a way that can cause false positive >> inconsistency reports for caches with debugging enabled. The problem is >> that debugging actions that mark object free or active and actual >> freelist operations are not atomic, and the validation can see an >> inconsistent state. >> >> For caches that do or don't have debugging enabled, additional races >> involving n->nr_slabs are possible that result in false reports of wrong >> slab counts. >> >> This patch attempts to solve these issues while not adding overhead to >> normal (especially fastpath) operations for caches that do not have >> debugging enabled. Such overhead would not be justified to make possible >> userspace-triggered validation safe. Instead, disable the validation for >> caches that don't have debugging enabled and make their sysfs validate >> handler return -EINVAL. >> >> For caches that do have debugging enabled, we can instead extend the >> existing approach of not using percpu freelists to force all alloc/free >> perations to the slow paths where debugging flags is checked and acted >> upon. There can adjust the debug-specific paths to increase n->list_lock >> coverage against concurrent validation as necessary. > > s/perations/operations OK >> @@ -1604,9 +1601,9 @@ static inline >> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {} >> >> static inline int alloc_debug_processing(struct kmem_cache *s, >> - struct slab *slab, void *object, unsigned long addr) { return 0; } >> + struct slab *slab, void *object) { return 0; } >> >> -static inline int free_debug_processing( >> +static inline void free_debug_processing( >> struct kmem_cache *s, struct slab *slab, >> void *head, void *tail, int bulk_cnt, >> unsigned long addr) { return 0; } > > IIRC As reported by bot on earlier patch, void function > should not return 0; OK >> +/* >> + * Called only for kmem_cache_debug() caches to allocate from a freshly >> + * allocated slab. Allocate a single object instead of whole freelist >> + * and put the slab to the partial (or full) list. >> + */ >> +static void *alloc_single_from_new_slab(struct kmem_cache *s, >> + struct slab *slab) >> +{ >> + int nid = slab_nid(slab); >> + struct kmem_cache_node *n = get_node(s, nid); >> + unsigned long flags; >> + void *object; >> + >> + spin_lock_irqsave(&n->list_lock, flags); >> + >> + object = slab->freelist; >> + slab->freelist = get_freepointer(s, object); >> + slab->inuse = 1; >> + >> + if (!alloc_debug_processing(s, slab, object)) { >> + /* >> + * It's not really expected that this would fail on a >> + * freshly allocated slab, but a concurrent memory >> + * corruption in theory could cause that. >> + */ >> + spin_unlock_irqrestore(&n->list_lock, flags); >> + return NULL; >> + } >> + > > Nit: spin_lock_irqsave() can be here as freshly allocated > slab has no other reference. Right. >> +out: >> + if (checks_ok) { >> + void *prior = slab->freelist; >> + >> + /* Perform the actual freeing while we still hold the locks */ >> + slab->inuse -= cnt; >> + set_freepointer(s, tail, prior); >> + slab->freelist = head; >> + >> + /* Do we need to remove the slab from full or partial list? */ >> + if (!prior) { >> + remove_full(s, n, slab); >> + } else if (slab->inuse == 0) { >> + remove_partial(n, slab); >> + stat(s, FREE_REMOVE_PARTIAL); >> + } >> + >> + /* Do we need to discard the slab or add to partial list? */ >> + if (slab->inuse == 0) { >> + slab_free = slab; >> + } else if (!prior) { >> + add_partial(n, slab, DEACTIVATE_TO_TAIL); >> + stat(s, FREE_ADD_PARTIAL); >> + } >> + } >> + >> + if (slab_free) { >> + /* >> + * Update the counters while still holding n->list_lock to >> + * prevent spurious validation warnings >> + */ >> + dec_slabs_node(s, slab_nid(slab_free), slab_free->objects); >> + } > > This looks good but maybe kmem_cache_shrink() can lead to > spurious validation warnings? Good catch, I'll fix that too. > > Otherwise looks good to me! > Thanks!