Date: Wed, 6 Mar 2024 21:55:24 +0000
From: Matthew Wilcox
To: Zi Yan
Cc: Ryan Roberts, Andrew Morton, linux-mm@kvack.org, Yang Shi, Huang Ying
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed

On Wed, Mar 06, 2024 at 07:55:50PM +0000, Matthew Wilcox wrote:
> Hang on, I think I see it. It is a race between folio freeing and
> deferred_split_scan(), but page migration is absolved.
> Look:
>
> CPU 1: deferred_split_scan:
> spin_lock_irqsave(split_queue_lock)
> list_for_each_entry_safe()
> folio_try_get()
> list_move(&folio->_deferred_list, &list);
> spin_unlock_irqrestore(split_queue_lock)
> list_for_each_entry_safe() {
> folio_trylock() <- fails
> folio_put(folio);
>
> CPU 2: folio_put:
> folio_undo_large_rmappable
> ds_queue = get_deferred_split_queue(folio);
> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> list_del_init(&folio->_deferred_list);
> *** at this point CPU 1 is not holding the split_queue_lock; the
> folio is on the local list. Which we just corrupted ***
>
> Now anything can happen. It's a pretty tight race that involves at
> least two CPUs (CPU 2 might have been the one to have the folio locked
> at the time CPU 1 called folio_trylock()). But I definitely widened
> the window by moving the decrement of the refcount and the removal from
> the deferred list further apart.
>
>
> OK, so what's the solution here? Personally I favour using a
> folio_batch in deferred_split_scan() to hold the folios that we're
> going to try to remove instead of a linked list. Other ideas that are
> perhaps less intrusive?

I looked at a few options, but I think we need to keep the refcount
elevated until we've got the folios back on the deferred split list.
And we can't call folio_put() while holding the split_queue_lock or
we'll deadlock. So we need to maintain a list of folios that isn't
linked through _deferred_list. Anyway, this is basically untested,
except that it compiles.

Opinions? Better patches?

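To make the idea concrete outside the kernel, here is a rough userspace
sketch of "hold the pinned folios in an array instead of threading them
through the list node". Everything in it is a hypothetical stand-in
(toy_folio, toy_batch, toy_collect() and friends, plus a hand-rolled
list_head); none of it is the real folio_batch or list.h API, and there
is no locking or real concurrency. It only shows that once the local
collection is an array, a late list_del_init() on a folio's node cannot
corrupt it.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins only; these are NOT the kernel definitions. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and leave the node self-linked, so a repeat call is a no-op. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct toy_folio {
	int refcount;
	struct list_head deferred;	/* plays the role of _deferred_list */
};

#define TOY_BATCH_SIZE 4
struct toy_batch {
	unsigned int nr;
	struct toy_folio *folios[TOY_BATCH_SIZE];
};

#define toy_entry(p) \
	((struct toy_folio *)((char *)(p) - offsetof(struct toy_folio, deferred)))

/* Returns the number of free slots left; 0 means the batch is now full. */
static unsigned int toy_batch_add(struct toy_batch *b, struct toy_folio *f)
{
	b->folios[b->nr++] = f;
	return TOY_BATCH_SIZE - b->nr;
}

/* Would run under the queue lock: pin and unlink up to one batch. */
static void toy_collect(struct list_head *queue, struct toy_batch *b)
{
	b->nr = 0;
	while (!list_empty(queue)) {
		struct toy_folio *f = toy_entry(queue->next);

		f->refcount++;
		list_del_init(&f->deferred);
		if (toy_batch_add(b, f) == 0)
			break;
	}
}

/* Would run under the queue lock: put every batched folio back. */
static void toy_requeue(struct list_head *queue, struct toy_batch *b)
{
	for (unsigned int i = 0; i < b->nr; i++)
		list_add_tail(&b->folios[i]->deferred, queue);
}

/* Would run with the lock dropped: release the pins from toy_collect(). */
static void toy_release(struct toy_batch *b)
{
	for (unsigned int i = 0; i < b->nr; i++)
		b->folios[i]->refcount--;
	b->nr = 0;
}

/* Walks the scenario once and returns the final queue length. */
int toy_demo(void)
{
	struct list_head queue, *pos;
	struct toy_folio f[3];
	struct toy_batch batch;
	int len = 0;

	list_init(&queue);
	for (int i = 0; i < 3; i++) {
		f[i].refcount = 1;
		list_init(&f[i].deferred);
		list_add_tail(&f[i].deferred, &queue);
	}

	toy_collect(&queue, &batch);
	assert(batch.nr == 3 && list_empty(&queue));

	/* "CPU 2": a late list_del_init() hits a self-linked node. */
	list_del_init(&f[1].deferred);
	assert(batch.folios[1] == &f[1]);	/* the batch is untouched */

	toy_requeue(&queue, &batch);
	toy_release(&batch);
	for (int i = 0; i < 3; i++)
		assert(f[i].refcount == 1);

	for (pos = queue.next; pos != &queue; pos = pos->next)
		len++;
	return len;
}
```

Calling toy_demo() from a main() walks through collect, stray unlink,
requeue and release in that order. The ordering mirrors the constraint
above: the pins are only dropped after the folios are back on the queue,
and the drop happens in a separate step that would sit outside the lock.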
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fd745bcc97ff..0120a47ea7a1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3312,7 +3312,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
+	struct folio_batch batch;
 	struct folio *folio, *next;
 	int split = 0;
 
@@ -3321,37 +3321,41 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		ds_queue = &sc->memcg->deferred_split_queue;
 #endif
 
+	folio_batch_init(&batch);
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 							_deferred_list) {
-		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
-			/* We lost race with folio_put() */
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+		if (!folio_try_get(folio))
+			continue;
+		if (!folio_trylock(folio))
+			continue;
+		list_del_init(&folio->_deferred_list);
+		if (folio_batch_add(&batch, folio) == 0) {
+			--sc->nr_to_scan;
+			break;
 		}
 		if (!--sc->nr_to_scan)
 			break;
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
-		if (!folio_trylock(folio))
-			goto next;
-		/* split_huge_page() removes page from list on success */
+	while ((folio = folio_batch_next(&batch)) != NULL) {
 		if (!split_folio(folio))
 			split++;
 		folio_unlock(folio);
-next:
-		folio_put(folio);
 	}
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
+	while ((folio = folio_batch_next(&batch)) != NULL) {
+		if (!folio_test_large(folio))
+			continue;
+		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
+	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
+	folios_put(&batch);
+
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
 	 * This can happen if pages were freed under us.