Date: Wed, 6 Mar 2024 19:55:50 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Zi Yan
Cc: Ryan Roberts, Andrew Morton, linux-mm@kvack.org, Yang Shi, Huang Ying
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
In-Reply-To: <03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com>
References: <20240227174254.710559-1-willy@infradead.org>
 <20240227174254.710559-11-willy@infradead.org>
 <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com>
 <85cc26ed-6386-4d6b-b680-1e5fba07843f@arm.com>
 <36bdda72-2731-440e-ad15-39b845401f50@arm.com>
 <03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com>

On Wed, Mar 06, 2024 at 01:41:13PM -0500, Zi Yan wrote:
> I had a chat with willy on the deferred list mis-handling.  Current
> migration code (starting from commit 616b8371539a6 ("mm: thp: enable
> thp migration in generic path")) does not properly handle THP and mTHP
> on the deferred list.  So if the source folio is on the deferred list,
> after migration, the destination folio will not be.  But this seems a
> benign bug, since the opportunity of splitting a partially mapped
> THP/mTHP is gone.
>
> In terms of potential races, the source folio refcount is elevated
> before migration, so deferred_split_scan() can move the folio off the
> deferred_list, but cannot split it.  During folio_migrate_mapping(),
> when the folio is frozen, deferred_split_scan() cannot move the folio
> off the deferred_list to begin with.
>
> I am going to send a patch to fix the deferred_list handling in
> migration, but it seems not to be related to the bug in this email
> thread.

... IOW the source folio remains on the deferred list until its refcount
goes to 0, at which point we call folio_undo_large_rmappable() and remove
it from the deferred list.

A different line of enquiry might be the "else /* We lost race with
folio_put() */" branch in deferred_split_scan().  If somebody froze the
refcount, we can lose track of a deferred-split folio.  But I think
that's OK too.  The only places which freeze a folio are vmscan (about
to free it), folio_migrate_mapping() (discussed above), and page
splitting.  In none of these cases do we want to keep the folio on the
deferred split list, because we're either freeing it, migrating it or
splitting it.  Oh, and there's something in s390 that I can't be
bothered to look at.
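For reference, that branch lives in the first loop of
deferred_split_scan(); paraphrasing mm/huge_memory.c from memory here,
so treat the details as approximate:

        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
        /* Take a pin on each folio so it can't be freed under us */
        list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
                                                _deferred_list) {
                if (folio_try_get(folio)) {
                        list_move(&folio->_deferred_list, &list);
                } else {
                        /* We lost race with folio_put() */
                        list_del_init(&folio->_deferred_list);
                        ds_queue->split_queue_len--;
                }
                if (!--sc->nr_to_scan)
                        break;
        }
        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

A folio whose refcount already went to zero is unlinked right there,
under split_queue_lock, so that path is fine.  It's the success path,
where we move the folio onto the local list and then drop the lock,
that matters below.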
Hang on, I think I see it.  It is a race between folio freeing and
deferred_split_scan(), but page migration is absolved.  Look:

CPU 1: deferred_split_scan:
    spin_lock_irqsave(split_queue_lock)
    list_for_each_entry_safe()
        folio_try_get()
        list_move(&folio->_deferred_list, &list);
    spin_unlock_irqrestore(split_queue_lock)
    list_for_each_entry_safe() {
        folio_trylock() <- fails
        folio_put(folio);

CPU 2: folio_put:
    folio_undo_large_rmappable
        ds_queue = get_deferred_split_queue(folio);
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
        list_del_init(&folio->_deferred_list);

*** at this point CPU 1 is not holding the split_queue_lock; the folio
is on CPU 1's local list, which CPU 2 just corrupted by unlinking the
folio from it ***

Now anything can happen.  It's a pretty tight race that involves at
least two CPUs (CPU 2 might have been the one to have the folio locked
at the time CPU 1 called folio_trylock()).  But I definitely widened
the window by moving the decrement of the refcount and the removal from
the deferred list further apart.

OK, so what's the solution here?  Personally I favour using a
folio_batch in deferred_split_scan() to hold the folios that we're
going to try to split, instead of a linked list.  Other ideas that are
perhaps less intrusive?
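To show the shape I mean, something like the below.  This is a rough,
untested sketch of the scan loop only, not a patch; it reuses the
existing folio_batch helpers, and it glosses over whether a folio that
fails the trylock or the split should go back on the queue:

        struct folio_batch batch;
        struct folio *folio, *next;
        int i;

        folio_batch_init(&batch);
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
        list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
                                                _deferred_list) {
                if (folio_try_get(folio)) {
                        /*
                         * We hold a reference, so the folio can't be
                         * freed.  Unlink it entirely instead of moving
                         * it to a local list_head, so that a racing
                         * folio_put() -> folio_undo_large_rmappable()
                         * finds _deferred_list already empty and has
                         * nothing to corrupt.
                         */
                        list_del_init(&folio->_deferred_list);
                        ds_queue->split_queue_len--;
                        if (!folio_batch_add(&batch, folio))
                                break;
                } else {
                        /* We lost race with folio_put() */
                        list_del_init(&folio->_deferred_list);
                        ds_queue->split_queue_len--;
                }
                if (!--sc->nr_to_scan)
                        break;
        }
        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

        for (i = 0; i < folio_batch_count(&batch); i++) {
                folio = batch.folios[i];
                if (folio_trylock(folio)) {
                        if (!split_folio(folio))
                                split++;
                        folio_unlock(folio);
                }
                folio_put(folio);
        }

The batch is a fixed-size array of folio pointers rather than a list
threaded through the folios themselves, so nothing a concurrent
folio_put() does to _deferred_list can damage our local state.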