From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01FA2D116F4 for ; Fri, 25 Oct 2024 05:41:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 475206B0082; Fri, 25 Oct 2024 01:41:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 425636B0083; Fri, 25 Oct 2024 01:41:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2ED7C6B0085; Fri, 25 Oct 2024 01:41:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 0F67E6B0082 for ; Fri, 25 Oct 2024 01:41:38 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id C7303C03B1 for ; Fri, 25 Oct 2024 05:41:16 +0000 (UTC) X-FDA: 82711026888.24.D26C196 Received: from mail-ot1-f43.google.com (mail-ot1-f43.google.com [209.85.210.43]) by imf30.hostedemail.com (Postfix) with ESMTP id 1843380003 for ; Fri, 25 Oct 2024 05:40:58 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Q7THIa5N; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of hughd@google.com designates 209.85.210.43 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729834782; a=rsa-sha256; cv=none; b=0pFgxz7YEDln8b2FOTlb6jHnzoFdZtVQKa2woP+eJwasGlEJSn72PHf/IBf2sOyNuIUl0K U4bdZOT3I8Veege1q6uhS7e25f1fL9FIA+wnShfMBUk3Q+aQMNVELE+VDdv6K7bfREtfYg 0E/wLm9dt4c5wqILWuOuDhEu1caXx3w= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Q7THIa5N; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of hughd@google.com designates 209.85.210.43 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729834782; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=U8gDvgJg3Z19JQm8nPvjNW9GLKb42/hZuKryzCAKj00=; b=Zl1sEG2LggRr+54JJQvaYZhIwVSV/nHplxKsjXqnseRJe6Gjqi8JrCpvAYAtbwKgV/QxfS rO3wd4wHO2PAkQPpij96IUnl1oOlXre8hnNFFp9bJZ1UK2jggojS+7IxGh2vceb/pejwer TAfT/Ra/sTScrEr0mhTMdxjK+ilnEpc= Received: by mail-ot1-f43.google.com with SMTP id 46e09a7af769-7180dc76075so915814a34.3 for ; Thu, 24 Oct 2024 22:41:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1729834894; x=1730439694; darn=kvack.org; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=U8gDvgJg3Z19JQm8nPvjNW9GLKb42/hZuKryzCAKj00=; b=Q7THIa5NdvNMuFNqxa3bssMnxzqQX1r6PWNhiwUBvjWqXDOWbzK9KfL6k8T65iio3R ghUOQQgDrVH6MrYg+gUmyU3rkF1xafyV2xllnSmdCXrAk5slrgWwsOMnr6mXlM/nuRog GnFtoYIoeqhbjjaH1Qgh6NfAhVyRRoLAZRGvFeFEcU7VTdwJlYa4VDGFLAev9mzhchMv R/elNS5sLu/sZUOivvGAzWzQgPi3s2yEg2o5VNA57cNUGFAkS03ZHepZ0ZXN73lNBC7A 4PSGsf0TROg9AMyQ04zDXPdaFI/2TP6NuQIiW+Hfbd+RCHAYSGVaYiRMKMrjrk4Sc9WR CiUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729834894; x=1730439694; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=U8gDvgJg3Z19JQm8nPvjNW9GLKb42/hZuKryzCAKj00=; b=C3h2RxbD/gssucM0HFcvg0F5aLlaX18qrpeuRs8uaXphfNoNu4+VTNURn7Tg8Yvk2D 9k9p9OFP5F/RUdmd1Onad4LSu45QSCNYZ6fzwWkA+VWNSDcB66NfgbRRPTXmZns1lESb 8/PwB7t4gkHAYboC6y9eKiVJyrR/yPSgPTWZqeC4g0ifU4CElauoBfUrUqbbVgkrRfL4 0UVnlFuJ0tdLCZTEociBbNhiGLrP2RS18+2l0Nd82n4WaUScNIboBSTW951wzfNmXsNq s6UUNQSOi8kUiTZ1LnIrw+Es2sCdulfQ4O8QvxJfpp/7rPQIv1LYm17Ad/CbHcZCcklt UHnw== X-Forwarded-Encrypted: i=1; AJvYcCX8N88e0XEM8hKGg0LdVGLI7SS+rKFKhND0dfr12tKDDl+R1Pu0K/WYc4+TDDodNY6nglB/cc/mnQ==@kvack.org X-Gm-Message-State: AOJu0YzrrhEcFmpDe30zmGe7XtMFKoaO2NJreEcuprJS8Fl04L1esTWE sfYx8ruSN78ZJ3plbiEROCECYtJD3hkR2OFtUl75AXyDazu0Ck//C7Vku8ILeQ== X-Google-Smtp-Source: AGHT+IH35LbnqZnz2MEhPcWo8pALPNMvotL/XItOiTeeEUgG53iEA89bzekf4xU3ZhH9A1wuQ24HeA== X-Received: by 2002:a05:6830:310c:b0:717:f864:90cb with SMTP id 46e09a7af769-7185940e4admr4008880a34.4.1729834894298; Thu, 24 Oct 2024 22:41:34 -0700 (PDT) Received: from darker.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-72057a3af59sm332273b3a.186.2024.10.24.22.41.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 24 Oct 2024 22:41:32 -0700 (PDT) Date: Thu, 24 Oct 2024 22:41:21 -0700 (PDT) From: Hugh Dickins To: Zi Yan cc: Hugh Dickins , Andrew Morton , Usama Arif , Yang Shi , Wei Yang , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Johannes Weiner , Baolin Wang , Barry Song , Kefeng Wang , Ryan Roberts , Nhat Pham , Chris Li , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH hotfix 1/2] mm/thp: fix deferred split queue not partially_mapped In-Reply-To: <3A1E5353-D8C5-4D38-A3FF-BFC671FC25CE@nvidia.com> Message-ID: <966a4aff-f587-c4bb-1e10-2673734c2aa0@google.com> References: <760237a3-69d6-9197-432d-0306d52c048a@google.com> <3A1E5353-D8C5-4D38-A3FF-BFC671FC25CE@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspamd-Queue-Id: 1843380003 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: pypf6rs4yib4kth57iehbi8p6h6eqccw X-HE-Tag: 1729834858-27948 X-HE-Meta: U2FsdGVkX18Zdlule1KOgTmBDroM2+Qo0vT5sA7lCj8M1HrfwzTJEkH5c5oSw9fc8puOTbpR0ruDcz02P5H4pSSHnYC+Qmip/V845lnoaHy2T561QXMmxB6phb6STIV4Rbna4QP5h+FUfyhy/0aUVvKaC28KdmXDJzDHNxim/iUSkqjpCkxYlelfWozLtmsknYAm6lslxclajAnohJhKE4V/CfCKu+wCpLPJZyF6HY20zcT0clBHnlyQJvigHqq4rqDLcZAzI+fivLlrL514PAjcdHqPyfUonG14W8kUlcBnTo6UTVmG7TbS6M3gULgCy7hiOZeewBptnxb1EM55gFhoGpdbT+u5Y6R/LpufMDTcX4xcHcZhCUktmxf8v8lvnugzJRtZbuarAd763YitOLwE8GiVJlFAJANBoRMyIGg2XZn/vgq+lZUWI/rbm54+vo9RlVEeuEDCFxkMtvTB2ldlFPJDKSv8sXLTGiDk4Td0z8wA9QgM4JDHpV8Z/YIhPnvRMUk2InKLCCkFLp6t9BehMUHkHsZNKxmM7P5eUXfzhb6QIeyCvBqZsXl3QHW0nHqevBrZjyGuuHnRz6mqLJnIctluZbPMP0Px9Z7b3E6RstMjyAzsjU8V9JcryQo5Zkz5GCHL4IjAJz6wXbx2dLpbZ4PNHy4NmP5WHIoLZcWnjpr2lbfnOhl+DQ/9l2c59I+VPZEnZ7zPmgn8oLGVNb6m+Mxevw7crq0xlMLZd8ANpiZulITpSlhzrgH+uD6T1AYD/K6zXWqs7QuqPGFQCpf3pM0GhOFaH3zO5ePzQhhTs0dLkt6raxpHSFIu6m2eyQZ7NnyOQOaRJjfW6kknTr594o6M89WSbpkiU2vC+djw3vSAWgzEV4j5V78bHX+gfLKqbUuXRiNihqMeEI+itDJXjTKZFnW+vV5jbtTwBMO18jF3M1zNlK4dam9ddnwUuuqjTj+yKnuYGnLLfeW MSWFEEaX LCWNIpx0x12Bg1/UmPaP3PsoHgRAaqWFY2ZavIQaXMcNNad4OKo9CT3cuqMN09RcBY0xzuDKQw8TwcZftqfb+TTraCYdccgEWPeYxxQ7tbH8tPnUFzZGeTf3Jp8lmTAssaJob6Uin37Q5Vi3wfyW5hidLOyPDeSQkEThz8KQHxLbEp8LRC0126Goz5fRmj8iL+PIz/viFU+UbzxnqJujTwCHlZxgJFbUmkgX8CKO+SZajk/FiNSK0uzGlFSG0k2G4YDTHO/1tZukaV2pvzQUKgmo+WsMnkMAM8L6zTZpOIo8AwVT6gB7algSA/yxquKMfzCl+W+fJ/Oju5MZkwul2IaKdJLjkEU38WaEoUrwhu55z8LDJmvhaIzu4tWRisLWx3oUsUOsXBEEet2ICBclk19MokIeuoR7+DDgjG1eV33mtS5rPcR1gmf1yTKG2EaT+FbObSVBOj40sG5v8dcB+g6lJSRILxOSgFqF/ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, 24 Oct 2024, Zi Yan wrote: > On 24 Oct 2024, at 0:10, Hugh Dickins wrote: > > > The new unlocked list_del_init() in deferred_split_scan() is buggy. > > I gave bad advice, it looks plausible since that's a local on-stack > > list, but the fact is that it can race with a third party freeing or > > migrating the preceding folio (properly unqueueing it with refcount 0 > > while holding split_queue_lock), thereby corrupting the list linkage. > > > > The obvious answer would be to take split_queue_lock there: but it has > > a long history of contention, so I'm reluctant to add to that. Instead, > > make sure that there is always one safe (raised refcount) folio before, > > by delaying its folio_put(). (And of course I was wrong to suggest > > updating split_queue_len without the lock: leave that until the splice.) > > I feel like this is not the right approach, since it breaks the existing > condition of changing folio->_deferred_list, namely taking > ds_queue->split_queue_lock for serialization. The contention might not be > as high as you think, since if a folio were split, the split_queue_lock > needed to be taken during split anyway. So the worse case is the same > as all folios are split. Do you see significant perf degradation due to > taking the lock when doing list_del_init()? > > I am afraid if we take this route, we might hit hard-to-debug bugs > in the future when someone touches the code. You have a good point: I am adding another element of trickiness to that already-tricky local-but-not-quite list - which has tripped us up a few times in the past. I do still feel that this solution is right in the spirit of that list; but I've certainly not done any performance measurement to justify it, nor would I ever trust my skill to do so. I just tried to solve the corruptions in what I thought was the best way. (To be honest, I found this solution to the corruptions first, and thought the bug went back to the original implemention: that its put_page() at the end of the loop was premature all along. It was only when writing the commit message two days ago, that I came to realize that even put_page() or folio_put() would be safely using the lock to unqueue: that it is only this new list_del_init() which is the exception which introduces the bug.) Looking at vmstats, I'm coming to believe that the performance advantage of this way is likely to be in the noise: that mTHPs alone, and the !partially_mapped case on top, are greatly increasing the split_deferred stats: and may give rise to renewed complaints of lock contention, with or without this optimization. While I still prefer to stick with what's posted and most tested, I am giving the locked version a run overnight. Thanks a lot for the reviews and acks everyone: at present Zi Yan is in the minority preferring a locked version, but please feel free to change your vote if you wish. Hugh