From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F568C54F61 for ; Tue, 19 Mar 2024 14:40:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7A88F6B0082; Tue, 19 Mar 2024 10:40:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 757C16B008A; Tue, 19 Mar 2024 10:40:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F8AF6B008C; Tue, 19 Mar 2024 10:40:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 512246B0082 for ; Tue, 19 Mar 2024 10:40:21 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1510E1211F0 for ; Tue, 19 Mar 2024 14:40:21 +0000 (UTC) X-FDA: 81914049042.11.5E67FAC Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf27.hostedemail.com (Postfix) with ESMTP id B71174001E for ; Tue, 19 Mar 2024 14:40:18 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1710859219; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1GqtNYte7QrrIm9cCDvngaVI1mfgniZJKrkD7xFS7Vw=; b=npNtJCM2moZTbb8PTSlR8gJeoA1cdwvgY1u3fLfU1VoOrBNRV16Udw8xkNoW9RlNwypC3S T+ZT9EgYNXDnaHHs6PpDDJ1YHPIrnUEhimI0gZlHFReoxbLvQR3gV3lb3RyQtPRmGtMHDg Ffpdn3b7wRIKUz6YEKDmlZ5lB4CWm1Y= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1710859219; a=rsa-sha256; cv=none; b=G1ObuusEvfahq1Ca9zDIL1INGbn7TJPalmlVwmvTgqJw1+pHZKW2lo2VMFiBQGML+KjjyE FhNMUOZvbj7i/LOL4xyf+1tCTJDnooLqAYfCRUruzTqf53Ild/kitm7KKn20MXrr4C1KHK 37sS7DojZS74GZM//YacOjIo7bewb7Y= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 18955106F; Tue, 19 Mar 2024 07:40:52 -0700 (PDT) Received: from [10.1.30.191] (XHFQ2J9959.cambridge.arm.com [10.1.30.191]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id F155D3F762; Tue, 19 Mar 2024 07:40:14 -0700 (PDT) Message-ID: <7df7fdcb-2c3e-43da-824e-586a348ae840@arm.com> Date: Tue, 19 Mar 2024 14:40:12 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 5/6] mm: vmscan: Avoid split during shrink_folio_list() Content-Language: en-GB To: Yin Fengwei , David Hildenbrand , "Huang, Ying" Cc: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , Andrew Morton , Matthew Wilcox , Gao Xiang , Yu Zhao , Yang Shi , Michal Hocko , Kefeng Wang , Barry Song <21cnbao@gmail.com>, Chris Li References: <20240311150058.1122862-1-ryan.roberts@arm.com> <20240311150058.1122862-6-ryan.roberts@arm.com> <1db242d3-5ff1-4ef5-b20a-578a317fa859@arm.com> <5cc147fa-492e-46c3-963d-2f6e6680dce9@arm.com> <878r2gwahg.fsf@yhuang6-desk2.ccr.corp.intel.com> <29f06352-d234-4f4d-b804-9560816ed055@intel.com> From: Ryan Roberts In-Reply-To: <29f06352-d234-4f4d-b804-9560816ed055@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Stat-Signature: 7tf38wwydg68qnbjy7jxxxg6f3odw6xe X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: B71174001E X-Rspam-User: X-HE-Tag: 1710859218-879618 X-HE-Meta: U2FsdGVkX1+L1VUlAJJ+Y71qT9GTgZLVVHyuyUkGveer+8hn4HRyFj5dgxtbQRiIvgBcZfjUcZIiHR3KaTWU3BtIk/3x+sL7BZ7rprpBRgGXThxJNT2voiH+S8RmQJnSKa1J6P053hyvktqGfaX8U0EzqZyzU9F9w1BZ410aQpG+dobCSDaQc5b4zWigpm1jB630jqN2OnLOIT/85mjY8hHR42CkhOfqiF8XS2WOSt7u/PCbQ3z0vwzpVEjyAl2P6WP824LIGWpJbYdyK5je4iPWdaNao4cGgDOOXTTKuBLRIZOm4/ckYv2KV06N3Y4UtXDNao2qrFmlfz4sUlqlyd6JJMBSjHth5e0hgF27lX2NR93ttDkiql80yTuWfb2BquuSRG7bbEqKxifFoVwX3O33tEdVyRcuWpZtjc1l6LEgmJYmfQ8ZFrMXWF3NpRLOGZWswV9WM8HE0OtuQ4PYd/Lji5lJz3earT6wBmrLq2Bw6J+QRPmmfMfEXqW0PnrAbetDEL2k9xjrVvaTwWauEqUg1eNvgm48XojyWwAKO07iEphN52NK7sEv3jEmgHjcQmQXM/wdQFNzcNMcJ51FwX+WqwSfBDTwjGtSRZNlWa1qpyAqi8OoL4eaCGSabJYp17zhfIB5plfCM3EGlocAgIfWVs+tZgvn0GQbuc7dui9eH8o2xn+izkIb9g0nwnuorTIrP53/w8TNDM1hhO6aM5PjU2MojTesXxQ1ITsZcHDlmcksPvnoe8IBv+fYzVkLEgMVZm+5VVj9e3W+Ok+nHGnaCpVl0HyqBQGEbpoBHAEYka++t2ms34Xptt00qE/MqyUdSer4V4Sk3JYk70QEx5PkeQQ3nHtsVSgFzWPDKFnkfCHFaVZAAZMw9jxufXPMXAAM5/BelVzC/AjSXWUI2WZGlYS8oQ3Hdxfwl2XRL/acfuiqRoSKTahOw/UeCzxlYe6gaKjLf1uoykf28Wg BjAQ4yvw PMeT/DoxVYOgJoVNTDEedTMM9tFBmM53ssN3vw6ETzcB5e2XReiAo4RAQXJh3wsZtfqMWAE+RFXEyo3xKnj0L/KefnUKtvjm25B4y2bFdysgFeU0RGZdaN0Y3/2owJFu4AxemLPXvM1cz9cuGHxA5IkzSTG0lrO9xgYEIM+30p2MRWf8wCxevh1eZuOwKYPmlvrzZ2W02l3jKUPs7K/a1TJKQZve2Uy2Uu4BQgSi+ojifXZpp8DMaUmKvGYkszj3ErH7OX8ve3C5VYQxWJD0CUGwmcJ+3lMuY9N5O9b9jd2Ef6s8hIR+53r53V811qaq9CG2/zPpI0LeFHIQpJIVN4uS4fGS+nThjhvEtHinXyH2Z/MpXNDOGr4MdgVyAMYpAuiwzb3zGO5/WSqShGLKGsIaGaA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 19/03/2024 02:20, Yin Fengwei wrote: > > > On 3/18/24 23:35, Ryan Roberts wrote: >> On 18/03/2024 10:05, David Hildenbrand wrote: >>> On 18.03.24 11:00, Yin, Fengwei wrote: >>>> >>>> >>>> On 3/18/2024 10:16 AM, Huang, Ying wrote: >>>>> Ryan Roberts writes: >>>>> >>>>>> Hi Yin Fengwei, >>>>>> >>>>>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>>>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>>>>> up for swapping in large folios in a future series. >>>>>>>>>> >>>>>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>>>>> uneccessarily. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Ryan Roberts >>>>>>>>>> --- >>>>>>>>>>      mm/vmscan.c | 9 +++++---- >>>>>>>>>>      1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>>>>> --- a/mm/vmscan.c >>>>>>>>>> +++ b/mm/vmscan.c >>>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct >>>>>>>>>> list_head >>>>>>>>>> *folio_list, >>>>>>>>>>                          if (!can_split_folio(folio, NULL)) >>>>>>>>>>                              goto activate_locked; >>>>>>>>>>                          /* >>>>>>>>>> -                     * Split folios without a PMD map right >>>>>>>>>> -                     * away. Chances are some or all of the >>>>>>>>>> -                     * tail pages can be freed without IO. >>>>>>>>>> +                     * Split partially mapped folios map >>>>>>>>>> +                     * right away. Chances are some or all >>>>>>>>>> +                     * of the tail pages can be freed >>>>>>>>>> +                     * without IO. >>>>>>>>>>                           */ >>>>>>>>>> -                    if (!folio_entire_mapcount(folio) && >>>>>>>>>> +                    if (!list_empty(&folio->_deferred_list) && >>>>>>>>>>                              split_folio_to_list(folio, >>>>>>>>>>                                      folio_list)) >>>>>>>>>>                              goto activate_locked; >>>>>>>>> >>>>>>>>> Not sure if we might have to annotate that with data_race(). >>>>>>>> >>>>>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>>>>> response. There are examples of checking if the deferred list is empty >>>>>>>> with and >>>>>>>> without data_race() in the code base. But list_empty() is implemented like >>>>>>>> this: >>>>>>>> >>>>>>>> static inline int list_empty(const struct list_head *head) >>>>>>>> { >>>>>>>>       return READ_ONCE(head->next) == head; >>>>>>>> } >>>>>>>> >>>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps >>>>>>>> not >>>>>>>> sufficient for KCSAN? >>>> I don't think READ_ONCE() can replace the lock. >> >> But it doesn't ensure we get a consistent value and that the compiler orders the >> load correctly. There are lots of patterns in the kernel that use READ_ONCE() >> without a lock and they don't use data_race() - e.g. ptep_get_lockless(). > They (ptep_get_lockless() and deferred_list) have different access pattern > (or race pattern) here. I don't think they are comparable. > >> >> It sounds like none of us really understand what data_race() is for, so I guess >> I'll just do a KCSAN build and invoke the code path to see if it complains. > READ_ONCE() in list_empty will shutdown the KCSAN also. OK, I found some time to run the test with KCSAN; nothing fires. But then I read the docs and looked at the code a bit. Documentation/dev-tools/kcsan.rst states: In an execution, two memory accesses form a *data race* if they *conflict*, they happen concurrently in different threads, and at least one of them is a *plain access*; they *conflict* if both access the same memory location, and at least one is a write. It also clarifies the READ_ONCE() is a "marked access". So we would have a data race if there was a concurrent, *plain* write to folio->_deferred_list.next. This can occur in a couple of places I believe, for example: deferred_split_folio() list_add_tail() __list_add() new->next = next; deferred_split_scan() list_move() list_add() __list_add() new->next = next; So if either partially deferred_split_folio() or deferred_split_scan() can run concurrently with shrink_folio_list(), for the same folio (I beleive both can can), then we have a race, and this list_empty() check needs to be protected with data_race(). The race is safe/by design, but it does need to be marked. I'll fix this in my next version. Thanks, Ryan > >> >> >>>> >>>>>>> >>>>>>> Yeah, there is only one use of data_race with that list. >>>>>>> >>>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>>>>>> deferred list"). >>>>>>> >>>>>>> Looks like that was added right in v1 of that change [1], so my best guess is >>>>>>> that it is not actually required. >>>>>>> >>>>>>> If not required, likely we should just cleanup the single user. >>>>>>> >>>>>>> [1] >>>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >>>>>> >>>>>> Do you have any recollection of why you added the data_race() markup? >>>>> >>>>> Per my understanding, this is used to mark that the code accesses >>>>> folio->_deferred_list without lock intentionally, while >>>>> folio->_deferred_list may be changed in parallel.  IIUC, this is what >>>>> data_race() is used for.  Or, my understanding is wrong? >>>> Yes. This is my understanding also. >>> >>> Why don't we have a data_race() in deferred_split_folio() then, before taking >>> the lock? >>> >>> It's used a bit inconsistently here. >>> >>