From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3E52C47DD9 for ; Fri, 22 Mar 2024 02:41:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38CFC6B0083; Thu, 21 Mar 2024 22:41:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 33CC26B0087; Thu, 21 Mar 2024 22:41:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1B7E56B0088; Thu, 21 Mar 2024 22:41:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 07B726B0083 for ; Thu, 21 Mar 2024 22:41:43 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id C414BA0BF8 for ; Fri, 22 Mar 2024 02:41:42 +0000 (UTC) X-FDA: 81923124444.10.BB21CF6 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) by imf03.hostedemail.com (Postfix) with ESMTP id B4E8720004 for ; Fri, 22 Mar 2024 02:41:40 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="b+N/bT/t"; spf=pass (imf03.hostedemail.com: domain of ying.huang@intel.com designates 198.175.65.13 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711075301; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=YHgtP0UnPT5gOOO/oAckxZJiUw1BBAm2JnA8kLwutEc=; b=l3+lMCKVHpRf06iAhEtIyy33Pbok67bQ7OAtY0ZA+JhB+xvKlfocrtR1zlcMGbpsMr4FUj 8nv9z/2s4TJs+A3u9RjTJaZDcQxoQ74+wajhqlT7lBkArAd/N5FkJ3Oz3tWM7G2v+DXvv8 jq+iYlSyxkU4QyKWpMFw+MKXC6gZmT8= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711075301; a=rsa-sha256; cv=none; b=qm11ylmZunrhFvvVbZOeGJ6IgV8LIfebceJKixmaPas1h7+3zDEdajOsj0LHEVmh3iLXaS A6etSvBNucHrsvgEFiKvqLcZwqrtiERGsbXdeZqK/c6vDFtFQCJfaALoDObkEIpcCcwsJI VxOm2mFQoAuI97DbNvo5zrubx3nqvuM= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="b+N/bT/t"; spf=pass (imf03.hostedemail.com: domain of ying.huang@intel.com designates 198.175.65.13 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1711075301; x=1742611301; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version; bh=S+KJCnYbiOXWCaXB4wBl+59lZCLZQieFdwqcCOeL4J4=; b=b+N/bT/tylATqfggD8cbKSzxMX+h6Nd8VlcPV5LstzDOC60dpY/x9GBN M5DcRf61hvf4bze9JnM3IVtWVuzNTOXVz3+8icd0erdmsoF0MW7sSo0g1 Wr+UOEH7Xos/V9sbky7M/RVd3G3vCoAW2Xk8RXEwd4tCD5V/eEudoqgch c4jJp2bPCdjYq7QSDgUS7hQwhCCAZVEJNXPkiODTmt3deI6pUW2oc14Rq SwjwCYQbL+j8ktnED43dK1ot+9xKvHmNDymTomNhA7WcRnjHUKUIgMj30 rt4SjqJUbadE5KAT9j27lfm6hCPBbgZHCipXxQ925XOWCr9tj1LWNEXDC w==; X-IronPort-AV: E=McAfee;i="6600,9927,11020"; a="17258333" X-IronPort-AV: E=Sophos;i="6.07,144,1708416000"; d="scan'208";a="17258333" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2024 19:41:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,144,1708416000"; d="scan'208";a="19437709" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2024 19:41:36 -0700 From: "Huang, Ying" To: Ryan Roberts , "Paul E. McKenney" Cc: Andrew Morton , David Hildenbrand , Matthew Wilcox , Gao Xiang , Yu Zhao , Yang Shi , Michal Hocko , Kefeng Wang , Barry Song <21cnbao@gmail.com>, Chris Li , , Subject: Re: [PATCH v4 4/6] mm: swap: Allow storage of all mTHP orders In-Reply-To: (Ryan Roberts's message of "Thu, 21 Mar 2024 12:21:46 +0000") References: <20240311150058.1122862-1-ryan.roberts@arm.com> <20240311150058.1122862-5-ryan.roberts@arm.com> <87jzm751n3.fsf@yhuang6-desk2.ccr.corp.intel.com> <8734skryev.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Fri, 22 Mar 2024 10:39:42 +0800 Message-ID: <87plvnq9ap.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspamd-Queue-Id: B4E8720004 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: fzqqxighrsuqizhbhqsawrwx7u1gu8gr X-HE-Tag: 1711075300-22370 X-HE-Meta: U2FsdGVkX1/RvKJR4dwCfEuHuaRZ/0dy7dhJPF7IQBJzcs03VVz3bh7JOAASsqIFSILgP2hYlX0tFhgNcWwjzSQaX9S9Ct5wt0aYptgmaXzp3gOfQpgR9yBzO/4vcS/hP+Ggohlzf/O2WpYEETQE4FHWTK1KtysUSGvKJu6gP8LJGyAvLMdLBQhxDk8V2+wziUpHzCWE0jxd8ekHBwb9IGl1pDThVdWV2LmVbd4keYeQdeBaP6YVL1IMvsU1O9j7JDvkBNXbyuZLEtwzqf1AySEOxdD4sMPySZsgZ8uBn6+pX5rAJv2ZU1Qq1JyP6JTERY4MFYlbp7UByqQjSY6r3PkwjUBXHCwyAcbhTp2tcZCS2e9KJMHAJtvlWGuUQcA0OqGfJj4rsw9QeZb70NzPfFNW3vLkivTGdr2hcPj2Xk1K/3CBFAAGb+W1mup52k+6c83bfErTZ2dV39REA7R95+UQlzYaXGMcIyrlTXTad5IF/QPKqiuV3jPDwFX3TYqwMjBz9cnYKwtHf7YUMWqQiEievklOWNsWAcYvViwu0PR7gLDXkB3vKKVrIQZ9szaZEhz754LblwPKbDhcWZc3ysUxeKNh7AsdifU7oyUHcx8guq2XqjCtvKH5qIs/VgKfXKFwY/j1juysyVVeQ1a2RhQOzmXE0MGSlEVCGooOKwPlWwtpsWmjkRSeObkHcPORtL75+exCeKjgtxxaFHlxI3SAR8FEqIFKhZq7GSK9fekH/XbafCeXgSncJGYahkh7/xHArz5EpnQR/MSGF35GRm8cJ5wTTWycuaqwoNAyDRkUfn1IDjDsFIMFhXs5DwLQZ4ysQEQxjoKomOtkFwkglt3DYC3PIkj56hBdkA/9sF5eIlaN/56RjsD+78aLiGjhhSfG7tAf+nIjsazjsa9vpprJpHNJtqlg5GRws993JCjPjWLGVHj1vPxce9SiFx/uyQccQuMCYVIwr0Gf/UK 5YlXJpl6 KLAukb07MZOGRDmgXYexWL60BxFSaccczLmh6qwalWJycO/ZhpIb8DNGVgACvBQhJTNJ2WcZjrINL6S4XS3pwmAp+XBrd+0o4DVP9YzadfZdh321+sBFDqhzqoYrYMhMqBL7+qiW6pKn2A+8RbIq8YeH0Jng+0GKaT6UJSci44P778Bi6htVGlzNZDOEmWuw0DB1EuaVW3xJ5INQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Ryan Roberts writes: > On 21/03/2024 04:39, Huang, Ying wrote: >> Ryan Roberts writes: >> >>> Hi Huang, Ying, >>> >>> >>> On 12/03/2024 07:51, Huang, Ying wrote: >>>> Ryan Roberts writes: >>> [...] >>> >>>>> @@ -905,17 +961,18 @@ static int scan_swap_map_slots(struct swap_info_struct *si, >>>>> } >>>>> >>>>> if (si->swap_map[offset]) { >>>>> + VM_WARN_ON(order > 0); >>>>> unlock_cluster(ci); >>>>> if (!n_ret) >>>>> goto scan; >>>>> else >>>>> goto done; >>>>> } >>>>> - WRITE_ONCE(si->swap_map[offset], usage); >>>>> - inc_cluster_info_page(si, si->cluster_info, offset); >>>>> + memset(si->swap_map + offset, usage, nr_pages); >>>> >>>> Add barrier() here corresponds to original WRITE_ONCE()? >>>> unlock_cluster(ci) may be NOP for some swap devices. >>> >>> Looking at this a bit more closely, I'm not sure this is needed. Even if there >>> is no cluster, the swap_info is still locked, so unlocking that will act as a >>> barrier. There are a number of other callsites that memset(si->swap_map) without >>> an explicit barrier and with the swap_info locked. >>> >>> Looking at the original commit that added the WRITE_ONCE() it was worried about >>> a race with reading swap_map in _swap_info_get(). But that site is now annotated >>> with a data_race(), which will suppress the warning. And I don't believe there >>> are any places that read swap_map locklessly and depend upon observing ordering >>> between it and other state? So I think the si unlock is sufficient? >>> >>> I'm not planning to add barrier() here. Let me know if you disagree. >> >> swap_map[] may be read locklessly in swap_offset_available_and_locked() >> in parallel. IIUC, WRITE_ONCE() here is to make the writing take effect >> as early as possible there. > > Afraid I'm not convinced by that argument; if it's racing, it's racing - the It's not a race. > lockless side needs to be robust (it is). Adding the compiler barrier limits the > compiler's options which could lead to slower code in this path. If your > argument is that you want to reduce the window where > swap_offset_available_and_locked() could observe a free swap slot but then see > that its taken after it gets the si lock, that seems like a micro-optimization > to me, which we should avoid if we can. Yes. I think that it is a micro-optimization too. I had thought that it is a common practice to use WRITE_ONCE()/READ_ONCE() or barrier() in intentional racy data accessing to make the change available as soon as possible. But I may be wrong here. > By remnoving the WRITE_ONCE() and using memset, the lockless reader could > observe tearing though. I don't think that should cause a problem (because > everything is rechecked with under the lock), but if we want to avoid it, then > perhaps we just need to loop over WRITE_ONCE() here instead of using memset? IIUC, in practice that isn't necessary, because type of si->swap_map[] is "unsigned char". It isn't possible to tear "unsigned char". In theory, it may be better to use WRITE_ONCE() because we may change the type of si->swap_map[] at some time (who knows). I don't have a strong opinion here. >>>>> + add_cluster_info_page(si, si->cluster_info, offset, nr_pages); >>>>> unlock_cluster(ci); -- Best Regards, Huang, Ying