From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Li <chrisl@kernel.org>
Date: Tue, 20 Feb 2024 08:32:13 -0800
Subject: Re: [PATCH v4] mm/swap: fix race when skipping swapcache
To: Kairui Song
Cc: Barry Song <21cnbao@gmail.com>, Andrew Morton, linux-mm@kvack.org,
 "Huang, Ying", Minchan Kim, Barry Song, Yu Zhao, SeongJae Park,
 David Hildenbrand, Hugh Dickins, Johannes Weiner, Matthew Wilcox,
 Michal Hocko, Yosry Ahmed, Aaron Lu, stable@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <20240219082040.7495-1-ryncsn@gmail.com> <20240219173147.3f4b50b7c9ae554008f50b66@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
On Mon, Feb 19, 2024 at 8:56 PM Kairui Song wrote:
>
> Hi Barry,
>
> > it might not be a problem for throughput. but for real-time and tail latency,
> > this hurts. For example, this might increase dropping frames of UI which
> > is an important parameter to evaluate performance :-)
> >
>
> That's a true issue. As Chris mentioned before, I think we need to
> come up with some clever data structure to solve this more naturally
> in the future; a similar issue exists for cached swapin as well and
> has been there for a while.
> On the other hand, I think maybe applications that
> are extremely latency sensitive should try to avoid swapping on fault? A
> swapin could cause other issues like reclaim, throttling, or contention
> with many other things; those seem to have a higher chance of occurring
> than this race.

Yes, I do think the best long-term solution is to have some clever data
structure that solves the synchronization issue and allows racing
threads to make forward progress at the same time.

I have also explored some (failed) synchronization ideas, for example
keeping a run-time swap entry refcount separate from the swap_map
count. BTW, the zswap entry->refcount behaves like that: it is separate
from the swap entry and manages the temporary run-time usage count held
by the function. However, that idea has its own problem: it needs an
xarray to track the run-time refcount of each swap entry (stored in the
xarray only when a CPU fails to get the SWAP_HAS_CACHE bit). When we
are done with the page fault, we still need to look up the xarray to
make sure there is no racing CPU and to drop the refcount held in the
xarray. That kind of defeats the purpose of avoiding the swap cache in
the first place: we still need to do the xarray lookup on the normal
path.

I came to realize that, while the current fix is not perfect (I still
wish we had a better solution that does not pause the racing CPU), it
stands better than leaving this data corruption issue unfixed, and the
patch remains relatively simple. Yes, it has latency issues, but
latency is still better than data corruption. It also doesn't stop us
from coming up with better solutions later on. Addressing the
synchronization in a way that does not block other CPUs will likely
require a much bigger change. Unless we have a better suggestion, this
seems the best among the alternatives so far.

Chris

>
> > BTW, I wonder if Ying's previous proposal - moving swapcache_prepare()
> > after swap_read_folio() - will further help decrease the number?
>
> We can move swapcache_prepare() after the folio allocation or cgroup
> charge, but I didn't see an observable change in the statistics; for
> some workloads the reading is even worse. I think that's mostly noise,
> or a higher swap-out rate, since all raced threads will now allocate an
> extra folio. Applications that have many pages swapped out due to a
> memory limit are already on the edge of triggering another reclaim, so
> a dozen more folio allocations could just trigger that...
>
> And we can't move it after swap_read_folio()... That's exactly what we
> want to protect.