From: Kairui Song <ryncsn@gmail.com>
Date: Wed, 27 Mar 2024 15:14:03 +0800
Subject: Re: [RFC PATCH 10/10] mm/swap: optimize synchronous swapin
To: "Huang, Ying"
Cc: linux-mm@kvack.org, Chris Li, Minchan Kim, Barry Song, Ryan Roberts,
 Yu Zhao, SeongJae Park, David Hildenbrand, Yosry Ahmed, Johannes Weiner,
 Matthew Wilcox, Nhat Pham, Chengming Zhou, Andrew Morton,
 linux-kernel@vger.kernel.org
In-Reply-To: <87r0fwmar4.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240326185032.72159-1-ryncsn@gmail.com>
 <20240326185032.72159-11-ryncsn@gmail.com>
 <87zfukmbwz.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87r0fwmar4.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Wed, Mar 27, 2024 at 2:49 PM Huang, Ying wrote:
>
> Kairui Song writes:
>
> > On Wed, Mar 27, 2024 at 2:24 PM Huang, Ying wrote:
> >>
> >> Kairui Song writes:
> >>
> >> > From: Kairui Song
> >> >
> >> > Interestingly, the major performance overhead of synchronous swapin
> >> > is actually from the workingset node update; that's because
> >> > synchronous swapin
> >>
> >> If it's the major overhead, why not make it the first optimization?
> >
> > This performance issue became much more obvious after doing the other
> > optimizations, and those optimizations are for swapin in general, not
> > only for synchronous swapin. That's also how I optimized things step
> > by step, so I kept my patch order...
> >
> > And it is easier to do this after patch 8/10, which introduces the new
> > interface for the swap cache.
> >
> >>
> >> > keeps adding single folios into an xa_node, making the node no
> >> > longer a shadow node, so it has to be removed from shadow_nodes;
> >> > then the folio is removed very shortly after, making the node a
> >> > shadow node again, so it has to be added back to shadow_nodes.
> >>
> >> The folio is removed only if should_try_to_free_swap() returns true?
> >>
> >> > Mark synchronous swapin folios with a special bit in the swap entry
> >> > embedded in folio->swap, as we still have some usable bits there.
> >> > Skip the workingset node update on insertion of such a folio
> >> > because it will be removed very quickly, and the removal will
> >> > trigger the update, ensuring the workingset info is eventually
> >> > consistent.
> >>
> >> Is this safe?  Is it possible for the shadow node to be reclaimed
> >> after the folio is added into the node and before it is removed?
> >
> > If an xa_node contains any non-shadow entry, it can't be reclaimed;
> > shadow_lru_isolate() will check for and skip such nodes in case of a
> > race.
>
> In shadow_lru_isolate(),
>
>         /*
>          * The nodes should only contain one or more shadow entries,
>          * no pages, so we expect to be able to remove them all and
>          * delete and free the empty node afterwards.
>          */
>         if (WARN_ON_ONCE(!node->nr_values))
>                 goto out_invalid;
>         if (WARN_ON_ONCE(node->count != node->nr_values))
>                 goto out_invalid;
>
> So, this isn't considered normal and will cause a warning now.
Yes, I added an exception in this patch:

-       if (WARN_ON_ONCE(node->count != node->nr_values))
+       if (WARN_ON_ONCE(node->count != node->nr_values && mapping->host != NULL))

The code is not a good final solution, but the idea might not be that bad.
list_lru provides many operations like LRU_ROTATE; we could even lazily
remove all the nodes as a general optimization, or add a threshold for
adding/removing a node from the LRU.

> >>
> >> If so, we may consider some other methods.  Make shadow_nodes
> >> per-cpu?
> >
> > That's also an alternative solution if there are other risks.
>
> This appears a more general and cleaner optimization.

I'm not sure whether synchronization between CPUs will add more burden,
because shadow nodes are globally shared and one node can be referenced by
multiple CPUs. I can have a try to see if this is doable. Maybe a per-cpu
batch is better, but synchronization might still be an issue.