From: Kairui Song
Date: Tue, 21 Nov 2023 16:32:45 +0800
Subject: Re: [PATCH 05/24] mm/swap: move readahead policy checking into swapin_readahead
To: Chris Li
Cc: linux-mm@kvack.org, Andrew Morton, "Huang, Ying", David Hildenbrand, Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko, linux-kernel@vger.kernel.org
References: <20231119194740.94101-1-ryncsn@gmail.com> <20231119194740.94101-6-ryncsn@gmail.com>

Chris Li wrote on Tue, Nov 21, 2023 at 15:41:
>
> On Mon, Nov 20, 2023 at 10:35 PM Kairui Song wrote:
> >
> > Chris Li wrote on Tue, Nov 21, 2023 at 14:18:
> > >
> > > On Sun, Nov 19, 2023 at 11:48 AM Kairui Song wrote:
> > > >
> > > > From: Kairui Song
> > > >
> > > > This makes swapin_readahead a main entry for swapin pages,
> > > > prepare for optimizations in later commits.
> > > >
> > > > This also makes swapoff able to make use of readahead checking
> > > > based on entry. Swapping off a 10G ZRAM (lzo-rle) is faster:
> > > >
> > > > Before:
> > > > time swapoff /dev/zram0
> > > > real 0m12.337s
> > > > user 0m0.001s
> > > > sys 0m12.329s
> > > >
> > > > After:
> > > > time swapoff /dev/zram0
> > > > real 0m9.728s
> > > > user 0m0.001s
> > > > sys 0m9.719s
> > > >
> > > > And what's more, because now swapoff will also make use of no-readahead
> > > > swapin helper, this also fixed a bug for no-readahead case (eg. ZRAM):
> > > > when a process that swapped out some memory previously was moved to a new
> > > > cgroup, and the original cgroup is dead, swapoff the swap device will
> > > > make the swapped in pages accounted into the process doing the swapoff
> > > > instead of the new cgroup the process was moved to.
> > > >
> > > > This can be easily reproduced by:
> > > > - Setup a ramdisk (eg. ZRAM) swap.
> > > > - Create memory cgroup A, B and C.
> > > > - Spawn process P1 in cgroup A and make it swap out some pages.
> > > > - Move process P1 to memory cgroup B.
> > > > - Destroy cgroup A.
> > > > - Do a swapoff in cgroup C.
> > > > - Swapped in pages is accounted into cgroup C.
>
> In a strange way it makes sense to charge to C.
> Swap out == free up memory.
> Swap in == consume memory.
> C turn off swap, effectively this behavior will consume a lot of memory.
> C gets charged, so if the C is out of memory, it will punish C.
> C will not be able to continue swap in memory. The problem gets under control.

Yes, I think charging either C or B makes sense in its own way. To me,
though, the current behavior is kind of counter-intuitive.

Imagine there is a cgroup PC1 with child cgroups CC1 and CC2. A process
swaps out some memory in CC1, then moves to CC2, and CC1 is dying.
On swapoff the charge will be moved out of PC1...

And swapoff often happens in some unlimited admin cgroup or some
cgroup for management agents.

If PC1 has a memory limit, the process in it can easily breach the
limit: we will see a process that never left PC1 having a much higher
RSS than the limit of PC1/CC1/CC2.
And if there is a limit on the management agent cgroup, the agent
will be OOM-killed instead of the OOM happening in PC1.

Simply moving a process between child cgroups of the same parent
cgroup won't cause a similar issue; things only get weird when swapoff
is involved.

And actually, with multiple layers of swap, it's less risky to swapoff
a device, since the other swap devices can catch the overcommitted
memory.

Oh, and there is one more case I forgot to cover in this series:
moving a process is indeed something that doesn't happen very
frequently, but a process that runs in a cgroup, then exits and leaves
some shmem swapped out could be a common case.
The current behavior on swapoff will move these charges out of the
original parent cgroup too.

So maybe a more ideal solution for swapoff is: simply always charge a
dying cgroup's parent cgroup?

Maybe a sysctl/cmdline option could be introduced to control the
behavior.
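
To make the PC1/CC1/CC2 example above concrete, here is a toy Python model of the two charging choices. It is only a sketch, not kernel code: the `Cgroup` class, `charge_target()`, `simulate()`, the policy names "caller" and "parent", the "admin" cgroup name and the 100-page figure are all made up for illustration. "caller" models charging the cgroup that runs swapoff when the recorded cgroup is dying, and "parent" models the idea of charging the nearest live ancestor of the dying cgroup.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cgroup:
    """Toy memory cgroup: tracks only liveness and charged pages."""
    name: str
    parent: Optional["Cgroup"] = None
    dying: bool = False
    charged: int = 0  # pages charged here, including descendants

    def charge(self, pages: int) -> None:
        # Assumption: a charge is counted in the cgroup and all ancestors.
        node = self
        while node is not None:
            node.charged += pages
            node = node.parent


def charge_target(recorded: Cgroup, caller: Cgroup, policy: str) -> Cgroup:
    """Pick the cgroup to charge for pages swapped in by swapoff.

    recorded: cgroup that owned the pages at swap-out time.
    caller:   cgroup of the task running swapoff.
    policy:   "caller" or "parent" (hypothetical names, see lead-in).
    """
    if not recorded.dying:
        return recorded
    if policy == "caller":
        return caller
    if policy == "parent":
        node = recorded.parent
        while node is not None and node.dying:
            node = node.parent
        # Fall back to the caller if no live ancestor exists.
        return node if node is not None else caller
    raise ValueError(f"unknown policy: {policy}")


def simulate(policy: str, pages: int = 100) -> Dict[str, int]:
    """Swap out `pages` in CC1, let CC1 die, then swapoff from `admin`."""
    root = Cgroup("root")
    pc1 = Cgroup("PC1", parent=root)
    cc1 = Cgroup("CC1", parent=pc1, dying=True)
    cc2 = Cgroup("CC2", parent=pc1)
    admin = Cgroup("admin", parent=root)
    charge_target(cc1, admin, policy).charge(pages)
    return {cg.name: cg.charged for cg in (root, pc1, cc1, cc2, admin)}


if __name__ == "__main__":
    for policy in ("caller", "parent"):
        print(policy, simulate(policy))
```

In this model the "caller" policy leaves PC1 with nothing charged while the admin cgroup absorbs all 100 pages, so a limit on PC1 is bypassed. With the "parent" policy the pages stay accounted inside PC1, which matches the concern about limits described above.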