From: Yu Zhao
Date: Wed, 6 Dec 2023 18:30:20 -0700
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
To: Henry Huang
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 谈鉴锋, 朱辉(茶水), akpm@linux-foundation.org
In-Reply-To: <951fb7edab535cf522def4f5f2613947ed7b7d28.1701853894.git.henry.hj@antgroup.com>

On Wed, Dec 6, 2023 at 5:51 AM Henry Huang wrote:
>
> Multi-Gen LRU page-table walker clears pte young flag, but it doesn't
> clear page idle flag. When we use /sys/kernel/mm/page_idle/bitmap to check
> whether one page is accessed, it would tell us this page is idle,
> but actually this page has been accessed.
>
> For those unmapped filecache pages, page idle flag would not be
> cleared in folio_mark_accessed if Multi-Gen LRU is enabled.
> So we couldn't use /sys/kernel/mm/page_idle/bitmap to check whether
> a filecache page is read or written.
>
> What's more, /sys/kernel/mm/page_idle/bitmap also clears pte young flag.
> If one page is accessed, it would set page young flag. Multi-Gen LRU
> page-table walker should check both page&pte young flags.
>
> how-to-reproduce-problem
>
> idle_page_track
>   a tool to track the memory a process accessed during a specific time
> usage
>   idle_page_track $pid $time
> how-it-works
>   1. scan process vma from /proc/$pid/maps
>   2. vfn --> pfn from /proc/$pid/pagemap
>   3. write /sys/kernel/mm/page_idle/bitmap to
>      mark phy page idle flag and clear pte young flag
>   4. sleep $time
>   5. read /sys/kernel/mm/page_idle/bitmap to
>      test_and_clear pte young flag and
>      return whether phy page is accessed
>
> test ---- test program
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/stat.h>
>
> int main(int argc, const char *argv[])
> {
> 	char *buf = NULL;
> 	char pipe_info[4096];
> 	int n;
> 	int fd = -1;
>
> 	buf = malloc(1024*1024*1024UL);
> 	if (!buf)
> 		goto out;
> 	memset(buf, 0, 1024*1024*1024UL);
> 	fd = open("access.pipe", O_RDONLY);
> 	if (fd < 0)
> 		goto out;
> 	while (1) {
> 		n = read(fd, pipe_info, sizeof(pipe_info));
> 		if (!n) {
> 			sleep(1);
> 			continue;
> 		} else if (n < 0) {
> 			break;
> 		}
> 		memset(buf, 0, 1024*1024*1024UL);
> 		puts("finish access");
> 	}
> out:
> 	if (fd >= 0)
> 		close(fd);
> 	if (buf)
> 		free(buf);
>
> 	return 0;
> }
>
> prepare:
> mkfifo access.pipe
> ./test
> ps -ef | grep test
> root 4106 3148 8 06:47 pts/0 00:00:01 ./test
>
> We use /sys/kernel/debug/lru_gen to simulate mglru page-table scan.
>
> case 1: mglru walker breaks page_idle
> ./idle_page_track 4106 60 &
> sleep 5; echo 1 > access.pipe
> sleep 5; echo '+ 8 0 6 1 1' > /sys/kernel/debug/lru_gen
>
> the output of idle_page_track is:
> Est(s) Ref(MB)
> 64.822 1.00
> only 1MB was found to be accessed during 64.822s, but actually 1024MB
> were accessed.
>
> case 2: page_idle breaks mglru walker
> echo 1 > access.pipe
> ./idle_page_track 4106 10
> echo '+ 8 0 7 1 1' > /sys/kernel/debug/lru_gen
> lru gen status:
> memcg 8 /user.slice
> node 0
> 5 772458 1065 9735
> 6 737435 262244 72
> 7 538053 1184 632
> 8 59404 6422 0
> almost all pages should be in the max_seq-1 queue, but actually they
> are not.
>
> Signed-off-by: Henry Huang

MGLRU was never intended to support page_idle/bitmap or
PG_idle/young because:
1. page_idle/bitmap isn't a capable interface at all -- yes, Google
   proposed the idea [1], but we don't really use it anymore because of
   its poor scalability.
2. PG_idle/young, being a boolean value, has poor granularity.

If anyone must use page_idle/bitmap for some specific reason, I'd
recommend exporting generation numbers instead.

[1] https://lore.kernel.org/cover.1426706637.git.vdavydov@parallels.com/
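
For readers unfamiliar with the interface under discussion, steps 2, 3
and 5 of the quoted "how-it-works" list come down to some index
arithmetic on two files. The following is only a rough sketch of that
arithmetic, not the idle_page_track tool from the patch (which was not
posted). The helper names are made up for illustration. The layout it
assumes is the documented one: 8-byte pagemap entries with the PFN in
bits 0-54 and "present" in bit 63, and a page_idle bitmap made of
8-byte words with one bit per PFN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the file layouts come from the kernel docs. */

#define PAGEMAP_PRESENT   (1ULL << 63)        /* page is present in RAM */
#define PAGEMAP_PFN_MASK  ((1ULL << 55) - 1)  /* bits 0-54 hold the PFN */

/* Byte offset in /proc/$pid/pagemap of the 8-byte entry covering vaddr. */
static inline uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * 8;
}

/*
 * Extract the PFN from a pagemap entry.
 * Returns 1 and stores the PFN if the page is present, 0 otherwise.
 * Note: without CAP_SYS_ADMIN the kernel reports the PFN bits as zero.
 */
static inline int pagemap_entry_to_pfn(uint64_t entry, uint64_t *pfn)
{
	if (!(entry & PAGEMAP_PRESENT))
		return 0;
	*pfn = entry & PAGEMAP_PFN_MASK;
	return 1;
}

/* Byte offset in /sys/kernel/mm/page_idle/bitmap of the word holding pfn. */
static inline uint64_t page_idle_offset(uint64_t pfn)
{
	return (pfn / 64) * 8;
}

/* Bit within that 8-byte word (native byte order) that stands for pfn. */
static inline uint64_t page_idle_mask(uint64_t pfn)
{
	return 1ULL << (pfn % 64);
}
```

Writing the mask at that offset marks the page idle; reading the same
word back after the sleep and testing the bit tells whether the page is
still considered idle. Each page gets exactly one bit, which is the
boolean granularity mentioned in point 2 above.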