MIME-Version: 1.0
References: <20240709142312.372b20d49c6a97ecd2cd9904@linux-foundation.org> <20240710033212.36497-1-21cnbao@gmail.com> <9d77dc44-f61c-4e52-938f-c268daf0e169@redhat.com>
From: Barry Song <21cnbao@gmail.com>
Date: Wed, 10 Jul 2024 19:11:29 +1200
Subject: Re: [PATCH v7] mm: shrink skip folio mapped by an exiting process
To: zhiguojiang
Cc: David Hildenbrand, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, opensource.kernel@vivo.com, willy@infradead.org
Content-Type: text/plain; charset="UTF-8"

On Wed, Jul 10, 2024 at 6:47 PM zhiguojiang wrote:
>
> On 2024/7/10 12:44, Barry Song wrote:
> >
> > On Wed, Jul 10, 2024 at 4:04 PM David Hildenbrand wrote:
> >> On 10.07.24 06:02, Barry Song wrote:
> >>> On Wed, Jul 10, 2024 at 3:59 PM David Hildenbrand wrote:
> >>>> On 10.07.24 05:32, Barry Song wrote:
> >>>>> On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton wrote:
> >>>>>> On Tue, 9 Jul 2024 20:31:15 +0800 Zhiguo Jiang wrote:
> >>>>>>
> >>>>>>> The releasing process of the non-shared anonymous folio mapped solely by
> >>>>>>> an exiting process may go through two flows: 1) the anonymous folio is
> >>>>>>> first swapped out into swap space and transformed into a swp_entry
> >>>>>>> in shrink_folio_list; 2) then the swp_entry is released in the process
> >>>>>>> exiting flow. This results in high CPU load when releasing a
> >>>>>>> non-shared anonymous folio mapped solely by an exiting process.
> >>>>>>>
> >>>>>>> This is likely to happen when system memory is low while a process is
> >>>>>>> exiting, because the non-shared anonymous folio mapped solely by the
> >>>>>>> exiting process may be reclaimed by shrink_folio_list.
> >>>>>>>
> >>>>>>> With this patch, shrink skips the non-shared anonymous folio mapped
> >>>>>>> solely by an exiting process, and the folio is released directly in
> >>>>>>> the process exiting flow instead, which saves swap-out time and
> >>>>>>> alleviates the load of process exit.
> >>>>>> It would be helpful to provide some before-and-after runtime
> >>>>>> measurements, please. It's a performance optimization so please let's
> >>>>>> see what effect it has.
> >>>>> Hi Andrew,
> >>>>>
> >>>>> This was something I was curious about too, so I created a small test program
> >>>>> that allocates and continuously writes to 256MB of memory. Using QEMU, I set
> >>>>> up a small machine with only 300MB of RAM to trigger kswapd.
> >>>>>
> >>>>> qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
> >>>>>   -smp cpus=4 -cpu max \
> >>>>>   -m 300M -kernel arch/arm64/boot/Image
> >>>>>
> >>>>> The test program is killed at a random moment by its child process, to
> >>>>> trigger the case this patch targets.
> >>>>>
> >>>>> #include <signal.h>
> >>>>> #include <stdio.h>
> >>>>> #include <stdlib.h>
> >>>>> #include <string.h>
> >>>>> #include <sys/types.h>
> >>>>> #include <sys/wait.h>
> >>>>> #include <time.h>
> >>>>> #include <unistd.h>
> >>>>>
> >>>>> #define MEMORY_SIZE (256 * 1024 * 1024)
> >>>>>
> >>>>> unsigned char *memory;
> >>>>>
> >>>>> void allocate_and_write_memory(void)
> >>>>> {
> >>>>>     memory = (unsigned char *)malloc(MEMORY_SIZE);
> >>>>>     if (memory == NULL) {
> >>>>>         perror("malloc");
> >>>>>         exit(EXIT_FAILURE);
> >>>>>     }
> >>>>>
> >>>>>     /* keep dirtying the whole buffer so reclaim has to swap it out */
> >>>>>     while (1)
> >>>>>         memset(memory, 0x11, MEMORY_SIZE);
> >>>>> }
> >>>>>
> >>>>> int main(void)
> >>>>> {
> >>>>>     pid_t pid;
> >>>>>     srand(time(NULL));
> >>>>>
> >>>>>     pid = fork();
> >>>>>
> >>>>>     if (pid < 0) {
> >>>>>         perror("fork");
> >>>>>         exit(EXIT_FAILURE);
> >>>>>     }
> >>>>>
> >>>>>     if (pid == 0) {
> >>>>>         /* child: wait a random 10-20 seconds */
> >>>>>         int delay = (rand() % 10000) + 10000;
> >>>>>         usleep(delay * 1000);
> >>>>>
> >>>>>         /* kill parent when it is busy on swapping */
> >>>>>         kill(getppid(), SIGKILL);
> >>>>>         _exit(0);
> >>>>>     } else {
> >>>>>         allocate_and_write_memory();
> >>>>>
> >>>>>         /* not reached: the parent loops until the child kills it */
> >>>>>         wait(NULL);
> >>>>>
> >>>>>         free(memory);
> >>>>>     }
> >>>>>
> >>>>>     return 0;
> >>>>> }
> >>>>>
> >>>>> I tracked the number of folios that could be redundantly
> >>>>> swapped out by adding a simple counter as shown below:
> >>>>>
> >>>>> @@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
> >>>>>                 check_stable_address_space(vma->vm_mm)) &&
> >>>>>                 folio_test_swapbacked(folio) &&
> >>>>>                 !folio_likely_mapped_shared(folio)) {
> >>>>> +                   static long i, size;
> >>>>> +                   size += folio_size(folio);
> >>>>> +                   pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
> >>>>>                     pra->referenced = -1;
> >>>>>                     page_vma_mapped_walk_done(&pvmw);
> >>>>>                     return false;
> >>>>>
> >>>>>
> >>>>> This is what I have observed:
> >>>>>
> >>>>> / # /home/barry/develop/linux/skip_swap_out_test
> >>>>> [   82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
> >>>>> [   82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
> >>>>> [   82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
> >>>>> [   82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
> >>>>> [   82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
> >>>>> [   82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
> >>>>> ...
> >>>>> [   88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
> >>>>> [   88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
> >>>>> [   89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480
> >>>>>
> >>>>> I observed that this patch effectively skipped 6115 folios (index 0
> >>>>> through 6114, each either 4KB or 64KB mTHP), potentially avoiding up to
> >>>>> about 92MB (97,300,480 bytes) of swap-out while the process was exiting.
> >>>>>
> >>>>> Despite the numerous mistakes Zhiguo made in sending this patch, it is still
> >>>>> quite valuable. Please consider pulling his v9 into the mm tree for testing.
> >>>> BTW, we dropped the folio_test_anon() check, but what about shmem? They
> >>>> also do __folio_set_swapbacked()?
> >>> My point is that the purpose is to skip redundant swap-out; if a shmem
> >>> folio is mapped only once, it could also be skipped.
> >> But they won't necessarily get *freed* when unmapped. They might
> >> just continue living in tmpfs, where some other process might map
> >> them later?
> >>
> > You're correct. I overlooked this aspect, focusing on swap and thinking of shmem
> > solely in terms of swap.
> >
> >> IMHO, there is a big difference here between anon and shmem. (well,
(well, > >> anon_shmem would actually be different :) ) > > Even though anon_shmem behaves similarly to anonymous memory when > > releasing memory, it doesn't seem worth the added complexity? > > > > So unfortunately it seems Zhiguo still needs v10 to take folio_test_ano= n() > > back? Sorry for my bad, Zhiguo. > If folio_test_anon(folio) && folio_test_swapbacked(folio) condition is > used, can > it means that the folio is anonymous anther than shmem definitely? So doe= s > folio_likely_mapped_shared() need to be removed? No, shared memory (shmem) isn't necessarily shared, and private anonymous memory isn't necessarily unshared. There is no direct relationship between them. In the case of a fork, your private anonymous folio can be shared by two or more processes before CoW. > > > >> -- > >> Cheers, > >> > >> David / dhildenb > >> > > Thanks > > Barry > Thanks > Zhiguo >