On 2022/10/20 02:48, Alex Zhu (Kernel) wrote:
>
>> On Oct 18, 2022, at 10:12 PM, Yu Zhao wrote:
>>
>> On Tue, Oct 18, 2022 at 9:42 PM wrote:
>>>
>>> From: Alexander Zhu
>>>
>>> Currently, when /sys/kernel/mm/transparent_hugepage/enabled=always
>>> is set there are a large number of transparent hugepages that are
>>> almost entirely zero filled.  This is mentioned in a number of
>>> previous patchsets including:
>>> https://lore.kernel.org/all/20210731063938.1391602-1-yuzhao@google.com/
>>> https://lore.kernel.org/all/1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com/
>>>
>>> Currently, split_huge_page() does not have a way to identify zero filled
>>> pages within the THP. Thus these zero pages get remapped and continue to
>>> create memory waste. In this patch, we identify and free tail pages that
>>> are zero filled in split_huge_page(). In this way, we avoid mapping
>>> these pages back into page table entries and can free up unused memory
>>> within THPs. This is based off the previously mentioned patchset by
>>> Yu Zhao.
>>
>> Hi Alex,
>>
>> Generally the process [1] to follow is that you keep my patches
>> separate from yours, rather than squash them into one, e.g., [2].
>>
>> [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html
>> [2] https://lore.kernel.org/linux-mm/cover.1665568707.git.christophe.leroy@csgroup.eu/
>>
>> Also it's a courtesy to cc Ning, since his approach is (very) similar
>> to yours. Naturally he would wonder if you are reinventing the wheel,
>> so you'd have to address it in your cover letter.
>
> Sorry about that. Will cc Ning as well in future iterations. I will
> split out the second patch into a few patches as well.
>
> This patchset differs from Ning's RFC in that we make use of list_lru
> and a shrinker, as discussed previously:
> https://lore.kernel.org/linux-mm/CAOUHufYeuMN9As58BVwMKSN6viOZKReXNeCBgGeeL6ToWGsEKw@mail.gmail.com/
>
> The approach is different, but we are fundamentally still cleaning up
> underutilized THPs (contain a large number of zero pages).
>

I used a shrinker in a previous version (see
https://gitee.com/anolis/cloud-kernel/commit/62f8852885cc7f23063886d36fd36d94b48d3982).
The problem with the shrinker is that it cannot control the number of
splits accurately. For example, I may only want to split two THPs to
avoid OOM, but the shrinker may split many THPs.

>>
>>> However, we chose to free anonymous zero tail pages whenever they are
>>> encountered instead of only on reclaim or migration.
>>
>> What are cases that are not on reclaim or migration?
>
> It would be any case where split_huge_page is called on anonymous
> memory. split_huge_page is also called from KSM and madvise. It can
> also be called from debugfs, which is what the self test relies on. We
> thought this implementation would be more generic. As far as I can
> tell there is no reason to keep zero pages around in anonymous THPs
> that have been split.
>
> We also handled remapping to a shared zero page on userfaultfd in a
> previous iteration. That is the only use case I am aware of where we
> do not want to zap the zero pages.
>>
>> As I've explained off the mailing list, it's likely a bug if you
>> really have one. And I don't think you do. I'm currently under the
>> impression that you have a slab shrinker, and slab shrinkers are on
>> the reclaim path.
>>
>> Thanks.
>
> This shrinker is not only for slabs. It’s for all anonymous THPs in
> physical memory. That’s why we needed to add list_lru_add_page and
> list_lru_delete_page as well, as list_lru_add/delete assumes slab
> objects.
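
For readers following the thread, the zero-filled tail page check that the patch description refers to can be illustrated with a small userspace sketch in C. This is not code from the patchset or from the kernel: the names subpage_is_zero and count_zero_tail_pages are hypothetical, and the 4 KiB base page and 512 subpages per THP are assumptions that match a common x86-64 configuration. It only shows the idea of scanning each tail subpage and marking the all-zero ones as candidates to drop rather than remap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE     4096u  /* assumed base page size */
#define SUBPAGES_PER_THP 512u   /* assumed: 2 MiB THP / 4 KiB base pages */

/* Return true if every byte of one subpage is zero. */
static bool subpage_is_zero(const unsigned char *page)
{
	for (size_t i = 0; i < SUBPAGE_SIZE; i++)
		if (page[i] != 0)
			return false;
	return true;
}

/*
 * Hypothetical helper: mark which tail subpages (index >= 1) of a
 * THP-sized buffer are zero-filled and could be dropped instead of
 * remapped on split. The head subpage (index 0) is always kept in
 * this sketch. Returns the number of droppable tail subpages.
 */
static size_t count_zero_tail_pages(const unsigned char *thp, bool *droppable)
{
	size_t n = 0;

	assert(thp != NULL && droppable != NULL);

	droppable[0] = false;
	for (size_t i = 1; i < SUBPAGES_PER_THP; i++) {
		droppable[i] = subpage_is_zero(thp + (size_t)i * SUBPAGE_SIZE);
		if (droppable[i])
			n++;
	}
	return n;
}
```

A caller would pass a buffer of SUBPAGES_PER_THP * SUBPAGE_SIZE bytes and a bool array of SUBPAGES_PER_THP entries. The real split path works on struct page and page table entries, so this sketch only mirrors the scanning logic, not the remapping or freeing.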