* [BUG] ZSwap leaks memory upon being disabled
From: Konstantin Kharlamov @ 2024-10-24 13:02 UTC
To: linux-mm

When ZSWAP is disabled, the `Zswap` and `Zswapped` rows in meminfo are still non-zero. IOW, ZSWAP doesn't free memory upon being disabled.

Stumbled upon this while trying to figure out where ≈4G of my SWAP memory disappeared to. I've been seeing some unknown memory in SWAP for years, and now I suspect ZSWAP might be the culprit. But there is no way to know for sure because of this bug.

# Steps to reproduce

1. Enable ZSWAP
2. Wait for `grep Zswap /proc/meminfo` to become non-zero
3. Disable ZSWAP via `sudo sh -c "echo 0 > /sys/module/zswap/parameters/enabled"`
4. Look at `grep Zswap /proc/meminfo`

## Expected

The rows are zero because ZSWAP is disabled.

## Actual

The rows don't change.

# Additional information

Kernel: 6.11.3
OS: Archlinux
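
The check in steps 2 and 4 of the report can be scripted so that the before and after values are easy to compare. The following is only a minimal sketch: it assumes the usual `Name: value kB` layout of `/proc/meminfo`, and the function name `zswap_rows` is hypothetical, not something from the thread.

```python
def zswap_rows(meminfo_text):
    """Return the Zswap and Zswapped rows (in kB) found in /proc/meminfo text."""
    rows = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("Zswap", "Zswapped"):
            # The first token after the colon is the value; meminfo reports kB.
            rows[name] = int(rest.split()[0])
    return rows
```

Calling it as `zswap_rows(open("/proc/meminfo").read())` once before and once after step 3 gives two dictionaries to compare.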
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Yosry Ahmed @ 2024-10-24 20:47 UTC
To: Konstantin Kharlamov; +Cc: linux-mm, Johannes Weiner, Nhat Pham, Chengming Zhou

On Thu, Oct 24, 2024 at 6:02 AM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
> # Steps to reproduce
>
> 1. Enable ZSWAP
> 2. Wait for `grep Zswap /proc/meminfo` to become non-zero
> 3. Disable ZSWAP via `sudo sh -c "echo 0 > /sys/module/zswap/parameters/enabled"`
> 4. Look at `grep Zswap /proc/meminfo`
>
> ## Expected
>
> The rows are zero because ZSWAP is disabled.

Not really, the expected behavior is that further swapouts will not go to zswap, but pages that are already compressed in zswap will not be written out to the backing swapfile or swapped back to memory. A swapoff would be required for the latter.

This is documented in: https://docs.kernel.org/admin-guide/mm/zswap.html#overview.
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Konstantin Kharlamov @ 2024-10-25 6:41 UTC
To: Yosry Ahmed; +Cc: linux-mm, Johannes Weiner, Nhat Pham, Chengming Zhou

On Thu, 2024-10-24 at 13:47 -0700, Yosry Ahmed wrote:
> Not really, the expected behavior is that further swapouts will not go
> to zswap, but pages that are already compressed in zswap will not be
> written out to the backing swapfile or swapped back to memory. A
> swapoff would be required for the latter.
>
> This is documented in:
> https://docs.kernel.org/admin-guide/mm/zswap.html#overview.

Oh, I see, thank you, sorry for the noise.

Then, I'm curious, is it correct to assume that this `Zswap`-prefixed memory mentioned in meminfo is never the one that is in SWAP? I mean, Zswap being a buffer before data goes to swap kind of implies that yes, the data is *either* in zswap or in swap. But I just wanted to hear that explicitly.

The background to my question is that I'm trying to find the culprit behind some "phantom memory" that eventually fills up my SWAP. This memory is not accounted to apps (as calculated via `smem`), nor to tmpfs. So my next suspect was something related to ZSwap.
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Yosry Ahmed @ 2024-10-25 7:50 UTC
To: Konstantin Kharlamov; +Cc: linux-mm, Johannes Weiner, Nhat Pham, Chengming Zhou

On Thu, Oct 24, 2024 at 11:41 PM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
> Then, I'm curious, is it correct to assume that this `Zswap`-prefixed
> memory mentioned in meminfo is never the one that is in SWAP? I mean,
> Zswap being a buffer before data goes to swap kind of implies that yes,
> the data is *either* in zswap or in swap. But I just wanted to hear that
> explicitly.

I know this makes sense, but unfortunately no. Zswap is currently transparent to the rest of the system. For all intents and purposes, pages in zswap are considered to be in swap. You cannot even use zswap without an actual swapfile. So the zswap stats should be a subset of the swap stats.

FWIW, Nhat is working on restructuring this to have zswap be its own entity, separate from any swapfiles.

> The background to my question is that I'm trying to find the culprit
> behind some "phantom memory" that eventually fills up my SWAP. This
> memory is not accounted to apps (as calculated via `smem`), nor to
> tmpfs. So my next suspect was something related to ZSwap.

As I mentioned, zswap should be transparent to the rest of the system, so it shouldn't make a difference in this case whether the pages are in zswap or in the swapfile.

You can use the memory.swap.current counter to find out which memory cgroup currently has swapped out pages (in zswap or in the swapfile). This should help find the application that has memory in swap. If you want to find the exact type of memory (e.g. anon vs tmpfs), that would be more tricky. Perhaps you can swapoff and see what counters increase in memory.stat of the relevant memory cgroup?
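
The suggestion to look at memory.swap.current per cgroup can be turned into a small ranking script. This is a sketch under assumptions, not a tool mentioned in the thread: it assumes a cgroup v2 hierarchy mounted at `/sys/fs/cgroup`, and the function name `swap_by_cgroup` is hypothetical.

```python
import os


def swap_by_cgroup(root="/sys/fs/cgroup"):
    """Return (cgroup path, bytes) pairs sorted by memory.swap.current, largest first."""
    usage = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.swap.current" not in filenames:
            continue  # e.g. the root cgroup, or a cgroup without the memory controller
        try:
            with open(os.path.join(dirpath, "memory.swap.current")) as f:
                usage[os.path.relpath(dirpath, root)] = int(f.read())
        except OSError:
            pass  # the cgroup was removed while we were walking the tree
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)
```

The counters are hierarchical, so a parent such as `user.slice` already includes its children; compare siblings with each other rather than adding up every row.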
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Konstantin Kharlamov @ 2024-10-26 11:33 UTC
To: Yosry Ahmed; +Cc: linux-mm, Johannes Weiner, Nhat Pham, Chengming Zhou

On Fri, 2024-10-25 at 00:50 -0700, Yosry Ahmed wrote:
> You can use the memory.swap.current counter to find out which memory
> cgroup currently has swapped out pages (in zswap or in the swapfile).
> This should help find the application that has memory in swap. If you
> want to find the exact type of memory (e.g. anon vs tmpfs), that would
> be more tricky. Perhaps you can swapoff and see what counters increase
> in memory.stat of the relevant memory cgroup?

Thank you. So, I've waited till my SWAP got almost full again (apparently my new workflow triggers that a lot). It is 7.5G out of 8 in total. 437M is taken by tmpfs'es, let's subtract that for simplicity, so I have 7G taken by something else.

Now I'm looking at `/sys/fs/cgroup/user.slice/memory.swap.current` and it's 4422422528 = 4.1G. That's a lot less than 7G. I'm certain this "phantom swap memory" is hidden in `user.slice`, because if I wait till the OOM-killer gets triggered and kills some app, my user-systemd crashes for some reason, taking down the entire user session, and afterwards SWAP is almost free.

I think this memory.swap.current isn't much different compared to just asking `smem` for SWAP taken by individual apps. As of writing these words that's 4.6G for the entire system, as calculated by:

    sudo smem -c "name user pid vss pss rss swap" | awk '{total+=$7} END {print "Swap memory: " total "K"}'

So 7 - 4.6 = 2.4G of some "phantom" memory.
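
As a cross-check of the `smem` total, the per-process figure can also be summed from the `VmSwap` line of `/proc/<pid>/status`. This is a minimal sketch, not what `smem` itself does (per the thread, `smem` reads `/proc/$pid/smaps`), and both function names are hypothetical. One caveat worth keeping in mind: the proc documentation describes `VmSwap` as not including shmem swap usage, so this total is expected to undercount.

```python
import glob


def vmswap_kb(status_text):
    """Return the VmSwap value (kB) from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmSwap line


def total_process_swap_kb():
    """Sum VmSwap over every process that is readable right now."""
    total = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                total += vmswap_kb(f.read())
        except OSError:
            pass  # the process exited between listing and reading
    return total
```

A gap between this sum and the system-wide swap usage points at memory that is not private anonymous process memory, which is the kind of "phantom" usage discussed here.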
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Yosry Ahmed @ 2024-10-26 17:47 UTC
To: Konstantin Kharlamov; +Cc: linux-mm, Johannes Weiner, Nhat Pham, Chengming Zhou

On Sat, Oct 26, 2024 at 4:33 AM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
> Thank you. So, I've waited till my SWAP got almost full again
> (apparently my new workflow triggers that a lot). It is 7.5G out of 8
> in total. 437M is taken by tmpfs'es, let's subtract that for
> simplicity, so I have 7G taken by something else.

If the tmpfs's are created and written to by processes in the user slice, they should show up in memory.swap.current as well.

> Now I'm looking at `/sys/fs/cgroup/user.slice/memory.swap.current` and
> it's 4422422528 = 4.1G. That's a lot less than 7G. I'm certain this

Can you check the memory.swap.current value of other slices?

The other possibility is that the pages are swapped out from the root cgroup, in which case they won't show up in memory.swap.current as they are basically unaccounted. Although typically user processes should not be running in the root cgroup.

> "phantom swap memory" is hidden in `user.slice`, because if I wait till
> the OOM-killer gets triggered and kills some app, my user-systemd
> crashes for some reason, taking down the entire user session, and
> afterwards SWAP is almost free.

Did you check the OOM logs? It is possible that the OOM killer kills some system process that has some memory in swap as well.

> I think this memory.swap.current isn't much different compared to just
> asking `smem` for SWAP taken by individual apps. As of writing these
> words that's 4.6G for the entire system, as calculated by:
>
>     sudo smem -c "name user pid vss pss rss swap" | awk '{total+=$7} END {print "Swap memory: " total "K"}'
>
> So 7 - 4.6 = 2.4G of some "phantom" memory.

I am not sure about smem, but memory.swap.current should be accounting pages swapped out from all memory cgroups except the root.
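
Since swap charged to the root cgroup does not appear in any memory.swap.current, one rough way to estimate it is to subtract the top-level cgroups' counters from the system-wide swap usage. The sketch below assumes the `SwapTotal` and `SwapFree` rows of `/proc/meminfo` (reported in kB); the function name `unaccounted_swap_bytes` is hypothetical and the result is only an estimate.

```python
def unaccounted_swap_bytes(meminfo_text, top_level_swap_current):
    """Estimate swap (bytes) not covered by the given top-level cgroup counters.

    meminfo_text: contents of /proc/meminfo.
    top_level_swap_current: memory.swap.current values (bytes) of the
    top-level cgroups only, e.g. user.slice, system.slice, init.scope.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("SwapTotal", "SwapFree"):
            fields[name] = int(rest.split()[0]) * 1024  # kB to bytes
    used = fields["SwapTotal"] - fields["SwapFree"]
    return used - sum(top_level_swap_current)
```

Only top-level cgroups should be passed in, because each counter already includes its descendants. A result near zero suggests that all swap usage is accounted to some cgroup.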
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Konstantin Kharlamov @ 2024-10-27 0:29 UTC
To: Yosry Ahmed; +Cc: linux-mm, Johannes Weiner, Nhat Pham, Chengming Zhou

On Sat, 2024-10-26 at 10:47 -0700, Yosry Ahmed wrote:
> > Now I'm looking at `/sys/fs/cgroup/user.slice/memory.swap.current`
> > and it's 4422422528 = 4.1G. That's a lot less than 7G. I'm certain
> > this
>
> Can you check the memory.swap.current value of other slices?

That was a good idea! The `/sys/fs/cgroup/system.slice/memory.swap.current` seems to have the missing half of the SWAP memory. From my understanding of the `systemctl status` graph, the `system.slice` and `user.slice` groups do not intersect, and by adding up `system.slice/…` + `user.slice/…` I get around 8G.

However, I'm still unclear what this memory belongs to. `system.slice/memory.swap.current` is 4.4G currently, that's a lot and I'm not seeing anything that could take so much memory.

An even larger related mystery is why this memory does not show up in the `smem` numbers for individual applications (which calculates it by going over `/proc/$pid/smaps` for every pid).

> The other possibility is that the pages are swapped out from the root
> cgroup, in which case they won't show up in memory.swap.current as
> they are basically unaccounted. Although typically user processes
> should not be running in the root cgroup.
>
> > "phantom swap memory" is hidden in `user.slice`, because if I wait
> > till the OOM-killer gets triggered and kills some app, my
> > user-systemd crashes for some reason, taking down the entire user
> > session, and afterwards SWAP is almost free.
>
> Did you check the OOM logs? It is possible that the OOM killer kills
> some system process that has some memory in swap as well.

I did, the logs are pretty uninteresting. OOM kills `electron` (of element-desktop), but I tried closing it before the OOM and that didn't have much influence. Just an arbitrary victim. Then a few lines later there is a `Process 560296 (systemd) of user 1000 terminated abnormally with signal 11/SEGV`. I wasn't able to get a stacktrace for systemd with Archlinux's debuginfo servers. And then everything goes down with systemd.

I just tried closing every application I have open and I still got 5.5 in SWAP. Well, obviously there are services still running, Plasma, i3wm… Not many suspects left though.
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Nhat Pham @ 2024-10-27 3:14 UTC
To: Konstantin Kharlamov; +Cc: Yosry Ahmed, linux-mm, Johannes Weiner, Chengming Zhou

On Sat, Oct 26, 2024 at 5:29 PM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
> That was a good idea! The
> `/sys/fs/cgroup/system.slice/memory.swap.current` seems to have the
> missing half of the SWAP memory. From my understanding of the
> `systemctl status` graph, the `system.slice` and `user.slice` groups do
> not intersect, and by adding up `system.slice/…` + `user.slice/…` I get
> around 8G.
>
> However, I'm still unclear what this memory belongs to.
> `system.slice/memory.swap.current` is 4.4G currently, that's a lot and
> I'm not seeing anything that could take so much memory.

I assume you do not have any proactive memory reclaimer? :) I believe the top utility can display swap usage by process. Have you tried that?

There are a couple of edge cases. For instance, if you disable zswap writeback and zswap at the same time, we will allocate slots on the swapfile and store them in the page table entries, but we cannot store the page's content in zswap or the swapfile, so the page remains in memory. You're occupying swap space, but are not really saving any memory usage.

IIRC, there is also an edge case where a page is faulted back into memory from swap, but the associated swap space cannot be immediately released. This should be temporary though: the memory reclaimer will attempt to release these pages later on, or they can be released when we scan the swapfile for slots during swap out.

> An even larger related mystery is why this memory does not show up in
> the `smem` numbers for individual applications (which calculates it by
> going over `/proc/$pid/smaps` for every pid).
>
> > Did you check the OOM logs? It is possible that the OOM killer kills
> > some system process that has some memory in swap as well.
>
> I did, the logs are pretty uninteresting. OOM kills `electron` (of
> element-desktop), but I tried closing it before the OOM and that didn't
> have much influence. Just an arbitrary victim. Then a few lines later
> there is a `Process 560296 (systemd) of user 1000 terminated abnormally
> with signal 11/SEGV`. I wasn't able to get a stacktrace for systemd
> with Archlinux's debuginfo servers. And then everything goes down with
> systemd.
>
> I just tried closing every application I have open and I still got 5.5
> in SWAP. Well, obviously there are services still running, Plasma,
> i3wm… Not many suspects left though.

This beats me. I don't know the process situation on your laptop. Sorry :)
* Re: [BUG] ZSwap leaks memory upon being disabled
From: Yosry Ahmed @ 2024-10-27 6:46 UTC
To: Nhat Pham; +Cc: Konstantin Kharlamov, linux-mm, Johannes Weiner, Chengming Zhou

On Sat, Oct 26, 2024 at 8:14 PM Nhat Pham <nphamcs@gmail.com> wrote:
> On Sat, Oct 26, 2024 at 5:29 PM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
> > However, I'm still unclear what this memory belongs to.
> > `system.slice/memory.swap.current` is 4.4G currently, that's a lot and
> > I'm not seeing anything that could take so much memory.

I am not very familiar with what usually runs in system.slice.

> IIRC, there is also an edge case where a page is faulted back into
> memory from swap, but the associated swap space cannot be immediately
> released. This should be temporary though: the memory reclaimer will
> attempt to release these pages later on, or they can be released when
> we scan the swapfile for slots during swap out.

I don't think this is an edge case. I think when we swapin a page we generally leave it in the swapcache if there is no pressure on swap space. In that case the memory is not really swapped out, but because it remains in the swapcache it is still reserving a swap slot, so it shows up as swap usage.

Konstantin, could you check the amount of swapcache you have, whether through /proc/vmstat or memory.stat on both user and system slices?
* Re: [BUG] ZSwap leaks memory upon being disabled 2024-10-27 6:46 ` Yosry Ahmed @ 2024-10-27 10:11 ` Konstantin Kharlamov 2024-10-27 10:32 ` Konstantin Kharlamov 0 siblings, 1 reply; 19+ messages in thread From: Konstantin Kharlamov @ 2024-10-27 10:11 UTC (permalink / raw) To: Yosry Ahmed, Nhat Pham; +Cc: linux-mm, Johannes Weiner, Chengming Zhou On Sat, 2024-10-26 at 23:46 -0700, Yosry Ahmed wrote: > On Sat, Oct 26, 2024 at 8:14 PM Nhat Pham <nphamcs@gmail.com> wrote: > > > > On Sat, Oct 26, 2024 at 5:29 PM Konstantin Kharlamov > > <Hi-Angel@yandex.ru> wrote: > > > > > > That was a good idea! The > > > `/sys/fs/cgroup/system.slice/memory.swap.current` seems to have > > > the > > > missing half of the SWAP memory. From my understanding of the > > > `systemctl status` graph `sytem.slice` and `user.slice` groups do > > > not > > > intersect, and by adding up `system.slice/…` + `user.slice/…` I > > > get > > > around 8G. > > > > > > However, I'm still unclear what does this memory belong to. > > > `system.slice/memory.swap.current` is 4.4G currently, that's a > > > lot and > > > I'm not seeing anything that could take so much memory. > > I am not very familiar with what usually runs in system.slice. > > > > > I assume you do not have any proactive memory reclaimer? :) I > > believe > > the top utility can display swap usage by process. Have you tried > > that? > > > > There are a couple of edge cases - for instance, if you disable > > zswap > > writeback and zswap at the same time. We will allocate slots on > > swapfile, and store it at the page table entry, but we cannot store > > the page's content in zswap or the swapfile, so the page remains in > > memory. You're occupying swap space, but are not really saving any > > memory usage. > > > > IIRC, there is also an edge case where a page is faulted back into > > memory from swap, but the associated swap space cannot be > > immediately > > released. 
> > This should be temporary though - memory reclaimer will
> > attempt to release these pages later on, or they can be released when
> > we scan the swapfile for slots during swap out.
>
> I don't think this is an edge case. I think when we swapin a page we
> generally leave it in the swapcache if there is no pressure on swap
> space. In that case the memory is not really swapped out, but because
> it remains in the swapcache it is still reserving a swap slot, so it
> shows up as swap usage.
>
> Konstantin, could you check the amount of swapcache you have, whether
> through /proc/vmstat or memory.stat on both user and system slices?

Sure

    λ grep cache /sys/fs/cgroup/*/memory.stat
    …
    /sys/fs/cgroup/system.slice/memory.stat:swapcached 434917376
    /sys/fs/cgroup/user.slice/memory.stat:swapcached 15478784

`434917376` is about 0.4G, not much. In comparison, `system.slice/memory.swap.current` is currently `4764139520 = 4.4G`.
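The comparison in the message above is plain arithmetic on byte counters, so it can be reproduced with two small helpers. This is only a sketch: `to_gib` and `swapcache_share` are hypothetical names that do not come from the thread, and the byte values are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): convert a byte count to GiB.
to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Hypothetical helper: share (in percent) of the swap usage that is only
# swap cache. $1 = swapcached bytes, $2 = memory.swap.current bytes.
swapcache_share() {
    awk -v c="$1" -v s="$2" 'BEGIN { printf "%.1f\n", 100 * c / s }'
}

to_gib 434917376                      # swapcached of system.slice   -> 0.41
to_gib 4764139520                     # memory.swap.current          -> 4.44
swapcache_share 434917376 4764139520  # swap cache as % of swap use  -> 9.1
```

With these numbers the swap cache explains under a tenth of the swap charged to `system.slice`, which matches the "not much" conclusion in the message.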
* Re: [BUG] ZSwap leaks memory upon being disabled 2024-10-27 10:11 ` Konstantin Kharlamov @ 2024-10-27 10:32 ` Konstantin Kharlamov 2024-10-27 11:28 ` Konstantin Kharlamov 0 siblings, 1 reply; 19+ messages in thread From: Konstantin Kharlamov @ 2024-10-27 10:32 UTC (permalink / raw) To: Yosry Ahmed, Nhat Pham; +Cc: linux-mm, Johannes Weiner, Chengming Zhou On Sun, 2024-10-27 at 13:11 +0300, Konstantin Kharlamov wrote: > On Sat, 2024-10-26 at 23:46 -0700, Yosry Ahmed wrote: > > On Sat, Oct 26, 2024 at 8:14 PM Nhat Pham <nphamcs@gmail.com> > > wrote: > > > > > > On Sat, Oct 26, 2024 at 5:29 PM Konstantin Kharlamov > > > <Hi-Angel@yandex.ru> wrote: > > > > > > > > That was a good idea! The > > > > `/sys/fs/cgroup/system.slice/memory.swap.current` seems to have > > > > the > > > > missing half of the SWAP memory. From my understanding of the > > > > `systemctl status` graph `sytem.slice` and `user.slice` groups > > > > do > > > > not > > > > intersect, and by adding up `system.slice/…` + `user.slice/…` I > > > > get > > > > around 8G. > > > > > > > > However, I'm still unclear what does this memory belong to. > > > > `system.slice/memory.swap.current` is 4.4G currently, that's a > > > > lot and > > > > I'm not seeing anything that could take so much memory. > > > > I am not very familiar with what usually runs in system.slice. > > > > > > > > I assume you do not have any proactive memory reclaimer? :) I > > > believe > > > the top utility can display swap usage by process. Have you tried > > > that? > > > > > > There are a couple of edge cases - for instance, if you disable > > > zswap > > > writeback and zswap at the same time. We will allocate slots on > > > swapfile, and store it at the page table entry, but we cannot > > > store > > > the page's content in zswap or the swapfile, so the page remains > > > in > > > memory. You're occupying swap space, but are not really saving > > > any > > > memory usage. 
> > >
> > > IIRC, there is also an edge case where a page is faulted back into
> > > memory from swap, but the associated swap space cannot be immediately
> > > released. This should be temporary though - memory reclaimer will
> > > attempt to release these pages later on, or they can be released when
> > > we scan the swapfile for slots during swap out.
> >
> > I don't think this is an edge case. I think when we swapin a page we
> > generally leave it in the swapcache if there is no pressure on swap
> > space. In that case the memory is not really swapped out, but because
> > it remains in the swapcache it is still reserving a swap slot, so it
> > shows up as swap usage.
> >
> > Konstantin, could you check the amount of swapcache you have, whether
> > through /proc/vmstat or memory.stat on both user and system slices?
>
> Sure
>
>     λ grep cache /sys/fs/cgroup/*/memory.stat
>     …
>     /sys/fs/cgroup/system.slice/memory.stat:swapcached 434917376
>     /sys/fs/cgroup/user.slice/memory.stat:swapcached 15478784
>
> `434917376` is about 0.4G, not much. In comparison,
> `system.slice/memory.swap.current` is currently `4764139520 = 4.4G`.

I figured that since 434917376 has nine digits, the counters worth a look are the ones with ten, so I grepped memory.stat for everything that has ten digits:

    λ grep -P "\d{10}$" /sys/fs/cgroup/system.slice/memory.stat
    file 2671874048
    shmem 2592768000
    zswapped 2997760000
    active_anon 1491247104
    unevictable 1269555200

Well, to me personally this isn't helpful, but perhaps I am missing something…
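Matching on digit count works, but a numeric threshold says the same thing more directly. A minimal sketch, assuming a hypothetical helper `big_stats` that reads `name value` pairs on stdin; note that memory.stat mixes byte counters with event counters such as `pgfault`, so a large event counter would also pass the filter.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list memory.stat entries at or
# above a byte threshold, largest first, with the value shown in GiB.
# $1 = threshold in bytes (defaults to 1 GiB); input is read from stdin.
big_stats() {
    # Sort on the raw integer first, then filter and format.
    sort -k2,2nr |
        awk -v min="${1:-1073741824}" \
            '$2 + 0 >= min + 0 { printf "%s %.2f GiB\n", $1, $2 / 1073741824 }'
}

# Usage on a cgroup v2 system such as the one in the thread:
#   big_stats < /sys/fs/cgroup/system.slice/memory.stat
```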
* Re: [BUG] ZSwap leaks memory upon being disabled 2024-10-27 10:32 ` Konstantin Kharlamov @ 2024-10-27 11:28 ` Konstantin Kharlamov 2024-10-27 19:31 ` Yosry Ahmed 0 siblings, 1 reply; 19+ messages in thread From: Konstantin Kharlamov @ 2024-10-27 11:28 UTC (permalink / raw) To: Yosry Ahmed, Nhat Pham; +Cc: linux-mm, Johannes Weiner, Chengming Zhou On Sun, 2024-10-27 at 13:32 +0300, Konstantin Kharlamov wrote: > On Sun, 2024-10-27 at 13:11 +0300, Konstantin Kharlamov wrote: > > On Sat, 2024-10-26 at 23:46 -0700, Yosry Ahmed wrote: > > > I don't think this is an edge case. I think when we swapin a page > > > we > > > generally leave it in the swapcache if there is no pressure on > > > swap > > > space. In that case the memory is not really swapped out, but > > > because > > > it remains in the swapcache it is still reserving a swap slot, so > > > it > > > shows up as swap usage. > > > > > > Konstantin, could you check the amount of swapcache you have, > > > whether > > > through /proc/vmstat or memory.stat on both user and system > > > slices? > > > > Sure > > > > λ grep cache /sys/fs/cgroup/*/memory.stat > > … > > /sys/fs/cgroup/system.slice/memory.stat:swapcached > > 434917376 > > /sys/fs/cgroup/user.slice/memory.stat:swapcached 15478784 > > > > `434917376` is a 0.4G, not much. In comparison, > > `system.slice/memory.swap.current` is currently `4764139520 = > > 4.4G`. > > I figured since 434917376 is 10 numbers, I'd grep everything in > memory.stat that has ten digits: > > λ grep -P "\d{10}$" /sys/fs/cgroup/system.slice/memory.stat > file 2671874048 > shmem 2592768000 > zswapped 2997760000 > active_anon 1491247104 > unevictable 1269555200 > > well, to me personally this isn't helpful, but perhaps am I missing > something… I found the process the "phantom memory" belongs to! 
I just realized that I can see `memory.swap.current` for individual processes in a cgroup too, and it turns out currently 4.3G belong to sddm:

    /sys/fs/cgroup/system.slice/sddm.service/memory.swap.current:4723781632

systemctl confirms this:

    λ systemctl status sddm
    ● sddm.service - Simple Desktop Display Manager
         Loaded: loaded (/usr/lib/systemd/system/sddm.service; enabled; preset: disabled)
         Active: active (running) since Wed 2024-10-16 15:59:10 MSK; 1 week 3 days ago
     Invocation: daadb3ed391b421b90b216122339be83
           Docs: man:sddm(1)
                 man:sddm.conf(5)
       Main PID: 720 (sddm)
          Tasks: 10 (limit: 18621)
         Memory: 3.3G (peak: 4.1G swap: 4.3G swap peak: 5.8G zswap: 67.6M)
            CPU: 21h 30min 56.309s
         CGroup: /system.slice/sddm.service
                 ├─720 /usr/bin/sddm
                 └─724 /usr/lib/Xorg -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_IKXVXT -noreset -displayfd 16

Note the `swap: 4.3G` sentence.

So, this is good news, but still doesn't answer the question where did this memory go. Out of the 2 processes in the group, `smem` shows 2.1M for sddm and 88M for Xorg.

I even tried manually calculating:

    λ sudo grep Swap /proc/72{0,4}/smaps | awk '{total+=$2} END {print "Swap memory: " total "K"}'
    Swap memory: 184656K

That's 180M, for some reason very different, but whatever, still very far from 4.3G.

----------

Just to make it clear, the reason why I'm digging is that something's clearly very wrong. And I can't blame Xorg nor sddm currently, because by all means they don't take 4.3G of memory. The cgroup for some reason does, but the processes don't.
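The per-service lookup described above generalizes to every child of a slice. A minimal sketch, assuming cgroup v2 mounted at `/sys/fs/cgroup` as in the thread; `swap_by_child` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print memory.swap.current for
# each direct child cgroup of $1 as "bytes name", largest first.
swap_by_child() {
    for f in "$1"/*/memory.swap.current; do
        # Skip the unexpanded glob (no children) and unreadable files.
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(cat "$f")" "$(basename "$(dirname "$f")")"
    done | sort -k1,1nr
}

# Usage:
#   swap_by_child /sys/fs/cgroup/system.slice | head
```

The first line of the output names the service that holds most of the slice's swap, which is how `sddm.service` stands out here.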
* Re: [BUG] ZSwap leaks memory upon being disabled 2024-10-27 11:28 ` Konstantin Kharlamov @ 2024-10-27 19:31 ` Yosry Ahmed 2024-10-27 22:13 ` phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled) Konstantin Kharlamov 0 siblings, 1 reply; 19+ messages in thread From: Yosry Ahmed @ 2024-10-27 19:31 UTC (permalink / raw) To: Konstantin Kharlamov; +Cc: Nhat Pham, linux-mm, Johannes Weiner, Chengming Zhou On Sun, Oct 27, 2024 at 4:28 AM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote: > > On Sun, 2024-10-27 at 13:32 +0300, Konstantin Kharlamov wrote: > > On Sun, 2024-10-27 at 13:11 +0300, Konstantin Kharlamov wrote: > > > On Sat, 2024-10-26 at 23:46 -0700, Yosry Ahmed wrote: > > > > I don't think this is an edge case. I think when we swapin a page > > > > we > > > > generally leave it in the swapcache if there is no pressure on > > > > swap > > > > space. In that case the memory is not really swapped out, but > > > > because > > > > it remains in the swapcache it is still reserving a swap slot, so > > > > it > > > > shows up as swap usage. > > > > > > > > Konstantin, could you check the amount of swapcache you have, > > > > whether > > > > through /proc/vmstat or memory.stat on both user and system > > > > slices? > > > > > > Sure > > > > > > λ grep cache /sys/fs/cgroup/*/memory.stat > > > … > > > /sys/fs/cgroup/system.slice/memory.stat:swapcached > > > 434917376 > > > /sys/fs/cgroup/user.slice/memory.stat:swapcached 15478784 > > > > > > `434917376` is a 0.4G, not much. In comparison, > > > `system.slice/memory.swap.current` is currently `4764139520 = > > > 4.4G`. 
> > > > I figured since 434917376 is 10 numbers, I'd grep everything in > > memory.stat that has ten digits: > > > > λ grep -P "\d{10}$" /sys/fs/cgroup/system.slice/memory.stat > > file 2671874048 > > shmem 2592768000 > > zswapped 2997760000 > > active_anon 1491247104 > > unevictable 1269555200 > > > > well, to me personally this isn't helpful, but perhaps am I missing > > something… > > I found the process the "phantom memory" belongs to! I just realized > that I can see `memory.swap.current` for individual processes in a > cgroup too, and it turns out currently 4.3G belong to sddm: > > /sys/fs/cgroup/system.slice/sddm.service/memory.swap.current:4723781632 > > systemctl confirms this: > > λ systemctl status sddm > ● sddm.service - Simple Desktop Display Manager > Loaded: loaded (/usr/lib/systemd/system/sddm.service; enabled; preset: disabled) > Active: active (running) since Wed 2024-10-16 15:59:10 MSK; 1 week 3 days ago > Invocation: daadb3ed391b421b90b216122339be83 > Docs: man:sddm(1) > man:sddm.conf(5) > Main PID: 720 (sddm) > Tasks: 10 (limit: 18621) > Memory: 3.3G (peak: 4.1G swap: 4.3G swap peak: 5.8G zswap: 67.6M) > CPU: 21h 30min 56.309s > CGroup: /system.slice/sddm.service > ├─720 /usr/bin/sddm > └─724 /usr/lib/Xorg -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_IKXVXT -noreset -displayfd 16 > > Note the `swap: 4.3G` sentence. > > So, this is good news, but still doesn't answer the question where did this memory > go. Out of the 2 processes in the group, `smem` shows 2.1M for sddm and 88M for Xorg. > > I even tried manually calculating: > > λ sudo grep Swap /proc/72{0,4}/smaps | awk '{total+=$2} END {print "Swap memory: " total "K"}' > Swap memory: 184656K > > That's 180M, for some reason very different, but whatever, still very far from 4.3G. I think smaps will only show you swapped out mapped memory. It could be tmpfs. 
One thing you can do is take a snapshot of memory.stat when memory.swap.current is at a high value (for sddm), then swapoff, then take another snapshot of memory.stat.

We should see an increase in either anon or shmem, which will tell us which type of memory was swapped out.

>
> ----------
>
> Just to make it clear, the reason why I'm digging is that something's clearly very
> wrong. And I can't blame Xorg nor sddm currently, because by all means they don't
> take 4.3G of memory. The cgroup for some reason does, but the processes don't.
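The snapshot procedure suggested above could be scripted roughly as follows. This is a sketch under assumptions: `snapshot_stat` is a hypothetical helper, the `/tmp` output location is arbitrary, and the commented steps must run as root on a machine with enough free RAM to take back everything that is currently swapped out.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): save a copy of a cgroup's
# memory.stat and print the path of the copy.
# $1 = cgroup directory, $2 = output directory, $3 = label (e.g. "before")
snapshot_stat() {
    out="$2/memory.stat.$3"
    cat "$1/memory.stat" > "$out" && printf '%s\n' "$out"
}

# Sketch of the procedure (as root):
#   cg=/sys/fs/cgroup/system.slice/sddm.service
#   snapshot_stat "$cg" /tmp before
#   swapoff -a     # forces swapped-out pages back into memory
#   snapshot_stat "$cg" /tmp after
#   swapon -a      # re-enable the swap areas listed in /etc/fstab
#   diff /tmp/memory.stat.before /tmp/memory.stat.after
```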
* Re: phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled) 2024-10-27 19:31 ` Yosry Ahmed @ 2024-10-27 22:13 ` Konstantin Kharlamov 2024-10-30 14:41 ` Konstantin Kharlamov 0 siblings, 1 reply; 19+ messages in thread From: Konstantin Kharlamov @ 2024-10-27 22:13 UTC (permalink / raw) To: Yosry Ahmed; +Cc: Nhat Pham, linux-mm, Johannes Weiner, Chengming Zhou On Sun, 2024-10-27 at 12:31 -0700, Yosry Ahmed wrote: > On Sun, Oct 27, 2024 at 4:28 AM Konstantin Kharlamov > <Hi-Angel@yandex.ru> wrote: > > > > On Sun, 2024-10-27 at 13:32 +0300, Konstantin Kharlamov wrote: > > > On Sun, 2024-10-27 at 13:11 +0300, Konstantin Kharlamov wrote: > > > > On Sat, 2024-10-26 at 23:46 -0700, Yosry Ahmed wrote: > > > > > I don't think this is an edge case. I think when we swapin a > > > > > page > > > > > we > > > > > generally leave it in the swapcache if there is no pressure > > > > > on > > > > > swap > > > > > space. In that case the memory is not really swapped out, but > > > > > because > > > > > it remains in the swapcache it is still reserving a swap > > > > > slot, so > > > > > it > > > > > shows up as swap usage. > > > > > > > > > > Konstantin, could you check the amount of swapcache you have, > > > > > whether > > > > > through /proc/vmstat or memory.stat on both user and system > > > > > slices? > > > > > > > > Sure > > > > > > > > λ grep cache /sys/fs/cgroup/*/memory.stat > > > > … > > > > /sys/fs/cgroup/system.slice/memory.stat:swapcached > > > > 434917376 > > > > /sys/fs/cgroup/user.slice/memory.stat:swapcached 15478784 > > > > > > > > `434917376` is a 0.4G, not much. In comparison, > > > > `system.slice/memory.swap.current` is currently `4764139520 = > > > > 4.4G`. 
> > > > > > I figured since 434917376 is 10 numbers, I'd grep everything in > > > memory.stat that has ten digits: > > > > > > λ grep -P "\d{10}$" /sys/fs/cgroup/system.slice/memory.stat > > > file 2671874048 > > > shmem 2592768000 > > > zswapped 2997760000 > > > active_anon 1491247104 > > > unevictable 1269555200 > > > > > > well, to me personally this isn't helpful, but perhaps am I > > > missing > > > something… > > > > I found the process the "phantom memory" belongs to! I just > > realized > > that I can see `memory.swap.current` for individual processes in a > > cgroup too, and it turns out currently 4.3G belong to sddm: > > > > > > /sys/fs/cgroup/system.slice/sddm.service/memory.swap.current:472378 > > 1632 > > > > systemctl confirms this: > > > > λ systemctl status sddm > > ● sddm.service - Simple Desktop Display Manager > > Loaded: loaded (/usr/lib/systemd/system/sddm.service; > > enabled; preset: disabled) > > Active: active (running) since Wed 2024-10-16 15:59:10 MSK; > > 1 week 3 days ago > > Invocation: daadb3ed391b421b90b216122339be83 > > Docs: man:sddm(1) > > man:sddm.conf(5) > > Main PID: 720 (sddm) > > Tasks: 10 (limit: 18621) > > Memory: 3.3G (peak: 4.1G swap: 4.3G swap peak: 5.8G zswap: > > 67.6M) > > CPU: 21h 30min 56.309s > > CGroup: /system.slice/sddm.service > > ├─720 /usr/bin/sddm > > └─724 /usr/lib/Xorg -nolisten tcp -background none - > > seat seat0 vt2 -auth /run/sddm/xauth_IKXVXT -noreset -displayfd 16 > > > > Note the `swap: 4.3G` sentence. > > > > So, this is good news, but still doesn't answer the question where > > did this memory > > go. Out of the 2 processes in the group, `smem` shows 2.1M for sddm > > and 88M for Xorg. > > > > I even tried manually calculating: > > > > λ sudo grep Swap /proc/72{0,4}/smaps | awk '{total+=$2} END > > {print "Swap memory: " total "K"}' > > Swap memory: 184656K > > > > That's 180M, for some reason very different, but whatever, still > > very far from 4.3G. 
FTR, the reason I got the "very different 180M" is that I mistakenly added up SwapPss as well.

> I think smaps will only show you swapped out mapped memory. It could
> be tmpfs.
>
> One thing you can do is take a snapshot of memory.stat when
> memory.swap.current is at a high value (for sddm), then swapoff, then
> take another snapshot of memory.stat.
>
> We should see an increase in either anon or shmem, which will tell us
> which type of memory was swapped out.

Okay. I will have to wait, because the session got killed by OOM. But I think it's gonna reproduce in just a few days, my new workflow seems to be triggering that a lot.

I took this chance to rename the thread as well, otherwise I'm gonna forget it upon writing the next email.
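The double counting happens because a plain `grep Swap` matches both the `Swap:` and the `SwapPss:` lines of smaps. A minimal sketch of a sum that matches the field name exactly; `sum_swap_kb` is a hypothetical helper that reads smaps-formatted text on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): sum the "Swap:" fields (in kB)
# of smaps-formatted input, ignoring "SwapPss:" lines.
sum_swap_kb() {
    awk '$1 == "Swap:" { total += $2 } END { print total + 0 }'
}

# Usage (as root), with the PIDs from the thread:
#   cat /proc/720/smaps /proc/724/smaps | sum_swap_kb
```

As noted in the quoted reply, smaps only covers memory that is mapped into the process, so this total can still be far below the cgroup's `memory.swap.current`.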
* Re: phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled)
  2024-10-27 22:13 ` phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled) Konstantin Kharlamov
@ 2024-10-30 14:41 ` Konstantin Kharlamov
  2024-10-30 19:44 ` Yosry Ahmed
  0 siblings, 1 reply; 19+ messages in thread
From: Konstantin Kharlamov @ 2024-10-30 14:41 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Nhat Pham, linux-mm, Johannes Weiner, Chengming Zhou

On Mon, 2024-10-28 at 01:13 +0300, Konstantin Kharlamov wrote:
> On Sun, 2024-10-27 at 12:31 -0700, Yosry Ahmed wrote:
> > One thing you can do is take a snapshot of memory.stat when
> > memory.swap.current is at a high value (for sddm), then swapoff, then
> > take another snapshot of memory.stat.
> >
> > We should see an increase in either anon or shmem, which will tell us
> > which type of memory was swapped out.
>
> Okay. I will have to wait, because the session got killed by OOM. But I
> think it's gonna reproduce in just a few days, my new workflow seems to
> be triggering that a lot.

Done. I missed one cycle, which again got my session killed by OOM 😅 Now I caught this in time. The information was retrieved by:

    (systemctl status sddm && cat /sys/fs/cgroup/system.slice/sddm.service/memory.stat) > ~/Projects/cgroups-mem-leak/"$(date -R)".log

I wasn't sure how to represent it in email, and decided to post a diff of "before `swapoff -a`" and "after …", to be viewed with `diffr` or with `perl /path/to/diff-highlight` of git or similar. Diff follows:

--- "Wed, 30 Oct 2024 17:27:38 +0300.log" 2024-10-30 17:27:38.401290017 +0300
+++ "Wed, 30 Oct 2024 17:28:12 +0300.log" 2024-10-30 17:28:12.397695798 +0300
@@ -6,8 +6,8 @@
              man:sddm.conf(5)
    Main PID: 710 (sddm)
       Tasks: 9 (limit: 18621)
-     Memory: 1.2G (peak: 2.8G swap: 1.7G swap peak: 3.2G zswap: 58.3M)
-        CPU: 6h 10min 7.847s
+     Memory: 2.8G (peak: 2.8G swap: 0B swap peak: 3.2G)
+        CPU: 6h 10min 14.748s
      CGroup: /system.slice/sddm.service
              ├─710 /usr/bin/sddm
              └─746 /usr/lib/Xorg -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_GXHGRA -noreset -displayfd 20
@@ -22,36 +22,36 @@
 окт 28 09:22:36 dell-g15 sddm-helper[925]: Writing cookie to "/tmp/xauth_RKKlcB"
 окт 28 09:22:36 dell-g15 sddm-helper[925]: Starting X11 session: "" "/usr/share/sddm/scripts/Xsession \"env KDEWM=/usr/bin/i3 /usr/bin/startplasma-x11\""
 окт 28 09:22:36 dell-g15 sddm[710]: Session started true
-anon 42807296
-file 1150423040
-kernel 79376384
+anon 93822976
+file 2957750272
+kernel 18210816
 kernel_stack 147456
 pagetables 7204864
 sec_pagetables 0
 percpu 2184
 sock 0
 vmalloc 12288
-shmem 1150795776
-zswap 61173438
-zswapped 1751408640
+shmem 2958123008
+zswap 0
+zswapped 0
 file_mapped 4108288
 file_dirty 0
 file_writeback 0
-swapcached 17666048
+swapcached 0
 anon_thp 2097152
 file_thp 0
 shmem_thp 0
-inactive_anon 445014016
-active_anon 625201152
+inactive_anon 589209600
+active_anon 2321489920
 inactive_file 2895872
 active_file 2244608
-unevictable 140836864
-slab_reclaimable 8166656
-slab_unreclaimable 2618128
-slab 10784784
-workingset_refault_anon 69854
+unevictable 141144064
+slab_reclaimable 8169032
+slab_unreclaimable 2625208
+slab 10794240
+workingset_refault_anon 177253
 workingset_refault_file 12496
-workingset_activate_anon 33476
+workingset_activate_anon 41579
 workingset_activate_file 2372
 workingset_restore_anon 12558
 workingset_restore_file 2132
@@ -64,14 +64,14 @@
 pgsteal_kswapd 1243374
 pgsteal_direct 348876
 pgsteal_khugepaged 9149
-pgfault 626853
+pgfault 626941
 pgmajfault 11521
 pgrefill 560417
 pgactivate 85087
 pgdeactivate 0
 pglazyfree 0
 pglazyfreed 0
-zswpin 87568
+zswpin 515158
 zswpout 1395410
 zswpwb 211559
 thp_fault_alloc 8
* Re: phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled)
  2024-10-30 14:41 ` Konstantin Kharlamov
@ 2024-10-30 19:44 ` Yosry Ahmed
  2024-10-31 21:59 ` Konstantin Kharlamov
  0 siblings, 1 reply; 19+ messages in thread
From: Yosry Ahmed @ 2024-10-30 19:44 UTC (permalink / raw)
To: Konstantin Kharlamov; +Cc: Nhat Pham, linux-mm, Johannes Weiner, Chengming Zhou

On Wed, Oct 30, 2024 at 7:41 AM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
>
> On Mon, 2024-10-28 at 01:13 +0300, Konstantin Kharlamov wrote:
> > On Sun, 2024-10-27 at 12:31 -0700, Yosry Ahmed wrote:
> > > One thing you can do is take a snapshot of memory.stat when
> > > memory.swap.current is at a high value (for sddm), then swapoff,
> > > then take another snapshot of memory.stat.
> > >
> > > We should see an increase in either anon or shmem, which will tell
> > > us which type of memory was swapped out.
> >
> > Okay. I will have to wait, because the session got killed by OOM. But
> > I think it's gonna reproduce in just a few days, my new workflow seems
> > to be triggering that a lot.
>
> Done. I missed one cycle, which again got my session killed by OOM 😅
> Now I caught this in time. The information was retrieved by:
>
>     (systemctl status sddm && cat /sys/fs/cgroup/system.slice/sddm.service/memory.stat) > ~/Projects/cgroups-mem-leak/"$(date -R)".log
>
> I wasn't sure how to represent it in email, and decided to post a diff
> of "before `swapoff -a`" and "after …", to be viewed with `diffr` or
> with `perl /path/to/diff-highlight` of git or similar.
>
> Diff follows:
>
> --- "Wed, 30 Oct 2024 17:27:38 +0300.log" 2024-10-30 17:27:38.401290017 +0300
> +++ "Wed, 30 Oct 2024 17:28:12 +0300.log" 2024-10-30 17:28:12.397695798 +0300
> @@ -6,8 +6,8 @@
>               man:sddm.conf(5)
>     Main PID: 710 (sddm)
>        Tasks: 9 (limit: 18621)
> -     Memory: 1.2G (peak: 2.8G swap: 1.7G swap peak: 3.2G zswap: 58.3M)
> -        CPU: 6h 10min 7.847s
> +     Memory: 2.8G (peak: 2.8G swap: 0B swap peak: 3.2G)
> +        CPU: 6h 10min 14.748s
>       CGroup: /system.slice/sddm.service
>               ├─710 /usr/bin/sddm
>               └─746 /usr/lib/Xorg -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_GXHGRA -noreset -displayfd 20
> @@ -22,36 +22,36 @@
>  окт 28 09:22:36 dell-g15 sddm-helper[925]: Writing cookie to "/tmp/xauth_RKKlcB"
>  окт 28 09:22:36 dell-g15 sddm-helper[925]: Starting X11 session: "" "/usr/share/sddm/scripts/Xsession \"env KDEWM=/usr/bin/i3 /usr/bin/startplasma-x11\""
>  окт 28 09:22:36 dell-g15 sddm[710]: Session started true
> -anon 42807296
> -file 1150423040
> -kernel 79376384
> +anon 93822976

Anonymous memory increased, but not by too much.

> +file 2957750272
> +kernel 18210816
>  kernel_stack 147456
>  pagetables 7204864
>  sec_pagetables 0
>  percpu 2184
>  sock 0
>  vmalloc 12288
> -shmem 1150795776
> -zswap 61173438
> -zswapped 1751408640
> +shmem 2958123008

shmem increased by a lot (~1.8G).

So this looks like it could be the answer to your question about where the swap usage is coming from. I would try to find what tmpfs files are used by this application.

> +zswap 0
> +zswapped 0
>  file_mapped 4108288
>  file_dirty 0
>  file_writeback 0
> -swapcached 17666048
> +swapcached 0
>  anon_thp 2097152
>  file_thp 0
>  shmem_thp 0
> -inactive_anon 445014016
> -active_anon 625201152
> +inactive_anon 589209600
> +active_anon 2321489920
>  inactive_file 2895872
>  active_file 2244608
> -unevictable 140836864
> -slab_reclaimable 8166656
> -slab_unreclaimable 2618128
> -slab 10784784
> -workingset_refault_anon 69854
> +unevictable 141144064
> +slab_reclaimable 8169032
> +slab_unreclaimable 2625208
> +slab 10794240
> +workingset_refault_anon 177253
>  workingset_refault_file 12496
> -workingset_activate_anon 33476
> +workingset_activate_anon 41579
>  workingset_activate_file 2372
>  workingset_restore_anon 12558
>  workingset_restore_file 2132
> @@ -64,14 +64,14 @@
>  pgsteal_kswapd 1243374
>  pgsteal_direct 348876
>  pgsteal_khugepaged 9149
> -pgfault 626853
> +pgfault 626941
>  pgmajfault 11521
>  pgrefill 560417
>  pgactivate 85087
>  pgdeactivate 0
>  pglazyfree 0
>  pglazyfreed 0
> -zswpin 87568
> +zswpin 515158
>  zswpout 1395410
>  zswpwb 211559
>  thp_fault_alloc 8
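The "increased by a lot" reading can be checked by subtracting the two snapshots counter by counter. A minimal sketch, assuming two saved memory.stat files; `stat_delta` is a hypothetical helper and the file names in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every counter whose value
# differs between two memory.stat snapshots as "name before after delta".
# $1 = snapshot taken before swapoff, $2 = snapshot taken after.
stat_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && $2 != before[$1] {
             # %.0f avoids 32-bit integer overflow in some awk builds
             printf "%s %s %s %+.0f\n", $1, before[$1], $2, $2 - before[$1]
         }' "$1" "$2"
}

# Usage:
#   stat_delta before.stat after.stat
```

For the numbers in the diff, `shmem` goes from 1150795776 to 2958123008, a delta of +1807327232 bytes (the "~1.8G" above), while `anon` grows by only 51015680 bytes.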
* Re: phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled)
  2024-10-30 19:44 ` Yosry Ahmed
@ 2024-10-31 21:59 ` Konstantin Kharlamov
  2024-10-31 22:04 ` Yosry Ahmed
  0 siblings, 1 reply; 19+ messages in thread
From: Konstantin Kharlamov @ 2024-10-31 21:59 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Nhat Pham, linux-mm, Johannes Weiner, Chengming Zhou

On Wed, 2024-10-30 at 12:44 -0700, Yosry Ahmed wrote:
> shmem increased by a lot (~1.8G).
>
> So this looks like it could be the answer to your question about where
> the swap usage is coming from. I would try to find what tmpfs files
> are used by this application.

Thank you! After doing more digging I reduced it to `Xorg` having hundreds of `anon_inode:i915.gem`, and afterwards pinned this down to Picom not freeing resources. Reported on GitHub¹.

That said, isn't there a kernel bug too? If this `shmem` ends up in Swap, then it should be accounted in the `Swap` fields of `/proc/<pid>/smaps` accordingly, right? In the end, that's what the field is for: amount of SWAP taken by a process. Otherwise it is a "phantom memory": something being in SWAP, but who owns this "something" - there's no way to know, it just kind of "exists" amidst kernel and processes realms.

1: https://github.com/yshui/picom/issues/1378
* Re: phantom memory in a cgroup (was [BUG] ZSwap leaks memory upon being disabled)
  2024-10-31 21:59 ` Konstantin Kharlamov
@ 2024-10-31 22:04 ` Yosry Ahmed
  0 siblings, 0 replies; 19+ messages in thread
From: Yosry Ahmed @ 2024-10-31 22:04 UTC (permalink / raw)
To: Konstantin Kharlamov; +Cc: Nhat Pham, linux-mm, Johannes Weiner, Chengming Zhou

On Thu, Oct 31, 2024 at 2:59 PM Konstantin Kharlamov <Hi-Angel@yandex.ru> wrote:
>
> On Wed, 2024-10-30 at 12:44 -0700, Yosry Ahmed wrote:
> > shmem increased by a lot (~1.8G).
> >
> > So this looks like it could be the answer to your question about where
> > the swap usage is coming from. I would try to find what tmpfs files
> > are used by this application.
>
> Thank you! After doing more digging I reduced it to `Xorg` having
> hundreds of `anon_inode:i915.gem`, and afterwards pinned this down to
> Picom not freeing resources. Reported on GitHub¹.
>
> That said, isn't there a kernel bug too? If this `shmem` ends up in
> Swap, then it should be accounted in the `Swap` fields of
> `/proc/<pid>/smaps` accordingly, right? In the end, that's what the
> field is for: amount of SWAP taken by a process. Otherwise it is a
> "phantom memory": something being in SWAP, but who owns this
> "something" - there's no way to know, it just kind of "exists" amidst
> kernel and processes realms.

I don't think so. shmem doesn't really belong to a single process. If you kill the process but leave the tmpfs files behind, the memory will not go away.

>
> 1: https://github.com/yshui/picom/issues/1378
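One way to spot a pile-up like the `anon_inode:i915.gem` objects mentioned above is to count a process's mappings by name. This is a sketch under an assumption the thread does not confirm: that the objects show up as named mappings in `/proc/<pid>/maps`. `map_names` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): count mappings by name in
# maps-formatted input read from stdin, most frequent first.
# Only the first word of a mapping name is used; unnamed mappings are skipped.
map_names() {
    awk 'NF >= 6 { n[$6]++ } END { for (k in n) print n[k], k }' |
        sort -k1,1nr
}

# Usage (as root), with the Xorg PID from the thread:
#   map_names < /proc/724/maps | head
```

This only shows which process currently maps the objects. As the reply above points out, shmem does not belong to a single process, so the count is a hint about who is using the memory, not an ownership record.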
* Re: [BUG] ZSwap leaks memory upon being disabled
  2024-10-27 3:14 ` Nhat Pham
  2024-10-27 6:46 ` Yosry Ahmed
@ 2024-10-27 10:25 ` Konstantin Kharlamov
  1 sibling, 0 replies; 19+ messages in thread
From: Konstantin Kharlamov @ 2024-10-27 10:25 UTC (permalink / raw)
To: Nhat Pham; +Cc: Yosry Ahmed, linux-mm, Johannes Weiner, Chengming Zhou

On Sat, 2024-10-26 at 20:14 -0700, Nhat Pham wrote:
> On Sat, Oct 26, 2024 at 5:29 PM Konstantin Kharlamov
> <Hi-Angel@yandex.ru> wrote:
> >
> > That was a good idea! The
> > `/sys/fs/cgroup/system.slice/memory.swap.current` seems to have the
> > missing half of the SWAP memory. From my understanding of the
> > `systemctl status` graph `system.slice` and `user.slice` groups do
> > not intersect, and by adding up `system.slice/…` + `user.slice/…` I get
> > around 8G.
> >
> > However, I'm still unclear what does this memory belong to.
> > `system.slice/memory.swap.current` is 4.4G currently, that's a lot
> > and I'm not seeing anything that could take so much memory.
>
> I assume you do not have any proactive memory reclaimer? :)

No, just the kernel with `vm.swappiness = 100` and with ZSWAP (ZSWAP is enabled by default on Archlinux nowadays via CONFIG_ZSWAP_DEFAULT_ON).

> I believe
> the top utility can display swap usage by process. Have you tried
> that?

I just tried. Well, the data seems the same as what `smem` shows, except I can't add up the column numbers because top is interactive 😊

I noticed plasmashell was too bloated, so I restarted it. That didn't solve the problem with some unknown memory taking gigabytes in SWAP though.

> There are a couple of edge cases - for instance, if you disable zswap
> writeback and zswap at the same time. We will allocate slots on
> swapfile, and store it at the page table entry, but we cannot store
> the page's content in zswap or the swapfile, so the page remains in
> memory. You're occupying swap space, but are not really saving any
> memory usage.

I never disabled zswap writeback, and as of writing these words zswap is on, so this is certainly not it.

> IIRC, there is also an edge case where a page is faulted back into
> memory from swap, but the associated swap space cannot be immediately
> released. This should be temporary though - memory reclaimer will
> attempt to release these pages later on, or they can be released when
> we scan the swapfile for slots during swap out.

Replied in a separate email.