Error in freeing memory with zone reclaimable always returning true.

* Error in freeing memory with zone reclaimable always returning true.
@ 2017-06-26  7:29 Ivid Suvarna
  2017-06-26  8:00 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Ivid Suvarna @ 2017-06-26  7:29 UTC (permalink / raw)
  To: linux-mm

Hi,

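I have the code below, which tries to free memory:

do {
	/* nr_to_reclaim: page count requested per call; the value is not given in this post */
	free = shrink_all_memory(nr_to_reclaim);
} while (free > 0);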

But the kernel gets into an infinite loop because shrink_all_memory()
always returns 1.
After adding some debug statements to `mm/vmscan.c`, I found that this is
because zone_reclaimable() is always true in shrink_zones():

if (global_reclaim(sc) &&
    !reclaimable && zone_reclaimable(zone))
	reclaimable = true;

Removing the above lines solves the issue.
I am using Linux kernel 4.4 on an imx board.

A similar issue is reported here [1], where it was solved by a patch that
removes the offending lines. But that report does not explain why the zone
reclaimable check leads to an infinite loop, or what causes it. I also ran
the C program from [1], shown below. Instead of hitting OOM, it went into
an infinite loop.

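#include <stdlib.h>
#include <string.h>

int main(void)
{
	for (;;) {
		void *p = malloc(1024 * 1024);

		if (!p)		/* avoid passing NULL to memset() */
			return 1;
		/* touch every page so the memory is really allocated */
		memset(p, 0, 1024 * 1024);
	}
}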

Could this issue also be related to memcg, as described in
"https://lwn.net/Articles/508923/"? I ask because the code flow in my case
enters:

if (nr_soft_reclaimed)
	reclaimable = true;

I do not understand memcg well, but in my case CONFIG_MEMCG is not set.

After some more debugging, I found a userspace process with three threads
in the sleeping state. This process is paused through system_pause() and is
accessing shared memory (`/dev/shm`), which was created with a size of
100m. This shared memory contains some files.

The output of `pmap -d PID` shows that this process also has some anonymous
private and shared mappings, and there is no swap space in the system.

I found no issues at all once I removed that userspace process. But how can
that be a solution, since the kernel should be able to handle any such
situation?

So my questions are:

 1. How can this sleeping process in the paused state cause zone
reclaimable to always return true?

 2. How are pages reclaimed from a sleeping process that is using shared
memory in Linux?

 3. I tried to unmount /dev/shm, but that was not possible since the
process was using it. Is there any way to release the shared memory? I
tried `munmap`, but it did not help.

Any info would be helpful.

  [1]: https://groups.google.com/forum/#!topic/fa.linux.kernel/kWwlQzj8mhc


* Re: Error in freeing memory with zone reclaimable always returning true.
  2017-06-26  7:29 Error in freeing memory with zone reclaimable always returning true Ivid Suvarna
@ 2017-06-26  8:00 ` Michal Hocko
  2017-06-26 13:04   ` Ivid Suvarna
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2017-06-26  8:00 UTC (permalink / raw)
  To: Ivid Suvarna; +Cc: linux-mm

On Mon 26-06-17 12:59:17, Ivid Suvarna wrote:
> Hi,
> 
> I have below code which tries to free memory,
> do
> {
> free=shrink_all_memory;
> }while(free>0);

What is the intention of such code? It looks quite wrong to me, to be
honest.

> But kernel gets into infinite loop because shrink_all_memory always returns
> 1.
> When I added some debug statements to `mm/vmscan.c` and found that it is
> because zone_reclaimable() is always true in shrink_zones()
> 
> if (global_reclaim(sc) &&
>             !reclaimable && zone_reclaimable(zone))
>             reclaimable = true;
> 
> This issue gets solved by removing the above lines.
> I am using linux-kernel 4.4 and imx board.

The code has changed quite a bit since 4.4, but in principle
zone_reclaimable was a rather dubious heuristic meant to keep reclaim from
failing too early, because that would trigger the OOM killer in the page
allocator path prematurely. This changed in 4.7 with 0a0337e0d1d1 ("mm,
oom: rework oom detection"). zone_reclaimable, later renamed to
pgdat_reclaimable, is gone from the kernel in the latest mmotm kernel.

> Similar Issue is seen here[1]. And it is solved through a patch removing
> the offending lines. But it does not explain why the zone reclaimable goes
> into infinite loop and what causes it? And I ran the C program from [1]
> which is below. And instead of OOM it went on to infinite loop.

Yes, the previous OOM detection could lock up.

> 
> #include <stdlib.h>
> #include <string.h>
> 
> int main(void)
> {
> for (;;) {
> void *p = malloc(1024 * 1024);
> memset(p, 0, 1024 * 1024);
> }
> }
> 
> Also can this issue be related to memcg as in here "
> https://lwn.net/Articles/508923/" because I see the code flow in my case
> enters:
> 
> if(nr_soft_reclaimed)
> reclaimable=true;
> 
> I dont understand memcg correctly. But in my case CONFIG_MEMCG is not set.

Then it never reaches that path.

> After some more debugging, I found a userspace process in sleeping state
> and has three threads. This process is in pause state through
> system_pause() and is accessing shared memory(`/dev/shm`) which is created
> with 100m size. This shared memory has some files.
> 
> Also this process has some anonymous private and shared mappings when I saw
> the output of `pmap -d PID` and there is no swap space in the system.
> 
> I found that this hang situation was not present after I remove that
> userspace process. But how can that be a solution since kernel should be
> able to handle any exception.
> 
> "I found no issues at all if I removed this userspace process".

I am not sure I understand what the problem is here, but could you try
with the current upstream kernel?

> So my doubts are:
> 
>  1. How can this sleeping process in pause state cause issue in zone
> reclaimable returning true always.

It simply cannot. A sleeping process doesn't interact with the system.

>  2. How are the pages reclaimed from sleeping process which is using shared
> memory in linux?

There is a background reclaimer (kswapd, one for each NUMA node), and if it
cannot keep up with the pace of allocation, then the allocation context
is pushed to reclaim memory itself (direct reclaim).
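
The split between the two reclaim paths can be pictured with a small
userspace toy model. This is only a sketch, not kernel code: the type and
function names are hypothetical, and the two thresholds are assumptions
standing in for the point where the background reclaimer is woken and the
point where the allocating context has to reclaim for itself.

```c
/* Hypothetical names for this sketch; they are not kernel identifiers. */
enum reclaim_path {
	RECLAIM_NONE,		/* enough free memory, nothing to do */
	RECLAIM_BACKGROUND,	/* wake the background reclaimer (kswapd) */
	RECLAIM_DIRECT		/* the allocating context reclaims by itself */
};

struct toy_zone {
	unsigned long free_pages;
	unsigned long wmark_low;	/* assumed threshold for waking kswapd */
	unsigned long wmark_min;	/* assumed threshold for direct reclaim */
};

/* Decide which path an allocation of nr_requested pages would take. */
static enum reclaim_path pick_reclaim_path(const struct toy_zone *z,
					   unsigned long nr_requested)
{
	unsigned long left;

	/* The request cannot be satisfied from free pages at all. */
	if (nr_requested >= z->free_pages)
		return RECLAIM_DIRECT;

	left = z->free_pages - nr_requested;
	if (left < z->wmark_min)
		return RECLAIM_DIRECT;		/* background reclaim fell behind */
	if (left < z->wmark_low)
		return RECLAIM_BACKGROUND;	/* getting low, reclaim in background */
	return RECLAIM_NONE;
}
```

In this model a sleeping process plays no part in the decision; only the
amount of free memory at allocation time matters.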

>  3. I tried to unmount /dev/shm but was not possible since process was
> using it. Can we release shared memory by any way? I tried `munmap` but no
> use.

Remove the files from /dev/shm?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Error in freeing memory with zone reclaimable always returning true.
  2017-06-26  8:00 ` Michal Hocko
@ 2017-06-26 13:04   ` Ivid Suvarna
  2017-06-26 14:27     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Ivid Suvarna @ 2017-06-26 13:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm

On Mon, 2017-06-26 at 10:00 +0200, Michal Hocko wrote:
> On Mon 26-06-17 12:59:17, Ivid Suvarna wrote:
> > 
> > Hi,
> > 
> > I have below code which tries to free memory,
> > do
> > {
> > free=shrink_all_memory;
> > }while(free>0);
> What is the intention of such a code. It looks quite wrong to me, to
> be
> honest.
> 

My case is somewhat similar to hibernation, where memory is freed for the
hibernation image. I want to free as much memory as possible, until no
more pages can be reclaimed, i.e., until free returns 0.

> > 
> > But kernel gets into infinite loop because shrink_all_memory always
> > returns
> > 1.
> > When I added some debug statements to `mm/vmscan.c` and found that
> > it is
> > because zone_reclaimable() is always true in shrink_zones()
> > 
> > if (global_reclaim(sc) &&
> >             !reclaimable && zone_reclaimable(zone))
> >             reclaimable = true;
> > 
> > This issue gets solved by removing the above lines.
> > I am using linux-kernel 4.4 and imx board.
> The code has changed quite a bit since 4.4 but in princible
> zone_reclaimable was a rather dubious heuristic to not fail reclaim
> too
> early because that would trigger the OOM in the page allocator path
> prematurely. This has changed in 4.7 by 0a0337e0d1d1 ("mm, oom:
> rework
> oom detection"). zone_reclaimable later renamed to pgdat_reclaimable
> is
> gone from the kernel in the latests mmotm kernel.
> 

Suppose that, for testing purposes, I remove only these lines and do not
apply the whole patch ("mm, oom: rework oom detection") as a solution.
What are the possible side effects? Would we be skipping something
(possibly reclaimable pages) by doing this?
And will this affect any other reclaim logic?

> > 
> > Similar Issue is seen here[1]. And it is solved through a patch
> > removing
> > the offending lines. But it does not explain why the zone
> > reclaimable goes
> > into infinite loop and what causes it? And I ran the C program from
> > [1]
> > which is below. And instead of OOM it went on to infinite loop.
> Yes the previous oom detection could lock up.
> 

Could you explain more about why zone reclaimable would always return
true, even if there are no pages in the LRU list to reclaim?

> > 
> > 
> > #include <stdlib.h>
> > #include <string.h>
> > 
> > int main(void)
> > {
> > for (;;) {
> > void *p = malloc(1024 * 1024);
> > memset(p, 0, 1024 * 1024);
> > }
> > }
> > 
> > Also can this issue be related to memcg as in here "
> > https://lwn.net/Articles/508923/" because I see the code flow in my
> > case
> > enters:
> > 
> > if(nr_soft_reclaimed)
> > reclaimable=true;
> > 
> > I dont understand memcg correctly. But in my case CONFIG_MEMCG is
> > not set.
> then it never reaches that path.
> 

I did not understand. Are you saying that since MEMCG is disabled, the
above if statement should not be executed? If that is the case, then why
am I entering the if block?

> > 
> > After some more debugging, I found a userspace process in sleeping
> > state
> > and has three threads. This process is in pause state through
> > system_pause() and is accessing shared memory(`/dev/shm`) which is
> > created
> > with 100m size. This shared memory has some files.
> > 
> > Also this process has some anonymous private and shared mappings
> > when I saw
> > the output of `pmap -d PID` and there is no swap space in the
> > system.
> > 
> > I found that this hang situation was not present after I remove
> > that
> > userspace process. But how can that be a solution since kernel
> > should be
> > able to handle any exception.
> > 
> > "I found no issues at all if I removed this userspace process".
> I am not sure I understand what is the problem here but could you try
> with the current upstream kernel?
> 

The issue is fixed in the upstream kernel, with or without the userspace
process.
The whole point of this thread is to determine whether the userspace
process is creating this issue or not, since no issue is found without my
userspace process.
I suspect that the private or shared mappings of this userspace process
are creating the problem.

> > 
> > So my doubts are:
> > 
> >  1. How can this sleeping process in pause state cause issue in
> > zone
> > reclaimable returning true always.
> It simply cannot. Sleeping process doesn't interact with the system.
> 
> > 
> >  2. How are the pages reclaimed from sleeping process which is
> > using shared
> > memory in linux?
> There is a background reclaimer (kswapd for each NUMA node) and if
> that
> cannot catch up with the pace of allocation then the allocation
> context
> is pushed to reclaim memory (direct reclaim).
> 

Thanks for clearing up my doubts.

> > 
> >  3. I tried to unmount /dev/shm but was not possible since process
> > was
> > using it. Can we release shared memory by any way? I tried `munmap`
> > but no
> > use.
> remove files from /dev/shm?
> 

Since there are some files in shared memory created by the process, I
just tried removing them to test whether the issue still exists. Sadly,
it does.

Cheers,
Ivid


* Re: Error in freeing memory with zone reclaimable always returning true.
  2017-06-26 13:04   ` Ivid Suvarna
@ 2017-06-26 14:27     ` Michal Hocko
  2017-06-27  4:38       ` Ivid Suvarna
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2017-06-26 14:27 UTC (permalink / raw)
  To: Ivid Suvarna; +Cc: linux-mm

On Mon 26-06-17 06:04:08, Ivid Suvarna wrote:
> On Mon, 2017-06-26 at 10:00 +0200, Michal Hocko wrote:
> > On Mon 26-06-17 12:59:17, Ivid Suvarna wrote:
> > > 
> > > Hi,
> > > 
> > > I have below code which tries to free memory,
> > > do
> > > {
> > > free=shrink_all_memory;
> > > }while(free>0);
> > What is the intention of such a code. It looks quite wrong to me, to
> > be
> > honest.
> > 
> 
> My case is somewhat similar to hibernation where memory is freed for
> hibernation image and I want to free as much memory as possible until
> no pages can be reclaimed. i.e., until free returns 0. 

I would just discourage you from doing something like that. Why would
you want to swap out the working set for example? Isn't something like
dropping the clean page cache sufficient?
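
For reference, dropping the clean page cache can be requested from
userspace by writing "1" to /proc/sys/vm/drop_caches. The helper below is
a minimal sketch of that; the function name is hypothetical, and the
control file path is passed in as a parameter so the helper can be tried
against a scratch file first. It does not cover triggering the same thing
from inside the kernel.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Write "1" to the given control file. On Linux, writing 1 to
 * /proc/sys/vm/drop_caches asks the kernel to drop clean page cache.
 * Returns 0 on success, -1 on failure.
 */
static int drop_clean_page_cache(const char *ctl_path)
{
	FILE *f;
	int ret = 0;

	/* Write back dirty pages first, so that more of the cache is clean. */
	sync();

	f = fopen(ctl_path, "w");
	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

Called as root with "/proc/sys/vm/drop_caches", this only discards clean
cache; it does not swap out or otherwise touch the working set.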

> > > But kernel gets into infinite loop because shrink_all_memory always
> > > returns
> > > 1.
> > > When I added some debug statements to `mm/vmscan.c` and found that
> > > it is
> > > because zone_reclaimable() is always true in shrink_zones()
> > > 
> > > if (global_reclaim(sc) &&
> > >             !reclaimable && zone_reclaimable(zone))
> > >             reclaimable = true;
> > > 
> > > This issue gets solved by removing the above lines.
> > > I am using linux-kernel 4.4 and imx board.
> > The code has changed quite a bit since 4.4 but in princible
> > zone_reclaimable was a rather dubious heuristic to not fail reclaim
> > too
> > early because that would trigger the OOM in the page allocator path
> > prematurely. This has changed in 4.7 by 0a0337e0d1d1 ("mm, oom:
> > rework
> > oom detection"). zone_reclaimable later renamed to pgdat_reclaimable
> > is
> > gone from the kernel in the latests mmotm kernel.
> > 
> 
> Suppose for testing purpose say I remove these lines only and not apply
> the whole patch("mm, oom: rework oom detection") as a solution, then
> what are the possible side effects? Are we like skipping something
> (possible reclaimable pages) by doing this?
> And will this effect any other reclaim logics?

As I've said, OOM detection at that time relied on this check, so you
could trigger the OOM killer prematurely.

> > > Similar Issue is seen here[1]. And it is solved through a patch
> > > removing
> > > the offending lines. But it does not explain why the zone
> > > reclaimable goes
> > > into infinite loop and what causes it? And I ran the C program from
> > > [1]
> > > which is below. And instead of OOM it went on to infinite loop.
> > Yes the previous oom detection could lock up.
> > 
> 
> Could you explain more on why zone reclaimable be returning true
> always,
> even if there are no pages in LRU list to reclaim?

It will not. But basically any freed page resets the NR_PAGES_SCANNED
counter, so chances are that this would keep you livelocked.
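
The effect of that counter reset can be shown with a small userspace
simulation. This is a sketch under assumptions, not the kernel code: the
struct and function names are hypothetical, and the check is modelled as
"scanned pages < 6 * reclaimable pages", which is the form I believe
zone_reclaimable() had in kernels of that era.

```c
#include <stdbool.h>

/* Userspace toy model; the struct and function names are hypothetical. */
struct scan_model {
	unsigned long pages_scanned;	 /* models NR_PAGES_SCANNED */
	unsigned long reclaimable_pages; /* pages still counted as reclaimable */
};

/* Assumed form of the old check: reclaimable until scanned >= 6 * reclaimable. */
static bool model_zone_reclaimable(const struct scan_model *z)
{
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/*
 * Run up to `rounds` reclaim passes. Each pass scans `scan_per_round`
 * pages without reclaiming anything. Every `free_every` passes (0 means
 * never) a page is freed elsewhere, which resets the scanned counter.
 * Returns the pass at which the zone stopped looking reclaimable, or 0
 * if it still looked reclaimable after all passes.
 */
static unsigned long passes_until_unreclaimable(struct scan_model *z,
						unsigned long scan_per_round,
						unsigned long free_every,
						unsigned long rounds)
{
	unsigned long i;

	for (i = 1; i <= rounds; i++) {
		z->pages_scanned += scan_per_round;
		if (free_every && i % free_every == 0)
			z->pages_scanned = 0;	/* a freed page resets the counter */
		if (!model_zone_reclaimable(z))
			return i;
	}
	return 0;
}
```

With 100 reclaimable pages and 100 pages scanned per pass, the model gives
up on pass 6 when nothing is ever freed. If a single page is freed every
fifth pass, the counter never reaches the limit, so a caller that loops
while the zone looks reclaimable never terminates.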

> > > #include <stdlib.h>
> > > #include <string.h>
> > > 
> > > int main(void)
> > > {
> > > for (;;) {
> > > void *p = malloc(1024 * 1024);
> > > memset(p, 0, 1024 * 1024);
> > > }
> > > }
> > > 
> > > Also can this issue be related to memcg as in here "
> > > https://lwn.net/Articles/508923/" because I see the code flow in my
> > > case
> > > enters:
> > > 
> > > if(nr_soft_reclaimed)
> > > reclaimable=true;
> > > 
> > > I dont understand memcg correctly. But in my case CONFIG_MEMCG is
> > > not set.
> > then it never reaches that path.
> > 
> 
> I did not understand. Are you saying that since MEMCG is disabled,
> above if statement should
> not be executed? If that is the case , then why I am entering the if
> block?

If memcg is disabled, then nr_soft_reclaimed will never be true.
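
A minimal sketch of why that is, assuming the usual kernel pattern of
replacing a compiled-out feature with an inline stub that returns 0.
TOY_CONFIG_MEMCG stands in for CONFIG_MEMCG, and all function names here
are hypothetical simplifications, not the real kernel signatures.

```c
#include <stdbool.h>

/* TOY_CONFIG_MEMCG stands in for CONFIG_MEMCG; all names are hypothetical. */
#ifdef TOY_CONFIG_MEMCG
unsigned long toy_soft_limit_reclaim(void);	/* real work, defined elsewhere */
#else
/* Stub used when the feature is compiled out: nothing is ever reclaimed. */
static inline unsigned long toy_soft_limit_reclaim(void)
{
	return 0;
}
#endif

/* Models the quoted fragment: the flag is set only by a non-zero result. */
static bool toy_shrink_zone_step(void)
{
	bool reclaimable = false;
	unsigned long nr_soft_reclaimed = toy_soft_limit_reclaim();

	if (nr_soft_reclaimed)		/* never taken with the stub above */
		reclaimable = true;
	return reclaimable;
}
```

If debug output appears to come from inside that if block on a kernel
built without memcg, it is worth checking that the print statement really
sits inside the block and that the running kernel matches the
configuration being read.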

[...]
> > >  3. I tried to unmount /dev/shm but was not possible since process
> > > was
> > > using it. Can we release shared memory by any way? I tried `munmap`
> > > but no
> > > use.
> > remove files from /dev/shm?
> > 
> 
> Since there are some files in shared memory created by process,
> I just tried to remove them and test if the issue still exists. Sadly
> it exists. 

Files will exist as long as the process keeps them open. But I still do
not understand what you are after...
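
The behaviour of removed-but-open files can be checked from userspace with
a short sketch like the one below. The function name is hypothetical; it
relies only on the standard open(), write(), unlink(), fstat() and close()
calls.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path`, write `len` bytes to it, unlink it, and report how many
 * bytes the still-open descriptor holds afterwards.
 * Returns -1 on error.
 */
static long bytes_held_after_unlink(const char *path, const char *data,
				    size_t len)
{
	struct stat st;
	long held = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) == (ssize_t)len &&
	    unlink(path) == 0 &&	/* the name is gone from the directory */
	    fstat(fd, &st) == 0 &&	/* but the open file is still there */
	    st.st_nlink == 0)
		held = (long)st.st_size;
	close(fd);			/* only now can the storage be released */
	return held;
}
```

The same applies to a path under /dev/shm: removing the name does not
release the backing pages while some process still has the file open or
mapped.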

-- 
Michal Hocko
SUSE Labs


* Re: Error in freeing memory with zone reclaimable always returning true.
  2017-06-26 14:27     ` Michal Hocko
@ 2017-06-27  4:38       ` Ivid Suvarna
  2017-06-27  5:14         ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Ivid Suvarna @ 2017-06-27  4:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm

On Mon, 2017-06-26 at 16:27 +0200, Michal Hocko wrote:
> On Mon 26-06-17 06:04:08, Ivid Suvarna wrote:
> > 
> > On Mon, 2017-06-26 at 10:00 +0200, Michal Hocko wrote:
> > > 
> > > On Mon 26-06-17 12:59:17, Ivid Suvarna wrote:
> > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > I have below code which tries to free memory,
> > > > do
> > > > {
> > > > free=shrink_all_memory;
> > > > }while(free>0);
> > > What is the intention of such a code. It looks quite wrong to me,
> > > to
> > > be
> > > honest.
> > > 
> > My case is somewhat similar to hibernation where memory is freed
> > for
> > hibernation image and I want to free as much memory as possible
> > until
> > no pages can be reclaimed. i.e., until free returns 0.
> I would just discourage you from doing something like that. Why would
> you want to swap out the working set for example? Isn't something
> like
> dropping the clean page cache sufficient?
> 
> > 
> > > 
> > > > 
> > > > But kernel gets into infinite loop because shrink_all_memory
> > > > always
> > > > returns
> > > > 1.
> > > > When I added some debug statements to `mm/vmscan.c` and found
> > > > that
> > > > it is
> > > > because zone_reclaimable() is always true in shrink_zones()
> > > > 
> > > > if (global_reclaim(sc) &&
> > > >             !reclaimable && zone_reclaimable(zone))
> > > >             reclaimable = true;
> > > > 
> > > > This issue gets solved by removing the above lines.
> > > > I am using linux-kernel 4.4 and imx board.
> > > The code has changed quite a bit since 4.4 but in princible
> > > zone_reclaimable was a rather dubious heuristic to not fail
> > > reclaim
> > > too
> > > early because that would trigger the OOM in the page allocator
> > > path
> > > prematurely. This has changed in 4.7 by 0a0337e0d1d1 ("mm, oom:
> > > rework
> > > oom detection"). zone_reclaimable later renamed to
> > > pgdat_reclaimable
> > > is
> > > gone from the kernel in the latests mmotm kernel.
> > > 
> > Suppose for testing purpose say I remove these lines only and not
> > apply
> > the whole patch("mm, oom: rework oom detection") as a solution,
> > then
> > what are the possible side effects? Are we like skipping something
> > (possible reclaimable pages) by doing this?
> > And will this effect any other reclaim logics?
> as I've said oom detection at that time relied on this check. So you
> could trigger oom prematurelly.
> 
> > 
> > > 
> > > > 
> > > > Similar Issue is seen here[1]. And it is solved through a patch
> > > > removing
> > > > the offending lines. But it does not explain why the zone
> > > > reclaimable goes
> > > > into infinite loop and what causes it? And I ran the C program
> > > > from
> > > > [1]
> > > > which is below. And instead of OOM it went on to infinite loop.
> > > Yes the previous oom detection could lock up.
> > > 
> > Could you explain more on why zone reclaimable be returning true
> > always,
> > even if there are no pages in LRU list to reclaim?
> It will not but the mere fact that basically any freed page would
> reset
> the NR_PAGES_SCANNED counter then chances are that this would keep
> you
> livelocked.
> 
> > 
> > > 
> > > > 
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > 
> > > > int main(void)
> > > > {
> > > > for (;;) {
> > > > void *p = malloc(1024 * 1024);
> > > > memset(p, 0, 1024 * 1024);
> > > > }
> > > > }
> > > > 
> > > > Also can this issue be related to memcg as in here "
> > > > https://lwn.net/Articles/508923/" because I see the code flow
> > > > in my
> > > > case
> > > > enters:
> > > > 
> > > > if(nr_soft_reclaimed)
> > > > reclaimable=true;
> > > > 
> > > > I dont understand memcg correctly. But in my case CONFIG_MEMCG
> > > > is
> > > > not set.
> > > then it never reaches that path.
> > > 
> > I did not understand. Are you saying that since MEMCG is disabled,
> > above if statement should
> > not be executed? If that is the case , then why I am entering the
> > if
> > block?
> If the memcg is disabled then nr_soft_reclaimed will never b true.
> 
> [...]
> > 
> > > 
> > > > 
> > > >  3. I tried to unmount /dev/shm but was not possible since
> > > > process
> > > > was
> > > > using it. Can we release shared memory by any way? I tried
> > > > `munmap`
> > > > but no
> > > > use.
> > > remove files from /dev/shm?
> > > 
> > Since there are some files in shared memory created by process,
> > I just tried to remove them and test if the issue still exists.
> > Sadly
> > it exists.
> Files will exist as long as th process keeps them open. But I still
> do
> not understand what you are after...
> 

Thanks, Michal, for the clarifications. One last thing: in suspend to RAM
or suspend to disk, we freeze userspace processes. Is there any way to
print the userspace processes that were frozen during suspend, i.e.,
either the process name or the PID?

Cheers,
Ivid


* Re: Error in freeing memory with zone reclaimable always returning true.
  2017-06-27  4:38       ` Ivid Suvarna
@ 2017-06-27  5:14         ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2017-06-27  5:14 UTC (permalink / raw)
  To: Ivid Suvarna; +Cc: linux-mm

On Mon 26-06-17 21:38:07, Ivid Suvarna wrote:
[...]
> Thanks Michal for the clarifications. One last thing, in suspend to ram
> or suspend to disk we freeze userspace processes. Is there any way to
> print the userspace processes that were freezed during
> suspend?i.e.,either process name or PID.

Well, try_to_freeze_tasks iterates over all tasks and checks whether
they are frozen (see freeze_task), so you can mimic that logic, although
you might need freezer_lock, which is internal to the freezer. Also, there
might be tasks which have been frozen because of the freezer cgroup, and
it is not clear whether you want to consider those as well.

Anyway, I would recommend that you start a new email thread and involve
the freezer maintainers to get better info.
-- 
Michal Hocko
SUSE Labs

