Timed out waiting for signal from JVM

Hello

today we discovered the following error in the Djigzo logfile:

JVM appears hung: Timed out waiting for signal from JVM.
JVM did not exit on request, terminated
JVM received a signal SIGKILL (9).
Launching a JVM...
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~10.04.2)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

WrapperManager: Initializing...

Phoenix 4.2

configure DjigzoServiceImpl.
Configuring Djigzo Service.

**** Begin system properties ****
-- listing properties --
java.runtime.name=OpenJDK Runtime Environment
sun.boot.library.path=/usr/lib/jvm/java-6-openjdk/jre/lib/a...
java.vm.version=20.0-b12
java.vm.vendor=Sun Microsystems Inc.
java.vendor.url=http://java.sun.com/
path.separator=:
java.vm.name=OpenJDK 64-Bit Server VM
file.encoding.pkg=sun.io
user.country=US
sun.java.launcher=SUN_STANDARD
sun.os.patch.level=unknown
wrapper.key=4V2wL5mowdmofv_Z
java.vm.specification.name=Java Virtual Machine Specification
user.dir=/usr/share/djigzo
wrapper.version=3.4.1
java.runtime.version=1.6.0_24-b24
java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment
phoenix.home=/usr/share/djigzo/james-2.3.1
djigzo.charset-alias-provider.properties=conf/charset-aliases.properties
java.endorsed.dirs=/usr/lib/jvm/java-6-openjdk/jre/lib/e...
os.arch=amd64
java.io.tmpdir=/tmp
line.separator=

java.vm.specification.vendor=Sun Microsystems Inc.
wrapper.cpu.timeout=10
os.name=Linux
sun.jnu.encoding=UTF-8
java.library.path=./wrapper/
wrapper.jvm.port.max=31999
java.specification.name=Java Platform API Specification
java.class.version=50.0
java.naming.provider.url=djigzo
sun.management.compiler=HotSpot 64-Bit Tiered Compilers
wrapper.pid=1939
os.version=2.6.32-45-server
networkaddress.cache.ttl=300
wrapper.java.pid=6134
user.home=/usr/share/djigzo
user.timezone=Europe/Berlin
wrapper.jvmid=2
java.awt.printerjob=sun.print.PSPrinterJob
java.specification.version=1.6
file.encoding=UTF-8
user.name=djigzo
java.class.path=james-2.3.1/bin/phoenix-loader.jar:wr...
mail.mime.decodetext.strict=false
java.naming.factory.initial=tyrex.naming.MemoryContextFactory
java.vm.specification.version=1.0
wrapper.port=15012
sun.arch.data.model=64
java.home=/usr/lib/jvm/java-6-openjdk/jre
sun.java.command=org.apache.avalon.phoenix.launcher.Da...
djigzo.smime.outlook2010skiworkaround.enable=false
java.specification.vendor=Sun Microsystems Inc.
user.language=en
java.vm.info=mixed mode
java.version=1.6.0_24
java.ext.dirs=james-2.3.1/lib:./james-2.3.1/tools/lib
sun.boot.class.path=/usr/lib/jvm/java-6-openjdk/jre/lib/r...
java.vendor=Sun Microsystems Inc.
file.separator=/
java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport
sun.cpu.endian=little
sun.io.unicode.encoding=UnicodeLittle
mail.mime.parameters.strict=false
wrapper.native_library=wrapper
wrapper.jvm.port.min=31000
sun.cpu.isalist=
**** End system properties ****

spring config file: conf/spring/djigzo.xml
additional spring config dir: conf/spring/spring.d/
log4j config file: conf/log4j.properties

Obviously Djigzo decided to restart, but the strange thing is that it
was working normally at that point. I have checked the logfiles and it
looks like this has happened a few times already, with increasing
frequency.

Any idea what could be wrong?

Regards

Andreas

The Djigzo backend is started by the Java wrapper process. The wrapper
monitors the JVM to check whether it is still responsive and restarts it
if it is not. Either the JVM was really hanging, or the Java wrapper
incorrectly assumed it was hanging and restarted the backend.
I have seen this happen when the gateway runs under VMware with
overcommitted memory and without reserving all of the virtual machine's
memory. If memory is overcommitted and the virtual machine's memory is
not fully reserved, the hypervisor swaps the memory to disk, which kills
Java performance.
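
For reference, the wrapper's liveness check is a periodic ping over a
local socket, controlled by two wrapper.conf properties. The values
below are a sketch only, based on the documented defaults of that era;
verify them against your wrapper version:

wrapper.ping.interval=5
wrapper.ping.timeout=30

If no ping response arrives within wrapper.ping.timeout seconds, the
wrapper logs "JVM appears hung" and restarts the JVM.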

Are you running the gateway virtualized or on real hardware?

Kind regards,

Martijn


Zitat von Martijn Brinkers <martijn(a)djigzo.com>:

Are you running the gateway virtualized or on real hardware?

This is real hardware, and there is no memory shortage either. Djigzo
was working as expected at the time it was killed. Today, when it was
noticed, I was even working in the admin console, so it should not be a
stall of Djigzo but rather a false restart trigger.

Regards

Andreas

Strange. I have never seen this problem on my own servers (well, at
least not that I know of, since it automatically restarts :-) ).

I have checked the Java Wrapper release notes at:

http://wrapper.tanukisoftware.com/doc/english/release-notes.html

Although 3.4.1 is already old, I did not immediately see a bug fix which
might explain what happens. This looks the closest:

Version 3.5.9:
Fix a problem where the Wrapper was not correctly yielding control back
to its main loop when very large amounts of continuous output was being
logged from the JVM. Introduced in 3.4.0. In versions prior to 3.5.8,
this could have caused the JVM to timeout and restart itself. That
particular issue was resolved but the Wrapper process in 3.5.8 would
still have been unresponsive when this was happening. The Wrapper will
now always yield back to its main loop after 250 milliseconds of
continuous logging.

You might try to use a newer Java wrapper release to see whether this
fixes the problem. The Java wrapper files are stored in:

/usr/share/djigzo/wrapper

The wrapper tar is extracted to a subdirectory, and the following
symlinks point to the required files inside the version-specific
subdirectory:

libwrapper.so --links-to--> wrapper-linux-xx/lib/libwrapper.so
wrapper-djigzo --links-to--> wrapper-linux-xx/bin/wrapper
wrapper.jar --links-to--> wrapper-linux-xx/lib/wrapper.jar

If you want to test with a newer release of the Java wrapper, untar the
file, point the symlinks at the new files, and then restart djigzo. For
example:
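
A minimal sketch of those steps, assuming the 64-bit Linux community
edition archive and the usual init script name (both hypothetical;
adjust to the file you actually downloaded and to your distribution):

cd /usr/share/djigzo/wrapper
tar xzf wrapper-linux-x86-64-3.5.15.tar.gz
ln -sf wrapper-linux-x86-64-3.5.15/lib/libwrapper.so libwrapper.so
ln -sf wrapper-linux-x86-64-3.5.15/bin/wrapper wrapper-djigzo
ln -sf wrapper-linux-x86-64-3.5.15/lib/wrapper.jar wrapper.jar
/etc/init.d/djigzo restart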

Kind regards,

Martijn


Zitat von Martijn Brinkers <martijn(a)djigzo.com>:

You might try to use a newer Java wrapper release to see whether this
fixes the problem.

What kind of wrapper do I need? The Professional, Standard or
Community edition?

Thanks

Andreas

The Community edition.

Kind regards,

Martijn


Zitat von Martijn Brinkers <martijn(a)djigzo.com>:

The Community edition.

Thanks, I will try it and let you know.

Regards

Andreas

Zitat von Martijn Brinkers <martijn(a)djigzo.com>:

You might try to use a newer Java wrapper release to see whether this
fixes the problem.

Hm, that was not the root cause: it restarted again with wrapper
3.5.15. Below is the content of /proc/meminfo:

MemTotal: 6062812 kB
MemFree: 45060 kB
Buffers: 98080 kB
Cached: 2820224 kB
SwapCached: 448 kB
Active: 3190632 kB
Inactive: 2362944 kB
Active(anon): 2029628 kB
Inactive(anon): 644244 kB
Active(file): 1161004 kB
Inactive(file): 1718700 kB
Unevictable: 4 kB
Mlocked: 0 kB
SwapTotal: 7812088 kB
SwapFree: 7810748 kB
Dirty: 1724 kB
Writeback: 0 kB
AnonPages: 2634840 kB
Mapped: 91544 kB
Shmem: 38588 kB
Slab: 299204 kB
SReclaimable: 263184 kB
SUnreclaim: 36020 kB
KernelStack: 3488 kB
PageTables: 20688 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10843492 kB
Committed_AS: 3364384 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 33052 kB
VmallocChunk: 34356571308 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10176 kB
DirectMap2M: 6215680 kB

This is Ubuntu 10.04 LTS x64 with ECC RAM and no system errors as far
as I can tell. Any further ideas?
Is the timeout adjustable?

Regards

Andreas


Yes, that should be possible, although normally it should not be needed.
See:

http://wrapper.tanukisoftware.com/doc/english/prop-ping-interval-logged.html
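
As a sketch, the property that controls this particular timeout is
wrapper.ping.timeout (default 30 seconds); raising it in
djigzo.wrapper.conf would look like this, with 60 as an illustrative
value only:

wrapper.ping.timeout=60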

Before changing the timeout, it might be a good idea to enable debugging
for the wrapper. It might tell us more:

http://wrapper.tanukisoftware.com/doc/english/prop-debug.html

Just add

wrapper.debug=TRUE

to the file /usr/share/djigzo/wrapper/djigzo.wrapper.conf

and then restart djigzo
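
As a one-liner, assuming the standard init script location (an
assumption; use whatever you normally use to restart the service):

echo "wrapper.debug=TRUE" >> /usr/share/djigzo/wrapper/djigzo.wrapper.conf
/etc/init.d/djigzo restart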

You will now see the pings to the JVM and hopefully some more
information about why it restarts.

Kind regards,

Martijn


Could it be related to the CRL downloading? Most of the time we have
something like this nearby:

05 Dez 2012 17:34:23 | WARN IO Exception downloading CRL. URI:
ldap://ldap.sbca.telesec.de/CN=Deutsche%20Telekom%20Root%20CA%202,OU=T-TeleSec%20Trust%20Center,O=Deutsche%20Telekom%20AG,C=DE?AuthorityRevocationList. Message: javax.naming.NamingException: LDAP response read timed out, timeout used:120000ms. (mitm.common.security.crl.CRLStoreUpdaterImpl) [CRL Updater
thread]

Regards

Andreas

Downloading of CRLs is done on a separate thread. The only thing I can
think of is that downloading the CRL somehow uses all the CPU for some
time, although that should not happen since all threads have the same
priority. Do you have the certificate that contains that CRL
distribution point, so I can test whether I can download the CRL?
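
A rough sketch for testing such an LDAP distribution point by hand,
assuming the OpenLDAP client tools and OpenSSL are installed (the
attribute name follows the URI above and may need a ";binary" suffix
depending on the server):

ldapsearch -x -t -H ldap://ldap.sbca.telesec.de \
  -b "CN=Deutsche Telekom Root CA 2,OU=T-TeleSec Trust Center,O=Deutsche Telekom AG,C=DE" \
  -s base "(objectClass=*)" authorityRevocationList
openssl crl -inform DER -in /tmp/ldapsearch-* -noout -lastupdate -nextupdate

ldapsearch -t writes the binary attribute value to a temporary file
(under /tmp by default), which openssl can then parse.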

Kind regards,

Martijn


Zitat von Martijn Brinkers <martijn(a)djigzo.com>:

Do you have the certificate that contains that CRL distribution point
so I can test whether I can download the CRL?

There is more than one with a crappy CRL download URL. I will try to
pick them out and send them to you in a private mail. There is obviously
something wrong with the CRLs anyway:

/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.1:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.2.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
/var/log/old/james.wrapper.log.2.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.2.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
/var/log/old/james.wrapper.log.3.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.3.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.3.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.4.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space
/var/log/old/james.wrapper.log.4.gz:Exception in thread "CRL Updater
thread" java.lang.OutOfMemoryError: Java heap space

I have now raised the memory limit in wrapper.conf from 512 MB to 700 MB
to see if this at least fixes the OutOfMemory :-(
And yes, one CPU is at 100% load for the duration of the CRL updates,
but at least we have two of them :-)
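
For reference, in a stock wrapper configuration the Java heap limit is
usually the wrapper.java.maxmemory property (in MB), i.e. a line like
the following; whether Djigzo's wrapper.conf uses this property or
passes -Xmx directly is an assumption to check in your file:

wrapper.java.maxmemory=700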

Regards

Andreas

On 12/05/2012 06:03 PM, lst_hoe02(a)kwsoft.de wrote:

I have now raised the memory limit in wrapper.conf from 512 MB to 700 MB
to see if this at least fixes the OutOfMemory :-(

That might explain it. Unfortunately, CRLs can be large and memory
consuming; I think 512 MB is too low for some large CRLs. For the
Virtual Appliance I have enabled "dynamic memory allocation", which uses
more memory if the system is configured with more memory. It's really
simple but effective (see /etc/default/djigzo). This is not enabled by
default since djigzo might be installed on systems that run other things
as well.

java.lang.OutOfMemoryError: GC overhead limit exceeded

This is a warning from the JVM that it is spending more time on garbage
collection than some maximum threshold allows. I think the system is
then restarted. Somehow you have a CA with a very large CRL. The system
sets a maximum size for a downloaded CRL (see the djigzo.properties
parameter system.cRLDownloadParameters.maxBytes). You can lower this
value, but then it might be that a CRL is no longer downloaded because
it is too big.
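
For example, to cap CRL downloads at roughly 10 MB (10485760 is an
illustrative value, not necessarily the shipped default):

system.cRLDownloadParameters.maxBytes=10485760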

I'll need to spend time refactoring the CRL part. The problem is that
Java tries to load the whole CRL into memory, which can take a lot of
memory. The thing I have been thinking about for some time is a
database-backed CRL store, but that is not trivial to do.

Can you see which CRL is giving you grief?

Kind regards,

Martijn


Zitat von Martijn Brinkers <martijn(a)djigzo.com>:

Can you see which CRL is giving you grief?

I don't know how to quickly find out the size of the CRLs. I can
provide you with a sample list of all the CAs for which CRL trouble is
logged. What about CRLs from non-root CAs?

Regards

Andreas

Yes, that would help. You could also select all CRLs and download them
as a p7b (but that might be a large file).
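
For checking the size of an individual CRL by hand, a rough sketch for
an HTTP distribution point (hypothetical URL; adjust -inform to PEM if
the CA publishes PEM-encoded CRLs):

curl -sS -o ca.crl http://crl.example-ca.test/ca.crl
ls -lh ca.crl
openssl crl -inform DER -in ca.crl -noout -lastupdate -nextupdate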

Regards,

Martijn
