Bug 154181 - UNO array transfer with 500k elements considerably slower when running on Java 17 when compared to running on Java 11
Summary: UNO array transfer with 500k elements considerably slower when running on Jav...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: sdk (show other bugs)
Version:
(earliest affected)
7.6.0.0 alpha0+
Hardware: All Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-14 10:00 UTC by Marc-Oliver Straub
Modified: 2023-08-20 03:44 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc-Oliver Straub 2023-03-14 10:00:50 UTC
Description:
When invoking an interface method that transfers 500k integers (sal_Int32) from a C++ service to a Java client via UNO, we observe considerable performance degradation after updating our java runtime to Java 17. On a HP Z640 workstation, we see ~33sec vs. ~16sec when passing -XX:+UseBiasedLocking to the JVM.

This is due to the following JDK task: https://openjdk.org/jeps/374
And commit: https://github.com/openjdk/jdk15u/commit/398a2b3c37236c0cd1b5258b517adf9edaf0b044

UseBiasedLocking is marked for deprecation and disabled per default (OpenJDK 19 no longer allows setting -XX:+UseBiasedLocking).

The problem for UNO is as follows: Marshal and Unmarshal use a ByteArrayOutputStreamDataOutputStream resp. ByteArrayInputStreamDataInputStream.

1) java.io.ByteArrayInputStream.read() is synchronized and its used to read everything from the bytes of a transferred message, byte-per-byte (or close enough to make no difference performance-wise). So for each element in the array, the code is now taking a lock - the JVM no longer optimizes this away. 
2) Same applies for java.io.ByteArrayOutputStream.write(int).

We tried the workaround mentioned below, but for the example above it still results in about 30% performance degradation (average is at around ~22 seconds). 

Is there a more permanent solution that doesn't result in performance degradation? To our knowledge the synchronization is an accidental byproduct of using byte array output/input streams.



diff --git a/ridljar/com/sun/star/lib/uno/protocols/urp/Marshal.java b/ridljar/com/sun/star/lib/uno/protocols/urp/Marshal.java
index 106a8736cb31..5d0150f2ae23 100644
--- a/ridljar/com/sun/star/lib/uno/protocols/urp/Marshal.java
+++ b/ridljar/com/sun/star/lib/uno/protocols/urp/Marshal.java
@@ -15,6 +15,8 @@
  *   License, Version 2.0 (the "License"); you may not use this file
  *   except in compliance with the License. You may obtain a copy of
  *   the License at http://www.apache.org/licenses/LICENSE-2.0 .
+ *
+ * Modifications <C2><A9> 2023 Advantest Corporation
  */
 package com.sun.star.lib.uno.protocols.urp;
 
@@ -210,6 +212,11 @@ final class Marshal {
         return data;
     }
 
+    public Object getStreamForSynchronization()
+    {
+        return buffer;
+    }
+
     private void writeBooleanValue(Boolean value) throws IOException {
         output.writeBoolean(value != null && value.booleanValue());
     }
diff --git a/ridljar/com/sun/star/lib/uno/protocols/urp/Unmarshal.java b/ridljar/com/sun/star/lib/uno/protocols/urp/Unmarshal.java
index c16d19291356..301d4228c52e 100644
--- a/ridljar/com/sun/star/lib/uno/protocols/urp/Unmarshal.java
+++ b/ridljar/com/sun/star/lib/uno/protocols/urp/Unmarshal.java
@@ -15,6 +15,8 @@
  *   License, Version 2.0 (the "License"); you may not use this file
  *   except in compliance with the License. You may obtain a copy of
  *   the License at http://www.apache.org/licenses/LICENSE-2.0 .
+ *
+ * Modifications <C2><A9> 2023 Advantest Corporation
  */
 package com.sun.star.lib.uno.protocols.urp;
 
@@ -157,6 +159,11 @@ final class Unmarshal {
         }
     }
 
+    public Object getStreamForSynchronization()
+    {
+        return bytesInput;
+    }
+
     public Object readValue(TypeDescription type) {
         try {
             switch (type.getTypeClass().getValue()) {
@@ -231,7 +238,8 @@ final class Unmarshal {
     }
 
     public void reset(byte[] data) {
-        input = new DataInputStream(new ByteArrayInputStream(data));
+        bytesInput = new ByteArrayInputStream(data);
+        input = new DataInputStream(bytesInput);
     }
 
     private Boolean readBooleanValue() throws IOException {
@@ -471,6 +479,7 @@ final class Unmarshal {
     private final ThreadId[] threadIdCache;
     private final TypeDescription[] typeCache;
     private DataInputStream input;
+    private ByteArrayInputStream bytesInput;
 }
 
 /* vim:set shiftwidth=4 softtabstop=4 expandtab: */
diff --git a/ridljar/com/sun/star/lib/uno/protocols/urp/urp.java b/ridljar/com/sun/star/lib/uno/protocols/urp/urp.java
index 76b5006feb3e..2653e40e4cf5 100644
--- a/ridljar/com/sun/star/lib/uno/protocols/urp/urp.java
+++ b/ridljar/com/sun/star/lib/uno/protocols/urp/urp.java
@@ -16,7 +16,7 @@
  *   except in compliance with the License. You may obtain a copy of
  *   the License at http://www.apache.org/licenses/LICENSE-2.0 .
  *
- * Modifications <C2><A9> 2019 Advantest Corporation
+ * Modifications <C2><A9> 2023 Advantest Corporation
  *
  */
 
@@ -106,6 +106,8 @@ public final class urp implements IProtocol {
                 }
             }
             UrpMessage msg;
+            synchronized(unmarshal.getStreamForSynchronization()) // fix for CR-160841
+            {
             int header = unmarshal.read8Bit();
             if ((header & HEADER_LONGHEADER) != 0) {
                 if ((header & HEADER_REQUEST) != 0) {
@@ -116,6 +118,8 @@ public final class urp implements IProtocol {
             } else {
                 msg = readShortRequest(header);
             }
+            }
+
             if (msg.isInternal()) {
                 handleInternalMessage(msg);
             } else {
@@ -163,6 +167,8 @@ public final class urp implements IProtocol {
         throws IOException
     {
         synchronized (output) {
+            synchronized(marshal.getStreamForSynchronization())
+            {
             if (PRINT_ALL_IPCS) {
                 System.out.println("java: writeReply " + result);
             }
@@ -204,6 +210,7 @@ public final class urp implements IProtocol {
             }
             writeBlock(true);
         }
+        }
     }
 
     private void sendRequestChange() throws IOException {
@@ -540,9 +547,12 @@ public final class urp implements IProtocol {
                     new QueuedRelease(internal, oid, type, desc, tid));
                 return false;
             } else {
+                synchronized(marshal.getStreamForSynchronization())
+                {
                 writeQueuedReleases();
                 return writeRequest(
                     internal, oid, type, desc, tid, arguments, true);
+                }
             }
         }
     }



Steps to Reproduce:
Transfer a Sequence<sal_Int32> from C++ to a Java client. C++ implementation looks like this:
  Sequence< Sequence< sal_Int32 > > JavaTest::getSequenceOfSequenceOfLongs()
  {
    int size = 10000;
    Sequence<Sequence<sal_Int32>> returnValue(size);

    for (sal_Int32 i = 0; i < size; i++) {
      returnValue[i] = Sequence< sal_Int32 >(i + 1);

      for (sal_Int32 j = 0; j < i + 1; j++) {
        returnValue[i][j] = j;
      }
    }
    return returnValue;
  }

Actual Results:
33s runtime when using java 17
16s runtime when using java 17 -XX:+UseBiasedLocking

Expected Results:
16s runtime even when not specifying -XX:+UseBiasedLocking


Reproducible: Always


User Profile Reset: No

Additional Info:
n/a
Comment 1 Julien Nabet 2023-03-14 12:19:59 UTC
Stephan: I think you may be interested in this one since it concerns Uno and Java