Bug 134641 - binaryurp bridge termination sporadically causes DisposedException in a different bridge
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: sdk
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks:
 
Reported: 2020-07-08 09:07 UTC by Marc-Oliver Straub
Modified: 2022-02-25 08:32 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Description Marc-Oliver Straub 2020-07-08 09:07:58 UTC
Termination of a binaryurp bridge (e.g. because the remote process crashes) can cause a DisposedException in a different bridge. The expectation is that a bridge is not affected by the termination of other bridges.

We had 3 processes communicating with each other using binaryurp bridges:
Process A <-> process B
Process B <-> process C
Process A and process C don't talk to each other.

Process A requests process B to execute a method. As part of this method, process B needs to call process C:

A: call doSomethingInProcessB(), waiting for result
B: execute doSomethingInProcessB(), will now call doSomethingInProcessC()
C: idling

Process A is now terminated (due to one of its threads crashing, a kill, ...).
Process B notices that the bridge to process A has terminated and calls ThreadPool::dispose(nDisposeId). ThreadPool::dispose(..) walks through all JobQueues, calling JobQueue::dispose(nDisposeId).

Since the doSomethingInProcessB() call is still being processed, the associated JobQueue contains the nDisposeId as the topmost entry in the callstack. JobQueue::dispose(..) finds the disposeId and sets it to 0. It signals m_cndWait so that the bridge can terminate (jobqueue.cxx:143).

Concurrently, the worker thread currently working on doSomethingInProcessB() wants to call doSomethingInProcessC(). The IPC is sent out and JobQueue::enter(..) is called to wait for the result. JobQueue::enter(..) puts a different disposeId onto the callstack (since the call uses a different bridge) and should block on m_cndWait.wait() to wait for the result (jobqueue.cxx:73).

But m_cndWait has been signalled by JobQueue::dispose(), so JobQueue::enter(..) doesn't block - but m_lstJobs is still empty (jobqueue.cxx:98). It resets the m_cndWait and returns a nullptr, which is converted into a DisposedException by Bridge::makeCall() (bridge.cxx:610) - even though the bridge to process C is completely intact at this point in time.
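
The sequence above can be replayed in a small single-threaded model. This is only a sketch under assumptions: ModelJobQueue, Outcome, enterOnce and runScenario are hypothetical names, the condition is reduced to a bool, and the blocking wait is replaced by a WouldBlock result so the interleaving is deterministic. It is not the code from jobqueue.cxx.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model of the state described in this report (not LibreOffice code).
enum class Outcome { WouldBlock, SpuriousNull, GotReply };

struct ModelJobQueue {
    std::deque<std::intptr_t> callstack; // front() is the innermost call; 0 means "disposed"
    std::deque<void*> jobs;              // pending replies (stands in for the job list)
    bool condSet = false;                // stands in for the manual-reset m_cndWait

    // Models JobQueue::dispose(nDisposeId): zero matching entries and signal
    // the condition if the topmost entry is now disposed.
    void dispose(std::intptr_t id) {
        for (auto& entry : callstack) {
            if (entry == id) entry = 0;
        }
        if (!callstack.empty() && callstack.front() == 0) condSet = true;
    }

    // Models one pass through JobQueue::enter(..) for a nested call.
    Outcome enterOnce(std::intptr_t id) {
        callstack.push_front(id);        // a different disposeId goes on top
        Outcome out;
        if (!condSet) {
            out = Outcome::WouldBlock;   // the real code would sleep in wait()
        } else if (jobs.empty()) {
            condSet = false;             // condition is reset ...
            out = Outcome::SpuriousNull; // ... and nullptr is returned to the caller
        } else {
            jobs.pop_front();
            if (jobs.empty()) condSet = false;
            out = Outcome::GotReply;
        }
        callstack.pop_front();
        return out;
    }
};

// Replays the scenario: the worker is inside a request from bridge A and
// then waits for a reply on bridge C.
Outcome runScenario(bool bridgeATerminates) {
    const std::intptr_t bridgeA = 1, bridgeC = 2; // illustrative ids
    ModelJobQueue q;
    q.callstack.push_front(bridgeA);              // executing doSomethingInProcessB()
    if (bridgeATerminates) q.dispose(bridgeA);    // process A dies
    return q.enterOnce(bridgeC);                  // waiting for doSomethingInProcessC()
}
```

In this model, runScenario(false) reports that the worker would block as intended, while runScenario(true) falls through with no job, which corresponds to the nullptr that Bridge::makeCall() turns into a DisposedException.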

I'd suggest the following fixes:
* JobQueue::enter() should check for job == nullptr after resetting m_cndWait in jobqueue.cxx:98. If so, it should continue waiting instead of returning nullptr. This will avoid the DisposedException, and the call to doSomethingInProcessC() will work correctly.

* JobQueue::enter() should check for m_lstCallstack == 0 and m_lstJob.empty() after processing a request (jobqueue.cxx:109). This will ensure that the bridge will correctly terminate once doSomethingInProcessB() has finished.
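
A rough illustration of how the two suggestions could fit together, again as a single-threaded model rather than a patch. FixedJobQueue, pollReply, outerCallDisposed, addReply and runFixedScenario are hypothetical names, replies are modelled as ints, and disposal of the innermost call's own bridge is deliberately left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model of the proposed behaviour (not LibreOffice code).
struct FixedJobQueue {
    std::deque<std::intptr_t> callstack; // front() is the innermost call; 0 means "disposed"
    std::deque<int> replies;             // pending replies, modelled as ints
    bool condSet = false;                // stands in for the manual-reset m_cndWait

    void dispose(std::intptr_t id) {
        for (auto& entry : callstack) {
            if (entry == id) entry = 0;
        }
        if (!callstack.empty() && callstack.front() == 0) condSet = true;
    }

    void addReply(int value) {
        replies.push_back(value);
        condSet = true;
    }

    // One pass of the wait loop for the innermost call. Returns true and
    // stores the reply if one is available, false if the caller should wait again.
    bool pollReply(int& out) {
        if (!condSet) return false;
        if (replies.empty()) {
            // First suggestion: woken without a job, so reset and keep
            // waiting instead of returning nullptr.
            condSet = false;
            return false;
        }
        out = replies.front();
        replies.pop_front();
        if (replies.empty()) condSet = false;
        return true;
    }

    // Second suggestion: after a request has been processed, re-check whether
    // the enclosing call was disposed while the nested call was in flight.
    bool outerCallDisposed() const {
        return !callstack.empty() && callstack.front() == 0 && replies.empty();
    }
};

struct ScenarioResult {
    bool firstPollBlocked;
    bool gotReply;
    int reply;
    bool outerDisposed;
};

ScenarioResult runFixedScenario() {
    const std::intptr_t bridgeA = 1, bridgeC = 2; // illustrative ids
    FixedJobQueue q;
    q.callstack.push_front(bridgeA);  // executing doSomethingInProcessB()
    q.dispose(bridgeA);               // process A dies
    q.callstack.push_front(bridgeC);  // nested call to process C starts

    ScenarioResult r{};
    int reply = 0;
    r.firstPollBlocked = !q.pollReply(reply); // stale signal is absorbed
    q.addReply(42);                           // reply from process C arrives
    r.gotReply = q.pollReply(reply);
    r.reply = reply;
    q.callstack.pop_front();                  // nested call is finished
    r.outerDisposed = q.outerCallDisposed();  // bridge A disposal is still noticed
    return r;
}
```

The second check matters in this model because the first one consumes the signal that dispose() raised; without re-checking the callstack afterwards, the termination of the bridge to process A could go unnoticed once the nested call has returned.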
Comment 1 Eleonora Govallo 2021-08-04 20:30:29 UTC
Hello!
Do you still want to implement your proposal? If yes, please write to the IRC channel #libreoffice-dev.
You can also try to make a patch yourself using the information on this page: https://wiki.documentfoundation.org/Development/GetInvolved
Comment 2 QA Administrators 2022-02-02 03:40:34 UTC Comment hidden (obsolete)
Comment 3 Marc-Oliver Straub 2022-02-25 08:32:10 UTC
Yes, I plan to provide a fix for this issue soon.