Crash with Skia/Vulkan rendering when dragging the LO appframe wider on Win10 with modest nVidia dGPU, 2GB graphics memory on a 4K (2048x3840px) display. Stacktrace(s) attached.

STR
1. open LO
2. use the os/DE to drag and attach appframe to right half or left half of desktop
3. grab midpoint edge and drag the LO appframe wider
4. note the appframe will expand for some distance, but then will crash
5. repeat, but set LO to Skia/raster based rendering -- no crash with the raster framing

Attached soffice.bin to WinDbg with symbols. Look to be hitting the OOM assert at vcl/skia/gdiimpl.cxx 485 that Mike K. put in with https://gerrit.libreoffice.org/c/core/+/161516

Version: 24.2.3.2 (X86_64) / LibreOffice Community
Build ID: 433d9c2ded56988e8a90e6b2e771ee4e6a5ab2ba
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

vulkaninfo.exe
Device Properties and Extensions:
=================================
GPU0:
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 1.3.277 (4206869)
driverVersion = 552.12.0.0 (2315452416)
vendorID = 0x10de
deviceID = 0x1380
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = NVIDIA GeForce GTX 750 Ti
pipelineCacheUUID = e1b3e8ed-3cf0-cbc3-9cd7-33f3274361af
Created attachment 194151 [details] WinDbg stack and analyze
@Mike, I can routinely reproduce on this 2GB GPU / 4K display combo; do you have cycles to revisit?
(In reply to V Stuart Foote from comment #0)
> Attached soffice.bin to WinDbg with symbols. Look to be hitting the OOM
> assert at vcl/skia/gdiimpl.cxx 485 that Mike K. put in with
> https://gerrit.libreoffice.org/c/core/+/161516

Please note that I didn't put any asserts there. I added code to *not* assert in a specific case, hoping that these cases could get fixed - but that code isn't run here.

Given the description that developers made for the 'oomed()', I do not see what I could do here. An expert is needed, not me.
(In reply to Mike Kaganski from comment #3)
> (In reply to V Stuart Foote from comment #0)
> > Attached soffice.bin to WinDbg with symbols. Look to be hitting the OOM
> > assert at vcl/skia/gdiimpl.cxx 485 that Mike K. put in with
> > https://gerrit.libreoffice.org/c/core/+/161516
>
> Please note that I didn't put any asserts there. I added code to *not*
> assert in a specific case, hoping that these cases could get fixed - but
> that code isn't run here.
>
> Given the description that developers made for the 'oomed()', I do not see
> what I could do here. An expert is needed, not me.

OK thanks, I guess I misread the commit, and I'm certainly no expert either. But I thought you were on the right track.

After an initial Skia GrDirectContext oomed() return [1], you test whether the number of Skia operations to flush (default 1000 ops) is > 10, and if so divide it by 2 -- and check the context again just once? And only then fail if the context still reports OOM via oomed()?

If the flush is to work, maybe use a factor of 10 on oomed(), so if > 10 then /= 10, and so reduce the count of resize steps carried before the flush?

Or is that wishful thinking, and I am way off track about being able to flush GPU memory... Maybe, rather than relying on oomed(), the Skia releaseResourcesAndAbandonContext() is another way to recover when OOM.

=-ref-=
[1] https://api.skia.org/classGrDirectContext.html
Can't help here => uncc myself.
(In reply to V Stuart Foote from comment #4)

That change just ignored the OOM state, and all the operations that led to it, and simply halved the number of operations before flush. So the next batch of operations would flush after 500, then 250, ... - no more than 8 attempts before the number is lower than 10.

Given that the oomed() call is documented to reset the state, I don't see what I can do here. Replacing the canvas's context is not something that looks safe.
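
For illustration only, the back-off arithmetic discussed in this and the previous comment can be sketched in isolation. This is not the code in vcl/skia/gdiimpl.cxx: the function name, the constant, and the assumption that the threshold is reduced by integer division while it is still above 10 are all hypothetical, taken from the description in this thread (start at 1000 operations, reduce on each reported OOM, give up once the threshold is no longer above 10).

```cpp
#include <cassert>
#include <vector>

// Hypothetical floor: below or at this threshold no further reduction is tried.
constexpr int kMinOpsBeforeFlush = 10;

// Simulates repeated OOM reports and returns each reduced flush threshold
// in turn. `initial` is the starting operation count, `factor` the divisor
// applied per OOM (2 = halving as described, 10 = the alternative suggested).
// Assumption: integer division, reduction only while the value is above the floor.
std::vector<int> thresholdsAfterRepeatedOom(int initial, int factor)
{
    assert(initial > 0 && factor > 1);
    std::vector<int> tried;
    int limit = initial;
    while (limit > kMinOpsBeforeFlush)
    {
        limit /= factor;       // reduce the batch size before the next flush
        tried.push_back(limit);
    }
    return tried;              // the next OOM after the last entry would be fatal
}
```

Under these assumptions, thresholdsAfterRepeatedOom(1000, 2) walks through 500, 250, 125, 62, 31, 15, 7, which fits the "no more than 8 attempts" estimate, while a factor of 10 leaves only two reductions (100, then 10) before giving up. Neither variant frees GPU memory by itself; it only changes how often a flush is attempted.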
(In reply to Mike Kaganski from comment #6)
> (In reply to V Stuart Foote from comment #4)
>
> That change just ignored the OOM state, and all the operations that led to
> it, and simply halved the number of operations before flush. so next batch
> of operations would flush after 500, then 250, ... - no more than 8 attempts
> before the number is lower than 10.
>
> Given that oomed() call is documented to reset the state, I don't see what I
> can do here. Replacing canvas's context is not something that looks safe.

OK, I can agree to that, and thanks for walking me through it.

Setting => WF, as it is annoying/concerning but a corner case in Vulkan usage, and no one else has confirmed with the STR.