Back to the Future (again)

Wed 28 May 2014 | tags: Open Source, LiDAR, JavaScript

Uday Verma and I have recently been developing a new JavaScript point cloud visualization client called plasio. Its goal is quite simple -- to provide a 3D visualization environment for point cloud data in a web browser.

For the endeavor to work well, we need to compress the bloaty point cloud data down to an efficient format. The usual techniques -- JSON + gzip, protobuf, msgpack -- don't do very well with point clouds. Point cloud data are periodic, columnar, and fluffy. Each of these characteristics works to defeat the usual suspects. The periodic nature (point after point after point) doesn't give long runs of slightly different bytes to differentially compress. The columnar nature (X, Y, Z, Intensity, R, G, B, ReturnNumber, etc.) and its binary packing exacerbate the failure of run-length encoding techniques. The fluffy nature comes from using wide integers to store narrow ranges of data. Format choices like storing the data in dimension-major order (XXXYYYZZZ) can give you the correlated runs of data that help the popular algorithms, but that organization cuts against you in other undesirable ways.
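To make the layout argument concrete, here is a minimal sketch that packs the same points in point-major and dimension-major order and compares how far apart neighbouring values are. The sample coordinates, the function names, and the "sum of neighbour differences" measure are all assumptions made up for illustration; the measure is only a rough proxy for what a general-purpose compressor sees, not a real compression ratio.

```javascript
// Hypothetical sample: four points with integer X, Y, Z (scaled coordinates).
const points = [
  { x: 635000, y: 4850000, z: 410 },
  { x: 635002, y: 4850001, z: 411 },
  { x: 635005, y: 4850001, z: 409 },
  { x: 635007, y: 4850003, z: 412 },
];

// Point-major (interleaved): X0 Y0 Z0 X1 Y1 Z1 ...
function toPointMajor(pts) {
  const out = new Int32Array(pts.length * 3);
  pts.forEach((p, i) => out.set([p.x, p.y, p.z], i * 3));
  return out;
}

// Dimension-major: X0 X1 ... Y0 Y1 ... Z0 Z1 ...
function toDimensionMajor(pts) {
  const n = pts.length;
  const out = new Int32Array(n * 3);
  pts.forEach((p, i) => {
    out[i] = p.x;
    out[n + i] = p.y;
    out[2 * n + i] = p.z;
  });
  return out;
}

// Sum of absolute differences between neighbouring values: a rough proxy
// for how similar adjacent values look to a general-purpose compressor.
function neighbourDelta(values) {
  let total = 0;
  for (let i = 1; i < values.length; i++) {
    total += Math.abs(values[i] - values[i - 1]);
  }
  return total;
}

const pointMajor = toPointMajor(points);
const dimensionMajor = toDimensionMajor(points);
console.log("point-major:", neighbourDelta(pointMajor));
console.log("dimension-major:", neighbourDelta(dimensionMajor));
```

In the point-major buffer every neighbour is a different dimension, so adjacent values jump wildly; in the dimension-major buffer the jumps happen only at the two boundaries between dimensions.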

LASzip is an application of arithmetic encoding by Martin Isenburg that exploits the cyclical nature of point cloud data to compress it. LASzip is a fantastically powerful open source technology that I have been proud to help popularize. It leverages the inherent correlation in point cloud data to make accurate predictions, and then stores only the smaller-in-bit-size residuals. It is differential encoding on steroids, buoyed by an accurate model of the data that pushes fewer bits of residuals into file storage. Although it is currently applied to point clouds that come from LiDAR systems, many features of its models are applicable to other point cloud data -- things like SfM and Kinect-style sensors like Occipital/Tango.
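The predict-then-store-residuals idea can be sketched with the simplest possible predictor, "the next value equals the previous one." This is not LASzip's actual model, which is considerably more elaborate and feeds its residuals to an arithmetic coder; the function names and sample values below are assumptions for illustration only.

```javascript
// Predict each value as the previous one and keep only the residual.
function encodeResiduals(values) {
  const residuals = new Int32Array(values.length);
  let previous = 0;
  for (let i = 0; i < values.length; i++) {
    residuals[i] = values[i] - previous;
    previous = values[i];
  }
  return residuals;
}

// Rebuild the original values by adding each residual to the prediction.
function decodeResiduals(residuals) {
  const values = new Int32Array(residuals.length);
  let previous = 0;
  for (let i = 0; i < residuals.length; i++) {
    values[i] = previous + residuals[i];
    previous = values[i];
  }
  return values;
}

// Bits needed to hold the largest magnitude in the array, plus a sign bit.
function bitsNeeded(values) {
  let max = 0;
  for (const v of values) max = Math.max(max, Math.abs(v));
  return Math.ceil(Math.log2(max + 1)) + 1;
}

// Hypothetical X coordinates of consecutive points along a scan line.
const xs = Int32Array.from([635000, 635002, 635005, 635007, 635010]);
const residuals = encodeResiduals(xs);
const restored = decodeResiduals(residuals);

console.log("raw bits per value:", bitsNeeded(xs));
console.log("residual bits per value:", bitsNeeded(residuals.subarray(1)));
```

The first residual still carries the full starting value; every one after it is small. The better the prediction, the smaller the residuals, and the fewer bits an entropy coder has to spend on them.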

Because the usual browser-available compression suspects don't work very well for point cloud data, an uninspired choice would be to try to port LASzip to JavaScript. It's a big ask. The LASzip codebase is an intricate, bit-shifty pile of wizardry. Porting it to JavaScript, and the leakage that would result, would doom the project to being at best a second-class effort. It would suffer the restrictions that many ports suffer as a result of being derived outside their home language. It is best if there's only one codebase.

Gary Bernhardt clearly articulates where a significant aspect of modern software development is heading -- to the browser. Of course, we've all known that for years, but the thing he rightly points out is that the suckiness-but-universality of JavaScript means developers are coming up with a myriad of ways to write not-JavaScript in a browser. One aspect that is getting legs is the idea of "JavaScript as assembly." The materialization of that idea is asm.js and Emscripten.

In short, these technologies allow you to compile your C/C++ into JavaScript. No porting. It's as crazy as it sounds, but it really works. There are still some leaks, though. Concepts that aren't available in JavaScript (like pointer traversal and vtable lookups) are emulated -- often at great cost. If you're planning on JavaScripting your C/C++, you definitely want to design away from those things if possible. Otherwise, it's magical.
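To give a feel for what "emulated" means, here is a hand-written sketch of pointer traversal over a flat typed-array heap, which is roughly the shape compiled code takes. It is not actual compiler output: the heap size, node layout, and function names are assumptions for illustration.

```javascript
// A flat "heap", similar in spirit to the typed-array memory that
// compiled code runs against. The size here is arbitrary.
const HEAP32 = new Int32Array(16);

// Each node occupies two 32-bit slots: [value, nextPointer].
// Pointers are byte offsets, so they are shifted right by 2 to index HEAP32.
function writeNode(ptr, value, next) {
  HEAP32[ptr >> 2] = value;
  HEAP32[(ptr + 4) >> 2] = next;
}

// Walk the linked list, following "pointers" until the null pointer (0).
function sumList(head) {
  let total = 0;
  for (let ptr = head; ptr !== 0; ptr = HEAP32[(ptr + 4) >> 2]) {
    total += HEAP32[ptr >> 2];
  }
  return total;
}

// Offset 0 is reserved as the null pointer, so nodes start at offset 8.
writeNode(8, 10, 24);
writeNode(24, 20, 40);
writeNode(40, 30, 0);
const listTotal = sumList(8);
console.log("list total:", listTotal);
```

Every dereference becomes an offset computation plus an array access, which is why pointer-chasing designs tend to pay more than flat, array-oriented ones.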

Uday has refactored LASzip to better suit compilation to JavaScript using Emscripten. https://github.com/verma/laz-perf is the current status of that effort. Native compression/decompression speed has improved vs. the existing source tree due to the reorganization to better fit compilation to JavaScript.

Once we're done, we'll have a reasonably-performing pure JavaScript LAZ compression/decompression library that we can use to build efficient, browser-based point cloud software.
