Collaborating on a physical object when two people aren’t in the same room can be extremely challenging, but a new remote conferencing system allows the remote user to manipulate a view of the scene in 3D, to assist in complex tasks like debugging complicated hardware.
The system, called SharedNeRF, combines two graphics rendering techniques – one that is slow and photorealistic, and another that is instantaneous but less precise – to help the remote user experience the physical space of the collaborator.
“This would be a paradigm shift,” said Mose Sakashita, a doctoral student in the field of information science who developed the system. “It would enable people to work on tasks that have never been possible and that are very difficult to convey through video-based systems with only one angle.”
Sakashita designed the remote conferencing tool as an intern at Microsoft in 2023, working with Andrew Wilson ’93, formerly a computer science major at Cornell. Sakashita will present the work May 16 at the Association for Computing Machinery (ACM) CHI conference on Human Factors in Computing Systems (CHI’24). The paper received an honorable mention.
“When performing a task involving physical objects, such as fixing a kitchen faucet or assembling a circuit, today’s video conferencing systems are pretty clunky,” Wilson said. “Lately, there has been a burst of innovation in computer graphics and rendering techniques. SharedNeRF is among the first explorations of using these techniques to address problems that arise when showing more than talking heads.”
Sakashita’s graduate research, conducted in the lab of a professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science and the multicollege Department of Design Tech, focuses on developing new technology to support remote collaboration.
SharedNeRF takes a novel approach to remote collaboration by employing a graphics rendering method called a neural radiance field (NeRF). NeRF uses artificial intelligence to construct a 3D representation of a scene using 2D images. It creates incredibly realistic depictions – complete with reflections, transparent objects and accurate textures – that can be viewed from any direction.
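For readers curious how a NeRF turns its learned 3D representation into a picture, the sketch below shows the standard volume-rendering step described in the NeRF literature: samples along a camera ray each contribute color in proportion to their density and to how much light still reaches them. It is a minimal illustration, not SharedNeRF's code. The function name `render_ray` and its inputs are hypothetical, and in a real NeRF the densities and colors would come from a trained neural network rather than hand-written lists.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(densities: Sequence[float],
               colors: Sequence[RGB],
               deltas: Sequence[float]) -> RGB:
    """Blend samples along one camera ray into a single pixel color.

    densities: volume density at each sample (assumed network output)
    colors:    RGB color at each sample (assumed network output)
    deltas:    distance between consecutive samples along the ray
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Opacity of this small ray segment.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        # Light remaining for the samples behind this one.
        transmittance *= 1.0 - alpha
    return (pixel[0], pixel[1], pixel[2])
```

Because the color of each sample can depend on the viewing direction, repeating this step for every pixel from a new camera position is what lets a viewer look at the scene from any angle.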
In the SharedNeRF system, the local collaborator wears a head-mounted camera to record the scene. The resulting images feed into a NeRF deep learning model, which renders the scene in 3D for the remote collaborator, who can rotate the viewpoint as desired.
When the scene changes, it triggers the NeRF model to update the view. This update takes some time, however – about 15 seconds – so Sakashita’s team merged the detailed visuals created by NeRF with point cloud rendering, a faster technology. The head-mounted camera and a second RGB-D camera, which detects color and depth, set up opposite the user, capture the scene as a collection of points in space. The method can rapidly convey dynamic parts of the scene, like moving hands.
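To illustrate why point clouds are fast, here is a minimal sketch of how one RGB-D frame can be turned into colored 3D points using the standard pinhole camera model. It involves no training, only a little arithmetic per pixel. The function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for illustration; the article does not describe the calibration or code the researchers used.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def depth_to_points(depth: Sequence[Sequence[float]],
                    color: Sequence[Sequence[Color]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project an RGB-D frame into colored 3D points.

    depth[v][u] is the distance along the camera axis at pixel (u, v);
    a value of 0 or less is treated as a missing reading.
    fx, fy, cx, cy are hypothetical pinhole intrinsics (focal lengths
    and principal point, in pixels).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # sensor returned no depth for this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, color[v][u]))
    return points
```

Running this on every incoming frame yields a fresh set of points each time, which is why moving hands show up immediately even though the points themselves are less detailed than a NeRF rendering.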
By merging the two rendering techniques, a remote user can view the scene from various angles in high quality through NeRF while also seeing real-time movements in the scene through point clouds.
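One plausible way to combine the two renderings, sketched here purely as an illustration, is a per-pixel depth test: wherever the live point cloud has something closer to the viewer than the NeRF image, show the point cloud; otherwise keep the NeRF pixel. The article does not say how SharedNeRF performs the merge, so the function `composite` and its inputs are assumptions rather than the researchers' method.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]


def composite(nerf_rgb: Sequence[Sequence[Color]],
              nerf_depth: Sequence[Sequence[float]],
              pc_rgb: Sequence[Sequence[Color]],
              pc_depth: Sequence[Sequence[Optional[float]]]
              ) -> List[List[Color]]:
    """Merge a NeRF image and a point-cloud image of the same view.

    pc_depth[y][x] is None where no point projects onto that pixel.
    All four images are assumed to share one resolution and viewpoint.
    """
    merged: List[List[Color]] = []
    for y in range(len(nerf_rgb)):
        row: List[Color] = []
        for x in range(len(nerf_rgb[y])):
            d_pc = pc_depth[y][x]
            if d_pc is not None and d_pc < nerf_depth[y][x]:
                row.append(pc_rgb[y][x])    # live content in front
            else:
                row.append(nerf_rgb[y][x])  # detailed static scene
        merged.append(row)
    return merged
```

Under this assumption, a hand reaching in front of an object would appear as live points over the detailed but slightly out-of-date NeRF background.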
SharedNeRF also shows an avatar of the local collaborator’s head, so the remote user can see where they are looking.
Seven volunteers tested SharedNeRF by performing a collaborative flower-arranging project with a partner. When compared with a standard video conferencing tool and with point cloud rendering alone, SharedNeRF was the preferred option for five of the volunteers. All agreed that the system helped them see the design’s details and gave them better control over what they were seeing.
“We found that people really appreciated that they can independently change the viewpoint,” Sakashita said. Many also enjoyed being able to zoom in and out on the flower arrangement and not having to explain to the local collaborator which view they wanted to see.
Currently, SharedNeRF is designed only for one-on-one collaboration, but the researchers envision that it could be extended to multiple users. The technology could also be used to record and archive events, such as educational demonstrations or surgeries, so that students can re-watch from different angles.
Sakashita said future work would be necessary to improve the image quality and to offer a more immersive experience through virtual reality or augmented reality techniques.
Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.