We present our method for transferring style from arbitrary image(s) to object(s) within a 3D scene. Our primary objective is to offer finer control over 3D scene stylization, enabling the creation of customizable, stylized scene images from arbitrary viewpoints. To this end, we propose a novel approach that incorporates a nearest-neighbor-based loss, allowing for flexible 3D scene reconstruction while effectively capturing intricate style details and ensuring multi-view consistency.
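The nearest-neighbor term we build on can be summarized by the following minimal sketch, under the assumption that VGG feature maps from the rendered view and the style image have been flattened into matrices of shape (N, C) and (M, C), respectively, and that cosine distance is used for matching; the exact layers and weighting used in our implementation may differ.

```python
import torch
import torch.nn.functional as F

def nnfm_loss(render_feats: torch.Tensor, style_feats: torch.Tensor) -> torch.Tensor:
    """Nearest Neighbor Feature Matching (NNFM) loss, minimal sketch.

    render_feats: (N, C) VGG features from the rendered view
    style_feats:  (M, C) VGG features from the style image
    Each rendered feature is matched to its nearest style feature under
    cosine distance, and the mean matched distance is minimized.
    """
    r = F.normalize(render_feats, dim=-1)
    s = F.normalize(style_feats, dim=-1)
    cos_dist = 1.0 - r @ s.t()           # (N, M) pairwise cosine distances
    nn_dist, _ = cos_dist.min(dim=1)     # nearest style neighbor per rendered feature
    return nn_dist.mean()

# Illustrative call on random tensors; real features come from a VGG encoder.
loss = nnfm_loss(torch.rand(1024, 256), torch.rand(4096, 256))
```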
We train our radiance fields with the Nearest Neighbor Feature Matching (NNFM) loss in addition to the Gram loss, which yields higher-quality renderings. During our experiments, we observed that subsampling the mask to extract object features in the VGG feature space leads to boundary artifacts, because the model's receptive field causes the style loss to affect a larger area than the original mask covers. To circumvent this, we instead apply the mask in image space, as sketched below. The quality of the renderings depends heavily on the segmentation masks of the objects; since SAM is very accurate at producing refined segmentation masks, we rarely encounter issues in this regard. We are thus able to produce high-quality, high-fidelity renderings in which specific objects are styled exactly as the user intends. In our experiments, the method has proven robust and scales well to an increasing number of objects and styles.
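To illustrate the image-space masking described above, the following sketch extracts per-object VGG features after applying the binary mask to the rendered pixels rather than to downsampled feature maps; the specific VGG layer, normalization constants, and helper names here are illustrative assumptions rather than the exact implementation.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Frozen VGG-16 encoder used only for feature extraction.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def masked_object_features(rendered: torch.Tensor, mask: torch.Tensor,
                           layer_idx: int = 15) -> torch.Tensor:
    """Extract VGG features of one object after masking in image space.

    rendered: (3, H, W) rendered view in [0, 1]
    mask:     (1, H, W) binary object mask (e.g. produced by SAM)
    The mask is applied to the pixels before the VGG forward pass instead
    of being subsampled and applied to the feature maps, which avoids the
    boundary artifacts caused by the network's receptive field extending
    beyond a downsampled mask.
    """
    masked = rendered * mask  # keep only the object's pixels
    x = TF.normalize(masked, mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225]).unsqueeze(0)
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_idx:  # a mid-level layer (roughly relu3_3); the choice is an assumption
            break
    return x.squeeze(0).flatten(1).t()  # (N, C) features for the style losses
```

The resulting (N, C) feature matrix is what feeds both the NNFM term sketched earlier and the Gram-matrix term during optimization of the radiance field.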