3D Neural Affordance Highlighter
Applying AI to localize affordance regions in 3D point clouds using vision-language models.
This project extends neural affordance highlighting to 3D point clouds, investigating whether affordance regions can be identified from CLIP-based textual supervision alone, without explicit 3D labels. The approach combines multi-view differentiable rendering with a neural highlighter network to improve segmentation accuracy.
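To make the pipeline concrete, here is a minimal sketch of the optimization loop in PyTorch. The `Highlighter` architecture, the `render_views` function (standing in for a differentiable multi-view point-cloud renderer, e.g. one built with PyTorch3D), and all hyperparameter values are illustrative placeholders rather than the project's exact code:

```python
import torch
import clip  # OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

class Highlighter(torch.nn.Module):
    """Per-point MLP that predicts a highlight probability for each 3D point."""
    def __init__(self, depth=4, width=256):
        super().__init__()
        layers, dim = [], 3
        for _ in range(depth):
            layers += [torch.nn.Linear(dim, width), torch.nn.ReLU()]
            dim = width
        layers.append(torch.nn.Linear(dim, 1))
        self.net = torch.nn.Sequential(*layers)

    def forward(self, points):                      # points: (N, 3)
        return torch.sigmoid(self.net(points))      # (N, 1) highlight probability

def clip_loss(images, prompt):
    """Negative cosine similarity between rendered views and a text prompt."""
    text_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    image_feat = clip_model.encode_image(images)    # images: (V, 3, 224, 224)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    return -(image_feat @ text_feat.T).mean()

def train(points, prompt, render_views, steps=1000, lr=1e-4):
    """Optimize the highlighter so CLIP sees the prompt in the rendered views."""
    points = points.to(device)
    model = Highlighter().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    highlight = torch.tensor([[1.0, 0.0, 0.0]], device=device)  # highlighted color
    gray = torch.tensor([[0.6, 0.6, 0.6]], device=device)       # background color
    for _ in range(steps):
        probs = model(points)                                   # (N, 1)
        colors = probs * highlight + (1 - probs) * gray         # (N, 3)
        views = render_views(points, colors)                    # (V, 3, 224, 224)
        loss = clip_loss(views, prompt)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

A prompt such as "a gray bottle with a highlighted graspable part" then pushes the per-point probabilities toward the affordance region, since raising CLIP similarity across views requires coloring the right points.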
GitHub Repository
Key Highlights
- 3D Affordance Localization: Detects interaction affordances on household objects (e.g., doors, bottles, knives) without labeled 3D supervision.
- Vision-Language Integration: Uses CLIP text and image embeddings to identify affordance regions.
- Multi-View & Differentiable Rendering: Improves spatial alignment of affordance predictions.
- Grid Search Optimization: Experimented with hyperparameters (learning rates, augmentations, network depth); see the sketch after this list.
- Performance Evaluation: Assessed with IoU and aIoU for affordance-specific segmentation accuracy.
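For the grid search mentioned above, a small sweep over the highlighter's hyperparameters can be written as below. The parameter names, ranges, and the `run_experiment` callback (which would train one model with a given configuration and return its validation IoU) are illustrative assumptions, not the project's exact grid:

```python
from itertools import product

def grid_search(run_experiment):
    """Sweep hyperparameters; run_experiment(lr, n_aug, depth) should train one
    highlighter with that configuration and return its validation affordance IoU."""
    learning_rates = [1e-3, 1e-4, 1e-5]   # optimizer step sizes
    n_augmentations = [1, 3, 5]           # random view/crop augmentations per step
    network_depths = [2, 4, 6]            # hidden layers in the highlighter MLP

    results = {}
    for lr, n_aug, depth in product(learning_rates, n_augmentations, network_depths):
        results[(lr, n_aug, depth)] = run_experiment(lr, n_aug, depth)

    best = max(results, key=results.get)
    return best, results[best]
```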
Technologies Used
- Machine Learning: CLIP, PyTorch
- 3D Processing: Differentiable Rendering, Multi-View Learning
- Dataset: 3D AffordanceNet (22,949 objects, 23 classes, 18 affordance labels)
- Evaluation Metrics: IoU, aIoU, CLIP-based similarity scores (a minimal metric sketch follows this list)
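As a reference for the metrics, the sketch below computes point-level IoU and one common reading of aIoU (IoU averaged over a range of binarization thresholds applied to the per-point scores); the report defines the exact evaluation protocol used in the project:

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Point-level IoU between two boolean masks of shape (N,)."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection / union) if union > 0 else 0.0

def aiou(pred_scores, gt_mask, thresholds=np.arange(0.05, 1.0, 0.05)):
    """IoU averaged over binarization thresholds applied to per-point scores."""
    return float(np.mean([iou(pred_scores >= t, gt_mask) for t in thresholds]))

# Toy usage: 2048 points with continuous affordance scores and binary labels.
scores = np.random.rand(2048)
labels = np.random.rand(2048) > 0.7
print(iou(scores >= 0.5, labels), aiou(scores, labels))
```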
The project report, written in CVPR format, is included below. If the embedded PDF does not load, you can download it here.
The 3D Neural Affordance Highlighter explores how vision-language models can extend affordance recognition to 3D environments. By leveraging differentiable rendering and CLIP embeddings, the project demonstrates the potential for unsupervised affordance detection in 3D space.