Learning Category-Specific Mesh Reconstruction
from Image Collections

Angjoo Kanazawa*
Shubham Tulsiani*
Alexei A. Efros
Jitendra Malik
University of California, Berkeley


Given an annotated image collection of an object category, we learn a predictor that can map a novel image to its 3D shape, camera pose, and texture.

We present a learning framework for recovering the 3D shape, camera, and texture of an object from a single image. The shape is represented as a deformable 3D mesh model of an object category, where a shape is parameterized by a learned mean shape and a per-instance predicted deformation. Our approach allows leveraging an annotated image collection for training, where the deformable model and the 3D prediction mechanism are learned without relying on ground-truth 3D or multi-view supervision. Our representation enables us to go beyond existing 3D prediction approaches by incorporating texture inference as prediction of an image in a canonical appearance space. Additionally, we show that semantic keypoints can be easily associated with the predicted shapes. We present qualitative and quantitative results of our approach on the CUB dataset, and show that we can learn to predict the diverse shapes and textures across birds using only an annotated image collection. We also demonstrate the applicability of our method for learning the 3D structure of other generic categories.
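The shape parameterization described above can be sketched as follows. This is a minimal toy illustration, not the paper's actual code: all names, array shapes, and values are assumptions. The idea is that every instance's mesh shares a learned, category-level mean shape, and the network only predicts a per-instance vertex deformation that is added to it.

```python
import numpy as np

# Toy mesh with 4 vertices (real meshes have hundreds).
# mean_shape: learned category-level mean, shared across all instances.
mean_shape = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# predicted_deformation: per-instance offsets; in the framework these
# would be regressed from image features, here they are made-up numbers.
predicted_deformation = np.array([
    [0.10,  0.00, 0.00],
    [0.00, -0.10, 0.00],
    [0.00,  0.00, 0.05],
    [0.02,  0.02, 0.00],
])

# Instance shape = mean shape + predicted deformation.
instance_shape = mean_shape + predicted_deformation
print(instance_shape.shape)  # (4, 3)
```

Because the mean shape is shared, the predictor only has to explain how each instance deviates from the category, which is what allows learning from an image collection without 3D supervision.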


Kanazawa, Tulsiani, Efros, Malik.

Learning Category-Specific Mesh Reconstruction
from Image Collections.

arXiv, 2018.

[pdf]     [Bibtex]

Overview and Results


 [GitHub] (coming soon)


We thank David Fouhey for the creative title suggestions, and members of the BAIR community for helpful discussions and comments. This work was supported in part by Intel/NSF VEC award IIS-1539099, NSF Award IIS-1212798, and BAIR sponsors. This webpage template was borrowed from some colorful folks.