Alt Purpose: Whole images are downsampled to thumbnails of up to 300*300, which are meant to provide the complete contextual information of the faces.

  • Some statistics:
    • # of Entities: 99,952
    • # of Lines: 10,490,534
    • Image Resolution: up to 300*300
    • Average Image # per Entity: 105
    • Total file size (uncompressed): 214GB
  • File format: text files; each line is an image record containing 6 columns, delimited by TAB.
    • Column1: Freebase MID
    • Column2: Query/Name
    • Column3: ImageSearchRank
    • Column4: ImageURL
    • Column5: PageURL
    • Column6: ImageData_Base64Encoded
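
The six-column record layout listed above can be read with a short script. The following is a minimal sketch, not an official loader: the names `ImageRecord`, `parse_record`, and `iter_records` are hypothetical, and it assumes the files are UTF-8 text, that ImageSearchRank is an integer, and that Column6 uses standard Base64.

```python
import base64
from typing import Iterator, NamedTuple


class ImageRecord(NamedTuple):
    """One line of a data file (field names are illustrative, not official)."""
    mid: str            # Column1: Freebase MID
    name: str           # Column2: Query/Name
    search_rank: int    # Column3: ImageSearchRank (assumed to be an integer)
    image_url: str      # Column4: ImageURL
    page_url: str       # Column5: PageURL
    image_bytes: bytes  # Column6: ImageData, decoded from Base64


def parse_record(line: str) -> ImageRecord:
    """Split one TAB-delimited line into its 6 columns and decode the image."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 TAB-delimited columns, got {len(fields)}")
    mid, name, rank, image_url, page_url, data_b64 = fields
    return ImageRecord(
        mid, name, int(rank), image_url, page_url, base64.b64decode(data_b64)
    )


def iter_records(path: str) -> Iterator[ImageRecord]:
    """Stream records one line at a time; the files are too large to load whole."""
    with open(path, encoding="utf-8") as f:  # UTF-8 is an assumption
        for line in f:
            if line.strip():  # skip blank lines
                yield parse_record(line)
```

Reading line by line keeps memory use flat, which matters given the 214GB uncompressed size. The decoded `image_bytes` can then be written to disk or passed to an image library of your choice.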


  1. The data is released for non-commercial research purposes only. You must read and agree to the MSR Data License Agreement before downloading the data;
  2. Please contact us if you are a celebrity and do not want to be included in this data set. We will remove the related entries on request;
  3. In all the related publications, please cite the paper "MS-Celeb-1M: A Dataset and Benchmark for Large Scale Face Recognition" and provide the link to
