Classifying genealogy-related images with Azure’s Custom Vision Service

One of my pet peeves on Ancestry.com is getting image hints for things like immigrant ship icons, DNA icons, angels, coats of arms, flags, etc.

I don’t begrudge the folks that want to decorate their tree with these badges, but I don’t care about them and I don’t want to see them as hints.

Now, Ancestry.com does have a feature where users can note whether an image they’re uploading is a document, a picture, or what have you. But you can’t search on it, and I suspect I’m one of the few people who bothers with it.

Thing is, machine learning could identify and categorize all the images we upload, make them searchable, and even exclude some—such as angels and DNA icons—from popping up as hints.

It’s not hard—there are many commercially available machine learning services that will categorize images with just a bit of training. Azure Custom Vision Service actually makes training such a model a drag-and-drop experience.

Let me show you.

First, I need to set up the service. It only takes a minute, assuming you already have an Azure subscription, which I do.

Next, I need to give the Custom Vision service a bunch of images and tell it what each one is.

If I were doing this for real, I’d prepare hundreds of images for far more categories. And, as this little warning notes, I’d have roughly the same number of images in each category.
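If you prepare your images in folders before uploading, you can catch that imbalance ahead of time. Here is a minimal sketch of the idea in plain Python. It doesn't call the Custom Vision service at all, and the file names, tags, and the 1.5 cutoff are all made up for illustration.

```python
from collections import Counter


def tag_counts(labeled_images):
    """Count how many training images carry each tag."""
    return Counter(tag for _, tag in labeled_images)


def imbalance_ratio(counts):
    """Largest category size divided by the smallest (1.0 = perfectly balanced)."""
    sizes = list(counts.values())
    return max(sizes) / min(sizes)


# Hypothetical training set: (file name, tag) pairs invented for this example.
labeled_images = [
    ("dna1.png", "dna"),
    ("dna2.png", "dna"),
    ("arms1.png", "coat of arms"),
    ("arms2.png", "coat of arms"),
    ("arms3.png", "coat of arms"),
    ("arms4.png", "coat of arms"),
    ("stone1.jpg", "tombstone"),
    ("stone2.jpg", "tombstone"),
]

counts = tag_counts(labeled_images)
ratio = imbalance_ratio(counts)

# The 1.5 cutoff is an arbitrary assumption, not a rule from the service.
if ratio > 1.5:
    print(f"Unbalanced training set: {dict(counts)} (ratio {ratio:.1f})")
```

In this made-up set, coats of arms outnumber the other categories two to one, so the check would nudge me to add more DNA and tombstone images before training.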

But for this demo, I think this is good enough.

Let’s test it!

First, let’s try it out on this DNA icon. See the results here? The model is 95% confident this is DNA, though it also thinks it might be a coat of arms.
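If you were consuming results like these in code rather than on the portal, you'd get a list of tags with probabilities and would typically look at the top one and the runner-up. A rough sketch, assuming the predictions arrive as simple (tag, probability) pairs: only the 95% figure comes from my test, and the other numbers and the `top_two` helper are invented for illustration.

```python
def top_two(predictions):
    """Return the most likely (tag, probability) pair and the runner-up, if any."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else None
    return ranked[0], runner_up


# Only the 0.95 for "dna" reflects the demo; the rest are placeholder values.
predictions = [
    ("coat of arms", 0.04),
    ("dna", 0.95),
    ("tombstone", 0.01),
]

best, runner_up = top_two(predictions)
print(f"Best guess: {best[0]} ({best[1]:.0%})")
if runner_up is not None:
    print(f"Runner-up: {runner_up[0]} ({runner_up[1]:.0%})")
```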

This tombstone for Alonzo Hawn? The model is 100% confident it’s a tombstone.

The same goes for this newspaper article, this photo of my dad, and the Powell family arms.

Interestingly, this “DNA verified” icon really trips up the model: it thinks this is a coat of arms. If I were really building this model, I’d add a bunch of images like this one and train another iteration.

Now, Ancestry.com would have some more development work to do to make image categorization a feature of their service, but the hard part, the machine learning, isn’t hard anymore.
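To give a feel for how small that remaining work could be, here is a sketch of the hint filter I have in mind. I have no idea how Ancestry.com's hint system actually works, so everything here is hypothetical: the tag names, the file names, the probabilities, and the 0.9 threshold.

```python
# Tags I'd rather not see as hints (my own list, purely illustrative).
DECORATIVE_TAGS = {"dna", "coat of arms", "angel", "flag"}


def keep_hint(tag, probability, threshold=0.9):
    """Hide a hint only when the model is confident the image is decorative."""
    return not (tag in DECORATIVE_TAGS and probability >= threshold)


# Hypothetical hints: (file name, predicted tag, probability).
hints = [
    ("dna-icon.png", "dna", 0.95),
    ("tombstone.jpg", "tombstone", 1.0),
    ("crest.png", "coat of arms", 0.55),
]

visible = [name for name, tag, prob in hints if keep_hint(tag, prob)]
print(visible)
```

The threshold is there so that an image the model is unsure about still shows up as a hint. I'd rather see an occasional coat of arms than lose a real record.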