Why did you accept that hint? A new ancestry.com feature.

Will you look at this? Ancestry.com is asking me why I accepted a hint. Or ignored it. Or said “maybe.”

I have the “beta features” flag turned on for ancestry.com. Despite working in the tech sector, I’m not really an early adopter, but I’ve been so frustrated with ancestry’s service (and with all my brick walls) that I figured it was worth getting to the bleeding edge.

I haven’t seen an announcement for this, but this feels huge to me.

My gut is that ancestry.com is evaluating its hint model—which is at least partially driven by one user adding a record to a profile that matches one of the profiles in my tree. That model assumes that all user input is accurate, when we all know that’s not true.

The shorthand for folks like me in the data analytics world is the phrase “garbage in, garbage out.” If your dataset is garbage, your analysis will be garbage.

Asking questions such as these might just give ancestry.com a data source for evaluating user contributions, and possibly even for using machine learning to assess the validity of hints.

For example, clicking that “I want to save and review later” is an easy indicator for ancestry’s algorithm to say “meh, don’t pay attention to this.”

Not selecting anything at all—which I suspect most ancestry.com users would do—would effectively provide the model the same answer: Don’t pay attention to this user’s input.

Response rate could also give ancestry.com a way to score their users: those who are committed to helping ancestry.com understand their data could be given a higher weighting in a more modern hint algorithm. More important, it could help identify careless researchers and limit their ability to muddy the waters.
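Here’s a rough sketch of how that weighting could work, in Python. Everything in it is my own assumption, not anything ancestry.com has described: the hypothetical `UserFeedback` tallies, treating “save and review later” and skipped prompts as non-answers, and the prior that keeps new users near the middle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UserFeedback:
    """Hypothetical per-user tallies from the 'why did you accept this hint?' prompt."""
    hints_reviewed: int      # hints the user accepted, ignored, or marked "maybe"
    questions_answered: int  # prompts where the user picked any answer at all
    review_later: int        # answers that were just "save and review later"


def reliability_weight(fb: UserFeedback, prior: float = 0.5, prior_strength: int = 10) -> float:
    """Return a 0..1 weight for how informative a user's hint decisions are.

    Skipped prompts and "review later" answers both count as non-informative.
    The prior acts like `prior_strength` imaginary reviews scored at `prior`,
    so a brand-new user starts near the middle instead of at 0 or 1.
    """
    informative = max(fb.questions_answered - fb.review_later, 0)
    return (informative + prior * prior_strength) / (fb.hints_reviewed + prior_strength)


def weighted_support(contributor_weights: list[float]) -> float:
    """Support for a hint: the summed weights of every user who attached that record."""
    return sum(contributor_weights)


careful = UserFeedback(hints_reviewed=200, questions_answered=180, review_later=10)
careless = UserFeedback(hints_reviewed=200, questions_answered=0, review_later=0)
# One careful researcher outweighs three careless ones attaching the same record.
print(weighted_support([reliability_weight(careful)]))
print(weighted_support([reliability_weight(careless)] * 3))
```

The prior is a deliberate choice: without it, someone who answered one prompt out of one would outrank someone with 170 careful answers out of 200.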

Of course, this raises the question of whether ancestry.com should change their baseline assumption: that the central task in genealogy research is finding someone who’s already done the research.

I’m more intrigued by the behavioral side of it, though: by asking these questions, will users reconsider accepting an ancestry.com hint? Will people ask themselves “Is it just a name? Do the dates really match? Did I check the other people in the record?”

There are drawbacks, of course. Take this photo hint for George Rautzhong. It’s a photo of his tombstone, and I have chosen not to attach tombstone photos to profiles.

I have two main reasons why I won’t attach a piece of evidence. The first is that I don’t like the source: it might be a tertiary source, or just a picture I don’t care about. The second is that I just don’t care about the profile. I mean, what do I gain from adding city directory entries for a sibling of my main line? Not much.

Ancestry.com has made an erroneous assumption that people will care equally about all the profiles in their tree.

I’ve been, at best, skeptical of the utility of ancestry.com’s features over the past year. But this tells me that great things could be on the way.

Genealogy records & U.S. Immigration Laws

I’ve often wondered why immigration records in the U.S. suddenly change. My great-grandfather’s record from 1904 was so detailed it listed his aunt’s name and address in Philadelphia. The 1846 ship-list that I think was for my 3rd-great-grandfather just listed name, gender and age. And when my 2nd-great-grandparents moved from Quebec to Minnesota in the 1870s, there was no record at all.

On my wife’s side, I can find lists of her Pennsylvania Dutch ancestors when they and their fellow adult male passengers took an oath of loyalty in Philadelphia, but her Scots-Irish ancestors seem to appear magically in Virginia with no records at all.

Why? I did some research, and it turns out there are three major periods of immigration law in the United States.

After 1882, you start seeing detailed immigration records that are consistent across the country.

Between 1819 and 1882, you’ll have ship manifests with very basic information, but nothing for land crossings.

And before 1819, it’s entirely up to the officials at the port of entry.

In 1891, Congress established a Bureau of Immigration, completing a shift, begun in 1882, of immigration control from states and cities to the federal government. Ellis Island was part of this legal period, and from 1882 on, you’ll find standardized, detailed immigration records regardless of the port of entry.

If you use ancestry.com, you’ll find that only the basics have been indexed: name, gender, etc. What hasn’t been indexed is gold, though: each entry notes where and with whom the immigrant will be staying, and that’s often family. See this entry for my great-grandfather, Michael Devaney? He reported he would be staying with his aunt, an aunt unknown to everyone researching that part of the Devaney family, in the U.S. and Ireland alike.

Prior to that, the important law was the Steerage Act of 1819, which required ship captains to provide a list of passengers recording each one’s name, sex, age, and occupation. You might get a bit more depending on the ship or port, but don’t get your hopes up. What I found for my third great-grandfather George Haggerty, who came to the U.S. in 1848, is pretty typical.
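If you want those three periods as a quick rule of thumb while you research, here’s a minimal Python sketch. The function name and the wording of each result are my own, and it deliberately ignores everything the periods don’t capture, like lost records or ports that kept more than the law required.

```python
def expected_arrival_records(year: int, arrived_by_ship: bool = True) -> str:
    """Rough guide to which U.S. arrival record to look for, by legal period.

    This sketch only uses arrived_by_ship for 1819-1881, the period where
    land crossings are known to leave no arrival record.
    """
    if year >= 1882:
        # Federal control: standardized, detailed records regardless of port of entry.
        return "standardized federal immigration record"
    if year >= 1819:
        # Steerage Act era: ship passenger lists only.
        if not arrived_by_ship:
            return "no arrival record (land crossings weren't covered)"
        return "ship manifest: name, sex, age, occupation"
    # Before 1819: up to local officials, so look elsewhere.
    return "no standard record; try religious or naturalization sources"


for year in (1750, 1848, 1904):
    print(year, expected_arrival_records(year))
```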

There are two interesting things to note here. First, ship captains could face legal penalties if the lists were inaccurate, so you can expect some degree of validation.

Second, the requirement was only for ship captains debarking passengers at a port of entry. If, for example, you landed at St. John, took a ferry down the St. Lawrence to Montreal, and then walked south to Manhattan, there wouldn’t be any record of your arrival in the United States. Likewise, a captain could drop some or all of his passengers off on an isolated beach on Cape May before sailing up the Delaware to Philadelphia.

Now, both of those scenarios may sound ridiculous, but thousands of Irish did this during the Famine. Why? Well, Congress charged ship captains a head tax for Irish passengers during the Famine. The English, on the other hand, wanted settlers in Canada, and subsidized the fare to St. John, including the free ferry trip to Montreal.

Before 1819, though, it was entirely up to local governments, and outside of naturalization, most of them did nothing.

Tracking the immigration of English Quakers to Colonial Pennsylvania, for example, is probably going to be through religious sources, not secular. Quakers would carry certificates of removal from their Monthly Meeting in England so they could easily join a new Monthly Meeting in Pennsylvania, and those certificates would be mentioned in those meeting records. But the colonial government didn’t care.

The Puritans also kept detailed records of the Great Migration to Massachusetts, but this wasn’t official government record keeping either.

In Virginia, you’ll find next to nothing.

The only exception was naturalization during the colonial period: under English law, any real property owned by an alien would revert to the Crown upon that alien’s death, not to his heirs.

This is why you can find basic information about ethnic German immigrants to the colonies of Pennsylvania and New York: they were eager to swear loyalty to the Crown so that they could pass any farmland they acquired to their children. They weren’t ship lists; they were oaths of loyalty to the Crown given shortly after arriving in Philadelphia.

Disabling ancestry.com’s hints doesn’t actually disable ancestry.com’s hints

Several months ago, I got so fed up with Ancestry.com’s hint system that I did this. Yup, I disabled all hints on all of my trees.

No more little wiggly leaves showing up on profiles, distracting me with icons of little angels or records dated decades before my ancestor was born or after they died.

Did you see that? In the upper left corner? No?

How about now? See those little leaves pop up?

Yeah. So the only thing that ancestry.com’s “hint disabling” feature does is prevent the little leaf from appearing in the top right-hand corner here.

And in those intervening months, Ancestry.com served me over a thousand hints that I have to imagine are useless.

Now, I’ll be honest, I’m the one who is incapable of ignoring all of those hints. I know intellectually that maybe only one or two of those will tell me something new, and so I shouldn’t bother to look. I know that I’m the one that can’t control myself. I admit that.

But I also know that ancestry.com has a data analytics team, and I bet you that analytics team has shown that hints drive user engagement, and that continued user engagement is highly correlated with, and possibly a cause of, subscription renewals. So ancestry.com knows that if it keeps showing me hints, I’ll have a reason to keep coming back and paying them money without their having to improve their service much.

I do data analytics for a living for a big tech company. This is the kind of thing we get paid to do.

So… I’m going to start looking at ancestry.com competitors. See what else is out there, see if I can find a service that doesn’t distract me, that helps me focus on the genealogy research I want to focus on.

Not that I’ll be able to drop ancestry.com: the service is designed to be sticky, to make it difficult to switch to a competitor because you can’t move your data easily.

It’s a design model that’s falling out of favor in the commercial space because companies hate getting locked in. But there’s little to stop it in the consumer space.

Ideas to improve Ancestry.com hints

Ancestry.com hints have become completely useless for me, and I have some ideas on how to improve them.

When I started using Ancestry.com nearly a decade ago, the hint system was amazing, helping me turn some basic information from my wife’s aunts into a fleshed-out family tree.

But today, I would estimate that a third of the hints are for tertiary sources that I rarely use, another third are wildly and obviously wrong, and the remaining third are random images or copy/pastes from sites such as find-a-grave.

Moreover, nine of ten hints are for siblings of direct lines (and those siblings’ spouses) that won’t get me through brick walls and that I no longer care about.

In short, maybe one in five hundred hints tells me something new and useful about someone I care about.

Consider Charles Stanford. I know this fellow’s story: my father officiated Charles’ 1968 wedding to Jean O’Neill, my dad’s cousin. My father also delivered the eulogy for Charles and Jean after they were murdered by a drunk driver in 1987.

Now look at the hints ancestry.com recently offered about a family story I know.

This one hints that Charles married in 1919, decades before he was born. This one hints that he served in the Marines at the age of two. This one hints that he died in Wales, even though my profile clearly states he died in Pennsylvania.

Just about every day, Ancestry.com serves up useless hints like these. It’s like panning for gold: washing out pounds of mud in the hope of finding a single speck.

Is this mess solvable? Yes. Here’s how.

  1. Set some basic data rules. If the profile I create sets a birth & death year & place, don’t show me hints for other states and countries, let alone for records before that person was born, or after they died.
  2. Even better, let me tune hint accuracy just like I can tune searches. At a minimum, let me do this at the tree level, but I’d love to be able to adjust it person by person.
  3. Let me turn off hints from particular sources. I don’t want to see recommendations for summary genealogies such as North American Family Histories or Sons of the American Revolution. Ever.
  4. Let me turn off hints person by person, even branch by branch. Most of the people in my tree are siblings and their spouses. Once I land the lineage and story for direct ancestors, I’m not going to learn anything else from siblings, cousins and their spouses. I don’t want to delete those people, but I don’t want to have my research distracted by hints about them.
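Ideas 1, 3, and 4 are simple enough to sketch. Here’s a minimal Python filter, assuming each hint arrives with a source collection, a record year, and a place. The `Profile` and `Hint` classes and the `slack_years` knob (a stand-in for idea 2’s tuning) are hypothetical, not anything in ancestry.com’s actual system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    person_id: str
    birth_year: int | None = None
    death_year: int | None = None
    places: set[str] = field(default_factory=set)  # states/countries recorded on the profile
    hints_muted: bool = False                       # idea 4: turn off hints for this person


@dataclass
class Hint:
    person_id: str
    source: str            # collection the record comes from
    year: int | None       # year the record refers to, if known
    place: str | None      # state or country in the record, if known


def keep_hint(hint: Hint, profile: Profile, blocked_sources: set[str], slack_years: int = 0) -> bool:
    """Return False for hints that contradict the profile's own basic facts."""
    if profile.hints_muted:
        return False
    if hint.source in blocked_sources:  # idea 3: never show these collections
        return False
    if hint.year is not None:           # idea 1: nothing before birth or after death
        if profile.birth_year is not None and hint.year < profile.birth_year - slack_years:
            return False
        if profile.death_year is not None and hint.year > profile.death_year + slack_years:
            return False
    if hint.place is not None and profile.places and hint.place not in profile.places:
        return False                    # idea 1: other states and countries
    return True
```

With a birth year set on the profile, a 1919 marriage record for someone born decades later fails the birth-year check, and a Wales death record fails the place check on a Pennsylvania-only profile. A little slack matters, since an obituary or probate record can legitimately land a year or so after death.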

But Ancestry.com should be able to go beyond those simple UI features. Many companies, including Microsoft, offer powerful, easy-to-configure artificial intelligence services that could really make ancestry.com hints useful.

  1. Help me with images. Image classification services, including facial recognition ones, could easily be trained to exclude hints for icons of immigrant ships, DNA, angels, country flags, gravestones and other random stuff that I don’t care about. And by easily, I mean point-and-click artificial intelligence. I put together a quick demo of this with Azure’s Custom Vision AI service.
  2. Identify unique story contributions. Ancestry owns Find-a-grave, it can search the web, and it can use commercially available machine-learning services to exclude sources that were just copied from another site, and highlight sources that are unique and specially transcribed.
  3. Rank the hints, and funnel unlikely hints directly to the “undecided” bucket. Or only notify me in the general user interface if a hint is likely to be a match, while hiding the unlikely hints within the profile.
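The funneling in idea 3 is mostly plumbing once you have a score. In this Python sketch, I’m assuming some model already returns a match probability for each hint; the thresholds are arbitrary placeholders you’d tune, not numbers from ancestry.com.

```python
from __future__ import annotations


def triage_hints(
    scored_hints: list[tuple[str, float]],
    notify_threshold: float = 0.8,
    undecided_threshold: float = 0.3,
) -> tuple[list[str], list[str], list[str]]:
    """Split (hint_id, match_probability) pairs into three buckets, best first.

    notify:       likely matches, surfaced in the general user interface
    profile_only: plausible, shown only within the person's profile
    undecided:    unlikely, filed straight into the "undecided" bucket
    """
    notify, profile_only, undecided = [], [], []
    for hint_id, p in sorted(scored_hints, key=lambda h: h[1], reverse=True):
        if p >= notify_threshold:
            notify.append(hint_id)
        elif p >= undecided_threshold:
            profile_only.append(hint_id)
        else:
            undecided.append(hint_id)
    return notify, profile_only, undecided
```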

Of course, I don’t expect to see such improvements. To continue growing, Ancestry.com needs to expand their user base, and that means creating new genealogists. Developing features for users like me who will keep paying regardless doesn’t make much business sense.

Unless Ancestry.com realized that I would pay more for this.

Book Review: American Nations

I just finished a 2011 book by Colin Woodward called American Nations. The book is largely focused on understanding how North American politics works today based on the migration and settlement of different colonial-era cultural & religious groups.

That history has a real application to genealogy, at least the part where we try to understand who our ancestors were, what they believed, and how they lived their lives.

Quick summary:

  1. It doesn’t matter where people came from: they become acculturated to the environment in which they live their lives.
  2. People didn’t just migrate to places in the United States where friends and family already lived. They migrated West within their cultural groups.
  3. Woodward has this really cool map of settlement patterns that can help you understand your ancestors’ culture, and
  4. That map can help you make educated guesses about where to look for records as you move back in time.

It’s these last two that I find fascinating, and that I think make Woodward’s book a good purchase. If you want to avoid the modern politics, just skip the last four or five chapters on modern cultural clashes.

Woodward posits that North America has eleven distinct cultural groups or nations, with nine of them pre-dating the Revolution.

I won’t go through them all (read the book), but in my wife’s family, there were really just four:

  1. Tidewater, an aristocratically inclined nation centered around the Chesapeake Bay.
  2. Midlands, a moderate, pluralistic, pacifist nation with Philadelphia Quaker roots.
  3. Yankeedom, a communitarian, utopian-inspired culture founded by New England Puritans.
  4. Appalachian, a libertarian-inclined nation founded by Scots-Irish settlers in the colonial Backcountry.

What’s fascinating is that these borders actually line up with different branches of her tree, and I can almost always point to a particular event that brought them across.

First, take a look at Ohio. Woodward has this state split between three nations, Yankeedom, Appalachia and Midlands.

My wife’s maternal line has a lot of Ohio in it: some Midlands Pennsylvania Dutch, some Appalachian Scots-Irish. For more than a century, both families moved West within these boundaries.

The Scots-Irish line moved from the Virginia backcountry—what is West Virginia today—through Appalachian Kentucky, then Appalachian Southern Ohio, and finally Appalachian Southern Indiana and Illinois.

The Pennsylvania Dutch family went west through Pennsylvania—but only the Midlands Pennsylvania counties—to Midlands Northern Ohio.

How did the two families cross paths? Well, one branch of the Scots-Irish line ended up in Illinois along what Woodward asserts was the Appalachian/Midlands border, and then went to Midlands Iowa. There, they met up with a branch of that Midlands family that went all over Midlands territory, from Kansas to Iowa to Nebraska.

But beyond that, my wife’s maternal line is only Midlands and Appalachia.

My wife’s paternal line? Until the 1840s, there were really six distinct lines: three were Yankee, one dating back to the Mayflower, and another two that were acculturated in existing Yankee communities in New Brunswick and western New York. The fourth was pure Appalachian. The last two were ethnic German, one 100% Pennsylvania Dutch from colonial times, the other Germans from Russia.

These six lines had absolutely no geographic overlap until the mid 1800s. What brought them together? The Oregon Trail. Between 1843 and 1900, each branch went overland

Classifying genealogy-related images with Azure’s Custom Vision Service

One of my pet peeves on Ancestry.com is getting image hints for things like immigrant ship icons, DNA icons, angels, coats of arms, flags, etc.

I don’t begrudge the folks that want to decorate their tree with these badges, but I don’t care about them and I don’t want to see them as hints.

Now, ancestry.com does have a feature where users can note whether an image they’re uploading is a document or picture or what have you. But it’s not like you can search on it, and I suspect I’m one of the few people who bother with it.

Thing is, machine learning could identify and categorize all the images we upload, make them searchable, and even exclude some—such as angels and DNA icons—from popping up as hints.
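The exclusion step itself is tiny once a classifier has done its work. Here’s a minimal Python sketch, assuming the classifier hands back a mapping of tag name to probability (Custom Vision’s image classification results are a list of tags with probabilities, which is easy to turn into one). The tag names and the 0.9 cutoff are my own placeholders.

```python
from __future__ import annotations

# Hypothetical tag names; a real model would use whatever tags you trained it with.
DEFAULT_SUPPRESSED = frozenset({"angel", "dna", "flag", "immigrant ship", "coat of arms"})


def suppress_image_hint(
    predictions: dict[str, float],
    suppressed: frozenset[str] = DEFAULT_SUPPRESSED,
    min_confidence: float = 0.9,
) -> bool:
    """Hide an image hint only when the top tag is unwanted AND the model is confident."""
    if not predictions:
        return False  # no prediction: let the hint through rather than guess
    top_tag, top_prob = max(predictions.items(), key=lambda kv: kv[1])
    return top_tag in suppressed and top_prob >= min_confidence
```

Requiring high confidence errs on the side of showing a hint: wrongly hiding a real document costs more than seeing one more angel icon.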

It’s not hard—there are many commercially available machine learning services that will categorize images with just a bit of training. Azure Custom Vision Service actually makes training such a model a drag-and-drop experience.

Let me show you.

First, I need to set up the service. It only takes a minute, assuming you already have an Azure subscription, which I do.

Next, I need to give the Custom Vision service a bunch of images and tell it what they are.

If I were doing this for real, I’d prepare hundreds of images for far more categories. And, as this little warning notes, I would have equal groups for each category.
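Checking balance before training is easy to automate. This Python sketch flags any tag with fewer than half as many images as the largest one; that 0.5 ratio is my own arbitrary choice, not the rule behind Custom Vision’s warning.

```python
def underrepresented_tags(image_counts: dict[str, int], min_ratio: float = 0.5) -> list[str]:
    """Return tags whose image count is below min_ratio times the largest tag's count."""
    if not image_counts:
        return []
    largest = max(image_counts.values())
    return sorted(tag for tag, n in image_counts.items() if n < min_ratio * largest)
```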

But for this demo, I think this is good enough.

Let’s test it!

First, let’s try it out on this DNA icon. See the results here? The model is 95% confident this is DNA, though it also thinks it might be a coat of arms.

This tombstone for Alonzo Hawn? The model is 100% confident.

The same goes for this newspaper article, for this photo of my dad, and the Powell family arms.

Interestingly, this “DNA verified” icon really trips up the model: it thinks this is a coat of arms. If I were really building this model, I’d run another iteration of the model after adding a bunch of images like this one.
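Finding images like that “DNA verified” icon is the part worth automating between iterations: run a set of images you’ve labeled yourself through the model and collect the ones it gets wrong. In this Python sketch, `predict` is a hypothetical parameter standing in for whatever calls the trained model; it isn’t a real Custom Vision API.

```python
from __future__ import annotations

from typing import Callable


def misclassified(
    labeled_images: list[tuple[str, str]],
    predict: Callable[[str], dict[str, float]],
) -> list[tuple[str, str, str]]:
    """Return (image, true_tag, predicted_tag) for every image whose top tag is wrong.

    These are the images to add, under their correct tag, before training the next iteration.
    An image with no predictions at all is reported with an empty predicted tag.
    """
    errors = []
    for image, true_tag in labeled_images:
        predictions = predict(image)
        predicted_tag = max(predictions, key=predictions.get, default="")
        if predicted_tag != true_tag:
            errors.append((image, true_tag, predicted_tag))
    return errors
```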

Now, ancestry.com would have some more development work to do to make image categorization a feature of their service, but the hard part—the machine learning—isn’t hard anymore.

My thoughts on Ancestry.com’s DNA ThruLines beta

Ancestry.com recently introduced a new feature called DNA ThruLines. The idea is to merge DNA evidence and people’s family trees to help you break through brick walls.

It’s interesting, but I have three reservations about the execution:

  1. The DNA ThruLines icon shows up even when there’s nothing there.
  2. The “hints” are, ultimately, just other peoples’ family trees, which are typically just copies of copies lacking compelling evidence.
  3. It doesn’t help me break through brick walls by proposing connections that aren’t already there.

First, the DNA ThruLines user interface tosses up what I would consider false positives. Take a glance at my wife’s family tree. See that little blue ThruLines icon above Oliver Raser? It takes me three clicks to discover that there are no DNA ThruLines matches here.

Don’t waste my time, Ancestry.com. Hide the icon when you don’t have information to share.

Second, DNA ThruLines relies on other people’s family trees, and that’s a garbage-in, garbage-out scenario. To me, it feels exactly like getting hints from other peoples’ family trees, with no distinction between a profile with dozens of sources and one with no evidence at all.

Take my mom’s 2nd great-grandfather, James Brown. I’m confident in this lineage, and the DNA tests from our many distant cousins confirm we’ve got it right.

I have no idea who James Brown’s father was, however. Ancestry.com has a suggestion: John Edward Brown. But there are no DNA matches from his other children, that is, via James Brown’s siblings, and that’s the kind of proof I want to see from DNA.
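That test is simple to state in code. In this Python sketch, each DNA match is reduced to the child of the proposed parent they claim descent through; that representation, and the names used to try it out, are hypothetical. A proposed parent earns support only from matches that come down through one of your ancestor’s siblings.

```python
from __future__ import annotations


def sibling_line_matches(
    your_ancestor: str,
    proposed_parents_children: set[str],
    match_descent: dict[str, str],
) -> list[str]:
    """Return DNA matches that descend from the proposed parent through a sibling of your ancestor.

    match_descent maps each DNA match to the child of the proposed parent they descend from.
    Matches through your own ancestor prove nothing new about who the parent was.
    """
    return sorted(
        match
        for match, child in match_descent.items()
        if child in proposed_parents_children and child != your_ancestor
    )
```

An empty result, as with John Edward Brown, means the suggestion is just another tree.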

And when I click through to John Edward Brown on “Steve’s Tree,” what do I get? A profile with no supporting evidence.

Really, how is this any different from family tree hints? DNA ThruLines, despite the cool marketing language of DNA, is little different from the family tree hints they’ve offered for a decade.

And I hate using tertiary sources like this because they can be so unreliable. I am so reluctant to use them I even recorded a video about what I think is the only safe way.

My third point is that DNA ThruLines is missing the big opportunity here: helping us break through brick walls by identifying siblings and cousins, aunts and uncles. Researching relatives is one of my favorite ways to break through brick walls, as you can see in this video.

Here’s an interesting scenario from my mom’s DNA matches: her maternal line, the Posses, seemed to come to the United States without friends or family. It’s like Charles Poss magically appeared in Pennsylvania.

Now, there’s this really fascinating DNA match with two other members who document their descent from a Jacob Posz who was ten years older than our Charles, who was also a Catholic, and who came from Germany around the same time…

There’s a common ancestor there, but DNA ThruLines isn’t helping me make connections because the trees aren’t already matched together.

Here’s how I’d like it to work. This is also my favorite example of garbage in, garbage out for DNA ThruLines.

Let’s take Thomas Chew. The consensus opinion is that his parents were Andrew Chew and Anna Mariah Barthist. And look here, Ancestry.com is suggesting Anna Barthist as a match based on similarities between my mother-in-law’s DNA and that of another member who descends through Thomas’ brother, Joseph Chew.

The evidence is pretty complicated, so I won’t present it all here. But the nut of it is that Joseph Chew’s wife was also a Chew, and I think that she was a descendant of Andrew Chew and Anna Mariah Barthist, while Thomas and Joseph Chew were from a completely different branch of the Chew family.

In fact, when I look at my mother-in-law’s DNA matches, she actually has two matches with people who descend from a completely different Chew branch.

Confused? Yeah, I can see why. Put it this way: DNA ThruLines is suggesting that my mother-in-law had two distinct lines of descent. And not because there’s some oddity in her DNA, but because two different genealogists have different views of the evidence and put different lineages in their trees.

What would I like to see? Well, it’s right here with that Chew example. I created a person in my tree named Wild Speculation Chew and connected him to Joseph and William Chew.

I want ancestry.com to take common surnames, like in my earlier Poss example, and show me multiple potential lineages, with estimated degrees of separation, based on DNA matches with people whose lines of descent don’t appear to converge with my tree.
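Here’s a rough Python sketch of that idea. The degree estimate uses a crude rule of thumb (shared DNA roughly halves with each degree of relationship, starting near 3,400 cM for parent and child); a real feature should use published ranges such as those from the Shared cM Project. The surname-variant map and the match tuples are hypothetical inputs.

```python
from __future__ import annotations

import math
from collections import defaultdict


def estimated_degree(shared_cm: float) -> int:
    """Crude degree of relationship: 1 = parent/child, 2 = grandparent or aunt/uncle, 3 = first cousin."""
    if shared_cm <= 0:
        raise ValueError("shared_cm must be positive")
    return max(1, round(1 + math.log2(3400 / shared_cm)))


def candidate_lineages(
    matches: list[tuple[str, float, set[str], bool]],
    my_surnames: set[str],
    variants: dict[str, str] | None = None,
) -> dict[str, list[tuple[str, int]]]:
    """Group non-converging DNA matches by surnames their trees share with mine.

    Each match is (name, shared_cm, surnames_in_their_tree, trees_already_converge).
    variants maps spelling variants to one form, e.g. {"posz": "poss"}.
    """
    variants = variants or {}

    def norm(surname: str) -> str:
        s = surname.strip().lower()
        return variants.get(s, s)

    mine = {norm(s) for s in my_surnames}
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, shared_cm, their_surnames, converges in matches:
        if converges:
            continue  # already explained by a documented common ancestor
        for surname in sorted(mine & {norm(s) for s in their_surnames}):
            groups[surname].append((name, estimated_degree(shared_cm)))
    return dict(groups)
```

Grouping by shared surname rather than by a documented common ancestor is what would surface Poss/Posz-style clusters, since those are exactly the matches whose trees never connect to mine.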

That’s the real opportunity here: giving me ideas on where and about whom I should research to break through a brick wall.

Building a porch potty for my dog

Our little rescue dog, Javy, wasn’t entirely house trained when we adopted him, and we weren’t really trained to his schedule. We tested out a temporary porch potty made of cardboard, and he took to it immediately.

In this video, I’ll show you how I built a more permanent one.