English Cove HOA action against BLM signs

I thought about delivering a copy of the below text to all of my neighbors. Katie talked me out of it. But it’s cathartic to write down my thoughts. MY PERSONAL THOUGHTS ON ENGLISH COVE’S RULE ABOUT SIGNS In a recent email communication, the English Cove Board of Directors reminded…

Continue reading

A fire at my neighbor’s house

The power went out around 5pm in southeast Redmond yesterday, including in our neighborhood, the English Cove HOA. OMG EMERGENCY! Said the children. Who couldn’t get on screens. Oh the humanity!!!!! My wife, Katie, and I were already feeling lazy about dinner—it’d been two straight months on lockdown since returning…

Continue reading

Kusto: Creating an IfExists function to check if a table exists


One of the things I find lacking in Kusto is an explicit way to test for the existence of a table: in both Azure SQL and Azure Data Lake, the ifexists function and exists compiler directive, respectively served this purpose.

Kusto doesn’t seem to have an explicit statement supporting this, but you can roll your own using the isfuzzy union argument. The isfuzzy argument basically says that a union should run as long as at least one table exists.

So here’s how to create your own “ifexists” function for kusto.

Write a function that takes a table name as a string input. Within the function itself, create a datatable with a single row named Exists with a value of 0.

Then union that datatable with the function input using the table operator like so and do a count of rows alias with the name Exists. I’d limit this to a single row, so you minimize execution.

Finally, sum up the Exists column result of your union so that you have only a single row. If the tablename from the input exists and has at least one row, your function will return 1. If it doesn’t exist or is empty, your function will return 0.

When using this, you convert the table output of the function to a scalar value using the toscalar function, like so. That’s it.

Of course, you don’t have to use this as a distinct function: you could simply have a fuzzy union in your code.

There is another way to do this if you’re writing an ETL function that acts differently depending on whether the table exists. We do this with very large telemetry sets when we just want to pull new values rather than pulling months’ worth of data and overwriting.

When we’re actively developing such a function, we may need to change the schema or do a compete reload. So if the table doesn’t exist, we want to pull a much larger set of data rather than just the past day or so.

Here’s the trick: once a function is compiled, it will run regardless of whether the tables upon which it relies exist. When you aggregate a non-existent table to a single row and convert the output to a scalar—say, the most recent datetime—Kusto returns a null value without any error. We check if that value isempty, and if it is, we grab several months’ worth of data, rather than just the new rows.

Kusto: Creating Dimensions with the datatable operator

I’m Mike O’Neill and I’m a data nerd who uses Azure Data Explorer, or Kusto, every day to glean insights into Azure’s developer and code-to-customers operations.

In my last video, I talked about how to assess a new data source to identify potential simple dimensions.

Kusto presents a singular problem for creating simple dimensions, especially when you’re used to storing dimensions as a table. You can’t create a little table as you would in Azure SQL and just insert and update values as you please. You also can’t repeatedly overwrite a stream as I used to do in Azure Data Lake.

In fact, you can’t really update anything in Kusto, nor can you overwrite a table that easily either. But Kusto does have an interesting workaround: you can write a list of comma-delimited values, use the datatable operator to make it into a table, and embed that into a function. Need to make changes? Just edit the text.

Let me give you a real example I wrote last week. One of the things I need to track for Azure’s engineering pipeline work is the name of the datacenter to which we’re deploy. Now, there’s nothing secret about this dimension: I pulled the info from these two public webpages.

Instead of creating a table like I would in Azure SQL, I just format all the values as a CSV, and then wrap it with the datatable operator. That operator requires three things: the operator itself, the schema immediately after it, and then a pair of square brackets around the CSV values.

Create that as a function, et voila, you’ve got a dimension. Anytime I need to update the values, I just update the function.

It’s not a perfect solution, however: because you’re just working with text, you’ve got none of the referential integrity capabilities of Azure SQL. Nor can you easily rebuild the dimension programmatically so that you don’t make silly mistakes.

I made that mistake last week, though it was with a mapping function with about 300 values rather than a pure, simple dimension. My colleague left a comment on my pull request, asking me to “find Waldo.” It was a friendly tease because I hadn’t bothered to do a simple check for duplicates.

If you use this method, you’ll need to be extra careful to maintain and run regular unit tests every time you alter the function.

And since early March, all of us in the State of Washington have been living through social distancing for the novel corona virus, my colleague teased me a little bit more, with this new version of “where’s Waldo.”

Stay healthy.