The Daily Parker

Politics, Weather, Photography, and the Dog

Four unrelated stories

A little Tuesday morning randomness for you:

Back to debugging acceptance tests.

Thanks for playing

Richard Florida demonstrates how Amazon's HQ2 competition was rigged:

A detailed analysis undertaken by Patrick Adler, my colleague at the University of Toronto’s Martin Prosperity Institute, and Adam Singer, a graduate student at the university’s Rotman and Munk schools, took a look at how all 238 HQ2 applicant cities and the 20 finalists lined up on Amazon’s RFP criteria. While it can be difficult to measure whether a given city adheres to each criterion, their analysis shows that many of the finalist cities do not even fit the most obvious ones. What’s more, several of the rejected cities seem to fit Amazon’s criteria for its HQ2 city better than some of the finalists.  

[I]t’s worth asking why these 20 cities were selected as finalists, even if others would appear to be better candidates according to Amazon’s own criteria. Our analysis suggests the finalists may have other things in common that are not listed on the company’s RFP.

For one, the finalists are more likely to be farther away from the company’s original home base in physical distance, reflecting the predominance of East Coast cities on the list. Last year, an Amazon executive was quoted as saying that Amazon would like to build HQ2 outside of the Pacific Northwest, to attract a more diverse set of employees.

Finalist cities are also likely to have a larger share of tech workers. And they are more likely to have non-stop flights to the company’s current home base in Seattle.

But one factor is even more interesting. Our analysis found that shortlisted cities had more U.S. senators with considerable seniority.

At the end of the day, none of this should surprise us. Like all corporate site selection, the HQ2 process is a rigged game, where the company knows the answer in advance and sets up a fictitious competition to wrest maximum incentives.

Besides the political advantages, there are many signs that Amazon’s HQ2 is heading to the greater Washington, D.C. region—the fact that its CEO has a multi-million dollar mansion there (currently undergoing a $12 million renovation, with large public rooms for social events) and already owns the Washington Post; the fact that three area jurisdictions made the shortlist; and the fact that the person running Amazon’s search previously ran an economic development agency in the region. Perhaps four other metros on the list are serious contenders—New York, Boston, Chicago, and Toronto—with Philadelphia, Denver, Atlanta, and Dallas having an outside chance.

Chicago, however, will be less likely to play the race-to-the-bottom game.

List of 2018 A-to-Z topics

Here's the complete list of topics in the Daily Parker's 2018 Blogging A-to-Z challenge on the theme "Programming in C#":

Generally I posted all of them at noon UTC (7am Chicago time) on the proper day, except for the ones with stars. (April was a busy month.)

I hope you've enjoyed this series. I've already got topic ideas for next year. And next month the blog will hit two huge milestones, so stay tuned.

Z is for Zero

Today is the last day of the 2018 Blogging A-to-Z challenge. Today's topic: Nothing. Zero. Nada. Zilch. Null.

The concept of "zero" made it into Western mathematics only a few centuries ago, and has yet to make it into many developers' brains. The problem arises in particular when dealing with arrays and unexpected nulls.

In C#, arrays are zero-based. An array's first element appears at position 0:

var things = new[] { 1, 2, 3, 4, 5 };
Console.WriteLine(things[1]);

// -> 2

This causes no end of headaches for new developers who expect that, because the array above has a length of 5, its last element is #5. But doing this:

Console.WriteLine(things[5]);

...throws an IndexOutOfRangeException.

You get a similar problem when you try to read a string, because if you recall, strings are basically just arrays of characters:

var word = "12345";
Console.WriteLine(word[4]);

// 5

Console.WriteLine(word[5]);

// IndexOutOfRangeException

The funny thing is, both the array things and the string word have a length of 5.
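A minimal sketch of the rule behind both examples: the last valid index is always Length - 1. The helper name LastOf is hypothetical, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper: the last element lives at index Length - 1.
static int LastOf(int[] items) => items[items.Length - 1];

var things = new[] { 1, 2, 3, 4, 5 };
var word = "12345";

Console.WriteLine(things.Length);          // 5
Console.WriteLine(LastOf(things));         // 5
Console.WriteLine(word[word.Length - 1]);  // 5

try
{
	// Index 5 is one past the end of a 5-element array.
	Console.WriteLine(things[things.Length]);
}
catch (IndexOutOfRangeException)
{
	Console.WriteLine("Index 5 is past the end");
}
```

If you only ever need the last item, going through Length - 1 keeps the off-by-one arithmetic in one place.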

The other bugaboo is null. Null means nothing. It is the absence of anything. It equals nothing, not even itself (though this, alas, is not always true).

Reference types can be null, and value types cannot. That's because value types always have to have a value, while reference types can simply be a reference to nothing. That said, the Nullable<T> structure gives value types a way into the nulliverse that even comes with its own cool syntax:

int? q = null;
int r = 0;
Console.WriteLine((q ?? 0) + r);
// 0

(What I love about this "struct?" syntax is you can almost hear it in a Scooby Doo voice, can't you?)

Line 1 defines a nullable System.Int32 as null. Line 2 defines a bog-standard Int32 equal to zero. If you add them directly, the result is itself null (an int?), and reading q.Value while q is null throws an InvalidOperationException. So line 3 shows the null-coalescing operator, which contracts the null check into a succinct little fragment:

// Long form:
int result;
if (q.HasValue)
{
	result = q.Value + r;
}
else
{
	result = 0 + r;
}

// Shorter form:
int result = (q.HasValue ? q.Value : 0) + r;

// Shortest form:
int result = (q ?? 0) + r; // parentheses matter: ?? binds more loosely than +
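To see how these pieces behave with a non-zero r, here is a small sketch. The helper AddOrZero and the variable names are made up for illustration, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper: treat a missing value as zero, then add.
static int AddOrZero(int? maybe, int addend) => (maybe ?? 0) + addend;

int? q = null;
int? two = 2;
int r = 5;

int? lifted = q + r;                  // lifted addition: null in, null out
Console.WriteLine(lifted.HasValue);   // False

Console.WriteLine(AddOrZero(q, r));   // 5
Console.WriteLine(AddOrZero(two, r)); // 7

// Without parentheses this parses as two ?? (0 + r), so r is ignored here.
Console.WriteLine(two ?? 0 + r);      // 2
```

The last line is the reason to parenthesize: the unparenthesized form only adds r when the nullable side is null.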

And so the Daily Parker concludes the 2018 Blogging A-to-Z challenge with an entire post about nothing. I hope you've enjoyed the posts this month. Later this morning, I'll post the complete list of topics as a permanent page. Let me know what you think in the comments. It's been a fun challenge.

Y is for Y2K (and other date/time problems)

I should have posted day 25 of the Blogging A-to-Z challenge yesterday, but life happened, as it has a lot this month. I'm looking forward to June when I might not have the over-scheduling I've experienced since mid-March. We'll see.

So it's appropriate that today's topic involves one of the things most programmers get wrong: dates and times. And we can start 20 years ago when the world was young...

A serious problem loomed in the software world in the late 1990s: programmers, starting as far back as the 1950s, had used 2-digit fields to represent the year portion of dates. As I mentioned Friday, it's important to remember that memory, communications, and storage cost a lot more than programmer time until the last 15 years or so. A 2-digit year field makes a lot of sense in 1960, or even 1980, because it saves lots of money, and why on earth would people still use this software 20 or 30 years from now?

You can see (or remember) what happened: the year 2000. If today is 991231 and tomorrow is 000101, what does that do to your date math?

It turns out, not a lot, because programmers generally planned for it way more effectively than non-technical folks realized. On the night of 31 December 1999, I was in a data center at a brokerage in New York, not doing anything. Because we had fixed all the potential problems already.
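For the curious, here is a sketch of what the 991231-to-000101 problem does to year arithmetic, along with "windowing," one widely used remediation. This is not a claim about what any particular shop did; the function names and the pivot value of 50 are assumptions for illustration, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

// Naive math on two-digit years.
static int NaiveYearsBetween(int fromYy, int toYy) => toYy - fromYy;

// Windowing: two-digit years below the pivot are read as 20xx, the rest as 19xx.
// The pivot of 50 is an arbitrary assumption for this sketch.
static int ExpandYear(int yy, int pivot = 50) => yy < pivot ? 2000 + yy : 1900 + yy;

Console.WriteLine(NaiveYearsBetween(99, 0));        // -99
Console.WriteLine(ExpandYear(99));                  // 1999
Console.WriteLine(ExpandYear(0));                   // 2000
Console.WriteLine(ExpandYear(0) - ExpandYear(99));  // 1
```

Windowing only moves the cliff to the pivot year, which is why widening the field to four digits was the more durable fix where storage allowed it.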

But as I said, dates and times are hard. Start with times: 24 hours, 60 minutes, 60 seconds...that's not fun. And then there's the calendar: 12 months, 52 weeks, 365 (or 366) days...also not fun.

It becomes pretty obvious even to novice programmers who think about the problem that days are the best unit to represent time in most human-scale cases. (Scientists, however, prefer seconds.) I mentioned on day 8 that I used Julian day numbers very, very early in my programming life. Microsoft's OLE Automation dates also use the day as the base unit, while the .NET date structures count 100-nanosecond ticks from a fixed starting date; either way, the platform relegates the display of date information to a different set of classes.

I'm going to skip the DateTime structure because it's basically useless. It will give you no end of debugging problems with its asinine DateTime.Kind member. This past week I had to fix exactly this kind of thing at work.

Instead, use the DateTimeOffset structure. It represents an unambiguous point in time, with a DateTime value for the date and time and a TimeSpan value for the offset from UTC. As Microsoft explains:

The DateTimeOffset structure includes a DateTime value, together with an Offset property that defines the difference between the current DateTimeOffset instance's date and time and Coordinated Universal Time (UTC). Because it exactly defines a date and time relative to UTC, the DateTimeOffset structure does not include a Kind member, as the DateTime structure does. It represents dates and times with values whose UTC ranges from 12:00:00 midnight, January 1, 0001 Anno Domini (Common Era), to 11:59:59 P.M., December 31, 9999 A.D. (C.E.).

The time component of a DateTimeOffset value is measured in 100-nanosecond units called ticks, and a particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.) in the GregorianCalendar calendar. A DateTimeOffset value is always expressed in the context of an explicit or default calendar. Ticks that are attributable to leap seconds are not included in the total number of ticks.

Yes. This is the way to do it. Except...well, you know what? Let's skip how the calendar has changed over time. (Short answer: the year 1 was not the year 1.)

In any event, DateTimeOffset gives you methods to calculate time and dates accurately across a range of nearly 10,000 years.
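Here is a short sketch of what "unambiguous point in time" buys you: two values with different offsets that name the same instant compare as equal. The variable names are mine, the -5 hour offset is Chicago's daylight-time offset on that date, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

// The same instant, written with two different offsets from UTC.
var chicago = new DateTimeOffset(2018, 4, 26, 14, 51, 0, TimeSpan.FromHours(-5));
var utc = new DateTimeOffset(2018, 4, 26, 19, 51, 0, TimeSpan.Zero);

Console.WriteLine(chicago == utc);              // True: equality compares the UTC instant
Console.WriteLine((utc - chicago).Ticks);       // 0
Console.WriteLine(chicago.EqualsExact(utc));    // False: same instant, different offset
Console.WriteLine(chicago.Offset.TotalHours);   // -5
Console.WriteLine(TimeSpan.TicksPerSecond);     // 10000000
```

Use EqualsExact only when the offset itself matters to you; for "did these happen at the same moment?" the ordinary comparison operators are the ones you want.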

Which is to say nothing of time zones...

X is for XML vs. JSON

Welcome to the antepenultimate day (i.e., the 24th) of the Blogging A-to-Z challenge.

Today we'll look at how communicating between foreign systems has evolved over time, leaving us with two principal formats for information interchange: eXtensible Markup Language (XML) and JavaScript Object Notation (JSON).

Back in the day, even before I started writing software, computer systems talked to each other using specific protocols. Memory, tape (!) and other storage, and communications had significant costs per byte of data. Systems needed to squeeze out every bit in order to achieve acceptable performance and storage costs. (Just check out my computer history, and wrap your head around the 2400 bit-per-second modem that I used with my 4-megabyte 386 box, which I upgraded to 8 MB for $350 in 2018 dollars.)

So, if you wanted to talk to another system, you and the other programmers would work out a protocol that specified what each byte meant at each position. Then you'd send cryptic codes over the wire and hope the other machine understood you. Then you'd spend weeks debugging minor problems.

Fast forward to 1996, when storage and communications costs finally dropped below labor costs, and the W3C created XML. Now, instead of doing something like this:

METAR KORD 261951Z VRB06KT 10SM OVC250 18/M02 A2988

You could do something like this:

<?xml version="1.0" encoding="utf-8"?>
<weatherReport>
	<station name="Chicago O'Hare Field">KORD</station>
	<observationTime timeZone="America/Chicago" utc="2018-04-26T19:51+0000">2018-04-26 14:51</observationTime>
	<winds>
		<direction degrees="">Variable</direction>
		<speed units="Knots">6</speed>
	</winds>
	<visibility units="miles">10</visibility>
	<clouds>
		<layer units="feet" ceiling="true" condition="overcast">25000</layer>
	</clouds>
	<temperature units="Celsius">18</temperature>
	<dewpoint units="Celsius">-2</dewpoint>
	<altimeter units="inches Hg">29.88</altimeter>
</weatherReport>

The XML takes up only a few hundred bytes (612 uncompressed, about 300 compressed), but humans can read it, and so can computers. You can even create and share an XML Schema Definition (XSD) describing what the XML document should contain. That way, both the sending and receiving systems can agree on the format, and change it as needed without a lot of reprogramming.
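Reading the report back is just as undramatic. The sketch below uses LINQ to XML (System.Xml.Linq) on a trimmed copy of the weather report; the variable names are mine, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Xml.Linq;

// A trimmed copy of the weather report above.
const string xml = @"<weatherReport>
	<station name=""Chicago O'Hare Field"">KORD</station>
	<temperature units=""Celsius"">18</temperature>
	<dewpoint units=""Celsius"">-2</dewpoint>
</weatherReport>";

var report = XDocument.Parse(xml).Root;
var station = report.Element("station");

Console.WriteLine(station.Value);                    // KORD
Console.WriteLine(station.Attribute("name").Value);  // Chicago O'Hare Field

// XElement converts explicitly to int, so the spread is plain integer math.
var spread = (int)report.Element("temperature") - (int)report.Element("dewpoint");
Console.WriteLine(spread);                           // 20
```

Compare that with working out by hand which characters of "18/M02" mean what.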

To display XML, you can use eXtensible Stylesheet Language (XSL), whose transformations (XSLT) turn your XML into HTML or another display format. (My Weather Now project uses this approach.)

A few years later, Douglas Crockford defined an even simpler standard: JSON. It removes the heavy structure from XML and presents data as a set of key-value pairs. Now our weather report can look like this:

{
  "weatherReport": {
    "station": {
      "name": "Chicago O'Hare Field",
      "icao code": "KORD"
    },
    "observationTime": {
      "timeZone": "America/Chicago",
      "utc": "2018-04-26T19:51+0000",
      "local": "2018-04-26 14:51 -05:00"
    },
    "winds": {
      "direction": { "text": "Variable" },
      "speed": {
        "units": "Knots",
        "value": "6"
      }
    },
    "visibility": {
      "units": "miles",
      "value": "10"
    },
    "clouds": {
      "layer": {
        "units": "feet",
        "ceiling": "true",
        "condition": "overcast",
        "value": "25000"
      }
    },
    "temperature": {
      "units": "Celsius",
      "value": "18"
    },
    "dewpoint": {
      "units": "Celsius",
      "value": "-2"
    },
    "altimeter": {
      "units": "inches Hg",
      "value": "29.88"
    }
  }
}

JSON is easier to read, and JavaScript (and JavaScript libraries like jQuery) can parse it natively. You can add or remove key-value pairs as needed, often without the receiving system complaining. There's even a JSON Schema project that promises to give you the security of XSD.
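C# can read it too. This sketch uses System.Text.Json, which assumes .NET Core 3.0 or later (older projects typically reach for a third-party library instead), on a trimmed copy of the report. Variable names are mine, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.Json;

// A trimmed copy of the weather report above.
const string json = @"{
  ""weatherReport"": {
    ""station"": { ""name"": ""Chicago O'Hare Field"", ""icao code"": ""KORD"" },
    ""temperature"": { ""units"": ""Celsius"", ""value"": ""18"" }
  }
}";

using var doc = JsonDocument.Parse(json);
var weather = doc.RootElement.GetProperty("weatherReport");

var icao = weather.GetProperty("station").GetProperty("icao code").GetString();
Console.WriteLine(icao);     // KORD

// The report stores numbers as JSON strings, so convert before doing math.
var celsius = int.Parse(weather.GetProperty("temperature").GetProperty("value").GetString());
Console.WriteLine(celsius);  // 18

// A missing key is something you can test for rather than crash on.
Console.WriteLine(weather.TryGetProperty("winds", out _)); // False
```

TryGetProperty is the practical face of "add or remove key-value pairs as needed": the receiver decides how to handle a key that isn't there.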

Which format should you use? It depends on how structured you need the data to be, and how easily you need to read it as a human.

More reading:

W is for while (and other iterators)

We're in the home stretch. It's day 23 of the Blogging A-to-Z challenge and it's time to loop-the-loop.

C# has a number of ways to iterate over a collection of things, and a base interface that lets you know you can use an iterator.

The simplest way to iterate over code is to use while, which keeps looping for as long as its condition is true:

var n = 1;
while (n < 6)
{
	Console.WriteLine($"n = {n}");
	n++;
}
Console.WriteLine("Done");

while is similar to do:

var n = 1;
do
{
	Console.WriteLine($"n = {n}");
	n++;
} while (n < 6);
Console.WriteLine("Done");

The main difference is that the do loop always executes at least once, but the while loop may not execute at all.
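The difference only shows up when the condition is false from the start. A quick sketch, with hypothetical helper names, that counts how many times each loop body runs (assumes top-level statements, C# 9 or later):

```csharp
using System;

// Counts the passes through a while loop for a given starting value.
static int WhileRuns(int start)
{
	var runs = 0;
	var n = start;
	while (n < 6)
	{
		runs++;
		n++;
	}
	return runs;
}

// Same loop written as do...while: the body runs before the first test.
static int DoRuns(int start)
{
	var runs = 0;
	var n = start;
	do
	{
		runs++;
		n++;
	} while (n < 6);
	return runs;
}

Console.WriteLine(WhileRuns(1));   // 5
Console.WriteLine(DoRuns(1));      // 5
Console.WriteLine(WhileRuns(10));  // 0
Console.WriteLine(DoRuns(10));     // 1
```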

The next level up is the for loop:

for (var n = 1; n < 6; n++)
{
	Console.WriteLine($"n = {n}");
}
Console.WriteLine("Done");

Similar, no?

Then there is foreach, which iterates over a set of things. This requires a bit more explanation.

The base interface IEnumerable and its generic equivalent IEnumerable<T> each expose a single method, GetEnumerator, which returns an IEnumerator (or IEnumerator<T>) that foreach uses to go through all of the items in the collection. Generally, anything in the BCL that holds a set of objects implements IEnumerable: System.Array, System.Collections.Generic.List<T>, everything that implements System.Collections.ICollection...and many, many others. Each of these types lets you walk through the set of objects it contains:

var things = new[] { 1, 2, 3, 4, 5 }; // array of int, or int[]
foreach(var it in things)
{
	Console.WriteLine(it);
}

foreach will iterate over all the things in the order they were added to the array. But it also works with LINQ to give you even more power:

var things = new List<int> {1, 2, 3, 4, 5};
foreach (var it in things.Where(p => p % 2 == 0))
{
	Console.WriteLine(it);
}

Three guesses what that snippet does.
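To make the IEnumerable contract a little less magical, here is a sketch that drives an enumerator by hand, which is roughly what foreach does on your behalf, plus a tiny iterator method built with yield return. SumByHand and CountTo are hypothetical names; the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Sums a sequence by calling GetEnumerator, MoveNext, and Current directly.
static int SumByHand(IEnumerable<int> items)
{
	var total = 0;
	using (IEnumerator<int> walker = items.GetEnumerator())
	{
		while (walker.MoveNext())
		{
			total += walker.Current;
		}
	}
	return total;
}

// Any method that uses yield return is itself an IEnumerable<T> source.
static IEnumerable<int> CountTo(int max)
{
	for (var n = 1; n <= max; n++)
	{
		yield return n;
	}
}

Console.WriteLine(SumByHand(new[] { 1, 2, 3, 4, 5 }));  // 15
Console.WriteLine(SumByHand(CountTo(4)));               // 10
```

Because arrays, lists, and iterator methods all satisfy the same interface, SumByHand never needs to know which one it was given.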

These keywords and structures are so fundamental to C#, I recommend reading up on them.

V is for var

For my second attempt at this post (after a BSOD), here (on time yet!) is day 22 of the Blogging A-to-Z challenge.

Today's topic: the var keyword, which has sparked more religious wars since it emerged in 2007 than almost every other language improvement in the C# universe.

Before C# 3.0, the language required you to declare every variable explicitly, like so:

using System;
using InnerDrive.Framework.Financial;

Int32 x = 123; // same as int x = 123;
Money m = 123;

Starting with C# 3.0, you could do this instead:

var i = 123;
var m = new Money(123);

As long as you give the compiler enough information to infer the variable type, it will let you stop caring about the type. (The reason line 2 works in the first example is that the Money struct can convert from other numeric types, so it infers what you want from the assignment. In the second example, you still have to declare a new Money, but the compiler can take it from there.)

Some people really can't stand not knowing what types their variables are. Others can't figure it out and make basic errors. Both groups of people need to relax and think it through.

Variables should convey meaning, not technology. I really don't care whether m is an integer, a decimal, or a Money, as long as I can use it to make the calculations I need. Where var gets people into trouble is when they forget that the compiler can't infer type from the contents of your skull, only the code you write. Which is why this is one of my favorite interview problems:

var x = 1;
var y = 3;
var z = x / y;

// What is the value of z?

The compiler infers that x and y are integers, so when it divides them it comes up with...zero. Because 1/3 is less than 1, and .NET truncates fractions when doing integer math.

In this case you need to do one of four things:

  • Explicitly declare x to be a floating-point type
  • Explicitly declare y to be a floating-point type
  • Explicitly declare the value on line 1 to be a floating-point value
  • Explicitly declare the value on line 2 to be a floating-point value

// Solution 1:

double x = 1;
int y = 3;
var z = x / y;

// z = 0.333...

// Solution 3:

var x = 1f;
var y = 3;
var z = x / y;

// z == 0.333333343

(I'll leave it as an exercise for the reader why the last line is wrong. Hint: .NET has three floating-point types, and they all do math differently.)

Declaring z to be a floating-point type won't help. Trust me on this.
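If you'd rather not take it on trust, here is a small sketch. The division happens between two integers before the result is ever widened to double; the variable names are mine, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

var x = 1;
var y = 3;

double stillZero = x / y;        // integer division first, then widening to double
double fixedUp = (double)x / y;  // casting one operand forces floating-point division

Console.WriteLine(stillZero == 0);                    // True
Console.WriteLine(fixedUp > 0.33 && fixedUp < 0.34);  // True
```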

The other common reason for using an explicit declaration is when you want to specify which interface to use on a class. This is less common, but still useful. For example, System.String implements both IEnumerable and IEnumerable<char>, which behave differently. Imagine an API that accepts both versions and you want to specify the older, non-generic version:

var s = "The lazy fox jumped over the quick dog.";
System.Collections.IEnumerable e = s;

SomeOldMethod(e);

Again, that's an unusual situation and not the best code snippet, but you can see why this might be a thing. The compiler won't infer that you want to use the older, non-generic IEnumerable implementation under most circumstances. This forces the issue. (So does using the as keyword.)
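One way to watch the compiler's choice is to ask a generic method what type it was handed. StaticTypeOf below is a hypothetical helper, not a framework API, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections;

// Hypothetical helper: T is whatever compile-time type the argument has.
static Type StaticTypeOf<T>(T value) => typeof(T);

var s = "The lazy fox jumped over the quick dog.";
IEnumerable e = s;

Console.WriteLine(StaticTypeOf(s));               // System.String
Console.WriteLine(StaticTypeOf(e));               // System.Collections.IEnumerable
Console.WriteLine(object.ReferenceEquals(s, e));  // True
```

It is the same object either way; the explicit declaration only changes which face of it the compiler sees when picking overloads.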

In future posts I may come back to this, especially if I find a good example of when to use an explicit declaration in C# 7.

U is for UUID

For day 21 of the Blogging A-to-Z challenge I'm going to wade into a religious debate: UUIDs vs. integers for database primary keys.

First, let's define UUID, which stands for Universally Unique Identifier. A UUID comprises 32 hexadecimal digits typically displayed in 5 groups separated by dashes. The actual identifier is 128 bits long, meaning the chance of a collision between any two of them is slightly lower than the chance of finding a specific grain of dust somewhere in the solar system.

An integer, on the other hand, has just 32 or 64 bits, depending on the type you choose. Because incrementing integer keys typically start at 1 in every table and every database, they collide all the time. Also, using an incrementing integer, you don't know what ID your database will give you before you insert a given row, unless you create some gnarly SQL that hits the database a minimum of twice.
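In .NET, the UUID type is System.Guid. This sketch shows the 32-digit, 5-group shape described above and the "know your key before the insert" property; orderId is just an illustrative name, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

// The key exists before the row does, so related rows can reference it right away.
Guid orderId = Guid.NewGuid();
string text = orderId.ToString();  // default "D" format: 8-4-4-4-12

Console.WriteLine(text.Length);                   // 36
Console.WriteLine(text.Split('-').Length);        // 5
Console.WriteLine(text.Replace("-", "").Length);  // 32
Console.WriteLine(Guid.Parse(text) == orderId);   // True
```

No round trip to the database is needed to learn the ID, which is the main practical draw for code that builds several related rows at once.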

Many people have weighed in on whether to use UUIDs or auto-incrementing integers for database keys. People argue about the physical alignment of rows, debugging and friendly URLs, stable IDs vs deterministic IDs, non-uniqueness across tables, inadvertent data disclosure...lots of reasons to use one or the other.

The bottom line? It doesn't really matter. What matters is that you have sensible, non-religious reasons for your choice.

Both UUIDs and serial integers have their place, depending on the context. If you have a lookup table that users will never see, use serial IDs; who cares? If you use an ORM extensively, you might prefer UUIDs.

If you're new to programming, all of this seems like angels on the head of a pin. So read up on it, listen to the arguments on both sides, and then decide what works to solve your problem. Which is basically what you should do all the time as a professional programmer.

S is for String

Day 19 of the Blogging A-to-Z challenge was Saturday, but Apollo After Hours drained me more or less completely for the weekend.

So this morning, let's pretend it's still Saturday for just a moment, and consider one of the oddest classes in the .NET Base Class Library (BCL): System.String.

A string is just a sequence of zero or more characters. A character could be anything: a letter, a number, a random two-byte value, what have you. System.String holds the sequence for you and gives you some tools to work with it, like Compare, Format, Join, Split, and StartsWith. Under the hood, the class holds the string as an array of char values.

Even though System.String is a class and not a struct, it behaves much more like the latter than the former. Strings are immutable: once you create a string, any changes you make to it create a new string instance. Also, the runtime interns string literals, so identical literals share a single instance on the heap, which has consequences for memory management and performance. But unlike structs, strings can be null. This is valid code:

var s = "Hello, world";
string t = null;
var u = t + s;

Strings can also be zero-length or all whitespace, which is why the class has a very useful method String.IsNullOrWhiteSpace().
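A short sketch pulling those behaviors together: concatenating a null string, immutability, and the null-or-whitespace check. Variable names beyond s, t, and u are mine, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;

var s = "Hello, world";
string t = null;

var u = t + s;                       // concatenation treats null as an empty string
var shouted = s.ToUpperInvariant();  // returns a new string; s itself never changes

Console.WriteLine(u);        // Hello, world
Console.WriteLine(s);        // Hello, world
Console.WriteLine(shouted);  // HELLO, WORLD

Console.WriteLine(string.IsNullOrWhiteSpace(t));      // True
Console.WriteLine(string.IsNullOrWhiteSpace(""));     // True
Console.WriteLine(string.IsNullOrWhiteSpace("   "));  // True
Console.WriteLine(string.IsNullOrWhiteSpace(s));      // False
```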

The blog C# in Depth has a good description of strings that's worth reading. Jon Skeet also takes on string memory management in one of his longer posts.