The Daily Parker

Politics, Weather, Photography, and the Dog

The ACA exchange is bad, but the Times should know better

I am agog at a bald impossibility in the New York Times' article today about the ACA exchange:

According to one specialist, the Web site contains about 500 million lines of software code. By comparison, a large bank’s computer system is typically about one-fifth that size.

There were three reporters in the byline, they have the entire Times infrastructure at their disposal, and still they have an unattributed "expert" opinion that the codebase is 33 times larger than Linux. 500 MLOC? Why not just say "500 gazillion?" It's a total Dr. Evil moment.

Put in other terms: it's like someone describing a large construction project—a 20-story office building, say—as having 500 million rivets in it. A moment's thought would tell you that the mass of 500 million rivets would approach the steel output of South Korea for last month.

The second sentence is nonsense also. "A large bank's computer system?" Large banks have thousands of computer systems; which one did you mean? Back to my example: it's like comparing the 500-million-rivet office building to "a large bank's headquarters."

I wouldn't be so out of my head about this if it weren't the Times. But if they can't get this right, what hope does any non-technical person have of understanding the problem?

One last thing. We, the people of the United States, paid for this software. HHS needs to disclose the source code of this monster. Maybe if they open-sourced the thing, they could fix it faster.

Small world

The Chicago technology scene is tight. I just had a meeting with a guy I worked with from 2003-2004. Back then, we were both consultants on a project with a local financial services company. Today he's CTO of the company that bought it—so, really, the same company. Apparently, they're still using software I wrote back then, too.

I love when these things happen.

This guy was also witness to my biggest-ever screw-up. (By "biggest" I mean "costliest.") I won't go into details, except to say that whenever I write a SQL delete statement today, I do this first:

-- DELETE   (swap in for SELECT * only after checking the rows)
SELECT *
FROM MissionCriticalDataWorthMillionsOfDollars
WHERE ID = 12345

That way, I get to see exactly what rows will be deleted before committing to the delete. Also, even if I accidentally hit <F5> before verifying the WHERE clause, all it will do is select more rows than I expect.

You can fill in the rest of the story on your own.

Unbelievably stupid Windows thing

Fortunately, I'm in an airport with lots of power outlets, because my laptop just warned me that it was down to its last few milliamp-hours, even though ordinarily the 90 Wh battery I lug around can last about 8 hours. What happened? Windows Search decided that consuming 50% of my CPU (i.e., two entire cores) was a good idea while running on battery.

So since I have an hour before boarding, and since I'm now plugged in (which means I don't have any worries about driving my portable HDD), here is a lovely picture of Montréal from earlier today:

Git is not Mercurial

I'm pulling the public repository for Orchard again, because I made a mistake with Git that I can't seem to undo. I've set up my environment to have a copy of the public repository, and then a working repository cloned from it. This allows me to try things out on my own machine, in private branches, while still pulling the public bits without the need to merge them into my working copy.

Orchard, which will soon (I hope) replace dasBlog as this blog's platform, recently switched from Mercurial to Git, which led to this problem.

I may simply not have grasped all the nuances of Git. Git is extremely powerful, in the sense that it will do almost anything you tell it to do, without regard for the consequences. It reflects the ethos of the C++ programming language, which gave everyday programmers ways to screw up previously only available to experts.

My specific screw-up was that I accidentally pushed my local changes back to my copy of the public repository. That added about six changesets, which I couldn't extract from my copy of public no matter what I tried.

So, while writing this, I just pulled a clean copy of public, checked out the two branches I wanted (1.1 and fw45, for those keeping score at home), and merged with my existing changes.

Now I get to debug that mess...and I may toss it and start over.

Border cases

Just a quick note about debugging. I just spent about 30 minutes tracking down a bug that caused a client to get invoiced for -18 hours of premium time and 1.12 days of regular time.

The basic problem is that an appointment can begin and end at any time, but from 6pm to 8am, an appointment costs more per hour than during business hours. This particular appointment started at 5pm and went until midnight, which should be 6 hours of premium and 1 hour of regular.

The bottom line: I had unit tests, which automatically tested a variety of start and end times across all time zones (to ensure that local time always prevailed over UTC), including:

  • Starts before close, finishes after close before midnight
  • Starts before close, finishes after midnight before opening
  • Starts before close, finishes after next opening
  • Starts after close, finishes before midnight
  • Starts after close, finishes after midnight before opening
  • Starts after close, finishes after next opening
  • ...

Notice that I never tested what happened when the appointment ended at midnight.

The fix was a single equals sign, as in:

- if (localEnd > midnight && local <= localOpenAtEnd)
+ if (localEnd >= midnight && local <= localOpenAtEnd)

Nicely done, Braverman. Nicely done.
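
The boundary case above can be pinned down with a small check. What follows is a minimal C# sketch, not the code from the invoicing application: it assumes business hours of 8am to 6pm every day, ignores time zones, and computes regular time as the overlap between the appointment and each day's business hours instead of branching on midnight. RateSplitter and Split are hypothetical names.

```csharp
using System;

static class RateSplitter
{
    // Assumed business hours (from the post): 8am to 6pm is regular time.
    static readonly TimeSpan Open = TimeSpan.FromHours(8);
    static readonly TimeSpan Close = TimeSpan.FromHours(18);

    // Returns { regularHours, premiumHours } for a local-time appointment.
    public static double[] Split(DateTime start, DateTime end)
    {
        double regular = 0;

        // Walk each calendar day the appointment touches and add up the
        // overlap with that day's business hours.
        for (var day = start.Date; day < end; day = day.AddDays(1))
        {
            var from = start > day + Open ? start : day + Open;
            var to = end < day + Close ? end : day + Close;
            if (to > from) regular += (to - from).TotalHours;
        }

        // Whatever is not regular time is premium time.
        var total = (end - start).TotalHours;
        return new[] { regular, total - regular };
    }

    static void Main()
    {
        // The appointment from the post: 5pm until exactly midnight.
        var hours = Split(new DateTime(2013, 2, 1, 17, 0, 0),
                          new DateTime(2013, 2, 2, 0, 0, 0));
        Console.WriteLine("{0} regular, {1} premium", hours[0], hours[1]);
        // prints "1 regular, 6 premium"
    }
}
```

Because the loop condition compares each day's midnight against the end time, an appointment that ends at exactly midnight never enters the following day, which is the case the test list missed.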

When the Azure emulator is more forgiving than real life

Last night I made the mistake of testing a deployment to Azure right before going to bed. Everything had worked beautifully in development, I'd fixed all the bugs, and I had a virgin Windows Azure affinity group complete with a pre-populated test database ready for the Weather Now worker role's first trip up to the Big Time.

The first complete and total failure of the worker role I should have predicted. Just as I do in the brick-and-mortar development world, I create low-privilege SQL accounts for applications to use. So immediately I had a bunch of SQL exceptions that I resolved with a few GRANT EXEC commands. No big deal.

Once I restarted the worker role, it connected to the database, loaded its settings, downloaded a file from NOAA and...crashed:

Inner Drive Weather threw System.Data.Services.Client.DataServiceRequestException

One of the request inputs is out of range.

at System.Data.Services.Client.DataServiceContext.SaveResult.d__1e.MoveNext()

Oh no. The dreaded Azure Storage exception that tells you absolutely nothing.

Flash forward fifteen minutes (now past midnight; and for context, I'm writing this on the 9am flight to Los Angeles), with Fiddler running on a local instance connecting to production Azure storage, and I found the XML block on which real Azure Storage barfed but the Azure storage emulator passed without a second thought. The offending table entity is metadata that the NOAA downloader worker task stores to let the weather parsing worker task know it has work to do:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
       xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
       xmlns="http://www.w3.org/2005/Atom">
  <title />
  <author>
    <name />
  </author>
  <id />
  <content type="application/xml">
    <m:properties>
      <d:FileTime m:type="Edm.DateTime">2013-02-09T05:35:00Z</d:FileTime>
      <d:IsParsed m:type="Edm.Boolean">false</d:IsParsed>
      <d:ParseTime m:type="Edm.DateTime">0001-01-01T00:00:00</d:ParseTime>
      <d:RetrieveTime m:type="Edm.DateTime">2013-02-10T05:55:29.1084794Z</d:RetrieveTime>
      <d:Size m:type="Edm.Int32">68202</d:Size>
      <d:Timestamp m:type="Edm.DateTime">0001-01-01T00:00:00</d:Timestamp>
    </m:properties>
  </content>
</entry>

Notice that the ParseTime and Timestamp values are equal to System.DateTimeOffset.MinValue, which, it turns out, is not a legal Azure table value. Wow, would it have helped me if the emulator horked on those values during development.

The fix was simply to make sure that neither System.DateTimeOffset.MinValue nor System.DateTime.MinValue ever got into an outbound table entity, which took me about five minutes to implement. Also, it turned out that even though my table entity inherited from TableServiceEntity, I still had to set the Timestamp property when using real Azure storage. (The emulator sets it for you.)
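
A guard of that kind might look like the following. This is a minimal sketch and not the code from Weather Now: TableDates and ToTableValue are hypothetical names, and returning null for an unset date is an assumption (a sentinel date would work as well). The 1601 lower bound is the earliest DateTime that the Azure Table storage documentation says the service supports.

```csharp
using System;

static class TableDates
{
    // Azure Table storage documents January 1, 1601 (UTC) as the earliest
    // DateTime value it accepts.
    public static readonly DateTime MinTableDate =
        new DateTime(1601, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Returns null for values the table service would reject, so the caller
    // never sends 0001-01-01 in an outbound entity.
    public static DateTime? ToTableValue(DateTime value)
    {
        return value < MinTableDate ? (DateTime?)null : value;
    }

    // Same guard for DateTimeOffset, compared in UTC.
    public static DateTime? ToTableValue(DateTimeOffset value)
    {
        return ToTableValue(value.UtcDateTime);
    }

    static void Main()
    {
        Console.WriteLine(ToTableValue(DateTime.MinValue).HasValue);       // False
        Console.WriteLine(ToTableValue(DateTimeOffset.MinValue).HasValue); // False
        var fileTime = new DateTime(2013, 2, 9, 5, 35, 0, DateTimeKind.Utc);
        Console.WriteLine(ToTableValue(fileTime).HasValue);                // True
    }
}
```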

By this point it was 12:30 and I needed to get some sleep, however. So my plan to run an overnight test will have to wait until this evening at my hotel. Then I'll find the other bits of code that work fine against the emulator but fail in production because, for reasons that pass understanding, the emulator gets the behavior completely wrong.

Putting a bow on it

We're just 45 minutes from releasing a software project to our client for user acceptance testing (UAT), and we're ready. (Of course, there are those 38 "known issues..." But that's what the UAT period is for!)

When I get back from the launch meeting, I'll want to check these out:

Off to the client. Then...bug fixes!

Performance improvement; or, how one line of code can change your life

I'm in the home stretch moving Weather Now to Azure. I've finished the data model, data retrieval code, integration with the existing UI, and the code that parses incoming weather data from NOAA, so now I'm working on inserting that data into the database.

To speed up development, improve the design, and generally make my life easier, I'm using Entity Framework 5.0 with database-first modeling. The problem that consumed me yesterday afternoon and on into this morning has been how to ramp up to realistic volumes of data.

The Worker Role that will go out to NOAA and put weather data where Weather Now can use it will receive somewhere around 60,000 weather reports every hour. Often, NOAA repeats reports; sometimes, NOAA sends truncated copies of reports; sometimes, NOAA sends garbled reports. The GetWeather application (soon to be Azure worker task) has to handle all of that and still function in bursts of up to 10,000 weather reports at once.

The WeatherStore class takes parsed METARs and stores them in the CurrentObservations, PastObservations, and ClimateObservations tables, as appropriate. As I've developed the class, I've written unit tests for each kind of thing it has to do: "Store single report," "Store many reports" (which tests batching them up and inserting them in smaller chunks), "Store duplicate reports," etc. Then yesterday afternoon I wrote an integration test called "Store real-life NOAA file" that took the 600 KB, 25,000-line, 6,077-METAR update NOAA published at 2013-01-01 00:00 UTC, and stuffed it in the database.

Sucker took 900 seconds: 15 minutes. In real life, that would mean a complete collapse of the application, because new files come in about every 4 minutes, each with thousands of lines to parse.

This morning, I attached JetBrains dotTrace to the unit test (easy to do since JetBrains ReSharper was running the test), and discovered that 90% of the method's time was spent in—wait for it—DbContext.SaveChanges(). As I dug through the line-by-line tracing, it was obvious Entity Framework was the problem.

I'll save you the steps to figure it out, except to say Stack Overflow is the best thing to happen to software development since the keyboard.

Here's the solution:

using (var db = new AppDataContext())
{
	// Turn off automatic change detection for this context
	db.Configuration.AutoDetectChangesEnabled = false;

	// do interesting work
}


The result: The unit test duration went from 900 seconds to...15. And that is completely acceptable. Total time spent on this performance improvement: 1.25 hours.
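
The batching that the "Store many reports" test exercises can be sketched without Entity Framework. This is a minimal illustration and not the WeatherStore code: InBatches is a hypothetical helper, the batch size of 1,000 is an assumption, and integers stand in for parsed reports.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Batching
{
    // Lazily yields consecutive chunks of at most `size` items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0) yield return batch; // final partial chunk
    }

    static void Main()
    {
        // 6,077 stand-in reports, the size of the NOAA file in the post.
        var reports = Enumerable.Range(1, 6077);
        var batches = InBatches(reports, 1000).ToList();
        Console.WriteLine("{0} batches, last has {1}",
            batches.Count, batches.Last().Count);
        // prints "7 batches, last has 77"
    }
}
```

In a store like this one, each chunk would typically get its own save call, so a single bad report costs one small batch and not the whole file.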

Chaining LINQ predicates

I've spent a good bit of free time lately working on migrating Weather Now to Azure. Part of this includes rewriting its Gazetteer, or catalog of places that it uses to find weather stations for users. For this version I'm using Entity Framework 5.0, which in turn allows me to use LINQ extensively.

I always try to avoid duplicating code, and I always try to write sufficient unit tests to prevent (and fix) any coding errors I make. (I also use ReSharper and Visual Studio Code Analysis to keep me honest.)

There are two methods in the Gazetteer's PlaceFinder class that search for places by distance. The prototypes are:

public static IEnumerable<PlaceDistance> FindNearby(ILocatable center, Length radius)

public static IEnumerable<PlaceDistance> FindNearby(ILocatable center, Length radius, Expression<Func<Place, bool>> predicate)

But in order for the first method to work, it has to create a predicate of its own to draw a box around the center location. (The ILocatable interface requires Latitude and Longitude. Length is a class in the Inner Drive Extensible Architecture representing a measurable two-dimensional distance.) So in order for the second method to work, it has to chain predicates.

Fortunately, I found Joe and Ben Albahari's library of LINQ extensions. Here's the second method:

public static IEnumerable<PlaceDistance> FindNearby(
	ILocatable center,
	Length radius,
	Expression<Func<Place, bool>> predicate)
{
	// Chain the caller's predicate onto the bounding-box predicate
	var searchPredicate = 
		SearchDistancePredicate(center, radius)
		.And(predicate);

	var places = Find(searchPredicate);

	return SearchDistanceResults(places, center, radius);
}

This allows me to use a single Find method that takes a predicate, engages a retry policy, and returns exactly what I'm looking for. And it allows me to do this, which just blows my mind:

var results = PlaceFinder.FindNearby(TestNode, TestRadius, p => p.Feature.Name == "airport");

Compared with the way Weather Now works under the hood right now, and how much coding the existing code took to achieve the same results, I'm just stunned. And it will make migrating Weather Now a lot easier.
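
For anyone curious what chaining two predicates involves, here is a minimal sketch using only System.Linq.Expressions. It is not the Albahari implementation: ChainAnd and ReplaceParameter are hypothetical names, and integers stand in for Place. The idea is to rewrite the second lambda's body so it uses the first lambda's parameter, then join the two bodies with a short-circuiting AND.

```csharp
using System;
using System.Linq.Expressions;

static class PredicateChain
{
    // Combines two predicates into one expression tree: left && right.
    public static Expression<Func<T, bool>> ChainAnd<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        var parameter = left.Parameters[0];

        // Rebind the right-hand body to the left-hand parameter so both
        // halves refer to the same input.
        var rightBody = new ReplaceParameter(right.Parameters[0], parameter)
            .Visit(right.Body);

        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, rightBody), parameter);
    }

    sealed class ReplaceParameter : ExpressionVisitor
    {
        readonly ParameterExpression _from;
        readonly ParameterExpression _to;

        public ReplaceParameter(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }
}

static class Program
{
    static void Main()
    {
        Expression<Func<int, bool>> positive = n => n > 0;
        Expression<Func<int, bool>> even = m => m % 2 == 0;

        var both = positive.ChainAnd(even).Compile();
        Console.WriteLine(both(4));  // True
        Console.WriteLine(both(3));  // False
        Console.WriteLine(both(-2)); // False
    }
}
```

Because the result is still a single expression tree with one parameter, a LINQ provider can translate it the same way it would translate a hand-written predicate.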

Upgrading to Azure Storage Client 2.0

Oh, Azure Storage team, why did you break everything?

I love upgrades. I really do. So when Microsoft released the new version of the Windows Azure SDK (October 2012, v1.8) along with a full upgrade of the Storage Client (to 2.0), I found a little side project to upgrade, and went straight to the NuGet Package Manager for my prize.

I should say that part of my interest came from wanting to use some of the .NET 4.5 features, including the asynchronous helper methods, HTML 5, and native support for SQL 2012 spatial types, in the new version of Weather Now that I hope to complete before year's end. The Azure SDK 1.8 supports .NET 4.5; previous versions didn’t. And the Azure SDK 1.8 includes a new version of the Azure Emulator which supports 4.5 as well.

To support the new, Azure-based version (and to support a bunch of other projects that I migrated to Azure), I have a class library of façades supporting Azure. Fortunately, this architecture encapsulated all of my Azure Storage calls. Unfortunately, the upgrade broke every other line of code in the library.

0. Many of the namespaces have changed. But of course, you use ReSharper, which makes the problem go away.

1. The CloudStorageAccount.FromConfigurationSetting() method is gone. Instead, you have to use CloudStorageAccount.Parse(). Here is the delta from TortoiseHg:

- _cloudStorageAccount = CloudStorageAccount.FromConfigurationSetting(storageSettingName);
+ var setting = CloudConfigurationManager.GetSetting(storageSettingName);
+ _cloudStorageAccount = CloudStorageAccount.Parse(setting);

2. BlobContainer.GetBlobReference() is gone, too. Instead of getting a generic blob reference back, you have to specify whether you want a page blob or a block blob. In this app, I only use block blobs, so the delta looks like this:

- var blob = _blobContainer.GetBlobReference(blobName);
+ var blob = _blobContainer.GetBlockBlobReference(blobName);

Note that BlobContainer also has a GetPageBlobReference() method. It also has a GetBlobReferenceFromServer method that throws a 404 error if the blob doesn’t exist, which makes it useless for creating new blobs.

3. Blob.DeleteIfExists() works somewhat differently, too:

- var blob = _blobContainer.GetBlobReference(blobName);
- blob.DeleteIfExists(new BlobRequestOptions 
-	{ DeleteSnapshotsOption = DeleteSnapshotsOption.IncludeSnapshots });
+ var blob = _blobContainer.GetBlockBlobReference(blobName);
+ blob.DeleteIfExists();

4. Remember downloading text directly from a blob using Blob.DownloadText()? Yeah, that’s gone too. Blobs are all about streams now:

- var blob = _blobContainer.GetBlobReference(blobName);
- return blob.DownloadText();
+ using (var stream = new MemoryStream())
+ {
+ 	var blob = _blobContainer.GetBlockBlobReference(blobName);
+ 	blob.DownloadToStream(stream);
+ 	using (var reader = new StreamReader(stream, true))
+ 	{
+ 		stream.Position = 0;
+ 		return reader.ReadToEnd();
+ 	}
+ }

5. Because blobs are all stream-based now, you can’t simply upload files or byte arrays to them. Here’s the correction for the disappearance of Blob.UploadByteArray():

- var blob = _blobContainer.GetBlobReference(blobName);
- blob.UploadByteArray(value);
+ var blob = _blobContainer.GetBlockBlobReference(blobName);
+ using (var stream = new MemoryStream(value))
+ {
+ 	blob.UploadFromStream(stream);
+ }

6. Microsoft even helpfully corrected a spelling error which, yes, broke my code:

- _blobContainer.CreateIfNotExist();
+ _blobContainer.CreateIfNotExists();

Yes, if not existS. Notice the big red S, which is something I’d like to give the Azure team after this upgrade.*

7. We’re not done yet. They fixed a "problem" with tables, too:

  var cloudTableClient = _cloudStorageAccount.CreateCloudTableClient();
- cloudTableClient.CreateTableIfNotExist(TableName);
- var context = cloudTableClient.GetDataServiceContext();
+ var table = cloudTableClient.GetTableReference(TableName);
+ table.CreateIfNotExists();
+ var context = cloudTableClient.GetTableServiceContext();

8. Finally, if you have used the CloudStorageAccount.SetConfigurationSettingPublisher() method, that’s gone too, but you don’t need it. Instead, use the CloudConfigurationManager.GetSetting() method directly. Instead of doing this:

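if (RoleEnvironment.IsAvailable)
	CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) => 
		configSetter(RoleEnvironment.GetConfigurationSettingValue(configName)));
else
	CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) => 
		configSetter(ConfigurationManager.AppSettings[configName]));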

You can simply do this:

var someSetting = CloudConfigurationManager.GetSetting(settingKey);

The CloudConfigurationManager.GetSetting() method first tries to get the setting from Azure, then from the ConfigurationManager (i.e., local settings).
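
One detail in item 4 deserves a closer look: the stream has to be rewound before reading, because the download leaves the position at the end. The following minimal sketch shows the effect with plain .NET streams. FakeDownload is a hypothetical stand-in for blob.DownloadToStream(), and StreamText and ReadAll are made-up names, so nothing here touches the Storage Client itself.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamText
{
    // Stand-in for blob.DownloadToStream(stream): writes the content and
    // leaves the stream positioned at the end, as any write does.
    static void FakeDownload(Stream target, string content)
    {
        var bytes = Encoding.UTF8.GetBytes(content);
        target.Write(bytes, 0, bytes.Length);
    }

    // Reads the "downloaded" text back, optionally rewinding first.
    public static string ReadAll(string content, bool rewind)
    {
        using (var stream = new MemoryStream())
        {
            FakeDownload(stream, content);
            if (rewind) stream.Position = 0;
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(ReadAll("METAR KORD", true).Length);  // 10
        Console.WriteLine(ReadAll("METAR KORD", false).Length); // 0
    }
}
```

Without the rewind the reader starts at the end of the stream and returns an empty string, with no exception to point at the mistake.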

I hope I have just saved you three hours of silently cursing Microsoft’s Azure Storage team.

* Apologies to Bill Cosby.