The Daily Parker

Politics, Weather, Photography, and the Dog

When a convenience becomes a nuisance, Azure VM edition

We've been using Microsoft Azure virtual machines for development for a while. This means we run our Visual Studio instances up in the cloud, on special virtual machines that have nothing on them except the bare minimum required for writing software. This keeps different projects separate from each other, and it also speeds up network access, which is useful for network-intensive applications.

We started noticing, however, that going to MSDN or Google or other big sites became...challenging. All of these sites started acting as if our VMs were located in Brazil, when we knew perfectly well that they were in Virginia. Microsoft has finally explained the problem:

IPv4 address space has been fully assigned in the United States, meaning there is no additional IPv4 address space available. This requires Microsoft to use the IPv4 address space available to us globally for the addressing of new services. The result is that we will have to use IPv4 address space assigned to a non-US region to address services which may be in a US region. It is not possible to transfer registration because the IP space is allocated to the registration authorities by Internet Assigned Numbers Authority.

At times your service may appear to be hosted in a non-US location.

It is important to note that the IP address registration authority does not equate to IP address physical location (i.e., you can have an IP address registered in Brazil but allocated to a device or service physically located in Virginia). Thus when you deploy to a U.S. region, your service is still hosted in U.S. and your customer data will remain in the U.S.

In other words, Microsoft's cloud service is so popular that they have run out of addresses to assign to it. Microsoft, it should be noted, has tens of millions of IPv4 addresses available. (Of course, IPv4 has only about 4.3 billion possible addresses in total, and nearly 18 million of those are reserved for private networks.)

Stupid error messages that cost lots of time

Parker and I haven't yet left for Ribfest because I've just spent two and a half hours debugging an application.

After upgrading the application to the current version of the Inner Drive Extensible Architecture™, the thing wouldn't start. I simply got a plain-text error message: "The page cannot be displayed because an internal server error has occurred." The Windows Application Log supplied this clue:

The worker process for application pool 'a177c227-f36e-4874-aefe-9b41ca0d14ec' encountered an error 'Cannot read configuration file ' trying to read global module configuration data from file '\\?\C:\Users\dab\AppData\Local\dftmp\Resources\02e946dc-c92e-4774-a19a-5b013a38da65\temp\temp\RoleTemp\applicationHost.config', line number '0'. Worker process startup aborted.

Searching through Stack Overflow gave me a few clues, but nothing concrete. So I had to go through the web.config file line by line until I found this:

<system.diagnostics>
	<trace>
		<listeners>
			<add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,
		Microsoft.WindowsAzure.Diagnostics, Version=1.8.0.0, Culture=neutral, 
		PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
				<filter type="" />
			</add>
		</listeners>
	</trace>
</system.diagnostics>

Deleting the configuration section altogether worked. So did changing the 1.8 to a 2.2. And now the application runs. And now Parker and I are going to get ribs.
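
For reference, the working version of that listener registration looks something like this (assuming, per the fix above, the project now references the 2.2.0.0 diagnostics assembly; the public key token doesn't change):

<system.diagnostics>
	<trace>
		<listeners>
			<add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,
				Microsoft.WindowsAzure.Diagnostics, Version=2.2.0.0, Culture=neutral,
				PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
				<filter type="" />
			</add>
		</listeners>
	</trace>
</system.diagnostics>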

The error message is just stupid programmers being lazy. It isn't really that hard to write error messages that tell users what has gone wrong. In this case, line number 0 wasn't the problem; the offending section was much farther down in the file, and the real cause had very little to do with the configuration file at all.

I would like to have seen a message in the application log that "the system could not load Microsoft.WindowsAzure.Diagnostics version 1.8." Why was that too hard for the Azure Emulator team?

Not making my job easier today

Microsoft Azure is having some difficulties today in its East data center. It's causing hiccups. Nothing more. Just hiccups. But these hiccups are peculiarly fatal to the Weather Now worker process, so it keeps dying. Before dying, it texts me. So in the last 18 hours I've gotten about 30 texts from my dying worker process.

Maybe it's just telling me to go see Edge of Tomorrow?

Update, 15:15 CDT: Microsoft has finally updated the service dashboard to reflect the horkage.

That seemed to go well...

The deployment, I mean. Everything works, at least on the browsers I've used to test it. I ran the deployment three times in Test first, starting from a copy of the Production database each time, so I was as confident as I could be when I finally ran it against the Production database itself. And I made sure I could swap everything back to the old version in about 15 minutes.

Also, I snuck away to shoot publicity photos for Spectralia again, same as last year. I'll have some up by the end of the week, after the director has seen them.

Scary software deployment

Jez Humble, who wrote the book on continuous delivery, believes deployments should be boring. I totally agree; it's one of the biggest reasons I like working with Microsoft Azure.

Occasionally, however, deploying software is not at all boring. Today, for example.

Because Microsoft ends support for Windows Server 2008 next week, I've upgraded an old application that I first released to Azure in August 2012. Well, actually, I updated it back in March so I could get ahead of the game, and that boring deployment turned horrifying when half of my client's customers couldn't use the application: the OS upgrade broke their Windows XP/IE8 user experience. Seriously.

All of my client's customers have now upgraded to Chrome, IE11, or Firefox, and I've tested the app on all three browsers. Everything works. But now I have to redeploy the upgrade, and I've got a real feeling of being once-bitten.

The hard part, the part that makes this a one-way upgrade, is a significant change to the database schema. All the application's lookup lists, event logging, auditing, and a few other data structures are incompatible with the current Production version. Even if there weren't an OS upgrade involved, the database changes are overdue, so there is no going back after this.

Here are the steps that I will take for this deployment:

  1. Copy current Production database to new MigrationTest database
  2. Upgrade MigrationTest database
  3. Verify Test settings, connection strings, and storage keys
  4. Deploy Web project to Test instance (production slot)
  5. Validate Test instance
  6. Deploy Worker project to Test instance (production slot)
  7. Validate Worker instance
  8. Shut down Production instance
  9. Back up Production database to bacpac
  10. Copy Production database within SQL instance
  11. Upgrade Production database
  12. Verify Production settings, connection strings, and storage keys
  13. Deploy solution to Production instance (staging slot)
  14. Validate Production Web instance
  15. Validate Production Worker instance
  16. VIP swap to Production

Step 1 is already complete. Step 2 will be delayed for a moment while I apply a patch to Visual Studio over my painfully slow internet connection (thanks, AT&T!). And I expect to be done with all of this in time for Game of Thrones.
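
For what it's worth, the database copies in steps 1 and 10 come down to one T-SQL statement issued against the Azure SQL server's master database. Here's a minimal sketch in C# of how that could be scripted (the server, credentials, and database names are placeholders, not the real ones):

using System.Data.SqlClient;

// Connect to the logical server's master database; CREATE DATABASE ... AS COPY OF
// has to be issued there, not against the source database itself.
const string master =
	"Server=tcp:myserver.database.windows.net,1433;Database=master;" +
	"User ID=deployer;Password={password};Encrypt=True;";

using (var connection = new SqlConnection(master))
{
	connection.Open();

	// Kicks off an asynchronous, transactionally consistent copy of Production.
	using (var command = new SqlCommand(
		"CREATE DATABASE MigrationTest AS COPY OF Production", connection))
	{
		command.ExecuteNonQuery();
	}
}

The copy runs asynchronously on the server, so the script (or I) still has to wait for it to finish before step 2 can start.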

How to look up Azure table data by ID

Short answer: You can't. So don't try.

Back in 2007, when I wrote a scheduling application for a (still ongoing!) client, Azure was a frustrating research project at Microsoft. Every bit of data the application stored went into SQL Server tables, including field-level auditing and event logs.

The application migrated to Azure in August 2012, still logging every audit record and event to SQL tables, which are something like 10x more expensive per byte than Azure Table Storage. Recently, I completed an upgrade to the Inner Drive Extensible Architecture™ so that it can now use Azure Table Storage for both.

The old application knew nothing about this. So upgrading the application with the new IDEA bits worked fine for writing to the audit and event logs, but completely broke reading from them.

Here's the code the app uses to display a specific audit record so an administrator can see the field-level details:

// Get the repository from Castle IoC:
var repo = ActiveContainer.Instance.Container.Resolve<IAuditRepository>();

// Get the individual audit record by ID:
var audit = repo.Find(id);

That's great if the audit record uses a database identity key. Unfortunately it does not; it uses an Azure partition and row key combination.
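
That matters because a Table Storage point read needs both halves of the key, not a single integer. With the storage client library it looks something like this (AuditEntity, the table name, and the local variables here are stand-ins, not the actual IDEA types):

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// AuditEntity is a stand-in ITableEntity type for the audit record;
// connectionString, partitionKey, and rowKey are assumed to be in scope.
var account = CloudStorageAccount.Parse(connectionString);
var table = account.CreateCloudTableClient().GetTableReference("Audit");

// A point read requires the partition key AND the row key; there is no
// built-in "find by integer identity" for table entities.
var operation = TableOperation.Retrieve<AuditEntity>(partitionKey, rowKey);
var audit = (AuditEntity)table.Execute(operation).Result;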

I agonized for a couple of days over how to fake database identities in Azure, or how to write a mapping table, or how to do some other code- or data-intensive thing. Then this afternoon, epiphany!

The user viewing an audit record is doing it in the context of reviewing a short list of audit headers. They click on one of these headers to pop up the detail box. The detail box uses the audit ID like this: https://scheduler.myclient.com/Auditing/AuditDetailViewer/12345.

It turns out, the only time a user cares about audit details is when she has the audit list right in front of her. So the audit ID is irrelevant. It only has to be unique within the context of the user's experience.

Here's the solution. In the Auditing class, which generates the audit list, I do this:

foreach (var item in orderedAudits)
{
	// Temporal cohesion: Add identity to Audit before using
	AddIdentity(item);
	Cache(item);
	CreateAuditRow(placeHolder, isObjectSpecified, item, timeZone);
	// End temporal cohesion
}

The cache uses least-frequently-used scavenging with a capacity of 512 items. (If the one user who cares about auditing ever needs to see more than 512 audit items in one list, I'll coach him not to do that.) Items live in the cache until it gets full, at which time the least-used ones are removed. This lets me do this in the audit detail view's control code:

var audit = Auditing.Find(id);

The Auditing.Find method simply plucks the item from the cache. If it returns null, oops: the audit details are missing or have expired from the cache. Sorry. Just rerun the list of audits you clicked on.
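
For the curious, here's a minimal sketch of what AddIdentity, Cache, and Find could look like. The Audit class with an integer Id property, the counter, and the scavenging details below are my assumptions, not the actual IDEA implementation:

using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Stand-in for the real audit record; only the transient Id matters here.
public class Audit
{
	public int Id { get; set; }
}

public static class Auditing
{
	private const int Capacity = 512;
	private static int _lastId;
	private static readonly Dictionary<int, Audit> _items = new Dictionary<int, Audit>();
	private static readonly Dictionary<int, int> _uses = new Dictionary<int, int>();

	// Hand out a transient identity; it only needs to be unique while the user
	// has this particular audit list on screen.
	private static void AddIdentity(Audit item)
	{
		item.Id = Interlocked.Increment(ref _lastId);
	}

	// Cache the item, scavenging the least-used entry once the cache is full.
	private static void Cache(Audit item)
	{
		lock (_items)
		{
			if (_items.Count >= Capacity)
			{
				var coldest = _uses.OrderBy(pair => pair.Value).First().Key;
				_items.Remove(coldest);
				_uses.Remove(coldest);
			}
			_items[item.Id] = item;
			_uses[item.Id] = 0;
		}
	}

	// Pluck the item back out; null means it was scavenged, so rerun the list.
	public static Audit Find(int id)
	{
		lock (_items)
		{
			Audit audit;
			if (!_items.TryGetValue(id, out audit)) return null;
			_uses[id]++;
			return audit;
		}
	}
}

Because a fresh identity is handed out every time the list is generated, nothing outside that one user's page ever depends on it, which is exactly why it doesn't need to be durable.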

I'm going to use a similar approach to the event log.

Waiting for software to deploy...

I'm uploading a couple of fixes to Inner-Drive.com right now, so I have a few minutes to read things people have sent me. It takes a while to deploy the site fully, because the Inner Drive Extensible Architecture™ documentation (reg.req.) is quite large—about 3,000 HTML pages. I'd like to web-deploy the changes, but the way Azure cloud services work, any changes deployed that way get overwritten as soon as the instance reboots.

All of the changes to Inner-Drive.com are under the hood. In fact, I didn't change anything at all in the website. But I made a bunch of changes to the Azure support classes, including a much better approach to logging inspired by a conversation I had with my colleague Igor Popirov a couple of weeks ago. I'll go into more details later, but suffice it to say, there are some people who can give you more ideas in one sentence than you can get in a year of reading blogs, and he's one of them.

So, while sitting here at my remote office waiting for bits to upload, I encountered these things:

  • The bartender's iPod played "Bette Davis Eyes" which immediately sent me back to this.
  • Andrew Sullivan pointed me (and everyone else who reads his blog) towards the ultimate Boomer fantasy, the live-foreverists. (At some point in the near future I'm going to write about how much X-ers hate picking up after both Boomers and Millennials, and how this fits right in. Just, not right now.)
  • Slate's Jamelle Bouie believes Wisconsin's voter rights decision is a win for our cause. ("Our" in this case includes those who believe retail voter fraud is so rare as to be a laughable excuse for denying a sizable portion of the population their voting rights, especially when the people denied voting rights tend to be the exact people who Republicans would prefer not to vote.)

OK, the software is deployed, and I need to walk Parker now. Maybe I'll read all these things after Game of Thrones.

Scaling Azure websites globally

I want to try this:

In less than an hour [my website] went from a small prototype in a data center in Chicago and then scaled it out to datacenters globally and added SSL.

The step-by-step explanation is worth a read if you do anything in .NET.

Stuff I didn't get to this afternoon

Busy day, so I'm just flagging these for later:

Back to the mines...

Even Azure requires maintenance

Yesterday I migrated this blog and four other ASP.NET websites from a Windows 2008 Microsoft Azure virtual machine (VM) to a brand-new Windows 2012 R2 VM. I did this because Microsoft has announced the end-of-life for Windows 2008 VMs on June 1st, so I thought I'd get a jump on it.

VMs usually mean never having to say "reinstall." Unfortunately, since this involved jumping three OS versions at once, I decided it would be simpler just to launch a new VM and migrate the applications using FTP.

Seven hours and 25 minutes later, everything works, and I've archived the old VM's virtual hard disk (VHD). Why did it take 7:25 to complete?

Forget it. I'm not reliving those hours. I will say only that at least 90 minutes of that time was completely wasted because the fiber for my AT&T U-verse service doesn't...quite...make it to my building, limiting my line to 1.5 Mbps. Yes, I have a 1.5 Mbps Internet line. While waiting for things to download and upload yesterday, I spoke with them, and they assured me that I have the fastest U-verse service available to me.

Which brings up the other problem with doing so much in Microsoft Azure: you need good Internet connectivity. Which I don't have. Which meant I spent a lot of time yesterday rubbing Parker's belly and cursing AT&T.