Sunday, December 14, 2014

Experimenting with Azure Stream Analytics

Just a little over a month ago, Microsoft announced the public preview of a new service – Stream Analytics – designed to process huge amounts of streamed data in (near) real time. In its current state the service integrates with Azure Event Hubs and Azure Blob storage as data source streams (also called Inputs), and with Event Hubs, Blob storage and Azure SQL Database as possible write targets (also called Outputs). With its SQL-like query language, you can design your stream processor to slice and dice your real-time input data and turn it into trustworthy information.

Now comes the power of the cloud. In a couple of easy steps and a couple of hours you can bring up a reliable infrastructure that can handle tens of thousands of events/messages per second. I was really curious how far it could go in a simple test, so I quickly made up a test scenario. The base for my experiment is the getting started tutorial here. There is a small issue with the “Start the Job” step: it says you must go to the “Configure” section of your job in order to adjust the job output start time. This setting, however, is not located under the Configure section – it is configured in the dialog where you start your job:

Now, in order to make things more interesting, I made the following adjustments:

  • Scaled my Event Hub to 10 scale units, thus targeting potentially 10 000 events per second.
  • Changed the Event Hub sample code a bit to pump out more messages.
  • Created a small PowerShell script to help me start N simultaneous instances of my command line app.
  • Did everything on a VM in the same Azure data center (West Europe) where my Event Hub and Stream Analytics job are running.

Code changes to the original Service Bus Event Hub demo code.

I stripped out all the unnecessary code (e.g. creating the event hub – I have already created it and I know it is there – parsing command line arguments, etc.). My final Program.cs looks like this:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    // Name of the Event Hub, created in advance.
    static string eventHubName;

    static void Main(string[] args)
    {
        // Raise the outbound connection limit so the parallel senders are not throttled locally.
        System.Net.ServicePointManager.DefaultConnectionLimit = 1024;
        eventHubName = "salhub";

        Console.WriteLine("Start sending ...");

        Stopwatch sw = new Stopwatch();
        sw.Start();
        Parallelize();
        sw.Stop();

        Console.WriteLine("Completed in {0} ms", sw.ElapsedMilliseconds);
        Console.WriteLine("Press enter key to stop worker.");
        Console.ReadLine();
    }

    static void Parallelize()
    {
        // 25 tasks, each sending 2 000 events => 50 000 events per process.
        Task[] tasks = new Task[25];
        for (int i = 0; i < 25; i++)
        {
            tasks[i] = new Task(() => Send(2000));
        }

        Parallel.ForEach(tasks, t => t.Start());
        Task.WaitAll(tasks);
    }

    public static void Send(int eventCount)
    {
        Sender s = new Sender(eventHubName, eventCount);
        s.SendEvents();
    }
}
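
The Sender class comes from the original sample and is not shown here. For context, here is a minimal sketch of what such a sender roughly does with the Service Bus SDK of that time – the JSON payload shape and the connection string convention are my assumptions, not the sample’s exact code:

using System.Text;
using Microsoft.ServiceBus.Messaging; // WindowsAzure.ServiceBus NuGet package

// Sketch only: roughly what the sample's Sender does.
class Sender
{
    private readonly string eventHubName;
    private readonly int eventCount;

    public Sender(string eventHubName, int eventCount)
    {
        this.eventHubName = eventHubName;
        this.eventCount = eventCount;
    }

    public void SendEvents()
    {
        // Assumes the Event Hub connection string is stored in the
        // Microsoft.ServiceBus.ConnectionString app setting.
        EventHubClient client = EventHubClient.Create(eventHubName);

        for (int i = 0; i < eventCount; i++)
        {
            // Hypothetical JSON payload.
            string payload = "{\"deviceId\":\"test\",\"value\":" + i + "}";
            client.Send(new EventData(Encoding.UTF8.GetBytes(payload)));
        }

        client.Close();
    }
}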


Now, with this single command line app, I am sending 25 x 2 000, or 50 000 messages in parallel. To make things funnier, I run this single console app in pseudo-parallel by simply starting it 20 times with this simple PowerShell script:

for($i=1; $i -le 20; $i++)
{
start .\BasicEventHubSample.exe
}

Thus I start all the processes at almost the same time, then wait for them to finish, i.e. for every process to send all of its messages. Twenty times 50 000 messages should make 1 000 000 messages. Then I just take the result of the slowest operation. Of course, all the measurements are a bit approximate, but they are good enough to give me an idea of the possibilities in my hands – without the need to invest in expensive hardware or to develop a complex solution. One more thing – I started my Stream Analytics job before starting my data-pumping command line executables, just to make sure the stream processor is already there when I start bombarding it with data.


Please note a couple of things. First of all, Stream Analytics is in preview, so there might be issues and glitches. But the end results are just astonishing. Looking at the graphs for both the Event Hub and Stream Analytics is just awesome. By the way, the last thing I proved is that the new service tiers of Azure SQL Database are also awesome. With this amount of data coming out of Stream Analytics, it had no issues writing the results into a single Basic (5 DTU) database! I began seeing the results in my SQL Database table the moment I switched from the running command line programs to SQL Server Management Studio, and I could watch the results coming in in real time.
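
If you would rather watch the output arrive programmatically than keep refreshing a query in Management Studio, a tiny polling loop will do. This is just a sketch – the connection string and the output table name (dbo.StreamOutput) are hypothetical and need to match whatever your job’s Output is configured to write to:

using System;
using System.Data.SqlClient;
using System.Threading;

class OutputWatcher
{
    static void Main()
    {
        // Hypothetical connection string - point it at the database used as the job's Output.
        string connectionString = "Server=tcp:myserver.database.windows.net,1433;" +
            "Database=mydb;User ID=user@myserver;Password=...;Encrypt=True;";

        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            connection.Open();
            while (true)
            {
                using (SqlCommand command = new SqlCommand(
                    "SELECT COUNT(*) FROM dbo.StreamOutput", connection))
                {
                    Console.WriteLine("{0:T} - rows so far: {1}",
                        DateTime.Now, command.ExecuteScalar());
                }
                Thread.Sleep(5000); // poll every 5 seconds
            }
        }
    }
}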


Bottom line: in my last try I pumped 1 000 000 events into the Event Hub in just about 75 seconds! That makes a little over 13 000 events per second, with just a couple of lines of code. How cool is it to look at a graph like this:



And how cool is it to look at graphs like the Azure Event Hubs one:



Azure Event Hubs, millions of messages. How long would it take us if we had to build a local test lab to process that amount of data?


We must not forget some of the known issues and limitations of Stream Analytics, as listed here. The most important of them are:



  • Limited geographic availability (Central US and West Europe only)
  • Streaming Unit quota (12 streaming units per Azure region per subscription!)
  • UTF-8 is the only supported encoding for CSV and JSON input sources
  • Some really useful performance metrics, such as latency, are not currently provided

With this baseline, I am convinced that Azure Event Hubs can really deliver a throughput of millions of events per second, and that Stream Analytics can really process that amount of data.

Saturday, December 6, 2014

Easy authentication in Azure Web Sites

For a couple of years now (3-4), I have been strongly evangelizing single sign-on, federated identity, claims-based authentication and so on. There are at least two major points supporting that:

  • You (as a developer) do not want to be responsible for leaking tens or hundreds of thousands of passwords and personal data. That responsibility is just too high.
  • Living in the 21st century, there is hardly an Internet user who does not have at least two online identities that can be used for authentication (Google, Microsoft, Facebook, Yahoo, etc.).

Having said that, I have also written a number of articles on claims-based authentication, custom login pages, etc. In all of them the user had to go through some learning curve. This is not the case today! Today, Microsoft is thinking about developers and lets them focus on the application itself and its business logic, without caring about authentication at all! Do not forget that you can run .NET (ASP.NET WebForms, MVC and even ASP.NET vNext!), Java, Node.js, PHP and Python on Azure Web Sites today! With three easy steps, you can protect your Web Site with Azure Active Directory!

What is Azure Active Directory? It is the identity management system behind all Office 365 subscriptions, Dynamics CRM Online subscriptions, Microsoft Intune and all Azure subscriptions! You may not even have been aware of it, but one default Azure Active Directory comes with every Azure subscription. So, if you are using Azure – regardless of whether it is an MSDN benefit, a regular pay-as-you-go subscription or a free trial – you already have one Azure Active Directory tenant! If you wish, you can learn a bit more about how to manage your Azure Active Directory here.

So, dear user, you have created your Azure Web Site and now you have to protect it with your Azure Active Directory tenant. Here are the three easy steps you have to follow:

1. Navigate to the Configure tab of your Web Site. Scroll down to the Authentication / Authorization section and click Configure.

3. Select your Azure Active Directory (if you have not changed anything, the name of your directory will most probably be “Default Directory”) and choose “Create new application”:

Done:

Now your site is protected by Azure Active Directory and has automatic claims authentication. You do not have to worry about salting and hashing users’ passwords, about how users would reset their passwords, and so on. Protecting your site has never been easier!
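
For an ASP.NET MVC site, for example, you can then simply read the signed-in user from the standard claims principal. A minimal sketch, under the assumption that the authentication gateway has populated the regular ASP.NET identity by the time the request reaches your code:

using System.Security.Claims;
using System.Web.Mvc;

public class HomeController : Controller
{
    public ActionResult Index()
    {
        // Assumption: the Azure Web Sites authentication layer has already signed
        // the user in against Azure AD, so ClaimsPrincipal.Current is populated.
        ClaimsPrincipal principal = ClaimsPrincipal.Current;

        ViewBag.UserName = (principal != null && principal.Identity.IsAuthenticated)
            ? principal.Identity.Name
            : "(anonymous)";

        return View();
    }
}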

What are the catches? There is always a catch! First of all, this service is still in preview and has some limitations:

  • You can only protect your site with your own Azure Active Directory. You can add Microsoft Accounts (e.g. someone@hotmail.com) to your Azure Active Directory, but not external identities such as Facebook, Google or Yahoo.
  • With the current release, all users in the configured directory will have access to the application.
  • With the current release, the whole site is placed behind the login requirement (you cannot define “public” pages, although it is relatively easy to do this in a web.config file).
  • Headless authentication/authorization for API scenarios or service-to-service scenarios is not currently supported.
  • With the current release there is no distributed log-out, so logging the user out will only do so for this application and not for all global sessions (which means that if the user comes back, he/she will automatically be logged in again).

Quick, easy, and it works across the whole stack of platforms supported on Azure Web Sites (.NET, PHP, Java, Node.js, Python).