Designing a new SymbolSource

A long, long time ago

It’s been a long time since I have written here, and it’s been an equally long time since we have made any visible improvements to SymbolSource. Part of the reason is that Kamil and I have both been involved in building an EHR (electronic health record) webapp for the Polish outpatient care market. Well, involved doesn’t really cut it: that startup has been more or less our entire life for three years. But now I can say, not without a lot of pride, that we have successfully led the project out of the startup phase. If you’re interested in what we have managed to build, visit www.mediporta.pl. (The website only has a Polish version at the moment.)

We also might not have seemed very responsive when contacted with issues, but we have never stopped reading and noting all the feedback that we received. It has been very hard to reply when the current SymbolSource architecture has not given us many options for implementing improvements. When we started the project, there was no Azure, no NuGet (yes!) and normalized SQL databases ruled the Earth. Looking back today, that resulted in quite a few mediocre decisions.

Those times have passed, however, and it’s time to lose the excuses and make things better.

Before we begin

If you are not familiar with what purpose SymbolSource serves, or how it works, first have a look at the official Wiki.

Back to the drawing board

A few months ago we started talking about a total rewrite of SymbolSource, and even with the limited time we could spare, we have made good progress in designing a new, scalable architecture for the service. It will enable us to support many different scenarios with the best possible performance:

a public symbol repository for nuget.org,
public and private feeds integrated with myget.org,
hosted instances deployed in our Azure subscription,
instances deployed in private Azure subscriptions,
on-premise instances integrated with Active Directory.

But today I don’t want to talk about architecture, as it is more or less irrelevant when a service performs well. Today I’d like to share with you how we see the most basic form of interaction with the new SymbolSource – pushing and managing packages.

Consider this post a spec which, although already implemented, isn’t yet available publicly for testing. We are looking forward to any questions and comments!

Pushing a package

Pushing to the public repository:

nuget.exe push NHibernate.4.0.3.4000.symbols.nupkg 8ac00d48-a8e8-48e4-bb40-4fc92f18e15c -source http://nuget.smbsrc.net

Pushing to a named feed integrated with MyGet:

nuget.exe push EntityFramework.6.1.4-alpha1-40301.symbols.nupkg 60b1845b-116f-4eb0-8086-f96acaae46d7 -source http://myget.smbsrc.net/aspnetwebstacknightly

As a part of our effort to improve SymbolSource performance, we have decided to make all package operations asynchronous, which means that a successful push will only acknowledge that the package was received correctly. Read on to see how to determine the true package status.

Listing packages in various states

If all is well, you should see the package listed in the feed:

nuget.exe list -source http://nuget.smbsrc.net -allversions -prerelease

The list of packages will be similar to the following:

NHibernate 4.0.3.4000-smbsrc150302193927

Note that SymbolSource has automatically added a SemVer-compatible version suffix. It uniquely identifies each package upload with a timestamp. Two packages with identical versions may still contain different symbol files, which is why we process each upload independently.

You can probably see why the extra options to nuget.exe are needed:

-prerelease shows packages with the smbsrc suffix,
-allversions disables skipping earlier uploads.
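
To illustrate the versioning scheme, here is a minimal Python sketch that splits a listed version into its original part and the upload timestamp. It assumes the suffix is always smbsrc followed by twelve digits in yyMMddHHmmss order, which is only inferred from the example above and is not a documented format. The names Upload and parse_list_line are made up for this sketch.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumption: "<version>-smbsrc<yyMMddHHmmss>", inferred from the single
# example 4.0.3.4000-smbsrc150302193927 shown in this post.
_SUFFIX = re.compile(r"^(?P<base>.+)-smbsrc(?P<stamp>\d{12})$")


class Upload(NamedTuple):
    """One upload of a package, as it appears in a feed listing."""
    package_id: str
    base_version: str
    uploaded_at: datetime


def parse_list_line(line: str) -> Optional[Upload]:
    """Parse one "Id Version" line as printed by nuget.exe list.

    Returns None when the line does not carry the assumed suffix.
    """
    parts = line.split()
    if len(parts) != 2:
        return None
    package_id, version = parts
    match = _SUFFIX.match(version)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%y%m%d%H%M%S")
    except ValueError:
        # Twelve digits that do not form a valid date and time.
        return None
    return Upload(package_id, match.group("base"), stamp)


if __name__ == "__main__":
    print(parse_list_line("NHibernate 4.0.3.4000-smbsrc150302193927"))
```

If the assumed timestamp layout is right, the line shown earlier yields the base version 4.0.3.4000 and an upload time of 2 March 2015, 19:39:27.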

What if a previous upload was accidental? Read on for instructions on how to delete a package.

Deleting a package

Removing a package – and all of its symbols and sources! – will be as simple as issuing:

nuget.exe delete NHibernate 4.0.3.4000-smbsrc150302193927 8ac00d48-a8e8-48e4-bb40-4fc92f18e15c -source http://nuget.smbsrc.net

Again, this operation is asynchronous. A success message from nuget.exe will only tell you that the package has been found and correctly queued for deletion.

Determining other states

The new SymbolSource will also introduce a concept of subfeeds, which will let users list packages in various states. The list command shown earlier targets the default subfeed, which is own,succeeded. You will be able to explicitly target:

ownership – currently own or all (subject to permissions),
state – one of succeeded, partial, indexing and so on.

Here’s an example of listing all failed packages in the default feed as an administrator of the Caliper company instance:

nuget.exe list -source http://caliper.smbsrc.net/,all,partial -allversions -prerelease
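
The subfeed address in that example can be put together mechanically. The Python sketch below assumes the path is always the feed name (empty for the default feed) followed by ,ownership,state. That pattern is inferred from the single example above, so how named feeds combine with subfeeds is an assumption, and subfeed_url is a hypothetical helper rather than part of any SymbolSource client.

```python
# Ownership and state names as listed in this post.
OWNERSHIPS = ("own", "all")
STATES = (
    "new", "original", "damaged",
    "indexingqueued", "indexing", "succeeded",
    "deletingqueued", "deleting", "deleted",
    "partial",
)


def subfeed_url(base_url: str, feed: str = "",
                ownership: str = "own", state: str = "succeeded") -> str:
    """Build a -source value that targets one subfeed.

    Assumption: the path has the form "<feed>,<ownership>,<state>",
    with an empty feed name standing for the default feed.
    """
    if ownership not in OWNERSHIPS:
        raise ValueError(f"unknown ownership: {ownership!r}")
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    return f"{base_url.rstrip('/')}/{feed},{ownership},{state}"


if __name__ == "__main__":
    # Reproduces the source used in the example above.
    print(subfeed_url("http://caliper.smbsrc.net",
                      ownership="all", state="partial"))
```

The defaults mirror the default subfeed, so calling the helper with only a base address targets own,succeeded.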

The meaning of the various states is as follows:

new – not yet opened, random package name and unknown version,
original – a copy of the uploaded package, without any processing, useful if we improve the indexer and need to resubmit packages without any action from users,
damaged – package could not be read with a standard NuGet library,
indexingqueued – indexing will soon start asynchronously,
indexing – processing is in progress as you are looking at the list,
succeeded – the package has been indexed without any issues,
deletingqueued – deleting will soon start asynchronously,
deleting – deleting is in progress as you are looking at the list,
deleted – the package has been completely removed from the index,
partial – something went wrong, either during indexing or deleting, and some symbols and sources might be available, but some may not.
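
Because pushes and deletes are asynchronous, a client that wants to know the outcome has to check the subfeeds until the package settles. The following Python sketch shows one way that could look. It treats succeeded, partial, damaged and deleted as final states, which is my reading of the list above rather than a stated guarantee, and it relies on a hypothetical list_subfeed callable that you would implement yourself, for example by running nuget.exe list against one subfeed and collecting the printed "Id Version" lines.

```python
import time
from typing import Callable, Iterable, Optional

# Assumption: once a package is in one of these states, no further
# processing is pending for it.
SETTLED_STATES = ("succeeded", "partial", "damaged", "deleted")


def find_state(list_subfeed: Callable[[str], Iterable[str]],
               package: str,
               states: Iterable[str] = SETTLED_STATES) -> Optional[str]:
    """Return the first state whose subfeed lists the package, if any."""
    for state in states:
        if package in list_subfeed(state):
            return state
    return None


def wait_until_settled(list_subfeed: Callable[[str], Iterable[str]],
                       package: str,
                       attempts: int = 10,
                       delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Optional[str]:
    """Poll the settled subfeeds; return the state found, or None on timeout."""
    for attempt in range(attempts):
        state = find_state(list_subfeed, package)
        if state is not None:
            return state
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Passing the listing function and the sleep function in as parameters keeps the polling logic independent of how the feed is actually queried, and makes it easy to try out with canned data.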

If listing is possible… is downloading too?

Yes! Just as for any other NuGet package:

nuget.exe install NHibernate -version 4.0.3.4000-smbsrc150302193927 -source http://nuget.smbsrc.net

But you might be surprised by the results. Remember that when no subfeed is specified, you target the succeeded state. Packages in that state contain none of the original content, only status files! I will blog more about this in a future post. At the moment these are JSON files that specify what symbols and sources were detected and whether they have been uploaded to permanent storage. If there were any problems, a package will be listed in the partial state, and the status files will provide error messages.

By the way, since the JSON status files have complete information about indexed symbols and sources, we can delete a package entirely based only on those, without hitting any database at all. A big win performance-wise!

Time for some feedback

What do you think about the scheme that we designed? Please share your thoughts.

Marcin Mikołajczak a.k.a. TripleEmcoder

All posts for the month March, 2015