Sunday, December 15, 2013

VB.NET in a Year

I've been working with VB.NET for a little over a year now. When I first started my job, I didn't know or care what language they used. I was quite surprised to see a software shop use VB. I thought that stuff had died a long time ago. Well, after a year, I think I've learned most of the language itself. There's still much to learn about the surrounding systems, but I'm pretty comfortable with the core of the language.

What have I learned about VB.NET? It's not really that bad. Let me go over a few pain points and then explain what I like about it.

Pain #1: No extension methods on objects of type Object. This is very annoying. We use ADO.NET extensively, and the values it hands back are untyped. If I request an INT NULL column from the database, I want it back as a Nullable(Of Integer). Instead, it comes back as an Object that might be DBNull.

That's not a major problem, right? I can write an extension method that will do the conversion for me. Well, it turns out I can't. VB.NET just won't apply extension methods to expressions typed as Object (presumably because calls on Object go through late binding); C# does it just fine. As a result, I can't use extension syntax at all: I have to call the helper as an ordinary function instead of calling it as a method on the object.
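A minimal sketch of what I mean (ToNullableInteger and the reader access are illustrative names, not code from our system):

Imports System.Runtime.CompilerServices

Module DbValueExtensions
    ' Convert an ADO.NET value (which may be DBNull) to Nullable(Of Integer).
    <Extension()>
    Public Function ToNullableInteger(value As Object) As Integer?
        If value Is Nothing OrElse TypeOf value Is DBNull Then
            Return Nothing
        End If
        Return CInt(value)
    End Function
End Module

' In C# this would bind as reader("Age").ToNullableInteger(), but VB.NET
' won't apply extension methods to an Object expression, so the call ends up as:
'     Dim age As Integer? = DbValueExtensions.ToNullableInteger(reader("Age"))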

Pain #2: Really long closure syntax. I use the method syntax of LINQ extensively, so it's really annoying when a tenth of my code is "Function(...)". C# makes it nice and easy. Who decided that VB.NET has to be this wordy?
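For example, with a made-up query just to show the ceremony:

Imports System.Linq

Module LambdaVerbosityDemo
    Sub Main()
        Dim numbers = Enumerable.Range(1, 10)

        ' Every lambda costs a full "Function(...)" keyword.
        Dim squaresOfEvens = numbers.
            Where(Function(n) n Mod 2 = 0).
            Select(Function(n) n * n)

        ' The equivalent C# reads: numbers.Where(n => n % 2 == 0).Select(n => n * n)
        Console.WriteLine(String.Join(", ", squaresOfEvens))
    End Sub
End Module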

Pain #3: Tuple support. One of my favorite languages is Python. If I want to pass back two objects, I can return them as a tuple and assign them to a tuple on the other end. In VB.NET, I need to create a class or pass results back through ByRef parameters. I really don't like writing results into parameters, and I am too lazy to create types for everything.

Anonymous types work pretty well inside closures, but they don't survive across function boundaries. If I return one, it becomes an Object in the caller. What happened? The compiler knew it had two fields a moment ago; why did it forget?

Say I do manage to get it back. How do I assign the fields to my local variables? I miss "(a, b) = (b, a)".
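The closest built-in workaround I know of is System.Tuple (available since .NET 4), though the call site still has nothing like Python's unpacking; a quick sketch:

Imports System.Collections.Generic
Imports System.Linq

Module TupleDemo
    ' Returning two values without declaring a dedicated class.
    Function MinMax(values As IEnumerable(Of Integer)) As Tuple(Of Integer, Integer)
        Return Tuple.Create(values.Min(), values.Max())
    End Function

    Sub Main()
        Dim result = MinMax({3, 1, 4, 1, 5})
        ' No unpacking syntax; it's Item1/Item2 at the call site.
        Console.WriteLine("min={0}, max={1}", result.Item1, result.Item2)
    End Sub
End Module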

Pain #4: Generic Constructors. Say I have a generic class where the type argument is determined by the value passed into the constructor. In the New statement, I have to spell out the type argument even though the compiler could infer it from my argument. That's all fine until I have to pass in an IDictionary(Of String, IDictionary(Of Integer, Integer)). That's way too much typing for something the compiler should do for me.
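A sketch of the annoyance, with a made-up Wrapper class (the usual workaround is a Shared factory method, since methods do get type inference):

Imports System.Collections.Generic

Public Class Wrapper(Of T)
    Public ReadOnly Value As T
    Public Sub New(value As T)
        Me.Value = value
    End Sub
End Class

Module GenericInferenceDemo
    Sub Main()
        Dim lookup As IDictionary(Of String, IDictionary(Of Integer, Integer)) =
            New Dictionary(Of String, IDictionary(Of Integer, Integer))()

        ' The constructor can't infer T, so the whole type argument has to be repeated:
        Dim wrapped = New Wrapper(Of IDictionary(Of String, IDictionary(Of Integer, Integer)))(lookup)

        ' A Shared factory such as Wrapper.Create(lookup) would infer T from the argument.
        Console.WriteLine(wrapped.Value.Count)
    End Sub
End Module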

Pain #5: No dedicated dictionary syntax. I really want x = {a: 3, b: 4}.
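The closest thing VB.NET offers (in VB 2010 and later) is the From collection initializer, which is still a long way from a literal:

Imports System.Collections.Generic

Module DictionaryInitializerDemo
    Sub Main()
        ' VB 2010's collection initializer -- the nearest thing to a dictionary literal.
        Dim x As New Dictionary(Of String, Integer) From {{"a", 3}, {"b", 4}}
        Console.WriteLine(x("a") + x("b"))
    End Sub
End Module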

Pain #6: (New Foo).Bar doesn't work. The compiler forces me to assign New Foo to a variable before I can call Bar on it. Very annoying. I don't need to keep the object, and I don't want to assign it to a variable.

Pain #7: No macros. I want at least C style macros if I can't have Lisp style macros. I get neither. Awesome.

After a year, certainly there are things I like about VB.NET. What do I like about VB.NET?

Plus #1: IntelliSense. This is more about Visual Studio than VB.NET, but Visual Studio is one of the best parts of working in VB.NET. Not having to remember property and method names is awesome. What can I do with this thing? Type '.' and IntelliSense will tell you right there. And it's always right: it doesn't suggest members that don't exist, and it doesn't miss members just because you haven't used them before.

Plus #2: Edit and Continue. Even though I can't edit and continue anonymous methods, it's still magic.

Plus #3: Optional parentheses. I used to think this was really weird, but now I like it. It does mean I have to type AddressOf when I want to refer to a method without calling it, but the parentheses I've saved more than make up for the AddressOf I have to type once in a long while.

Plus #4: CLR library. There's a lot of good stuff built in already. No need to reinvent the wheel.

That's about it. In conclusion, the only thing VB.NET has over C# is Plus #3. But as I look back over this list, most of the stuff I don't like only costs me a few extra seconds of typing. Seeing as I don't spend that much of my time typing code anyway, it's actually not that bad. If I had a choice, would I choose VB.NET? Probably not, but it's not as bad as I had first thought.

Tuesday, December 3, 2013

Initial Responses to XP

Most of our development team has been introduced to XP now. We agree that there are a lot of changes that need to be made. Some of these changes are internal to the team, and some require external changes. The two areas we think need to start first are the XP team and short stories.

Our existing team is both a huge team and no team at all. We have about 15 developers, a few testers, a few product managers, a few account managers who act as customers, and a project manager. The people are very friendly, and everyone helps each other in whatever way possible. The environment is very good for a team; however, the team is pretty spread out both physically and logically. I am a developer, so I am most familiar with the developer's perspective. It takes a while to talk to the testers or product managers, and it is nearly impossible to get the attention of customers in a timely manner. Many developers are very busy with their own work, so it is hard to get help when I need it. I often have to wait several hours to get answers.

Not only is everyone busy, everyone is busy with different things. We strive for collective code ownership by having everyone work on various portions of everything. There is a significant amount of code, and it takes a new developer weeks if not months to know enough about the code to be productive. In an effort to improve our bus factor, we assign developers to portions of the system that they are unfamiliar with. The main problem with this is that each developer is individually assigned to a half-year project with little involvement from other developers. At the end of the half year, that developer is familiar with the code, but no one else is. So instead of whole projects having a single expert, we have many portions of projects each having a single expert. We haven't really improved our bus factor much.

To address these issues, we are attempting to form smaller, tighter-knit teams. We will have testers and customers be more involved in the development process to reduce the amount of time it takes to get answers. We will try pair programming to improve collective code ownership. Personally, I was expecting a lot more resistance to pair programming, but most programmers don't mind it as much as I had thought. We have tried it for a few smaller sessions and it seems to work pretty well. Surprisingly, this is probably going to happen before the other improvements, and it is likely to grow organically as well.

While we get the team together, we will also tackle the problem of slipping deadlines. We came up with a long list of things to try, but the most promising is simply to create smaller stories that give us more milestones. We are going to divide the 2-12 week stories that we currently have into smaller 1-3 day stories that we will be able to manage better.

So all in all, we are hitting quite a few bumps in the road to XP, but we're excited about what kind of changes we can make to our process.

Friday, November 22, 2013

SyncLock Starvation

We had an issue in production that was particularly hard to track down. One of our applications appeared to be hanging while not doing a whole lot of work. We figured out that it was running a bunch of queries and hanging there. Looking at SQL Server, we saw that the connection was always waiting on async network IO, but the wait time was always pretty short (less than 2 seconds). We saw our application hang for hours at a time.

The funny thing was that this only happened occasionally, and without a good pattern. I had debugged the application several times, but it never had this issue while debugging. Finally, I was able to debug the application while it was hanging. Pausing all the threads, I noticed that they were all waiting for a single SyncLock.

A little background on this SyncLock: we have a bunch of databases that change frequently, so we store the database locations in one central database. To avoid hitting that central database all the time, we cache the results in the application's memory. Whenever the application needs a connection string, it first tries the cache. If the entry isn't there, it goes back to the central database to reload the cache. While the cache is being refreshed, it can't serve other threads looking for a connection string, so there's a SyncLock guarding this section.
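Roughly, the lookup looks like this (a simplified sketch with illustrative names, not our actual code):

Imports System.Collections.Generic

Module ConnectionStringCache
    Private ReadOnly CacheLock As New Object()
    Private Cache As Dictionary(Of String, String)

    Public Function GetConnectionString(databaseName As String) As String
        SyncLock CacheLock
            ' A miss triggers a full reload from the central database,
            ' and every other caller waits here until the reload finishes.
            If Cache Is Nothing OrElse Not Cache.ContainsKey(databaseName) Then
                Cache = LoadAllLocationsFromCentralDatabase()
            End If

            Dim connectionString As String = Nothing
            Cache.TryGetValue(databaseName, connectionString)
            Return connectionString
        End SyncLock
    End Function

    Private Function LoadAllLocationsFromCentralDatabase() As Dictionary(Of String, String)
        ' Placeholder for the real ADO.NET query against the central database.
        Return New Dictionary(Of String, String)()
    End Function
End Module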

This works great as long as every connection string we ask for actually exists. However, we recently added a new algorithm that checks whether a connection string exists, and most of the time it does not. That means the application is constantly refreshing its cache. While it's refreshing, no other thread can access any database, because the code that gets a connection string is guarded by that SyncLock. Thus, it looks like our application is hanging. It would have eventually finished its job, but it would have taken a long, long time.

This made me curious. Does SyncLock not serve requests in FIFO order? Can a thread be starved while waiting for a SyncLock? The answer appears to be yes. Here's the code to reproduce this.
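The original snippet isn't shown here, but a minimal sketch of the idea looks like this: several threads pounding on one SyncLock that is held for a while, recording the longest time any of them waits to get in (the thread count, iteration count, and hold time are all made up).

Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Threading

Module SyncLockStarvationDemo
    Private ReadOnly Gate As New Object()
    Private MaxWaitMs As Long = 0

    Sub Main()
        Dim threads As New List(Of Thread)
        For i = 1 To 8
            Dim t As New Thread(AddressOf HammerTheLock)
            t.Start()
            threads.Add(t)
        Next
        For Each t In threads
            t.Join()
        Next
        Console.WriteLine("Longest single wait for the lock: {0} ms", MaxWaitMs)
    End Sub

    Private Sub HammerTheLock()
        Dim watch As New Stopwatch()
        For i = 1 To 500
            watch.Restart()
            SyncLock Gate
                ' Time spent waiting to acquire the lock (updated while we hold it).
                If watch.ElapsedMilliseconds > MaxWaitMs Then MaxWaitMs = watch.ElapsedMilliseconds
                ' Simulate the cache refresh holding the lock for a while.
                Thread.Sleep(1)
            End SyncLock
        Next
    End Sub
End Module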


Unfortunately, this isn't a super reliable way of reproducing the situation. We do see, however, that the maximum starvation time is occasionally much larger than it should be. Most of the time the threads are served in near-FIFO order, so our application didn't always have an issue. Sometimes, however, the threads are served in some other order, and our application appeared to be waiting forever for that lock.

Thursday, November 14, 2013

Starting Down the Journey to Extreme Programming

I'm trying to get my company to use XP.

A little bit of background first. We are a profitable SaaS shop with about 15 developers. We would like to call ourselves Agile, but we don't follow any practices very strictly. We have 6 week iterations with occasional shorter iterations for bug fixes and late code. We have customer-centric stories with short descriptions and nebulous scope.

So why do I want us to use XP? We regularly miss deadlines. If we are not promising a feature to a customer, there's a good chance it will miss the intended delivery date. We have customers that refuse to pay for our work. We have last-minute changes that set us back by weeks. We have developers who are so bogged down by maintenance that they can't work on their stories.

I think we are a good candidate for XP because we are building something that no one yet knows how it should work. We are an industry leader coming out with new ideas that change the way the industry thinks. Our customers can't really tell us what they want until they see it.

How are we going to know if it works? One measure is just to see how many missed deadlines we have. Another measure is the team's stress level. We can also measure bugs in new code.

How do we plan on making this work? Since few of our team members have done agile development before, we are going to study agile development together. We are going to discuss how the practices can work in our organization, and how they can benefit us.

There are some practices, such as TDD or pair programming, that take a while to learn and will decrease our productivity for a while. We can't really afford to take a huge hit in productivity, so we will take these slowly. I've introduced the ideas to the team, but we won't really adopt them until a small team of us has proven them successful. So we have 4 volunteers who will try out TDD and pair programming on smaller projects that have value but let us experiment apart from the rest of the system without impacting deadlines. We will only spend a small percentage of our time experimenting and will still use our existing methods the rest of the time.

There are some practices that can give us immediate value. We are planning to move to a new space, so we can implement Sit Together to increase collaboration and decrease the communication costs. We can start breaking down our stories into smaller more manageable stories. We can do team estimates.

So that's what we're doing, and we'll see how it goes. Maybe we'll adopt all XP practices in a few months. Maybe we'll pick and choose a few. Maybe we'll decide it doesn't work. We'll definitely gain some valuable experience from trying it out, though.

What Questions Should I Ask at Interviews?

"Do you have any questions for us?" This is a question almost every interviewer asks near the end of the interview. Everyone agrees that the candidate should always ask questions, but the motivation for asking the questions differ.

I've heard that I should ask questions so that the interviewer thinks that I am interested in the company. I should do research on the company and ask questions about what I found out, to show that I've done my research and am motivated to work for the company. After having done a few interviews, I disagree with this.

Personally, I couldn't care less whether you've done research or not. We need high-quality developers, and we can't find enough who are willing to join us. I will never reject someone because he didn't ask any questions. The same goes for the rest of the team: we've never rejected anyone because they had no idea what we are making, and at our candidate summary meetings we have never once said we don't want someone because he didn't ask any questions.

My wife disagrees with me on this. In her industry, it does matter how much you know about the company. She would like to see that the candidate has done research into the company to know more about the goals and principles of the company. In my experiences with this company, this does not matter at all.

Now I am not suggesting you shouldn't ask questions, just that you shouldn't ask questions merely to prove your interest. There are plenty of other questions to ask. You should find out whether you're a good fit. In particular, you want to find out about things like overtime, weekends, management, evaluations, and what a typical day looks like. Specifically for programmers, you should ask about things like source control, the development process, the build process, support duties, and involvement in making bigger decisions. There are other sites with good lists of questions, and you should definitely check them out.

Tuesday, October 15, 2013

Hash Lookup vs Binary Search

So I was browsing through our code base and saw that in several places we have some binary searches. Binary search is known to be pretty effective when we are finding things in a sorted array; it has O(lg(n)) running time. Hashtables, on the other hand, have O(1) running time. For small tables, though, binary search can be faster than a hashtable, since a comparison is cheaper than computing a hash function. At some point, we can expect the hashtable to outperform binary search. Let's try to find out where the break-even point is:
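The benchmark itself isn't shown here; a sketch of the kind of comparison might look like this (sizes, key types, and iteration counts are illustrative):

Imports System.Diagnostics
Imports System.Linq

Module LookupBenchmark
    Sub Main()
        For Each size In {10, 100, 1000, 10000, 100000, 1000000}
            ' Sorted array of keys for binary search, and the same keys in a hashtable.
            Dim keys = Enumerable.Range(0, size).ToArray()
            Dim table = keys.ToDictionary(Function(k) k)

            Const lookups As Integer = 1000000
            Dim rand As New Random(42)

            Dim sw = Stopwatch.StartNew()
            For i = 1 To lookups
                Dim index = Array.BinarySearch(keys, rand.Next(size))
            Next
            Dim binaryMs = sw.ElapsedMilliseconds

            sw.Restart()
            For i = 1 To lookups
                Dim value As Integer
                table.TryGetValue(rand.Next(size), value)
            Next
            Dim hashMs = sw.ElapsedMilliseconds

            Console.WriteLine("size={0,8}  binary={1,5} ms  hash={2,5} ms", size, binaryMs, hashMs)
        Next
    End Sub
End Module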
And here are the results:
Well, it doesn't look like binary search is ever better than the hashtable. Binary search is logarithmic at first, but then it starts growing faster. The hashtable performs pretty consistently at the beginning, and then slowly grows at the end.

The abnormalities are probably due to caching. There are two places where the slope changes; those probably correspond to different layers of memory, maybe first an L2 cache miss and then an L3 cache miss.

I also implemented an iterative binary search:
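A sketch of what an iterative version looks like (not the original code):

Module IterativeSearch
    ' Iterative binary search over a sorted Integer array.
    ' Returns the index of target, or -1 if it is not present.
    Function BinarySearchIterative(sorted As Integer(), target As Integer) As Integer
        Dim low As Integer = 0
        Dim high As Integer = sorted.Length - 1
        While low <= high
            Dim mid As Integer = low + (high - low) \ 2
            If sorted(mid) = target Then
                Return mid
            ElseIf sorted(mid) < target Then
                low = mid + 1
            Else
                high = mid - 1
            End If
        End While
        Return -1
    End Function
End Module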

It did not perform significantly better; the hashtable was still always faster.

Monday, October 14, 2013

An Empirical Look at .NET's Gen 0 Garbage Collection

A while back, we had an issue with .NET's garbage collector. In particular, Gen 0, which should be super efficient and fast, was slowing down our multi-threaded application.

We had a 24-core machine and an application that needed to do 5000 totally independent tasks. No matter how many threads we threw at it, Task Manager always showed about 200% CPU usage. Digging into the concurrency profiler, we saw that many threads were paused all over the place. The more threads we started, the more time they spent paused. Looking at what they were blocking on, they were all blocking on memory allocation. Every few milliseconds, one thread would start a Gen 0 garbage collection. The other threads would quickly block when they tried to allocate memory. Then the Gen 0 collection would complete, and all the other threads would continue again. The more threads we had, the more frequent these garbage collection cycles became, causing our application to never exceed 200% CPU on a 24-core system.

Let's try to reproduce it in a simpler program.

First, let's show that threading does work with a simple program:
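The original program isn't shown here, but the idea is simple: split a fixed amount of pure CPU work across N threads and time the whole batch. A sketch:

Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Threading

Module CpuBoundScalingDemo
    Sub Main()
        For threadCount = 1 To 6
            Dim iterationsPerThread As Integer = 200000000 \ threadCount
            Dim sw = Stopwatch.StartNew()
            Dim threads As New List(Of Thread)
            For i = 1 To threadCount
                Dim t As New Thread(Sub() BurnCpu(iterationsPerThread))
                t.Start()
                threads.Add(t)
            Next
            For Each t In threads
                t.Join()
            Next
            Console.WriteLine("{0} thread(s): {1} ms", threadCount, sw.ElapsedMilliseconds)
        Next
    End Sub

    Private Sub BurnCpu(iterations As Integer)
        ' Pure arithmetic, no allocation, so the garbage collector stays out of the way.
        Dim total As Long = 0
        For i = 1 To iterations
            total += i Mod 7
        Next
    End Sub
End Module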
And the output on my 4-core machine:

Looks like it's working. So let's add some memory allocations:
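Again a sketch rather than the original code: the only change is that each unit of work now allocates a short-lived object holding an array (Stuff and arraySize are illustrative names).

' Each unit of work now allocates a short-lived object holding an array,
' so Gen 0 collections happen constantly while the threads run.
Public Class Stuff
    Public ReadOnly Data As Integer()
    Public Sub New(arraySize As Integer)
        Data = New Integer(arraySize - 1) {}
    End Sub
End Class

Module AllocatingWorker
    Public Sub BurnCpuWithAllocations(iterations As Integer, arraySize As Integer)
        Dim total As Long = 0
        For i = 1 To iterations
            Dim s As New Stuff(arraySize)
            s.Data(0) = i
            total += s.Data(0)
            ' s becomes garbage here; nothing holds on to it.
        Next
    End Sub
End Module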

Here are the results:
                 threadCount
arraySize       1       2       3       4       5       6
     1        658     376     278     223     208     211
    10        677     392     282     241     227     237
   100       1049     700     581     535     526     544
  1000       4607    3826    3895    3918    4289    4399
 10000      34185   35658   41861   46026   49758   52625

This time around, I am allocating some memory on the heap. There's probably some magic that .NET is doing for me to burn all those extra CPU cycles for larger sizes, but the trend I am looking for is clearly there. When the objects are large, scaling the number of threads does not give me as much of a performance gain as I had hoped.

Let's see if I can tune it for that CPU graph that I am looking for:
This is my CPU graph while scaling through the 6 threads. We see that the CPU usage goes up, but never hits 100% even when using 6 threads.



Now let's try some of the other garbage collector configurations. Let's enable gcServer:
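For reference, these settings live in the application's app.config; a minimal sketch of the runtime section (the two flags are toggled per experiment below):

<configuration>
  <runtime>
    <!-- Toggled per experiment: server GC and concurrent GC. -->
    <gcServer enabled="true" />
    <gcConcurrent enabled="true" />
  </runtime>
</configuration>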
                 threadCount
arraySize       1       2       3       4       5       6
     1        677     383     282     189     202     255
    10        724     373     279     509     309     273
   100       1312     696     627     849    1565    2317
  1000       6329    4596    4457    9292   13110   12540
 10000      49819   48821   44955   50832
How about gcServer disabled, and gcConcurrent disabled?
                 threadCount
arraySize       1       2       3       4       5       6
     1        644     400     277     221     213     207
    10        678     380     295     243     223     224
   100       1055     665     546     516     514     520
  1000       4712    3736    3692    3984    3900    4052
 10000      34827   35426   43847
How about gcServer enabled, and gcConcurrent disabled?
                 threadCount
arraySize       1       2       3       4       5       6
     1        675     346     273     192     175     178
    10        710     371     281     211     193     197
   100       1309     753     588     529     500     562
  1000       7537    4905    4591    5654    4509    4457
 10000      49413   46509   46088   46285   46737

So it seems that the configurations make some impact, but they are not going to solve our problems completely. Let's see if we can do something better.

What if we try to manage our own memory by maintaining our own list of Stuff objects that can be reused?
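A sketch of the pooling idea, reusing the Stuff class from the earlier sketch (simplified; it assumes every caller asks for the same arraySize, and emptying the array before reuse is where some of the extra cost comes from):

Imports System.Collections.Concurrent

' A simple pool of reusable Stuff objects, so the worker threads stop
' producing garbage on every iteration.
Module StuffPool
    Private ReadOnly Pool As New ConcurrentBag(Of Stuff)

    Public Function Rent(arraySize As Integer) As Stuff
        Dim s As Stuff = Nothing
        If Pool.TryTake(s) Then
            ' Emptying the array before reuse is part of the cost of doing it ourselves.
            Array.Clear(s.Data, 0, s.Data.Length)
            Return s
        End If
        Return New Stuff(arraySize)
    End Function

    Public Sub GiveBack(s As Stuff)
        Pool.Add(s)
    End Sub
End Module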

The code has gotten more complicated, and the results show it as well:
                 threadCount
arraySize       1       2       3       4       5       6
     1       3684    2124    1510    1186    1200    1196
    10       3841    2178    1497    1273    1200    1162
   100       4423    2737    1782    1344    1399    1256
  1000       7941    4294    2955    2364    2480    2170
 10000      60650   32238   22427   18197   17902   16450

But we do notice that this scales fairly well with the number of threads. In fact, 6 threads is almost always better than 4 threads. Very weird.

I am not going to delve into the weirdness today because I've come across a solution that scales well, but is much slower on small objects, and twice as slow on a single thread for large objects. And just for kicks, here are the times if we don't empty out the array when we reuse it:
                 threadCount
arraySize       1       2       3       4       5       6
     1       2674    1414    1328     857     801     854
    10       2678    1418     997     847     876     886
   100       2699    1396    1105     791     801     764
  1000       2714    1410    1087     784     853     811
 10000       2727    1412    1005     870     955     961
From these experiments, we can see that the garbage collector works very well for small objects. However, it has trouble with large objects. For large objects, we would do well to maintain our own pool of resources. This method seems to be about twice as slow, but it scales fairly well on my 4-core system.

I didn't take very good measurements, but the number of Gen 0 collections went from tens of thousands when .NET was managing the memory to just hundreds when I was doing it myself. When large objects were an issue, the number of Gen 1 and Gen 2 collections went from single digits to thousands.