Geeks With Blogs
Running with Code Like scissors, only more dangerous

Ever since attending the C# 3.0: Beyond LINQ presentation at Desert Code Camp (excellent presentation, Donald), I've been a little more wary of a couple of the new features introduced by C# 3.0.  I wanted to go over them and point out some of the pitfalls that they can lead to.  Note that I'm not trying to discourage their use - BUT - if you know how something works in the underlying technology, you're more likely to choose the correct techniques for a given situation.

All static code analysis and time evaluation was done using Visual Studio 2008 Professional Edition Beta 2 (with the .NET Framework 3.5 Beta 2), building the binaries in release mode with code optimization turned on.  Time evaluation for simple properties was done on a computer running Windows Vista Ultimate x64, with an Intel Core 2 Duo E6300 (1.86GHz) and 2 GB of dual-channel DDR memory.

LINQ and Binary Code Reuse

My first foray into this is with LINQ.  This is more of a static analysis than a runtime one, but it seems to be important.

As programmers, we routinely manipulate data in the same way in different locations, whether it's within a single function or across modules.  We pull the data from the persistence store, iterate over it, etc.  Sometimes we need to repeat this in multiple locations. 

One of the holy grails in this case is code reuse.  How frequently can we avoid repeating the same code?  Unfortunately, there are sometimes architectural or design constraints that prevent us from being able to include code in only one place.  Furthermore, anonymous type support in C# 3.0 prevents us from doing some of what we would otherwise be able to do: since anonymous types are constrained to method scope (as they rightly should be), we can't publish lambda expressions that use them for reuse across different functions.

Now, I'm of the mind that the compiler should be able to look at constant expressions and optimize.  Consider the following code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    public class Customer
    {
        public string Name { get; set; }
        public string Address { get; set; }
        public string Email { get; set; }
        public int Age { get; set; }
    }

    // TestLinq is a static member of the program's containing class.
    static void TestLinq()
    {
        Customer a = new Customer() { Name = "Rob", Address = "1 Some Way", Email = "robpaveza@gmail.com", Age = 23 };
        Customer b = new Customer() { Name = "Robert", Address = "ABC", Email = "nunya@bizness", Age = 30 };
        Customer c = new Customer() { Name = "Test A", Address = "DEF", Email = "h@x0r.cc", Age = 25 };
        Customer d = new Customer() { Name = "Test B", Address = "GHI", Email = "abc@def.tk", Age = 18 };
        Customer e = new Customer() { Name = "Test C", Address = "JKL", Email = "ghi@jkl.tk", Age = 32 };
        Customer f = new Customer() { Name = "Test D", Address = "MNO", Email = "mno@pqr.tk", Age = 28 };
        Customer g = new Customer() { Name = "Test E", Address = "PQR", Email = "stu@vwx.tk", Age = 57 };
        Customer h = new Customer() { Name = "Test F", Address = "STU", Email = "yza@bcd.tk", Age = 48 };
        Customer i = new Customer() { Name = "Test G", Address = "VWX", Email = "efg@hij.tk", Age = 19 };
        Customer j = new Customer() { Name = "Test H", Address = "YZ", Email = "klm@nop.tk", Age = 25 };

        List<Customer> customers = new List<Customer>(new Customer[] { a, b, c, d, e, f, g, h, i, j });

        // First query: its where and select clauses each become a compiler-generated method.
        var customerList = from cust in customers
                           where cust.Age > 25 && Regex.IsMatch(cust.Email, "@\\w+.tk", RegexOptions.IgnoreCase)
                           select new { cust.Name, cust.Email };
        foreach (var cust in customerList)
        {
            Console.WriteLine("Name: {0}; E-mail: {1}", cust.Name, cust.Email);
        }

        // Second query: textually identical to the first.
        var customerList2 = from cust in customers
                            where cust.Age > 25 && Regex.IsMatch(cust.Email, "@\\w+.tk", RegexOptions.IgnoreCase)
                            select new { cust.Name, cust.Email };
        foreach (var cust in customerList2)
        {
            Console.WriteLine("Name: {0}; E-mail: {1}", cust.Name, cust.Email);
        }

        // Third query: identical again.
        var customerList3 = from cust in customers
                            where cust.Age > 25 && Regex.IsMatch(cust.Email, "@\\w+.tk", RegexOptions.IgnoreCase)
                            select new { cust.Name, cust.Email };

        foreach (var cust in customerList3)
        {
            Console.WriteLine("Name: {0}; E-mail: {1}", cust.Name, cust.Email);
        }
    }

This is a fairly straightforward example of a LINQ expression.  However, what we would consider to be a simple code reuse example (the where clause) is not optimized by the compiler.  We can't blame it on the fact that the property expressions aren't marked constant - as much as I'd like to - because the where clause is broken out into a method invoked via an anonymous delegate.  .NET Reflector shows this somewhat, but to really see what's going on, you need to examine the IL.  Suffice it to say that these three queries, which are exactly the same code, result in six compiler-generated methods (one for the where clause and one for the select clause, three times over).

[Screenshot: linqwhere]

Two compiler-generated methods would have done the job.  The more disturbing part is that this is within the same method; if the compiler isn't going to optimize within a method, you can believe that it's not going to optimize within or across types.

How can you optimize?  Well, it turns out that lambda expressions aren't the absolute key to this, but they do improve the syntax somewhat.  You can manually create a property that retrieves an anonymous delegate (created by a lambda expression) and use that with the methods provided by List<T>.

    static Predicate<Customer> DotTkEmailIs25
    {
        get { return (c => c.Age > 25 && Regex.IsMatch(c.Email, "@\\w+.tk", RegexOptions.IgnoreCase)); }
    }
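
As a rough sketch of how that shared predicate might be consumed, the snippet below hands it to List<T>.FindAll and List<T>.Exists from two call sites.  The trimmed-down Customer class, the SharedPredicateDemo class, and the MatchingNames helper are assumptions made for illustration; only the predicate itself comes from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Trimmed-down stand-in for the Customer class used earlier (assumption).
public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
    public int Age { get; set; }
}

public static class SharedPredicateDemo
{
    // The lambda is written once, so every caller shares the same generated method.
    public static Predicate<Customer> DotTkEmailIs25
    {
        get { return c => c.Age > 25 && Regex.IsMatch(c.Email, "@\\w+.tk", RegexOptions.IgnoreCase); }
    }

    // Hypothetical helper: filters with the shared predicate and returns the matching names.
    public static List<string> MatchingNames(List<Customer> customers)
    {
        // FindAll allocates a new list; this is the memory cost of the technique.
        List<Customer> matches = customers.FindAll(DotTkEmailIs25);
        List<string> names = new List<string>();
        foreach (Customer match in matches)
        {
            names.Add(match.Name);
        }
        return names;
    }

    public static void Main()
    {
        List<Customer> customers = new List<Customer>
        {
            new Customer { Name = "Test A", Email = "h@x0r.cc", Age = 25 },
            new Customer { Name = "Test C", Email = "ghi@jkl.tk", Age = 32 },
            new Customer { Name = "Test G", Email = "efg@hij.tk", Age = 19 }
        };

        // Two different call sites, one predicate.
        foreach (string name in MatchingNames(customers))
        {
            Console.WriteLine(name);
        }
        Console.WriteLine(customers.Exists(DotTkEmailIs25));
    }
}
```

Because FindAll and Exists both accept a Predicate<T>, the same property can be reused from other methods or other types, which is the reuse the query syntax did not give us.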

It is important to note that you'll be trading memory usage for on-disk footprint and speed when using this technique.  You'll reduce the size of the binary on disk, but you'll be creating duplicate lists (List<T>.FindAll, for example, returns a new list).  Now, it's likely that you'll get that memory usage back when the JIT compiler only has to compile one anonymous method.  Remember, the JIT compiler transforms managed code to native code the first time a method is entered; the greater the number of methods, the more frequently the JITter will be invoked.

Finally, remember that the extension methods relevant for LINQ - such as .Count(), .Average(), .Aggregate() and the like - all operate on IEnumerable<T>.  This means that .Count() over a query result is an O(n) operation, because the sequence has to be enumerated in order to be counted, whereas executing the count on a platform like SQL, or even retrieving the .Count property from a List<T>, will be much faster.  The O(n) cost does not necessarily apply when using the LINQ-to-SQL tool.  I have not done performance benchmarks for LINQ-to-SQL, but when it is used, it should retrieve a value comparatively faster than when operating LINQ over a collection.
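
To make the cost concrete, here is a minimal sketch that counts how many times a where predicate runs when .Count() is applied to a filtered sequence.  The CountCostDemo class, its PredicateCalls counter, and the CountOver25 helper are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountCostDemo
{
    // Counts how often the filter runs (illustration only, not thread-safe).
    public static int PredicateCalls;

    // Hypothetical helper: counts the ages over 25 by enumerating a filtered sequence.
    public static int CountOver25(List<int> ages)
    {
        PredicateCalls = 0;
        // Count() has to walk the whole filtered sequence, so the predicate
        // is evaluated once for every element in the source list.
        return ages.Where(age => { PredicateCalls++; return age > 25; }).Count();
    }

    public static void Main()
    {
        List<int> ages = new List<int> { 23, 30, 25, 18, 32, 28, 57, 48, 19, 25 };

        // The Count property just reads a stored value.
        Console.WriteLine("List<T>.Count property: {0}", ages.Count);

        int over25 = CountOver25(ages);
        Console.WriteLine("Over 25: {0}, predicate calls: {1}", over25, PredicateCalls);
    }
}
```

With the ten ages from the customer list above, five are over 25 and the predicate runs ten times, once per element.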

Guidelines for using LINQ:

  • Remember that each of your clauses will result in new anonymous methods within your binary.
  • Remember that each clause needs to be JIT-compiled.  This results in a one-time compilation cost.
  • Aggregate extension methods are at least O(n) operations over in-memory sequences, whereas they may be faster to execute in SQL or with LINQ-to-SQL.

Extension Methods

I'm just going to say so right now: I believe that extension methods will lead programmers to poor design decisions in the hope of increasing development speed.  The MSDN Magazine article that inspired this blog post suggests that the most common use will "probably be to provide shared interface implementations."

This is poor design.  If we want to provide shared interface implementations, we should be providing a method to support multiple inheritance.  The extension methods example - with the IDog interface and the DogExtensions class that provides the parameterless Bark overload - would also be solved by using optional parameters.
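
For readers who haven't seen the article, the pattern under discussion looks roughly like the sketch below.  This is my own reconstruction rather than the article's code: the Terrier class, the string return type, and the method bodies are all assumptions; only the IDog, DogExtensions, and Bark names come from the example.

```csharp
using System;

public interface IDog
{
    string Bark(int times);
}

public static class DogExtensions
{
    // The "shared interface implementation": every IDog gains a parameterless Bark().
    public static string Bark(this IDog dog)
    {
        return dog.Bark(1);
    }
}

// Hypothetical implementer; it only has to supply Bark(int).
public class Terrier : IDog
{
    public string Bark(int times)
    {
        string[] barks = new string[times];
        for (int i = 0; i < times; i++)
        {
            barks[i] = "Yap";
        }
        return String.Join(" ", barks);
    }
}

public static class DogDemo
{
    public static void Main()
    {
        IDog dog = new Terrier();
        Console.WriteLine(dog.Bark());  // resolved to DogExtensions.Bark
        Console.WriteLine(dog.Bark(3)); // resolved to Terrier.Bark(int)
    }
}
```

With optional parameters, the interface could declare a default for times and the DogExtensions class would not be needed at all.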

To a certain extent, I am baffled by the fact that C# does not support optional parameters.  Its predecessor C++ does, and VB.NET does, so we know it's supported within the runtime; it would be a simple matter to emit overloads based on optional parameters into IL.  I imagine the designers saying, "No, then it would be almost like Visual Basic."  I hope I'll one day be able to understand the omission of optional parameters and the const modifier for parameters, properties, and methods.

Before using extension methods, consider how your design might be better served by breaking functionality into a utility class or a virtual method in a base class.  Using extension methods prevents a method from being polymorphic; you cannot use the "virtual" keyword on the method definition, and you can't override the method or access it via the base keyword in a derived class.
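
The lack of polymorphism is easy to demonstrate.  In the sketch below (all type and method names are made up for the illustration), a virtual method dispatches on the runtime type of the object, while an extension method is chosen at compile time from the static type of the variable.

```csharp
using System;

public class Animal
{
    public virtual string Speak() { return "..."; }
}

public class Dog : Animal
{
    public override string Speak() { return "Woof"; }
}

public static class AnimalExtensions
{
    // Two extension overloads; the compiler picks one from the static type.
    public static string Describe(this Animal animal) { return "some animal"; }
    public static string Describe(this Dog dog) { return "a dog"; }
}

public static class DispatchDemo
{
    public static void Main()
    {
        Dog dog = new Dog();
        Animal pet = dog; // same object, viewed through the base type

        Console.WriteLine(pet.Speak());    // virtual call: uses the runtime type (Dog)
        Console.WriteLine(pet.Describe()); // extension call: uses the static type (Animal)
        Console.WriteLine(dog.Describe()); // extension call: static type is Dog
    }
}
```

The second line prints "some animal" even though the object is a Dog, which is exactly the behavior a virtual method in the base class would avoid.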

They're definitely useful for a technology such as LINQ.  Past that, it scares me to think of what kinds of dangerous things can be done with them.

Guidelines for using Extension Methods:

  • Minimize the surface of their use (extend as small a class hierarchy as possible).
  • Use polymorphic, base class method implementations when possible.
  • Review the impact of using extension methods on your application design.

Object Initializers

All I can say about these is: yay!!  I've always liked the way that attributes are initialized.  I wish that initializers were a bit more syntactically close to attributes, but this will do.

The only bad thing I can say about them is that the stack gets twice as many variables allocated when you're using initializers: each object is built in a hidden temporary local and then copied to the variable you declared.  I noticed this with the LINQ code I posted up above:

[Screenshot: objinitializers]

I don't know why I couldn't just create a new value and do the property assignments.  It seems like doing something like this would thrash the stack and the processor cache when using a lot of local variables initialized this way.  I'm not saying to not use it - just be wary!
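
For what it's worth, the doubled locals match the expansion the language defines for object initializers, sketched below.  The InitializerDemo class and the name temp are mine; the real temporary has no name you can refer to.  One likely reason for the extra local is that the declared variable is only assigned once every property has been set, so it never refers to a half-initialized object.

```csharp
using System;

public class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class InitializerDemo
{
    // What we write.
    public static Customer WithInitializer()
    {
        Customer a = new Customer { Name = "Rob", Age = 23 };
        return a;
    }

    // Roughly what the compiler generates for the method above.
    public static Customer Expanded()
    {
        Customer temp = new Customer(); // stands in for the hidden temporary
        temp.Name = "Rob";
        temp.Age = 23;
        Customer a = temp;              // the declared local is assigned last
        return a;
    }

    public static void Main()
    {
        Customer first = WithInitializer();
        Customer second = Expanded();
        Console.WriteLine(first.Name == second.Name && first.Age == second.Age);
    }
}
```

Both methods produce an equivalent object; the only difference is whether the second local is visible in the source.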

Anonymous Types

I don't have a LOT to say about these - they're method-scoped, so there's a limit to the damage that can be done by a developer who's intent on breaking design.  But truly the only useful thing I can see with anonymous types is using LINQ-to-SQL, where SQL columns will be mapped back to variable fields.  I don't see much utility in the customer example shown above (except maybe shielding the original data from being changed), since it requires instantiating new objects that don't even have a type name.

Implicitly-Typed Local Variables

Just don't do it.  Stop being lazy.  "Var" buys you a couple keystrokes.  Seriously, I'll beat you up.

Simple Properties

Simple properties are kind of neat, until you realize that they defeat the entire purpose of having properties (with one notable exception which I will discuss later).  They allow the programmer to create public properties with get and set implementations but to leave out the backing store variable, which is compiler-generated.

However, because they leave out the implementation, they are essentially as unprotected as public fields.

 1: public class UsesSimpleProperties
 2: {
 3: public string Name { get; set; }
 4: public int Age { get; set; }
 5: }
 6:  
 7: public class UsesFields
 8: {
 9: public string Name;
 10: public int Age;
 11: }

As you can see from the example, the exact same protection is afforded to the fields and the properties: none.  Where we would normally use the concept of properties to ensure that appropriate values are going into our object's state, simple properties do not afford us this ability.  I can't say "Oh, the value of age CAN'T be less than 0!" 
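
For contrast, here is a minimal sketch of the kind of property that does protect the object's state.  The ValidatedPerson and ValidationDemo names and the exception message are assumptions made for the example.

```csharp
using System;

public class ValidatedPerson
{
    private int age; // explicit backing field, so the setter can guard it

    public string Name { get; set; }

    public int Age
    {
        get { return age; }
        set
        {
            if (value < 0)
            {
                throw new ArgumentOutOfRangeException("value", "Age cannot be less than 0.");
            }
            age = value;
        }
    }
}

public static class ValidationDemo
{
    // Hypothetical helper: reports whether the assignment was accepted.
    public static bool TrySetAge(ValidatedPerson person, int newAge)
    {
        try
        {
            person.Age = newAge;
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }

    public static void Main()
    {
        ValidatedPerson person = new ValidatedPerson { Name = "Rob", Age = 23 };
        Console.WriteLine(TrySetAge(person, 30)); // accepted
        Console.WriteLine(TrySetAge(person, -1)); // rejected; Age keeps its old value
        Console.WriteLine(person.Age);
    }
}
```

A simple property can't express the guard in the setter, because there is nowhere to put the check.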

The only advantage of simple properties over fields that I had in mind is XML serialization: the XML serializer handles public properties with public get and set methods.  In fairness, XmlSerializer also serializes public fields, so this advantage is narrower than it first appears.

Surprisingly, there wasn't a relevant or reliable timing difference between using simple properties and fields to get and set a value when not attached to the debugger, though I would guess that this is due to JIT inlining.

Still, simple properties pretty much defeat the purpose of having properties.  Don't be a fool; validate your property values!

Guidelines for using Simple Properties:

  • Great for generating quick XML-serializable types.
  • Don't use them if you need to validate the property value.
  • Validate your property values!

Summary

C# 3.0 has a lot of neat new features, each of which has appropriate situations in which it should be used.  As developers, we should all recognize the performance impacts these new technologies will have on our code.  Language additions such as LINQ help bridge the gap between languages, whereas other extensions are syntactic sugar of which maybe we should be a bit wary.  Remember: I'm not saying don't use these features (okay, except for implicitly-typed locals), just be sure you're using them in the right places and for the right reasons.

Posted on Sunday, November 4, 2007 5:37 PM


Comments on this post: My Fearmongering of C# 3.0

# re: My Fearmongering of C# 3.0
Really nice job :)
Left by Paul on Nov 05, 2007 5:39 PM


Copyright © Robert Paveza | Powered by: GeeksWithBlogs.net