Wednesday, January 9, 2008

Arguments on antidepressants...

I was running into a few problems with backslashes and quotes not playing consistently nice for snippet input, so I set out to find out why. While trying to figure out the best (most reliable) way to escape them for snippet input, I ran across a few interesting facts about command line arguments. The following is an excerpt from tenouk.com...

  • For information, Microsoft uses Microsoft C Runtime (CRT) for C codes; that is Microsoft C version (mix of standard C and Microsoft C).
  • Microsoft C startup code uses the following rules when interpreting arguments given on the operating system command line:
  1. Arguments are delimited by white space, which is either a space or a tab.
  2. A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument. Note that the caret (^) is not recognized as an escape character or delimiter.
  3. A double quotation mark preceded by a backslash, \", is interpreted as a literal double quotation mark (").
  4. Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
  5. If an even number of backslashes is followed by a double quotation mark, then one backslash (\) is placed in the argv array for every pair of backslashes (\\), and the double quotation mark (") is interpreted as a string delimiter.
  6. If an odd number of backslashes is followed by a double quotation mark, then one backslash (\) is placed in the argv array for every pair of backslashes (\\) and the double quotation mark is interpreted as an escape sequence by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
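
To see the six rules in action, here is a minimal sketch of a splitter that follows them and nothing else. The class name ArgvRulesSketch and the method Split are made up for illustration; this is not the CRT's actual code, and it ignores any special handling the real startup code may apply (for example to the program name itself).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ArgvRulesSketch
{
    // Hypothetical helper: splits a raw command line using only rules 1-6 above.
    public static List<string> Split(string commandLine)
    {
        List<string> args = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;  // true while inside a "..." section (rule 2)
        bool hasArg = false;    // true once the current argument has started
        int i = 0;
        while (i < commandLine.Length)
        {
            char c = commandLine[i];
            if (c == '\\')
            {
                // Count the whole run of backslashes, then look at what follows it.
                int run = 0;
                while (i < commandLine.Length && commandLine[i] == '\\') { run++; i++; }
                bool quoteFollows = i < commandLine.Length && commandLine[i] == '"';
                if (!quoteFollows)
                {
                    current.Append('\\', run);      // rule 4: literal backslashes
                }
                else
                {
                    current.Append('\\', run / 2);  // rules 5 and 6: one per pair
                    if (run % 2 == 1)
                    {
                        current.Append('"');        // rules 3 and 6: literal quote
                        i++;                        // consume the escaped quote
                    }
                    // Even run: the quote is left for the next pass as a delimiter.
                }
                hasArg = true;
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes;               // rule 2: toggle quoted mode
                hasArg = true;                      // "" still counts as an argument
                i++;
            }
            else if ((c == ' ' || c == '\t') && !inQuotes)
            {
                // Rule 1: unquoted white space ends the current argument.
                if (hasArg) { args.Add(current.ToString()); current.Length = 0; hasArg = false; }
                i++;
            }
            else
            {
                current.Append(c);
                hasArg = true;
                i++;
            }
        }
        if (hasArg) args.Add(current.ToString());
        return args;
    }

    public static void Main()
    {
        // Raw command line: a\\b "c d" e\"f g\\"h i"
        foreach (string arg in Split("a\\\\b \"c d\" e\\\"f g\\\\\"h i\""))
            Console.WriteLine("[" + arg + "]");
        // Prints:
        // [a\\b]
        // [c d]
        // [e"f]
        // [g\h i]
    }
}
```

Notice the last argument: the two backslashes in front of the quote collapse into one, and the quote itself quietly opens a quoted section. That is exactly the kind of thing that was twisting up snippet input.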


This is quite interesting. I knew there must be a better way, however. Fortunately, I was correct. I found the following frustration post on the MSDN forums shortly after. Here's the juicy stuff...

The static System.Environment.CommandLine will provide the original untampered cmdline - should be straightforward to use this.
There is also the System.Environment.GetCommandLineArgs() method, but this is used to create the args parameter passed to main and suffers the same problem. I mention it because, when you use it, element 0 of the array contains the path to the executable, which you can use to strip the exe name from the System.Environment.CommandLine



Awesome! What this gives us is a good clean way to get all of our input directly from the command line call to the snippet. The biggest problem is that I (currently) can do nothing to eliminate this problem for you. I will however be trying to come up with a better answer than that. Until then, it's the snippet developer's job to be sure that the input being given to the snippet is clean. Here is a pretty decent GetInput() method that should do the trick. Oh, the code below also allows the snippet to be "piped"-to, so it really is a good idea to consider using it as your input standard. The key is making sure it's implemented. ;)


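// Requires "using System;" at the top of the file.
private static String GetInput()
{
    // Prefer piped input. ReadToEnd() waits for end-of-input, so this
    // blocks on an interactive console until standard input is closed.
    System.IO.Stream stream = Console.OpenStandardInput();
    System.IO.StreamReader reader =
        new System.IO.StreamReader(stream);
    String input = reader.ReadToEnd();
    if (String.IsNullOrEmpty(input))
    {
        // Nothing was piped in, so fall back to the raw command line
        // and cut the executable name off the front of it.
        String application = Environment.GetCommandLineArgs()[0];
        String commandLine = Environment.CommandLine;
        // IndexOf (not LastIndexOf), so input that repeats the
        // executable name is not cut short.
        int index = commandLine.IndexOf(application);
        int start = index < 0 ? 0 : index + application.Length;
        input = commandLine
            .Substring(start)
            .TrimStart('"')   // closing quote of a quoted executable path
            .Trim()
            .Trim('"');
    }
    return input;
}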
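
For the other side of the fence (whoever is launching the snippet), the same rules can be run in reverse to quote a value before it goes on the command line. The following is only a sketch under a couple of assumptions: the class ArgumentQuotingSketch and its Quote method are hypothetical names, and the process is assumed to be started directly rather than through cmd.exe, so shell characters such as % and & are not handled here.

```csharp
using System;
using System.Text;

public static class ArgumentQuotingSketch
{
    // Hypothetical helper: wraps one value in double quotes so that the
    // Microsoft C startup rules hand it back unchanged as a single argument.
    public static string Quote(string value)
    {
        StringBuilder quoted = new StringBuilder("\"");
        int backslashes = 0;  // length of the backslash run seen so far
        foreach (char c in value)
        {
            if (c == '\\')
            {
                backslashes++;  // hold the run until we know what follows it
            }
            else if (c == '"')
            {
                // 2n+1 backslashes plus a quote parse back as n backslashes
                // followed by a literal quote.
                quoted.Append('\\', backslashes * 2 + 1);
                quoted.Append('"');
                backslashes = 0;
            }
            else
            {
                // Backslashes not followed by a quote are literal already.
                quoted.Append('\\', backslashes);
                quoted.Append(c);
                backslashes = 0;
            }
        }
        // Trailing backslashes sit right before the closing quote, so they
        // are doubled to keep that quote a delimiter.
        quoted.Append('\\', backslashes * 2);
        quoted.Append('"');
        return quoted.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Quote("say \"hi\""));  // prints "say \"hi\""
        Console.WriteLine(Quote("C:\\temp\\"));  // prints "C:\temp\\"
    }
}
```

The trailing backslash case is the one that bites most often: a path like C:\temp\ wrapped naively in quotes ends in \", which the startup code reads as an escaped quote rather than the end of the argument.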

2 comments:

Lance "ji" May said...

So we all know that "piping" is the best input, right? Good. Now while that little code block for the GetInput() method is really how a snippet should protect itself, I have made improvements to the client that will attempt to run snippets twice if need be. The first run will attempt to pipe the input. If that returns nothing, then the client will assume that the snippet didn't know there was input, and will attempt to run the snippet again via argument. Now I have tested the argument escaping as much as I can stand, and things seem to be flowing smoothly. If in some absolutely strange fluke incident it does twist up your input, then I apologize. This is, however, the best that I can come up with at the moment. The client improvement will be available soon in the next (pre)release.

Lance "ji" May said...

I made another adjustment to the GetInput() method, but it is already posted, so no need to bore you with the details. ;)