(How) C# You Are - 20 April 2008
This week's questions revolve around common string operations.
- You need to parse HTML files looking for images. For each image you need
to replace the value of the src attribute with a local path. How can you
do this? Answer
This is actually not a trivial problem. Simple string functions like
Replace or IndexOf won't work reliably because attribute order, whitespace
and casing vary from tag to tag. Technically, short of a full HTML parser,
there isn't a completely accurate mechanism that will work.
You can get relatively close, however.
The first thing you need to do is find all source attributes of the image tags.
You can use regular expressions for this.
<img[^>]*src="(?<srcValue>([^"]*))"[^>]*>
Thanks to SexyRegEx for this expression. Once you have the expression
you can use the Regex class to find all the matches. For each match you can
write out the tag with a modified source attribute. Since the expression
uses srcValue as the group name for the attribute value, you can use the
Groups property of the Match object to get the individual groups.
//Requires: using System.Text.RegularExpressions;
string value = "<html><table><img src=\"abc.gif\" /> <img src=\"def.gif\" /> </table></html>";
Regex re = new Regex("<img[^>]*src=\"(?<srcValue>([^\"]*))\"[^>]*>");
foreach (Match match in re.Matches(value))
{
    //The group name must match the name used in the expression
    Console.WriteLine("Source attribute = " + match.Groups["srcValue"].Value);
}
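The snippet above only prints each source value. To actually swap in a local path, one option is Regex.Replace with a MatchEvaluator. The following is a minimal sketch, not a definitive implementation: the class name ImageSourceRewriter, the helper ToLocalPath, the prefix and suffix groups, the localFolder parameter and the use of RegexOptions.IgnoreCase are all assumptions added for illustration. It has the same limitations as the expression above (double-quoted attributes only).

```csharp
using System;
using System.Text.RegularExpressions;

static class ImageSourceRewriter
{
    //The original expression with extra (hypothetical) groups that capture
    //the text before and after the src value so the tag can be rebuilt
    private static readonly Regex s_imageSource = new Regex(
        "(?<prefix><img[^>]*src=\")(?<srcValue>[^\"]*)(?<suffix>\"[^>]*>)",
        RegexOptions.IgnoreCase);

    //Hypothetical mapping: keep only the file name and place it under a local folder
    static string ToLocalPath ( string source, string localFolder )
    {
        int index = source.LastIndexOf('/');
        string fileName = (index >= 0) ? source.Substring(index + 1) : source;
        return localFolder + "/" + fileName;
    }

    public static string RewriteSources ( string html, string localFolder )
    {
        //The delegate runs once per matching img tag and returns its replacement
        return s_imageSource.Replace(html, delegate(Match match)
        {
            return match.Groups["prefix"].Value
                 + ToLocalPath(match.Groups["srcValue"].Value, localFolder)
                 + match.Groups["suffix"].Value;
        });
    }
}
```

Calling ImageSourceRewriter.RewriteSources(html, "images") leaves everything outside the src values untouched, so the other attributes on each tag survive the rewrite.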
- You are creating a logger class. You want to ensure all messages have a
linefeed at the end (and possibly in the middle). For consistency you want
to ensure that all linefeeds are simply \n rather than \r or
\r\n. How can you do this? Answer
The easiest approach is to use String.Replace to replace all occurrences
of one set of characters with another. You have to be careful though.
If you simply replace all occurrences of \r then you'll have two
linefeeds whenever \r\n occurs. Replacing \r\n first and then any remaining
\r avoids this. Parsing the string character by character would also resolve
the issue but such code is harder to write, debug and understand. I find
the simpler, less efficient approach of double replacement to be better.
Here is the sample code.
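public static string FixNewLines ( string value )
{
    //Order matters: collapse \r\n first so it doesn't become \n\n
    value = value.Replace("\r\n", "\n");
    //Any \r left over is a lone carriage return
    value = value.Replace("\r", "\n");
    return value;
}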
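The question also asks that every message end with a linefeed, which FixNewLines does not do by itself. Here is a small sketch that combines both steps. The class name LogText and the method name NormalizeMessage are hypothetical, and treating a null message as an empty one is an assumption rather than something the question specifies.

```csharp
using System;

static class LogText
{
    public static string NormalizeMessage ( string message )
    {
        //Assumption: a null message is logged as an empty line
        if (message == null)
            message = "";

        //Same double replacement as above: \r\n first, then any lone \r
        message = message.Replace("\r\n", "\n").Replace("\r", "\n");

        //Append the trailing linefeed only if it is missing
        if (!message.EndsWith("\n", StringComparison.Ordinal))
            message += "\n";

        return message;
    }
}
```

Normalizing before checking the ending matters here: a message that ends in \r is converted first, so it doesn't pick up a second linefeed.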
- You are writing a reader for INI files. How can you write a function to
parse each line of the INI file (excluding section headers) and return the
key and value (if any)? Answer
This is not too hard. You simply need to use the String.Split method
in combination with some filtering for comments. Here is a simple function
to get you started.
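//Requires: using System.Collections.Generic;
static KeyValuePair<string, string> ParseIniLine ( string line )
{
    //Trim the line
    line = (line != null) ? line.Trim() : "";

    //Remove any comment (everything from the first ; onward)
    int index = line.IndexOf(';');
    if (index >= 0)
        line = line.Substring(0, index);

    //Split based upon the first = sign
    string[] tokens = line.Split(new char[] { '=' }, 2);

    //KeyValuePair<TKey, TValue> is read-only so build it from the tokens.
    //Split always returns at least one token; the value is null if there is no =
    string key = tokens[0].Trim();
    string value = (tokens.Length > 1) ? tokens[1].Trim() : null;
    return new KeyValuePair<string, string>(key, value);
}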
The above code doesn't handle the case of a semicolon in the middle of a string
value, since everything after the first semicolon is treated as a comment,
but otherwise it should work.
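If you do need semicolons inside values, one possible approach is to ignore any semicolon that appears between double quotes. The sketch below assumes string values are wrapped in double quotes with no escaped quotes inside them, which not every INI dialect guarantees. The class name IniLineParser and the helper FindCommentStart are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

static class IniLineParser
{
    //Returns the index of the first ; that is outside double quotes, or -1 if none
    static int FindCommentStart ( string line )
    {
        bool inQuotes = false;
        for (int i = 0; i < line.Length; ++i)
        {
            if (line[i] == '"')
                inQuotes = !inQuotes;
            else if (line[i] == ';' && !inQuotes)
                return i;
        }
        return -1;
    }

    public static KeyValuePair<string, string> ParseIniLine ( string line )
    {
        line = (line != null) ? line : "";

        //Strip the comment, skipping semicolons inside quoted text
        int index = FindCommentStart(line);
        if (index >= 0)
            line = line.Substring(0, index);

        //Split on the first = sign only so the value may contain = characters
        string[] tokens = line.Split(new char[] { '=' }, 2);
        string key = tokens[0].Trim();
        string value = (tokens.Length > 1) ? tokens[1].Trim() : null;
        return new KeyValuePair<string, string>(key, value);
    }
}
```

The quotes are left in the returned value; whether to strip them is a separate decision for the caller.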