My name
is
Jon Skeet

Parsing a feed in C#

I am having problems parsing a feed in C#.

I cannot get the authors of the feeds to change the code so I have to handle it.

I have tried passing the feed URL straight into the XmlDocument object. I have also tried fetching it as text with WebClient, trimming it to remove the whitespace that for some reason appears in front of it, and then loading it with the LoadXml method.

You can see an example of the feed here > http://scotjobsnet.co.uk.ni.strategiesuk.net/testfeed.xml

I cannot get past loading the document, either from the URL:

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(feedURL);

Or from a string:

XmlDocument xmlDoc = new XmlDocument();
string feedAsString = "";
// get from web as string
var webClient = new WebClient();

// Tell them who we are for white listing
webClient.Headers.Add("user-agent", "Mozilla/5.0 (compatible; Job Feed Importer;)");

// fetch feed as string
var content = webClient.OpenRead(feedURL);
var contentReader = new StreamReader(content);
var rssFeedAsString = contentReader.ReadToEnd();
rssFeedAsString = rssFeedAsString.Trim(); // remove any whitespace before the feed
xmlDoc.LoadXml(feedAsString);

The errors I get are:

Root element is missing.
Could not extract first items from feed string; Error The element with name 'jobs' and namespace '' is not an allowed feed format.

I want to use the XPath /jobs/job/ to loop through the feed nodes.

I have parsed feeds like this before with XmlDocument, passing in just a URL or, failing that, a string.

I am thinking of resorting to using regular expressions to loop through the feeds using a <job>[\s\S]+></job> type expression.

However I would rather use standard methods.

As I cannot get the feeds changed, can anyone tell me what is wrong with the feed and the way I am parsing it? Forgive the use of var; I just nicked a snippet of code to parse a feed from an example that was using it. I am using strong types everywhere else and will convert it once I get it working.

Any help would be much appreciated.

Thanks

EDIT: The reason your current code is failing is pretty simple - you're trying to parse an empty string:

string feedAsString = "";
...
var rssFeedAsString = contentReader.ReadToEnd();
rssFeedAsString = rssFeedAsString.Trim();
xmlDoc.LoadXml(feedAsString);

You're never setting feedAsString to a new value - but you're fetching the text as rssFeedAsString. Those are two different variables.
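If you did want to stay with XmlDocument, a minimal sketch of the fix might look like the following. It keeps the text in a single variable and hands that same variable to LoadXml. The inline sample XML, the class name and the CountJobs helper are assumptions made up for illustration (all that is known about the real feed here is a jobs root with job children), and the XPath is written without the trailing slash, since as far as I know /jobs/job/ isn't a valid expression.

```csharp
using System;
using System.Xml;

class XmlDocumentExample
{
    // Hypothetical stand-in for the text downloaded from the feed URL.
    const string SampleFeed =
        "  <jobs><job><title>Developer</title></job><job><title>Tester</title></job></jobs>  ";

    // Hypothetical helper: loads the XML text and counts the nodes matching /jobs/job.
    public static int CountJobs(string feedAsString)
    {
        var xmlDoc = new XmlDocument();
        // Pass the SAME variable that holds the feed text, not a separate empty one.
        xmlDoc.LoadXml(feedAsString.Trim());
        return xmlDoc.SelectNodes("/jobs/job").Count;
    }

    static void Main()
    {
        string feedAsString = SampleFeed.Trim();

        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(feedAsString);

        // Loop over each job node selected by the XPath expression.
        foreach (XmlNode job in xmlDoc.SelectNodes("/jobs/job"))
        {
            Console.WriteLine(job.OuterXml);
        }

        Console.WriteLine(CountJobs(SampleFeed));
    }
}
```

In the real code, feedAsString would be assigned from contentReader.ReadToEnd() instead of the sample constant.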

That said, I'd use a different approach entirely. I don't see any need for trimming etc - or using XPath, or passing it through an RSS reader (given that it's not RSS). The only tricky part is explicitly specifying a User-Agent header, as otherwise the server rejects the request.

Personally I'd use LINQ to XML, which seems to be fine:

using System;
using System.Net;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        string text;
        using (var webClient = new WebClient())
        {
            string url = "http://scotjobsnet.co.uk.ni.strategiesuk.net/testfeed.xml";
            webClient.Headers.Add("user-agent", "Mozilla/5.0");
            text = webClient.DownloadString(url);
        }
        var doc = XDocument.Parse(text);
        foreach (var job in doc.Root.Elements("job"))
        {
            Console.WriteLine(job);
        }
    }
}
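To go from printing each job to reading individual values, something along these lines may help. It is only a sketch: the title element name, the sample XML and the ReadTitles helper are assumptions for illustration, so substitute whatever child elements the real feed uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class JobFieldsExample
{
    // Hypothetical helper: returns the text of each job's <title> child.
    // "title" is an assumed element name, not taken from the real feed.
    public static List<string> ReadTitles(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
                  .Elements("job")
                  // Casting a missing (null) XElement to string yields null,
                  // so a job without a title falls back to a placeholder.
                  .Select(job => (string)job.Element("title") ?? "(no title)")
                  .ToList();
    }

    static void Main()
    {
        // Hypothetical sample; the second job deliberately has no title.
        string sample =
            "<jobs><job><title>Developer</title></job><job><location>Leith</location></job></jobs>";

        foreach (string title in ReadTitles(sample))
        {
            Console.WriteLine(title);
        }
    }
}
```

The explicit cast is used rather than the Value property so that a missing element does not throw a NullReferenceException.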

See more on this question at Stack Overflow