I am working on a project where I need to take a website URL (www.google.com, for example) and save its HTML to a text file to be parsed separately, but I don't know how to do so.
I know there is an easier way to do this than the way I'm doing it, but this project is aimed at practical use and at learning.
Downloading just a single URL to a file is dead easy using WebClient:
    // WebClient lives in System.Net; url needs a scheme, e.g. "http://www.google.com".
    using (var client = new WebClient())
    {
        client.DownloadFile(url, filename);
    }
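The trickier bit is that very few web pages really consist of a single piece of HTML - most then load JavaScript, or load more data with JavaScript, etc. The file you download this way contains only the initial HTML the server returns, not anything the page's scripts add later.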
In .NET 4.5 and later you might want to use HttpClient instead of WebClient - although it's asynchronous and (as far as I can see) doesn't provide anything quite as convenient as DownloadFile when that's all you want to do.
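For what it's worth, here is a rough sketch of how the HttpClient route might look. It is not an official equivalent of DownloadFile, just one way to get the same effect by streaming the response body into a file. The names PageDownloader, ToAbsoluteUri and DownloadToFileAsync are made up for this example, and treating a bare host like www.google.com as plain HTTP is an assumption on my part.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class, not part of the framework.
public static class PageDownloader
{
    // A single shared HttpClient is reused for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Assumption: a URL without a scheme (e.g. "www.google.com") means plain HTTP.
    public static Uri ToAbsoluteUri(string url)
    {
        if (!url.Contains("://"))
        {
            url = "http://" + url;
        }
        return new Uri(url);
    }

    // Streams the response body straight into the file.
    public static async Task DownloadToFileAsync(string url, string filename)
    {
        Uri uri = ToAbsoluteUri(url);
        using (var response = await Client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            using (var source = await response.Content.ReadAsStreamAsync())
            using (var target = File.Create(filename))
            {
                await source.CopyToAsync(target);
            }
        }
    }
}
```

From an async method you would call it as: await PageDownloader.DownloadToFileAsync("www.google.com", "google.html"); As with WebClient, the file only holds the initial HTML response.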
See more on this question at Stack Overflow