Sunday, September 13, 2009

Reading Meta Tags of Any Page Programmatically Without Loading It in a Browser

In this article I will show you how to read meta tags programmatically using C# and ASP.NET. Most samples available on the internet read and write the meta tags of the page they run in. Here we take a different approach: we download the contents of an arbitrary page and read the meta tags from it.

First things first: we need to download the content of the page without loading it into a browser. For this we will use the WebRequest class. The code below creates a request to "http://www.microsoft.com/en/us/default.aspx" using the default credentials.

// Create a request for the URL.
WebRequest request = WebRequest.Create("http://www.microsoft.com/en/us/default.aspx");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;

Now we are ready to get the response from the server. To receive it, we cast the result of GetResponse to the HttpWebResponse class:

// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

Once we have the response, we want to load it into an HTML DOM, the object model used to represent HTML documents. In the next few lines we read the response into a string and use that string to populate an IHTMLDocument2 object.

// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();

//reads the html into an html document to enable parsing
IHTMLDocument2 doc = new HTMLDocumentClass();
doc.write(new object[] { responseFromServer });
doc.close();
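
The read step above leaves the reader and the stream open. As a minimal sketch, the same step can be wrapped in a using block so both are disposed even if reading fails. The ResponseReader class and its ReadAll method are hypothetical names introduced here for illustration, and passing the encoding explicitly is an assumption (the snippet above relies on the StreamReader default).

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original sample.
public static class ResponseReader
{
    // Reads the whole stream as text. Disposing the reader also closes the stream.
    public static string ReadAll(Stream dataStream, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(dataStream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the response obtained earlier, this could be called as ResponseReader.ReadAll(response.GetResponseStream(), Encoding.UTF8), followed by response.Close().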

Now that the entire page is loaded in memory as an HTML document, we iterate over its elements and retrieve the meta tags.


// Loop through every element in the document and print the content of each META tag.
foreach (IHTMLElement el in (IHTMLElementCollection)doc.all)
{
    if (el.tagName == "META")
    {
        HTMLMetaElement meta = (HTMLMetaElement)el;
        // HTML-encode the value, since it comes from a remote page.
        Response.Write("Content " + Server.HtmlEncode(meta.content) + "<br />");
    }
}
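
The loop above prints only the content value of each tag. The sketch below shows one way to collect name/content pairs instead. To stay self-contained it swaps in regular expressions for mshtml, so it is not the approach used in this article, and regex matching of HTML is fragile (it assumes, for example, that no attribute value contains a '>' character). The MetaTagReader class and its Read method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; it does not use mshtml.
public static class MetaTagReader
{
    // Matches a whole <meta ...> tag (assumes no '>' inside attribute values).
    private static readonly Regex MetaTag =
        new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches one attribute: name="value", name='value', or name=value.
    private static readonly Regex Attribute =
        new Regex(@"([\w-]+)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))");

    // Returns name -> content for every meta tag that has both attributes.
    public static Dictionary<string, string> Read(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match tag in MetaTag.Matches(html))
        {
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in Attribute.Matches(tag.Value))
            {
                // Only one of groups 2-4 matches; the others are empty strings.
                attrs[a.Groups[1].Value] =
                    a.Groups[2].Value + a.Groups[3].Value + a.Groups[4].Value;
            }

            string name, content;
            if (attrs.TryGetValue("name", out name) &&
                attrs.TryGetValue("content", out content))
            {
                result[name] = content; // a later duplicate name overwrites an earlier one
            }
        }
        return result;
    }
}
```

Calling MetaTagReader.Read(responseFromServer) would then give a dictionary that can be queried by tag name, such as "description" or "keywords", if the page defines them.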

Of course, you can do a lot more with the above code, but we will take that up in other articles. For your reference, the complete code is pasted below. For the sample to work, add a reference to mshtml as follows.

Steps:

1.) In Solution Explorer, highlight the project to which you want to add the parsing functionality.
2.) In the menu, click Project -> Add Reference.
3.) In the dialog box that is shown, under the .NET tab, choose the Microsoft.mshtml assembly.
4.) Click the Select button, then click the OK button.

Now we can reference this assembly.

Don't forget to add the namespaces the sample uses:

using System;
using System.IO;
using System.Net;
using mshtml;

Response.Write("Button2_Click");

// Create a request for the URL.
WebRequest request = WebRequest.Create("http://www.microsoft.com/en/us/default.aspx");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Display the status.
Console.WriteLine(response.StatusDescription);
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
// Display the content.
Console.WriteLine(responseFromServer);
// Cleanup the streams and the response.
reader.Close();
dataStream.Close();
response.Close();

//reads the html into an html document to enable parsing
IHTMLDocument2 doc = new HTMLDocumentClass();
doc.write(new object[] { responseFromServer });
doc.close();

// Loop through every element in the document and print the content of each META tag.
foreach (IHTMLElement el in (IHTMLElementCollection)doc.all)
{
    if (el.tagName == "META")
    {
        HTMLMetaElement meta = (HTMLMetaElement)el;
        // HTML-encode the value, since it comes from a remote page.
        Response.Write("Content " + Server.HtmlEncode(meta.content) + "<br />");
    }
}
