Hi all,

I'm looking to extract all HTML tags from a dump of HTML data and put them all in a listbox.

I currently have the following code.

It shows me things like HTML, HEAD, TITLE and BODY.

But I want things like the IMG tags and their ALT attributes.

    ' Obtain the document interface
    Dim htmlDocument As mshtml.IHTMLDocument2 = DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
    ' Construct the document
    ' Extract all elements
    Dim allElements As mshtml.IHTMLElementCollection = htmlDocument.all
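    ' Iterate all the elements and display tag names
    For Each element As mshtml.IHTMLElement In allElements
        ListBox1.Items.Add(element.tagName) ' assumed loop body; ListBox1 is an assumed control name
    Next
    ' Extract all image elements
    Dim imgElements As mshtml.IHTMLElementCollection = htmlDocument.images
    ' Iterate through each image element
    For Each img As mshtml.IHTMLImgElement In imgElements
        ListBox1.Items.Add(img.src & " | " & img.alt) ' assumed loop body
    Next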
End Sub

If you don't absolutely have to use the mshtml interface you could try this:

Imports System.IO
Public Class Form1

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'Set the webbrowser control visible property to false if you don't need it for anything else.
        WebBrowser1.Url = New Uri("C:\Test1.htm")
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim htmlDocument As HtmlDocument = WebBrowser1.Document
        ' Iterate all the elements and display tag names
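        For Each element As HtmlElement In htmlDocument.All
            If element.TagName.ToUpper = "IMG" Then
                ' Assumed body: late-bound read of the image's src (needs Option Strict Off)
                ListBox1.Items.Add(element.DomElement.src)
            End If
        Next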
    End Sub
End Class


Thanks so much for your code. It's working very well. I have one more problem to ask about, sorry.

I can open a new thread if you would like me to.

The code uses IMG as the tag name. If I wanted the tag to be TITLE instead, for example:


    Dim htmlDocument As HtmlDocument = WebBrowser1.Document
    ' Iterate all the elements and display tag names
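    For Each element As HtmlElement In htmlDocument.All
        If element.TagName.ToUpper = "TITLE" Then
            ' Assumed body: reading src from a TITLE element is what raises the error quoted below
            ListBox1.Items.Add(element.DomElement.src)
        End If
    Next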
End Sub

I get the following error...

Public member 'src' on type 'HTMLTitleElementClass' not found.

Thanks so much!!

Sorry, after thinking a little more, I should be clearer about what I want to do.

If the TITLE element has nothing in it (""), then I want that printed in listbox2.

If it has something in it, I want that printed in listbox2 instead. For example, if the TITLE element contains " Welcome to Amazon ", I want that text in listbox2.


Try recasting the document to an IHTMLDocument3 and use getElementsByTagName on the new cast object.

I'll be honest, I don't know how to do that. Do you have any sample code?


I have tested this by casting the webbrowser.Document.DomDocument, so hopefully it will work for you.

You used: Dim htmlDocument As mshtml.IHTMLDocument2 = DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)

Recast it as mshtml.IHTMLDocument3:

   Dim doc3 As mshtml.IHTMLDocument3 = DirectCast(htmlDocument, mshtml.IHTMLDocument3)
   For Each img As mshtml.IHTMLImgElement In doc3.getElementsByTagName("img")
       ListBox1.Items.Add(img.src) ' assumed loop body; ListBox1 is an assumed control name
   Next

src is the file path for images, which is why my original code works for IMG but not for TITLE. To get the inner text of the TITLE element, use:

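        If element.TagName.ToUpper = "TITLE" Then
            ' Assumed body: InnerText can be Nothing, so fall back to ""
            ListBox2.Items.Add(If(element.InnerText, ""))
        End If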
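
If all you need is the title, you may not have to walk every element. Here is a minimal sketch that reads it through the Title property of the WebBrowser control's HtmlDocument instead. The names TitleHelper and AddTitleToListBox are made up for illustration, and it assumes you are using the WinForms WebBrowser approach from earlier in the thread rather than the raw mshtml document.

```vbnet
Imports System.Windows.Forms

Module TitleHelper
    ' Hypothetical helper (not from the thread): adds the page title to a
    ' list box, using "" when no title text is available.
    Public Sub AddTitleToListBox(ByVal doc As HtmlDocument, ByVal target As ListBox)
        ' WebBrowser.Document is Nothing until a page has been loaded
        If doc Is Nothing Then Return

        Dim title As String = doc.Title
        ' Guard against Nothing so that Items.Add never receives a null item
        If title Is Nothing Then title = ""

        target.Items.Add(title)
    End Sub
End Module
```

From the button handler you would call it as AddTitleToListBox(WebBrowser1.Document, ListBox2). An empty title then shows up as a blank row in the list box, which seems to be what was asked for.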