Hello there guys! :)
It's been quite a while since I was last here. I am currently working on topics different from my usual ones, and I have found myself baffled by a problem. So, here goes:
I am working on a web application which collects user fiscal data. Quite sensitive stuff. I need to be able to insert the data, available through a web portal, into a local database. The only way I have to access said data is through the official web site of the organization hosting it, therefore the user must authenticate his access. So, the whole thing boils down to parsing the data from the HTML generated by the portal.
I have already developed a parser, and it works nicely if the HTML is provided as a normal, local file (i.e. the user goes to the site, logs in, sees the data he wants, saves it as a web page in HTML format, and then feeds it into the application). I can even read the file online (the component is written in Java, so it's just a matter of changing the stream from a local file to a URL). But, as you can see, the page is not static, and generating it requires authentication. Furthermore, I cannot have the credentials of the user.
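To show what I mean by "changing the stream", here is a minimal sketch of that part. The class and method names (`PageSource`, `open`, `readHtml`) are made up for this post and are not my real code, and I am assuming UTF-8 pages. The URL branch only works for pages that need no login, which is exactly my problem.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PageSource {

    /**
     * Opens the HTML source as a stream. Anything starting with http:// or
     * https:// is treated as a URL, everything else as a local file path.
     * Hypothetical helper: the URL branch sends a plain, unauthenticated GET.
     */
    public static InputStream open(String location) throws IOException {
        if (location.startsWith("http://") || location.startsWith("https://")) {
            return URI.create(location).toURL().openStream();
        }
        return Files.newInputStream(Paths.get(location));
    }

    /** Reads the whole page into a String (UTF-8 is an assumption). */
    public static String readHtml(String location) throws IOException {
        try (InputStream in = open(location)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The parser only ever sees the returned stream or string, so it does not care where the HTML came from.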
What I wanted to do is create a more user-friendly solution than guiding the user to save the page and feed it into the application. I tried creating a simple page with an iframe which would load the external page, allow the user to navigate to his data, and save the HTML of the page at the press of a button. And then I came across cross-domain policies, which I had no idea about: my page is not allowed to read the content of a frame loaded from another domain.
Since I am no good with JavaScript, I thought of using another Java component. Using Swing/SWT components, I created a web-browser window, which gives me complete control over the data, so the idea worked: when the user is seeing the data he wants, a simple click is enough to save it, pass it as an argument, even pre-parse it, as you can imagine. But my application runs on Tomcat, so everything is OK only when working locally. When I try to access the Java component from another machine, the window never appears on the client; it is displayed on the server side instead.
So, another dead end. Today I ran across another idea, suggesting the use of a PHP proxy script. That is, I create a simple PHP script on my server, and instead of querying the foreign site, I query my script. My script queries the site, gets the data, and then passes it back to me. I could get this working, and I actually think it's quite a robust solution, but here is the thing: how can my script fetch the data when it has no authenticated access to it? Is the only option to request the credentials of the user and log into the system by sending a GET request with the appropriate parameters?
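To make the proxy idea concrete, this is roughly what I understand it to be. It is a minimal sketch written in Java instead of PHP (since the rest of my component is Java), using the small HTTP server that ships with the JDK. The class name `ProxySketch`, the `/proxy` path and the `targetUrl` parameter are all made up for illustration. Note that `fetch` sends a plain GET with no session or credentials, so against the real portal it would only get the login page back, which is the heart of my question.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

public class ProxySketch {

    /** Fetches the raw bytes of targetUrl with a plain, unauthenticated GET. */
    public static byte[] fetch(String targetUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(targetUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Starts a server on localhost whose /proxy path relays the body of
     * targetUrl. Port 0 lets the system pick a free port.
     */
    public static HttpServer start(int port, String targetUrl) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/proxy", exchange -> {
            byte[] body;
            try {
                body = fetch(targetUrl);
            } catch (IOException e) {
                // The foreign site could not be read: report a gateway error.
                exchange.sendResponseHeaders(502, -1);
                exchange.close();
                return;
            }
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

With this running, my application would request `/proxy` on my own server instead of the foreign site, which sidesteps the cross-domain restriction but not the authentication.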
Perhaps I am really confused, but it's been more than a week now, and I really am at a loss. Any suggestion is welcome, and I thank you in advance! :)
Cheers!