Hello :) So I'm doing a project based on semantic analysis of Java code whereby my program (hopefully a plug-in) will prompt the user with solutions to his/her compiler errors with the option to auto-fix. What sets my project apart from a program like Eclipse is that it's targeted for first year students and so errors like:

for (int k = 0; k < 10; k++);

will be flagged as a possible semantic error. Of course there are quite a few other errors that I check for, and for those errors that I have no prescribed solution for, I hope to use a neural network to 'learn' how they are fixed.

I was wondering, can anyone point me in the direction of reference material, or has anyone dealt with anything like this in the past? Any information would be greatly appreciated!

This is what I have in mind so far:

A compiler tool designed to identify and possibly correct logic errors in source code. It is intended to aid first-year Java programmers. The following logic errors shall be preprogrammed into the code:

  • Semicolons at the end of looping and decision constructs, e.g. if(a > 1);
  • Variable declaration and identifier naming rules (including the class name matching the file name, as well as variable scope)
  • Ill-structured Boolean expressions (using pseudo-code instead of legal Java code and using = instead of ==)
  • Misspelling of method/variable names (hopefully will also lead to the correct class being used; similar to a spell check but using Java as the target language)
  • Enforcement of Java conventions e.g. capital letters for constant names
  • Differences between int and double mathematical operators
  • Mathematical analysis of user formulas (to alert programmer of possible outcome). For example pi*r would be flagged as possibly incorrect with pi*r^2 as a possible correction.
  • Structural Errors (placement of the import keyword; method body placement)
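
As an illustration of the first check in the list (a semicolon directly after a loop or decision header), here is a minimal sketch. It assumes one statement per line and uses a hand-rolled parenthesis matcher rather than a real parser; the class name EmptyBodyCheck and the method hasEmptyBody are made up for this example and are not part of any existing tool.

```java
public class EmptyBodyCheck {

    // Returns true if the line is a for/while/if header whose closing
    // parenthesis is followed directly by ';' (i.e. an empty body).
    // Assumption: one statement per line; parentheses inside string
    // literals and the tail of a do-while loop are not handled.
    public static boolean hasEmptyBody(String line) {
        String s = line.trim();
        String keyword = null;
        for (String k : new String[] {"for", "while", "if"}) {
            if (s.startsWith(k)) {
                keyword = k;
                break;
            }
        }
        if (keyword == null) {
            return false;
        }
        int i = keyword.length();
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        // Rejects identifiers such as "format(...)" or "iffy(...)".
        if (i >= s.length() || s.charAt(i) != '(') {
            return false;
        }
        // Walk to the parenthesis that closes the header.
        int depth = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                i++;
                break;
            }
        }
        if (depth != 0) {
            return false; // header continues on another line
        }
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i < s.length() && s.charAt(i) == ';';
    }

    public static void main(String[] args) {
        System.out.println(hasEmptyBody("for (int k = 0; k < 10; k++);")); // true
        System.out.println(hasEmptyBody("if(a > 1);"));                    // true
        System.out.println(hasEmptyBody("if (a > 1) foo(b);"));            // false
    }
}
```

A real implementation would more likely look for an empty statement as the body node in the parse tree, which avoids the line-based limitations noted in the comments.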

The tool will provide a detailed description of why these errors have occurred and how they are fixed. It is thus in essence a learning tool. Explanations will not be provided for learnt rules.

It will not be restricted to pre-programmed checks, but will have the ability to learn how errors are corrected by the programmer, so that these suggestions are included in future analyses. This is useful if a tutor corrects the student’s error. This process could be implemented in a number of ways, but a neural network shall be used whereby the compiler error (network input) is mapped to a change in source code (network output). As this process is dynamic, the network will somehow have to determine the best solution for the error. Of course the network will be pre-trained with the errors mentioned above.

We shall attempt to allow the neural network used to map between the input and output space to be selected, so that a new ‘personality’ of sorts can be chosen to meet the needs of the programmer. This depends on the performance of the tested network architectures – should there be only one solution, user selection will not be applicable.

So I know that I have to work on my implementation of a Neural Network and I have, but for now I need to dig up information about it. I know that it's been done in Fortran, just can't find any information on it -sigh-

So any thoughts?

Does not look trivial. Contact a freelancer

A freelancer? I'm pretty sure I can handle it, just wouldn't mind some ideas. Things that I could consider, stuff like that. My post is actually a bit wrong. I don't want help with the neural network part (though any suggestions are very welcome!). I have to write a formal proposal for this project in which we need to consider previous implementations. I haven't yet found any. I know it's been done in FORTRAN (it as in an auto-correct feature), I just don't know what it's called. So if people already know stuff about it, that would be super useful! It's not that I don't know how to use regular expressions and grammars, rather I'd like to discuss the concept of my project and perhaps improve it :)

If it was C++, the assignment would have been easy since you have compiler tools like lex and yacc. I do not know whether you have similar tools for Java.

Oh most definitely I plan to use Lexical and Parser Generation tools. JFlex and JavaCC are my tools of choice. That's the easy part... and it helps that the grammar for Java is already available.


Of the neural networks, I'd go for the self organising ones.

Well, Netbeans has an "auto-correct" suggestion feature that you may be able to glean some ideas from if you want to study the source: http://www.netbeans.info/downloads/dev.php

Joone - Java Object Oriented Neural Engine http://www.jooneworld.com/ might be useful in your neural network development.

While I can see the value in pointing some of these things out to a beginner, you mention a few rules that may not be errors necessarily:

PoovenM wrote:

for (int k = 0; k < 10; k++);

will be flagged as a possible semantic error.

This for instance may be intentional depending upon the conditions within the loop itself, i.e. index advancement.

PoovenM wrote:

  • Differences between int and double mathematical operators
  • Mathematical analysis of user formulas (to alert programmer of possible outcome). For example pi*r would be flagged as possibly incorrect with pi*r^2 as a possible correction.

Deciphering the mixing of int and double operations may prove pretty tricky, though I guess you could make suggestions about casting etc. if the compiler is complaining. The formula analysis though will probably be nigh impossible since, from your example of pi*r, pi*d is valid as the circumference, and r or d have no intrinsic meaning within a program.
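
To make the int/double point concrete, here is a small sketch of the kind of silent truncation a beginner tool might want to warn about. The class and method names are invented for illustration only; the compiler accepts all of these lines without complaint, which is what makes the mistake hard for a first-year student to spot.

```java
public class IntVsDouble {

    // Both operands of '/' are int, so the fractional part is discarded.
    public static int intAverage(int a, int b) {
        return (a + b) / 2;
    }

    // Dividing by the double literal 2.0 promotes the sum to double first.
    public static double doubleAverage(int a, int b) {
        return (a + b) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(intAverage(3, 4));    // prints 3
        System.out.println(doubleAverage(3, 4)); // prints 3.5
        System.out.println(1 / 2);               // prints 0
        System.out.println((double) 1 / 2);      // prints 0.5 (cast applies before the division)
    }
}
```

A check along these lines could flag a division where both operands are int but the result is stored in a double, and suggest a cast or a double literal.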

Anyway, those considerations aside (and obviously they are just opinions :) ), good luck with the project!

Hey :) Gosh I didn't realize people replied (though I do have email notification on). It woulda been nice if I read this before I handed in my proposal! lol

Self-Organizing maps are rather fascinating... though they are mainly used to map from one dimension to another... especially useful if the input data is in the form of a vector whose dimension you wish to reduce, perhaps for visual inspection. I wasn’t going to consider using them (cause they seem rather difficult to implement – ah but nothing is as it seems!) but when I’m testing architectures out, I will give this one a go. I plan to use NeuroSolutions to carry out my testing phase, but I’ll use Viscovery for SOMs. I’m not sure how I would represent my input though hmm…
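
On the question of input representation, one simple possibility (purely a sketch, not something from this thread's tools) is to turn the compiler's error message into a fixed-length vector with one slot per keyword. The vocabulary below is a hypothetical hand-picked list based on common javac messages, and ErrorEncoder is a made-up class name; a real system would need a much larger vocabulary and probably features from the surrounding source code too.

```java
import java.util.Arrays;
import java.util.List;

public class ErrorEncoder {

    // Hypothetical vocabulary: words that appear in common javac messages.
    private static final List<String> VOCAB = Arrays.asList(
            "expected", "cannot", "find", "symbol",
            "incompatible", "types", "missing", "return");

    // Encodes a message as a 0/1 vector: slot i is 1.0 if VOCAB word i
    // occurs anywhere in the message, otherwise 0.0.
    public static double[] encode(String message) {
        double[] vector = new double[VOCAB.size()];
        for (String token : message.toLowerCase().split("[^a-z]+")) {
            int index = VOCAB.indexOf(token);
            if (index >= 0) {
                vector[index] = 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // prints [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode("error: cannot find symbol")));
    }
}
```

The resulting vector has the same length for every message, which is the shape a feed-forward network or a SOM expects as input.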

With regards to Mathematical formula analysis… I agree it does seem somewhat impossible, but it can be achieved in a rather clumsy way: if you consider Math.PI * d; the variable ‘d’ would be tokenized and classified as a variable name during syntactic analysis, so if I see a Math.PI and it’s being multiplied by a variable then I could just flag that as a possible mistake.
I really need to think about that more! So I’ve spoken to one of my Professors and in due course shall sit down and discuss the possibilities with him.
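
The clumsy Math.PI heuristic described above might be sketched roughly as follows. This uses a regular expression over the raw source instead of the token stream a JFlex/JavaCC front end would provide, and the names PiProductCheck and flaggedVariables are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PiProductCheck {

    // Matches "Math.PI * <identifier>" and captures the identifier.
    private static final Pattern PI_TIMES_VAR =
            Pattern.compile("Math\\.PI\\s*\\*\\s*([A-Za-z_$][A-Za-z0-9_$]*)");

    // Returns the names that Math.PI is multiplied by, in order of
    // appearance. Every hit is only a *possible* mistake.
    public static List<String> flaggedVariables(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = PI_TIMES_VAR.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(flaggedVariables("double c = Math.PI * d;"));     // prints [d]
        System.out.println(flaggedVariables("double a = Math.PI * r * r;")); // prints [r]
        System.out.println(flaggedVariables("double x = Math.PI;"));         // prints []
    }
}
```

As the second call shows, the heuristic also flags a correct area formula, and it cannot tell a circumference (pi*d) from a mistaken area (pi*r), so the objection raised earlier in the thread still stands: the names r and d carry no meaning the tool can rely on.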

Thank you for the links Ezzaral, I also discovered that Eclipse has code analyzer plugins and a very nifty program called JTest. Joone seems very promising and I will definitely give it a visit! Oh and yeah I think I’ll need the luck! So thank you.

Thank you guys for all of your help!
