I have to implement a system that analyses the image data captured by
a video camera mounted in the window of a shop.
(The images captured are in a grey-scale format)

I need to perform basic activity recognition, plus person tracking and recognition.

The activity recognition is aimed to determine the level of interest
in the items displayed in the shop window. There are two types of
activities that need to be recognized: (1) person walking past the
shop and (2) person looking at the shop window. By counting the number
of persons that stop and look for some time at the window, the level
of interest in the shop can then be derived.
The person tracking and recognition is aimed at determining how many
different persons pass in front of the shop. This information can be
again used to determine the level of interest in the shop (person may
be returning to get another look at the items on display) as well as
identify possible criminal intent (“scoping the place”).
Therefore, the system needs to be able to track and identify all
persons in the scene and label their overall activity.
The activity needs to be labeled based on the cumulative information
about each person tracked. Specifically,
each new person entering the scene will be given the “person walking”
activity label. If they then stop in front of the shop, their activity
changes to “person looking at the shop window”. The system receives
the information in the form of a sequence of images.

The system should be able to do the following:
a) build a suitable average frame from a given sequence of images
b) clearly specify how many persons are present in each of the test
frames as well as:
i) the position and identity of each of the persons (using a
bounding box) and the label of their activity
ii) clearly specify if any of the persons have been seen before in
the sequence

So I need some advice on how I can achieve this.

Thanks

Because you need to tell how many persons are in front of the window, you need some sort of person detection. OpenCV with Haar cascade detection comes to mind.
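As a complement to a detector, the average frame from requirement (a) can act as a background model: pixels that differ strongly from it are candidate person pixels. What follows is only a dependency-free sketch under stated assumptions. Frames are taken to be lists of rows of 8-bit grey values, the function names and the threshold of 30 are made up for illustration, and a single bounding box is drawn around all foreground pixels (separating several persons would need connected-component labelling or a real detector).

```python
def average_frame(frames):
    """Per-pixel mean of equally sized grey-scale frames (lists of rows)."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    n = len(frames)
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold=30):
    """True where the frame differs from the background by more than threshold.

    The threshold value is an assumption and needs tuning on real footage.
    """
    return [[abs(p - b) > threshold for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]


def bounding_box(mask):
    """Smallest (x_min, y_min, x_max, y_max) box around all True pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, on in enumerate(row) if on]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    empty = [[10] * 6 for _ in range(4)]          # static scene, no person
    background = average_frame([empty, empty, empty])
    test_frame = [row[:] for row in empty]
    test_frame[1][2] = test_frame[2][3] = 200     # two bright "person" pixels
    print(bounding_box(foreground_mask(test_frame, background)))  # (2, 1, 3, 2)
```

With real footage the same idea is usually written with NumPy or OpenCV arrays for speed; plain lists are used here only to keep the sketch self-contained.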

Once you have the detection, the rest should be easy. Just calculate the center of each person and track how many pixels (inches) they move in a given timespan. If it's less than x, they're standing still; otherwise they're moving.
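The centre-movement rule can be sketched roughly as below. This assumes each tracked person already has one bounding box per frame in the form (x_min, y_min, x_max, y_max); the names `label_activity`, `window` and `max_move` are hypothetical, and the values of 5 frames and 10 pixels stand in for the "timespan" and "x" above and would need tuning.

```python
import math

WALKING = "person walking"
LOOKING = "person looking at the shop window"


def centre(box):
    """Centre (x, y) of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def label_activity(boxes, window=5, max_move=10.0):
    """Label one tracked person from their bounding boxes, one per frame.

    Everyone starts as WALKING. The label becomes LOOKING, and stays that
    way (cumulative labelling), as soon as the centre moves less than
    max_move pixels between the first and last frame of any run of
    `window` consecutive frames.
    """
    centres = [centre(b) for b in boxes]
    for start in range(len(centres) - window + 1):
        first = centres[start]
        last = centres[start + window - 1]
        if math.dist(first, last) < max_move:
            return LOOKING
    return WALKING


if __name__ == "__main__":
    walking = [(x, 0, x + 20, 50) for x in range(0, 100, 10)]
    stopped = walking[:3] + [(30, 0, 50, 50)] * 6
    print(label_activity(walking))
    print(label_activity(stopped))
```

Comparing only the first and last centre of the window is the simplest reading of the rule; someone pacing back and forth could fool it, so summing the per-frame movement is a possible alternative.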

ii) clearly specify if any of the persons have been seen before in
the sequence

What do you mean? Does your system need to recognize whether this person has looked in the window before? If so, that's going to be damn hard to code. I've been working with CV for some time now, and recognizing an individual person is extremely hard.
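If a rough "seen before" check is enough for coursework, one very crude baseline is to compare grey-level histograms of the bounding-box patches. This is a swapped-in technique, not something from the assignment text, and it is nowhere near real person re-identification: it will confuse people in similar clothing and break under lighting changes. The function names, the 16 bins and the 0.8 threshold are all assumptions for illustration, and patches are taken to be lists of rows of 8-bit integer grey values.

```python
def grey_histogram(patch, bins=16):
    """Normalised histogram of an 8-bit grey-scale patch (list of rows)."""
    counts = [0] * bins
    total = 0
    for row in patch:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("empty patch")
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def seen_before(patch, gallery, threshold=0.8):
    """Return the id of the best-matching stored person, or None.

    `gallery` maps a person id to the histogram stored when that person
    was first seen. The threshold is an assumption and needs tuning.
    """
    hist = grey_histogram(patch)
    best_id, best_score = None, threshold
    for person_id, stored in gallery.items():
        score = histogram_intersection(hist, stored)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


if __name__ == "__main__":
    gallery = {"person-1": grey_histogram([[20, 25], [30, 22]])}
    print(seen_before([[21, 24], [28, 23]], gallery))      # similar dark patch
    print(seen_before([[200, 210], [220, 205]], gallery))  # much brighter patch
```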

Yeah, that last bit I need to clear up with the lecturer; it sounds difficult.

Thanks for the other advice, it gives me a starting point.
