Building directory hierarchy from irritating data

Question

roswell1329 11 Junior Poster in Training

15 Years Ago

My company is retiring an online document warehousing application that stored lots of text data. The application stored the data in a folder hierarchy that looked just like a Windows folder tree. I'm trying to replicate that hierarchy on a UNIX file system, but the tools provided with the application to extract the hierarchy information are not terribly useful.

One tool will give me a list of folder ID numbers and names like so:

Folder 8777 - fozzy.
Folder 8778 - fozzy1.
Folder 8779 - fozzy2.
Folder 8780 - grover1.
Folder 8781 - grover2.
Folder 8782 - rolf1.
Folder 8783 - rolf2.
Folder 8784 - rolf3.
Folder 8785 - rolf4.
Folder 8786 - travel_statements.
Folder 8787 - invoices.

Another tool will give me sub-folder relationships based on folder ID like the following:

Folder 100 - <root>.
   subfolder 101 - flag 0.
   subfolder 119 - flag 0.
   subfolder 227 - flag 0.
   subfolder 239 - flag 0.
   subfolder 1198 - flag 0.
   subfolder 1320 - flag 0.
   subfolder 2264 - flag 0.
   subfolder 3025 - flag 0.
   subfolder 3028 - flag 0.
   subfolder 3031 - flag 0.
Folder 1198 - kermit1.
   subfolder 1227 - flag 0.
   subfolder 1231 - flag 0.
   subfolder 1238 - flag 0.
   subfolder 1374 - flag 0.
   subfolder 1504 - flag 0.
   subfolder 1538 - flag 0.
   subfolder 1642 - flag 0.
   subfolder 2459 - flag 0.
   subfolder 2635 - flag 0.
   subfolder 2642 - flag 0.
   subfolder 3998 - flag 0.
   subfolder 7942 - flag 0.
   subfolder 8656 - flag 0.
Folder 1227 - monkey1.
   subfolder 1228 - flag 0.
   subfolder 1327 - flag 0.
   subfolder 1347 - flag 0.
   subfolder 1390 - flag 0.
   subfolder 1396 - flag 0.
Folder 3333 - piggy1.
   No sub folders.

I first approached this problem by looping through the list of folder IDs and, for each folder ID, running a recursive function that kept scanning the sub-folder information until a path could be built back to the root folder (folder ID 100). This appeared to work well, but I encountered two problems:

1) I discovered that some sub-folders were present in more than one location, but my code only picked up the first instance.
2) I also found that some folders were positioned outside the hierarchy of the root folder.

Next, I tried using the sub-folder information to start with. I built a list of simple strings representing one parent/child pair like this: 100/1198. Then, for each pair, I looped through the sub-folder info again and tried building paths based on the child element matching the parent element of any scanned lines. This caught some of the duplicate paths, but I ended up with a bunch of paths that had no relationship to the beginning or end of the tree.

Can anyone here think of how I could build folder hierarchy based on this kind of data? Or can anyone here even think of a good way I could represent this data internally so I could build the paths without missing any possible path combinations? Any assistance would be greatly appreciated. Thank you!

perl unix


k_manimuthu 43 Junior Poster in Training

15 Years Ago

use strict;
use warnings;
use File::Path;
use Cwd;

## source.txt holds the sub-folder listing shown above
open (my $fin, '<', 'source.txt') || die "Cannot open the input file: $!";
my $file = do { local $/; <$fin> };   # slurp the whole file at once
close ($fin);

my ($root, @lines, @folder, $flag, $cwd);

# Get the root folder ID
$root=$1 if ($file=~ m{(\d+).*?<root>}s);

# Split the input file into an array of lines
@lines=split(/\n/, $file);

# (0 => processing the root folder, 1 => processing a second-level folder)
$flag=0;

# Get the current working directory and prefix it to the root folder ID.
# Instead of $cwd you may assign your own location.
$cwd=cwd();
$root="$cwd/$root";

# $lines[0] is skipped here because the root folder ID has already been captured.
foreach my $i ( 1 .. $#lines)
{
	# Generate 2nd level folder
	if ( $lines[$i]=~ m{^folder (\d+)}i)
	{
		push (@folder, $1); $flag=1;
		mkpath ("$root/$folder[$#folder]");
	}
	# Generate sub folders of 2nd level folder
	elsif ($lines[$i]=~ m{^\s*subfolder (\d+)}i && ($flag == 1))
	{
		mkpath ("$root/$folder[$#folder]/$1");
	}
	# Generate sub folders of root folder
	elsif ($lines[$i]=~ m{^\s*subfolder (\d+)}i)
	{
		mkpath ("$root/$1");
	}
}

I assumed the data has only two levels:
1) 1st level (the root folder and its sub-folders).
2) 2nd level (each second-level folder and its sub-folders).
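
For comparison, here is a minimal sketch of reading the relationship listing into a data structure instead of creating directories while scanning. It assumes the line formats shown in the question's sample ("Folder ID - name." and indented "subfolder ID - flag N."). The sub name parse_listing and the two hashes are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Parse the relationship listing into two hashes:
#   names    : folder ID => folder name
#   children : folder ID => array ref of child folder IDs
# The line formats are an assumption based on the sample in the question.
sub parse_listing {
    my ($text) = @_;
    my (%names, %children, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Folder\s+(\d+)\s+-\s+(.*?)\.\s*$/) {
            # A "Folder" line starts a new parent record.
            $current = $1;
            $names{$current} = $2;
            $children{$current} //= [];
        }
        elsif (defined $current && $line =~ /^\s+subfolder\s+(\d+)\s+-/) {
            # A "subfolder" line belongs to the most recent "Folder" line.
            push @{ $children{$current} }, $1;
        }
        # "No sub folders." and blank lines need no action.
    }
    return (\%names, \%children);
}

# Shortened copy of the sample data from the question.
my $sample = <<'END';
Folder 100 - <root>.
   subfolder 1198 - flag 0.
   subfolder 3025 - flag 0.
Folder 1198 - kermit1.
   subfolder 1227 - flag 0.
Folder 3333 - piggy1.
   No sub folders.
END

my ($names, $children) = parse_listing($sample);
printf "%s has %d subfolder(s)\n", $names->{$_}, scalar @{ $children->{$_} }
    for sort { $a <=> $b } keys %$names;
```

Keeping the parent-to-children map separate from the names means a folder ID can be listed under several parents without any special handling.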


roswell1329 11 Junior Poster in Training · Answer 1 · 2009-12-01T05:58:48+00:00

Hello k_manimuthu, thank you for your post. Unfortunately, the data was not merely level 1 and level 2 data...the depth could have extended indefinitely (though usually no deeper than 5 or 6 levels). However, I was able to get the assistance I needed over at PerlMonks. The link will take you directly to the node with my question. The code they provided me with was brilliant, and worth a look if you're interested. Thanks to everyone else who even looked at this post!
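
For readers without the link, here is a minimal sketch of one way to handle arbitrary depth. It is not the PerlMonks code; it is just a depth-first walk over a parent-to-children hash. The sample data, the export prefix, and the names all_paths, %names and %children are hypothetical. The walk emits one path per location, so a folder listed under two parents appears twice, and any folder never reached from the root is reported separately.

```perl
use strict;
use warnings;

# Hypothetical data in the shape a parsed listing might take.
my %names = (
    100  => 'root',    1198 => 'kermit1', 1227 => 'monkey1',
    1228 => 'rolf1',   3333 => 'piggy1',  4000 => 'stray',
);
my %children = (
    100  => [1198, 3333],
    1198 => [1227],
    1227 => [1228],
    3333 => [1227],    # monkey1 is listed under two parents
    4000 => [],        # never referenced from the root
);

# Depth-first walk: record the path of every folder reachable from $id.
sub all_paths {
    my ($id, $prefix, $on_path, $paths, $seen) = @_;
    return if $on_path->{$id};            # guard against cyclic data
    my $path = "$prefix/" . ($names{$id} // $id);
    push @$paths, $path;
    $seen->{$id}    = 1;
    $on_path->{$id} = 1;
    all_paths($_, $path, $on_path, $paths, $seen) for @{ $children{$id} // [] };
    delete $on_path->{$id};               # leaving this branch
    return;
}

my (@paths, %seen);
all_paths(100, 'export', {}, \@paths, \%seen);

# Folders that exist in the name list but sit outside the root hierarchy.
my @orphans = sort { $a <=> $b } grep { !$seen{$_} } keys %names;

print "$_\n" for @paths;
print "Outside root: @orphans\n";
# Each entry in @paths could then be passed to File::Path's make_path.
```

Because the recursion follows the child lists rather than searching upward from each folder, depth is limited only by the data, and the on-path check stops the walk if a folder ever lists one of its own ancestors as a sub-folder.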
