Hi everyone,

I'm a relative beginner to writing UNIX scripts. In the past, I've been able to hack together simple scripts. Now I need a script which is a little more complex than I'm used to, and I really need help. I'm up against a tight deadline and am growing desperate, as I can't seem to find a solution either on the web or in my UNIX programming book.

Here's the problem: I'm on a SunOS system. On the machine, I have a large number of files scattered across a vast directory structure. I have to copy all those target files into my home directory. Luckily, the directory structure is well-organized. It looks like this:

/root/projects/*ARCHIVE*/date/*SUBARCHIVE*/output/*SUBSUBARCHIVE*/The_Files_I_Need

The directories in lowercase have constant names - I don't have to worry about them ever changing. But the directories I've named with *ALLCAPS* do change names. Think of them as wild cards.

Put another way: If I wanted to exhaustively list every directory, the top tier would look like this:

/root/projects/PROJECT01/
/root/projects/PROJECT02/
/root/projects/PROJECT03/
/root/projects/PROJECT04/
...
/root/projects/PROJECT20/

The next tier would look like this:

/root/projects/PROJECT01/date/JAN2000/
/root/projects/PROJECT01/date/FEB2000/
/root/projects/PROJECT01/date/MAR2000/
...
/root/projects/PROJECT01/date/MAY2010/
/root/projects/PROJECT02/date/JAN1995/
...
/root/projects/PROJECT20/date/DEC2005/

And the next tier would look like this:

/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0002/
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0003/
...
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE1328/
...
/root/projects/PROJECT20/date/DEC1995/output/SAMPLE483822/

And so on. The target files I ultimately need to read are in those final directories.

/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_102932
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_32323
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_32999293
...

There are literally thousands of these target files, all with dynamic names.

So the problem I'm having is I can't just do a "cp /root/projects/*/date/*/output/*/*" because the pathnames become too long. I can't hardwire the directory names I don't know because there are obviously too many of them. I've been experimenting with code, but my results have been frankly pitiful. I'm sure there's some way of doing this as a loop-within-a-loop-within-a-loop... but I can't figure out how to do it.

Here's the quasi-code I've been trying to get to work:

==============================================================================

#!/bin/bash

# create tmp directory into which I'll copy the files
mkdir ${HOME}/TMP


# jump into first common directory, start to drill down
cd /root/projects
for i in PROJECT01 PROJECT02 PROJECT03 (...) PROJECT20
do
  cd $i/date
  ls > SUBARCHIVE_LIST              #how to dynamically store the *SUBARCHIVE* values?
  for j in SUBARCHIVE_LIST
    do
      cd $j/output
      ls > SUBSUBARCHIVE_LIST       #same problem here!
      for k in SUBSUBARCHIVE_LIST
        do
          cd $k
          cp * ${HOME}/TMP          #here I copy the files
        done
    done
done

==============================================================================

Can anyone help? I hope so! I'm hoping this is a relatively easy problem for you experienced folks.


PS - sorry for the very long text; I try to be precise

Break it into steps

1. Find files

find /root/projects/PROJECT* -type f > $HOME/allProjectFiles.txt

Gets you something like this

$ cat $HOME/allProjectFiles.txt
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_102932
/root/projects/PROJECT02/date/JAN2001/output/SAMPLE0002/TARGET_32323
/root/projects/PROJECT03/date/JAN2001/output/SAMPLE0003/TARGET_32999293

2. Extract ones of interest

$ awk -F'/' '$6 ~ /JAN2001/' $HOME/allProjectFiles.txt > $HOME/selectedProjectFiles.txt

Now you can make the condition (the part between the single quotes) as complicated as you want. You can match literal strings or regular expressions. For example,

$4 == "PROJECT01" && $6 ~ /JAN200[0-4]/

selects the PROJECT01 files for January of the years 2000 through 2004.
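If it isn't obvious which field number holds which directory, a quick check along these lines prints every field of one path. The sample path is taken from the listing above; note that with an absolute path, field 1 is the empty string in front of the leading slash.

```shell
# Print each '/'-separated field of one sample path with its number.
sample=/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_102932
fields=$(echo "$sample" | awk -F'/' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }')
echo "$fields"
```

Here the project name shows up as field 4 and the date directory as field 6.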

3. Copy files

$ cat $HOME/selectedProjectFiles.txt | while read line ; do echo cp "$line" "${HOME}/TMP" ; done

When you're happy that the printout of copy commands looks good, just delete the echo and it will actually do the copying.
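Once the three steps work on their own, they could be chained into one function, roughly like the sketch below. The name copy_selected and its three arguments are made up for illustration, and it assumes bash plus an awk that understands -v (on SunOS that may mean nawk or /usr/xpg4/bin/awk rather than plain awk). Fields are counted from the end of the path, so the check doesn't depend on how deep the projects directory sits.

```shell
#!/bin/bash
# Sketch only: copy_selected and its arguments are illustrative names.
# usage: copy_selected SRC_ROOT DEST_DIR MONTH_REGEX
copy_selected() {
  local src=$1 dest=$2 pat=$3
  mkdir -p "$dest" || return 1

  # Step 1: list every regular file under the PROJECT* directories.
  # Step 2: keep paths shaped like .../date/<month>/output/<sample>/<file>
  #         whose <month> directory matches the regular expression.
  # Step 3: copy each surviving path into the destination directory.
  find "$src"/PROJECT* -type f |
    awk -F'/' -v pat="$pat" \
      'NF >= 6 && $(NF-4) == "date" && $(NF-2) == "output" && $(NF-3) ~ pat' |
    while IFS= read -r line; do
      cp "$line" "$dest"
    done
}
```

For the layout in the question, a call might look like: copy_selected /root/projects "$HOME/TMP" 'JAN200[0-4]'. Since everything lands in one directory, two target files with the same name would overwrite each other, so it's worth checking for duplicates first.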

Outstanding! Thank you so much, this is exactly what I hoped for! :)
