Hello,
I would really appreciate it if someone could help me correct the code I have written below.
I am trying to write a bash shell script that does the following:
1. Finds all *.txt files within my directory of interest (the files are in sub-directories)
2. Reads each of the files (25 files) one by one (all are tab-delimited and share the same data format)
3. Skips the first 10 rows of each file
4. Extracts columns 2, 14, and 15 and prints them into one output file
5. Adds a new column to the final output file with the name of the .txt file from which the data was extracted
I have written a shell script, but it is not working properly and does not yet have the code to skip the first 10 rows.
Below I have pasted a sample input file, the desired output file, and my code.
Input file format (the actual data starts from the line beginning with "DATA 1 1 1 0"):
TYPE text text text text integer float float text text text integer integer integer integer
FEPARAMS Protocol_Name Protocol_date Scan_Date Scan_ScannerName Scan_NumChannels Scan_MicronsPerPixelX Scan_MicronsPerPixelY Scan_OriginalGUID Grid_Name Grid_Date Grid_NumSubGridRows Grid_NumSubGridCols Grid_NumRows Grid_NumCols
DATA miRNA-v1_95_May07 (Read Only) 5/2/2007 12:14 1/26/2008 11:25 Agilent Technologies Scanner G2505B US45102930 1 5 5 a18d8bd4-628a-4054-b2ba-45c7a66de583 016436_D_20070426 4/26/2007 0:00 1 1 192 82
*
TYPE float float float integer integer float integer float float float integer float float integer
STATS gDarkOffsetAverage gDarkOffsetMedian gDarkOffsetStdDev gDarkOffsetNumPts gSaturationValue gAvgSig2BkgNegCtrl gNumSatFeat gLocalBGInlierNetAve gLocalBGInlierAve gLocalBGInlierSDev gLocalBGInlierNum gGlobalBGInlierAve gGlobalBGInlierSDev gGlobalBGInlierNum
DATA 26.709 27 5.44777 1000 1203179 1.11899 0 38.7173 65.4263 2.95429 12029 65.4263 2.95429 12029
*
TYPE integer integer integer text integer text integer integer text text text text float float
FEATURES FeatureNum Row Col chr_coord SubTypeMask SubTypeName ProbeUID ControlType ProbeName GeneName SystematicName Description PositionX PositionY
DATA 1 1 1 0 0 1 miRNABrightCorner30 miRNABrightCorner30 miRNABrightCorner30 6774.29 228.723
DATA 2 1 2 66 Structural 2 1 DarkCorner DarkCorner DarkCorner 6800.2 229.421
DATA 3 1 3 chr14:100595916-100595897 0 3 0 A_25_P00010115 hsa-miR-154* hsa-miR-154* NA 6826.51 228.385
DATA 4 1 4 chr8:135881995-135882010 0 5 0 A_25_P00010390 hsa-miR-30b hsa-miR-30b NA 6850.48 228.853
Output format: a tab-delimited file. The last column shows the name of the file from which the data was extracted.
1 6774.29 228.723 ABC.txt
2 6800.2 229.421 ABC.txt
3 6826.51 228.385 DEF.txt
4 6850.48 228.853 DEF.txt
5 6875.37 228.408 XYZ.txt
6 6900.98 229.321 XYZ.txt
My incomplete code: it is missing the step that skips rows. It also throws an error:
'test1.sh: line 3: syntax error near unexpected token `do
'test1.sh: line 3: `do
for filename in $(find -iname '*.txt')
do
awk -F"\t" '
BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
' $filename > output.txt
done
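
A possible corrected version is sketched below, under a few assumptions: the first 10 rows of every file are headers, the output should be tab-delimited as in the sample output above (so OFS is a tab rather than "|"), and the last column should hold only the base name of each file. The function name extract_columns is hypothetical. Redirecting once, after all files have been processed, avoids overwriting output.txt on every loop iteration, and FNR (awk's per-file line counter) handles the row skipping.

```shell
#!/usr/bin/env bash
# Sketch: collect columns 2, 14 and 15 from every *.txt file under a directory.
# extract_columns is a hypothetical helper name, not part of the original script.
extract_columns() {
    local dir=$1 out=$2
    # -exec ... {} + passes the file names straight to awk, so paths that
    # contain spaces survive. The output file is excluded from the search
    # because it also ends in .txt.
    find "$dir" -type f -iname '*.txt' ! -name "$(basename "$out")" \
        -exec awk -F'\t' -v OFS='\t' '
            FNR <= 10 { next }           # skip the first 10 rows of each file
            {
                sub(/\r$/, "")           # drop a trailing CR, if the data has one
                name = FILENAME
                sub(/.*\//, "", name)    # keep only the base name of the file
                print $2, $14, $15, name
            }
        ' {} + > "$out"
}
```

Calling `extract_columns . output.txt` from the directory of interest would then write the combined table. Separately, the stray leading quote in the error message is often a sign that the script itself was saved with Windows (CRLF) line endings; if that is the case here, converting it (for example with dos2unix) may clear the syntax error.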