SED: Conditionally removing \n

Question

tomok 0 Newbie Poster

17 Years Ago

Hi guys,

After trawling through the forums and interweb, hopefully someone here can offer me some help with sed!
I have a CSV file that contains records. Each field is delimited by a comma, and each row is delimited by a \n. However, in some cases I need to remove a \n that occurs inside a string between two commas.

e.g. 1 (correct) TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,Rural and Remote,,,,,,

e.g. 2 (incorrect) TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,"Regional Towns
Added 22 May 04 by Mwun-LLL/8M",,,,,,

As you can see in example two, after "Regional Towns" there is a \n that needs to be removed.
I can't remove every single occurrence of \n because that would just give me one huge line. I can only remove the \n when it occurs between commas.

I am quite new to sed, so any help would be much appreciated!

masijade 1,351 Industrious Poster

17 Years Ago

I haven't tried it very often, but look up the sed command "N". It is meant to append an additional line into the pattern space. Try something like

/"[^"]*$/{
N
s/\n//
}

masijade 1,351 Industrious Poster

17 Years Ago

Just to clarify (in case you don't realise it)

sed 's/^.*,\n$//'

doesn't work because sed is a line editor. It works on one line at a time, and the trailing "\n" is stripped before the line is placed in the pattern space, so the end of the line is simply $. For that reason "\n$" will never match, and for the same reason you cannot remove or replace the newline directly. That is why the "N" command exists: it pulls an additional line into the pattern space, which makes the \n between the two lines part of the pattern rather than its end. (The end of the second line is now the end of the pattern, until you read in a third line, and so on.)

At least that is the way I understand it. I could be wrong though.

Edit: And, by the way, just as a side note, if that command had actually done anything, it would have deleted the entire line. ;-)
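
A small sketch of the point above, using a made-up two-line sample (the `sample` helper and its data are assumptions for illustration, not from the original file). Without N, no pattern space ever contains a \n; after N, the \n between the two lines can be matched and removed.

```shell
# Hypothetical two-line sample; sed receives each line without its trailing newline.
sample() { printf 'one,\ntwo,\n'; }

# Without N: no single line contains \n, so nothing is printed.
no_match=$(sample | sed -n '/\n/p')

# With N: the pattern space becomes "one,\ntwo," and the \n can be removed.
joined=$(sample | sed 'N;s/\n//')

printf 'matches without N: [%s]\n' "$no_match"   # matches without N: []
printf 'joined with N: [%s]\n' "$joined"         # joined with N: [one,two,]
```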

masijade 1,351 Industrious Poster

17 Years Ago

It's cool. I was mainly explaining for the OP's benefit. The more he understands the less he has to ask. ;-)

eggi 92 Posting Whiz · Answer 1 · 2008-03-08T12:14:20+00:00

Hey There,

The above solution is good for 2 lines. It can be modified for an indefinite number, also, but that doesn't come to my mind right this second ;)
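
One possible way to extend it to any number of continuation lines, offered as a sketch rather than something posted in this thread: loop with a label and a branch, and keep joining while the pattern space holds an odd number of double quotes (that is, a quoted field is still open). The function name `join_open_quotes` and the sample data are made up for illustration, and the sketch assumes quotes in the data are always balanced within a record.

```shell
# Hypothetical helper: join physical lines while a quoted field is still open.
# :a    is a label to loop back to.
# The address matches only when the pattern space has an odd number of quotes.
# $!{}  skips N on the last line, so sed never tries to read past end of input.
# ba    branches back to :a to re-test the joined line.
join_open_quotes() {
  sed -e ':a' \
      -e '/^[^"]*\("[^"]*"[^"]*\)*"[^"]*$/{' \
      -e '$!{' \
      -e 'N' \
      -e 's/\n//' \
      -e 'ba' \
      -e '}' \
      -e '}' "$@"
}

# A record broken across three physical lines, followed by an intact one.
printf '%s\n' 'a,"one' 'two' 'three",,' 'b,"ok",c,' | join_open_quotes
# a,"onetwothree",,
# b,"ok",c,
```

Counting quotes instead of testing for "any quote before end of line" means a line whose quoted fields are all closed is left alone.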

If the \n you want to remove is always at the very end of the line and you have a variable number of \n's to contend with, you can also try this, provided you know the exact format a line should have. From what you've posted, it looks like a complete line ends with a comma and a comma never directly precedes a stray \n, but that might not always be the case, in which case this won't help (sorry):

sed 's/^.*,\n$//'

and it should take care of you in that limited circumstance

Best wishes,

Mike

tomok 0 Newbie Poster · Answer 2 · 2008-03-11T06:40:22+00:00

Hi guys,

Thanks for your help, it's certainly put me on the right track!

Mike, you are correct about masijade's sed statement: it breaks when a record contains more than two \n.
However, running your sed statement produces no results for some reason. Looking at it, it does what I need it to do, i.e. if a line ends with ",\n", remove that \n. All lines need to end with a comma.

Any thoughts as to why it might not be working? I am pretty confident that the text file is UNIX format, as masijade's statement works okay.

Thanks in advance

tomok 0 Newbie Poster · Answer 3 · 2008-03-11T07:30:32+00:00

Actually, looking back at my previous post, I am contradicting myself.
What I need is a sed statement that searches for all lines that DO NOT end with a comma, and removes the \n at the end of each. All lines need to end with one or more commas.

Hope that makes more sense

eggi 92 Posting Whiz · Answer 4 · 2008-03-11T11:07:43+00:00

Hey There,

My bad on the sed statement. I meant to do the substitution, and left out the match operators :(

Given your extra post, Masijade's post is dead on.

just sub

/"[^"]*$/

with

/.*[^,]$/
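
Put together, the substitution might look like the sketch below, run here against the two example rows from the first post. The function name `fix_rows` is made up for illustration, and `$!N` (rather than a bare `N`) is an added guard so that a final line without a trailing comma is still printed on every sed.

```shell
# Hypothetical helper: if a line does not end with a comma,
# append the next line and delete the newline between them.
fix_rows() {
  sed -e '/.*[^,]$/{' \
      -e '$!N' \
      -e 's/\n//' \
      -e '}' "$@"
}

# The two example rows from the question; the second is split across two lines.
printf '%s\n' \
  'TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,Rural and Remote,,,,,,' \
  'TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,"Regional Towns' \
  'Added 22 May 04 by Mwun-LLL/8M",,,,,,' | fix_rows
# TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,Rural and Remote,,,,,,
# TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,"Regional TownsAdded 22 May 04 by Mwun-LLL/8M",,,,,,
```

Note that this joins only one following line per broken row, and that `s/\n//` joins the text with no space in between; use `s/\n/ /` if a space is wanted.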

Hopefully we were able to help out in tandem :)

, Mike

tomok 0 Newbie Poster · Answer 5 · 2008-03-12T07:10:43+00:00

Works like a charm!
Thanks heaps Mike,

tom.

eggi 92 Posting Whiz · Answer 6 · 2008-03-12T11:07:20+00:00

Glad to contribute :)

Take care,

, Mike

eggi 92 Posting Whiz · Answer 7 · 2008-03-14T09:44:28+00:00

Yes, I understand - you have to cut me some slack for answering questions in the middle of the night ;) Corrected in my follow-up post.

Thanks :)

, Mike

eggi 92 Posting Whiz · Answer 8 · 2008-03-14T23:03:32+00:00

Definitely, no hard feelings, just kidding around with ya,

I really do need to get some sleep, though ;)

Take it easy :)

, Mike