Grep expression to include carriage return if it's there

Contributor, Nov 04, 2016

Hi all,

Is there a grep expression that can handle a line break or other non-text character that might or might not appear between a pair of tabs?

I'm using grep within BBEdit to convert Excel data into tagged XML, which I then import into InDesign. However, when someone has put a carriage return within a cell in Excel, my find/replace doesn't handle it properly. I can use a workaround, but I'd rather not amend the original data if possible.

My expression looks for 7 columns of text, and takes each piece and puts tags around them.

filename    |    full name    |    star    |    company    |    biog    |    city    |    twitter

johnsmith.tif    |    John Smith    |    yes    |    DeLoitte    |    I work at Deloitte and I'm brilliant.    |    New York    |    @myhandle

Find:

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

Replace:

<person>

<picbox href="file://Headshots/\1" />

<name>\2</name>

<star>\3</star>

<company>\4</company>

<biog>\5</biog>

<city>\6</city>

<twitter>\7</twitter>

</person>

But what if one of those columns has a line break and the others don't? To be clear, it MIGHT contain a carriage return or it MIGHT NOT; that's the tricky bit.

Is there a grep expression that can account for that?

johnsmith.tif    |    John Smith    |    yes    |    DeLoitte    |    I work at Deloitte

and I'm brilliant.    |    New York    |    @myhandle

If anyone can help, that would be amazing.

Thanks,

Justy

Views

1.3K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Participant, Nov 04, 2016

Hi,

If you want to go with grep, the closest thing I can think of is

\t?[^\t]+\t?

But you would have to do some extra cleaning.

Otherwise, I would consider scripting, given that you can export Excel to CSV.
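
As a minimal sketch of the scripting route, assuming the sheet is exported as CSV with a header row that matches the column names in the original post: Python's csv module keeps a quoted cell that contains a line break as a single field, so no special handling is needed. The sample data and the function name rows_to_xml are hypothetical, and the tags follow the replace string from the question.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sample standing in for an Excel CSV export. The quoted
# biography cell contains a line break.
SAMPLE = (
    "filename,full name,star,company,biog,city,twitter\n"
    'johnsmith.tif,John Smith,yes,DeLoitte,"I work at Deloitte\n'
    "and I'm brilliant.\",New York,@myhandle\n"
)

# (CSV column, XML tag) pairs for the text columns, in output order.
FIELDS = [
    ("full name", "name"),
    ("star", "star"),
    ("company", "company"),
    ("biog", "biog"),
    ("city", "city"),
    ("twitter", "twitter"),
]


def rows_to_xml(text):
    """Return one <person> element (as a string) per CSV record."""
    # newline="" leaves line breaks inside quoted cells untouched.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    people = []
    for row in reader:
        lines = ["<person>"]
        href = quoteattr("file://Headshots/" + row["filename"])
        lines.append(f"<picbox href={href} />")
        for column, tag in FIELDS:
            # escape() protects &, < and > so the output stays well-formed.
            lines.append(f"<{tag}>{escape(row[column])}</{tag}>")
        lines.append("</person>")
        people.append("\n".join(lines))
    return people


if __name__ == "__main__":
    print("\n\n".join(rows_to_xml(SAMPLE)))
```

The line break inside the biography survives into the <biog> element unchanged, so the original data does not need to be edited first.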

Contributor, Nov 04, 2016

Ooo that's brilliant.

Would you kindly explain what it's doing? If I repeat it 7 times and wrap it in ^ and $, it picks up all the columns needed.

But I then have a problem with my replace string, as I can't work out which part of the expression is the text that should be kept.

My actual, original find and replace strings so you can see what I'm transforming:

Find

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

Replace

<bounding>\n<picbox href="file://Headshots/\1" />\n<biogbox>\n<name>\2</name>\n<star>\3</star>\n<company>\4</company>\n         <bio>\5</bio>\n<citybox>\n<city>\6</city>\n<twitter>\7</twitter>\n</citybox>\n</biogbox>\n</bounding>

Many thanks

Contributor, Nov 04, 2016

Hi again,

Sorry to be a pain, I'm trying to understand your grep expression more. Is my breakdown of it correct?

\t?[^\t]+\t?

\t?      = \t looks for a tab, ? zero or one time

[^\t]+  = [ start of a pattern, ^ beginning of a line, \t tab, ] end of pattern, + pattern appears one or more times

\t?      = (as first line above)

Community Expert, Nov 04, 2016

To elaborate on Loic's explanation just a bit: the ^ immediately after the opening bracket negates the class, so it matches any character except those listed between the ^ and the closing bracket.
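
A small sketch of why the negated class matters for the original question, written with Python's re module rather than BBEdit (the assumption being that BBEdit's grep treats a negated class the same way): "." normally stops at a line break, while [^\t] accepts anything that is not a tab, line breaks included. The sample text is taken from the question.

```python
import re

# A cell containing a line break, followed by a tab and the next cell.
text = "I work at Deloitte\nand I'm brilliant.\tNew York"

# "." stops at the line break (unless a dot-matches-all flag is set).
dot_match = re.match(r".*", text).group(0)

# "[^\t]" matches anything that is not a tab, line breaks included.
negated_match = re.match(r"[^\t]*", text).group(0)

print(repr(dot_match))
print(repr(negated_match))
```

Here dot_match holds only the first line of the biography, while negated_match holds both lines and stops at the tab.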

Participant, Nov 04, 2016

Trying to morph a CSV file into XML via GREP looks very artistic to me. I'm not sure I can provide much more help here.

Contributor, Nov 04, 2016

It's not that scary. It's only plain text either side of the original data. Thanks again.

Participant, Nov 04, 2016

Some online tools are designed to do such transformations.

Give this a try:

CSV To XML Converter - BeautifyTools.com

And let us know.

Contributor, Nov 04, 2016

I've used a similar website to do the same thing. That's when I decided that, should the website ever go down or the job need to be more complex, I'd need a way to do it offline. That's where grep is helping. I can certainly get the data into a workable format: I've imported over 300 biographies in a few minutes. I'm just hoping to refine my grep expressions so the original text is tampered with as little as possible.

Any errors are then the client's problem, not mine.

When I get the right expression I'll post back.

Participant, Nov 04, 2016

Hi,

Basically, it matches one or more characters other than a tab, optionally preceded and followed by a tab.

So it's

\t?      = \t looks for a tab, ? zero or one time

[^\t]+  = any possible character but tab one or more times

\t?      = (as first line above)
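
Building on the breakdown above, here is one possible way to keep the seven capture groups from the original find string while still allowing line breaks inside a cell. It is only a sketch in Python's re module, under these assumptions: records are separated by \n, the first and last columns never contain a line break, and the five middle columns may. The second sample record is hypothetical. The equivalent find string would be ^([^\t\r\n]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t\r\n]*)$, though it has not been tested in BBEdit here.

```python
import re

# First and last columns exclude line breaks, so every record still starts
# and ends on a line boundary. The five middle columns only exclude tabs,
# so a line break inside them is captured as part of the cell.
EDGE = r"([^\t\r\n]*)"
MIDDLE = r"([^\t]*)"
RECORD = re.compile(
    "^" + r"\t".join([EDGE] + [MIDDLE] * 5 + [EDGE]) + "$",
    re.MULTILINE,
)

# Same shape as the replace string in the question; \1 to \7 are the columns.
REPLACE = (
    r"<person>\n"
    r'<picbox href="file://Headshots/\1" />\n'
    r"<name>\2</name>\n"
    r"<star>\3</star>\n"
    r"<company>\4</company>\n"
    r"<biog>\5</biog>\n"
    r"<city>\6</city>\n"
    r"<twitter>\7</twitter>\n"
    r"</person>"
)

# First record is from the question (with the line break in the biography);
# the second record is made up to show a row without one.
SAMPLE = (
    "johnsmith.tif\tJohn Smith\tyes\tDeLoitte\t"
    "I work at Deloitte\nand I'm brilliant.\tNew York\t@myhandle\n"
    "janedoe.tif\tJane Doe\tno\tAcme\tI work at Acme.\tLondon\t@jane\n"
)

converted = RECORD.sub(REPLACE, SAMPLE)

if __name__ == "__main__":
    print(converted)
```

Both records are converted, whether or not the biography contains a line break. Note that this mirrors a plain find/replace and does not escape characters such as & or < in the data.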
