Hello
I have seen a number of find/change and GREP formulas that do similar things. I have NO scripting or coding experience and have labored to understand GREP.
So I am a little afraid to use it, as I don't know what all the modifiers refer to. (I do have a printout of some neat GREP cheatsheets, like Mike Witherell's, that I can absorb until I obtain a good reference.)
I need something I can copy and paste into either the find/change or GREP dialog that will do the following in fewer than 12 steps (hopefully) without doing something catastrophic like removing all of my paragraph marks (which I almost did using someone's GREP expression).
I think that's it
I did find this one recently (Maybe Jongware?)
[~m~>~f~|~S~s<~/~,~3~4%]{2
Which, from my dim understanding, addresses em, en, flush and hair space, nonbreaking space, figure space, third space; not sure of the rest.
Really this is way over my head.
I know this will be a piece of cake for you guys
Thanks
I was hoping Jongware would come in with something really elegant (and maybe he still will) but in the meantime, my approach would be to start by eliminating all multiple whitespaces except paragraph returns and forced line breaks. This seems to do it:
Find (\s)(\p{space_separator}|\t)+ and replace with $1
This will leave the first whitespace and remove any following whitespace up to the point that a line or paragraph break (and not completely tested, but I suppose other sorts of breaks) is encountered, leaving the paragraph or line break intact. Note that this will destroy tables built with tabs (as opposed to "real" tables) that have multiple tabs between items, and it will not remove a single whitespace before a paragraph or line break.
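For anyone who wants to try the idea outside InDesign first, here is a minimal Python sketch of the same query. It rests on assumptions: Python's `re` module has no `\p{space_separator}`, so the explicit character class `ZS` below stands in for it, and the function name `collapse_spaces` is purely illustrative. It is not InDesign's engine, so treat the results as an approximation.

```python
import re

# Assumption: this explicit class approximates \p{space_separator} (Unicode Zs).
ZS = r"\u0020\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000"

# One whitespace character of any kind, then one or more spaces or tabs
# (the second group can never match a line or paragraph break).
MULTI_SPACE = re.compile(rf"(\s)([{ZS}]|\t)+")


def collapse_spaces(text: str) -> str:
    """Keep the first whitespace character of each run and drop the rest."""
    return MULTI_SPACE.sub(r"\1", text)


print(repr(collapse_spaces("one  two \u2003 three")))   # runs of mixed spaces
print(repr(collapse_spaces("end. \rNext")))             # single space before a return
print(repr(collapse_spaces("Name\t\t\tPrice")))         # tab-built table column
```

The last call shows the warning about tab-built tables: three tabs collapse to one, so any column alignment that relied on them is lost.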
Next I would remove the whitespace at the ends of paragraphs and the like:
Find (\s)(\n|\r) and replace with $2 seems to do that, and it also seems to leave multiple returns (I don't know if you want to remove those) and to work with other breaks as well (again, not fully tested). The simpler \s$ and replace with nothing removes the first return in a two-return sequence and seems to ignore the other types of breaks completely.
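A rough Python sketch of this second pass follows. One deliberate change, stated plainly: in Python's `re`, `\s` also matches `\r` and `\n`, so a literal port would merge two returns into one. To reproduce the behaviour described above (multiple returns left alone), the sketch swaps `\s` for an explicit "horizontal space" class. That class and the name `strip_space_before_break` are assumptions for illustration only.

```python
import re

# Assumption: "horizontal" whitespace = Unicode Zs spaces plus the tab.
HSPACE = r"\u0020\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000\t"

# One horizontal space sitting directly before a line feed or carriage return.
SPACE_BEFORE_BREAK = re.compile(rf"[{HSPACE}]([\n\r])")


def strip_space_before_break(text: str) -> str:
    """Drop the space and keep the break (the break is group 1 here)."""
    return SPACE_BEFORE_BREAK.sub(r"\1", text)


print(repr(strip_space_before_break("end. \rNext")))   # space before return removed
print(repr(strip_space_before_break("one\r\rtwo")))    # blank paragraph untouched
```

Like the original query, this removes only one space per break, so it assumes the multiple-space pass has already run.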
At this point there should not be any multiple whitespaces other than possibly blank paragraphs. If you want to get rid of those, you can run the Find/Change By List script or the built-in multiple returns to single return query in the find/change dropdown list.
So now you need to find opening single and double quotes, parentheses, brackets or braces and remove a space after them if it exists:
Find ([\[\{\(~{~[])(\s) and replace with $1
and finally remove any space before your selected punctuation and the closing cases of the items above:
Find (\s)([.,;:!\)\]\}~}~]]) and replace with $2
The last two queries will probably also work with a look-behind for the first and a look-ahead for the second (putting the classes in the look expressions) and replacing with nothing, but I'm not sure which method is more efficient. The last query could conceivably also miss a space followed by an apostrophe, or mistakenly remove a space before a word that starts with an apostrophe (again, not thoroughly tested). It also ignores straight quotes of any type, as they are ambidextrous and might want space on either side.
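The two punctuation queries, in both the capture-group form and the look-behind/look-ahead form, can be sketched in Python roughly as below. Assumptions: the InDesign codes `~{` `~[` `~}` `~]` are written out as the curly quote characters U+201C, U+2018, U+201D and U+2019, and the names `OPENERS`, `CLOSERS`, `tidy_capture` and `tidy_lookaround` are made up for this example.

```python
import re

# Assumption: InDesign's ~{ ~[ ~} ~] are spelled out as curly quote code points.
OPENERS = r"\[{(\u201C\u2018"        # [ { ( and opening double/single quotes
CLOSERS = r".,;:!)\]}\u201D\u2019"   # punctuation, closers, closing quotes


def tidy_capture(text: str) -> str:
    """Capture-group form: keep the punctuation, drop the adjacent space."""
    text = re.sub(rf"([{OPENERS}])\s", r"\1", text)
    return re.sub(rf"\s([{CLOSERS}])", r"\1", text)


def tidy_lookaround(text: str) -> str:
    """Look-behind / look-ahead form: match only the space, replace with nothing."""
    text = re.sub(rf"(?<=[{OPENERS}])\s", "", text)
    return re.sub(rf"\s(?=[{CLOSERS}])", "", text)


sample = "( see [ note ] ) , then \u201C quoted \u201D ."
print(tidy_capture(sample))
print(tidy_lookaround(sample))

# The apostrophe caveat: a word that starts with a right single quote
# loses the space in front of it.
print(tidy_capture("rock \u2019n\u2019 roll"))
```

On this sample both forms give the same result, which says nothing about which is faster inside InDesign. The last line shows the apostrophe problem in action.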
Hopefully the forum didn't mess up any of those expressions...
Hi Peter,
At the risk of sounding stupid. Nevermind it will be stupid.
What is the space separator in the first solution?
Find (\s)(\p{space_separator}|\t)+ and replace with $1
It's not an underscore, is it?
Could I just borrow your brain for a few weeks? I promise to give it back when I'm done.
\p{space_separator} (exactly as written) is a comprehensive wildcard for a large variety of spaces. It works like \s, but does not include the linebreaks and paragraph breaks in the found results.
It would be tempting, for example, to use (\s)(\s)+ to find any whitespace followed by any amount of other whitespace, but the \s will also pick up the paragraph breaks, so if you have a space at the end of a paragraph, you lose the paragraph break. The \p{space_separator} won't see that as two whitespaces, so the paragraph is preserved, but you then must go back and remove any spaces before a paragraph break in a second pass (the second query).
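The difference is easy to see side by side. This is a small Python sketch, not InDesign itself: the `ZS` class is an assumed stand-in for `\p{space_separator}`, and the sample text is invented.

```python
import re

# Assumption: this explicit class approximates \p{space_separator} (Unicode Zs).
ZS = r"\u0020\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000"

text = "First paragraph. \rSecond  paragraph."

# Tempting version: the second \s happily swallows the return after the space.
greedy = re.sub(r"(\s)(\s)+", r"\1", text)

# Safer version: the second group can only be a space or a tab, never a break.
safe = re.sub(rf"(\s)([{ZS}]|\t)+", r"\1", text)

print(repr(greedy))   # the \r is gone, so the two paragraphs have merged
print(repr(safe))     # the \r survives; only the doubled space was collapsed
```

The safe version still leaves the single space before the return, which is exactly what the second pass is for.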
No need to feel stupid. I had to do a bit of research this morning to come up with that myself. I've never seen it in use before.
Hi Peter
Phew! I did too (web research) because I thought maybe it was a substitute for some kind of unicode mark that the forum wouldn't allow you to insert as is and it had to be spelled out.
So I ran your first solution and it worked very well. On to the others to see how they do.
Thanks again
Hi Peter,
Good to know about the space separator.
I think I did follow someone's GREP (prior to giving up and coming here) and it included the (\s)(\s)
Thank goodness I used it slowly because it started eating my paragraph/line breaks like pac man
So everything worked fine and the manuscript is looking very tidy.
I am doing a close pass through and will let you know of any glitches. Can't find a one so far.
The combination of these GREP formulas would be a very nice package to run on a large text to really tidy it up and make it look professional. I wish I knew scripting because I would try to consolidate these features. I'm sure someone somewhere has done it but after 8 hrs search yesterday I sure couldn't find it.
The Chicago Manual of Style and others are kind of picky about these spaces and punctuation marks etc. So the info you shared is a great feature sans proofreader.
Thanks again!
Peter Kahrel (whose ebook is the source I used this morning, and a reference I highly recommend at only about $10) has a lot of free GREP and scripting aides on his website. Take a look at http://www.kahrel.plus.com/indesign/grep_query_manager.html which will allow you to make a "chain" from this set of queries that you can then run in one step.
Hi Peter
Thanks. I got both the query manager and GREP editor from Kahrel's site. I managed to form a chain of queries from what you provided here today, and I had the time to sit down and dissect every part of your solutions, getting to know some associations etc. Pretty interesting stuff but still tough.
So to recap (and provide others with a distilled version of all of this), would you say the below is accurate?
In particular I am interested that not only are offending spaces removed BUT that spaces are preserved or inserted appropriately
What do you think?
Peter is too modest, he's doing just great.
- No space BEFORE, one space AFTER: period, semicolon, colon, exclamation mark, question mark, CLOSING parenthesis, bracket, brace, single and double quotation marks
- Find (\s)([.,;:!\)\]\}~}~]])
- Replace with $2
- No space AFTER, one space BEFORE: OPENING bracket, brace, parenthesis, double and single quotation marks
- Find ([\[\{\(~{~[])(\s)
- Replace with $1
These remove the space before/after but do not automatically add a space after/before.
In the first case, you could add a space right after the '$2' in the Replace With string, but it already may have a space, in which case you suddenly have two. One alternative is to optionally remove it with the Find string (add it as an optional match) and always add it with the Replace string, but remember that this string will only be found if there is a space preceding it. That way you'd only check the space after in cases where there was a bad space before.
So I propose you add another two find/changes to add the space, only when necessary.
One Space After: find
([.,;:!\)\]\}~}~]])(?!\s)
replace:
$0 [followed by one single space]
One Space Before: find
(?<!\s)([\[\{\(~{~[])
replace:
[one single space] $0
Color-coding with my WhatTheGrep might make it just a tad clearer what's going on in that jumble of codes:
(1 [ .,;:! \) \] \} ~} ~] ] 1) (?!! \s !)
and
(?<!<! \s <!) (1 [ \[ \{ \( ~{ ~[ ] 1)
(Orange is lookahead/lookbehind, blue is a regular escaped character, pink is an InDesign special character, green is normal grouping parentheses, and lavender is a character group.)
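As a rough illustration of the two "add a space only when it is missing" queries, here is a Python sketch. Assumptions: the InDesign quote codes are written out as curly quote code points, `\g<0>` plays the role of `$0`, and the names `add_space_after` and `add_space_before` are invented for the example.

```python
import re

# Assumption: InDesign's ~{ ~[ ~} ~] are spelled out as curly quote code points.
OPENERS = r"\[{(\u201C\u2018"
CLOSERS = r".,;:!)\]}\u201D\u2019"


def add_space_after(text: str) -> str:
    """Closing punctuation NOT followed by whitespace gets one space appended."""
    return re.sub(rf"[{CLOSERS}](?!\s)", r"\g<0> ", text)


def add_space_before(text: str) -> str:
    """Opening punctuation NOT preceded by whitespace gets one space prepended."""
    return re.sub(rf"(?<!\s)[{OPENERS}]", r" \g<0>", text)


step_one = add_space_after("Hello,world(see note)now")
step_two = add_space_before(step_one)
print(step_one)
print(step_two)
```

The negative look-ahead and look-behind are what make the change conditional: where a space is already present, nothing matches and nothing is added.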
Some more about this notation:
Eleivana07 wrote:
What is the space separator in the first solution?
Find (\s)(\p{space_separator}|\t)+ and replace with $1
It's not an underscore, is it?
A funny thing: it doesn't matter.
The name of this character group is "Space Separator", but
1. it is case insensitive (unlike almost all other GREP codes!)
2. it is separator insensitive! You can use 'space-separator', 'space_separator', 'space separator', and even 'spaceseparator'
It also has a shortcut: "Zs" (which also is case and separator insensitive, so you can use "\p{z-s}" or "\p{zS}"). The simple search string
\p{zs}{2,}
will find any two or more spaces in succession (excluding tabs, though).
Another freebie is that you can use the same code negated: \P{zs} will match anything not in this set.
There are loads and loads of useful named character groups described in Peter Kahrel's O'Reilly shortcut about GREP.
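To see which characters actually belong to the "Zs" group, the Unicode database that ships with Python can list them. This sketch is an assumption-laden stand-in: Python's built-in `re` does not understand `\p{zs}`, so the class is built by hand from `unicodedata`, and the exact membership depends on the Unicode version of the Python in use.

```python
import re
import unicodedata

# Collect every Basic Multilingual Plane character whose general category is Zs.
zs_chars = [chr(cp) for cp in range(0x10000)
            if unicodedata.category(chr(cp)) == "Zs"]

# Hand-built equivalents of \p{zs} and its negation \P{zs}.
zs_class = "[" + re.escape("".join(zs_chars)) + "]"
not_zs_class = "[^" + re.escape("".join(zs_chars)) + "]"

for ch in zs_chars:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

sample = "a  b\u2003\u00A0c\t\td"
runs = re.findall(zs_class + "{2,}", sample)       # like \p{zs}{2,}
words = re.findall(not_zs_class + "+", "a b")      # like \P{zs}+
print(runs)
print(words)
```

The pair of tabs in the sample is not reported, which matches the remark that tabs are excluded from this group.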
Theun,
I actually tried the short version, \p{zs}, suggested by Peter K before posting, and it was giving me strange results, returning single spaces and the first character following in a word. I did my testing in CS5.
Another point that you didn't bring up about the summary description:
This is actually removing whitespace BEFORE the paragraph breaks or forced line breaks. Space after a paragraph break is actually leading space on the first line of the following paragraph, and the first query would have caught that and removed it, since \s recognizes the paragraph break and \p{space-separator} recognizes the other types of space except the tab, which we also included in the "or" statement. So the only type of whitespace left after a paragraph break would be another break.
I actually left out the last of Theun's (Jongware's) queries on purpose. It would not be unusual to have a parenthetical followed by some punctuation mark, or a quote that ends a paragraph. Granted, adding a space back before the return would be invisible in the output, but we just went to a lot of trouble to remove them, and even more importantly we removed spaces preceding most punctuation and we definitely don't want to add them back.
Likewise, I can think of plenty of cases where you might be starting a paragraph with one of those punctuation marks (many of them restricted to technical sorts of work, of course), but I'm not sure it's a great idea to blindly add spaces as in his first query. I'd be more inclined to let Spell Check pick up that sort of odd situation and fix them on a case by case basis.
Cheers.
Hi Peter,
So are you saying that the query
Find (\s)(\n|\r), replace with $2
is redundant to the first query?
Is there a way to consolidate any of these steps?
I did some studying yesterday (Kahrel's stuff) but honestly it made my head hurt.
I don't mind doing each by each and I saved them all chronologically for easy access.
Like I said before, this would be a neat little 'broom and dustpan' for a lot of text.
I am still trying to figure out what in each of these queries makes them either ADD in a component or SUBTRACT a component.
Although I think I am understanding the syntax a little better (just a little)
Thanks again!
Hi Jongware
Thank you for taking the time to explain. Like I said to Peter, I am trying to wrap my head around these expressions, what they mean and how minor nuances can change them.
I do have a GREP cheat sheet that I reference and I DID obtain the query manager and GREP editor and played around with it trying to chain the 'greps' together to make one or two sweep throughs on my document.
It did some funny things and I was reluctant to use it. Thank god for Ctrl-Z
I do know that the following are the most important at this step for my book to look finished.
Would there be a way to write a query so that it only added a space at the correct location ONLY if it did NOT find one?
Curious
Eleivana07 wrote:
Hi Peter,
So are you saying that the query
- Find (\s)(\n|\r)
- Replace with $2
Is redundant to the first query?
No. It's necessary (or at least, in my opinion, desirable) in order to remove the extra space that you will occasionally see after the last real character in a paragraph, so it's supplemental, rather than redundant, to pick up the cases that didn't get fixed in order to preserve the paragraph breaks.
In a case where a paragraph ends period space space return, the first query will find the first space after the period, and it will see the second space as extra, but it will ignore the return, so the result will be period space return (the $1 in the change field is always the first space in a group and it is always preserved). In the case where the paragraph already ends period space return, there will be no change because the query does not recognize a group of spaces.
In the query above we are looking specifically at the case of <last non-space character> space return (though we don't look for the <last non-space character>). Because the first query has already removed all but one space everyplace there are multiples, this query looks specifically for the space/return combination and discards the space ($2 is the return).
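The "period space space return" case can be traced step by step with a small Python sketch. Assumptions: `ZS` stands in for `\p{space_separator}`, the names `pass_one` and `pass_two` are invented, and the sample deliberately contains no blank paragraphs, because Python's `\s` also matches returns and may not treat double returns the way InDesign does.

```python
import re

# Assumption: this explicit class approximates \p{space_separator} (Unicode Zs).
ZS = r"\u0020\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000"


def pass_one(text: str) -> str:
    """First query: collapse a run of spaces/tabs to its first whitespace."""
    return re.sub(rf"(\s)([{ZS}]|\t)+", r"\1", text)


def pass_two(text: str) -> str:
    """Second query: drop one whitespace before a break, keep the break."""
    return re.sub(r"(\s)(\n|\r)", r"\2", text)


before = "The end.  \rNext paragraph."
after_one = pass_one(before)      # period space return
after_two = pass_two(after_one)   # period return

print(repr(before))
print(repr(after_one))
print(repr(after_two))
```

Running `pass_one` on text that already ends period space return changes nothing, which is why the second pass is a supplement rather than a repeat.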
Would this be a fatal error if it didn't run? I would say no, and you didn't actually request the removal of whitespace at the breaks, but you struck me as the sort of person who would want a clean file.
Was that any clearer?
Eleivana07 wrote:
I do know that the following are the most important at this step for my book to look finished.
- No space BEFORE, one space AFTER: period, semicolon, colon, exclamation mark, question mark, CLOSING parenthesis, bracket, brace, single and double quotation marks
- No space AFTER, one space BEFORE: OPENING bracket, brace, parenthesis, double and single quotation marks
Would there be a way to write a query so that it only added a space at the correct location ONLY if it did NOT find one?
Curious
The query that Jongware provided above does exactly that -- adds a space after those punctuation marks if it doesn't see one, but as I said I don't think this is a good thing to automate. Consider this text:
"(1) GREP is a very powerful tool for automating changes by pattern recognition (but dangerous if misused)."
Adding a space before the first open parenthesis or after the last close would be mistakes, as would be adding a space after the period.
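Running the add-a-space idea over that exact sentence shows the problem. This is only a Python approximation of the two queries, with the InDesign quote codes written out as curly quote code points (an assumption) and `\g<0>` standing in for `$0`.

```python
import re

# Assumption: InDesign's ~{ ~[ ~} ~] are spelled out as curly quote code points.
OPENERS = r"\[{(\u201C\u2018"
CLOSERS = r".,;:!)\]}\u201D\u2019"

sample = ('"(1) GREP is a very powerful tool for automating changes '
          'by pattern recognition (but dangerous if misused)."')

# Add a space after closing punctuation that has none.
after = re.sub(rf"[{CLOSERS}](?!\s)", r"\g<0> ", sample)

# Then add a space before opening punctuation that has none.
both = re.sub(rf"(?<!\s)[{OPENERS}]", r" \g<0>", after)

print(after)
print(both)
```

The output ends in `misused) . "` and begins with `" (1)`: three added spaces, all of them unwanted.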
Hi Peter
Ha Ha. You got me. Yes I want a clean file mostly because this is my first book and keeping each subject distinct on its own 2 page spread is so critical to the book's layout. Some of the info is so tight that a few extra spaces really makes a difference.
We won't discuss the mild OCD.
I see what you're saying above and actually I hadn't thought about that. So you're right. And yes, you made it very clear.
Thanks again
> Adding a space before the first open parenthesis or after the last close would be mistakes, as would be adding a space after the period.
True, but you could narrow it down, e.g.
Find: \)(?=[\u\l])
Replace with: )\s
which could be made more precise. And something before the opening parenthesis.
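A Python approximation of that narrowed-down query, for experimenting outside InDesign. Assumptions: `[^\W\d_]` stands in for InDesign's `[\u\l]` (any uppercase or lowercase letter), and the name `space_after_paren` is illustrative.

```python
import re

# Assumption: [^\W\d_] approximates InDesign's [\u\l], i.e. any letter.
PAREN_THEN_LETTER = re.compile(r"\)(?=[^\W\d_])")


def space_after_paren(text: str) -> str:
    """Insert a space only where a closing parenthesis runs straight into a letter."""
    return PAREN_THEN_LETTER.sub(") ", text)


print(space_after_paren("see (a)below"))        # space inserted
print(space_after_paren("(if misused)."))       # left alone: a period follows
print(space_after_paren("(1) GREP is useful"))  # left alone: a space follows
```

Because the look-ahead demands a letter, the parenthesis-then-period case from the earlier example is no longer touched.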
Another useful addition is to remove all white space at the end of a story, which I don't think is caught by any of the queries mentioned here:
Find: \s+\Z
Replace with nothing
Unwanted space at the beginning of a story is less likely, and maybe you do want a tab there, but if you need to remove story-initial space you can do it using these:
Find: \A\s+
Replace with nothing
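The same two anchors exist in Python's `re`, so the story-level clean-up can be tried on a plain string. A minimal sketch, assuming the whole story is one string; the name `trim_story` is invented.

```python
import re


def trim_story(story: str) -> str:
    """Remove whitespace at the very end, then at the very start, of a story."""
    story = re.sub(r"\s+\Z", "", story)   # \Z: end of the whole text
    return re.sub(r"\A\s+", "", story)    # \A: start of the whole text


print(repr(trim_story("\t Once upon a time. \r\r")))
print(repr(trim_story("one \r two")))   # interior whitespace is not touched
```

Skip the second substitution if a leading tab at the start of the story is intentional.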
Peter
[thanks for the kind words about the ShortCut!]
Peter Kahrel wrote:
Find: \)(?=[\u\l])
Replace with: )\s
Does \s work in the change field? I thought that would be literal there...
I think maybe I'm just not convinced that the probability of a missing space is anywhere near as great as the probability of finding excess multiple spaces, or that automating a 100% foolproof way to add them is worth the effort, or even possible. Much as I think it's a mistake to trust in spell checkers for doing your proofing, a missing space after a parenthesis is the sort of thing I think would get picked up, just the way missing space after a period is flagged. I'm a lousy typist, but even I don't tend to miss when I lose a space, so I guess I'd rather see them on a case-by-case basis. Certainly that can be done with Find/Change, but not if you are scripting the queries, right?
> Does \s work in the change field? I thought that would be literal there...
It does, in the same way that \t inserts a tab in your document. It's handy to use \s and \t in the change field in things like forum posts, where you can't see space and tab characters.
> I'd rather see them on a case-by-case basis
I agree. But the challenge to find queries is sometimes irresistible!
> but not if you are scripting the queries, right?
Well, it could, but you'd just be repeating InDesign's Find/Change interface. The grep editor I scripted is useful for these things (I think in all immodesty). It highlights all matches in a document in the way that new versions of Word do. So rather than pressing Find all the time, you simply page through the document and you can clearly see all your matches.
Peter