In Photoshop's Performance preferences, check whether the GPU has been disabled. That can happen if Photoshop detects a problem with it.
As near as I can tell, it's enabled... In preferences I have:
Use Graphics Processor is checked, and it correctly identifies my Radeon 5750 card.
In Advanced Settings, Drawing Mode is "Advanced" and "Use Graphics Processor to Accelerate Computation" is checked.
Use OpenCL is NOT checked.
Anti-alias Guides and Paths is checked.
30 Bit Display is NOT checked.
Is there some setting I need to change?
Have read it is a slow filter, even more so with 16 bit images.
You might try each of the Drawing Mode settings (Basic, Normal, and Advanced) to see if anything changes.
I'm not sure the latest ATI driver is the best; which one are you using? Driver releases are still sorting out CS6 features and how best to work with them.
Yeah, I've read that it's slow too, but c'mon THREE major releases and the thing is still unusable?
Anyhow, it's the current driver from ATI, so I sure hope it's the best one.
Anybody else got any ideas? Adobe? Like I said, it's gotta be SOME setting 'cause I can't believe they'd STILL have something that's this slow after this much time...
I find Surface Blur in 16-bit mode can take minutes with the CPU at full utilisation, whereas in 8-bit mode, it takes seconds when everything else is equal. That suggests some optimisation is missing when the filter runs in 16-bit mode, making it virtually unusable.
I just tested a bit...
I have a highly detailed landscape image of 9000 x 6000 pixels (not small). It's 16 bits/channel.
Surface Blur ran these times to completion on the entire image:
- 15 seconds at Radius: 12, Threshold 15
- 7.2 seconds at Radius: 8, Threshold 8
- 3.6 seconds at Radius: 5, Threshold 15
Clearly it's influenced by the Radius most of all. So the question is, what settings are you choosing?
I noticed by watching Task Manager that this filter multi-threads well, busying all 24 of my logical processors to near 100%. It's safe to say it will be faster on a system with more cores. Do you have Hyperthreading enabled on your system (some BIOSs let you turn it off)?
Given that Surface Blur is divvying up the image into tiles to multi-thread, it could also respond to changes to the Cache Tile Size setting in the Performance preferences.
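For intuition about why Radius dominates: Surface Blur behaves like a threshold-gated neighborhood average (a bilateral-style filter), so the work per pixel grows roughly with (2·Radius + 1)². The sketch below is my own illustration of that idea and of band-based multi-threading; it assumes nothing about Adobe's actual implementation, and the function names are hypothetical.

```python
# Illustrative sketch only -- NOT Adobe's implementation. Each pixel is
# replaced by the mean of nearby pixels whose values differ from the
# center by less than Threshold, so per-pixel work grows with the
# (2*radius + 1)**2 window area. Bands are processed in parallel.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def surface_blur_tile(img, radius, threshold, row_start, row_end):
    """Blur one horizontal band of a single-channel image."""
    h, w = img.shape
    out = np.empty((row_end - row_start, w), dtype=img.dtype)
    for y in range(row_start, row_end):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = img[y0:y1, x0:x1].astype(np.int64)
            center = int(img[y, x])
            mask = np.abs(window - center) < threshold  # the Threshold gate
            out[y - row_start, x] = window[mask].mean()
    return out

def surface_blur(img, radius, threshold, workers=4):
    """Split the image into horizontal bands and blur them in parallel."""
    h = img.shape[0]
    bounds = np.linspace(0, h, workers + 1, dtype=int)
    bands = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: surface_blur_tile(img, radius, threshold, b[0], b[1]),
            bands,
        )
        return np.vstack(list(parts))
```

Because every pixel in the window must be compared against the center, doubling Radius roughly quadruples the work, while Threshold only changes which neighbors get averaged, not how many are examined.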
Yes, of the filter's two parameters, Radius has the most significant effect on filter time when all else is equal. But my point was that, keeping all else equal, 16-bit vs. 8-bit shows a massive difference in time: an 8-bit image may take 10 seconds, and with the only change being 16-bit, the filter may take about 10 minutes.
Examples with a 12-megapixel image:

Radius 10, Threshold 15
- 8-bit: 2 secs
- 16-bit: 35 secs

Radius 20, Threshold 15
- 8-bit: 6 secs
- 16-bit: 140 secs

Radius 40, Threshold 15
- 8-bit: 9 secs
- 16-bit: 530 secs
That looks like a missing optimisation in the filter when processing 16-bit images.
I'm using an i7-920, quad core, and yes, hyperthreading is enabled. And when this filter is running it pegs all EIGHT threads at 100% for the whole time.
I haven't timed things at 16-bit 'cause it runs into minutes, but it's SLOW... I suspect my numbers would be about like conroy's. Noel, could you try a test on a box a little less top-end-fringe (maybe 4 processors and 8-16GB and so on) and see what kind of numbers you get?
Noel, since the image is getting divided into tiles, do I want LARGE tiles or SMALL tiles?
If I recall correctly, I'm using threshold of 7, radius 15-20.
Something is clearly wrong:
100 MB file, Radius 7, Threshold 20: time under 13 sec.
I'm running an AMD Athlon II 4 core. All cores running also.
Tile size max.
"Noel, could you try a test on a box a little less top-end-fringe (maybe 4 processors and 8-16GB and so on) and see what kind of numbers you get?"
Unfortunately I don't have an actual 4 or 8 core workstation handy running Photoshop CS6.
However, I have virtual machines on this big host system on which I can test things, and I can vary the virtual hardware. I configured a Windows 7 x64 VM with 4 virtual cores and 8 GB RAM and opened a 12 megapixel (4000 x 3000) 16 bits/channel image into Photoshop CS6.
radius 15, threshold 7
- 15.6 seconds (16 bits/channel)
- 1.2 seconds (8 bits/channel)
radius 20, threshold 7
- 26.8 seconds (16 bits/channel)
- 1.6 seconds (8 bits/channel)
Here's where things get a bit weird, though...
My host system (12 core w/hyperthreading) finished the radius 20 threshold 7 Surface Blur on the 16 bits/channel file in just 1.4 seconds! I checked it several times just to be sure I didn't screw something up in the test, and this is repeatable.
Even if the scaling were perfect at the 24/4 logical-processor ratio of 6, that should only have brought the time down to 26.8 / 6, or a bit more than 4 seconds.
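The scaling arithmetic can be checked directly:

```python
# Perfect-scaling lower bound for the host, from the VM measurement.
vm_time = 26.8               # seconds on 4 logical processors (Radius 20, Threshold 7)
ratio = 24 / 4               # host logical processors vs. the VM's
best_case = vm_time / ratio  # best the extra cores alone could deliver
print(round(best_case, 2))   # prints 4.47 -- yet the host measured 1.4 s
```

So the host beat the best possible core-count scaling by roughly 3x, which is what makes the result suspicious.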
I honestly don't know whether Surface Blur makes use of the Mercury Graphics Engine and has GPU acceleration capabilities, but my VM offers Photoshop far less GPU acceleration than the host system, which has full capability and a good video card (Radeon HD 7850).
These numbers certainly don't seem to be scaling as you'd expect, do they?
I can only advise experimenting with the various tile sizes (remembering to close and reopen Photoshop after any change before testing).
Thanks for experimenting Noel. Sounds like I should just try fiddling with a few settings and see if it makes any difference. It appears that for some folks the surface blur works fine, for others like me, it's extremely slow in 16-bit... I'll mess around a bit.
You know that two of us other than Noel, with machines much closer to yours, actually posted data as well. Noel's results still reflect that his machine has capabilities that place it beyond those of ordinary users.
I did run additional tests that are more in line with conroy's, but apparently, so what; I'll leave it between you and Noel to figure it out. He's one of the best, so good luck.
Thing is, my virtual machines are running on my workstation - so the core speed and instruction set is the same, but there just aren't as many logical processors simultaneously working on the task. And the GPU acceleration facilities inside a VM are greatly reduced. That my VM runs the function MUCH more slowly than can be explained by the reduction of logical processors says there's something far from obvious going on here.
Here's a 12 MP 16 bits/channel image to test with. Let's use this so we're all on the same page from here forward:
Lawrence, perhaps your result didn't get much acknowledgement because you mentioned the file size you tested with was "100M". Can you re-run your test using the above 4000 x 3000 pixel x 16 bits/channel file?
I just re-verified these measurements with this image:
4 core (4 logical processor) 8GB RAM:
- 17 seconds - Radius 15, Threshold 7
- 27 seconds - Radius 20, Threshold 7
12 core (24 logical processor) 48GB RAM:
- 1.2 seconds - Radius 15, Threshold 7
- 1.4 seconds - Radius 20, Threshold 7
I tried disabling the use of the Graphics Processor and it made no difference to the times - they were exactly the same, which pretty much proves the Mercury Graphics Engine isn't involved and that this particular filter relies solely on the CPU.
Clearly something is being done FAR more efficiently by the workstation than by the VM running on it: 15x faster with only 6x the number of logical processors working on the task. I can't say why, but I wonder if it's because the data is being divided into smaller chunks, crossing a threshold where the data each processor needs to work on can be held entirely in the cache.
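The cache theory is easy to put rough numbers on. This is back-of-the-envelope arithmetic only; the cache size is an illustrative assumption, not a measurement, and it says nothing about how Photoshop actually partitions the work.

```python
# Per-thread share of the shared 12 MP test image if the work is split
# evenly across threads. Cache sizes are assumptions for illustration.
PIXELS = 4000 * 3000   # the 12 megapixel test image
CHANNELS = 3           # RGB

def slice_mib(threads, bytes_per_channel):
    """MiB of image data each thread touches if the split is even."""
    return PIXELS * CHANNELS * bytes_per_channel / threads / 2**20

L3_CACHE_MIB = 8       # e.g. an i7-920's shared L3 (assumed, for illustration)

for threads in (8, 24):
    mib = slice_mib(threads, 2)   # 2 bytes/channel for 16-bit data
    fits = "fits in" if mib <= L3_CACHE_MIB else "exceeds"
    print(f"{threads} threads: {mib:.1f} MiB per thread, {fits} {L3_CACHE_MIB} MiB of L3")
```

With 24 threads each slice (under 3 MiB) can live in a big L3 cache; with 8 threads each slice (over 8 MiB) cannot, which would fit the better-than-linear speedup.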
Dave, did you experiment with making the Cache Tile Size smaller? Did that help?
The only other thing I've run across where my workstation seems to perform radically better from others was when some folks reported that having the Layers Panel thumbnails on was causing them to experience a significant slowdown with moving layers or groups around in large documents. I didn't see the radical slowdowns - the difference in speed (thumbnails on vs. off) was only very minor.
Oh geez, just shoot me now, and save me from the misery of old age!
I had Threshold and Radius reversed for the tests on the workstation.
I've grayed out the results above. When I actually set the parameters properly this is what I really see:
12 core (24 logical processor) 48GB RAM:
- 5.4 seconds - Radius 15, Threshold 7
- 9.0 seconds - Radius 20, Threshold 7
THAT fits better with the theory that there are just more cores working simultaneously on the task.
Many apologies! It took reviewing the parameters to the Surface Blur filter yet another time to realize I was swapping them.
I'll just go crawl off and sulk now.
The 100 MB file is close enough for government work... 6921 x 2522.
75 sec - Radius 20, Threshold 7
Flip the numbers and it's just under 13 sec. So your machine is faster in any case. Remember, this machine is not hyperthreaded, nor running SSDs or even RAID.
I interpret the OP's ignoring two other contributors as basically rude. After all, my file is also closer to yours than the OP's.
I also did the first one reversed, but I am older than you!
I have been messing with Audition for the last two days, trying to get a 60 Hz buzz (not hum) out of a file. You think you have problems. Hell, I couldn't easily find the forum!
I made it work.
Edit: Add keyboard dyslexia to the mix! And spelling!
Sorry guys, not being rude, just fragmented... My wife has me working like a dog, remodeling yet ANOTHER room, so I'm not getting a lot of time to fiddle around in here.
I DID do a little testing late last night. Rebooted and didn't start anything else up - my normal system has Outlook running all the time, and Lightroom running with Photoshop 90+ % of the time.
SO, with JUST Photoshop running I tried a test on a regular 4288x2848 .dng file.
With threshold 7 and radius 15 my time was 70 seconds +/- a couple
With threshold 7 and radius 20 my time was 89 seconds +/- a couple
I tried changing tile sizes from my 1024K default. At 1028K the times didn't change; I changed it to 128K and the times still didn't change. So it doesn't appear tile size has much impact. Changing Threshold and/or Radius alters the times a lot... At Threshold 3 and Radius 30, times jump to around 180 seconds, but at least 20 of those are the initial preview blur before it even accepts the "OK" to start processing...
Anyhow, on my system it appears that Surface Blur, much more than other operations, is dependent on what else is going on with the system: how much memory is locked up, or what other applications are open.
For what it's worth, converting to 8-bit makes the surface blur with threshold 3 and radius 30 run in just over a second...
First off, I know my system isn't 10x more powerful than yours (benchmarks would imply more like 4x), yet your times are 10x as long for the same operation on 16 bit data. That's a clue that something specific is a bit wrong right there.
Secondly, you're seeing a 100 to 1 difference in speed between 16 bit and 8 bit. That doesn't fit well with others' observations that are more like 20 to 1.
So the question is: What could be wrong that would make your system particularly worse at Surface Blurring 16 bit data?
Your processor has a good amount of cache, though I suppose it's possible that the 16 bit document is just so much larger that it tends to flush through the cache, while the 8 bit document fits better.
Do you see a similar 100 to 1 speed difference in Surface Blurs of, say, a 6 megapixel or 3 megapixel image, 16 vs. 8 bit?
Have you benchmarked your system? I wonder if there could be a misconfiguration or a fault that's holding it back from delivering as much performance as it can... Consider trying the Passmark Benchmark application (which offers a free trial and allows you to compare with similar systems via an online database).
I'm surprised the performance didn't change much with the change of tile size. Maybe they don't use that setting in dividing up the work in this particular filter.
I haven't benchmarked the system in a while - probably a year. What I find MORE bizarre than the ratios I'm seeing at the moment is the difference between the other night and yesterday after a reboot and having nothing else running. I timed one operation for as long as I could stand the other night when I entered the initial note, and it ran over 10 MINUTES, with all 8 threads pegged. In fact, much of the time I couldn't tell if ANYTHING was happening.
After the reboot, and retesting, it's slow but NOTHING like it was. If I get a chance this week, just for giggles I'll try the Passmark Benchmark and see if my system comes up way short somewhere.
I also ran chkdsk on all the spindles in the box last week and had it repair any bad sectors. It didn't find anything. I also ran a defragmentation on all the disks involved in Photoshop or Lightroom, but again, they get defragged automatically weekly, so that didn't find anything either.
Worst case, if I DO want to use the surface blur, I may have to do everything else, save the .psd, THEN convert to 8 bit, blur, output, and not save the changes to the psd so it's still a normal 16-bit file. Kludgy, but it's that or use low radius values and just figure it'll take between 60 and 90 seconds, which is still a whole lot better than what I was originally seeing.
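That round-trip can be sketched in code. This is a hedged illustration of the workaround only: the blur is a naive stand-in, NOT Photoshop's Surface Blur, and the 16↔8-bit mappings are one reasonable choice (Photoshop's exact internal rounding may differ). All function names here are hypothetical.

```python
# Workaround sketch: drop a copy to 8-bit, run the slow filter there,
# convert the result back up, and leave the 16-bit master file unsaved.
import numpy as np

def to_8bit(img16):
    # One reasonable 16->8 mapping: scale 0..65535 down to 0..255.
    return (img16.astype(np.uint32) * 255 // 65535).astype(np.uint8)

def to_16bit(img8):
    # Scale back up; the lost precision is the price of the kludge.
    return (img8.astype(np.uint32) * 65535 // 255).astype(np.uint16)

def mean_blur(img8, radius):
    """Naive mean filter standing in for the expensive 8-bit blur step."""
    h, w = img8.shape
    out = np.empty_like(img8)
    for y in range(h):
        for x in range(w):
            win = img8[max(0, y - radius):y + radius + 1,
                       max(0, x - radius):x + radius + 1]
            out[y, x] = win.mean()
    return out

def blur_via_8bit(img16, radius):
    """16-bit in, 16-bit out, but the heavy lifting happens at 8-bit depth."""
    return to_16bit(mean_blur(to_8bit(img16), radius))
```

The round-trip costs about 1 part in 255 of precision on the blurred layer, so the unblurred 16-bit .psd is worth keeping, as described above.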
OK, things are getting a bit confusing. I don't count the time it takes to generate the preview. Are you adding the preview and the filter run together, or reporting the total as the sum of both? Actually, just opening the dialog sets a clock running, and possibly that is added as well. But probably not: I'll see the clock sitting at several seconds, do a step, and the clock then shows less time, so I presume it resets for each different task. That's not true for repeated Blur invocations, though: Blur, Cancel, Blur, Cancel etc. does sum up the steps.
It is a function of the bit depth. I ran the Blur at 1/2 size, retaining 16 bit and my processing time dropped by half. But 8 bit also cuts the size by 1/2 and is much quicker.
I also tried 32 bit, but it did a weird blur.
So at Radius 20, Threshold 7:
8 bit: 4.9 sec
16 bit: 77 sec
16 bit, 1/2 size (50 MB): 35 sec
So your system at this point is roughly the same speed in this test as mine, and tracks the conversion to 8 bits. I don't have hyperthreading, so I suspect the core count doesn't matter much, i.e. it's likely not heavily threaded, if at all. Task Manager does show all four cores running, at any rate.
I can set this system to a single core and I may try that.