It might not be the cause of the problem, but it's something you have to correct. The cfmail tag already includes output functionality, so you should do this instead:
<cfmail from="#CFEvent.data.from#" to="#trim(stremail)#" subject="#CFEvent.data.sujet#" type="html">
<!---- ... a bit of other stuff --->
Then modify "a bit of other stuff" where necessary. Do you create any objects in "a bit of other stuff"? A java.lang.OutOfMemoryError often results when the JVM is handling too many live objects.
here is the complete loop :
But even without the "other stuff" in the loop, it still crashes.
I'll try... but since I have the same problem with cffile (create and delete) in place of cfmail, I have very little hope...
I tried cfloop instead of cfoutput, and 2 cffile actions (write then delete) instead of cfmail... and it still crashes!
"Error","Thread-12","04/24/06","10:20:32",,"Error invoking CFC for gateway emailblaster: null"
With my "update every 100 loops" query, I can see that it crashed after the 417,200th loop.
- clean the email addresses in the db before you send?
- batch the email via the asynch gateway into more reasonable chunks, since
you're using an asynch gateway i guess you don't need immediate feedback
- lose the query/loop & use the query on the cfmail tag itself
- since you don't seem to need immediate feedback, parse the sent mail logs via
an asynch gateway to see which emails failed to send, etc.?
is mail spooling turned on? can your SMTP server keep up?
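For the "use the query on the cfmail tag itself" suggestion, a minimal sketch might look like the following. The table and column names are made up for illustration, "mydsn" stands in for the real datasource, and CFEvent is assumed to be the argument of the gateway CFC's listener method, as in the cfmail snippet earlier in the thread.

```cfml
<!--- Hypothetical table and column names; replace with the real ones --->
<cfquery name="recipients" datasource="mydsn">
    SELECT email
    FROM newsletter_subscribers
    ORDER BY email
</cfquery>

<!--- With the query attribute, cfmail sends one message per row of the query,
      so no surrounding cfloop or cfoutput is needed --->
<cfmail query="recipients"
        from="#CFEvent.data.from#"
        to="#trim(email)#"
        subject="#CFEvent.data.sujet#"
        type="html">
    <!--- message body goes here --->
</cfmail>
```

This removes the explicit loop, but it also removes the place where a per-row status update would go, so it only fits if that feedback can come from somewhere else (for example the mail logs).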
Cfloop is necessary, in any case. You should not use cfoutput around cfmail, because cfmail implicitly includes output functionality.
The rate at which the query object is re-created could be the problem. Although you've reduced the rate to every 100th, that could still be too fast for the Java Virtual Machine. Experiment with every 10000th and, if successful, every 1000th, and so on. Also try-catch the query code.
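As a rough sketch of that experiment, the update interval can be kept in one variable and the update wrapped in try/catch. Everything named here is an assumption for illustration: the status table and its columns, the jobId value, and the plain index loop that stands in for the loop over the recipient query.

```cfml
<!--- Assumed values: adjust the interval between runs (10000, then 1000, ...) --->
<cfset updateEvery = 10000>
<cfset totalRows = 800000>
<cfset jobId = 1>

<cfloop from="1" to="#totalRows#" index="i">
    <!--- ... send the mail (or write/delete the test file) for row i ... --->

    <cfif i mod updateEvery eq 0>
        <cftry>
            <!--- Hypothetical status table --->
            <cfquery datasource="mydsn">
                UPDATE envoi_status
                SET last_row = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer">
                WHERE job_id = <cfqueryparam value="#jobId#" cfsqltype="cf_sql_integer">
            </cfquery>
            <cfcatch type="database">
                <!--- A failed status update is logged but does not stop the run --->
                <cflog file="emailblaster" type="error"
                       text="Status update failed at row #i#: #cfcatch.message#">
            </cfcatch>
        </cftry>
    </cfif>
</cfloop>
```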
> Cfloop is necessary, in any case. You should not use cfoutput around cfmail,
uh, why is it necessary?
Still on the write/delete instead of cfmail (for testing purposes), I replaced the update query with an "every 1,000 loops, do not delete the file" rule. It crashed at 418,000 loops.
- Why not clean the email before its insertion in the database? I already do, but I want to be sure the sending process won't crash because someone manually (and badly) changed an address in the database, or because of anything else that can lead to a badly formatted address...
- Splitting into chunks... what a shame!!! Gateways are designed for such tasks, that is to say long-running queries. The asynchronous process is made to avoid the "blank loading page" of these long tasks. Do you mean I need to make a gateway that handles the whole process and that calls, for example every thousand emails, another gateway that sends the emails?!?
One thing is funny: I chose to use gateways in order to test them. For other emailing processes I already made, I built a form that leads to the creation of a scheduled task. The task is then a simple cfm page that runs (almost) exactly the same code: loop over query, one cfmail per loop, one cfquery update every 100 loops. And it works like a charm (since CF MX 6.1... before 6.1 cfmail was crashing under load, and the solution I chose was to create a txt file in the "pickup" directory of the IIS SMTP).
- Which emails crashed: not a specific address. For example, if it crashes after the 101,000th email and I call the gateway again, this time specifying to start at the 100,000th email, then the next crash will occur around 200,000 (not 101,000).
- Is the SMTP server the cause? No, it crashes with cffile instead of cfmail too. The server also handles lots of emails with the other solution I explained (scheduled task).
I'm a bit confused... I will probably switch back to the scheduled task solution...
More often than not, java.lang.OutOfMemoryError means too many objects at once. I first suspected the query, verif_envoi_nl. But since you don't store query-objects in an array or structure, the problem could not be coming from there.
However, the next suspect still has to do with the query. What happens when you replace THIS.datasource with just the name of the datasource? Elsewhere, is the CFC's timeout high enough?
> - Why not clean the email before its insertion in the database? I already
> do, but I want to be sure the sending process won't crash because
you can't crash the "sending process" w/bad email, it will only NOT send those
> someone manually (and badly) changed an address in the database, or because of
> anything else that can lead to a badly formatted address...
you need procedures, etc. to manage that sort of thing. for instance, if your db
has triggers, you can have it verify the email when/if somebody messes w/it.
> - Splitting into chunks... what a shame!!! Gateways are designed for such
> tasks, that is to say long-running queries. The asynchronous process is made to
> avoid the "blank loading page" of these long tasks. Do you mean I need to make
> a gateway that handles the whole process and that calls, for example every
> thousand emails, another gateway that sends the emails?!?
probably several ways to handle this. here's one:
one asynch gateway, one CFC, pass in query conditions & let it work (do the
query, send the mail). instead of one monster loop of 500k, maybe 10 of 50k
would work better. it's all the same to the user/main app--it hands off the
processing to the gateway & goes on to do other things. you can easily call the
gateway 10x in the main app, since everything's happening in the gateway it's
very fast as far as the user/main app is concerned.
i'm not sure what your UPDATE query is actually used for but you could easily
hand-off parsing send logs, etc. to an asynch gateway (could even be the same
one that's running the email, you can specify a different CFC to use) to handle
that. or if you really wanted to get "fancy" have a look at sean's concurrency
i've combined gateways before. for example, SMS & asynch gateways work very well
together (cf can melt down SMS aggregators fairly easily). so having more than 1
gateway handle a job's not a bad idea.
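A minimal sketch of the "10 of 50k instead of one monster loop" idea, seen from the calling page. The gateway name "emailblaster" comes from this thread; the startRow and endRow keys are hypothetical, and the CFC would have to read them from CFEvent.data and restrict its query accordingly.

```cfml
<!--- Assumed figures: 500,000 recipients handed off in chunks of 50,000 --->
<cfset totalRows = 500000>
<cfset chunkSize = 50000>

<cfloop from="1" to="#totalRows#" step="#chunkSize#" index="startRow">
    <!--- A fresh struct per message, so queued messages do not share data --->
    <cfset emaildata = structNew()>
    <cfset emaildata.startRow = startRow>
    <cfset emaildata.endRow = min(startRow + chunkSize - 1, totalRows)>

    <!--- Returns immediately; the chunk is processed by the gateway --->
    <cfset status = SendGatewayMessage("emailblaster", emaildata)>
</cfloop>
```

Note that with more than one event gateway processing thread configured, several chunks may be processed at the same time, which changes the load on the mail spool and the database compared with a single sequential loop.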
I set the cfc AND gateway timeout to ... 24 hours (86400 seconds)
In the "gateway creation" cfm file, I put <cfset emaildata.timeout = 86400> then <cfset status = SendGatewayMessage("emailblaster", emaildata)>
And in the emailblaster CFC :
<cfsetting requestTimeOut = "86400" >
just after the <cfcomponent ...> tag
The THIS.datasource is set just after the requestTimeOut at the beginning of the cfc.
<cfset THIS.datasource = Application.datasource>
in application.cfm, <cfset Application.datasource = "mydsn">
I changed this and now use <cfset datasource = "mydsn"> at the beginning of the CFC, and "#datasource#" in the queries.
No change: it crashes at exactly the same "thousand" as before: 418,000.
I attached the complete CFC (sorry, French comments, tab/space mess...)
The gateway + separate CFC for chunks: why not. Maybe I will have to do something like that, but I'm not happy with this solution, as I want flexibility (the person could specify the origin of the emails, for example a table name or a list of comma-separated addresses, or ...). It leads to one more "gateway", I mean a CFC that passes data to another CFC which does the job (sending the emails).
The UPDATE query is useful for giving real-time feedback (another cfm page shows which table is used, which row number is currently being processed, and whether the sending process is done or not)... and if it crashes, I'm sure to have information on where the process stopped. For example, when the OutOfMemory exception occurred, no cferror was thrown, so the code in the cfcatch tag was not executed... and I have no feedback on where it crashed.
I use a table mainly because the server that sends the emails is not the main (web and back-office) server, and the two can't easily access each other's disks. Sure, I could use a log plus a task that parses the log, but that's rather heavy work... The fact is that it crashes even without the query, and it crashes with cffile too.
I am studying the code. In the meantime, what value is set for the mail spool interval in the ColdFusion Administrator? I ask because the number of mail delivery threads is roughly proportional to the spool interval. (Incidentally, I don't see the need for the try-catch within a try-catch.)
The cf admin settings :
- Maintain connection to mail server : checked
- Spool Interval (seconds) : 15
- Mail Delivery Threads : 25
- Spool to : disk
- Max number of messages spooled to memory : 50 000
... but as I said before, it also crashes without cfmail, using just cffile action=write / delete
The try/catch inside another try/catch is very simple:
if there is a crash inside the query output loop because of a problem generating 1 email, I don't want to stop the process. But if there is a problem outside the loop, I want a "clean" crash and an email sent back to me with information about the situation when it crashed.
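The structure described above could be sketched roughly as follows. This is not the attached CFC: the two-row in-memory query, the addresses, and the log file name are placeholders chosen to keep the example self-contained.

```cfml
<!--- Stand-in for the real recipient query --->
<cfset recipients = queryNew("email")>
<cfset queryAddRow(recipients, 2)>
<cfset querySetCell(recipients, "email", "first@example.com", 1)>
<cfset querySetCell(recipients, "email", "not-an-address", 2)>

<cftry>
    <cfloop query="recipients">
        <cftry>
            <!--- Inner block: a problem with one email must not stop the run --->
            <cfif not isValid("email", trim(recipients.email))>
                <cfthrow message="Badly formatted address">
            </cfif>
            <!--- ... build and send the email for this row ... --->
            <cfcatch type="any">
                <cflog file="emailblaster" type="warning"
                       text="Row #recipients.currentRow# skipped: #cfcatch.message#">
            </cfcatch>
        </cftry>
    </cfloop>

    <!--- Outer block: anything that fails outside the per-row handling --->
    <cfcatch type="any">
        <cfmail from="noreply@example.com" to="admin@example.com"
                subject="emailblaster stopped" type="html">
            The job stopped: #cfcatch.message#
        </cfmail>
    </cfcatch>
</cftry>
```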
The question about try-catch was out of curiosity. Your explanation makes sense.
> ... but as I said before, it also crashes without cfmail, using
> just cffile action=write / delete
I remember that. I need the number of threads because I'm simulating your mailing procedure. While we're on the subject, how far will your loop go, with no other code than just
<!--- nothing here --->
I want to be able to send 1,000,000 emails without error. (My database holds 800,000 people, so I want to be able to contact them all if needed.)
What I mean is, how many loops do you get when there is no code in the loop?
OK, sorry, I didn't understand.
I tried with nothing in the loop. It takes just a few seconds to complete the whole task, and I get no error. I looped over 802,052 addresses, the total in my database.
I can try with more (by creating a dummy table with more than one million rows), but I think there will be no change.
We've won one battle.
a loop that does nothing isn't very useful ... even if it doesn't crash !-)
> huuu ???
> a loop that does nothing isn't very useful ... even if it doesn't crash !-)
I have, in the past, got Java out-of-memory errors with templates that have
non-ascii characters in them, although the last time I saw this would have
been CFMX6.0, I think (although that could well be because I make sure I
don't have any in there, any more!)
What happens if you run your CFFILE version of the loop, but first strip out any
characters from your comments that have graves, acutes or cedillas?
This seems like a dumb suggestion, but it's easy to test, and removes it as
a possibility. I - too - see nothing in your code that raises a red flag.
It's an achievement that the component code executes successfully, and in just a few seconds. In my opinion, it helps in the process of elimination.
For example, if the empty loop hung or also caused an outOfMemory error, then it would mean some external process associated with the gateway is manufacturing objects in the background. In fact, I'm currently looking for any sources of live objects and also for a way to gracefully free memory.
I would also do the following, even if only to eliminate them as possible suspects:
- Verify that there is just one, and only one, clean call to SendGatewayMessage()
- Set initial heap size = max heap size = 1024 MB and maxPermSize = 256 MB
(reference: ColdFusion MX: Tips for performance and scalability )
- delete the tag, <cfsetting>, from the component code (It is related to the debugging process, a major memory consumer, and so is a suspect by association)
- experiment with a larger value of "Event Gateway Processing Threads" (in ColdFusion Administrator => Event Gateways => Settings)
- I removed the cfsetting timeout: no change.
- I removed non-ASCII chars... no change (crash at 355,000).
- I changed the JVM settings... no changes.
- there is one clean call to the gateway : <cfset status = SendGatewayMessage("emailblaster", emaildata)>
- the settings in the cfadmin were : 10 threads / max 250,000 events in queue. I tried 20 / max 1,000,000.... no changes (crash at 355,000 loops)
The "funny" thing I noticed is that yesterday the crash was around 412,000, and today it's at 355,000 loops.
I restarted ColdFusion to get a fresh JRun instance and restarted the task: same crash at 355,000 loops.
What does it mean? Maybe it's something related to the database content, such as an email address that causes problems... As the query sorts the results alphabetically by email, a new address inserted between yesterday and today could be the one causing problems.
So I added one more thing: I don't delete the txt files after 355,000 loops, so I can see the last email processed and check whether it crashes at the same row every time. The first time I tried, it crashed at 355,849.
The last address processed was "jkcay.ofutueq AT wanadoo.fr" (I changed the order of some letters to avoid displaying the real address): no space, no accent, nothing special :-(
The next address was "jkcay.cancoise-freromau AT wanadoo.fr" (same treatment of the letters). Once again, nothing noticeable!
The previous and following 10 addresses have nothing special either.
I ran the test a second time... it crashed at 355,950. But this time (as I run my tests on our live server, and new emails were inserted between the two tests) the last email processed came before the one from the previous test.
I mean that the last email processed the second time was eleven rows before the last one processed the first time.
So... I'm now quite sure it's not related to the content of the database. It seems more like a timeout or the number of objects in memory, ...
> The "funny" thing I notice, is that yesterday the crash was around 412,000, today it's 355,000 loops.
Coincides with one of my pet hypotheses. There could be a memory-intensive code-block or process, outside of the component, that the JVM runs while the component code is busy. Where the loop crashes would then depend, for example, on when the rogue-process started and when its memory usage peaked.
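One way to look for that pattern from inside the component, without any external tool, is to log the JVM heap figures every so often while the loop runs. This is only a sketch: the interval, the log file name, and the plain index loop (standing in for the real loop over the recipients) are assumptions. It relies on the standard java.lang.Runtime methods totalMemory(), freeMemory() and maxMemory().

```cfml
<!--- Handle on the JVM that ColdFusion itself is running in --->
<cfset runtime = createObject("java", "java.lang.Runtime").getRuntime()>
<cfset logEvery = 10000>

<cfloop from="1" to="800000" index="i">
    <!--- ... the real work for row i goes here ... --->

    <cfif i mod logEvery eq 0>
        <!--- Heap currently in use and the configured ceiling, in MB --->
        <cfset usedMB = (runtime.totalMemory() - runtime.freeMemory()) / 1048576>
        <cfset maxMB = runtime.maxMemory() / 1048576>
        <cflog file="emailblaster" type="information"
               text="Row #i#: #numberFormat(usedMB, '9999.9')# MB used, #numberFormat(maxMB, '9999.9')# MB max">
    </cfif>
</cfloop>
```

If the used figure climbs steadily with the row count, the loop itself is holding on to memory. If it stays flat and then jumps shortly before the crash, that would be more consistent with something outside the component.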
"One limitation that you need to be aware of is that you cannot use this utility from Terminal Services. "
I can only access the server through TS... the server is 50 km away from me :-(
I will try on a local dev server... the config is quite different (slower CPU, only 1 GB of RAM, database on the same server as CF...), but why not.