My CQ5 application will include a few knowledge bases which will not live in the JCR but need to be searchable by Lucene, so the user can search the data within the JCR and the external data together. How can these data sources be added to Lucene's index and made searchable?
What do you mean by "will not live in the JCR"? Are these knowledge bases stored in CRX, or are they external to CRX, in files, databases, or similar? If you store them in CRX but provide no way to render them directly (via templates and components), they are still indexed and accessible via JCR queries.
However, Lucene is an implementation detail of Jackrabbit/CRX and is used only to back the JCR search; it is not designed to index data outside the repository. So if you want to index external files or databases, you cannot rely on the JCR search. I would recommend using an external, full-fledged search engine such as Apache Solr.
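If you do end up running two searches (the JCR query for repository content and an external engine for the knowledge bases), the results still have to be combined for the user. Below is a minimal sketch of one way to do that, assuming each backend has already returned its own ranked list. The `FederatedSearch` class, the `Hit` type and the `interleave` method are hypothetical names invented for illustration; they are not part of CQ5, Jackrabbit or Solr. The lists are interleaved rather than sorted by score, because relevance scores from two different engines are generally not comparable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not a CQ5/Jackrabbit/Solr API.
public class FederatedSearch {

    /** One search result; "source" records which index it came from. */
    public static final class Hit {
        public final String source;
        public final String title;

        public Hit(String source, String title) {
            this.source = source;
            this.title = title;
        }

        @Override
        public String toString() {
            return source + ":" + title;
        }
    }

    /**
     * Interleaves two already-ranked result lists (JCR hit first, then
     * external hit, and so on) and stops once "limit" hits are collected.
     * The order within each input list is preserved.
     */
    public static List<Hit> interleave(List<Hit> jcrHits, List<Hit> externalHits, int limit) {
        List<Hit> merged = new ArrayList<>();
        Iterator<Hit> jcr = jcrHits.iterator();
        Iterator<Hit> external = externalHits.iterator();
        while (merged.size() < limit && (jcr.hasNext() || external.hasNext())) {
            if (jcr.hasNext()) {
                merged.add(jcr.next());
            }
            if (merged.size() < limit && external.hasNext()) {
                merged.add(external.next());
            }
        }
        return merged;
    }
}
```

In a real component, the first list would be filled from the JCR query result and the second from the external engine's response before calling `FederatedSearch.interleave(jcrHits, externalHits, 10)`. If both kinds of content must be ranked together properly, the simpler route is usually to push the repository content into the external index as well and search only there.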