0 Replies Latest reply on Jan 31, 2007 1:49 AM by tudsytoo

    Is Flex right for me and my project? Developers sought

    tudsytoo
      Sorry for the long post. I want to know if Flex/Apollo would be a good platform for the following project.
      There's also a question of how to store the data, but it will almost certainly be XML.

      Briefly, here are the questions, with more background to make sense of
      them below. I think it's an interesting project and we're looking for contributors!

      1. What's a good way to store about 20,000 texts that are highly
      hyperlinked, so that one would have a high degree of flexibility in
      how to display them within both desktop and web-based applications?

      2. What's a good platform for developing an application to display
      the texts (usually several at once) in a flexible manner that would also
      a) make it easy to find developers who can work in the platform
      b) be as OS independent as possible
      c) enable one to utilize as many already existing modules as
      possible, to speed development
      d) make it easy to build a web interface to the same text database
      e) allow the code to be released under some kind of open source license.

      The application I want to build is as follows:

      So about two weeks ago I got very annoyed that there is no freely
      available Judaic text database program of the Bar Ilan/DBS Master
      Library/Judaic Classics Library type (the commercial packages
      range from $80-$600).
      There are of course some very good websites that
      have a good amount of material easily and freely accessible, but it's
      largely spread out and the interfaces are pretty weak. The same
      applies to the commercial packages: the interface is quite
      utilitarian and doesn't seem to have evolved since the mid-90s.

      I stumbled across the website www.hebrewbooks.org which has scanned
      11,000 books, and OCRd about 10,000 to varying degrees of accuracy.
      They are also more than happy to share their information freely. I
      spoke with the head of the site, and he was into the idea.

      Two things need to happen simultaneously. The first is that the texts
      need to be prepared, which is a huge task in and of itself. The most
      important thing is to create the appropriate hyperlinks. In most
      cases, there is a central text (CT), say the Torah, with commentaries
      moving linearly along the CT and commenting on phrases as they appear
      in the CT. Each comment is called a Dibur HaMatchil (DH). Of course,
      within each DH, there might be references to other parts of the CT, or
      to virtually any other text. This really is like a normal hyperlink.
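
      To make the markup idea concrete, here is a minimal sketch of how one
      DH might be stored in XML and read back with Python's standard
      xml.etree.ElementTree module. The element and attribute names
      (commentary, dh, ref, anchor, target) and the sample content are
      invented for illustration; they are assumptions, not an existing schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element name, attribute name and value below
# is a placeholder chosen for this sketch, not an established format.
SAMPLE = """
<commentary work="ExampleCommentary" on="Torah">
  <dh anchor="Genesis.1.1" phrase="placeholder phrase">
    Comment text that cites <ref target="Psalms.1.1">another text</ref>
    and <ref target="Genesis.1.14">a later part of the CT</ref>.
  </dh>
</commentary>
"""

def parse_comments(xml_text):
    """Return one dict per DH: where it attaches and what it links to."""
    root = ET.fromstring(xml_text.strip())
    comments = []
    for dh in root.iter("dh"):
        comments.append({
            "work": root.get("work"),
            "anchor": dh.get("anchor"),      # location in the CT
            "phrase": dh.get("phrase"),      # the words being commented on
            "links": [ref.get("target") for ref in dh.iter("ref")],
            # Flatten the mixed content and collapse whitespace.
            "text": " ".join("".join(dh.itertext()).split()),
        })
    return comments

comments = parse_comments(SAMPLE)
```

      Keeping the anchor (where the comment sits in the CT) separate from
      the ref targets (what the comment points to) is what lets the same
      file drive both the side-by-side display and ordinary hyperlinks.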

      This structure is pretty constant throughout the Jewish library; it's
      found in the Torah, Mishnah, Talmud, Midrash, Shulchan Aruch, etc.
      Each of these CTs, however, has a different organizational system. The
      Torah has Books, Parshas, as well as the Chapter/Verse system. The
      Mishnah has Tractates divided into Chapters, divided into Mishnayot.
      The Talmud has Tractates which are usually referenced according to
      Page and Side of the page (printing is standard enough to allow for
      this). The Shulchan Aruch has 4 sections, which contain Simanim, which
      in turn contain S'eefim. Sometimes, there are multiple CTs following the same pattern
      and discussing the same topic, so in essence, one CT becomes a
      commentary on another CT.
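
      One way to handle the different organizational systems is a single
      reference type that carries the name of the work plus a path through
      that work's own levels. The sketch below is only an assumption about
      how this could look: the SCHEMES table and the Ref class are
      hypothetical names, the level names come from the description above,
      and the Torah's parallel Parsha division is left out for brevity.

```python
from dataclasses import dataclass

# Level names per work, following the description in the post.
# The table and the class are illustrative, not an existing library.
SCHEMES = {
    "Torah": ("book", "chapter", "verse"),
    "Mishnah": ("tractate", "chapter", "mishnah"),
    "Talmud": ("tractate", "page", "side"),
    "Shulchan Aruch": ("section", "siman", "seif"),
}

@dataclass(frozen=True)
class Ref:
    work: str
    path: tuple  # e.g. ("Genesis", 1, 1); may stop early to name a whole chapter

    def __post_init__(self):
        if self.work not in SCHEMES:
            raise ValueError(f"unknown work: {self.work}")
        if len(self.path) > len(SCHEMES[self.work]):
            raise ValueError("path is deeper than the work's scheme")

    def labelled(self):
        """Pair each path component with its level name."""
        return dict(zip(SCHEMES[self.work], self.path))

    def contains(self, other):
        """True if `other` lies inside this reference (same work, same prefix)."""
        return (self.work == other.work
                and other.path[:len(self.path)] == self.path)

page = Ref("Talmud", ("Berakhot", 2, "a"))
chapter = Ref("Torah", ("Genesis", 1))
verse = Ref("Torah", ("Genesis", 1, 1))
```

      Because every link target is just a work plus a path, a commentary
      on one CT and a second CT that follows it can be joined on the same
      reference without the display code knowing which scheme is in use.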

      There are of course many books which are not necessarily commentaries
      on any specific text (i.e. not structured according to DHs), such as
      philosophical works which quote from all over the place.

      Even though we won't be trying to organize it all at once, we need to
      create a good data structure from the get go that allows for all of
      these connections to be made, that is easily searchable (shouldn't be
      a problem), and easily browsable (also not a big deal).

      As for what needs to be done with this data...

      I really see the killer app, what nobody else has done yet to my
      knowledge, as being able to select a CT, and choose which commentaries
      you would like to see on it, and then having a 'book' autogenerated on
      the fly, laid out according to pre-defined templates or user-defined
      rules (to a certain degree). With books now, any given edition you
      choose always has either too few, too many, or not quite the right
      combination of commentaries. I see this really becoming awesome in
      the next few years as e-paper/e-book readers become (hopefully) cheap
      and ubiquitous.
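
      As a rough sketch of the "book on the fly" idea, the function below
      walks the CT in order and attaches only the commentaries the reader
      selected, in the order they were selected. The function name, the
      dict keys and the sample data are all placeholders assumed for this
      example; real layout according to templates would be a further step.

```python
def build_book(ct_units, comments, selected_works):
    """Assemble a reading order from a CT and chosen commentaries.

    ct_units: list of (anchor, text) pairs in reading order.
    comments: list of dicts with 'work', 'anchor' and 'text' keys.
    selected_works: list of commentary names, in display order.
    """
    order = {work: i for i, work in enumerate(selected_works)}
    by_anchor = {}
    for c in comments:
        if c["work"] in order:
            by_anchor.setdefault(c["anchor"], []).append(c)

    book = []
    for anchor, text in ct_units:
        notes = sorted(by_anchor.get(anchor, []), key=lambda c: order[c["work"]])
        book.append({
            "anchor": anchor,
            "text": text,
            "notes": [(c["work"], c["text"]) for c in notes],
        })
    return book

# Placeholder data, not real texts.
ct_units = [("1:1", "First unit of the CT"), ("1:2", "Second unit of the CT")]
comments = [
    {"work": "Commentary A", "anchor": "1:1", "text": "A on 1:1"},
    {"work": "Commentary B", "anchor": "1:1", "text": "B on 1:1"},
    {"work": "Commentary C", "anchor": "1:2", "text": "C on 1:2"},
]
book = build_book(ct_units, comments, ["Commentary B", "Commentary A"])
```

      A template would then only have to decide how to place each unit's
      text and notes on a page, which keeps the selection logic separate
      from the layout rules.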

      Ideally, there should be a tagging engine, so that each book can come
      with some pre-defined tags, allow the user to add their own tags (and
      share them back with the community), auto-generate keywords (maybe
      distinct from tags?) for a given text and then have search and browse
      features that leverage those tags. So if you want to see all 13th
      century books from France and Germany dealing with personal injury
      law, it'll be a snap to find them. Or to conduct a free text search
      through any subset of the library you want.
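
      The tag-based browse described here can be sketched with plain sets:
      a book matches if it has every required tag and, when alternatives
      are given, at least one of them. The function name, the dict layout
      and the sample library are hypothetical, assumed only for this
      example.

```python
def find_books(books, required_tags=(), any_of=()):
    """Return titles of books that carry all required tags and,
    if `any_of` is non-empty, at least one of those tags."""
    required = set(required_tags)
    alternatives = set(any_of)
    results = []
    for book in books:
        tags = book["tags"]
        if required <= tags and (not alternatives or tags & alternatives):
            results.append(book["title"])
    return results

# Placeholder library; titles and tags are made up.
library = [
    {"title": "Book A", "tags": {"13th century", "France", "personal injury law"}},
    {"title": "Book B", "tags": {"13th century", "Spain", "personal injury law"}},
    {"title": "Book C", "tags": {"13th century", "Germany", "personal injury law"}},
    {"title": "Book D", "tags": {"16th century", "Germany", "personal injury law"}},
]

hits = find_books(
    library,
    required_tags={"13th century", "personal injury law"},
    any_of={"France", "Germany"},
)
```

      The same filter could first narrow the library to a subset and then
      hand that subset to a free text search.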

      Also, I'd love to see tag clouds and information mapping a la
      www.quintura.com
      www.kartoo.com
      www.kartoo.net (no search here, just nice graphics)
      www.ujiko.com
      This would make it much easier to explore new sources of information.
      All current programs essentially require you to know exactly what
      you're looking for before you start. If you don't have the right
      phrase, you'll never find what you need, and there's no easy way to
      discover it.

      And of course, there have to be tabs which can display either an
      autoformatted page, or a pdf of the original book.

      So this is a big project, as you can see, but it seems to me that a
      good portion of the work is already done. On the data side, 10,000
      books are already scanned and OCRd; what remains is the laborious task
      of marking them all up. Here, it would hopefully be possible to create
      small programs that would enable the masses of people who are very
      familiar with the texts, but know no computer-ese, to aid in the task.
      I think there would be good response for at least the most important texts.

      On the UI side, I think all of the search/database features I
      mentioned above could be found in various open source repositories,
      with only the need to customize the code. The auto-layout stuff also
      seems pretty simple from an algorithmic point of view, assuming you
      have data that's appropriately marked up, particularly if we stick to
      pre-defined templates and add user-defined templates later as
      appropriate.

      I'm probably infinitely naive, as I haven't coded anything since
      Pascal. I see a timeline of 1 year to a proof-of-concept type
      application with a limited dataset, and 5 years to realize what I've
      described above.

      FYI, in addition to finding good hearted volunteers, there are some
      good fundraising leads at the moment that could help things
      considerably.

      Any advice you have is more than welcome. If it's easier over the
      phone, then let's talk.
      And if you're interested, we're still looking for just about
      everything including:

      1. Chief Software Architect
      2. Coders
      3. Referring me to people who might be interested
      4. Anything else you want!

      Hope all is well,
      Andy