Interested in an open source alternative to Kofax?

This topic is closed.

    First of all, I would like to apologize for my recent absence -- alongside the 16 college credit hours I've completed this semester (Intro to Software Engineering, Linear Algebra, Differential Equations, Macroeconomics, and Intro to World Civilizations), I've also been putting in 30+ hours a week at my internship, so I haven't had much time for anything else.

    That brings me to the subject at hand: I've begun developing an application that will serve as an open source alternative to those provided by Kofax, Abbyy, and Iris. Once completed, it will support plugins, be able to generate, print, and read barcoded cover sheets, import documents, capture documents from a scanner, extract text from documents, and export them to an external file (or in my case, a hot folder mapped to an Alfresco repository).

    It was started out of necessity: many of our clients are smaller businesses that cannot afford to drop $10,000 on a scanning station, and those who choose the more affordable desktop application would prefer not to pay an additional $1,500 or more for the privilege of generating and scanning barcoded cover sheets.

    I have no intention of developing my own barcode or scanning libraries, or engineering my own OCR engine, but instead have chosen to leverage existing open source solutions that accomplish those tasks, and have gotten the application off to a good start.

    Is there an interest in the community for such an application? To my knowledge, there is no [working] open source solution of this type in the wild, so it would be the first of its kind. If there is an interest, under what license should I place it? And are there any developers who would be interested in helping me get it off the ground?

    The core application is written in Java, as I am most familiar with it and Java is cross-platform (I can compile a single binary on 64-bit Linux and deploy it on 32- and 64-bit Linux, Windows, [Open-]Solaris, BSD, Mac OS X, etc.). There is still plenty of room for C++ developers, because interacting with the OCR engine and the SANE and TWAIN drivers requires the JNI (Java Native Interface), which lets Java interoperate with native C/C++ code.

    Any help or suggestions I can get will be greatly appreciated. I am currently the only developer working on this, and could really use some assistance from somebody with a bit more experience than I have.
    Asus G1S-X3:
    Intel Core2 Duo T7500, Nvidia GeForce 8600M GT, 4 GB PC2-5300, 320 GB Hitachi 7k320, Linux

    #2
    Re: Interested in an open source alternative to Kofax?

    Glad to see you back. The project you have there sounds really interesting. Unfortunately, my schedule nowadays prohibits me from doing that kind of work any more. Good luck and keep us up to date!


      #3
      Re: Interested in an open source alternative to Kofax?

      Originally posted by MoonRise
      Glad to see you back. The project you have there sounds really interesting. Unfortunately, my schedule nowadays prohibits me from doing that kind of work any more. Good luck and keep us up to date!
      Will do -- thanks @MoonRise!


        #4
        Re: Interested in an open source alternative to Kofax?

        As I promised, an update!

        Until I decide on an official name, I've uploaded the source code to my MediaFire account -- here is a direct link: http://www.mediafire.com/?e40ttmzmngi

        That will download a file, Socr3-Installer.jar, which, when executed, creates two directories: $HOME/.Socr3 and $HOME/Desktop/Socr3. The hidden directory contains the application's libraries, config files, logs, and two plugins, and the desktop folder contains the Socr3.jar application archive, source code, and Javadoc documentation. To tweak the application, you may modify the $HOME/.Socr3/init.js file; to format the output from Log4J, edit $HOME/.Socr3/config/log4j.properties. The application has two log files: $HOME/.Socr3/log/socr3.log, and $HOME/.Socr3/fatal.log, which is only created if Log4J cannot be instantiated.

        You don't have to execute the installer, though; since JAR files are just ZIP files with a different extension, you may simply unzip the file to get the source code.
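
For anyone who would rather peek inside the archive programmatically, here is a minimal sketch of the "a JAR is just a ZIP" point using the standard java.util.zip classes. The class name JarAsZip and the sample entry names are made up for illustration; they are not taken from the Socr3 installer. With no argument it lists a small in-memory archive, and with a path argument it lists that file instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JarAsZip {

    /** Returns the entry names found in a ZIP-format stream (a JAR qualifies). */
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny in-memory archive so the demo needs no file on disk. */
    public static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("META-INF/MANIFEST.MF"));
            zip.write("Manifest-Version: 1.0\n".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            // Hypothetical source file entry, for illustration only.
            zip.putNextEntry(new ZipEntry("src/Example.java"));
            zip.write("class Example {}\n".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in;
        if (args.length > 0) {
            in = new FileInputStream(args[0]); // e.g. a downloaded .jar
        } else {
            in = new ByteArrayInputStream(sampleArchive());
        }
        for (String name : listEntries(in)) {
            System.out.println(name);
        }
    }
}
```

Any ordinary unzip tool does the same job; the sketch only shows that no JAR-specific API is needed to read the contents.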

        Currently, for importing documents, the application is slow and a resource hog, and I wouldn't suggest checking out that feature unless you have about a gig of memory to dedicate to the application (~700 MB should be fine, but if you want to open a large PDF, I strongly suggest a gig to keep the application [and your system] from becoming unresponsive). The init.js file allocates a maximum of 1024 MB by default, but the application should run fine on 100 MB or less if you don't want to import documents (the JVM will only use that much RAM if it is required). Don't try to export documents because that feature doesn't work correctly yet.
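
To see the difference between the heap ceiling and what the JVM has actually claimed, a small sketch like the following can help. It assumes the 1024 MB limit is applied through the standard -Xmx option (how init.js passes it along is not shown here), and the class name HeapReport and the canImport helper are hypothetical, not part of the application.

```java
public class HeapReport {

    private static final long MB = 1024L * 1024L;

    /** Upper bound the heap may grow to (what -Xmx controls), in MB. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap the JVM has reserved so far, in MB; usually well below the maximum. */
    public static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Heap currently occupied by objects, in MB. */
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Hypothetical guard: is the configured ceiling at least requiredMb? */
    public static boolean canImport(long requiredMb) {
        return maxHeapMb() >= requiredMb;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):       " + maxHeapMb());
        System.out.println("committed heap (MB): " + committedHeapMb());
        System.out.println("used heap (MB):      " + usedHeapMb());
        System.out.println("enough for a ~700 MB import: " + canImport(700));
    }
}
```

Running it as `java -Xmx1024m HeapReport` and again with a smaller -Xmx value should show the maximum changing while the committed and used figures stay small for an idle program.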

        Regarding barcoded coversheets, the application generates, prints, and reads them well.

        Also, remember, this is an early release and VERY beta, or late alpha-ish, and is lacking several features. Let me know if the installer doesn't work correctly, or if the application fails to initialize.

        EDIT: 12/23/2009
        I should probably add that I am planning on porting the application to Qt and dropping Java as a core dependency.


          #5
          Re: Interested in an open source alternative to Kofax?

          Updated 01/01/2010

          I have just pushed another update. This one includes only two new features: the ability to export documents and the ability to control the quality of rendered images in the previewer. Most of my time has been spent refactoring the application to be much more efficient and less resource intensive, and I am now satisfied with its performance. The PDF-Renderer library has been replaced by jPod, and I plan on replacing iText with jPod soon, too.

          The developers of OpenCapture and I have begun discussing merging our applications, so I may very well have more developers working with me soon, which would be nice.


            #6
            Re: Interested in an open source alternative to Kofax?

            You said this is written in Java! So it should be cross-platform? If I can, I'd like to "test" this out @ work, but, unfortunately, we are tied to M$. Glad to hear such good progress! I do so miss my programming days! Being an IT Manager type nowadays just really seems to SUCK!



              #7
              Re: Interested in an open source alternative to Kofax?

              Originally posted by MoonRise
              You said this is written in Java! So it should be cross platform?
              Yep -- in its current state, any platform that supports Java should be able to run the application, although I've only tested it on Linux and Windows. The only really useful things it does currently are generate barcoded coversheets, read them when you import a set of documents (after clicking "Get Barcodes"), and embed their values as headers in the document you export. The application can't read PDF documents nearly as quickly as Okular, though.

              The way I have it configured, the label below each decoded barcode is used as the header key, and the decoded label is used as its value. I'm not sure if this is iText dependent or part of the PDF spec, but iText associates each key => value pair as a hashtable does: each key must be unique, or the previous value associated with the key will be overwritten. I will implement a check that makes sure each label is unique and alerts the user if there are duplicates (the application will support templates soon, so this shouldn't be much of an issue).
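
The overwrite behaviour and the planned uniqueness check can be sketched with a plain java.util map. This is only an illustration under assumptions: the class HeaderLabels, its method names, and the sample labels are invented here, and it does not use iText or reflect how the application stores headers internally.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HeaderLabels {

    /** Returns every label that occurs more than once, in first-repeat order. */
    public static Set<String> findDuplicates(List<String> labels) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String label : labels) {
            if (!seen.add(label)) { // add() returns false if already present
                duplicates.add(label);
            }
        }
        return duplicates;
    }

    /**
     * Stores {label, value} pairs the way a hashtable would:
     * a repeated label silently replaces the earlier value.
     */
    public static Map<String, String> toHeaders(String[][] pairs) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            headers.put(pair[0], pair[1]);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Hypothetical cover-sheet labels and decoded values.
        String[][] pairs = {
            {"Invoice", "001"},
            {"Client", "ACME"},
            {"Invoice", "002"}, // duplicate label: overwrites "001"
        };
        System.out.println("headers: " + toHeaders(pairs));

        Set<String> duplicates =
            findDuplicates(List.of("Invoice", "Client", "Invoice"));
        if (!duplicates.isEmpty()) {
            System.out.println("duplicate labels: " + duplicates);
        }
    }
}
```

Running the duplicate check before building the header map is what would let the user be warned instead of quietly losing the first value.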

              It's a work-in-progress.

              Originally posted by MoonRise
              I do so miss my programming days!
              Me too