
This application exists, but I can't remember what it is!

This topic is closed.


    Greetings, good people!

    This is driving me crazy. First, a little background: I'm putting together a book based on a few decades of publications. For the period before 1995, no electronic files exist, so I've scanned the publications, run OCR on them, and then had to format them and so on. It is time-consuming and awful.

    For the period 2000 and after, I have electronic files, which makes things much easier. Now I can cut and paste as I like without the hellish scan/OCR/cleanup.

    Ah, but for that in-between five years: I have electronic files. They are binary, generated by some application, but they bear no file extension. They do contain text, though it's little hunks of text with a lot of non-ascii gibberish attached -- about a paragraph per page. I thought they might have been QuarkXPress files, but a program that reads those files ain't having any. I'm trying to get in touch with those responsible for creating these things, to see if I can find out what program made them. So far no luck there, either.

    Thing is, I remember having a Linux word processor sometime in the foggy distant past -- I thought it was TextMaker, but if so the feature seems to have been abandoned -- that would open files by brute force and display the ascii text therein, thereby letting me easily (well, relatively easily) copy and paste to my heart's content. I'd need of course to format and all that.

    Anyway, I cannot for the life of me remember a way of doing this other than using strings from the command line, which is nice but I'd like to be able to have a little more control.

    Anybody know of an application that will nicely suck the text out of a binary file, where the binary stuff appears to be formatting?

    Thanks in advance.

    #2
    That sounds like strings.
    From the man page:
    DESCRIPTION
    For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the
    number given with the options below) and are followed by an unprintable character. By default, it only prints the
    strings from the initialized and loaded sections of object files; for other types of files, it prints the strings from
    the whole file.

    strings is mainly useful for determining the contents of non-text files.
    Or man od.

    Or, the Swiss File Knife:
    http://sourceforge.net/projects/swis...leknife/1.7.1/
    Last edited by GreyGeek; Mar 26, 2014, 11:44 AM.
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people."
    – John F. Kennedy, February 26, 1962.

      #3
      Originally posted by dep
      ... that would open files by brute force and display the ascii text therein...
      Ordinary text editors like kate or gvim will do at least that. If the gibberish to text ratio is high you'd be better off running strings first, though.

      Just in case you're not familiar with the command line, you could open a terminal (konsole in the menu)
      Code:
      cd wherever
      strings "filename" > "filename.txt"
      given a file in wherever called filename. If you had a lot of them, say with extension .orig,
      Code:
      cd wherever
      for file in *.orig; do strings "$file" > "${file%.orig}.txt"; done
      (The quotes are only needed if there are spaces or funny characters in the file names.)
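
If plain strings output is too noisy, one way to get a little more control is to set a higher minimum run length, so short fragments of formatting gibberish are dropped. The sketch below is only an illustration, not something from this thread: extract_text is a made-up helper name, the default of 20 characters is a guess, and it uses tr and grep rather than strings itself. It treats anything outside printable ASCII (tabs included) as a break between runs.

```shell
# Hypothetical helper (name and default are assumptions, not a standard tool):
# print runs of printable ASCII from FILE that are at least MIN characters long.
# Usage: extract_text FILE [MIN]
extract_text() {
    file=$1
    min=${2:-20}    # assumed default; raise it to drop more gibberish
    # Turn every byte outside printable ASCII (space through ~) into a newline,
    # then keep only the lines that are long enough.
    LC_ALL=C tr -c '\040-\176' '\n' < "$file" | LC_ALL=C grep -E ".{$min,}"
}

# Example: process every regular file in the current directory
# (the files in question have no extension, so *.orig would not match them):
# for file in *; do [ -f "$file" ] && extract_text "$file" 20 > "$file.txt"; done
```

GNU strings can do much the same with its -n option for the minimum length (the "number given with the options below" in the man page excerpt above), for example strings -n 20 "filename".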

      HTH, and regards, John Little


        #4
        ummmm yeah.............about ten years worth of stuff.........

        even a paid application in windblows doesn't get the job done.

        Invest in ten dollars worth of coffee beans....... a fifteen dollar coffee grinder........

        Change the font to..........don't know........ 36.............

        drink coffee .......take time off for snuggling with the wife........and....

        copy and paste into folders that are significantly named....no NOT a significant other....

        hard.........tedious......manual........time consuming.....work......but....worth it.

        woodsmoke


          #5
          The "snuggle" part sounds good. The rest not so much!
