How to: copy text from page that won't let you copy the text



    A rather annoying situation has begun to occur, it seems to me, in that there are more and more web pages with information on them that:

    a) will not let one "copy and paste" the text into a word processor; what one gets in the paste is a link to the page instead of the text.
    b) do not have a print function.
    c) when one tries to print from the browser, give the whole page with many, many adverts, so that it takes five pages to print one page of text.
    d) However, one can tweet it, favourite it, send it by e-mail or do any of a myriad of "social" things, but one cannot "just print it". They even have a "register with Facebook", and you can be sure that they will figure out a way to send stuff to the Facebook account. One can also just "register", but one then gets the spam because of the confirmation e-mail.

    so.

    Here is a rather tedious but workable solution.

    1) When dragging to select, do not "cross a frame/junction". Since they have broken the text up into a bunch of boxes with adverts in between and to the sides, you have to carefully copy the text one "frame" at a time, from the top down, not the bottom up, and paste each block into your word processor.
    2) Paste using the Edit menu, not with a "right click". One has to paste using "paste special".
    3) A dialog box will open with the "paste special"; choose "unformatted text".
    4) One then gets the unformatted text AND the link back to the website, but one just deletes the link.
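
If you paste a lot of blocks, deleting the link in step 4 by hand gets old. A minimal sketch in Python, assuming the site appends the link on a line of its own (for example a "Read more: ..." line, which is a guess about the format). The function name `drop_link_lines` is made up for illustration, and note that it drops every line containing a URL, including ones you may have wanted to keep.

```python
import re

# Matches any http:// or https:// URL. Treating every such line as the
# site's appended link is an assumption, not something the site documents.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def drop_link_lines(pasted: str) -> str:
    """Return the pasted text with every line that contains a URL removed."""
    kept = [line for line in pasted.splitlines() if not URL_RE.search(line)]
    # Trim blank lines left behind at the start or end of the paste.
    return "\n".join(kept).strip()


if __name__ == "__main__":
    sample = (
        "Mix flour and water.\n"
        "Bake for 20 minutes.\n"
        "\n"
        "Read more: http://example.com/recipe"
    )
    print(drop_link_lines(sample))
```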

    Of course, the "justification" for this is, I am sure, that they are worried that you will copy and paste it into your website and use it to make gazillions of dollars while they do not. Of course I realize that you are not going to do that, but just to remind you: it is perfectly legal, in the United States, to copy material such as a recipe for one's own private use.

    This whole "driving people to facebook" thing is getting really irritating.

    If anyone has questions please ask.

    woodsmoke

    #2
    Could you save the whole page, then open it in a HTML editor and copy the desired text from there?
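
The "save the page, then pull the text out of the HTML" idea can also be scripted. A rough sketch using only Python's standard-library `html.parser`; the names `TextExtractor` and `extract_text` are made up for illustration, and it assumes the text really is in the saved HTML file (text that the page loads later with JavaScript will not be there).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script, style and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the text of an HTML document, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = "<html><body><p>Hello &amp; welcome</p><script>var a=1;</script></body></html>"
    print(extract_text(page))
```

To use it on a saved page, read the file into a string first and pass that to `extract_text`. Adverts that are plain text in the HTML will still come through, so some manual trimming is likely.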



      #3
      I use the Google Chrome browser and two extensions. One is Webpage Screenshot, which allows you to capture the page as a PNG, crop it and annotate it. Of course the result is an image file. The other is Save the Trees, which allows you to choose the section you want printed and send it to the printer. Thus you can avoid the adverts and graphics if you so choose.
      Linux because it works. No social or political motives in my decision to use it.
      Always consider Occam's Razor
      Rich



        #4
        Hi guys
        thanks for the replies.

        Detonate, I tried the view source and couldn't get it to display properly.

        Richb, I don't use Chrome so that is a good suggestion, thanks.

        Any more suggestions, anybody?

        woodsmoke



          #5
          I use Firefox with the NoScript add-on. I allow every site by default, but if a site does something to block the context menu or prevent copying or printing, I block JavaScript for that site and it often sorts it out. It should address your points a, b and d. When it's an ad- or frame-heavy layout that's the problem, it gets a bit harder.

          Pasting as unformatted text is also a good part of the solution.

          If all else fails, Ctrl-U to view source gets you access to the raw html. Don't think I've yet had to use this to copy text, though.
          I'd rather be locked out than locked in.



            #6
            For tough nuts, maybe save the page as an image somehow and then OCR it?
            Ok, got it: Ashes come from burning.



              #7
              For really difficult pages I just get my minions to copy it out by hand.
              I'd rather be locked out than locked in.



                #8
                lol
                minions!

                any other ideas out there?

                I, personally, like the NoScript add-on and am going to test it.

                woodsmoke



                  #9
                  I tried apt-get minions but to no avail. I really should upgrade. (sigh)

                  I have found that most things can be copied by saving the site (save as). Then in the site_files folder you can find all the pictures, and usually the text as well. Sometimes the text is generated in a different way and I haven't quite understood how that happens, but cutting and pasting the various sections into a text file is what I usually do. In fact I keep a lot of text files of stuff from the net - it's like my own Wikipedia.



                    #10
                    As far as not being able to print/save/seeing the code: that's (almost) always done with JavaScript. So if you disable JavaScript with NoScript, or in the browser itself, it's almost certain you can do anything you want with it.
                    If you want to print a page with the print function on the site, hmmm, that's very often not very well implemented. And that function works with JavaScript, so if you disable JavaScript, you can't print.
                    In Firefox there are two extensions (maybe more, but these two I know) that let you remove elements from the page before printing. The first one is PrintEdit. You open it from File -> PrintEdit, or simply by starting to print. You can then remove everything you want before printing.
                    I switched to Remove Temporarily, because I think it's a bit easier. Right-click on the page you want to print and choose Inspect Element. There's a new button on the lower right: Remove Element. You can select elements and remove them before printing.
                    Both extensions take some time to get used to. You don't really remove anything: if something goes wrong you can undo your last move or just start over again.
                    If nothing works, you can try the browser's cache. There are extensions to look at what's in the cache. If you work in offline mode, you should be able to do everything you want with the page.
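
For a page you have already saved to disk, much the same effect as turning JavaScript off can be had by cleaning the file before opening it. A crude sketch in Python, under a few assumptions: the list of handler attributes is my guess at the usual suspects, the name `strip_copy_blockers` is made up, and a regex is not a real HTML parser, so odd markup can slip through. It does nothing about CSS tricks such as `user-select: none`.

```python
import re

# Inline event handlers often used to interfere with selecting, copying or
# the context menu. This list is an assumption, not an exhaustive one.
BLOCKING_ATTRS = (
    "oncopy", "oncut", "oncontextmenu",
    "onselectstart", "ondragstart", "onmousedown",
)

# Whole <script>...</script> elements, across line breaks.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)

# One blocking attribute with a double-quoted, single-quoted or bare value.
ATTR_RE = re.compile(
    r"\s+(?:%s)\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s>]+)" % "|".join(BLOCKING_ATTRS),
    re.IGNORECASE,
)


def strip_copy_blockers(html: str) -> str:
    """Return the HTML with script elements and blocking handlers removed."""
    html = SCRIPT_RE.sub("", html)
    return ATTR_RE.sub("", html)


if __name__ == "__main__":
    page = '<body oncopy="return false"><script>x()</script><p>Hi</p></body>'
    print(strip_copy_blockers(page))
```

Write the result to a new file and open that in the browser; selecting and Ctrl-C should then behave normally unless the page relies on CSS to block selection.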

                    Edit: changed left to right. I'll never learn that...
                    Last edited by Goeroeboeroe; Mar 12, 2012, 06:32 PM.



                      #11
                      Originally posted by Goeroeboeroe
                      As far as not being able to print/save/seeing the code: that's (almost) always done with JavaScript. So if you disable JavaScript with NoScript, or in the browser itself, it's almost certain you can do anything you want with it.
                      Excellent advice, thanks.

                      I installed Aardvark some time ago but haven't used it. Also, I just went to check and there is another add-on called HackTheWeb. There may be more.



                        #12
                        Since Kubuntu has the "print to file" option, can't you just print it straight to a PDF, then go back in and copy away?

                        If possible, PM me a couple of those sites, I personally haven't found a site that will not allow me to copy the text, and I'd like to try this out.



                          #13
                          Originally posted by ScottyK
                          If possible, PM me a couple of those sites, I personally haven't found a site that will not allow me to copy the text, and I'd like to try this out.
                          +1

                          If you just wanted to extract blocks of text, wouldn't saving the page as text file be sufficient?
                          Kubuntu 12.04 - Acer Aspire 5750G

                          "I don't make a great deal of money, but I'm ok with that 'cause I don't hurt a lot of people in the process either"



                            #14
                            Originally posted by bra|10n
                            +1

                            If you just wanted to extract blocks of text, wouldn't saving the page as text file be sufficient?
                            I think that would net you a lot of garbage and you would still need to cut and paste from there. It might be a good idea nevertheless.

                            I've come across these pages with right click disabled, but I can't find one now. Could we have an example posted so we can check ideas? IIRC, the last time it happened to me, I just blocked (selected) the text and used Ctrl-C Ctrl-V. Right-click is really just a convenience, isn't it?



                              #15
                              I just downloaded the whole page and opened it in Word (which will download the page), then just copied the text I needed.
