Soundex, RegExp, Neural Nets and other algorithms ...


    On many occasions while writing software one is faced with the problem of trying to interpret what the user has entered.

    In other cases, programmers attempt to condense text to save space and then expand it back out again when it needs to be read.

    The amazing thing is that despite the power of modern computers and software, NOTHING approaches the power of the human mind to translate garbage into sense, and at speeds that boggle the mind. It's as if the brain is a 1,000-core supercomputer.

    See how easily you can read the following messages:


    7H15 M3554G3
    53RV35 7O PR0V3
    H0W 0UR M1ND5 C4N
    D0 4M4Z1NG 7H1NG5!
    1MPR3551V3 7H1NG5!
    1N 7H3 B3G1NN1NG
    17 WA5 H4RD BU7
    N0W, 0N 7H15 LIN3
    Y0UR M1ND 1S
    R34D1NG 17
    4U70M471C4LLY
    W17H 0U7 3V3N
    7H1NK1NG 4B0U7 17,
    B3 PROUD! 0NLY
    C3R741N P30PL3 C4N
    R3AD 7H15.
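For comparison, here is roughly how a program would undo the digit substitution in the message above. This is only a sketch: the `LEET_TABLE` mapping and the `decode_leet` name are made up for illustration, and the table is inferred by eye from this one message. It would also mangle any text that contains real numbers.

```python
# Hypothetical digit-to-letter table, inferred from the message above:
# 0 -> O, 1 -> I, 3 -> E, 4 -> A, 5 -> S, 7 -> T
LEET_TABLE = str.maketrans("013457", "OIEAST")


def decode_leet(text: str) -> str:
    """Replace each look-alike digit with the letter it stands in for."""
    return text.translate(LEET_TABLE)


print(decode_leet("7H15 M3554G3"))
print(decode_leet("4U70M471C4LLY"))
```

A fixed table is all it takes here, which is part of why this kind of message is easy for people too: every digit always stands for the same letter.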



    If you can raed this, you have a sgtrane mnid, too.


    Can you raed this? Olny 55 people out of 100 can.

    I cdnuolt blveiee that I cluod aulaclty uesdnatnrd what I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in what oerdr the ltteres in a word are, the olny iproamtnt tihng is that the frsit and last ltteer be in the rghit pclae. The rset can be a taotl mses and you can still raed it whotuit a pboerlm. This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the word as a wlohe. Azanmig. huh? Yaeh, and I awlyas tghuhot slpeling was ipmorantt!
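The scrambling rule described in that paragraph (keep the first and last letter of each word, shuffle everything in between) is simple to sketch in code. The function names `scramble_word` and `scramble_text` and the `seed` parameter are illustrative assumptions, not anything from the original message, and the output depends on the random seed.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        # Words of three letters or fewer have nothing to shuffle.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every run of ASCII letters; leave everything else untouched."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(scramble_text("I could not believe that I could actually understand"))
```

Note that a random shuffle can leave a word unchanged or move letters much further than the hand-scrambled paragraph does, so machine-scrambled text is often harder to read.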
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
    – John F. Kennedy, February 26, 1962.

    #2
    Count me in the 55 group!



      #3
      I bet Vinny Wright is in there. I wonder if he'd notice anything unusual. A vindication of sorts to those who can't spell the same word twice the same way in the same sentence; I've worked with a couple.
      Regards, John Little



        #4
        Yeah, this is cool. It's really interesting how easy it is to read after it first looks like gibberish. With all respect to computers, the human brain is so far the best data miner on the planet.

        Makes me think of a book by Tor Nørretranders called The User Illusion, which I read a long time ago. He coined the term "exformation" while examining information that is left out, or information that is cultural knowledge, or the knowledge that 'goes without saying'.

        In 1862 the author Victor Hugo wrote to his publisher asking how his most recent book, Les Misérables, was getting on. Hugo just wrote “?” in his message, to which his publisher replied “!”, to indicate it was selling well. This exchange of messages would have no meaning to a third party because the shared context is unique to those taking part in it. The amount of information (a single character) was extremely small, and yet because of exformation a meaning is clearly conveyed.
        He also argues that the mind not only needs to take in information, but that the hardest part of interpreting it is disregarding or removing a substantial amount of it. That is hard for some people, such as savants or people with autism, but also for everyone else.

        It's a very interesting book for anyone dealing with human handling of information, computing and 'free will'.

        thanks for sharing Grey Geek

        b.r

        Jonas
        Last edited by Jonas; Feb 08, 2013, 02:44 AM.
        ASUS M4A87TD | AMD Ph II x6 | 12 GB ram | MSI GeForce GTX 560 Ti (448 Cuda cores)
        Kubuntu 12.04 KDE 4.9.x (x86_64) - Debian "Squeeze" KDE 4.(5x) (x86_64)
        Acer TimelineX 4820 TG | intel i3 | 4 GB ram| ATI Radeon HD 5600
        Kubuntu 12.10 KDE 4.10 (x86_64) - OpenSUSE 12.3 KDE 4.10 (x86_64)
        - Officially free from windoze since 11 dec 2009
        >>>>>>>>>>>> Support KFN <<<<<<<<<<<<<



          #5
          I can also read upside down and mirrored almost as fast as I can read it normally... Although I don't read it normally that fast as I am dyslexic.

          It is mildly worrying that I got to "it dseno't mtaetr in what oerdr" before I even noticed, and that was probably due to the content more than anything else. Also, have you tried running that through a text-to-speech engine?



            #6
            Well, I got the gist of what it was saying but not enough to put me in the smart class.

            Based on some of the stuff google figured out when I type it in rong I have to put google in the top 55%.

            Ken.
            Opinions are like rear-ends, everybody has one. Here's mine. (|)



              #7
              Originally posted by GreyGeek
              Can you raed this? Olny 55 people out of 100 can.

              I cdnuolt blveiee that I cluod aulaclty uesdnatnrd what I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in what oerdr the ltteres in a word are, the olny iproamtnt tihng is that the frsit and last ltteer be in the rghit pclae. The rset can be a taotl mses and you can still raed it whotuit a pboerlm. This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the word as a wlohe. Azanmig. huh? Yaeh, and I awlyas tghuhot slpeling was ipmorantt!
              You know I love being a party pooper but:

              99 words. 52 are spelled correctly, so we have only 47 scrambled words. 9 are 4-letter words, and since 2 of their letters are already in the right place the brain is hardly challenged at all. So let’s say that only 38 are scrambled. Many of the other words aren't genuinely well scrambled, with just pairs of adjacent letters swapped or a sound moved around in the word. Most of the words are only a few letters long. The longer words are usually very easily interpolated thanks to context, and thus we are left with what is genuinely not that impressive a challenge to the brain. Furthermore, if only the first and last letter were important, then how do we distinguish between, say, salt and slat? I’m pretty sure I could write a program that parsed and converted that paragraph far faster than any human could, with sufficient accuracy for it to be usable.

              Still a neat trick but well more than the quoted 55 of 100 people are capable of reading it. If the scrambling was more pervasive and efficient then I'm sure that it would have been a lot harder. If we added even some higher level language in there then I'm sure understandability would be almost impossible. Imagine trying to read an article on molecular biology or mathematics with scrambling.
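One way such a program might work is a dictionary lookup keyed on first letter, sorted interior letters, and last letter. This is only a sketch under assumptions: the `signature`, `build_index`, and `candidates` names and the tiny `VOCAB` list are invented for illustration, and a usable version would need a real word list plus some way to pick between candidates from context.

```python
from collections import defaultdict


def signature(word: str) -> str:
    """First letter + sorted interior letters + last letter, lower-cased."""
    w = word.lower()
    if len(w) < 4:
        # Nothing in the interior can be reordered.
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def build_index(vocabulary):
    """Group known words by signature so scrambled forms can be looked up."""
    index = defaultdict(list)
    for word in vocabulary:
        index[signature(word)].append(word)
    return index


def candidates(scrambled: str, index) -> list:
    """Return every known word the scrambled form could stand for."""
    return index.get(signature(scrambled), [])


# Hypothetical toy vocabulary; a real run would load a full word list.
VOCAB = ["believe", "could", "actually", "salt", "slat", "human", "mind"]
index = build_index(VOCAB)

print(candidates("blveiee", index))  # ['believe']
print(candidates("slat", index))     # ['salt', 'slat']
```

The second lookup shows the salt/slat problem directly: both words share one signature, so the lookup alone cannot tell them apart and something else (context, word frequency) has to decide.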



                #8
                Originally posted by jlittle
                I bet Vinny Wright is in there. I wonder if he'd notice anything unusual. A vindication of sorts to those who can't spell the same word twice the same way in the same sentence, I've worked with a couple.
                LOL yes yes I can read it, if read is the right word ... understand, decipher, interpret, whatever. :-P

                VINNY
                i7 4core HT 8MB L3 2.9GHz
                16GB RAM
                Nvidia GTX 860M 4GB RAM 1152 cuda cores



                  #9
                  Originally posted by dmeyer
                  Imagine trying to read an article on molecular biology or mathematics with scrambling.
                  The introduction from "MtrR Control of a Transcriptional Regulatory Pathway in Neisseria meningitidis That Influences Expression of a Gene (nadA) Encoding a Vaccine Candidate". Should present no challenges to those afflicted with typoglycemia.

                  ------

                  Nisrseeia mgidneiniits is a Garm-naeigtve oiatlbge hmuan pgahtoen taht cneziloos the nhsporyaanx in 10–35% of atldus [1]. For rneasos not cenutrrly uoerntdosd, cnsaommel mnonoaciecgcl (MC) ctnlaizooion deoevlps itno an iisvanve desasie cuaisng smipceetia and migitnines in 0.5 per 100,000 posenrs in the Uentid Sattes and up to 1,000 per 100,000 psernos in sub-Srhaaan Acfrian eieimcpds [2]. The seepd of dasseie pooesrgrisn rsultes in up to 10–15% mlotriaty eevn wtih abnoiiittc tpahery [3], wlhie oetfn liaevng sruvrvois wtih pmenenrat ngreluooiacl copaotilcmnis [4]. Vicncaes asagnit the clapsuar prysodhlacciae of the msot cmomon daessie-asostaiecd sypreeots (A, C, W135, and Y) are aibllaave, lienavg the hriyurpevelnt and inumme-evsivae spteyore B as a creunrt fcuos for vciacne rsaecreh [5].

                  Ahodisen to the mouscal srcaufe of the nsopayahrnx is the fsirt setp in sfsesuccul ctoizoinoaln, mdeitead by a veairty of focrats, wtih tpye IV plii [6], [7], [8] and Opa and Opc piertons [9], [10] pdeocrud in the getreast audncbnae. Rcetnely, a non-frimbial “Oca” flamiy (Oomilregic cleoid-ciol aehsdin) niersaiesl ahsdein termed NdaA was iefdinietd in 50% of hprirenvelyut MC caulpsar srrooegup B leagenis [11], but not in oehtr culpasar sgeroourp snairts. Cmiorpsed of a ladeer pdeipte, guloblar “haed” dimoan, α-hliex ieamtdnietre rieogn, and a C-tiearnml mmeabrne ahocnr, NdaA frmos hgilhy sbltae mlrutieimc cloied-ciol stuctrrues aolng the hcelial saltk, pnosoniiitg the gballuor “haed” for hsot clel iaerctniton [12]. Iparnmtltoy for ctidriaoenson as a vinacce cnaiddate, rcenomnbait NdaA lckanig the C-taiernml ahocnr eticlis a bdrtieciacal atindoby rspsonee wtih eoipptes ascisbecle in ecnusaltaped MC. Athugolh ndaA alllee sceuenqes deffir bteween sianrts, vriead atingen eoipxsesrn, not detsviiry, iefcnunles iumnme srea tteir llvees and pcierttoon [11]. Aloncridgcy, the intctdiaeiifon of frtocas iinnecunflg NdaA lvlees at the gnee erspixsoen level is ccrtiial for omiitzpnig the eacfficy of a NdaA-tgeertad viccane. Frmrertuohe, utdnndneraisg ndaA eoxsesirpn may offer ceuls itno the snlagis ielvvond in cntiernovg a pvissae co-iianhtabnt of the hamun msauocl lnniig itno an ivsinvae and faatl sieptc iefiotncn.

                  MC uess a mtuli-teired arpoacph to cntorol ndaA esiopsrexn. Mimuxam lleves of the NdaA pitoern are oseverbd in sttoniaray-pashe in a gtworh-dpneednet mennar [11], wtih esopexsirn of ndaA vnirayg wlediy anomg MC snairts [13], [14]. Usraptem form the poeotmrr are mtiplule taireonlttcuede (TAAA) rpatees wshoe nmuebr corsenoprds wtih vieard ndaA erpsixeson [13], [15]. Tsehe rtepaes are psahe viarlbae, lkeily cesuad by spielpd-satrnd mnspiirigas dirnug rotialepicn [16]. Sveearl rrtoulagey ptoneirs bnid to the ndaA potmorer (Fgiure 1), iicunnldg iogierttnan hsot ftaocr (IHF) and firerc uapkte rraoguetly ptoiren (Fur), tguhoh ndaA eeoisrpxsn is ueacgnnhd in a Fur nlul mtuant [14]. Retnecly, a MraR-flaimy tsptrianraioncl rgluteaor, tmered FraR and NdaR in sparteae paiticblouns [14], [17], was iiiendfted as a rsseepror of ndaA, fhrtuer eipadnxng the lsit of ndaA rrlotugaey fcrtoas. Tihs DNA-biinndg peoritn was frist ifidniteed in the gncccooous (GC) and was swhon to rpesers esrixsoepn of the fraAB-eednocd eflfux pmup taht is rseospnible for hgih lveels of ftaty aicd rcsinatsee [18]. In crnotast, MC FraR deos not affect fttay aicd raticssene trughoh FrAaB, pheaprs due to ntlualray hgih fttay aicd rtsaeicsne eesspxred by tihs pogtahen [19]. Iersgtitennly, hevower, MC FraR deos bnid to its frAaB pmrtooer roiegn wtih rellivetay hgih aifnfity and rperesses fAraB eoesripsxn as sowhn by RT-PCR [20]. Bucesae FraR rgaltuees eisprosxen of fAraB in btoh MC and GC, wilhe ndaA is pnesert olny in a ssbuet of MC plooiupnats, we wlil cnoniute to use the ntneoaruclme of FraR for the roseserpr of ndaA besad on its mroe uvnearsil aiivctty on fAarB in btoh GC and MC.

                  The salml mlecuole 4-hrhxcepylydyaoetnic aicd (4HPA) was ietniidfed as an idecunr or de-rrpeosser of ndaA by rneivelig the DNA-bninidg ativcity of FraR [14]. Bneig a cilonezor of the oynrrhoapx, MC is whsaed in slavia, in wihch 4HPA is a cmmoon moietaltbe [21], plsisboy lnadieg to irasneced essrixoepn of ndaA and seunbsquet ivasivne dssaeie. Crolusiuy, FraR-celonrtlod tgrteas in GC are dlctriey and irnitcdely rluateged by the TteR fiamly routgaelr MrtR. Rrspseioen of fraR by MrtR ildicrenty up-rgulaeets fAraB [18], wlihe the gnee ecdniong guitamnle steantyshe (gnlA) is delitcry ruatgeled by btoh FraR and MrtR [22]. Trheofere, we qoeeiutnsd wehhter MrtR sliialmry atcfefs ndaA eiesrsxpon in MC, aniddg to the gowring lsit of ralguotrey focatrs tertingag ndaA. Hree we cfniorm taht FraR is the pramiry rsopseerr of ndaA, yet MrtR, wehn esresepxd at eetlvaed leevls, dlerticy reepserss ndaA as wlel. Frterrohume, DNA-bnniidg and DsNae I poctetiorn assays sseuggt taht MrtR ileufncnes FraR bniding at the ndaA ptomreor salmiir to the pneoomehnn seen in gnlA eiessxpron in GC [22], stgegusing a hgiher cplixemtoy to Nssraiieel rrtgeaoluy secemhs taht is cvernosed aroscs scpeeis.
                  Last edited by SteveRiley; Feb 08, 2013, 10:54 PM.



                    #10
                    Originally posted by SteveRiley
                    The introduction from "MtrR Control of a Transcriptional Regulatory Pathway in Neisseria meningitidis That Influences Expression of a Gene (nadA) Encoding a Vaccine Candidate". Should present no challenges to those afflicted with typoglycemia.

                    ------

                    Nisrseeia mgidneiniits is a Garm-naeigtve oiatlbge hmuan pgahtoen taht cneziloos the nhsporyaanx in 10–35% of atldus [1]. For rneasos not cenutrrly uoerntdosd, cnsaommel mnonoaciecgcl (MC) ctnlaizooion deoevlps itno an iisvanve desasie cuaisng smipceetia and migitnines in 0.5 per 100,000 posenrs in the Uentid Sattes and up to 1,000 per 100,000 psernos in sub-Srhaaan Acfrian eieimcpds [2]. The seepd of dasseie pooesrgrisn rsultes in up to 10–15% mlotriaty eevn wtih abnoiiittc tpahery [3], wlhie oetfn liaevng sruvrvois wtih pmenenrat ngreluooiacl copaotilcmnis [4]. Vicncaes asagnit the clapsuar prysodhlacciae of the msot cmomon daessie-asostaiecd sypreeots (A, C, W135, and Y) are aibllaave, lienavg the hriyurpevelnt and inumme-evsivae spteyore B as a creunrt fcuos for vciacne rsaecreh [5].
                    The problem with technical terms like these is that there is no training data for your brain to work from, so it cannot possibly guess at what was being said. Commonly used words are different: your brain can very quickly recognise them in context, even if the context is gibberish.

                    To be honest, I only have a little more trouble than normal reading that passage. The words that I do not understand I probably wouldn't understand anyway, so my brain just treats them as a recognised pattern and then skips over them.

                    I basically read it as:

                    <1> <2> is a Gram-negative <3> human pathogen that <4> the <5> in 10–35% of adults [1]. For reasons not currently understood, <6> <7> (MC) <8> develops into an invasive disease causing <9> and <10> in 0.5 per 100,000 persons in the United States and up to 1,000 per 100,000 persons in sub-Sahara African <11> [2]...

                    Which is how I tend to read normally. I blame my dyslexia for this.
