I want to thank OMagain in advance for doing the heavy lifting required to make my little tool/game sharable. His efforts will not only speed the process up immeasurably, they will also lend some much-needed bipartisanship to this endeavor as we move forward. When he is done I believe we can begin to attempt to use the game/tool to do some real testable science in the area of ID. I’m sure all will agree this will be quite an accomplishment.
Moving forward, I would ask that in these discussions we take things slowly, doing our best to leave out the usual culture-warfare template, and try to focus on what is actually being said rather than the motives and implications we think we see behind the words.
I believe now would be a good time for us to do some preliminary definitional housework. That way when OMagain finishes his work on the gizmo I can lay out some proposed Hypotheses and the real fun can hopefully start immediately.
It is always desirable to begin with good operational definitions that are agreeable to everyone and as precise as possible. With that in mind I would like to suggest the following short operational definitions for some terms that will invariably come up in the discussions that follow.
1. Random– exhibiting no discernible pattern; alternatively, a numeric string corresponding to the decimal expansion of an irrational number that is unknown to the observer who is evaluating it
2. Computable function– a function with a finite procedure (an algorithm) telling how to compute the function.
3. Artifact– a nonrandom object that is described by a representative string that can’t be explained by a computable function that does not reference the representative string
4. Explanation– a model produced by an alternative method that an observer can’t distinguish from the string being evaluated
5. Designer– a being capable of producing artifacts
6. Observer– a being that with feedback can generally and reliably distinguish between artifacts and models that approximate them
Please take some time to review and let me know if these working definitions are acceptable and clear enough for you all. These are works in progress and I fully expect them to change as you give feedback.
Any suggestions for improvement will be welcomed and as always please forgive the spelling and grammar mistakes.
peace
newton,
But does it sound wrong?
I think it’s wrong. I think fearfulness about ideas is a character trait rather than a property of theologies.
petrushka,
Perhaps those with that trait are attracted to authoritarian religions that discourage questioning.
I would say that all you have to do is convert the letters making up the sonnet into ASCII characters and string those together, comma separated. Make sure to include spaces and line breaks as ASCII too.
The string will obey the rules of the English language and therefore not be random (in the sense that every character is equiprobable at a given position, because certain letters and letter combinations occur far more often than others). Up to a point there will be some predictability at local scale, again because of these rules.
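The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample text are made up here, and the input is assumed to be plain ASCII (curly quotes or accented letters would need replacing first).

```python
from collections import Counter


def text_to_ascii_codes(text):
    """Return the ASCII code of each character, comma separated.

    Spaces (32) and line breaks (10) are kept as ordinary characters.
    Raises UnicodeEncodeError for non-ASCII input such as curly quotes.
    """
    return ",".join(str(byte) for byte in text.encode("ascii"))


def most_common_codes(text, n=3):
    """Return the n most frequent ASCII codes with their counts."""
    return Counter(text.encode("ascii")).most_common(n)


# Hypothetical input: any plain-ASCII text would do here.
sample = "Shall I compare thee to a summer's day?\nThou art more lovely"
print(text_to_ascii_codes(sample))
print(most_common_codes(sample))
```

The frequency count is there only to make the point about unequal character probabilities visible: in English text a handful of codes (the space and a few vowels) will dominate.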
Fmm will then add random noise as per his workflow, and smooth it all out a bit again using his EA. The output is likely to be a bit more spiky than the original. I don’t know if the model will be different enough to raise the ‘design’ flag, but don’t worry, even if it doesn’t fmm will still consider the original designed because he believes that everything is designed!
fG
Ascii is fine by me. I asked a number of times how he wants the data presented.
I guarantee that no matter how wonderful his analysis is, it will be possible to get false positives and negatives by generating data with an EA. He already knows this, because he linked to a paper showing that even sophisticated algorithms can be fooled.
I have popcorn ready and am willing to play, but it looks like TimeCube stuff from here.
I’m not caught up on the thread, but if you want you can “play” the original game now
Go here and download the client:
https://processing.org/
https://processing.org/download/?processing
Paste in the source from here
http://pastebin.com/ZqGRxcjt
You’ll need two files, fake.txt
This sort of format works for the input; it seems that only numbers and spaces are needed.
http://pastebin.com/raw/MjV8RmvW
And real.txt, similar format.
With fake and real being the same, it should look like this when running:
Any limits or reasonable expectations for the size of the items or the length of test strings?
keiths:
petrushka:
No, because it is the observer, not the program, who decides whether to infer design. The code just presents line graphs of the strings to the observer.
In fifth’s own words:
There seem to be 40 lines that can be drawn at a time, lines 98 to 178 in the code.
The issue is that fmm considers the output of EAs to be designed. He has said so to me upthread. The way he frames the problem, there will never be any false positives, because according to him everything is designed.
fG
I’ve downloaded the processing app, but need a bit of help figuring out how to load and run FMM’s app.
Excellent. That is exactly the sort of honest effort that I’m beginning to appreciate with you. Thanks
Your appraisal is mostly spot on
We are not looking for smooth data sets. It just so happens that this is how this particular data set presents itself.
Real strings will differ from models in different ways: sometimes they will be smoother, sometimes more choppy; sometimes they will contain no plateau, and other times there will be more plateaus than you would expect given the data.
Every data set is different. That is the reason that you can’t design a general purpose algorithm to model these strings.
You can get close with an EA but close does not cut the mustard in this regard. With a little feedback you can always distinguish between the real and the model
recall Maguire’s definition of an integrating function
quote:
the knowledge of m(z) does not help to describe m(z′), when z and z′ are close
End quote:
I would argue that is what we see with these strings
Early on I would try to tweak the EA to get a better match, but I quickly found that the algorithm just got more and more clunky, and differences remained regardless and often got more pronounced.
Thanks again for the interaction
peace
OMagain,
Do you know how I can output the required format from Excel? I have a file I want to use for testing; it is a series of numbers with each number in a separate cell (ordered as a row or as a column). Somewhere between 300 and 400 numbers.
I tried space-delimited (*.prn) output and renamed that to real.txt and fake.txt (the same file for now), but the program can’t read it.
Thanks,
fG
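One possible workaround, sketched in Python rather than Excel: save the sheet as CSV and flatten it into a single line of space-separated numbers, which is the shape of the sample input posted earlier. The function name and file names are placeholders, and whether the sketch accepts exactly this output is an assumption, not something confirmed in this thread.

```python
import csv


def csv_to_space_separated(csv_path, out_path):
    """Flatten every non-empty cell of a CSV file into one space-separated line.

    Works whether the numbers were laid out as a row or as a column.
    Returns the number of values written.
    """
    values = []
    # "utf-8-sig" silently drops a byte order mark if Excel wrote one.
    with open(csv_path, newline="", encoding="utf-8-sig") as src:
        for row in csv.reader(src):
            for cell in row:
                cell = cell.strip()
                if cell:
                    values.append(cell)
    # Plain ASCII, no BOM, no trailing newline.
    with open(out_path, "w", encoding="ascii", newline="") as dst:
        dst.write(" ".join(values))
    return len(values)
```

Usage would be something like `csv_to_space_separated("data.csv", "real.txt")`, then copying real.txt to fake.txt for a first test run.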
Again excellent observation,
I hesitate to confirm or deny because part of the fun in looking at these strings is thinking about how much you can learn from examining and comparing raw data in the complete absence of context.
It’s almost like you are learning the global pattern via lossless information integration. 😉
You could test your hypothesis with the game by choosing the string with lags at multiples of thirty and see if you are correct
If I spill the beans too quickly it means we will have to start over with a new string
peace
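Outside the game itself, one standard way to check for a pattern that repeats every thirty points is to compute the autocorrelation of the string at that lag. The sketch below is a generic technique, not part of the tool, and the function name is made up for illustration.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a numeric sequence at the given lag.

    Values near 1 mean points `lag` apart tend to move together;
    values near 0 mean there is no linear relationship at that lag.
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance
```

Comparing the result at lags 30, 60 and 90 with the result at neighbouring lags (29, 31 and so on) would show whether multiples of thirty really stand out.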
I’m not in the office now but I could send you the original Excel sheet that I cobbled together when I make it back.
IMO it is better than the program.
I’m confident that in the end OMagain will put something together that is elegant and practical but as it stands the program needs lots of work.
It is after all my first attempt at programming
peace
I’m trying to use the sample files and get:
processing.app.SketchException: unexpected char: ‘D’
at processing.mode.java.JavaBuild.preprocess(JavaBuild.java:399)
at processing.mode.java.JavaBuild.preprocess(JavaBuild.java:193)
at processing.mode.java.JavaBuild.build(JavaBuild.java:152)
at processing.mode.java.JavaBuild.build(JavaBuild.java:131)
at processing.mode.java.JavaMode.handleLaunch(JavaMode.java:153)
at processing.mode.java.JavaEditor$36.run(JavaEditor.java:1099)
at java.lang.Thread.run(Thread.java:745)
While opening fake.txt
Right, there is nothing special about the program; it just provides a platform to compare. I’ve even toyed with other ways to present the data that don’t involve line graphs.
We could for example try pulsating balls that vary in size according to the values of each number in the sequence.
All that is important is that we be able to compare the strings visually.
peace
What I think is going on is that your randomisation of the original string introduces a lot of noise that you can never fully remove anymore. I would have to dig deep to find the correct explanation, but there is an upper limit to S/N in digital signals (and that is what your strings are).
Therefore, your model will always be noisier than your original, and if you zoom in closely enough you will be able to see that. Depending on the smoothness of your original data this will be easier or harder to do.
It is a bit like digital audio – digital music will never be identical to the original analog sound. The reason it sounds good is that the sampling frequency has been set so high that the noise is pushed into that part of the spectrum where our ears can’t hear it anymore. At the same time other sources of signal degradation have been eliminated by the recording process so that the music sounds very ‘clean’ compared to analog recordings. But it will never be exactly the same, and if you had unlimited resolution powers in your hearing you would be able to tell the difference.
In your game you can set the viewing resolution so high that you can view every single sample, and therefore you will always be able to spot the difference between the original and the noisy copy.
I honestly think that this is all that is going on here, and all the rest about ‘integrated information’ is just a red herring.
fG
Actually, I have several real strings that fluctuate widely from point to point, more so than you would expect by chance
In those strings the model is smoother than the original
The graphs go by rather quickly and you are severely limited in regards to time.
You have to choose quickly
This prevents you from viewing every single sample; you have to go with the overall pattern. That is the theory, anyway
peace
A shot in the dark, but it could be the BOM in UTF-8?
Anybody have it working?
try just a file with
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
on a single line
All zeroes in fake.txt produces this
I tried that.
I also cannot figure out where it is looking for the txt file. I tried setting preferences, but it doesn’t find it unless I hard code the path in the program.
Or maybe the POS in the java.exe file.
Put it in the same place as the processing executable. It then just magically finds it.
LOL
Well, it’s working here, tho I have no clue what to do with it
I tried with a fake.txt encoded in UTF-8 with BOM and it did crap out here too, but I got a different exception (NumberFormat)
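The BOM effect is easy to reproduce outside Processing. The Python sketch below is an analogy only, not the sketch's own loading code: it shows a byte order mark sticking to the first token and breaking the integer parse, and shows that decoding with `utf-8-sig` avoids it. The function names are made up for the example.

```python
import codecs


def parse_numbers(raw):
    """Decode bytes as UTF-8, dropping a leading BOM if present, and parse ints."""
    text = raw.decode("utf-8-sig")
    return [int(token) for token in text.split()]


def parse_numbers_naive(raw):
    """Same parse with plain UTF-8, so a BOM stays glued to the first token."""
    text = raw.decode("utf-8")
    return [int(token) for token in text.split()]


data_with_bom = codecs.BOM_UTF8 + b"1 2 3 4 5"
print(parse_numbers(data_with_bom))
try:
    parse_numbers_naive(data_with_bom)
except ValueError as err:
    print("naive parse failed:", err)
```

The practical fix is simply to save fake.txt and real.txt as UTF-8 without BOM (or plain ASCII) in whatever editor produced them.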
Okay, the unexpected character is in the filename. Where the fuck is it looking for the txt file? It isn’t looking in the folder specified in preferences.
The line moves across, you “guess” which one is the fake and the real string by clicking on the respective buttons.
I think 😛
Okay, the mechanics are done. Now it appears that the “real” string is always on top. Is this just me, or is this a beta version?
First of all, saying something is ‘fluctuating more wildly than you would expect by chance’ sounds very sloppy. How do you determine what the ‘expected fluctuation’ is of a ‘chance string’?
I don’t know how you randomise your original, but I would think that this step will in all cases add noise to the original unless that was already pretty much white noise to begin with. Likewise, I don’t know what your GA does exactly, but I find it hard to understand how it would result in a smoother fake if you start with a randomised copy of a spiky (i.e. noisy) string.
We need to see more examples, and more detailed explanation of the steps you take to produce your fakes. OMagain’s program doesn’t do any of that, right? It just does the display thing. Tell us exactly what you do to your data.
More observer-dependent influences on the result. You are still very far from science.
fG
What if I swap the files? this makes no sense to me
If “fake” is simply determined by what’s in the fake.txt, and one can swap the contents of fake & real.txt, it turns the game into a 50-50 thing, very much like “guess which hand”
I’ll start with a first effort. Here are two strings, one of which is an actual dataset from published science, and the other a randomized dataset of the same size.
40 10 10 20 20 10 40 20 30 10 10 10 10 30 20 10 10 20 30 40 10 20 20 40 40 20 10 10 10 20 40 40 10 40 20 20 30 30 40 20 10 20 40 10 10 20 40 40 20 30 20 40 30 20 40 40 30 20 30 10
20 40 30 40 40 10 30 40 20 30 20 20 20 20 10 30 20 30 10 10 10 10 10 40 40 40 20 20 20 10 20 20 40 30 20 40 10 10 10 40 40 10 30 20 20 10 10 20 40 30 20 30 10 40 20 40 10 30 20 20
20 20 30 10 20 40 10 20 10 40 30 10 20 30 20 20 30 30 10 20 10 20 20 10 20 10 40 40 40 40 30 40 40 40 10 10 20 10 40 10 40 10 10 40 20 40 10 20 40 10 20 40 20 20 20 20 40 40 40 10
10 40 10 40 40 10 20 20 20 20 10 10 20 40 30 20 30 10 40 40 40 30 20 20 30 20 20 30 10 10 20 20 10 40 30 10 20 10 20 20 30 30 10 20 10 20 20 10 20 10 40 40 40 40 40 40 20 40 10 40
10 10 40 10 10 40 10 10 10 40 10 20 40 10 30 10 30 30 20 10 40 20 40 30 20 20 20 10 20 10 20 10 40 40 10 20 10 20 10 30 40 40 30 20 40 30 20 30 40 30 40 30 10 40 10 20 40 10 30 40
40 40 10 40 20 10 30 20 10 20 10 40 10 40 30 10 40 40 40 10 20 40 20 10 20 40 10 40 20 20 10 10 10 30 40 10 10 40 30 30 30 10 30 30 30 10 40 10 10 30 40 30 40 40 40 10 40 20 20 30
10 30 20 10 20 40 10 40 40 10 10 30 10 10 20 10 40 40 30 20 20 10 40 20 10 30 10 40 30 20 20 30 30 20 40 20 10 20 40 30 20 40 30 10 10 10 10 20 40 20 30 10 30 20 20 30 40 40 10 40
20 10 20 30 10 10 10 30 40 30 20 10 20 20 20 20 20 40 30 10 20 10 10 10 40 20 10 10 40 10 30 40 20 40 40 30 30 40 30 20 40 20 10 40 10 20 10 20 30 20 40 10 30 10 20 10 40 20 30 20
40 40 40 30 30 30 30 10 30 10 10 40 40 10 10 20 10 40 40 40 40 30 20 10 20 40 40 20 40 40 40 40 40 20 30 20 10 20 30 40 20 30 10 20 20 40 20 20 20 30 30 20 20 10 30 10 10 40 20 10
10 20 40 30 30 10 30 40 10 40 40 40 20 40 10 20 30 30 20 30 40 10 10 10 40 40 30 20 10 20 30 10 20 10 10 40 10 10 30 10 30 10 30 30 30 10 10 20 30 30 40 40 10 40 40 10 30 20 10 10
petrushka,
easy peasy lemon squeezy
The fake one is…. the game
This has not escaped our notice.
My turn!
One of these two is the amino acid sequence of a human protein found at uniprot.org, with each amino acid translated to a number from 1 to 22.
The other is a sequence randomly generated with Python, with numbers ranging from 1 to 22:
12 7 5 3 18 2 7 11 11 3 21 7 19 12 16 3 22 11 11 17 2 20 17 16 6 16 16 4 8 12 8 8 17 19 20 6 20 22 15 22 3 9 20 9 16 10 16 4 14 6 3 10 1 3 20 1 15 6 21 15 6 10 16 1 3 22 20 3 18 21 1 18 8 11 13 3 11 21 6 1 9 22 9 12 16 13 1 1 5 14 21 5 8 9 13 6 3 4 20 11 19 22 10 7 8 15 15 19 11 15 2 20 14 3 7 22 15 5 19 6 10 12 12 16 4 13 12 1 2 22 12 17 7 12 21 12 16 10 14 16 2 16 6 18 15 5 4 2 20 5 10 21 6 22 20 18 1 4 12 12 9 3 14 19 15 19 16 1 10 9 15 17 2 17 9 1 21 3 17 14 4 1 19 13 19 22 14 4 2 5 22 14 22 3 8 17 11 20 21 1 18 13 4 19 21 4 3 1 9 12 4 2 10 10 22 7 17 5 4 5 7 10 1 5 18 6 1 5 16 22 19 16 13 12 19 20 19 15 22 6 19 2 11 18 15 12 3 15 9 22 18 15 18 12 16 7 5 18 8 21 17 19 9 21 9 15 9 4 4 15 13 9 9 11 4 16 7 20 6 10 3 20 11 12 20 16 12 8 11 13 20 13 3 10 12 22 15 2 11 15 22 22 2 18 9 1 11 7 7 2 4 18 21 15 15 9 17 15 9 18 8 8 9 5 9 14 17 18 9 16 12 20 4 10 19 13 14 12 2 13 21 9 12 1 17 17 21 11 5 6 4 10 16 22 7 11 11 18 4 10 18 8 4 13 11 15 18 16 1 7 20 18 8 2 8 22 7 8 10 7 16 19 4 2 22 6 3 16 17 2 20 8 16 22 5 13 13 21 15 14 19 14 22 11 3 9 22 7 3 4 12 1 3 17 6 21 8 11 10 16 7 1 11 7 2 17 19 1 14 2 15 10 2 4 5 15 3 10 16 20 5 6 9 3 3 19 8 4 21 12 20 2 17 9 2 15 7 22 19 18 4 11 17 21 7 13 21 9 11 19 11 1 4 2 18 20 16 8 2 22 9 3 18 11 7 14 4 10 5 18 6 14 10 14 22 9 18 19 1 1 3 5 13 12 17 2 17 22 19 9 8 1 18 8 2 6 8 8 14 12 15 22 7 8 8 16 9 2 2 15 13 17 5 8 19 16 12 7 7 14 6 18 3 12 16 16 4 9 1 2 5 15 9 13 3 11 10 21 6 7 18 4 12 7 2 11 13 11 20 11 19 21 18 1 7 11 13 7 2 17 4 11 20 8 1 21 6 1 19 1 18 22 11 14 20 16 8 2 13 9 10 1 6 14 14 4 6 16 7 8 10 16 10 10 21 12 1 8 2 9 13 4 17 21 19 5 4 15 17 4 11 7 9 14 3 16 12 15 7 16 10 4 9 3 12 8 8
15 7 11 12 8 10 1 20 14 19 12 18 3 10 16 10 16 14 4 1 22 16 4 10 18 18 6 12 18 17 19 12 22 8 8 16 10 21 8 2 2 1 18 4 4 10 14 13 19 4 17 18 14 19 18 3 19 12 2 22 16 13 17 3 14 8 2 19 22 22 3 22 2 3 10 15 18 13 11 4 6 13 15 14 1 13 14 22 2 10 13 8 17 7 6 6 1 22 16 2 13 13 11 7 11 14 10 14 14 1 2 13 4 20 3 19 4 1 1 18 13 12 10 7 7 13 8 22 4 16 13 4 11 22 17 13 19 19 11 3 16 1 2 14 19 16 13 14 13 1 16 6 4 12 6 8 14 16 13 13 3 10 16 2 6 8 19 6 10 21 14 16 11 7 11 6 18 19 14 22 17 19 15 6 22 4 20 18 3 12 2 8 13 13 13 16 17 3 18 19 12 10 4 18 10 22 17 1 13 17 18 13 19 15 2 2 15 2 7 18 22 18 2 15 17 22 18 18 8 11 2 21 18 19 17 11 1 16 19 16 3 19 18 18 17 18 18 7 10 18 13 18 8 2 8 2 18 19 18 19 17 3 22 11 15 22 18 19 19 13 17 22 4 18 2 15 12 7 4 1 12 2 18 11 18 7 18 1 18 17 18 1 13 18 18 18 17 3 3 13 18 17 19 10 20 18 8 17 14 19 17 22 17 1 8 2 7 2 1 17 22 18 10 19 8 7 14 3 14 12 2 17 2 10 8 2 4 18 18 21 21 20 7 12 7 1 18 7 22 15 13 18 19 2 12 10 18 10 18 16 10 19 22 21 14 10 14 20 11 10 4 22 1 22 14 12 13 14 22 22 4 17 19 17 7 8 16 8 1 16 2 3 7 22 1 22 13 2 14 19 2 11 22 3 12 13 13 16 15 10 21 15 19 14 4 3 13 1 12 22 19 8 20 6 7 10 18 18 13 21 14 11 13 11 22 8 7 19 14 16 8 15 16 8 13 12 4 12 1 2 8 19 1 8 10 15 4 21 13 11 1 14 3 12 12 11 2 4 15 14 18 3 3 12 16 13 11 7 10 13 19 22 14 12 10 4 16 10 13 1 19 22 14 18 2 20 18 10 18 8 8 22 7 8 17 19 10 18 22 13 20 15 1 17 7 22 12 2 15 8 4 3 3 17 16 18 16 8 18 4 22 21 18 21 10 12 22 13 21 7 13 15 19 10 7 13 17 21 18 11 12 3 3 2 4 8 12 12 16 15 22 10 2 10 21 1 18 17 4 13 18 14 13 21 14 3 6 17 14 1 15 14 2 13 22 1 4 6 22 14 14 22 14 7 7 2 17 13 16 17 8 12 13 18 18 12 7 13 13 8 11 18 13 17 14 12 3 2 18 1 18 7 17 18 13 11 2 1 1 11 19 7 4 12 3 1 6 19 13 19 19 18 17 2 13 17 22 16
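For anyone who wants to build a similar pair, here is a minimal Python sketch of both halves. The letter-to-number mapping is hypothetical (the 20 standard one-letter codes followed by U and O, numbered in that order); the mapping actually used for the strings above was not stated, so this will not reproduce them.

```python
import random

# Hypothetical alphabet: 20 standard one-letter amino acid codes, then U and O.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYUO"
CODE = {letter: number for number, letter in enumerate(AMINO_ACIDS, start=1)}


def encode_protein(sequence):
    """Translate a one-letter amino acid sequence into numbers from 1 to 22."""
    return [CODE[letter] for letter in sequence.upper()]


def random_sequence(length, seed=None):
    """Draw `length` integers uniformly from 1 to 22 inclusive."""
    rng = random.Random(seed)
    return [rng.randint(1, 22) for _ in range(length)]


def as_line(values):
    """Format values as one space-separated line, as used for real.txt and fake.txt."""
    return " ".join(str(v) for v in values)
```

A uniform draw like this uses all 22 values about equally often, so a simple frequency count is one obvious thing to compare between the two strings.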
The game’s afoot. Time for a new thread?
fifthmonarchyman,
Untrue. While the paper overstates its conclusions, it does actually describe the process followed in enough detail to replicate it. You could learn a lot from that writing style.
As I summarized earlier, adapting that process to use machine learning rather than human participants would look like this:
1) Download all historical closing prices for the DJIA.
2) Randomly, with a uniform distribution, select a few thousand start dates.
3) From each start date load the next 250 closing prices and save these time series.
4) For each time series generate a permuted version following the algorithm described in the paper.
5) Divide the set of time series pairs into training, cross validation, and test sets.
6) Train a machine learning model using the training and cross validation sets by presenting pairs of real and permuted time series.
7) Test the trained model on the test set.
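The data-preparation part of these steps (2 through 5) can be sketched with the standard library alone. This is a rough outline under stated assumptions: `prices` is a list of closing prices already downloaded, all function names are made up, and the permutation shown (shuffling the window's log returns and rebuilding prices from the same starting value) is an assumed stand-in that should be checked against the algorithm in the paper. Model training (steps 6 and 7) is left out.

```python
import math
import random


def sample_windows(prices, n_windows, length=250, rng=None):
    """Pick start positions uniformly at random and cut fixed-length windows."""
    rng = rng or random.Random()
    last_start = len(prices) - length
    if last_start < 0:
        raise ValueError("price history is shorter than one window")
    starts = [rng.randint(0, last_start) for _ in range(n_windows)]
    return [prices[s:s + length] for s in starts]


def permute_returns(window, rng=None):
    """Assumed permutation: shuffle log returns, rebuild from the first price."""
    rng = rng or random.Random()
    returns = [math.log(b / a) for a, b in zip(window, window[1:])]
    rng.shuffle(returns)
    rebuilt = [window[0]]
    for r in returns:
        rebuilt.append(rebuilt[-1] * math.exp(r))
    return rebuilt


def split(pairs, train=0.6, cv=0.2):
    """Divide pairs into training, cross-validation and test sets, in order."""
    a = int(len(pairs) * train)
    b = a + int(len(pairs) * cv)
    return pairs[:a], pairs[a:b], pairs[b:]
```

Each (real, permuted) pair would then be presented to whatever classifier is chosen, with accuracy on the held-out test set compared against the 73% figure mentioned below.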
You have yet to explain why following this process, exactly that described in the paper, and achieving a 73% or better accuracy, exactly what humans achieved, would not meet your criteria for disproving your claims.
Your game is not the same as that described in the paper and you have not yet provided sufficient clarity on the rules for anyone to be able to play it.
I’ve described in detail what I propose. Either commit to abiding by the results or explain why following the exact process described in the paper doesn’t meet your criteria.
After a couple thousand posts it’s difficult to find a link to the finance game.
petrushka,
Is It Real, or Is It Randomized?: A Financial Turing Test
You’ll notice some marked differences between the process used in the paper and fifthmonarchyman’s less than clear game rules.