========================================================================
Cybernetics in the 3rd Millennium (C3M)
Volume 14 Number 1, Dec. 2017
Alan B. Scrivener ( www.well.com/user/abs ) ( mailto:abs@well.com )
========================================================================
machine learning applications illustration from dzone.com ( dzone.com/articles/top-10-machine-learning-use-cases-part-1 )
In this issue:

Short Subjects

Fractured Flickers
"Fractured Flickers" letterhead with Theda Bara ( artofjayward.blogspot.com/2015/06/fractured-flickers.html )
  • book review: Superforecasting (2016) by Philip E. Tetlock and Dan Gardner ( www.amazon.com/exec/obidos/ASIN/0804136718/hip-20 ) This book is about keeping forecasters honest by making their forecasts quantified and verifiable, and keeping track of all of their forecasts to avoid what I call "the error of the missing denominator." It's also about improving forecast quality. Author Tetlock helped found the Good Judgment Project to promote these goals. ( en.wikipedia.org/wiki/The_Good_Judgment_Project ) It also makes the important distinction between foxes and hedgehogs, expanding on ideas in a 1953 book by Isaiah Berlin ( en.wikipedia.org/wiki/The_Hedgehog_and_the_Fox ) based on a fragment of the Greek poet Archilochus which says: "The fox knows many things, but the hedgehog knows one big thing." It is claimed that superforecasters are foxes.

  • book review: The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (2013) by Gene Kim, Kevin Behr and George Spafford ( www.amazon.com/exec/obidos/ASIN/0988262509/hip-20 ) I'm always tickled when a novel comes along which fairly painlessly teaches something. In this case it's about my own career backyard: Information Technology (IT). A lot of ugly truths are trotted out in this book that I haven't seen anywhere else in print.
  • Web3D is here at last: X3DOM. For decades a project called Web3D has popped up at the SIGGRAPH conference and elsewhere, offering the goal of interactive 3D graphics in a browser with no plug-ins needed. Every time I checked in they were "almost there." Well, finally, a solution exists, provided you use a recent version of Chrome or another post-modern browser. One JavaScript implementation is X3DOM. See if the example at this link works for you. ( examples.x3dom.org/example/RadarVolumeStyle/ ) I've been playing with this tech and will report soon. It's been a goal of mine since forever to write web articles about Bucky Fuller's geometries and have in-line interactive 3D illustrations. Meanwhile there's a lot of info available. ( www.x3dom.org/examples/ )
  • a short story: The Boomertown Rats A dear friend of mine, Phil Cohen, passed away about two years ago, and about one year ago I finished a science fiction short story written in his honor. It's got some cybernetics in it: darknets, ad hoc currencies, network effects... ( people.well.com/user/abs/Writing__/Fiction/ShortStories/BR/br.html )
  • Orlando Attractions on a Deck of Cards, a research project by Alan B. Scrivener. This silly little project doesn't have much to do with cybernetics (except, of course, that everything does) but it has some interesting raw data, and I found it amusing to do. Sometimes research is therapeutic for me, and I did this project mostly over the 4th of July weekend 2017, entirely for fun. Quote:
    "About 2000 or early 2001 I visited Orlando for a combined family vacation and business trip, and ended up picking up a deck of Orlando area promotional cards. It was a real deck of cards, but had a local attraction or tourist-oriented business on each one. There was also a map, and the cards keyed into the map. I thought it was a clever gimmick, and I saved the box with the cards and map inside. Now, about 17 years later, I became curious which businesses had survived, and so I did some web research and compiled the following table..."
    ( people.well.com/user/abs/Writing__/Nonfiction/Orlando/cards.html )
  • SoCal Aerospace History Project

    In the last few months I've been working on a research project into the history of Southern California's aircraft and space industries. Again, there isn't a direct connection to cybernetics (besides the development of self-aiming anti-aircraft guns), but there's a lot of raw data. The final product will be a series of video blogs, but I'm sharing the work-in-progress research. ( people.well.com/user/abs/Writing__/Nonfiction/Vlogs/SoCal_aviation.html )

  • Dr. Lawrence J. Fogel
    Dr. Lawrence J. Fogel portrait at asc-cybernetics.org ( www.asc-cybernetics.org/foundations/Fogel.htm )
    Demonstrating once again that everything is connected, I found one particular overlap between cybernetics and SoCal aerospace history. Dr. Lawrence J. Fogel was a Design Specialist for Convair, a pioneering SoCal aircraft company. He was an early practitioner of Human Factors Engineering and Artificial Intelligence (AI). He authored the book "Artificial Intelligence Through Simulated Evolution" (1966), an early work on artificial life. ( www.amazon.com/exec/obidos/ASIN/0471265160/hip-20 ) According to a bio:
    Dr. Fogel served as President of the American Society of Cybernetics in 1969, following Warren McCulloch. He also served as the founding Editor-in-Chief of the Journal of Cybernetics, the transactions of the ASC. He helped organize and co-edit the Proceedings of the Second and Third Annual ASC Symposia (1964, 1965), providing the keynote address at the latter meeting in which he concluded "it was my privilege to be among those who participated in this event in the 'coming of age' of cybernetics."
    ( www.asc-cybernetics.org/foundations/Fogel.htm )

Explorations in Machine Learning

ML definition and xkcd cartoon illustration from "Introduction to Machine Learning for Developers" ( blog.algorithmia.com/introduction-machine-learning-developers )
Machine Learning -- Machine Learning Everywhere internet meme 2017
"If managers aren't ramping up experiments in the area of machine learning, they aren't doing their job." — Harvard Biz Review tweet; 18 July 2017 ( t.co/HwFJvlPqZT ) "LIFE IS LIKE MACHINE LEARNING — YOU NEVER KNOW WHAT YOU'RE GOING TO GET" — an internet meme, 2017
Machine Learning (ML) seems to be the "flavor of the month." I've been job hunting recently and I am amazed by the number of job openings involving ML, Artificial Intelligence (AI), Data Science and related tech. Meanwhile in the tech press there is near-hysteria about the need for companies to catch up in this new revolution. Earlier this year I did programming work for SQLstream, which included integrating their streaming platform, SQLstream Blaze, with Apache SystemML, a machine learning package that runs on Apache Spark. They ran together on a Linux server. Since then I got SystemML working on my MacBook Pro laptop, and have been experimenting with it. This is a report on those experiments.

BACKGROUND

machine learning taxonomy illustration from mathworks.com ( www.mathworks.com/content/mathworks/www/en/discovery/machine-learning/jcr:content/mainParsys/band_copy_262306852/mainParsys/columns/1/image_2128876021.adapt.full.high.svg/1506453854867.svg )
"The [NIPS2003 Feature Selection] challenge attracted 75 competitors. Researchers all over the world competed for 12 weeks on 5 sizable datasets from a variety of applications, with number of input features ranking from 500 to 100,000. The goal was to use as few input features as possible without sacrificing prediction performance. The competitors succeeded in reducing the input space dimensionality by orders of magnitude while retaining or improving performance. Some of the most efficient methods are among the simplest and fastest. The benefits of using such techniques include performance improvements in terms of prediction speed and accuracy as well as gaining better data understanding by identifying the inputs that are most predictive of a given outcome." — Isabelle Guyon, Steve Gunn, Asa Ben Hur, Gideon Dror "Design and Analysis of the NIPS2003 Challenge" in Feature Extraction, Foundations and Applications (2016) Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lofti Zadeh, Editors. Series Studies in Fuzziness and Soft Computing, Physica-Verlag, Springer ( www.amazon.com/exec/obidos/ASIN/3540354875/hip-20 )
In 2001 I was invited to produce an instructional video explaining the mathematical concept known as a Support Vector Machine (SVM), which I had never heard of before. The term is confusing; the "machine" refers to an algorithm, implemented in software. What it does is search for "support vectors," which I will explain shortly. The sponsor was Mindtel LLC of Rancho Santa Fe, and the knowledge came from Biowulf Corp. I was fortunate to be able to travel to Berkeley, CA and meet Isabelle Guyon, one of the creators of the algorithm. After interviewing her and some associates, and reading a stack of papers by her and her colleagues, I came up with some metaphors for what an SVM does, and produced a video explaining it. If you're curious you can see the whole draft screenplay on the Mindtel web site, ( mindtel.com/2007/0603.anakin1/HIP_images/Biowulf/biowulf_screenplay_2nd.html ) and I've been searching crates in storage lockers for the video, which I plan to post on YouTube.

Here's one metaphor:

terrain illustration image from the video "Biowulf's Mathematical Tools: New Breakthroughs In Machine Learning" (2001) by Alan B. Scrivener

Imagine there's a region of terrain partially submerged by water. Let's say the water level is at 0 feet above sea level. Let's say you have an arbitrary list of locations in the region, with latitude and longitude (in degrees) and altitude (in feet). It might look something like this:
lat lon alt
32.5 -117.0 90
32.5 -117.1 219
32.5 -117.2 -131
32.5 -117.3 -327
32.5 -117.4 -2721
32.5 -117.5 -3964
32.6 -117.0 126
32.6 -117.1 8
32.6 -117.2 -90
32.6 -117.3 -282
32.6 -117.4 -442
32.6 -117.5 -3971
The above data represents the bottom two rows of "pins" in this map, listed right-to-left and bottom-to-top:

pins in a Google map ( map created using https://www.darrinward.com/lat-long )

If your question is, "Where is it dry or wet?" you can just look at the altitude column and see: positive altitudes are on dry land while negative altitudes are underwater. We can simplify the table by substituting a code: 1=wet, 2=dry, like so:
lat lon wet/dry
32.5 -117.0 2
32.5 -117.1 2
32.5 -117.2 1
32.5 -117.3 1
32.5 -117.4 1
32.5 -117.5 1
32.6 -117.0 2
32.6 -117.1 2
32.6 -117.2 1
32.6 -117.3 1
32.6 -117.4 1
32.6 -117.5 1
If we throw in some useless columns, such as the time of day of the last measurement, the per cent of granite in the underlying rock, or even random numbers, we're ready to give a challenge to a Support Vector Machine.
lat lon time %granite random wet/dry
32.5 -117.0 28040 5 0.947522450592 2
32.5 -117.1 46843 89 0.617269553296 2
32.5 -117.2 4772 43 0.203837149867 1
32.5 -117.3 4712 50 0.720961040106 1
32.5 -117.4 68497 6 0.755910110406 1
32.5 -117.5 60455 24 0.925103737398 1
32.6 -117.0 25560 23 0.263836121132 2
32.6 -117.1 39292 24 0.0865100394398 2
32.6 -117.2 8122 13 0.170456551953 1
32.6 -117.3 9486 80 0.849119181071 1
32.6 -117.4 34864 98 0.990071151817 1
32.6 -117.5 55406 8 0.730835110699 1
By way of example I'm going to talk about a Linear Binary-Class Support Vector Machine: Linear because it only looks for linear relationships; Binary-Class because the last column has only two possible states.

We break the dataset into two pieces (two sets of rows): training data and testing data. We give the training data to the SVM, including the last column, so it knows the right answers for that data. Then the magic happens, and through advanced math, or "artificial intelligence" if you will, the SVM comes up with a linear equation to predict the last column based on the other columns. The testing data is used to "score" this formula for predictive value: how often is it right?

What the algorithm is going to do is search for the "support vectors," which is to say the important columns in the data for computing the last column. In this case one would hope it concludes that only lat and lon are support vectors. Then it comes up with a linear equation of the form:

    score = (a * lat) + (b * lon)

where a and b are the coefficients of the support vectors. When score is negative we have wet (1) and when it is positive we have dry (2). Since this is a linear equation it describes a line. Looking at our map above, we'd like an equation that separates the wet from the dry cleanly, although this is not quite possible; a linear approximation is the best the SVM can do for us.

Now, if this only worked for 2D data it wouldn't be that amazing. But you can input much higher dimensions of data. A 3D dataset will be divided by the linear equation of a plane. Beyond 3D it becomes very difficult to visualize, but the math is simple: for example, an 8D dataset will yield coefficients for this equation:

    score = (a * x0) + (b * x1) + (c * x2) + (d * x3) + (e * x4) + (f * x5) + (g * x6) + (h * x7)

which is the equation of a hyperplane dividing eight-dimensional space. It's a mouthful to say, but using it is easier. You might wonder, how does it work? Honestly I have no idea. I have concentrated all these years on how to apply it.
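To make this concrete, here is a minimal Python sketch of the same idea, using scikit-learn's LinearSVC as a stand-in (an assumption on my part; the experiments below use SystemML's l2-svm scripts, not scikit-learn) and toy wet/dry data invented to mirror the tables above:

    # toy_svm.py -- a minimal sketch, NOT the SystemML workflow used below:
    # fit a linear binary classifier on toy wet/dry data with some useless
    # extra columns, then see which columns end up with the most weight.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n = 500

    lat = rng.uniform(32.5, 33.1, n)          # degrees latitude
    lon = rng.uniform(-117.5, -117.0, n)      # degrees longitude
    junk = rng.uniform(0.0, 1.0, (n, 3))      # time, %granite, random: useless

    # a made-up straight "coastline": dry (2) on one side, wet (1) on the other
    y = np.where((lon + 117.25) + 0.3 * (lat - 32.8) > 0.0, 2, 1)

    # center the geographic columns so all features have comparable scale
    X = np.column_stack([lat - 32.8, lon + 117.25, junk])

    clf = LinearSVC(C=1.0, max_iter=20000).fit(X, y)
    print("coefficients:", clf.coef_[0])      # lat/lon terms should dominate
    print("training accuracy:", clf.score(X, y))

If it works as hoped, the first two coefficients come out large and the three junk coefficients hover near zero, which matches the behavior described above.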

APACHE SystemML

SystemML logo illustration for "Apache SystemML" from spark.tc

This last year I have been working with SystemML from the Apache Software Foundation. Wikipedia ( en.wikipedia.org/wiki/Apache_SystemML ) says:
SystemML was created in 2010 by researchers at the IBM Almaden Research Center led by IBM Fellow Shivakumar Vaithyanathan. It was observed that data scientists would write machine learning algorithms in languages such as R and Python for small data. When it came time to scale to big data, a systems programmer would be needed to scale the algorithm in a language such as Scala. This process typically involved days or weeks per iteration, and errors would occur translating the algorithms to operate on big data. SystemML seeks to simplify this process. A primary goal of SystemML is to automatically scale an algorithm written in an R-like or Python-like language to operate on big data, generating the same answer without the error-prone, multi-iterative translation approach.

On June 15, 2015, at the Spark Summit in San Francisco, Beth Smith, General Manager of IBM Analytics, announced that IBM was open-sourcing SystemML as part of IBM's major commitment to Apache Spark and Spark-related projects. SystemML became publicly available on GitHub on August 27, 2015 and became an Apache Incubator project on November 2, 2015. On May 17, 2017, the Apache Software Foundation Board approved the graduation of Apache SystemML as an Apache Top Level Project.
Algorithms in SystemML are implemented in a language called Declarative Machine Learning (DML), which looks like this:
    Xw = matrix(0, rows=nrow(X), cols=1)
    debug_str = "# Iter, Obj"
    iter = 0
    while(continue == 1 & iter < maxiterations) {
        # minimizing primal obj along direction s
        step_sz = 0
        Xd = X %*% s
        wd = lambda * sum(w * s)
        dd = lambda * sum(s * s)
        continue1 = 1
        while(continue1 == 1){
            tmp_Xw = Xw + step_sz*Xd
            out = 1 - Y * (tmp_Xw)
            sv = (out > 0)
            out = out * sv
            g = wd + step_sz*dd - sum(out * Y * Xd)
            h = dd + sum(Xd * sv * Xd)
            step_sz = step_sz - g/h
            if (g*g/h < 0.0000000001){
                continue1 = 0
But you don't have to learn DML to use SystemML -- only to make changes to the algorithms. I like SystemML because I know how to use it, it's not too hard to install and use, no coding is required, and it runs on Linux, Windows and Mac. The fact that it scales to Hadoop clusters and Spark streams is just gravy. All of the experiments below use Apache SystemML in Standalone mode, with the "Binary-Class Support Vector Machines" algorithm described in section 2.2.1 of the SystemML Algorithms Reference documentation. ( systemml.apache.org/docs/0.12.0/algorithms-classification.html )

SystemML INSTALLATION

screen cap of installation instructions screen shot from Standalone SystemML installation instructions ( github.com/apache/systemml )
"None shall pass!" -- the Black Knight, "Monty Python and the Holy Grail"
You too can install this free open-source software and play with it. It is easiest on Linux, then on Mac, and lastly on Windows, since there you must first install a bash shell system, such as from Cygwin. I've used Linux, and most recently Mac. If you get the software working you can follow along, reproducing what I did in the sections below.

Note: Installing open-source software can sometimes seem as difficult as a Grail Quest. Take heart that others have passed this way and survived. If you complete the quest you will join the ranks of a few triumphant heroes. Keep calm and carry on!

To do my Mac installation I used instructions from these sources:
  1. I began with the instructions at Github: github.com/apache/systemml
  2. When, in step 3, I encountered Error: No available formula for apache-spark21 I found help here: github.com/Homebrew/homebrew-core/issues/6970
  3. This in turn required that I resolve brew issues, which I did with help from here: github.com/Homebrew/brew/blob/master/docs/Troubleshooting.md#troubleshooting
  4. I also got help from here: medium.freecodecamp.org/installing-scala-and-apache-spark-on-mac-os-837ae57d283f
  5. Once the installation was complete I began following the standalone tutorial on this page: systemml.apache.org/docs/0.12.0/standalone-guide
  6. I needed help with wget here: stackoverflow.com/questions/4572153/os-x-equivalent-of-linuxs-wget
The most important advice is be persistent, Google symptoms, and ask for help.

TEST 1: DATA FROM APACHE

graph of Haberman data, from a blog post by James D. McCaffrey ( jamesmccaffrey.wordpress.com/2017/08/25/habermans-survival-data )

The above-linked standalone tutorial includes instructions on how to download and use test data called the Haberman dataset, which provides four columns of data about surgery patients for a specific procedure:
  1. age of a patient in years
  2. the xx part of the 19xx year when an operation was performed
  3. number of "nodes" (medical thing) detected in the patient
  4. the thing to predict where 1 = the patient survived for five years or more, and 2 = the patient died within five years
Here is a sample of the data:
name: age year nodes predict
data row 1: 30 64 1 1
data row 2: 30 62 3 1
data row 3: 30 65 0 1
data row 4: 31 59 2 1
data row 5: 31 65 4 1
I had installed SystemML in ~/swdev/ML/systemml-0.15.0-bin and I set the shell variable $SYSTEMML_HOME to point to it. In ~/swdev/ML/ I created shell scripts as directed by the tutorial, and also created folder data_haberman containing the data file haberman.data. Here is what my shell scripts do (written for the C shell csh):
    #
    printf "0.5\n0.5" > data_haberman/perc.csv
    echo '{"rows": 2, "cols": 1, "format": "csv"}' > data_haberman/perc.csv.mtd
    #
    echo '{"rows": 306, "cols": 4, "format": "csv"}' > data_haberman/haberman.data.mtd
    #
    echo '1,1,1,2' > data_haberman/types.csv
    echo '{"rows": 1, "cols": 4, "format": "csv"}' > data_haberman/types.csv.mtd
    #
    $SYSTEMML_HOME/runStandaloneSystemML.sh $SYSTEMML_HOME/scripts/algorithms/Univar-Stats.dml \
        -nvargs \
        X=data_haberman/haberman.data \
        TYPES=data_haberman/types.csv \
        STATS=data_haberman/univarOut.mtx \
        CONSOLE_OUTPUT=TRUE
    #
    $SYSTEMML_HOME/runStandaloneSystemML.sh $SYSTEMML_HOME/scripts/utils/sample.dml \
        -nvargs \
        X=data_haberman/haberman.data \
        sv=data_haberman/perc.csv \
        O=data_haberman/haberman.part \
        ofmt="csv"
    #
    $SYSTEMML_HOME/runStandaloneSystemML.sh $SYSTEMML_HOME/scripts/utils/splitXY.dml \
        -nvargs \
        X=data_haberman/haberman.part/1 \
        y=4 \
        OX=data_haberman/haberman.train.data.csv \
        OY=data_haberman/haberman.train.labels.csv \
        ofmt="csv"
    #
    $SYSTEMML_HOME/runStandaloneSystemML.sh $SYSTEMML_HOME/scripts/utils/splitXY.dml \
        -nvargs \
        X=data_haberman/haberman.part/2 \
        y=4 \
        OX=data_haberman/haberman.test.data.csv \
        OY=data_haberman/haberman.test.labels.csv \
        ofmt="csv"
    #
    $SYSTEMML_HOME/runStandaloneSystemML.sh $SYSTEMML_HOME/scripts/algorithms/l2-svm.dml \
        -nvargs \
        X=data_haberman/haberman.train.data.csv \
        Y=data_haberman/haberman.train.labels.csv \
        model=data_haberman/l2-svm-model.csv \
        fmt="csv" \
        Log=data_haberman/l2-svm-log_h.csv
    #
    $SYSTEMML_HOME/runStandaloneSystemML.sh $SYSTEMML_HOME/scripts/algorithms/l2-svm-predict.dml \
        -nvargs \
        X=data_haberman/haberman.test.data.csv \
        Y=data_haberman/haberman.test.labels.csv \
        model=data_haberman/l2-svm-model.csv \
        fmt="csv" \
        confusion=data_haberman/l2-svm-confusion.csv \
        CONSOLE_OUTPUT=TRUE \
        scores=data_haberman/haberman_scores
In the script I've highlighted in red the parameters I change for each new dataset. The number of rows is 306 and the number of columns is 4; all other red text refers to the data directory (which I suppose I should have made a shell variable as well).

The first few lines create files in the data_haberman directory that will be used for the computations. The metadata (mtd) files describe the format of their sister files. After that I invoke the provided script runStandaloneSystemML.sh in each case, with the algorithm provided in the DML file, such as $SYSTEMML_HOME/scripts/algorithms/l2-svm-predict.dml. Arguments are provided after the -nvargs flag. The sample and splitXY steps create the training and testing data. The l2-svm step trains the SVM, and the l2-svm-predict step tests it.

Unfortunately the output from this contains a lot of error messages, because I don't have logging turned on properly, but I used a script to filter them out (available on request) and in the resulting output I see:

    Accuracy (%): 72.66666666666667

The tools also create a bunch of files in the data_haberman directory, including l2-svm-model.csv, which contains the coefficients selected by the SVM:
    coefficient for age:    0.0011374698837280472
    coefficient for year:   -0.01232727712872697
    coefficient for nodes:   0.06045232444006443
This makes some sense: a positive coefficient for age means a higher age results in a greater chance of death within 5 years, as does a higher node count. A higher year (a more recent surgery) results in a lower chance of death. (Remember that a negative prediction translates to 1=lives while a positive prediction means 2=dies.)

Another file is the "confusion matrix," documented at the Apache web site ( systemml.apache.org/docs/0.12.0/algorithms-classification.html ) but apparently they have it backwards, based on my own experiments. The generated file l2-svm-confusion.csv should contain a confusion matrix of this form:
                 Prediction ->
    Actual          1      2
      |      1     t1     t2
      V      2     t3     t4
The model correctly predicted label 1 t1 times. The model incorrectly predicted label 2 as opposed to label 1 t3 times. The model incorrectly predicted label 1 as opposed to label 2 t2 times. The model correctly predicted label 2 t4 times. If the confusion matrix looks like this:
                 Prediction ->
    Actual          1      2
      |      1    107     38
      V      2      0      2
then the accuracy of the model is (t1+t4)/(t1+t2+t3+t4) = (107+2)/(107+38+0+2) = 0.741496599.

So the SVM told me its confusion matrix for the Haberman data was:
                 Prediction ->
    Actual          1      2
      |      1    100      8
      V      2     33      9
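As a quick sanity check on the arithmetic, here's a tiny Python sketch (mine, not part of the SystemML tooling) applying the (t1+t4)/(t1+t2+t3+t4) formula to both matrices above:

    # accuracy from a 2x2 confusion matrix: (t1 + t4) / (t1 + t2 + t3 + t4)
    def accuracy(t1, t2, t3, t4):
        return (t1 + t4) / float(t1 + t2 + t3 + t4)

    print(accuracy(107, 38, 0, 2))   # example matrix above  -> about 0.7415
    print(accuracy(100, 8, 33, 9))   # Haberman test matrix  -> about 0.7267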
That's where the accuracy figure of 72.67 per cent comes from. I thought this was pretty good until I ran across this blog entry by James McCaffrey: ( jamesmccaffrey.wordpress.com/2017/08/25/habermans-survival-data )
Importantly, the number of survivors in the dataset is 225 and the number of people who died is 81. So, if you just guessed "survived" (1) for every patient, you'd be correct with 225 / 306 = 0.7353 = 73.53% accuracy.
So the SVM didn't even beat monkeys without darts! He goes on to say:
...even a deep neural network can't really do very well on Haberman's data. The best results I've ever gotten on Haberman's survival data use a primitive form of classification called k-NN (k nearest neighbors). Unfortunately, some prediction problems just aren't tractable.
I wonder why such an intractable dataset was used as a beginner's first tutorial?
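Out of curiosity, here's a minimal sketch of that majority-class baseline check, assuming haberman.data is laid out as described above (306 comma-separated rows with the survival label in the last column):

    # count labels in haberman.data and report the "always guess survived"
    # accuracy that McCaffrey describes
    import csv
    from collections import Counter

    with open("data_haberman/haberman.data") as f:
        labels = [row[-1] for row in csv.reader(f) if row]

    counts = Counter(labels)
    print(counts)                                     # expect 225 of '1', 81 of '2'
    print(max(counts.values()) / float(len(labels)))  # expect about 0.7353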

TEST 2: RANDOM DATA WITH A PATTERN

a needle in a haystack

Next I wanted to see this gizmo find a pattern I'd hidden for it. I created a dataset, which I called random.data, and a directory called data_random to put it in. The dataset came from this Python program:
    # mkMLdata.py
    #
    import random
    #
    # init vars
    #
    len_file = 1000
    #
    a1 = -0.1
    a2 = 0.2
    a3 = -0.3
    a4 = 0.4
    a5 = -0.5
    a6 = 0.6
    a7 = -0.7
    a8 = 0.8
    a9 = -0.9
    #
    # loop
    #
    for col in range (0, len_file):
        #
        x1 = random.random()
        x2 = random.random()
        x3 = random.random()
        x4 = random.random()
        x5 = random.random()
        x6 = random.random()
        x7 = random.random()
        x8 = random.random()
        x9 = random.random()
        score = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5 + a6*x6 + a7*x7 + a8*x8 + a9*x9
        if (score > 0.0):
            sign = 2
        else:
            sign = 1
        print '{0},{1},{2},{3},{4},{5},{6},{7},{8},{9}'.format(x1,x2,x3,x4,x5,x6,x7,x8,x9,sign)
This creates 1000 rows of 10 columns: the first nine are (pseudo) random numbers between 0.0 and 1.0, and the last uses a linear combination of the form:

    score = (a * x0) + (b * x1) + (c * x2) + (d * x3) + (e * x4) + (f * x5) + (g * x6) + (h * x7) + (i * x8)

where:

    a = -0.1
    b = 0.2
    c = -0.3
    d = 0.4
    e = -0.5
    f = 0.6
    g = -0.7
    h = 0.8
    i = -0.9

If score is negative then the predict value is 1; if score is positive it's 2. So if this SVM is any good it should be able to find these coefficients (or something like them). Below are some useful numbers about this dataset.
name: col1 col2 col3 col4 col5 col6 col7 col8 col9 predict
my coefficient: -0.1 0.2 -0.3 0.4 -0.5 0.6 -0.7 0.8 -0.9
svm coefficient: -0.434145... 0.606559... -0.962973... 1.287862... -1.602148... 1.975224... -2.259865... 2.786665... -3.161251...
ratios: 4.34 3.03 3.20 3.21 3.20 3.29 3.22 3.48 3.51
data row 1: 0.184502491353 0.621131573599 0.656836128894 0.461035307316 0.58579851741 0.598615021871 0.441159402261 0.155037865723 0.355242266506 1
data row 2: 0.221557328333 0.798967758449 0.869295774808 0.585811618549 0.498760107223 0.00287997320329 0.459763274194 0.166389991898 0.945106431953 1
data row 3: 0.301930685458 0.859523194582 0.0752209472553 0.190349633799 0.66281616692 0.296183806467 0.478688042456 0.310067875918 0.418075135011 1
data row 4: 0.0824805682028 0.938126238071 0.64250511666 0.389808569647 0.580621341305 0.604629084062 0.282610072446 0.700967029877 0.464468514178 2
data row 5: 0.0569194723308 0.966407414661 0.458589233925 0.922457014455 0.207030108769 0.920899249796 0.649639261478 0.818377012685 0.177906638916 2
After I ran the SVM I was told its accuracy (in %) is 96.98795180722891, which is the best score I've seen in this whole exercise. The confusion matrix is:
                 Prediction ->
    Actual          1      2
      |      1    344      3
      V      2     12    139
If you compare the "my coefficient" row with the "svm coefficient" row, you'll see the SVM's guesses for my coefficients. For each I calculated the ratio as well; most of them land between about 3 and 3.5.

Actually, a perfect SVM would not guess my exact coefficients, but the ratios would all be the same. This is because the boundary of classification is defined where score = 0, so that the formula becomes:

    0 = (a * x0) + (b * x1) + (c * x2) + ...

You can multiply both sides by any constant k and the left side is still zero, while the right side becomes:

    (k * a * x0) + (k * b * x1) + (k * c * x2) + ...

So we have to count the machine's coefficients as correct even after they've been multiplied by some constant k.

Aside: These are some of the near-magical properties of linear equations. To learn more I recommend the amazing math comic book "The Manga Guide to Linear Algebra" (2012) by Shin Takahashi. ( www.amazon.com/exec/obidos/ASIN/1593274130/hip-20 )

I noticed that the greatest deviation in the ratios was for the first column. I thought about it and realized that if there's going to be an error somewhere, the best place to put it is on the term that has the smallest contribution to the result, in this case a magnitude of 0.1 out of a range of magnitudes up to 0.9. And it works OK; the 96.98...% accuracy was achieved with that error in the coefficient.
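To make the "only the ratios matter" point concrete, here's a quick Python sketch using the coefficients transcribed above (the svm values are truncated in the table, so the ratios come out only approximately):

    # compare the SVM's coefficients to the ones baked into mkMLdata.py --
    # a perfect recovery would make every ratio the same constant k
    mine = [-0.1, 0.2, -0.3, 0.4, -0.5, 0.6, -0.7, 0.8, -0.9]
    svm  = [-0.434145, 0.606559, -0.962973, 1.287862, -1.602148,
            1.975224, -2.259865, 2.786665, -3.161251]

    for m, s in zip(mine, svm):
        print(round(s / m, 2))   # mostly between about 3 and 3.5; 4.34 for the first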

TEST 3: RANDOM DATA WITH NO PATTERN

Garbage In Means Garbage Out internet meme 2017
TYRELL: Is this to be an empathy test? Capillary dilation of the so-called 'blush response', fluctuation of the pupil, involuntary dilation of the iris.
DECKARD: We call it Voight-Kampff for short.
TYRELL: Demonstrate it. I want to see it work.
DECKARD: Where's the subject?
TYRELL: I want to see it work on a person. I want to see it work on a negative before I provide you with the positive.
DECKARD: What's that gonna prove?
TYRELL: Indulge me.
— "Blade Runner" (movie, 1982)
Having seen it succeed, I now wanted to see the SVM fail. I modified my Python program to output a 1 or a 2 in the predict column by (pseudo) random choice instead of using my score equation. There was now no pattern to find. I got an accuracy of 49.22779922779923 %, which is a little worse than random; it doesn't even match the monkeys with darts, who would score about 50 % accuracy over the long haul. Here is the confusion matrix:
                 Prediction ->
    Actual          1      2
      |      1    104    146
      V      2    117    151
Well, that's a negative result alright. It occurred to me that guessing the opposite of whatever the SVM predicted would score 50.77220077220077 %, a 0.77-point edge over chance, but that's probably an artifact of this particular dataset; more data would likely give a result closer to 50 %.
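The modified generator isn't shown above, but the change amounts to something like this sketch (the key line replaces the score threshold with a coin flip):

    # mkMLdata_random.py -- a sketch of the no-pattern variant: nine random
    # columns plus a predict column that is itself just a coin flip,
    # so there is nothing for the SVM to learn
    import random

    for _ in range(1000):
        xs = [random.random() for _ in range(9)]
        sign = random.choice([1, 2])          # replaces the score threshold
        print(','.join(str(v) for v in xs + [sign]))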

TEST 4: DISNEY STOCK DATA

One Does Not Simply Dump Raw Data Into a Machine Learning Algorithm internet meme 2017

Now I was hankering to give the SVM some real data. Arbitrarily I chose stock data. From the NASDAQ web site ( www.nasdaq.com/symbol/dis/historical ) I grabbed about two months' worth of daily data for The Walt Disney Company (DIS) stock. Here are the columns I harvested (a short sketch of the scaling follows the list):
  1. Date — Trading date as UNIX epoch time (seconds since 1-1-1970), divided by 2,000,000,000
  2. Open — Opening price divided by 150
  3. High — Daily high divided by 150
  4. Low — Daily low divided by 150
  5. Close — Closing price divided by 150
  6. Volume — Trading volume divided by 20,000,000
  7. Predict — Did stock go up in the next trading day? (1=no/2=yes)
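Here's a minimal sketch of that scaling, using one invented row of NASDAQ-style values (the column order and divisors are from the list above; the numbers and date format are made up for illustration):

    # scale one day of raw DIS data into the small features described above
    from datetime import datetime, timezone

    def scale_row(date_str, opn, high, low, close, volume, went_up_next_day):
        epoch = datetime.strptime(date_str, "%m/%d/%Y").replace(tzinfo=timezone.utc).timestamp()
        return [
            epoch / 2000000000.0,          # Date
            opn / 150.0,                   # Open
            high / 150.0,                  # High
            low / 150.0,                   # Low
            close / 150.0,                 # Close
            volume / 20000000.0,           # Volume
            2 if went_up_next_day else 1,  # Predict (1=no/2=yes)
        ]

    # an invented example row, not actual NASDAQ data
    print(scale_row("11/24/2017", 104.99, 106.24, 104.08, 104.78, 16697450, False))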
The table below shows some details of the data.
name: Date Open High Low Close Volume predict
svm coefficient 0.021564786755327192 -0.14062211146101772 -0.11012191722666322 -0.10158773850879664 -0.08579435131540565 0.46581889550920497
data row 1: 0.755136 0.699933333 0.708266667 0.693866667 0.698533333 0.8348745 1
data row 2: 0.7550928 0.6722 0.692733333 0.672 0.684533333 0.6529615 2
data row 3: 0.7550496 0.6792 0.68 0.669466667 0.674533333 0.4011034 1
data row 4: 0.7550064 0.675266667 0.686 0.6716 0.6774 0.6078205 1
data row 5: 0.7549632 0.6558 0.6742 0.655666667 0.670933333 0.736199 1
I ran the SVM and saw an accuracy of 38.888888888888886 %, which is way below the monkeys with darts. Now, I was expecting poor results from this experiment. If you believe in the concept of "efficient markets," any information about future price that could be gleaned this easily from available data would've already been gleaned, and the results factored into today's price, so it wouldn't have to change tomorrow. This logic sometimes hurts my head, reminding me of Alice in Wonderland (in the 1951 animated Disney version), who said:
If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn't. And contrary wise, what is, it wouldn't be. And what it wouldn't be, it would. You see?
But then I looked at the confusion matrix and things got, as they say, curiouser and curiouser:
                 Prediction ->
    Actual          1      2
      |      1     12      2
      V      2     17      1
In almost every case the SVM predicted the stock would not go up. Since the stock did go up more often than not (18 out of 32 times) in the testing data, this was disastrous for the SVM. Now, in the training data the stock went up only 15 out of 33 times, so what we are seeing is a shift in the pattern (if there really is a pattern) happening fast enough to confound the SVM.

NEXT STEPS

fortune teller cartoon illustration from SAP Analytics Twitter feed ( twitter.com/SAPAnalytics/status/788204197116280832 )

I wanted to put these results out there, so that others may use them, but there are things I still want to do:
  • better data: I feel like I haven't had a real "win" yet with this technique. From my years in visualization I've learned that people love a good story, coupled with the right balance of surprise and confirmation. There's a sweet spot between obvious and incomprehensible. Most people know the story of the beer next to the diapers ( canworksmart.com/diapers-beer-retail-predictive-analytics ), though the real story is less dramatic; this is an example of the kind of stories we want to be told about machine learning, despite the fact that the machines really can't provide them. As quoted above, "LIFE IS LIKE MACHINE LEARNING — YOU NEVER KNOW WHAT YOU'RE GOING TO GET." One of the added values of data analysts in this era seems to be pulling together good stories.
  • simulate purchase data: A tricky problem with classifiers is getting good data into the prediction column in a timely way. The example everyone seems to want to explore these days is credit card fraud, but the problem is that when someone sticks a card into a slot — and you have maybe 5 seconds to decide whether to decline it — all of the data on the integrity of that transaction is not yet available. Fraud often gets caught days later when someone checks their bill. To do a useful job, this data must be folded into the training and testing data, so that the SVM "knows the future" while it is learning. Analyst groups such as Forrester are getting all excited lately about real-time systems being retrained on the fly, but this can't work if accurate data cannot be provided in real time for the prediction column. One idea I've had is to use sales data from vending machines with variable pricing. The predicted event is a sale, and that data is available immediately. Ideally I'd love to get my hands on real, raw data, but vending machines are still mostly stuck in a mode where it takes a service visit to change pricing. Look for more variable pricing data as new machines roll out. But meanwhile, simulating purchases may be useful. A swarm of "sims" wandering a stylized shopping center past vending machines, each with a shopping list and a price-sensitivity function ("demand curve"), could provide training and testing data for experiments while waiting on real data.
  • experiment with higher-order nonlinear terms: Though the solver I'm using is linear, Isabelle Guyon once clued me in that you can make it nonlinear by adding columns that combine other columns. For example, assuming that people enjoy hot chocolate with marshmallows more than they enjoy either one with pickles, then if besides the chocolate (c), marshmallows (m) and pickles (p) columns we also add the products cm, cp and mp, we would expect the cm column to get the most weight. (For such products of columns it is advisable to scale the inputs from 0.0 to 1.0 first.) Also, a square function has a "U" shape (or, if negated, a "hump" shape), which makes finding correlations with hump-shaped patterns easier. There's a small sketch of this column-combining after this list.
  • investigate Kaggle: I learned at a recent San Diego Python meetup that there is a project called Kaggle which runs data analytics contests on-line, with "leader boards" of the best analysts for each contest. I'm going to look into this, for the data sources, the experience, and possibly the glory. ( www.kaggle.com )
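Here is the column-combining sketch promised above, with invented hot chocolate / marshmallow / pickle numbers (scaled 0 to 1) just to show the mechanics:

    # add pairwise product columns (c*m, c*p, m*p) so a purely linear solver
    # can pick up interaction effects between the original columns
    def add_interactions(rows):
        out = []
        for c, m, p in rows:
            out.append([c, m, p, c * m, c * p, m * p])
        return out

    # invented example rows: chocolate, marshmallows, pickles (each 0..1)
    rows = [
        [0.9, 0.8, 0.0],   # hot chocolate with marshmallows
        [0.9, 0.0, 0.7],   # hot chocolate with pickles
    ]
    for r in add_interactions(rows):
        print(r)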

Note Re:

"If It's Just a Virtual Actor, Then Why Am I Feeling Real Emotions?"

Part eight of this serialization will appear next time.

TO BE CONTINUED...

========================================================================
newsletter archives: www.well.com/user/abs/Cyb/archive
========================================================================
Privacy Promise: Your email address will never be sold or given to others. You will receive only the e-Zine C3M from me, Alan Scrivener, at most once per month. It may contain commercial offers from me. To cancel the e-Zine send the subject line "unsubscribe" to me.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I receive a commission on everything you purchase from Amazon.com after following one of my links, which helps to support my research.
========================================================================
Copyright 2017 by Alan B. Scrivener
Last update: Wed Dec 20 11:40:00 PST 2017