Diego Klabjan

Artificial Intelligence vs CRISPR

3/24/2016

CRISPR/Cas9 is a brand new algorithm in machine learning that has the potential to replace humans in practicing law and negotiating contracts. It is based on a new deep learning model that is trained on only a few documents and is then capable of producing questions to be asked at a trial or negotiating a real estate deal with Mr. Trump. The model consists of 10 layers of …

Got you (at least some of you). CRISPR/Cas9 is actually a gene editing technology developed at the University of California, Berkeley that does not have anything to do with machine learning (or artificial intelligence). You can read more about the technology on Wikipedia. It is a relatively recent invention and has already been used to cure diseases in adult tissues and to change the color of skin in mice (Mizuno et al. 2014). No problem here, since humanity definitely wants to eradicate cancer and other genetic diseases. That was the case until a recent, very controversial study came out, led by Junjiu Huang, a gene-function researcher at Sun Yat-sen University in Guangzhou. His team applied CRISPR/Cas9 gene editing to human embryos. Truth be told, these embryos could not result in a live birth. The intent was to cure a blood disorder in the embryo. Huang concludes that further advances need to be made, since several of the embryos were not successfully edited, but some were. It goes without saying that in the future this could lead to an ideal child with blue eyes, 6 feet tall, an IQ of 130, etc. (and graduating from Northwestern University or Georgia Tech, where I got my Ph.D.). This nice article in Nature summarizes and discusses this controversial research direction.

On the other hand, deep learning in the context of artificial intelligence is all the rage these days. Everybody is warning about the danger of artificial intelligence (AI) to humanity, including extremely influential and prominent people such as Bill Gates, Elon Musk, and Stephen Hawking. There are at least 100 returned pages (and probably many more) on Google mentioning “artificial intelligence Bill Gates Elon Musk Stephen Hawking”. As someone who knows deep learning (DL) and conducts research in this space, I believe DL and AI are very far from endangering humanity. Yes, there have been significant advances in supervised learning in specific areas (autonomous cars, scene recognition from images, answering simple factoid questions); however, these models still need a lot of training data and can solve only very narrow, specific problems. Train a scene recognition model on images of living animals and then show it a dinosaur. The answer: “Elephant.” Many news stories and reports today are written by computers powered by AI, but this task is much more structured and easier to learn than negotiating a contract with Mr. Trump or preparing a lawyer for a trial. We are very far from computers displacing humans for such tasks.

I am not worried about AI, definitely not in my lifetime or that of my children, but CRISPR/Cas9 makes me much more nervous. It really means interfering with the natural process and, in the not-so-distant future, creating exceptional humans à la carte. I am convinced that without any regulations, i.e., with the scientists unleashed, successful gene editing would be around the corner. I believe it is very important that experts around the globe step in and prevent further studies of gene editing on embryos. In terms of AI, using CRISPR/Cas9 or another yet-to-be-invented technology to artificially create a functional brain with all the neurons in a jar seems more viable and closer in time than mimicking the human brain well enough to endanger humanity with bits.

Discovery vs Production

3/14/2016

It has been noted that big data technologies are predominantly used for ETL and data discovery. The former has been around for decades and is well understood with a mature market. Data discovery is much newer and less understood. Wikipedia’s definition reads “Data discovery is a business intelligence architecture aimed at interactive reports and explorable data from multiple sources.”

Data lakes based on Hadoop are springing up at many companies with the predominant purpose of data discovery from multiple sources (that are explorable). It is easy to simply dump files from all over the place into a data lake, and thus the data source requirement in the definition is met. What about the part on “interactive reports”? According to dictionaries, the verb “to discover” means “to learn of, to gain sight or knowledge of,” which has little in common with interactive reports. Indeed, in business, data discovery is much more aligned with the dictionary definition than with Wikipedia's. Data discovery as used with big data and data lakes really means “to gain knowledge of data – in order to ultimately derive business value – by using explorable data from multiple sources.”

The vast majority of the applications of big data are to conduct data discovery in the sense of learning from the data. The knowledge gained does not per se provide business value, and thus such insights are operationalized separately in more established architectures (read EDW, RDBMS, BI). A good example is customer behavior derived from many data sources, e.g., transactional data, social media data, and credit performance. This clearly calls for data discovery in a data lake, with the insights written into a relational database and put into production by means of other systems used in marketing or pricing.
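The customer behavior example can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three in-memory lists stand in for files in a data lake, an in-memory SQLite database stands in for the relational store, and every name, field, and threshold (`discover_customer_insights`, `publish_insights`, `customer_insights`, the spend cutoff of 100) is hypothetical rather than taken from any real system.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw extracts, standing in for files dumped into a data lake.
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 15.0},
]
social_mentions = [
    {"customer_id": 1, "sentiment": 0.5},
    {"customer_id": 2, "sentiment": -0.5},
]
credit_performance = [
    {"customer_id": 1, "late_payments": 0},
    {"customer_id": 2, "late_payments": 3},
]


def discover_customer_insights(transactions, social_mentions, credit_performance):
    """Discovery step: combine the sources into one summary per customer."""
    insights = defaultdict(
        lambda: {"total_spend": 0.0, "sentiment": 0.0, "late_payments": 0}
    )
    for row in transactions:
        insights[row["customer_id"]]["total_spend"] += row["amount"]
    for row in social_mentions:
        insights[row["customer_id"]]["sentiment"] = row["sentiment"]
    for row in credit_performance:
        insights[row["customer_id"]]["late_payments"] = row["late_payments"]
    return dict(insights)


def publish_insights(conn, insights):
    """Operationalization step: write the summaries to a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_insights ("
        "customer_id INTEGER PRIMARY KEY, total_spend REAL, "
        "sentiment REAL, late_payments INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_insights VALUES (?, ?, ?, ?)",
        [
            (cid, v["total_spend"], v["sentiment"], v["late_payments"])
            for cid, v in sorted(insights.items())
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
publish_insights(
    conn,
    discover_customer_insights(transactions, social_mentions, credit_performance),
)

# A downstream marketing system only needs plain SQL against the table.
promo_targets = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM customer_insights "
        "WHERE total_spend >= 100 AND late_payments = 0 ORDER BY customer_id"
    )
]
```

The split into two functions mirrors the point of the paragraph: the discovery logic can change freely, while the production side depends only on the relational table.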

There are very few cases of big data solutions outside of ETL actually being used in production. Large companies directly connected with the web have successfully deployed big data technologies in production (Google for page ranking, Facebook for friend recommendations), but outside of this industry big data solutions in production are rarely observed.

It is evident that today big data is used predominantly for data discovery and not in production. I suspect that as technologies mature even more and become more self-service, the boundary will gradually shift towards production, assuming that business value can be derived from such opportunities. The Wikipedia definition about interactive reports is for now mostly an illusion, and it is better to stick with the proper English definition of gaining knowledge.

    Diego Klabjan

    Professor at Northwestern University, Department of Industrial Engineering and Management Sciences. Founding Director, Master of Science in Analytics.
