This conference has concluded. Wolfram continues to be at the forefront of data science innovation.
We invite you to check out the latest at www.mpdatascience.com and join us at our Wolfram Technology Conference.

Schedule

Thursday, September 4

Rooms: Corcoran A | Corcoran B
8–9:30am
Registration and Continental Breakfast
 
9:30–10:45am
Opening Keynote
Stephen Wolfram
 
10:45–11am
Break
 
11–11:30am
"Time Travel with Maps"
Ben Vershbow, The New York Public Library, Director, Digital Library + Labs

Time Travel with Maps

What if digital maps had a time slider that let you peel back layers of urban history? What if you could browse vanished streetscapes, “check in” to ghostly establishments, and instantly access archives relating to a particular time and place? In this talk, Ben Vershbow will walk through The New York Public Library’s experiments in digitizing historical maps, showing how they can be processed (with the help of computers and crowds) into open spatial datasets that can afford new ways of studying the past.

 
11:30am–noon
"Sensory Overload, Signal vs. Noise, Sensory Integration, IOT and the Behavioral Symphony of Wellness, and Implications for Creative Reconstruction of Healthcare"
John Mattison, Kaiser Permanente, SCAL, Chief Medical Information Officer

Sensory Overload, Signal vs. Noise, Sensory Integration, IoT and the Behavioral Symphony of Wellness, and Implications for Creative Reconstruction of Healthcare

How will the healthcare team of the future leverage the avalanche of data?
What are the archetypal discussions that will rely on those data?
How will visualization inform those discussions?
What will the IoT of 2020 look like to a healthcare practitioner, and what will be the residual gap between what is actionable to a patient/person, versus what is actionable for their professional care team?
How will this all shape training, licensure, and privileging of individual members of the care delivery team?

"When the 'Thing' Is a Digital Scholarly Publication: Connecting Publications to Linked Data"
Nancy Kopans, Vice President, General Counsel, and Secretary, ITHAKA (JSTOR, Portico, and Ithaka S+R)

When the “Thing” Is a Digital Scholarly Publication: Connecting Publications to Linked Data

Scholarly journal articles and books are becoming dynamic and less “bounded.” Whereas even in digital format, these primary units of scholarly communication and artifacts of the scholarly record once were encapsulated in single files, now they are becoming multi-part, distributed objects. Scholarly digital publications increasingly consist of distinct building blocks, including text, graphics, and data, that reside in different repositories maintained by different institutions, employing different technologies. These components have many, and evolving, relationships that must be preserved over time, particularly given scholars’ growing expectations for connections between publications and underlying data. This presentation will describe a current initiative, the RMap project, that will make it possible to preserve not just the publications and their underlying data, but the complex relationships among them, thereby supporting the continual development of scholarly communication and digital publishing. A publisher that wants to know if there are reference links to data for a publication, for example, can submit metadata and identifiers to the RMap tool, which will return any relationships it finds, thus making it possible to track and preserve these connections through the scholarly communications cycle. By connecting publications to their linked data and preserving that connection, RMap aims to enable new forms of scholarly communication, research, and digital publishing, and will tackle an emerging need in the scholarly publishing and research community. The RMap project is being undertaken by Portico, a service of ITHAKA; the Data Conservancy at Johns Hopkins University; and IEEE, with funding from the Alfred P. Sloan Foundation.

noon–12:30pm
"Diversity of Data in the Search for Exoplanets"
Rachel Akeson, Deputy Director, NASA Exoplanet Science Institute

Diversity of Data in the Search for Exoplanets

The discovery of planets orbiting stars other than our own Sun has revolutionized astrophysics. Many observational methods are used to discover and characterize these systems, from the Kepler, Hubble, and Spitzer space telescopes to data collected by amateur astronomers in their backyards. Over 1,000 exoplanets are now known, and they range from massive planets hotter than the coolest stars to the recently announced Earth-sized planet in an orbit that would allow liquid water. This range of properties presents a challenge to theories of planetary system formation and evolution in matching the observed sample, but the diversity of the data collected also complicates this work. I will describe the range of data collected in the field and the resulting selection biases, as well as the work of the NASA Exoplanet Archive to collect and collate data on exoplanets to facilitate astrophysical research.

"Supporting Science Investments with Small and Big Data Solutions"
Christian Herzog, CEO, ÜberResearch GmbH

Supporting Science Investments with Small and Big Data Solutions

Funding science is investing in the future, and it is carried out by national and international organizations as well as foundations and corporations. In addition, science funding happens on several levels: on the national level, serving national interests, and on the international level, performed by organizations like the European Union. But the science system itself is highly global and international, with researchers and organizations cooperating globally.

The data required to support objective decision making to direct science investment hasn’t, in the past, been available. As an example, the majority of science funding organizations do not make the information about which projects they fund, or who receives the funding, easily accessible, or even publish it at all (with the caveat that, for example, government organizations in the US and UK are providing databases and APIs). But there is currently no consolidated award database that allows for even simple interrogation, such as whether a similar research proposal has been funded by another research funder in the past.

ÜberResearch, a company backed by Digital Science (the youngest sibling of the Nature Publishing Group), is focused on building such a global award database (in the absence of a public initiative) and on providing the functionality and user interfaces expected of a big data initiative to fill this gap, with support and input from 20 development partners—mostly science funders.

Concrete examples from our partners on portfolio alignment, reporting, and process support leveraging the global award database will be shared.

12:30–2pm
Lunch
 
2–2:30pm
"Data Is Everywhere. Relationships Matter... a Lot!"
Anthony Scriffignano, Dun & Bradstreet, SVP Worldwide Data & Insight

Data Is Everywhere. Relationships Matter... a Lot!

The "V's" of Big Data (volume, variety, veracity, value, etc.) are now well entrenched in our thinking as data scientists and information professionals. We have come to understand the techniques of discovery and curation, though we still struggle with the many underlying issues associated with overwhelming quantities of data. More and more, a new opportunity exists in the area of discovering and synthesizing insight from relationships among data. The ability to discern these relationships, both in terms of data in hand and discoverable data, is increasingly the capability that separates the urgent from the important in the decision matrix of leaders in information-centric roles. This session will discuss some of the critical evolutions in the discovery and curation of relationships involving business entities and people in the context of those entities on a global scale, including some interesting and humorous learnings as this capability is expanded across geographies and languages.

 
2:30–3pm
"The Emerald Cloud Lab: Data Lessons Learned from Running a Remote-Controlled Lifesciences Laboratory"
Brian Frezza, Emerald Therapeutics, Co-founder

The Emerald Cloud Lab: Data Lessons Learned from Running a Remote-Controlled Lifesciences Laboratory

The Emerald Cloud Lab (ECL) is a system that allows researchers to remotely run life sciences experiments in a central lab over the internet. Researchers send samples to the ECL and then remotely issue commands over the internet to conduct the experiments as if they were standing in front of the instruments and handling the material themselves. This system was initially developed for Emerald's internal research team over a four-year period, and has only recently been unveiled to the larger scientific community. In shifting the control of the laboratory itself to the cloud, rapid advancements in the way we communicate the science that comes out of the laboratory have also become readily feasible. Scientific figures, for example, have been at the heart of scientific communication for ages, as they provide an elegant means of making a strong logical argument on the basis of data. As a necessity of making a clean logical argument, these figures often provide only a tightly windowed point of view to a much larger world of supporting data. The full picture is often available only in bits and scraps assembled from the laboratory notebooks of the scientists involved in conducting the experiments. The ECL has pioneered a linked data system that allows scientific figures to not only present an initial viewpoint of the data, but also give the viewer the ability to dynamically interact with and freely explore the full dataset behind the figure. In this talk, we’ll present the system and discuss some of the implications of how this form of communication can facilitate a more frictionless exchange of scientific ideas.

"Techniques and Applications for Sentiment Analysis"
Ronen Feldman, Chief Scientist, Digital Trowel

Techniques and Applications for Sentiment Analysis

Sentiment analysis (or opinion mining) is defined as the task of finding the opinions of authors about specific entities. The decision making process of people is affected by the opinions formed by thought leaders and ordinary people. When a person wants to buy a product online, he or she will typically start by searching for reviews and opinions written by other people on the various offerings. Sentiment analysis is one of the hottest research areas in computer science. Over 7,000 articles have been written on the topic. Hundreds of startups are developing sentiment analysis solutions, and major statistical packages such as SAS and SPSS include dedicated sentiment analysis modules. There is a huge explosion today of texts available from social media, including Twitter, Facebook, message boards, blogs, and user forums. These snippets of text are a gold mine for companies and individuals that want to monitor their reputation and get timely feedback about their products and actions. Sentiment analysis offers these organizations the ability to monitor the different social media sites in real time and act accordingly. Marketing managers, PR firms, campaign managers, politicians, and even equity investors and online shoppers are the direct beneficiaries of sentiment analysis technology.

It is common to classify sentences into two principal classes with regard to subjectivity: objective sentences that contain factual information, and subjective sentences that contain explicit opinions, beliefs, and views about specific entities. We will mostly focus on analyzing subjective sentences. However, we will refer to the usage of objective sentences when we describe a sentiment application for stock picking.

3–3:30pm
Break
 
3:30–4pm
"Food and Nutrient Databases: Overcoming the Seesaw between Comprehensiveness and Completeness"
Denise King, Director of Operations, Nutrition Coordinating Center, University of Minnesota

Food and Nutrient Databases: Overcoming the Seesaw between Comprehensiveness and Completeness 

Comprehensive, complete, and accurate data on the nutrient composition of foods are fundamental to research examining the role of diet in health. In addition, there is growing need for this data in the context of the quantified self movement, which has spurred the development of thousands of apps for consumer use in tracking food and nutrient intake. There are more than 100 food and nutrient databases available publicly or through licensing. These databases vary with respect to the number of foods and nutrients included and accuracy. In most databases, there is a clear trade-off between comprehensiveness (number of foods in the database) and completeness (nutrients available and completeness of those nutrients). For example, a database that includes 100,000 foods is apt to provide composition data for a limited number of nutrients, and values for some nutrients are likely missing for most foods. In contrast, a database that contains a smaller number of foods is apt to provide composition data for more nutrients, and values are likely more complete. The University of Minnesota Nutrition Coordinating Center has developed and employed several strategies to address this conundrum so that the food and nutrient database it has developed is both comprehensive and complete relative to other databases. In this presentation, these strategies will be described, and future development opportunities will be discussed.

"Strategies toward Improving the Utility of Scientific Big Data"
Evan Bolton, Lead Scientist, National Library of Medicine, National Institutes of Health

Strategies toward Improving the Utility of Scientific Big Data

PubChem is a web-based open archive of chemical substance descriptions and their biological activities at the U.S. National Library of Medicine. From humble beginnings nearly ten years ago, it has grown to contain significant amounts of scientific research data and numerous interrelationships (on the order of 10^12–10^15, depending on the degree of specificity) between chemicals, proteins, genes, scientific literature, patents, and more. This information is used by many tens of thousands of researchers every day. The users are primarily scientists with diverse backgrounds, varied domain knowledge, and different use cases. Helping scientists maximize the value of this continually growing corpus of information can be a challenge.

The rigor of data can be highly variable. Some experimental methods are more reliable than others, or yield different kinds of information. Some document-based data may be manually curated or algorithmically text-mined. Good information can be corrupted when data are exchanged or transformed.

Asking questions of PubChem data can, at times, seem like drinking water from a fire hose. Providing the most relevant information first is not trivial, as what is relevant depends on the particular user. Given the specificity of scientific information, individual researchers may have very different perspectives on the same information. The data are too sparse (give me more!), too dense (show me less!), or incomplete (where is it?). The data interfaces are too limited (why can’t I do this?), too complex (why can’t you just show me what I want to see?), or too slow (is an analysis of ten million by ten million too many?).

If you work with big data, some or all of this may sound a little familiar. The actors are different, but the challenges are similar. While always a work in progress, PubChem has remained useful and relevant to the chemical biology community it serves by adopting various strategies. This talk gives an overview of these approaches in a continual drive to improve the utility of scientific big data.

4–4:30pm
"Computational Knowledge Meets Quantum Chemistry"
Stefan Janecek, Senior Researcher, uni software plus GmbH

Computational Knowledge Meets Quantum Chemistry

Density functional theory (DFT) is one of the primary workhorses for ab initio simulations, that is, the calculation of material properties based entirely on the principles of quantum mechanics, without empirical input. Technically, DFT calculates an approximation to the quantum-mechanical ground state of many-electron systems, such as atoms, molecules, or condensed matter. Today, it is heavily used in such diverse areas as chemistry, physics, the semiconductor industry, materials science, molecular biology, drug design, and the steel industry.

In this talk, we present a DFT simulation code implemented entirely in the Wolfram Language. We will briefly discuss our experience in using the Wolfram Language as a software development tool compared to more traditional approaches, and then go on to explore the possibilities of ab initio simulation embedded in a computational knowledge framework with access to curated physics and chemistry data.

"Data Aggregation and Analysis Challenges for Intelligent Manufacturing"
Robert Graybill, President & CEO, Nimbis Services, Inc.

Data Aggregation and Analysis Challenges for Intelligent Manufacturing

In the context of the Network of Things, much has been accomplished to advance the "smartness" of individual machines and pieces of equipment using locally deployed sensors and models. The challenge for twenty-first century Smart Manufacturing (SM) is manufacturing in which all information is available when it is needed, where it is needed, and in the form in which it is most useful to drive optimal actions and responses. The twenty-first century SM enterprise is data driven, knowledge enabled, and model rich, with visibility across the enterprise (internal and external) such that all operating actions are determined and executed proactively by applying the best information and a wide range of performance metrics. SM also encompasses the sophisticated practice of generating and applying data-driven manufacturing intelligence throughout the lifecycle of design, engineering, planning, and production. Manufacturing intelligence is a deep, comprehensive behavioral understanding of the manufacturing process through data and modeling, which can create a new capacity to observe and take action on integrated patterns of operation through networked data, information, analytics, and metrics. SM applications use networked, information-based technologies to integrate manufacturing intelligence in real time throughout the enterprise and identify untapped opportunities to improve manufacturing performance. An industry-directed group, known as the Smart Manufacturing Leadership Coalition (SMLC), identified that by lowering the implementation barriers around cost, complexity, ease of use, and measurement availability through the use of an open-cloud SM platform, the US manufacturing industry could deploy foundational infrastructure for vertically and horizontally oriented manufacturing intelligence to collectively strengthen capability.

4:30–5pm
"The Strange Case of the Noob Who Didn't Buy: Big Data and Pricing"
William Grosso, Scientific Revenue, CEO

The Strange Case of the Noob Who Didn't Buy: Big Data and Pricing

For 20 years now, we've known that the internet changes everything. What we're still learning is just how big the scope of "everything" is.

The rise of digital (and digitally mediated) goods, the emergence of direct-to-consumer long-term-persistent points of sale, and the adoption of big-data technologies are transforming the way we think about retail, pricing, and merchandising.

What happens when:

  • The marginal cost of goods is zero, or close to zero?
  • There is unlimited instantaneous capacity to create goods?
  • Most goods are consumed, expire, or become uninteresting?
  • The merchant has the ability to, in real-time, create and present an unlimited number of differentiated and personalized products, bundles, discounts, and other offers?
  • The merchant has a long-term relationship with the customer, in which it is seeking to maximize the lifetime value of the customer?
  • The merchant can track subsequent post-purchase behavior and outcomes, and can use that information in subsequent transactions?
  • The merchant has access to a vast array of data, and big data machinery, to closely examine the information?

In this talk, we'll use examples from gaming to briefly outline some of the challenges involved in rethinking pricing theory and practice to reflect the new realities of digital commerce.

"HealthMap and MedWatcher: Big Data and Crowdsourcing for Better Public Health"
Clark Freifeld, Research Software Developer, HealthMap/Boston Children’s Hospital

HealthMap and MedWatcher: Big Data and Crowdsourcing for Better Public Health

Traditional public health monitoring and reporting systems are vital to protecting population health, but suffer from delays, under-reporting, and bureaucratic friction. Meanwhile, massive adoption of the internet and connected mobile devices has created new capabilities for rapid, worldwide, many-to-many communication. In our research group at Harvard Medical School, we harness new media to improve public health surveillance, primarily in the areas of infectious disease and medication safety. Specifically, we have developed software tools to crawl the internet for information from news media, governments, and social media. We then apply our customized natural language processing algorithms to filter and classify the data, to derive novel population health signals at scale. At the same time, we use crowdsourcing via web and mobile platforms to engage directly with the public and create participatory epidemiology communities. In our work on HealthMap, a global, multi-lingual, real-time outbreak monitoring system, we capture early signals of emerging and re-emerging diseases such as H1N1 influenza, MERS coronavirus, and Ebola virus, typically ahead of official sources. With the MedWatcher system, we tap into social media conversations on experiences with drugs, devices, and vaccines. We also engage patients in direct reporting through our MedWatcher app. Whereas an estimated nine out of ten adverse event experiences go unreported through traditional FDA reporting channels, with MedWatcher, we are able to capture signals rapidly and provide a complementary view of safety information in real time. Together, HealthMap and MedWatcher demonstrate the power of internet media to improve public health.

5–5:30pm
"End-to-End Fusion of IOT & Big Data Technologies: IOT/Big Data Foundations, Applied & Futures"
Curt Aubley, Intel, Vice President, Data Center Group

End-to-End Fusion of IoT & Big Data Technologies: IoT/Big Data Foundations, Applied & Futures

In this session, we will discuss the evolving state of IoT and Big Data technologies, then dive into applying IoT and Big Data in the area of healthcare. Specifically, we will review big data science for Parkinson's disease, then look at future technologies surrounding IoT, Big Data, next-generation cloud computing (software-defined infrastructure), and high-performance computing to develop and secure next-generation IoT/Big Data solutions.

 
6–7pm
Reception
 
7–9pm
Dinner
 

Friday, September 5

Rooms: Corcoran A | Corcoran B
8–9am
Continental Breakfast
 
9–9:30am
"Structuring the Data of Sausage-Making: Uncovering Legislative Trends across the 50 States"
Vladimir Eidelman, FiscalNote, Principal Scientist

Structuring the Data of Sausage-Making: Uncovering Legislative Trends across the 50 States

Over a hundred thousand pieces of legislation are introduced across the country every year. Lots of time, energy, and resources go into manually combing through this legislation while relying mainly on domain expertise to single out clues as to what’s important.

By automatically collecting, organizing, and connecting state and federal legislative activity in real time, we provide the ability to perform various analyses instantaneously in order to track prospective changes in the law, uncover trends sweeping the country, and forecast legislative outcomes.

In this talk, we’ll focus on challenges and insight we’ve gained from helping individuals and organizations stay on top of the legislative landscape. We discuss the uses and limitations of open data sources and how we combine insight from computational social science with statistical modeling methods to transform unstructured data into useful snippets of information. We reveal unexpected findings in political analysis and delve into the future of predictive analytics for government information.

 
9:30–10am
"Big Data & International Trade: Creating Transparency through Information"
Keith Soura & Peter Goodings Swartz, Accounts & Innovation, Panjiva, Inc./Data Scientist, Panjiva, Inc.

Big Data & International Trade: Creating Transparency through Information

International trade is big business. According to the World Trade Organization, total global trade volumes exceed four trillion USD per year. Previously, technical and geographic barriers made the dissemination of precise trade data nearly impossible, but big data is turning the tide. The scope and volume of information available is revolutionizing sourcing, logistics, and customs agencies around the world. Panjiva is taking full advantage of this flood of information: we are mapping the network of global commerce using hundreds of millions of records sourced from dozens of databases spanning multiple countries. Our users—governments, shipping intermediaries, buyers, suppliers, and creditors—depend on the data generated by Panjiva to identify new trading partners, track competitors, fight illicit trade, manage ports, and forecast global product trends. Here, we discuss the technical and regulatory challenges of mapping global trade using large, messy, and constantly evolving datasets. We give an overview of the data science required to cluster transaction-level and aggregate records without unique identifiers into a network of companies, our experience communicating actionable information to customers via data visualizations, and our learnings from working with governments and trade organizations to gain access to trade data. Finally, we discuss how Panjiva's efforts fit into the company's broader mission: creating greater overall transparency in the global economy to the benefit of governments, businesses, and consumers.

"The 10 Habits of Highly Effective Research Data"
Anita de Waard, VP Research Data Collaborations, Elsevier Research Data Services

The 10 Habits of Highly Effective Research Data

The main tenet of the current "data science" trend is that new science can be done on old data. To make this possible, the data need to be collected and stored in a way that allows downstream scrutiny, validation, and use. In this talk, I argue that this means that the many parties currently involved in data creation, storage, and access should be interested in each other’s problems and come to a joint solution. Specifically, there are ten requirements for research data, which need to be:

  • Preserved—existing in some format
  • Archived—existing in a long-term, durable format
  • Accessible—available to others besides the researcher
  • Comprehensible—understandable to others
  • Discoverable—can be indexed by a search engine
  • Reproducible—allows others to reproduce the experiment
  • Trusted—validated by some authority, provenance known
  • Citable—able to link to dataset and track citation
  • Usable—allow tools to run over the data

I will present some examples of each of these aspects and suggest some potential models for enhanced collaboration on this topic, going forward.

10–10:30am
"Redefining Race: The Power of DNA in Uniting Societies"
Ken Chahine, Ancestry, Senior Vice President and General Manager

Redefining Race: The Power of DNA in Uniting Societies

What is race? Some say it’s how most people identify themselves. Or how we see others. But what if it’s not so black and white? As our country has become more and more of a melting pot, racial divides are increasingly blurred. But science uncovers something different—a new way of viewing race.

The concept of race is often a social construct created from ignorance or bias. Science—specifically personal genomics—has the ability to break down racial barriers by providing an even clearer picture of genetic diversity, one that debunks our social notion of race.

An overwhelming majority (97%) of the US population reported only one race in 2010, while the AncestryDNA database shows that some individuals can be linked to as many as 11 ethnicities—the average person has nearly 4 ethnicities!

We truly are a mix of cultures and influences from across the globe. Realizing the ties between us, and the science that proves our connections with people from all walks of life, has the potential to change how we understand ourselves and those around us. Ken Chahine is working to dispel commonly held beliefs about race and bring together families across the world that never knew of their connections or true ethnic backgrounds. By putting genetic ethnicity and family connections in the hands of everyone, we will be able to tear down our notion of race and show how, although distant, we are all family—literally. Are we really one united family? DNA has the power to tell that story.

Ken Chahine, Senior Vice President and General Manager for AncestryDNA, has unique insights into the science and technology that bring personal stories to life via DNA testing. He can discuss not only the impact that DNA testing has on our personal lives, but also its potential to change how we relate to each other and resolve our differences. By proving with scientific fact how few degrees of separation lie between us, we can break down racial divides and dispel ignorant presumptions.

"Virtual Reading Rooms"
Roger Macdonald, Director, Television Archive, Internet Archive

Virtual Reading Rooms

An invitation to mine diverse media on a societal scale:

At its heart, the Internet Archive is an invitation to collaborate. To work together to open and explore humanity's cultural artifacts. To foster education and scholarship by facilitating highly-scaled data treatments of media.

The Archive has begun to work with select investigators to facilitate diverse media research at an unprecedented scale by hosting their algorithms within Internet Archive servers. Early-stage projects include media impact analysis, creating a collection of more than 50 million pictures and their subject metadata for image analysis researchers, and mapping more than 5 million mentions of 65,000 place names around the world in US television news each day over more than four years (archive.org/tvgeo). In addition to enabling important research, the Archive is pursuing this "virtual machine" approach to address a pressing public information policy issue by offering a persuasive model for responsible public-interest access to media that has been mostly locked away by owners concerned that public access would diminish the value of their intellectual creations.

We welcome researchers to let loose their imaginations on our digital troves of moving images (600,000 films), television news (450,000 hours), audio (1 million recordings, 100,000 concerts), texts (2 million digital books), software, and a 7-petabyte database of more than 415 billion web pages.

10:30–11am
Break
 
11–11:30am
"Comprehending Things: Ontology and Semantics for Event Handling IoT"
Ryan Quick & Arno Kolster, Principal Architect, PayPal/Sr. Database Architect, PayPal

Comprehending Things: Ontology and Semantics for Event Handling IoT

The Internet of Things presents a distinct set of challenges for event processing. As the gulf between producer and consumer widens, challenges in semantic comprehension, stream and event ordering, and observational abstraction will fundamentally change best practices for handling events in unprecedented volumes. In this talk, we discuss IoT challenges with disparate source event stream analysis and showcase novel solutions in our Systems Intelligence framework. We will look at an event ontology addressing these issues with specific examples for

  • Mapping disparate events with different sources, characteristics, behaviors, inter-relationships, etc.
  • Why discrete mapping is critical for IoT
  • Semantics
  • The notion of “observation” itself, and what it means for semantics, events, states, transitions, and prediction
  • Eventing metadata, and the types and methods for deriving insight regarding contents from the “envelope”
  • Event semantics relationships, and how to derive dependency, ordering, and parallel processing using semantics

We will show examples from the Systems Intelligence framework developed by the PayPal Advanced Technology Group. The system is designed to process an event stream of disparate, un/semi-structured and normalized data at a rate of over three million events per second in near real time. We will show the theory behind the ontology; how it extends over time; and concrete examples of its flexibility, performance, and applicability in production systems.

 
11:30–noon
"Measuring and Correcting Error in Integrations"
Amanda Welsh, EVP, Data Science, The Nielsen Company
"Visual Intelligence: Seeing beyond the Immediate Image"
Aditya Khosla, PhD Student, MIT

Visual Intelligence: Seeing beyond the Immediate Image

"Daddy, daddy, I want a Happy Meal!" says your son, a glimmer of hope in his eyes. Looking down, you realize your phone is out of batteries. "How do I find McDonald's now?" you wonder. Looking left, you see buildings and on the right, mountains. Left seems like the right way. Still no McDonald's in sight, you end up at a junction; the street on the right looks shady, better avoid it. After a short walk, you find your destination, all without a map or GPS!

The above is just one instance of seeing beyond the immediate image—humans can infer properties about the environment and identify the best routes that avoid negative outcomes such as the possibility of crime, without seeing any crime in action.

While prior work in computer vision has focused on what is in the image, in this talk, we leverage big data to look beyond the simple visual elements to develop systems with visual intelligence. While prior work tackles tasks that humans excel at, such as naming objects in images, we use big data to perform tasks that are rather difficult for humans. For example, can we predict the proximity of an image to the nearest McDonald's? Can we tell how popular an image will be even before it is uploaded? Can we predict the extent to which people will remember an image? Furthermore, can we modify images to affect these non-visual properties in a predictable way? Ultimately, can we create artificial systems whose visual intelligence enhances human intelligence?

noon–1:30pm
Lunch
 
1:30–2pm
"What It Takes to Compute on the Whole World"
Kalev Leetaru, Yahoo! Fellow in Residence, Georgetown University

What It Takes to Compute on the Whole World

What does it take to build a system that monitors the entire world, constructing a real-time global catalog of behavior and beliefs across every country, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive ever-evolving real-time network capturing what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day? What does it look like to construct a semantic network over nearly the entirety of the world’s socio-cultural academic literature about Africa and the Middle East dating back half a century? Or to construct the same network, but over the entire web itself, stretching back two decades? How do you visualize networks with hundreds of millions of nodes, tease structure from chaotic real-world observations, or explore networks in the multi-petabyte range? How do you process and geographically visualize the emotion of Twitter in real time? How do you rethink sentiment mining from scratch to power a flagship new reality television show? How do you adapt systems to work with machine translation, OCR, closed-captioning errors, the digital divide, and the messiness of real-world global data? How do you process half a million hours of television news, two billion pages of historic books, or the images of half a billion pages stretching back half a millennium? Most intriguingly, how can the world’s largest computing platforms allow us to uncover the fundamental mathematical patterns of global human life? This talk will survey a cross-section of my latest projects of the past year, offering glimpses into some of the greatest challenges and opportunities of the big data revolution and how it is reshaping the way we understand the world around us.

"Persistent Identifiers in Research Management: People, Places, and Things"
Laurel Haak, Executive Director, ORCID

Persistent Identifiers in Research Management: People, Places, and Things

It is fairly standard that a scholarly work such as a paper or dataset is unambiguously identified via its digital object identifier (DOI). However, determining which person or people contributed to a given work remains difficult, as does determining what organization(s) they worked at and who funded the research. This is because contributors and organizations are customarily identified in bibliographic records by name only, and names can be shared, changed, or have multiple variants. In this presentation, ORCID will be introduced as a globally unique identifier for researchers, and how this identifier may be linked to other identifiers for people, places, and things in the research domain will be demonstrated. Impact on discoverability, data entry, and evaluation will be discussed.

2–2:30pm
"Social Monitoring"
Boe Hartman, Barclays Bank, Chief Information Officer, Barclaycard

Social Monitoring

We have developed in-house a social monitoring platform that uses rule-based models to determine from dark data what kind of customer experience we are passing on to our customer base across all businesses, geographies, and products. The rules segment mood and sentiment on three criteria: technology, customer experience, and risk. We first envisioned this platform as a way to see technology incidents before our performance indicated any issues. However, we discovered that this platform delivers insight into not only customer experience, but also product performance, branding, competitor performance, and sales opportunities… just to name a few.

"Bringing Big Data to Personalized Health Care: A Patient-Centered Framework"
Nitesh Chawla, iCeNSA, Director

Bringing Big Data to Personalized Health Care: A Patient-Centered Framework

Proactive personalized medicine can bring fundamental changes in healthcare. Can we then take a data-driven approach to discover nuggets of knowledge and insight from the big data in healthcare for patient-centered outcomes and personalized healthcare? Can we answer the question: What are my disease risks? This talk will focus on our work that applies data- and network-driven thinking to personalized healthcare and patient-centered outcomes. It demonstrates the effectiveness of population health data in driving personalized disease management and wellness strategies, in effect impacting population health.

2:30–3pm
"The Wolfram Data Science Platform: Data Science in the Cloud"
Dillon Tracy, Wolfram Research, Senior Developer

The Wolfram Data Science Platform: Data Science in the Cloud

Data science in the cloud offers several advantages over desktop-based analysis, such as improved access to computing resources, shared data, specialized tools, and published results. The Wolfram Data Science Platform is a cloud-based data science application designed to capitalize on these advantages, making the Wolfram technology stack—including elements of Mathematica and Wolfram|Alpha—available in a normal web browser. Basic data manipulation happens in the Wolfram Language, whose syntax is designed around the pattern-matching operations common in data analysis, and results are collected and published in the Computable Document Format, viewable again in a normal web browser. Customized documents may be generated and distributed using user-defined parameters and schedules. I will demonstrate some data science workflows using the Wolfram Data Science Platform, covering import, preparation, analysis, and visualization of data, and the publication and distribution of results.
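To make the workflow described above concrete, the following is a minimal, hypothetical Wolfram Language sketch of the import, preparation, analysis, visualization, and publication steps. The file "sales.csv", its "region" and "amount" columns, and the cloud path are illustrative placeholders, not the platform's actual interface.

  (* Import: read a CSV file with a header row and build a Dataset *)
  raw = Import["sales.csv", "CSV"];
  data = Dataset[AssociationThread[First[raw], #] & /@ Rest[raw]];

  (* Prepare: keep only rows with a positive amount *)
  clean = data[Select[#amount > 0 &]];

  (* Analyze: average amount per region *)
  summary = clean[GroupBy["region"], Mean, "amount"];

  (* Visualize: bar chart of the per-region averages *)
  chart = BarChart[Normal[summary], ChartLabels -> Automatic];

  (* Publish: deploy the chart to the Wolfram Cloud at a placeholder path *)
  CloudDeploy[chart, "reports/sales-summary"]

In a cloud setting, the deployed object could then be shared in a browser and, as the abstract notes, reports can be parameterized and regenerated on a schedule.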

"THE IoT Killer App: Profitable Sustainability via Multi-stakeholder Data Privacy and Subscription Enablement"
Chris Rezendes

THE IoT Killer App: Profitable Sustainability via Multi-stakeholder Data Privacy and Subscription Enablement

Conversations about value and benefits in IoT tend to fall into one of two loose buckets: a) some level of specificity relating to brands/OEMs and their customers, or b) high-level concepts about big data and cloud-enabled emergence. Yet we have a number of examples today of IoT deployments that are testing the boundaries of the dominant conversations about the potential of IoT. These examples could pave the way for new approaches to privacy, management, and monetization of digital and physical assets.

3–3:30pm
Break
 
3:30–4pm
"The Journey Out of Darkness—How Smart Data and Intelligent Systems are Changing Education"
Kerri Holt, Dallas Independent School District, Special Projects Officer

The Journey Out of Darkness—How Smart Data and Intelligent Systems are Changing Education

For the last two years, Dallas ISD has been on a journey to transform a system traditionally known for making speculative and intuitive decisions that led to “random acts of improvement” into an organization that relies on a strong, intelligent data culture supporting innovative schooling to meet the needs of students in the twenty-first century and holds our staff accountable for student outcomes. Dallas ISD, a $1.6 billion enterprise with 20,000 employees, created a Performance Management System that consolidates, normalizes, and associates data from over 140 different sources to create a stream of interactive, dynamic tools that allow our educators and administration to make smarter, faster decisions in order to improve instruction and the educational outcomes of 160,000 students a year. Learn how we overcame a failed $14 million dashboard project and bridged huge gaps between data, our educators, and technology in order to deliver a comprehensive, real-time decision-making tool providing access to budget actuals, vacancies, staff absences, substitutes, employee diversity, enrollment, demographics, discipline, student attendance, student performance, grades, teacher evaluations, campus performance, transportation, food services, maintenance, volunteers, crime stats, and staff and parent surveys, anytime, anywhere, using any device. Take a tour of our world of public education reform through a look at our solution that leverages smart data to transform urban education. Also, get a glimpse of our next project, which expands our intelligent data-driven culture to the point of instruction in order to provide personalized learning for all of our students.

"Best Practices for Building Legal Knowledge Bases"
Michael Poulshock, Exeter Group / Justocity, Legal Knowledge Engineer

Best Practices for Building Legal Knowledge Bases

Legal decision systems, or knowledge bases, are used to make automated determinations about people’s legal rights and obligations. They’re typically built by government agencies and corporations to do things like calculate taxes, determine eligibility for benefits, and assess compliance with the law. These systems provide quick, consistent, and cost-effective answers to legal questions. They’ve been around for a while—in mainframes, consumer software, and embedded in Java Script—and we’re going to see more of them in the years to come as the law becomes increasingly computational. These systems tend to be built in a hodgepodge of methodologies, and sometimes following no methodology at all. This presentation will discuss the best practices for building legal rule-based computational systems, based on lessons that I've learned over the years, sometimes the hard way. What are the ingredients of a successful project? How do they differ from ordinary IT projects? How do you build systems that are maintainable and scalable? What kind of tools are available? What mistakes should be avoided?