Wednesday 28 June 2017

Teaching machines to understand – and summarize – text

Significance magazine:
We humans are swamped with text. It’s not just news and other timely information: Regular people are drowning in legal documents. The problem is so bad we mostly ignore it. Every time a person uses a store’s loyalty rewards card or connects to an online service, his or her activities are governed by the equivalent of hundreds of pages of legalese. Most people pay no attention to these massive documents, often labeled “terms of service,” “user agreement” or “privacy policy.”


The numbers don’t lie: Why women must fill the data scientist demand

VentureBeat | Big Data | by Tanja Rueckert, SAP:
Once we begin associating a variety of skills with data science, the perceptions of our industry can change. According to the Washington Post, women now make up 40 percent of graduates with degrees in statistics – a popular starting point for a career in data science.

While a degree in mathematics is a great place to start, it’s important not to categorize the position as purely scientific and technical, suited only to individuals who excel at math and science. A career in data science is transferable across all industries. Whether you have a passion for healthcare or retail, there is likely a data science opportunity for you.

Women flocking to statistics, the newly hot, high-tech field of data science

The Washington Post:
The numbers of women in science and technology are dismal: Barely 18 percent of computer science degrees go to women. Women make up 11 percent of math faculty. Nearly half of the women who graduate with engineering degrees never enter the profession, or leave soon after. As the demand explodes for workers in high-tech professions who can analyze the staggering amounts of raw digital data produced every year, women barely register.

Except in one field: statistics. Read more....

Tuesday 27 June 2017

The Problem With Data Is Too Much Of The Wrong Stuff.

"One of the biggest flaws in organizational use of data is confusing correlation with causation. As more companies embark on “big data” journeys, employees who are not necessarily trained in statistics or data science are being asked to analyze data. And when untrained people spot correlating factors, they often confuse the correlating variables with cause and effect. Compounding this issue is that access to dashboards and models with the intent of driving data-based decisions is widely granted. But easy access to data does not mean those with access have the proper background to be reading the data correctly. Organizations must employ workers who are trained in statistics, actuarial science, or data science — or provide the proper education to those who are not—to make sure the truth is reported."

The tyranny of statistics (and algorithms, too)

Mark Twain said there are “Lies, damned lies, and statistics”, but it could just as easily be said that there are facts, damned facts, and statistics—sometimes all three at once.

I recently visited with Ewing High School’s creative writing and journalism classes, and particularly in the latter, we talked about the ways that certain information and statistics, while factual in the strictest sense, can be manipulated or misleadingly presented to steer the reader’s opinion one way or another.

Where's the Big Money in Big Data?

Years ago Drew Conway laid out a roadmap to the archetypal data scientist: someone who combines domain knowledge (i.e., they understand their particular vertical industry), math and statistics expertise, and programming skills. Ben Lorica and Mike Loukides add even more detail to the job description:

"Whatever the role, data scientists aren’t just statisticians; they frequently have doctorates in the sciences, with a lot of practical experience working with data at scale. They are almost always strong programmers, not just specialists in R or some other statistical package. They understand data ingestion, data cleaning, prototyping, bringing prototypes to production, product design, setting up and managing data infrastructure, and much more."

Friday 16 June 2017

Think Like a Statistician – Without the Math

I call myself a statistician, because, well, I’m a statistics graduate student. However, ask me specific questions about hypothesis tests or required sampling size, and my answer probably won’t be very good.

The other day I was trying to think of the last time I did an actual hypothesis test or formal analysis. I couldn’t remember. I actually had to dig up old course listings to figure out when it was. It was four years ago during my first year of graduate school. I did well in those courses, and I’m confident I could do that stuff with a quick refresher, but it’s a no go off the cuff. It’s just not something I do regularly.

Instead, the most important things I’ve learned are less formal, but have proven extremely useful when working/playing with data. Here they are in no particular order.


Sunday 11 June 2017

The experts were wrong – Macron is unstoppable

Teresa Küchler | SvD:
A large poll ahead of Sunday’s first round of the election to the National Assembly in Paris, the lower house of the French parliament, projects that President Macron’s liberal movement “La République En Marche!” will win around 400 of the chamber’s 577 seats. That would be the largest majority since the late 1960s.

UK general election: Five steps to make sense of the latest polls

Significance magazine: 
Unlike the 2015 general election, when the polls were essentially static (and wrong) throughout, the 2017 general election has seen some of the most extraordinary volatility in the polls that I can remember. If you are a Conservative supporter, the narrowing lead over Labour must be leading to anxiety and changed underwear. If you are a Labour supporter, you are probably starting to dream “can we? will we?!”

Read more....

Saturday 10 June 2017

The U.K. Election Wasn’t That Much Of A Shock

Theresa May’s loss was dramatic, but polls had shown her majority at some risk.

Despite betting markets and expert forecasts that predicted Theresa May’s Conservatives would win a large majority in the U.K. parliamentary elections, the Tories instead lost ground on Thursday, resulting in a hung parliament. As we write this in the early hours of Friday morning, the Conservatives will end up with either 318 or 319 seats, down from the 330 that the Tories had in the previous government. A majority officially requires 326 seats.

Friday 9 June 2017

Skilltest: Linear Regression

Analytics Vidhya:
Linear Regression is possibly the most widely used technique in Machine Learning. It is also one of the most researched techniques in academia. You can't call yourself a data science aspirant until you know Linear Regression well. This skill test is designed to test your knowledge of linear regression techniques.
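For anyone brushing up before a test like this, the core of simple (one-predictor) linear regression fits in a few lines. The sketch below is ordinary least squares from the textbook closed form, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), plus the R² goodness-of-fit measure. The function names are hypothetical and the code is not taken from the Analytics Vidhya test.

```python
def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of centered cross-products; Sxx: sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope: float, intercept: float) -> float:
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                        # exactly y = 1 + 2x
slope, intercept = fit_simple_ols(xs, ys)
print(slope, intercept)                      # 2.0 1.0
print(r_squared(xs, ys, slope, intercept))   # 1.0
```

With several predictors the same idea generalizes to solving the normal equations, which is usually left to a numerical library rather than written by hand.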

Thursday 8 June 2017

The Three Scenarios For The U.K. Election

On the morning of the U.S. presidential election, we pointed out that there were three scenarios for what might transpire that night, each of which was about equally likely. In Scenario No. 1, the polls would be spot-on; Hillary Clinton would win narrowly, with a 3-to-4 percentage point popular victory and somewhere on the order of 300 electoral votes. In Scenario No. 2, Clinton would outperform her polls, leading to a near-landslide victory and possible wins in states such as Arizona and Georgia, which had traditionally favored Republicans. And in Scenario No. 3, Donald Trump would beat his polls; because the Electoral College favored Trump, even a small polling error in his favor would probably be enough to make him president. Scenario No. 3 is the one that transpired, but it wasn’t any more or less likely than the other two.

Read more....

Sunday 4 June 2017

Are The U.K. Polls Skewed?

Bettors expect the polls to underrate Conservatives again. If they underrate Labour instead, Theresa May’s majority is at risk.

In April, when U.K. Prime Minister Theresa May called for a “snap” general election for June 8, polls showed her Conservatives with an average lead of 17 percentage points over Labour. Such a margin would translate to a giant majority for Conservatives: perhaps as many as 400 of the 650 seats in Parliament. (Conservatives currently control 330 seats; 326 are needed for a majority.) After several unpredictable years in U.K. politics — marked by Conservatives unexpectedly winning a majority in the 2015 general election, the successful Brexit referendum, and David Cameron’s decision to resign as prime minister and Conservative leader — such a result promised to provide May with a mandate as she negotiated the terms of the U.K.’s exit from the EU.
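The seat arithmetic in this paragraph is easy to check. In a 650-seat House of Commons, a majority is the smallest seat count that exceeds half the chamber, and a party's majority over all other parties combined is its own seats minus everyone else's. The helper names below are hypothetical, and the sketch ignores practical wrinkles such as the Speaker and MPs who do not take their seats, which change the effective working majority.

```python
def majority_threshold(total_seats: int) -> int:
    """Smallest seat count that is more than half of the chamber."""
    return total_seats // 2 + 1


def majority_over_all_others(seats: int, total_seats: int) -> int:
    """Seats held minus the seats held by all other parties combined."""
    return seats - (total_seats - seats)


TOTAL = 650  # seats in Parliament, as given in the article
print(majority_threshold(TOTAL))  # 326
for seats in (330, 400):          # current Conservative total and the poll-based projection
    print(f"{seats} seats: majority of {majority_over_all_others(seats, TOTAL)}")
# 330 seats: majority of 10
# 400 seats: majority of 150
```

The same arithmetic shows why the later results mattered: 318 seats falls 8 short of the 326 threshold.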

Read more....

Friday 2 June 2017

SCB offers no comfort

News (Ekot) | Sveriges Radio:
Two parties surge and one party slumps in the polls. But despite the dramatic shifts, voters continue to defy the party leaderships’ ideas about how Sweden should be governed. No party leadership that thinks beyond tomorrow can take real joy in SCB’s latest bar charts, according to Ekot’s political commentator Tomas Ramberg.

The seven deadly sins of statistical misinterpretation, and how to avoid them

The Conversation:
Statistics is a useful tool for understanding the patterns in the world around us. But our intuition often lets us down when it comes to interpreting those patterns. In this series we look at some of the common mistakes we make and how to avoid them when thinking about statistics, probability and risk.

1. Assuming small differences are meaningful

Many of the daily fluctuations in the stock market represent chance rather than anything meaningful. Differences in polls when one party is ahead by a point or two are often just statistical noise.

You can avoid drawing faulty conclusions about the causes of such fluctuations by demanding to see the “margin of error” relating to the numbers.

If the difference is smaller than the margin of error, there is likely no meaningful difference, and the variation is probably just down to random fluctuations.
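The margin-of-error check can be sketched in a few lines. This assumes a simple random sample and the usual normal approximation at 95 percent confidence (z = 1.96); real polls use weighting and other adjustments, so treat the numbers as rough. One subtlety: the margin of error on the lead between two parties in the same poll is nearly double the margin quoted for each party's share, because a respondent gained by one party is often lost by the other. The function names and the example poll are illustrative assumptions, not figures from the article.

```python
from math import sqrt

Z_95 = 1.96  # normal critical value for a 95 percent interval


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for one share p estimated from n respondents."""
    return z * sqrt(p * (1 - p) / n)


def lead_margin_of_error(p1: float, p2: float, n: int, z: float = Z_95) -> float:
    """Margin of error for the lead p1 - p2 when both shares come from one poll.

    Shares in the same poll are negatively correlated, so
    Var(p1 - p2) = [p1(1 - p1) + p2(1 - p2) + 2 * p1 * p2] / n.
    """
    return z * sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)


# Hypothetical poll of 1,000 people: party A on 42%, party B on 40%
n, a, b = 1000, 0.42, 0.40
lead = a - b
moe = lead_margin_of_error(a, b, n)
print(f"single-share MoE at 50%: ±{margin_of_error(0.5, n):.1%}")  # ±3.1%
print(f"lead: {lead:.1%}, MoE on lead: ±{moe:.1%}")                 # lead: 2.0%, MoE on lead: ±5.6%
print("meaningful?", abs(lead) > moe)                               # meaningful? False
```

A two-point lead in a poll of this size is well inside the noise, which is the article's point about not reading meaning into small differences.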

Wednesday 31 May 2017

Predicting the UK’s snap general election

Significance magazine:
I have decided to build a model to try to predict the results of the upcoming snap general election in the UK. I'm sure there will be many people attempting this, from various perspectives and using different modelling approaches. But I have set out to develop a fairly simple (though, hopefully, reasonable) model. In the process of describing this to you, I hope to shed some light on how statisticians build predictive models.

Tuesday 30 May 2017

How Gapminder explains difficult things with simple pictures

Gapminder’s Dollar Street service shows that homes in different parts of the world look roughly the same for families at the same income level.
Computer Sweden:
If you ask most people in Sweden to name someone who is good at visualizing data, the answer will probably be the recently deceased Hans Rosling. But he did not work alone: for 18 years he worked together with his son Ola Rosling and his daughter-in-law Anna Rosling Rönnlund. Now Ola and Anna are continuing down the same path with the Gapminder Foundation, which they founded together with Hans Rosling in 2005.

During a recent talk at the Data Storytellers conference in Stockholm, Anna Rosling Rönnlund demonstrated Dollar Street, a service Gapminder has built. It describes the living conditions of people all over the world, based on income. In the talk she showed that homes all over the world look roughly the same for families at the same income level.

Read more....

Saturday 27 May 2017

Truth and statistics: How to find out what people really think

The Economist:
To many people, Big Data is less shiny than it was a year ago. After Hillary Clinton’s defeat at the hands of Donald Trump, her vaunted analytics team took much of the blame for failing to spot warnings in the midwestern states that cost her the presidency. But according to research by Seth Stephens-Davidowitz, a former data scientist at Google, Mrs Clinton’s real mistake was not to rely too much on newfangled statistics, but rather too little.