Friday 21 July 2017

Beginner's guide to build data visualisations on the web with D3.js

From Beginner's guide to build data visualisations on the web with D3.js:
Introduction
I am sure you have heard this many times: a picture is worth a thousand words. With the proliferation of data, I think this statement could easily be modified to “a picture is worth thousands of data points.” If you are not convinced, look at the example below.

Storytelling through Data Visualisation

Let’s look at the following statement:
“In 2013, Gun Deaths Claimed 11,419 lives in the U.S.”
What comes to your mind first?
In 2013, the US saw a lot of gun violence
Gun violence is high in the US
Deaths due to gun violence were 11,419 (in 2013)

Without visualisation, the statements above are just a set of statistics. Useful as they are, they still fall short of conveying the bigger picture. You don’t get the complete context: What kind of people were involved? How great was the nation’s loss?

Let’s see what happens when we visualise it:

Read more...

Sunday 16 July 2017

Who goes with whom? Mutual sympathies among the parties’ supporters

Who goes with whom? Mutual sympathies among the parties’ supporters:
----------
The figure is taken from the SOM Institute’s annual seminar, held on 24 April, which included a special analysis of the Sweden Democrats’ place in a changing Swedish party system (you can watch a recording of the seminar here). That is why the Sweden Democrats are marked with orange dots in the figure. The results clearly show that SD lives up to its label as a pariah party. SD is strongly disliked by supporters of every party except, of course, its own. Only among Moderate Party supporters are there parties that are decidedly even more disliked than the Sweden Democrats: FI (Feminist Initiative) and V (the Left Party).

Rupert Murdoch ‘could use Sky data trove for political ends’

Business | The Guardian:
One of the “largest and most sophisticated datasets in the country” – including the TV viewing, internet and phone records of 13 million households – could be misused for political purposes if Rupert Murdoch is allowed to proceed with his plan to buy out Sky, six members of the House of Lords claim in a letter to the Observer.

Read more...

Thursday 13 July 2017

Statistics are lifeblood of democracy, and need to be protected

Quality of life depends on quality of statistics
Anthony Hilton | London Evening Standard:
People rarely think about this but the quality of daily life depends hugely on the quality of statistics.

How much do you pay for your flood insurance? How congested are the roads? How far does your child travel to school? How long do you wait for a hospital appointment? How polluted is the air? How large is your pension? The list goes on.

What governments decide is political, but the areas in which they intervene, the order in which they do things, and the way a decision, once made, is implemented all rely on the available data.

Wednesday 12 July 2017

Relation between statistical machine learning and big data

Naveen Joshi | Pulse | LinkedIn:
ML and statistics

In fields such as pattern recognition, data mining, and knowledge discovery, we often see machine learning and statistics come together. What brings them together is a common goal, learning from data: both focus on drawing insights or knowledge from data. However, the two are shaped by inherent cultural differences. Statistics is a subfield of mathematics, while machine learning comes from computer science and artificial intelligence. Machine learning is also a comparatively new field, made possible by cheap computing power and by the availability of what we call big data, which together have allowed data scientists to train computers to learn by analyzing data. Statistics, by contrast, existed long before computers were invented.
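To make the shared goal of “learning from data” a little more concrete, here is a minimal sketch that fits the same straight line in two ways. The first is the closed-form least-squares formula a statistician would reach for. The second is iterative gradient descent, the style of optimisation common in machine learning. The dataset, learning rate and step count are made-up assumptions chosen only for illustration.

```python
def ols(xs, ys):
    """Closed-form ordinary least squares for the line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def gradient_descent(xs, ys, lr=0.01, steps=20_000):
    """Fit y = a + b*x by repeatedly stepping down the gradient of the mean squared error."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(a + b * x) - y for x, y in zip(xs, ys)]
        grad_a = 2 * sum(errors) / n                            # d(MSE)/da
        grad_b = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # d(MSE)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a_ols, b_ols = ols(xs, ys)
a_gd, b_gd = gradient_descent(xs, ys)
print(f"Statistics (closed form):  a={a_ols:.3f}, b={b_ols:.3f}")
print(f"ML-style gradient descent: a={a_gd:.3f}, b={b_gd:.3f}")
```

Both approaches should print a=0.050, b=1.990 for this toy dataset. The point is not the method itself but that the two traditions minimise the same error and arrive at the same answer.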

Tuesday 11 July 2017

Machine Learning Education: 3 Paths to Get Started

Datanami: 
Machine learning is the predictive heart of big data analytics, and one of the key skills that separates data scientists from mere analysts. But getting started with machine learning can be a challenge. Here are a few ways beginners can get off the ground with their machine learning adventure.

Machine learning is a vast field with many different specialties, so it’s quite easy for a beginner to get overwhelmed. For instance, one specialty called deep learning powers many of today’s artificial intelligence breakthroughs. But without a background in basic machine learning approaches, a prospective data scientist would have zero chance of mastering this powerful but complex technology.

Here are three general paths neophytes can take to get started with machine learning. Read more...

Monday 10 July 2017

Machine learning, and the way ahead

From MillenniumPost:
---------------
At the very peak of the most recent Gartner Hype Cycle (2016) sits Machine Learning, which subsumes what used to be categorised as Big Data, Data Science, Artificial Intelligence, Data Mining, or Predictive Analytics. I would even include within this broad definition Statistics and Nonlinear Dynamics, two more traditional fields that remain important. While there may be slight technical differences, in common parlance these terms mean the same thing: the ability to use computers (i.e., "machines") and sophisticated mathematics to extract actionable predictive insights from data (i.e., "learn"). In our day and age, almost all mathematicians, statisticians, and physicists use computers, while computer scientists who work with data usually learn the basics of traditional data science disciplines such as statistics and applied mathematics, nonlinear dynamics and network science in physics, signal processing in engineering, and econometrics in economics. Thus, any difference between Machine Learning and what might once have been called the traditional data sciences has been fast disappearing.

Tuesday 4 July 2017

Winner of the Significance young writer competition announced

StatsLife:
Judging took place last month for our writing competition for early-career statisticians. It was the most competitive contest in years, with entrants from several African nations, Australia, Belgium, Brunei, Costa Rica, India, Mexico, New Zealand, the United Kingdom and the United States.

Today we are delighted to announce our finalists, and this year’s winner. This year’s finalists are (in alphabetical order):

Read more...

Monday 3 July 2017

How You Will Die

FlowingData:
So far we’ve seen when you will die and how other people tend to die. Now let’s put the two together to see how and when you will die, given your sex, race, and age.

Read more...

Sweden keeps growing

Statisticon Population Projections:
– population grows most in municipalities within commuting distance

On 20 January 2017, Sweden’s population passed 10 million. The journey from 9 to 10 million inhabitants took 13 years. According to Statisticon’s population projections, we are expected to reach 11 million in just eight years.
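As a rough check on what these figures imply, the sketch below converts each period (9 to 10 million in 13 years, and the projected 10 to 11 million in 8 years) into a constant compound annual growth rate. It is only an illustration. It assumes steady growth within each period and is not Statisticon’s projection method.

```python
def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Constant compound annual growth rate that takes `start` to `end` in `years`.

    Assumes steady growth, which real populations do not follow exactly.
    """
    return (end / start) ** (1 / years) - 1


# 9 to 10 million took 13 years (as reported)
past = annual_growth_rate(9e6, 10e6, 13)
# 10 to 11 million in 8 years (Statisticon's projection, as reported)
projected = annual_growth_rate(10e6, 11e6, 8)

print(f"9 to 10 million:  {past:.2%} per year")       # 0.81% per year
print(f"10 to 11 million: {projected:.2%} per year")  # 1.20% per year
```

Under this simplification, the projection implies average growth of about 1.2% per year, compared with about 0.8% per year over the preceding 13 years.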

In 2016, the population increased in 271 of the country’s 290 municipalities, largely owing to immigration from abroad, for which every municipality recorded a positive net migration. At the same time, because of age structures with relatively many older people, half of the municipalities had more deaths than births.

Friday 30 June 2017

Researcher: Humans may live longer than previously thought

SVT Nyheter:
Where exactly is the limit to how old a human being can become? On this, science is divided.
“If nothing changes, it is fairly likely that within the next 25 years we will see someone who lives to 119,” says Chalmers researcher Holger Rootzén.

Read more...

Thursday 29 June 2017

A New Theory on How Researchers Can Solve the Reproducibility Crisis: Do the Math

The Chronicle of Higher Education:
Jeanne Calment of France was believed to be the world’s longest-lived person when she died in 1997 at age 122. A recent headline-grabbing study about the limits of the human life span has drawn rebuttals with implications for how universities and scientists might approach the reproducibility crisis in research.

Read more...

The statistics show – that’s when it’s best to be on holiday

GT:
Summer and sun? Well, not always, right?

“It is normally rainier in summer than during the rest of the year,” notes meteorologist Johan Groth at the weather institute Storm.

The warmest days usually come at the end of July and the beginning of August.

Here you can see what the weather has been like in Gothenburg over the past ten years.

Wednesday 28 June 2017

Teaching machines to understand – and summarize – text

Significance magazine:
We humans are swamped with text. It’s not just news and other timely information: Regular people are drowning in legal documents. The problem is so bad we mostly ignore it. Every time a person uses a store’s loyalty rewards card or connects to an online service, his or her activities are governed by the equivalent of hundreds of pages of legalese. Most people pay no attention to these massive documents, often labeled “terms of service,” “user agreement” or “privacy policy.”

Read more...

The numbers don’t lie: Why women must fill the data scientist demand

VentureBeat | Big Data | by Tanja Rueckert, SAP:
---------------
Once we begin associating a variety of skills with data science, the perceptions of our industry can change. According to the Washington Post, women now make up 40 percent of graduates with degrees in statistics – a popular starting point for a career in data science.

While a degree in mathematics is a great place to start, it’s important not to categorize the position as purely scientific and technical, suited only to individuals who excel at math and science. A career in data science is transferable across all industries. Whether you have a passion for healthcare or retail, there is likely a data science opportunity for you.

Women flocking to statistics, the newly hot, high-tech field of data science

The Washington Post:
The numbers of women in science and technology are dismal: Barely 18 percent of computer science degrees go to women. Women make up 11 percent of math faculty. Nearly half of the women who graduate with engineering degrees never enter the profession, or leave soon after. As the demand explodes for workers in high-tech professions who can analyze the staggering amounts of raw digital data produced every year, women barely register.

Except in one field: statistics. Read more...

Tuesday 27 June 2017

The Problem With Data Is Too Much Of The Wrong Stuff.

The Problem With Data Is Too Much Of The Wrong Stuff.:
"One of the biggest flaws in organizational use of data is confusing correlation with causation. As more companies embark on “big data” journeys, employees who are not necessarily trained in statistics or data science are being asked to analyze data. And when untrained people spot correlating factors, they often confuse the correlating variables with cause and effect. Compounding this issue is that access to dashboards and models with the intent of driving data-based decisions is widely granted. But easy access to data does not mean those with access have the proper background to be reading the data correctly. Organizations must employ workers who are trained in statistics, actuarial science, or data science — or provide the proper education to those who are not—to make sure the truth is reported."
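A small simulation can make the correlation-versus-causation trap concrete. The sketch below describes an invented scenario. A hidden variable (hypothetically, daily `temperature`) drives two others (`ice_cream_sales` and `drownings`) that have no causal link to each other. All names, coefficients and the seed are assumptions for illustration only. The sketch computes the raw Pearson correlation between the two outcomes. It then computes the correlation of their residuals after removing temperature’s linear effect, which is a first-order partial correlation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares fit on xs (removes xs's linear effect)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical confounder: daily temperature
temperature = [rng.gauss(20, 5) for _ in range(1000)]
# Both outcomes depend on temperature plus independent noise,
# but neither depends on the other.
ice_cream_sales = [3.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson(ice_cream_sales, drownings)
r_partial = pearson(residuals(ice_cream_sales, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:                  {r_raw:.2f}")
print(f"correlation controlling for temp: {r_partial:.2f}")
```

The raw correlation comes out strongly positive, while the partial correlation is close to zero. Someone looking only at the first number could easily mistake the correlation for cause and effect.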

The tyranny of statistics (and algorithms, too)

The tyranny of statistics (and algorithms, too):
Mark Twain said there are “Lies, damned lies, and statistics”, but it could just as easily be said that there are facts, damned facts, and statistics—sometimes all three at once.

I recently visited with Ewing High School’s creative writing and journalism classes, and particularly in the latter, we talked about the ways that certain information and statistics, while factual in the strictest sense, can be manipulated or misleadingly presented to steer the reader’s opinion one way or another.
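One common way a factual statistic can steer the reader is to report a relative change where an absolute change would be more informative. The sketch below uses made-up numbers, a risk rising from 1 in 10,000 to 2 in 10,000, chosen purely for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


def absolute_change_pp(before: float, after: float) -> float:
    """Change expressed in percentage points."""
    return (after - before) * 100


# Hypothetical risks: 1 in 10,000 rising to 2 in 10,000
before, after = 1 / 10_000, 2 / 10_000

print(f"Headline A: risk up {relative_change(before, after):.0%}")
print(f"Headline B: risk up {absolute_change_pp(before, after):.2f} percentage points")
```

Both headlines are true: “risk up 100%” and “risk up 0.01 percentage points” describe the same hypothetical change. Only the framing differs, and it can push a reader towards very different conclusions.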

Where's the Big Money in Big Data?

Datamation:
-------------------
Years ago Mitchell Sanders laid out a roadmap to the archetypal data scientist: someone who combines domain knowledge (i.e., they understand their particular vertical industry), math and statistics expertise, and programming skills. Ben Lorica and Mike Loukides add even more detail to the job description:

"Whatever the role, data scientists aren’t just statisticians; they frequently have doctorates in the sciences, with a lot of practical experience working with data at scale. They are almost always strong programmers, not just specialists in R or some other statistical package. They understand data ingestion, data cleaning, prototyping, bringing prototypes to production, product design, setting up and managing data infrastructure, and much more."

Friday 16 June 2017

Think Like a Statistician – Without the Math

FlowingData:
I call myself a statistician, because, well, I’m a statistics graduate student. However, ask me specific questions about hypothesis tests or required sampling size, and my answer probably won’t be very good.

The other day I was trying to think of the last time I did an actual hypothesis test or formal analysis. I couldn’t remember. I actually had to dig up old course listings to figure out when it was. It was four years ago during my first year of graduate school. I did well in those courses, and I’m confident I could do that stuff with a quick refresher, but it’s a no go off the cuff. It’s just not something I do regularly.

Instead, the most important things I’ve learned are less formal, but have proven extremely useful when working/playing with data. Here they are in no particular order.

Read more...