Saturday 27 May 2017

Truth and statistics: How to find out what people really think

The Economist:
TO MANY people Big Data is less shiny than it was a year ago. After Hillary Clinton’s defeat at the hands of Donald Trump, her vaunted analytics team took much of the blame for failing to spot warnings in the midwestern states that cost her the presidency. But according to research by Seth Stephens-Davidowitz, a former data scientist at Google, Mrs Clinton’s real mistake was not to rely too much on newfangled statistics, but rather too little.

Wednesday 24 May 2017

8 out of 10 cats fear statistics – AI doesn't have this problem

Use and abuse of figures
The Register:
If statistics were a human being, it would have been in deep therapy all of its 350-year life. The sessions might go like this:

Statistics: "Everyone hates me."

Pause.

Therapist: "I'm sure it's not everyone..."

Statistics: "And they misunderstand me."

Pause.

Therapist: "Sorry, I didn't quite get what you meant there..."

The problem is that statistics are misunderstood by the majority of the population and most people hate what they don't understand. Think of the well-known expressions: "Lies, damn lies and statistics" and "The government uses statistics as a drunkard uses a lamppost; more for support than for illumination."

Monday 22 May 2017

They are taking Hans Rosling's acclaimed communication concept further

From Resumé:
He made heavy statistics take flight with stacked toilet paper rolls and coffee cups. Behind the late Professor Hans Rosling's innovative way of presenting global change are his son Ola Rosling and his daughter-in-law Anna Rosling Rönnlund.

At TEDx in Singapore, the audience could rest their eyes on toilet paper rolls while learning facts about how the world's population growth is developing. But that occasion in November 2015 was not the first time Hans Rosling attracted attention for his stage appearances.

What explains SD's success?

Ekonomistas
Average voter support in national parliaments for right-wing populist parties in the EU plus Iceland, Norway, Switzerland, Serbia and Montenegro. Chart from Timbro's Authoritarian Populism Index 2016.

Few things in our time are discussed as intensely as how best to understand the nationalist currents sweeping across the Western world. Trump's election victory and Brexit took many by surprise, but the fact is that right-wing populist parties in Europe have been on the rise ever since the 1980s (see the figure). Here in Sweden, SD (Sverigedemokraterna) has roughly doubled its voter support in every parliamentary election since 1988. After Macron's victory in France, many now seem to hope that the peak has been reached. But without understanding the forces driving the rise of right-wing populism, it is of course impossible to predict future developments. I do not dare attempt to explain the underlying drivers in every country, but when it comes to the Swedish case, I wonder whether a very simple explanatory model with three components does not go quite a long way.

Read more....

Thursday 18 May 2017

Statistics NZ CEO Liz MacPherson: Into tomorrow with information from today

Photo by Divina Paredes
CIO from IDG:
‘We are absolutely focused on unleashing the power of data to change lives.’

“You are already a statistic, you are the first woman Government Statistician.”

Someone said this to Liz MacPherson when she took on the top role at Statistics New Zealand three years ago.

And, indeed, the wall in her new office was filled with photos of her predecessors, all males.

Read more....

Tuesday 16 May 2017

Real or Fake News? Let Statistics Help: 7 Questions to Ask

This is Statistics:
There is a lot of discussion today about whether the stories we see in the news are real or fake. Statistical thinking can help you assess the validity of reports, claims from a new study, or other conclusions flashing through your social media feed. Here are a few tips from statisticians – experts in the scientific discipline of learning from data – for how to separate fact from fiction, science from salesmanship, precision from propaganda.

Saturday 13 May 2017

Sparse graphs using exchangeable random measures



Speakers: Francois Caron (University of Oxford, UK) and Emily B Fox (University of Washington, Seattle, USA)

Statistical network modelling has focused on representing the graph as a discrete structure, namely the adjacency matrix. When assuming exchangeability of this array—which can aid in modelling, computations, and theoretical analysis—the Aldous-Hoover theorem informs us that the graph is necessarily either dense or empty. We instead consider representing the graph as an exchangeable random measure and appeal to the Kallenberg representation theorem for this object. We explore using completely random measures (CRMs) to define the exchangeable random measure and we show how our CRM construction enables us to achieve sparse graphs while maintaining the attractive properties of exchangeability. We relate the sparsity of the graph to the Lévy measure defining the CRM. For a specific choice of CRM, our graphs can be tuned from dense to sparse on the basis of a single parameter. We present a scalable Hamiltonian Monte Carlo algorithm for posterior inference, which we use to analyse network properties in a range of real data sets, including networks with hundreds of thousands of nodes and millions of edges.

Wednesday 10 May 2017

Henrik Ekengren Oscarsson: Valforskningsprogrammet is developing a new aggregate of opinion polls

Henrik Ekengren Oscarsson and TV4
The researchers at Valforskningsprogrammet in Gothenburg have always been committed to research communication. Our new visualization project, supported by Riksbankens Jubileumsfond, will during the year result in interactive applications that visualize the data we use in our research on elections, public opinion and democracy. For the past ten years, our database of opinion polls on voting intentions has been used to produce aggregates, so-called polls of polls, which give a better picture of how voter opinion develops than individual polls can. An overall picture of the state of opinion and of shifts in opinion is important for voters in multiparty systems. Voters need the best available information on this in order to take into account the strategic context, future government formation and the four-percent threshold.
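One simple way to pool several polls is to average each party's share weighted by sample size. The sketch below illustrates only that generic idea; it is not the method used by Valforskningsprogrammet, and the function name `poll_of_polls`, the party labels and all figures are hypothetical.

```python
def poll_of_polls(polls):
    """Sample-size-weighted average of party shares (in percent).

    Assumes every poll reports the same set of parties.
    """
    total_n = sum(poll["n"] for poll in polls)
    parties = polls[0]["shares"].keys()
    return {
        party: sum(poll["n"] * poll["shares"][party] for poll in polls) / total_n
        for party in parties
    }


# Hypothetical polls: sample size "n" and shares for made-up parties A and B.
example_polls = [
    {"n": 1000, "shares": {"A": 30.0, "B": 70.0}},
    {"n": 3000, "shares": {"A": 34.0, "B": 66.0}},
]

pooled = poll_of_polls(example_polls)
print(pooled)
```

Larger polls pull the pooled figure towards their own estimate, which is why an aggregate tends to move less erratically than any single poll.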

Read more....

Tuesday 9 May 2017

Macron Won, But The French Polls Were Way Off

FiveThirtyEight:
Emmanuel Macron’s 32-percentage-point victory in France’s presidential election runoff may end up being touted as a triumph for French pollsters, who consistently gave him a huge advantage. But it shouldn’t be. The polls leading up to the contest between the centrist Macron and his far-right opponent were the least predictive in French history, underestimating Macron’s support, rather than Marine Le Pen’s, to the surprise of some.

Here are a few takeaways. Read more.....

Conventional Wisdom May Be Contaminating Polls

FiveThirtyEight:
Sunday’s French presidential election was the latest in a trend. The centrist candidate, Emmanuel Macron, won by a considerably wider margin than most observers predicted, with a 32-percentage-point landslide over Marine Le Pen, larger than the 24-point margin that the final polls showed.

But the trend isn’t that center-left globalism is making a comeback; that’s too early to say. Instead, it’s this: When the conventional wisdom tries to outguess the polls, it almost always guesses in the wrong direction. Many experts expected Le Pen to beat her polls. Currency markets implied that she had a much greater chance, perhaps 20 percent, than you’d reasonably infer from the polls. But it was Macron who considerably outperformed his numbers instead.

Monday 8 May 2017

Are children born in April and November equally tall?

Are children born in April and November equally tall? – Ekonomistas
Many people have probably, like me, wondered whether those born in November are as tall as those born in April. One way to answer this question is to turn to the military conscription data available for download from Riksarkivet (the Swedish National Archives). These data are of course anonymized, but with more than two million observations it is still possible to investigate interesting questions.

Read more....

Ericsson AB is now classified as a service company in the statistics

Ericsson AB is now classified as a service company in the statistics - Almega:
Ericsson AB is now classified as a service company in SCB's statistics. This has a major impact on the statistics on the service sector's growth rate, since Ericsson's output weighs heavily in the Swedish economy. The change was introduced with the publication of SCB's index of service production on 5 May, which now includes Ericsson AB from January 2015 onwards. This creates an enormous break in the time series from 2015, see chart below. The growth rate of service production from January 2015 onwards is therefore not comparable with developments before then.

Read more....

DDoS: How Big Data can help combat a real and prevalent threat

SecurityBrief Asia - DDoS: How Big Data can help combat a real and prevalent threat:
"Large scale sequence statistics can also show events with normal behaviour but when criminals attack with fake traffic, phishing attacks or other cyber assaults, the event sequence will be different and illogical, meaning the criminals couldn’t predict them. For example, a bank could daily analyse a huge amount of data and a simple anomalous sequence pattern could be a valuable clue to identify and shut down an attack.

Finally, armed with Big Data and appropriate statistical models, the subtle “fingerprints” left by cyber criminals can be identified and, in a system set up for speed, criminal assaults can be thwarted before major damage is done."

Read more....

Sunday 7 May 2017

First to 175 wins

Henrik Ekengren Oscarsson:
The latest aggregate of opinion polls (Mätningarnas Mätning) shows that neither Miljöpartiet, Kristdemokraterna nor Feministiskt initiativ would win seats in the Riksdag if a parliamentary election were held today. The combined votes for these three parties (3.4 + 3.9 + 2.4 = 9.7 percent, or roughly 750,000 votes) would be "wasted" in the sense that they would yield no Riksdag seats.
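The arithmetic behind the "wasted" share can be checked with a short sketch. It assumes the three figures are listed in the same order as the party names, and it applies the national four-percent threshold for Riksdag seats; the function and variable names are illustrative only.

```python
# Poll-of-polls shares in percent. Assumption: the post lists the three
# figures in the same order as the party names.
shares = {
    "Miljöpartiet": 3.4,
    "Kristdemokraterna": 3.9,
    "Feministiskt initiativ": 2.4,
}
THRESHOLD = 4.0  # national four-percent threshold for Riksdag seats


def wasted_share(shares, threshold=THRESHOLD):
    """Return the parties below the threshold and their combined share."""
    below = {party: s for party, s in shares.items() if s < threshold}
    return below, round(sum(below.values()), 1)


below, total = wasted_share(shares)
print(sorted(below), total)
```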

What’s the Year’s Hottest Job? Statistician

This is Statistics:
Quiz time: What is one of the most in-demand jobs, with even more potential for growth in the coming years, that pays well and encompasses everything from sports to healthcare to marketing?

Once again, it is statistician.

Job-seeker website CareerCast recently released its annual “Best Jobs” rankings, and the job of statistician ranked at number one.

Thursday 4 May 2017

A scalable bootstrap for massive data

A scalable bootstrap for massive data - YouTube:


The paper 'A scalable bootstrap for massive data' (RSS Series B, Volume 76, Issue 4, 2014) presented by Michael I Jordan, University of California, Berkeley.
Co-authors are: Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar
Chair: Richard Samworth, Cambridge University

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets—which are increasingly prevalent—the calculation of bootstrap-based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to specification of tuning parameters (such as the number of subsampled data points), and they often require knowledge of the estimator's convergence rate, in contrast with the bootstrap. As an alternative, we introduce the ‘bag of little bootstraps’ (BLB), which is a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. The BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate the BLB's favourable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing the BLB with the bootstrap, the m out of n bootstrap and subsampling. In addition, we present results from a large-scale distributed implementation of the BLB demonstrating its computational superiority on massive data, a method for adaptively selecting the BLB's tuning parameters, an empirical study applying the BLB to several real data sets and an extension of the BLB to time series data.
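As a rough illustration of the procedure described in the abstract, the sketch below applies the bag of little bootstraps to the standard error of a sample mean, using only the Python standard library. The function name `blb_standard_error`, the defaults (`gamma=0.6`, five subsets, 50 resamples per subset) and the choice of the mean as estimator are assumptions made for this example, not details taken from the authors' implementation.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, counts):
    """Mean of a resample in which values[i] appears counts[i] times."""
    total = sum(counts.values())
    return sum(values[i] * c for i, c in counts.items()) / total


def blb_standard_error(data, gamma=0.6, n_subsets=5, n_resamples=50, seed=0):
    """Bag-of-little-bootstraps estimate of the standard error of the mean."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # size of each small subset, b = n^gamma
    per_subset = []
    for _ in range(n_subsets):
        # Draw b distinct points without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(n_resamples):
            # Multinomial counts: a resample of nominal size n that only
            # ever touches the b points in the subset.
            counts = Counter(rng.choices(range(b), k=n))
            estimates.append(weighted_mean(subset, counts))
        # Quality measure for this subset: spread of the resampled means.
        per_subset.append(statistics.stdev(estimates))
    # Average the quality measure across subsets.
    return statistics.fmean(per_subset)


if __name__ == "__main__":
    demo_rng = random.Random(42)
    sample = [demo_rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(blb_standard_error(sample))
```

Each resample has nominal size n but involves only b distinct points, which keeps the work per subset small and lets the subsets be processed independently.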

Wednesday 3 May 2017

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing

Fig 1. Anscombe's Quartet (left) and an "Unstructured Quartet" (right), where the datasets have the same summary statistics as those in Anscombe's Quartet but lack underlying structure or visual distinction.
Autodesk Research:
It can be difficult to demonstrate the importance of data visualization. Some people are of the impression that charts are simply "pretty pictures", while all of the important information can be divined through statistical analysis. An effective (and often used) tool to demonstrate that visualizing your data is in fact important is Anscombe's Quartet. Developed by F.J. Anscombe in 1973, Anscombe's Quartet is a set of four datasets, where each produces the same summary statistics (mean, standard deviation, and correlation), which could lead one to believe the datasets are quite similar. However, after visualizing (plotting) the data, it becomes clear that the datasets are markedly different. The effectiveness of Anscombe's Quartet is not due to simply having four different datasets which generate the same statistical properties; it is that four clearly different and visually distinct datasets are producing the same statistical properties. In contrast, the "Unstructured Quartet" on the right in Figure 1 also shares the same statistical properties as Anscombe's Quartet; however, without any obvious underlying structure to the individual datasets, this quartet is not nearly as effective at demonstrating the importance of visualizing your data.
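The claim about matching summary statistics is easy to check. The sketch below computes the mean, sample standard deviation and Pearson correlation for the four datasets, using the values as commonly published from Anscombe (1973); the helper names `pearson` and `summarize` are made up for this example.

```python
import math
import statistics

# Anscombe's quartet; datasets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def summarize(x, y):
    """Summary statistics of the kind the quartet is designed to match."""
    return {
        "mean_x": statistics.fmean(x),
        "mean_y": statistics.fmean(y),
        "sd_x": statistics.stdev(x),
        "sd_y": statistics.stdev(y),
        "r": pearson(x, y),
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows agree to about two decimal places, even though scatter plots of the same data look nothing alike.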

Saturday 29 April 2017

How we can make statistics from big data

SCB, How we can make statistics from big data:
Big data, a large volume of unprocessed data, is said to be able to give us masses of valuable information about our society. But what is big data really, and what does it mean for statistical agencies around the world? Välfärd's statistics school takes a closer look at the concept.

Wednesday 26 April 2017

Beyond subjective and objective in statistics

RSS Discussion Meeting: Beyond subjective and objective in statistics - YouTube:


Andrew Gelman (Columbia University, New York)
Christian Hennig (University College London)

Decisions in statistical data analysis are often justified, criticized or avoided by using concepts of objectivity and subjectivity. We argue that the words ‘objective’ and ‘subjective’ in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, impartiality and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. Together with stability, these make up a collection of virtues that we think is helpful in discussions of statistical foundations and practice. The advantage of these reformulations is that the replacement terms do not oppose each other and that they give more specific guidance about what statistical science strives to achieve. Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgement of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling and socio-economic stratification. The aim of the paper is to push users and developers of statistical methods towards more effective use of diverse sources of information and more open acknowledgement of assumptions and goals.

Employees slam SCB relocation

Archive photo. Photo: Noella Johansson/TT
Employees slam SCB relocation | SvD:
The government wants Statistiska centralbyrån (SCB, Statistics Sweden) to move its head office from Stockholm to Örebro.

Top-down management, says a dissatisfied union.

"This makes Örebro Sweden's statistics capital," says Minister for Public Administration Ardalan Shekarabi (S).

Far from all of the just over 500 employees in Stockholm are affected. The move primarily concerns the Director General and the surrounding staff and support functions.

Read more.... and what does SCB say?