The article challenges the presumption that Big Data provide the best information for social scientists and is built on existing literature and studies of the topic. The authors critically interrogate this phenomenon, taking into account its limitations, subjectivity and the ethical issues involved.
The overall structure and language of the article support the arguments the authors make, and the text is generally clear to the target audience. Highlighting the six provocations by dividing the text into titled sections gives a good idea of what the authors will discuss next. The beginning of the paper is uncertain, and I cannot decide whether it is meant to be an introduction or an abstract; either way, the article begins with exactly the same text, so it would be better to pick one or the other. The article would also benefit from closure, such as a summary of possible further studies or discussions, as it currently leaves the impression that it is not finished.
The major problem with this article, in my view, is the transitions from one idea to another. For example, on page 668, the sentence “Just because Big Data presents us with large quantities of data does not mean that methodological issues are no longer relevant” seems confusing in relation to the beginning of the paragraph. I think the authors should have stated earlier whether methodological issues are considered relevant in the context of Big Data.
On page 655, the passage “We also recognize… it is time to start critically interrogating this phenomenon, its assumptions, and its biases” would, I find, be more suitable in the introduction.
On page 664, the last paragraph mentions “some significant and insightful studies”, but it does not name them. I think readers should be given a source for the studies referred to, unless they are not publicly available.
“This process is inherently subjective” (page 667): I think this requires more elaboration on why the process is subjective, or an example of a case where data interpretation was shaped by the decisions the researchers made.
The example given on apophenia (page 688) requires more elaboration or maybe another example. I think that “apophenia” is an interesting issue related to Big Data and it would be good if the text provided more information about it.
Again on page 668, the paragraphs “Twitter provides an example…” and “Twitter does not represent ‘all people’…” could be combined into a single, short paragraph. There is no need to mention that scholars choose Twitter data “because it is easy to obtain”, as it adds nothing to the arguments presented. Here I identified another transition problem: the word ‘people’ is introduced before the authors describe what it refers to, which creates confusion.