Archive for the 'Science Culture' Category


Do ethics apply to great data?

AOL just unwittingly released private, personally-identifiable data for 650,000 of its subscribers when it posted a large chunk of its search logs (20 million queries, actually) to its research website as a service to the scientific community.

Although the user IDs were anonymized, the search queries often include information that makes it easy to associate them with a person. The query data include social security numbers, credit card numbers, porn queries, evidence of intent to engage in criminal activities, etc.

AOL has since removed the data, but it’s spreading like wildfire over the internet on mirrors and torrents. I was able to retrieve a complete copy of it (2 gigabytes, uncompressed) in about an hour.

As a scientist who does research that would really benefit from data like this, I can tell you: this is big. Big and dirty.

Ethically speaking, should we, as researchers, ignore that this data exists, or deal with it pragmatically as an unfortunate accident?

On one hand this is extremely useful and compelling data for a host of social and computer sciences; on the other, it is an unequivocally criminal violation of ethical standards.

Given the Google subpoena, big brother NSA, and the ethical debates about scientific research this story is provoking in mass media, this feels like a watershed moment.

No one can ever create a ‘clean’ version of this data since it could always be traced back to the original, identifiable information.

Here’s a possible scenario:

Most scientists will hesitate to research it, but some rebels will, and they will no doubt find interesting, at first unpublishable, results. Sooner or later, something will get published, and then the floodgates will open. Because something can’t be unethical if everyone is doing it.

Right?

Decisions, Decisions

Carlos Gershenson writes:

Many people complain that nowadays there is too much choice. 20 ways of having your coffee, 50 types of ketchup, ten political parties in countries of less than 10 million [...] Too much choice overloads our cognitive abilities. Fifty years ago, George Miller published a paper showing that people tended to be able to keep in their minds only seven plus-minus two things at a time. In other words, after more or less seven types of fries, we lose track of what is going on…


Another salient example of overwhelming choice is the simple over-abundance of possible life paths. For many young people growing up in this globalized age — where economic niches are more plentiful than ever, and where transportation could probably get you to 99% of the towns on Earth in less than a week — the question is a daunting one: where to place yourself in this wide world? Compare this plenitude of options with those of an average human a few centuries ago. Most likely, you would inherit the occupation of your parents. Most likely, you would never travel further than the next town. For us post-moderns, the adjacent possible has expanded a million-fold and we are confronted with a dizzying array of decisions which, in sequence, will create our lives.

Is there something wrong with having a lot of possible niches? At first glance, it seems like a low-stress, non-competitive environment of abundance. Is this how the First World appears to the Third World? At second glance, it taxes our cognitive resources to be forced to consider so many options all the time in our quest to maximize utility. A related question is: in a directed search where we uncover the options one-by-one, at what point should we give up and go with the best-so-far?

When am I better off just picking my ketchup at random? When should I stop comparison shopping and just buy? When should I stop searching for the “perfect woman” and just settle down? It’s our old friend, the law of diminishing returns. At some point, the cost of evaluating more options will outweigh the possible slight increase in utility.
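
To put rough numbers on that, here is a minimal sketch under assumptions that are mine, not part of the argument above: each option's utility is an independent draw from a uniform distribution on [0, 1], you can always go back to the best option seen so far, and every extra option costs the same fixed amount to evaluate. Under those assumptions the expected best of n options is n/(n+1), so each additional look buys less than the one before. The function names are hypothetical.

```python
from fractions import Fraction


def expected_best(n):
    """Expected maximum of n independent uniform(0, 1) draws: n / (n + 1)."""
    return Fraction(n, n + 1)


def options_worth_checking(cost_per_option):
    """Keep looking while one more option is expected to gain more than it costs.

    Assumes (for illustration only) uniform utilities, free recall of the
    best option so far, and a fixed evaluation cost per option.
    """
    n = 1
    while expected_best(n + 1) - expected_best(n) > cost_per_option:
        n += 1
    return n


cheap = options_worth_checking(Fraction(1, 1000))
pricey = options_worth_checking(Fraction(1, 100))
print(pricey, cheap)
```

In this toy model, making evaluation ten times cheaper (1/100 down to 1/1000 per option) raises the number of options worth checking from 9 to 31, not to 90. The returns on searching shrink much faster than the costs do.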

Take it from me, someone apt to be indecisive: better to eat than starve to death looking for the best deal. Better to be happy with a satisfactory decision than to stress out trying to make an optimal one.

When do networks not matter?

The question may never have occurred to network researchers and enthusiasts. When you’ve found a paradigm that you love, it’s hard to see the boundaries of its utility. It’s the old “when you have a hammer, everything looks like a nail” story. But the question that titles this post is actually an important networks question (not just a caution against overzealous methodologizing), because knowing when the network doesn’t matter means knowing when it does.

Network analysts use random networks as the standard by which to measure order in the networks they study. That’s because a random network is the graph-theoretic way of saying structure doesn’t matter. If the network structure you’re studying is significantly different from the random net, most likely it can’t be explained by chance alone; it has order, pattern, maybe even complexity. In other words, for the purposes of studying whatever system produced that structure, the network matters, i.e. it’s worth paying attention to.
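
The comparison against a random net can be sketched in a few lines. This is only one way to do it, and the choices are my assumptions: the null model is a random graph with the same number of nodes and edges, the statistic is transitivity (the fraction of connected triples that close into triangles), and "significantly different" is read off a z-score. The function names and the ring-lattice example are hypothetical.

```python
import random
from itertools import combinations


def transitivity(n, edges):
    """Fraction of connected triples (a - node - b) where a and b are also linked."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    triples = 0
    closed = 0
    for node in range(n):
        for a, b in combinations(adj[node], 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def random_edges(n, m, rng):
    """Random graph on n nodes with exactly m edges, every edge set equally likely."""
    return rng.sample(list(combinations(range(n), 2)), m)


def z_score_vs_random(n, edges, samples=200, seed=0):
    """How many null standard deviations the observed transitivity sits from the null mean."""
    rng = random.Random(seed)
    observed = transitivity(n, edges)
    null = [transitivity(n, random_edges(n, len(edges), rng)) for _ in range(samples)]
    mean = sum(null) / samples
    var = sum((x - mean) ** 2 for x in null) / (samples - 1)
    return (observed - mean) / var ** 0.5 if var > 0 else 0.0


# Toy "ordered" network: a ring where each node links to its two nearest
# neighbours on either side.
n = 30
lattice = [(i, (i + 1) % n) for i in range(n)] + [(i, (i + 2) % n) for i in range(n)]
print(transitivity(n, lattice), z_score_vs_random(n, lattice))
```

The ring lattice has a transitivity of exactly 0.5, while random graphs with the same 30 nodes and 60 edges cluster far less, so the z-score comes out well above the usual thresholds. A network whose score hovers near zero is one where, by this measure at least, the structure doesn't matter.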

And in the games of life and science, what matters most is knowing what is worthy of your thought and attention, and what is not.