http://grossmeier.net/about & http://blog.grossmeier.net

2017-03-17


The post 2017-03-17 appeared first on Kickstand Comics featuring Yehuda Moon.


Algorithms and insults: Scaling up our understanding of harassment on Wikipedia


3D representation of 30 days of Wikipedia talk page revisions, of which 1,092 contained toxic language (shown in red if live, grey if reverted) and 164,102 were non-toxic (shown as dots). Visualization by Hoshi Ludwig, CC BY-SA 4.0.

“What you need to understand as you are doing the ironing is that Wikipedia is no place for a woman.” –An anonymous comment on a user’s talk page, March 2015

Volunteer Wikipedia editors coordinate many of their efforts through online discussions on “talk pages,” which are attached to every article and user page on the platform. But as the above quote demonstrates, these discussions aren’t always good-faith collaboration and exchanges of ideas; they are also an avenue for harassment and other toxic behavior.

Harassment is not unique to Wikipedia; it is a pervasive issue for many online communities. A 2014 Pew survey found that 73% of internet users have witnessed online harassment and 40% have personally experienced it. To better understand how contributors to Wikimedia projects experience harassment, the Wikimedia Foundation ran an opt-in survey in 2015. About 38% of editors surveyed had experienced some form of harassment, and of those, more than half reported decreased motivation to contribute to the Wikimedia sites in the future.

Early last year, the Wikimedia Foundation kicked off a research collaboration with Jigsaw, a technology incubator for Google’s parent company, Alphabet, to better understand the nature and impact of harassment on Wikipedia and to explore technical solutions. In particular, we have been using machine learning methods to develop models that automatically detect toxic comments on users’ talk pages. We are using these models to analyze the prevalence and nature of online harassment at scale, and this data will help us prototype tools that visually depict harassment so administrators can respond.

Our initial research has focused on personal attacks, a blatant form of online harassment that usually manifests as insults, slander, obscenity, or other ad-hominem attacks. To amass sufficient data for a supervised machine learning approach, we collected 100,000 comments from English Wikipedia talk pages and had 4,000 crowd-workers judge, across roughly 1 million annotations, whether each comment contained a personal attack. Each comment was rated by 10 crowd-workers, whose opinions were aggregated and used to train our model, as sketched below.
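To make that aggregation step concrete, here is a minimal sketch of how roughly 10 crowd-worker ratings per comment might be collapsed into a single training label using pandas. The column names and the majority-vote rule are illustrative assumptions, not the exact schema or aggregation used for the released dataset.

    import pandas as pd

    # Hypothetical annotation table: one row per (comment, worker) judgment.
    # Column names are illustrative, not the released dataset's exact schema.
    annotations = pd.DataFrame({
        "comment_id": [1, 1, 1, 2, 2, 2],
        "worker_id":  [10, 11, 12, 20, 21, 22],
        "is_attack":  [1, 1, 0, 0, 0, 0],
    })

    # Collapse the ratings for each comment into one label: the fraction of
    # workers who judged it an attack, thresholded at 0.5 (majority vote).
    attack_fraction = annotations.groupby("comment_id")["is_attack"].mean()
    majority_label = attack_fraction >= 0.5
    print(majority_label)  # comment 1 -> True, comment 2 -> False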

This dataset is the largest public annotated dataset of personal attacks that we know of. In addition to this labeled set of comments, we are releasing a corpus of all 95 million user and article talk comments made between 2001 and 2015. Both datasets are available on FigShare, a repository where researchers can share data, to support further research.

The machine learning model we developed was inspired by recent research at Yahoo on detecting abusive language. The idea is to extract fragments of text from Wikipedia edits and feed them into a machine learning algorithm called logistic regression, which produces a probability estimate of whether an edit is a personal attack. In testing, we found that a fully trained model predicts whether an edit is a personal attack better than the combined average of three human crowd-workers.
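As a rough illustration of this kind of model, the sketch below builds a character n-gram logistic regression classifier with scikit-learn. It is a minimal sketch under assumed details: the toy comments, the n-gram configuration, and the library choice are ours for illustration, not the project’s actual feature set or code.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for labeled talk page comments (1 = personal attack).
    comments = [
        "Thanks for fixing the citation, much appreciated!",
        "You are an idiot and should stop editing.",
        "I reverted your change; see the talk page for the reasons.",
        "Nobody wants you here, go away.",
    ]
    labels = [0, 1, 0, 1]

    # Character n-grams are robust to misspellings and creative obfuscation;
    # logistic regression maps them to an attack probability.
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 5)),
        LogisticRegression(),
    )
    model.fit(comments, labels)

    # Probability estimate that a new comment is a personal attack.
    print(model.predict_proba(["you are a moron"])[0][1])

The same predict_proba call is what lets a trained model score each new talk page edit as it happens.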

Prior to this work, the primary way to determine whether a comment was an attack was to have it annotated by a human, a costly and time-consuming approach that could cover only a small fraction of the 24,000 edits to discussions that occur on Wikipedia every day. Our model allows us to evaluate every edit as it occurs to determine whether it is a personal attack, and it lets us ask more complex questions about how users experience harassment. Some of the questions we were able to examine include:

  1. How often are attacks moderated? Only 18% of attacks were followed by a warning or a block of the offending user. Even among users who have contributed four or more attacks, only 60% are ever moderated.
  2. What is the role of anonymity in personal attacks? Registered users make two-thirds (67%) of attacks on English Wikipedia, contradicting a widespread assumption that anonymity is the primary contributor to the problem.
  3. How frequent are attacks from regular vs. occasional contributors? Prolific and occasional editors are both responsible for a large proportion of attacks (see figure below). While half of all attacks come from editors who make fewer than 5 edits a year, a third come from registered users with over 100 edits a year.

Chart by Nithum Thain, CC BY-SA 4.0.
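For a sense of how such questions become simple queries once every comment is scored, here is a hypothetical sketch: given a table of model-flagged attacks joined with each author’s yearly edit count, the breakdown in the chart above reduces to a group-by. The table, column names, and activity buckets are invented for illustration.

    import pandas as pd

    # Hypothetical table: one row per comment the model flagged as an attack,
    # joined with the author's edit count for the year.
    attacks = pd.DataFrame({
        "comment_id":   [1, 2, 3, 4, 5, 6],
        "author_edits": [2, 1, 350, 120, 4, 60],
    })

    # Bucket attackers by activity level, mirroring the chart's categories.
    buckets = pd.cut(
        attacks["author_edits"],
        bins=[0, 5, 100, float("inf")],
        labels=["<5 edits/yr", "5-100 edits/yr", ">100 edits/yr"],
    )

    # Share of all attacks contributed by each activity group.
    print(buckets.value_counts(normalize=True).sort_index())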

More information on how we performed these analyses and other questions that we investigated can be found in our research paper:

Wulczyn, E., Thain, N., and Dixon, L. (2017). Ex Machina: Personal Attacks Seen at Scale. To appear in Proceedings of the 26th International Conference on World Wide Web (WWW 2017).

While we are excited about the contributions of this work, it is just a small step toward a deeper understanding of online harassment and finding ways to mitigate it. This research has clear limits. It looked only at egregious and easily identifiable personal attacks. The data is in English only, so the model we built understands only English. The model also does little for other forms of harassment on Wikipedia; for example, it is not very good at identifying threats. There are also important things we do not yet know about our model and data; for example, did it inadvertently learn unintended biases from the crowdsourced ratings? We hope to explore these issues by collaborating further on this research.

We also hope that collaborating on these machine-learning methods might help online communities better monitor and address harassment, leading to more inclusive discussions. These methods also enable new ways for researchers to tackle many more questions about harassment at scale, including the impact of harassment on editor retention and whether certain groups are disproportionately silenced by harassers.

Tackling online harassment, like defining it, is a community effort. If you’re interested or want to help, you can get in touch with us and learn more about the project on our wiki page. Help us label more comments via our wikilabels campaign.

Ellery Wulczyn, Data Scientist, Wikimedia Foundation
Dario Taraborelli, Head of Research, Wikimedia Foundation
Nithum Thain, Research Fellow, Jigsaw
Lucas Dixon, Chief Research Scientist, Jigsaw

2017-02-01 Jumping Ship


The post 2017-02-01 Jumping Ship appeared first on Kickstand Comics featuring Yehuda Moon.

greggrossmeier (SF, US), 87 days ago: This was me in Michigan.

Knowledge knows no boundaries

Photo by NASA, public domain/CC0.

At the Wikimedia Foundation, our mission was born of a belief that everyone, everywhere, has something to contribute to our shared human understanding. We believe in a world that encourages and protects the open exchange of ideas and information, community and culture; where people of every country, language, and culture can freely collaborate without restriction; and where international cooperation leads to common understanding.

The new U.S. administration’s executive order on immigration is an affront to this vision. It impedes the efforts of our colleagues and communities who work together from around the world to make shared, open knowledge accessible to all. When our ability to come together across borders is restricted, the world is poorer for it.

Knowledge knows no borders. Our collective human wisdom has long been built through the exchange of ideas, from our first navigational knowledge of the seas to our ongoing exploration of the heavens. When one society has stumbled and slipped into ignorance, others have preserved our records and archives, and built upon them. Throughout the Dark Ages in Europe, scholars in Baghdad kept alive the writings of Greek philosophers. These meticulous studies, along with the discoveries of Persian and Arab mathematicians, would in turn help spark the intellectual renaissance of Europe.

Wikipedia is an example of what is possible when borders do not hinder the exchange of ideas. Today, Wikipedia contains more than 40 million articles across nearly 300 languages. It is built one person at a time, across continents and languages. It is built through collaboration, in person and in communities, at international gatherings of ordinary individuals from around the world. These collaborative efforts serve hundreds of millions of people every month, opening up opportunity and education to all.

The Wikimedia Foundation is headquartered in the U.S., where we have unique freedoms that are essential to supporting the Wikimedia projects. But our mission is global. We support communities and projects from every corner of the globe. Our staff and community members need to be able to move freely in order to support this global movement and foster the sharing of ideas and knowledge, no matter their country of origin.

We strongly urge the U.S. administration to withdraw the recent executive order restricting travel and immigration from certain nations, and closing the doors to many refugees. It threatens our freedoms of inquiry and exchange, and it infringes on the fundamental rights of our colleagues, our communities, and our families.

Although our individual memories may be short, the arc of history is long, and it unfurls in a continuous progression of openness. At the Wikimedia Foundation, we will continue to stand up for our values of open discourse and international cooperation. We join hands with everyone who does.

Katherine Maher, Executive Director
Wikimedia Foundation

greggrossmeier (SF, US), 88 days ago: I'm proud to work here.

MeFi: Indivisible: A Practical Guide for Resisting the Trump Agenda

"Former congressional staffers reveal best practices for making Congress listen" [Single Link Google Docs] A guide, based on the Tea Party playbook, on how to use the tools of government to resist the Trump agenda.
greggrossmeier (SF, US), 134 days ago: Finally read it this morning. Now I want to have free time again. What's the immediate step (step 0)? Who do I email? What list do I join? How can I tell someone "I'll be at the townhall if you tell me when I need to be there"?

Hiatus


This site is on hiatus until further notice: in the wake of recent political events, our community's energy and attention should be focused on more important things. If you can, please get involved in projects like this.

greggrossmeier (SF, US), 147 days ago: Wow. Can't blame them. They were doing good stuff.