Friday, May 23, 2014

House's Gutted USA FREEDOM Act

May 20, 2014 | By Mark Jaycox and Nadia Kayyali and Lee Tien

EFF and Other Civil Liberties Organizations Call on Congress to Support Uncompromising Reform

Since the introduction of the USA FREEDOM Act, a bill that has over 140 cosponsors, Congress has been clear about its intent: ending the mass collection of Americans' calling records. Many members of Congress, the President's own review group on NSA activities, and the Privacy and Civil Liberties Oversight Board all agree that the use of Section 215 to collect Americans' calling records must stop. Earlier today, House Leadership reached an agreement to amend the bipartisan USA FREEDOM Act in ways that severely weaken the bill, potentially allowing bulk surveillance of records to continue. The Electronic Frontier Foundation cannot support a bill that doesn't achieve the goal of ending mass spying. We urge Congress to support uncompromising NSA reform and we look forward to working on the Senate's bipartisan version of the USA FREEDOM Act.

Passing the bill out of the Judiciary Committee for a vote on the House floor is an important sign that Rep. Bob Goodlatte, Rep. Jim Sensenbrenner, and other leaders of the House are engaging in a conversation over NSA reform. We are glad that the House added a clause to the bill clarifying that the content of communications cannot be obtained with Section 215. Unfortunately, the bill's changed definitions, the lack of substantial reform to Section 702 of the Foreign Intelligence Surveillance Amendments Act, and the failure to introduce a special advocate in the FISA Court severely weaken the bill.

In particular, we are concerned with the new definition of "specific selection term," which describes and limits who or what the NSA is allowed to surveil. The new definition is far more expansive than previous definitions. Less than a week ago, the definition was simply "a term used to uniquely describe a person, entity, or account." While that definition was imperfect, the new version is far broader.1 The new version not only adds the undefined words "address" and "device," but makes the list of potential selection terms open-ended by using the term "such as." Congress has been clear that it wishes to end bulk collection, but given the government's history of twisted legal interpretations, this language can't be relied on to protect our freedoms.

Further, the bill does not sufficiently address Section 702 of the Foreign Intelligence Surveillance Amendments Act. We are specifically concerned that the new language references "about" searches, which collect and review messages of users who do not even communicate with surveillance targets. Congress must include reforming Section 702 in any NSA reform. This includes stopping the NSA from searching illegally collected Americans' communications, stopping the suspicionless "about" surveillance, and ensuring companies can report on the exact number of orders they receive and the number of users affected.

We are encouraged by Senator Leahy's commitment to continue with the more comprehensive version of the USA FREEDOM Act over the summer and look forward to working towards NSA reform in the Senate.
1. The bill reads “(2) Specific selection term.—The term ‘specific selection term’ means a discrete term, such as a term specifically identifying a person, entity, account, address, or device, used by the Government to limit the scope of the information or tangible things sought pursuant to the statute authorizing the provision of such information or tangible things to the Government.”

Real ID Online? New Federal Online Identity Plan Raises Privacy and Free Speech Concerns

The White House recently released a draft of a troubling plan titled "National Strategy for Trusted Identities in Cyberspace" (NSTIC). In previous iterations, the project was known as the "National Strategy for Secure Online Transactions" and emphasized, reasonably, the private sector's development of technologies to secure sensitive online transactions. But the recent shift to "Trusted Identities in Cyberspace" reflects a radical — and concerning — expansion of the project’s scope.

The draft NSTIC now calls for pervasive, authenticated digital IDs and makes scant mention of the unprecedented threat such a scheme would pose to privacy and free speech online. And while the draft NSTIC "does not advocate for the establishment of a national identification card" (p. 6), it’s far from clear that it won’t take us dangerously far down that road. Because the draft NSTIC is vague about many basic points, the White House must proceed with caution and avoid rushing past the risks that lie ahead. Here are some of our concerns.

Is authentication really the answer?

Probably the biggest conceptual problem is that the draft NSTIC seems to place unquestioning faith in authentication — a system of proving one's identity — as an approach to solving Internet security problems. Even leaving aside the civil liberties risks of pervasive online authentication, computer security experts question this emphasis. As prominent researcher Steven Bellovin notes:

The biggest problem [for Internet security] was and is buggy code. All the authentication in the world won't stop a bad guy who goes around the authentication system, either by finding bugs exploitable before authentication is performed, finding bugs in the authentication system itself, or by hijacking your system and abusing the authenticated connection set up by the legitimate user. All of these attacks have been known for years.

A Real ID Society?

The draft NSTIC says that, instead of a national ID card, it "seeks to establish an ecosystem of interoperable identity service providers and relying parties where individuals have the choice of different credentials or a single credential for different types of online transactions," which can be obtained "from either public or private sector identity providers." (p. 6) In other words, the government wants a lot of different companies or organizations to be able to do the task of confirming that a person on the Internet is who he or she claims to be.

Decentralized or federated ID management systems are possible, but like all ID systems, they pose significant privacy issues.1 There’s little discussion of these issues, and in particular, there’s no attention to how multiple IDs might be linked together under a single umbrella credential. A National Academies study, Who Goes There?: Authentication Through the Lens of Privacy, warned that multiple, separate, unlinkable credentials are better for both security and privacy (pp. 125-132). Yet the draft NSTIC doesn’t discuss in any depth how to prevent or minimize linkage of our online IDs, which would seem much easier online than offline, and fails to discuss or refer to academic work on unlinkable credentials (such as that of Stefan Brands, or Jan Camenisch and Anna Lysyanskaya).

Providing a uniform online ID system could pressure providers to require more ID than necessary. The video game company Blizzard, for example, recently announced a verified ID requirement for its forums, walking the proposal back only after widespread, outspoken criticism from users.

Pervasive online ID could likewise encourage lawmakers to enact access restrictions for online services, from paying taxes to using libraries and beyond. Website operators have argued persuasively that they cannot be expected to tell exactly who is visiting their sites, but that could change with a new online ID mechanism. Massachusetts recently adopted an overly broad online obscenity law; it takes little imagination to believe states would require NSTIC implementation for individuals to be able to access content somehow deemed to be "objectionable."


The draft NSTIC "envisions" that a blogger will use "a smart identity card from her home state" to "authenticate herself for . . . [a]nonymously posting blog entries." (p. 4) But how is her blog anonymous when it’s directly associated with a state-issued ID card?

The proposal mistakenly conflates trusting a third party to not reveal your identity with actual anonymity — where third parties don’t know your identity. When Thomas Paine anonymously published Common Sense in 1776, he didn’t secretly register with the British Crown.

Indeed, the draft NSTIC barely recognizes the value of anonymous speech, whether in public postings or private email, or anonymous browsing via systems like Tor. Nor does it address issues about re-identification, e.g. the ability to take different sets of de-identified data and link them so as to re-identify individuals.

Bellovin credits the draft NSTIC for suggesting the use of attribute credentials rather than identity credentials — that is, using credentials that could establish that you're authorized to do something without saying who you are. But, as he puts it, "We need ways to discourage collection of identity information unless identity is actually needed to deliver the requested service," and the draft NSTIC doesn't seem to address this.

Privacy, Identity Theft and Surveillance

The draft NSTIC seems to presuppose widespread use of smart ID cards. In one example, it envisions that an individual will use "a smart identity card from her home state" to "authenticate herself for a variety of online services," presumably modeled upon driver’s licenses. (p. 4)

One major concern, acknowledged briefly in the draft, is whether people's computers can really be secure enough to be used for these purposes — smart ID cards or no smart ID cards. As noted above, the vast majority of privacy and authentication vulnerabilities stem from buggy software, and when a computer is trivial to compromise, its users’ credentials are easy to steal. The NSTIC proposal could, in fact, decrease user privacy and enable identity theft: once a user’s digital ID is stolen, it could be used to both pose as the user and access all the user’s accounts and data.

Consider, for example, the proposal to use a state digital ID card to access health records and online banking. What happens next time you lose your wallet?

Furthermore, by consolidating your credentials, the NSTIC plan may provide the government with a centralized means of surveilling your online accounts. And if the government issues your digital ID itself, it won’t even need to approach a third party with any kind of legal process before surveilling you.

The draft NSTIC also mentions the development of a public-key infrastructure (PKI). (pp. 15, 27) We support good, widespread encryption, which could allow people to get correct public keys reliably and possibly cut down on phishing, spam, fraud, and pretexting. But as Bruce Schneier and Carl Ellison have explained, doing PKI properly isn’t easy.2 All of their concerns apply, in some form, to the NSTIC proposal.

Another concern that’s emerged recently is whether governments could coerce certificate authorities in a PKI to issue false credentials in order to facilitate surveillance. Chris Soghoian and Sid Stamm have reported on an industry claim that governments could get "court orders" giving them access to falsified cryptographic credentials. This threat seems greater if the government itself is running the PKI.

Much more could be said. The NSTIC is only a draft, and the Department of Homeland Security and the White House sought public input online through July 19th. Because of the importance of this issue, EFF has joined with a coalition of concerned civil liberties groups to ask the Administration for a longer comment period and a way to submit more detailed comments. We hope and expect that this will be only the beginning of a public debate about ID management online.

US Government Begins Rollout Of Its 'Driver's License For The Internet'

from the seizing-the-(wrong)-moment dept

An idea the government has been kicking around since 2011 is finally making its debut. Calling this move ill-timed would be the most gracious way of putting it.

A few years back, the White House had a brilliant idea: Why not create a single, secure online ID that Americans could use to verify their identity across multiple websites, starting with local government services? The New York Times described it at the time as a "driver's license for the internet."

Sound convenient? It is. Sound scary? It is.

Next month, a pilot program of the "National Strategy for Trusted Identities in Cyberspace" will begin in government agencies in two US states, to test out whether the pros of a federally verified cyber ID outweigh the cons.

The NSTIC program has been in (slow) motion for nearly three years, but now, at a time when the public's trust in government is at an all-time low, the National Institute of Standards and Technology (NIST -- itself still reeling a bit from NSA-related blowback) is testing the program in Michigan and Pennsylvania. The first tests appear to be exclusively aimed at accessing public programs, like government assistance. The government believes this ID system will help reduce fraud and overhead, by eliminating duplicated ID efforts across multiple agencies.

But the program isn't strictly limited to government use. The ultimate goal is a replacement of many logins and passwords people maintain to access content and participate in comment threads and forums. This "solution," while somewhat practical, also raises considerable privacy concerns.

[T]he Electronic Frontier Foundation immediately pointed out the red flags, arguing that the right to anonymous speech in the digital realm is protected under the First Amendment. It called the program "radical," "concerning," and pointed out that the plan "makes scant mention of the unprecedented threat such a scheme would pose to privacy and free speech online."

And the keepers of the identity credentials wouldn't be the government itself, but a third party organization. When the program was introduced in 2011, banks, technology companies or cellphone service providers were suggested for the role, so theoretically Google or Verizon could have access to a comprehensive profile of who you are that's shared with every site you visit, as mandated by the government.

Beyond the privacy issues (and the hints of government being unduly interested in your online activities), there are the security issues. This collected information would be housed centrally, possibly by corporate third parties. When hackers can find a wealth of information at one location, it presents a very enticing target. The government's track record on protecting confidential information is hardly encouraging.

The problem is, ultimately, that this is the government rolling this out. Unlike corporations, citizens won't be allowed the luxury of opting out. This "internet driver's license" may be the only option the public has to do things like renew actual driver's licenses or file taxes or complete paperwork that keeps them on the right side of federal law. Whether or not you believe the government's assurances that it will keep your data safe from hackers, keep it out of the hands of law enforcement (without a warrant), or simply not look at it just because it's there, matters very little. If the government decides the positives outweigh the negatives, you'll have no choice but to participate.

The Ultimate Guide to the Invisible Web

Search engines are, in a sense, the heartbeat of the internet; “googling” has become a part of everyday speech and is even recognized by Merriam-Webster as a verb. It’s a common misconception, however, that googling a search term will reveal every site out there that addresses your search. In fact, typical search engines like Google, Yahoo, or Bing actually access only a tiny fraction – estimated at 0.03% – of the internet. The sites that traditional searches yield are part of what’s known as the Surface Web, which consists of indexed pages that a search engine’s web crawlers are programmed to retrieve.

So where’s the rest? The vast majority of the Internet lies in the Deep Web, sometimes referred to as the Invisible Web. The actual size of the Deep Web is impossible to measure, but many experts estimate it is about 500 times the size of the web as we know it.

Deep Web pages operate just like any other site online, but they are constructed so that their existence is invisible to Web crawlers. While recent news, such as the bust of the infamous Silk Road drug-dealing site and Edward Snowden’s NSA revelations, has spotlighted the Deep Web’s existence, it’s still largely misunderstood.

Search Engines and the Surface Web

Understanding how surface Web pages are indexed by search engines can help you understand what the Deep Web is all about. In the early days, computing power and storage space were at such a premium that search engines indexed a minimal number of pages, often storing only partial content. The methodology behind searching reflected users’ intentions; early Internet users generally sought research, so the first search engines indexed simple queries that students or other researchers were likely to make. Search results consisted of actual content that a search engine had stored.

Over time, advancing technology made it profitable for search engines to do a more thorough job of indexing site content. Today’s Web crawlers, or spiders, use sophisticated algorithms to collect page data from hyperlinked pages. These robots maneuver their way through all linked data on the Internet, earning their spidery nickname. Every surface site is indexed by metadata that crawlers collect. This metadata, consisting of elements such as page title, page location (URL) and repeated keywords used in text, takes up much less space than actual page content. Instead of the cached content dump of old, today’s search engines speedily and efficiently direct users to websites that are relevant to their queries.
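To make the metadata collection described above concrete, here is a minimal sketch (not any real search engine's code) using Python's built-in html.parser to pull the page title and outgoing links from an HTML document, the way a crawler's first pass might:

```python
from html.parser import HTMLParser

class MetadataExtractor(HTMLParser):
    """Toy crawler pass: collect the page title and outgoing hyperlinks."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title>...</title> is part of the title metadata.
        if self._in_title:
            self.title += data

# A hypothetical page a spider might fetch.
page = "<html><head><title>Example</title></head><body><a href='/about'>About</a></body></html>"
extractor = MetadataExtractor()
extractor.feed(page)
print(extractor.title)  # Example
print(extractor.links)  # ['/about']
```

A real spider would then queue each discovered link for its own visit, which is exactly why unlinked pages never enter the index.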

To get a sense of how search engines have improved over time, Google’s interactive breakdown “How Search Works” details all the factors at play in every Google search. In a similar vein, a timeline of Google’s search engine algorithm updates will give you an idea of how nonstop the efforts to refine searches have been. How these efforts impact the Deep Web is not exactly clear. But it’s reasonable to assume that if major search engines keep improving, ordinary web users will be less likely to seek out arcane Deep Web searches.

How is the Deep Web Invisible to Search Engines?

Search engines like Google are extremely powerful and effective at distilling up-to-the-moment Web content. What they lack, however, is the ability to index the vast amount of data that isn’t hyperlinked and therefore isn’t immediately accessible to a Web crawler. This may or may not be intentional; for example, content behind a paywall and a blog post that’s written but not yet published both technically reside in the Deep Web.

Some examples of other Deep Web content include:
  • Data that needs to be accessed by a search interface
  • Results of database queries
  • Subscription-only information and other password-protected data
  • Pages that are not linked to by any other page
  • Technically limited content, such as that requiring CAPTCHA technology
  • Text content that exists outside of conventional http:// or https:// protocols

While the scale and diversity of the Deep Web are staggering, its notoriety – and appeal – comes from the fact that users are anonymous on the Deep Web, and so are their Deep Web activities. Because of this, it’s been an important tool for governments; the U.S. Naval Research Laboratory first launched intelligence tools for Deep Web use in 2003.

Unfortunately, this anonymity has created a breeding ground for criminal elements who take advantage of the opportunity to hide illegal activities. Illegal pornography, drugs, weapons and passports are just a few of the items available for purchase on the Deep Web. However, the existence of sites like these doesn’t mean that the Deep Web is inherently evil; anonymity has its value, and many users prefer to operate within an untraceable system on principle.

Just as Deep Web content can’t be traced by Web crawlers, it can’t be accessed by conventional means. The same Naval research group that developed intelligence-gathering tools created The Onion Router Project, now known by its acronym TOR. Onion routing wraps Internet communications in successive layers of encryption; each relay along the route peels away one layer, like the layers of an onion. TOR users’ identities and network activities are concealed by this software. TOR, and other software like it, offers an anonymous connection to the Deep Web. It is, in effect, your gateway to the Deep Web.
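The layering idea behind onion routing can be illustrated with a toy sketch. This is not real cryptography (Tor uses public-key encryption negotiated with each relay); single-byte XOR stands in for encryption purely to show how the sender adds one layer per relay and each relay removes exactly one:

```python
# Toy illustration of onion routing's layered encryption.
# XOR with a per-relay key is a stand-in for real encryption.

def xor_layer(data: bytes, key: int) -> bytes:
    """Apply (or remove) one 'encryption' layer; XOR is its own inverse."""
    return bytes(b ^ key for b in data)

relay_keys = [0x17, 0x2A, 0x4C]  # one key per relay on the hypothetical circuit

# The sender wraps the message once per relay, innermost layer added last.
message = b"hello deep web"
wrapped = message
for key in reversed(relay_keys):
    wrapped = xor_layer(wrapped, key)

# Each relay in turn peels exactly one layer and forwards the remainder,
# so no single relay sees both the sender and the plaintext.
for key in relay_keys:
    wrapped = xor_layer(wrapped, key)

assert wrapped == message  # plaintext emerges only after the final relay
```

The design point the sketch captures: any one relay knows its neighbors on the circuit but cannot read through the remaining layers, which is what conceals the user's identity and activity.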

But in spite of its back-alley reputation, there are plenty of legitimate reasons to use TOR. For one, TOR lets users avoid “traffic analysis,” the monitoring tools commercial sites use to determine web users’ location and the network they are connecting through. These businesses can then use this information to adjust pricing, or even what products and services they make available.

According to the Tor Project site, the program also allows people to, “[...] Set up a website where people publish material without worrying about censorship.” While this is by no means a clearly good or bad thing, the tension between censorship and free speech is felt the world over. The Deep Web furthers that debate by demonstrating what people can and will do to overcome political and social censorship.

Reasons a Page is Invisible

When an ordinary search engine query comes back with no results, that doesn’t necessarily mean there is nothing to be found. An “invisible” page isn’t necessarily inaccessible; it’s simply not indexed by a search engine. There are several reasons why a page may be invisible. Keep in mind that some pages are only temporarily invisible, possibly slated to be indexed at a later date.
  • Engines have traditionally ignored Web pages whose URLs contain long strings of parameters, equal signs, and question marks, for fear that they duplicate what’s already in the database or, worse, send the spider around in circles. This content is known as the “Shallow Web,” and a number of workarounds have been developed to help you access it.
  • Form-controlled entry that’s not password-protected. In this case, page content only gets displayed when a human applies a set of actions, mostly entering data into a form (specific query information, such as job criteria for a job search engine). This typically includes databases that generate pages on demand. Applicable content includes travel industry data (flight info, hotel availability), job listings, product databases, patents, publicly-accessible government information, dictionary definitions, laws, stock market data, phone books and professional directories.
  • Password-protected access, with or without a subscription. This includes VPNs (virtual private networks) and any website where pages require a username and password. Access may or may not require a paid subscription. Applicable content includes academic and corporate databases, newspaper or journal content, and academic library subscriptions.
  • Timed access. On some sites, like major news sources such as the New York Times, free content becomes inaccessible after a certain number of pageviews. Search engines retain the URL, but the page generates a sign-up form, and the content is moved to a new URL that requires a password.
  • Robots exclusion. The robots.txt file, which usually lives in the main directory of a site, tells search robots which files and directories should not be indexed. Hence its name “robots exclusion file.” If this file is set up, it will block certain pages from being indexed, which will then be invisible to searchers. Blog platforms commonly offer this feature.
  • Hidden pages. There is simply no sequence of hyperlink clicks that could take you to such a page. The pages are accessible, but only to people who know of their existence.
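The robots exclusion mechanism mentioned above can be exercised programmatically. A minimal sketch using Python's urllib.robotparser, with a hypothetical robots.txt (the example.com rules are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, as it might appear at https://example.com/robots.txt
rules = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler consults the rules before fetching each URL.
print(parser.can_fetch("*", "https://example.com/index.html"))     # True
print(parser.can_fetch("*", "https://example.com/private/a.html")) # False
```

Note that robots.txt is purely advisory: it keeps compliant spiders from indexing a page, but it does not make the page inaccessible to anyone who requests it directly.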

Ways to Make Content More Visible

We have discussed what type of content is invisible and where we might find such information. Conversely, the idea of making content more visible spawned the search engine optimization (SEO) industry. Some ways to improve your site’s visibility include:
  • Categorize your database. If you have a database of products, you could publish select information to static category and overview pages, thereby making content available without form-based or query-generated access. This works best for information that does not become outdated, like job postings.
  • Build links within your website, interlinking between your own pages. Each hyperlink will be indexed by spiders, making your site more visible.
  • Publish a sitemap. It is crucial to publish a serially linked, current sitemap to your site. It’s no longer considered a best practice to publicize it to your viewers, but publish it and keep it up to date so that spiders can make the best assessment of your site’s content.
  • Write about it elsewhere. One of the easiest forms of search engine optimization is to find ways to publish links to your site on other webpages. This will help make it more visible.
  • Use social media to promote your site. Link to your site on Twitter, Instagram, Facebook or any other social media platform that suits you. You’ll drive traffic to your site and increase the number of links on the Internet.
  • Remove access restrictions. Avoid login or time-limit requirements unless you are soliciting subscriptions.
  • Write clean code. Even if you use a pre-packaged website template without customizing the code, validate your site’s code so that spiders can navigate it easily.
  • Match your site’s page titles and link names to other text within the site, and pay attention to keywords that are relevant to your content.
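The sitemap advice above can be made concrete. A minimal sketch that generates a sitemap.xml with Python's standard library, following the sitemaps.org format (the URLs are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical pages to expose to crawlers.
pages = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/products/widgets",
]

# The sitemaps.org protocol namespace used by major search engines.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for page in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page

sitemap = ET.tostring(urlset, encoding="unicode")
print(sitemap)
```

Publish the resulting file at the site root (conventionally /sitemap.xml) and regenerate it whenever pages are added or removed, so spiders always see a current map of the site.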

How to Access and Search for Invisible Content

If a site is inaccessible by conventional means, there are still ways to access the content, if not the actual pages. Aside from software like TOR, a number of entities make it possible to view Deep Web content, like universities and research facilities. For invisible content that cannot or should not be publicly visible, there are still a number of ways to get access:
  • Join a professional or research association that provides access to records, research and peer-reviewed journals.
  • Access a virtual private network via an employer.
  • Request access; this could be as simple as a free registration.
  • Pay for a subscription.
  • Use a suitable resource. Use an invisible Web directory, portal or specialized search engine such as Google Book Search, Librarian’s Internet Index, or BrightPlanet’s Complete Planet.

Invisible Web Search Tools

Here is a small sampling of invisible web search tools (directories, portals, engines) to help you find invisible content. To see more like these, please look at our Research Beyond Google article.