30 March 2008
Web 2.0, mashups and social networking - what is it all about?
3 different terms - Web 2.0, mashup and social networking, but all intertwined in the brave new Internet, the so-called second phase of the evolution of the online world. But what does it all mean?
Some companies have made the claim of using "Web 2.0" as a marketing strategy, but it seems in many cases it may be unfounded.
Do you really need to have a "New! Improved by Web 2.0" slogan on your site in order to survive and thrive? Not really, it's somewhat of a buzzword, but it's good to understand what this jargon means and to begin thinking about how your site can evolve to take advantage of the direction the web is heading in.
Long gone is the concept of the Internet geek - the loner in a darkened room engaged in uber-technical pursuits. The web is cool with teens, it's a vital tool and recreation area for Generation X, it's happening with senior citizens and, as a result, it's becoming increasingly user driven rather than tech-geek dictated.
What is Web 2.0?
The roots of the term "Web 2.0" were in a conference brainstorming session between O'Reilly and MediaLive International in 2004. It referred to a change of thinking about how the applications of the future should be developed. Even before the term existed, Web 2.0 type applications were already around; e.g. blogging software. The brainstorming session sought to identify the common elements of these popular applications and services as a model for the future.
Web 2.0 applications and services have at least several of the following elements in common:
- fresh, useful data is the core
- the ability for other parties to manipulate that data
- "living" applications that can be easily adapted
- harnessing the collective experience
- the web as a platform, independent of user platform
- primary focus of participation, rather than publishing
- trusting of users to provide reliable content
Other examples of applications and services with strong Web 2.0 influences are bookmark sharing, Google AdSense, RSS web feeds, Wikipedia and the thousands of mashups currently in existence. Personally, I see forums as a Web 2.0 type of application, but I don't see them recognized as such.
The very interesting point I find about the whole Web 2.0 movement is that in one particular aspect, it's really nothing new. In the 70's the technology boffins were desperately trying to get away from the mainframe/dumb terminal infrastructure and in some ways, we're heading back to that - just with hugely increased flexibility.
For a much more detailed description of Web 2.0 concepts, something not overly technical, I highly recommend this article by Tim O'Reilly of O'Reilly Media, Inc.
What is a mashup?
The term "mashup" originated in the music industry - it's music that is made up of other songs already released, usually by other artists. I often find myself saying to my daughters that the "new" song they think is so cool is actually the bass line from X artist circa 1974 and the electric guitar riffs from X song originally recorded in 1982. Pah! Modern music, don't get me started ;).
Anyway, mashups in the web development world are actually very useful things ;). A mashup is usually a web-based application that combines content and functionality from a variety of sources using technologies including RSS and AJAX (Asynchronous JavaScript And XML).
Mashups generally don't require a programming degree, hence the rapid uptake of the concept. A company will release an API (Application Programming Interface), which is the interface that allows external requests to be made to whatever content the company is offering. Instead of being just a rigid reproduction of information, it offers a high degree of interactivity and lets the developer/user manipulate that data - hence its tie-in with Web 2.0 concepts.
So, between the API implementation and the user/developer's additional work to manipulate the content for use within another application - that's a mashup; although some purists might argue more than one API needs to be used to qualify for that term.
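To make the idea a little more concrete, here is a minimal sketch of a two-source mashup in Python. The two JSON strings are made-up stand-ins for responses from two different providers' APIs (a store locator and a weather service); the names, fields and data are assumptions for illustration only, not any real company's API.

```python
import json

# Hypothetical payloads standing in for two different providers' API responses.
STORES_JSON = (
    '[{"name": "Main St Shop", "city": "Perth"},'
    ' {"name": "Harbour Kiosk", "city": "Sydney"}]'
)
WEATHER_JSON = '{"Perth": "sunny", "Sydney": "rain"}'


def build_mashup(stores_json, weather_json):
    """Join store listings with a per-city forecast into one combined view."""
    stores = json.loads(stores_json)
    weather = json.loads(weather_json)
    # Each store keeps its own fields and gains a "forecast" from the second source.
    return [
        {**store, "forecast": weather.get(store["city"], "unknown")}
        for store in stores
    ]


for entry in build_mashup(STORES_JSON, WEATHER_JSON):
    print(f'{entry["name"]} ({entry["city"]}): {entry["forecast"]}')
```

In a real mashup the two strings would come from HTTP requests to the providers, and the combined result would be rendered on a page rather than printed.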
Mashups can be very simple or extraordinarily complex; for example, VirtualPlaces is a mashup of APIs provided by Amazon Web Services, Weather.com, Flickr, MSN Search, Feedmap and GeoURL.
If you'd like to start messing around in the world of mashups, there are some very good mashup tutorials here.
What are social networking applications?
There's a huge difference between social networking and social engineering - I've seen a few people get mixed up between the two.
Social engineering is a term related to hacking. It's the process by which a hacker or fraudster elicits information from people in order to get access to their/their company's systems. For example, a hacker may call an employee posing as a senior executive and ask for details relating to a certain client in order to access the profile and create havoc.
Generally speaking, social networking services relating to the web are where a group of people launch a highly interactive service based on common interests between users and easy to use communications tools to detail and promote those interests to others.
They then invite their friends and colleagues to join and encourage them to also invite people they know who have similar interests. Introductions are then made between the people who have been invited throughout the various tiers of the process.
Via common connections, these processes connect businesses to consumers, consumers to consumers and businesses to businesses who otherwise may not have met. They also help establish a network of credibility - "oh, X knows Y so Y must be ok". If Y is making a recommendation about a product or service, then that single recommendation may wield a great deal of purchasing influence.
A great example of social networking is the hugely popular MySpace.com - an online community that lets you meet your friends' friends and colleagues. A single profile can generate a little "world" of people who have similar interests, with these worlds eventually overlapping with other worlds. It's useful to the user and a marketer's dream :). MySpace has really found its niche in music and band branding - many of the top bands in the world have MySpace pages.
Another pioneer that has become extraordinarily successful is Friendster.
From an ecommerce aspect, LinkedIn is a great example, with over 4 million members. I've been a member of LinkedIn for a while and it's very interesting to see who knows whom. Big players such as Google and Adobe have representation in the network, allowing LinkedIn members a route via their network of connections to some of the decision makers within large companies.
The popularity of social networking applications reinforces the validity of the theory of "six degrees of separation"; that is, that any two human beings, regardless of age, color, creed or social status have some sort of connection within five intermediaries. For example:
- I know Fred, who works in the Department of Commerce
- Fred regularly communicates with his boss
- His boss meets with the agency head once a week
- The agency head communicates with one of the President's inner circle
- That person lunches with the President every month to discuss issues.
While I'm based in Australia and have never met the President of the USA, I have "connections" to him, only separated by 5 degrees. Web based social networking applications will probably decrease the number of degrees of separation for millions of people in the years ahead.
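For the curious, counting degrees of separation is a shortest-path problem, and a breadth-first search is one common way to solve it. The sketch below models the chain above as a tiny acquaintance graph; the graph, the labels ("boss", "adviser" and so on) and the function name are all made up for illustration.

```python
from collections import deque

# Hypothetical acquaintance graph mirroring the chain described above.
KNOWS = {
    "me": ["Fred"],
    "Fred": ["me", "boss"],
    "boss": ["Fred", "agency head"],
    "agency head": ["boss", "adviser"],
    "adviser": ["agency head", "President"],
    "President": ["adviser"],
}


def degrees_of_separation(graph, start, target):
    """Return the number of links on the shortest path, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None


print(degrees_of_separation(KNOWS, "me", "President"))  # prints 5
```

A social networking service effectively adds more edges to a graph like this, which is why the shortest path between two people tends to shrink as more people join.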
The humble blog can also be considered to be a form of social networking application. It invites others to comment on items and the blog itself "pings" another blog when a post is made that relates to the other blog. It's a more insular world, but very effective nonetheless for building large networks. Many bloggers, through the exchange of links and post quoting, build up huge networks - not just of users and subscribers, but of other bloggers.
What does all this mean for online business?
For those of us who have established online businesses as somewhat of a way to avoid human interaction, I have bad news :). Heavy participation with our audiences is becoming increasingly expected. In the years ahead, it will become more difficult to have a successful, fully automated site where you can take off for a week and ignore what's happening.
A great example of this is blogging. I've read a few stories where once successful bloggers became a little slack in updating and lost their visitors very, very quickly - forever. Blogging is a powerful tool that can become useless if it isn't kept relatively fresh.
For those who run social networking services such as MySpace, they need to continually have their finger on the pulse of what their users want. Many social networking services will spring up during the years ahead, but I believe relatively few will thrive due to this frenetic pace of adapting to user demands.
Given the availability and relative ease of implementation of mashups, people will also demand more from us smaller players. For example, not so long ago, having a basic map to your premises on your site was a great customer service. In the time ahead, people will want detailed instructions of getting to your premises from wherever they are - street by street, road by road, turn by turn - complete with photos of surrounding buildings and landmarks.
It appears that thinking too hard is becoming an optional extra for human existence ;) - although, with many of us becoming "specialists" in given areas in our lives and with data overload becoming a real problem, perhaps we just don't have the headspace for thinking about "menial" issues any more.
General user communities will also become increasingly important - using forums as a marketing tool only or just for traffic generation will fall by the wayside in many cases - it's already very tough to make a forum succeed. Clients and visitors want to interact with each other, but also with you and the wider related communities. It's all very tribal :).
Getting into Web 2.0 - the easy way
There's no need to start scurrying to implement mashup applications right away - simple as they are in many cases, there's still a learning curve, and developing your own APIs is a more complex task.
For starters, if you have content you wish others to reproduce, offering articles directly from your site is a great way to go, but you may want to consider automating this somewhat using an RSS web feed - it's pretty simple to implement.
In doing this, you may also be able to get broader exposure by providing data for mashup developers to include in their applications and for industry commentators and journalists to have an easy way to keep up to date with what's happening in your sector. It doesn't have to be just articles you use. You may have a catalog with items containing technical specifications which could be useful to other sites. Just be sure that there's an easy way for the person viewing the content on the other site to make their way back to you.
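As a rough idea of how little is involved, here is a minimal sketch that builds an RSS 2.0 feed with Python's standard library. The site name, URLs and article are placeholders I've made up; note that every item carries a link element, which is what gives readers on another site their way back to you.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document and return it as a string.

    `items` is a list of dicts with "title", "link" and "description" keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three channel elements that RSS 2.0 requires.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(node, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")


# Placeholder site and article, for illustration only.
feed = build_rss(
    title="Example Store News",
    link="http://example.com/",
    description="Articles and product updates",
    items=[
        {
            "title": "Getting started with web feeds",
            "link": "http://example.com/articles/web-feeds",
            "description": "A short introduction to publishing a feed.",
        }
    ],
)
print(feed)
```

The same function would work for catalog entries: put the product name in the title, the technical specifications in the description, and the product page in the link.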
Invite developers to comment on your feeds; ask them how it should evolve. You may find some of them wanting to collaborate with you in improving your feeds in a way that will benefit you as much as them.
If your site doesn't offer content for reproduction, I suggest starting a blog or a forum and implementing an RSS feed around that. It's just a matter of posting news items from your industry, but not just repeated in a parrot-type fashion; inject your own spin and opinions and relate it back to your own business if possible. Encouraging comment is also important, it keeps a topic alive and interesting.
If time allows, get involved with other social networking services; use them as a launching pad for getting the word out about your company - just as many bands are doing via MySpace.
While the new "connected" generation presents us all with many challenges, especially those of us who are smaller players, there are some great opportunities as well. The popularity of Web 2.0, mashups and social networking applications will allow for viral marketing in ways and with reach not possible before.
Social Network Fatigue and the Missing Web 2.0 Address Book
Sun Feb 11 2007, Tim O'Reilly
Jon Udell just wrote a thought-provoking piece about the difficulty of new social networks reaching critical mass, and the obvious fact that there already is an uber-social network at critical mass, if only we can make things interoperate:
Years ago at BYTE Magazine my friend Ben Smith, who was a Unix greybeard even then (now he’s a Unix whitebeard), made a memorable comment that’s always stuck with me. We were in the midst of evaluating a batch of LAN email products. “One of these days,” Ben said in, I think, 1991, “everyone’s going to look up from their little islands of LAN email and see this giant mothership hovering overhead called the Internet.”
Increasingly I’ve begun to feel the same way about the various social networks. How many networks can one person join? How many different identities can one person sanely manage? How many different tagging or photo-uploading or friending protocols can one person deal with?
Recently Gary McGraw echoed Ben Smith’s 1991 observation. “People keep asking me to join the LinkedIn network,” he said, “but I’m already part of a network, it’s called the Internet.”
Jon very much echoes my own sentiments. What really needs to be done is not just to connect the various social networks that do exist in internet network-of-networks style, but also to social-network enable our real social network apps: our IM, our email, our phone. Where, I keep asking vendors, is the Web 2.0 address book?
When one of the big communications vendors (email, IM or phone) gets this right, simply by instrumenting our communications so that the social network becomes visible (and under the control of the user), it seems to me that they could blow away a lot of the existing social network froth. That being said, when I've had this conversation with Reid Hoffman of LinkedIn, he's pointed out that he's well aware of that possibility, and has been working for years to layer additional value on top of the raw social network data. And he's very right about that.
To use Ben Smith's analogy about the internet as mother ship: if you were a proprietary LAN vendor trying to fight the internet, it was game over. But if you were a LAN vendor who was on the right bandwagon, you became Cisco.
Web 2.0 Social Networking Apps List
by robyn on December 12, 2005
- Social Bookmarking (Winner: del.icio.us)
- Web 2.0 Start Pages (Netvibes)
- Online To Do Lists (Voo2do)
- Peer Production News (Digg)
- Image Storage and Sharing (Flickr)
- 3rd Party Online File Storage (Openomy)
- Blog Filters (Memeorandum)
- Grassroots Use of Web 2.0 (Katrina List Network)
- Web-Based Word Processing (Writely)
- Online Calendars (Calendarhub)
- Project Management & Team Collaboration (Basecamp)
What are your opinions? I like Trumba for a good calendar and BackPackIt for To Do Lists. I've not heard of Calendarhub and Voo2do, but you can bet I'll be checking them out now.
Web 2.0, social networking can endanger corporate security, analyst says
October 02, 2007 (Computerworld) -- With the Web becoming central to the way companies do business, cybercriminals are taking increasing advantage of Web 2.0 and social networking sites to launch attacks, according to IDC analyst Christian Christiansen.
The Web isn't the benign resource for information that people once saw it as, said Christiansen, who spoke today at Kaspersky Lab Inc.'s Surviving CyberCrime conference in Waltham, Mass. "One of the things that's happened that's disconcerting -- and it's been growing over the last 10 years -- is the blending of people's private lives with their corporate lives," he said.
Employees' personal lives -- their online shopping habits and interactions with friends and families -- get intermingled with the interactions they have at work with customers, fellow employees, partners and suppliers, he said. "So that creates a perforated perimeter where there isn't a hard, fast separation between the corporate world and the personal world," he said.
The problem is that employees don't always follow their companies' security policies -- probably because they don't know what those policies are, just as they don't know what their companies' acceptable use policies are. The result: employees don't know what's allowed and what they're barred from doing. Sometimes, Christiansen said, the very people who set up the corporate policies don't even follow them.
Problems also occur when an IT department no longer controls the products being connected to the corporate network. That list could include everything from smart phones to new and untested laptop and desktop computers to various application environments, he said.
"We're seeing the realization that the internal security problem is growing -- the threats are coming from inside the network," he said.
The latest threats to network security now are coming from collaboration and Web 2.0 environments -- where employees casually click on links that could lead them to malware. And they're coming from the wide variety of devices that may be accessing private as well as corporate networks, he said.
"We're seeing a change in the threat environment," he said. "Instead of the threats -- the malicious code -- being distributed as e-mail attachments, we're seeing more and more that they're being embedded in Web 2.0 links," he said. "In the past, what you saw was an immediate effect. Now we're seeing much greater levels of subterfuge and much more sophisticated attacks."
To better avoid potential problems, IT departments need to control user behavior, the types of devices being used to access information, the applications being used and content contributions.
"Risk reduction requires policy managements and layered protection -- at the gateway to the Internet as well as at the endpoint [desktops, laptops and servers]," he said. "You need a whole series of checks and balances."
Defining Web 2.0 Social Networking
A definitive definition of a Web 2.0 “Social Network” is as hard to come by as a definitive definition of Web 2.0 itself.
Tim Berners-Lee recently noted (see “Evolving from Web 1.0 to Web 2.0”) the seeming futility of encapsulating fluid and amorphous interactive applications into digital sound bites saying: “I think Web 2.0 is of course a piece of jargon, nobody even knows what it means.”
Nobody may know what a Web 2.0 “Social Network” means either.
In “Del.icio.us is already a social network,” Fred Stutzman takes exception, and rightly so, with the notion that Del.icio.us, a social book marking service, would not be considered a social network.
Not only is the tag line of Del.icio.us “social bookmarking,” two of its three call-out slogans to users foster personal communication and interaction, or networking socially:
all your bookmarks in one place
bookmark things for yourself and friends
check out what other people are bookmarking
Del.icio.us wants its users to embrace social bookmarking to network socially:
What is del.icio.us?
del.icio.us is a collection of favorites - yours and everyone else's.
What is social bookmarking?
del.icio.us is a social bookmarking website, which means it is designed to allow you to store and share bookmarks on the web…
First, you can get to your bookmarks from anywhere, no matter whether you're at home, at work, in a library, or on a friend's computer.
Second, you can share your bookmarks publicly, so your friends, coworkers, and other people can view them for reference, amusement, collaboration, or anything else.
Third, you can find other people on del.icio.us who have interesting bookmarks and add their links to your own collection. Everyone on del.icio.us chooses to save their bookmarks for a reason. You have access to the links that everyone wants to remember…
Stutzman on how people create and share bookmarks to connect through a “sociality in the network” of del.icio.us:
Social networks connect us - something that del.icio.us has been doing since its very inception. The difference here is that the link is the object center of the sociality in the network. It is most useful to compare to Flickr. In Flickr, we browse photographs through a number of paths - tags, groups, pools - and while the photographs are still the center of the network, these social features enable a deeper form of sharing and browsing. The social aspects complement the core content, rather than replacing it.
I believe del.icio.us will stick firmly to keeping the link the object center of the network. By adding social features, we'll have new ways to find content - and we'll be able to find out more about the people who share content. This will be very valuable to those who use del.icio.us for research and analysis - and it stands to unite communities of practice. When I see 10 other people bookmarking an obscure link about social networks, I want to know more about those people. With lightweight social features, we all stand to gain from our link-centric connections.
Del.icio.us’s link-centric connections foster social networking just as the MySpace and Facebook profile-centric connections do.
MySpace calls out to its “friends” to “share photos, journals and interests,” Facebook calls out to its “students” to “share information” and Del.icio.us calls out to everyone to “share links.”
In Web 2.0 Social Networking, sharing is interactive caring.
The Impact of Web 2.0 and Emerging Social Network Models
The term Web 2.0 describes a new generation of websites allowing users to share content and create networks in online public forums. To kick off the session, Moderator Peter Schwartz, Chairman, Global Business Network, USA, asked Chad Hurley, Co-Founder and Chief Executive Officer, YouTube, USA, for his definition. "Web 2.0 is an overused buzzword, but there is in fact a movement to leverage the power of people and community," Hurley replied. His online video-sharing service grew out of the frustration that he and his partner experienced with exchanging and distributing their own videos.
Caterina Fake, Founder, Flickr, USA, called Web 2.0 "a return to the roots of the Web. What was exciting in 1995-96 was that everyone was publishing. But we all had to be power users and learn HTML (the markup language used to write web pages). We got distracted by the dot.com and e-commerce wave, but now with more people online and with more access to broadband, we are getting back to the roots."
"I am happy to hear that the Internet is finally going back to the people," said Viviane Reding, Commissioner, Information Society and Media, European Commission, Brussels. "My principle is: keep the hands of the government off the Internet."
Mark G. Parker, President and Chief Executive Officer, Nike, USA, described an interactive element of his company’s website that allows customers to design their own shoes. "People are making sneakers, creating a customized product," he said. "These new developments on the Web are enabling a fundamental shift in power to the consumer."
William H. Gates III, Chairman, Microsoft Corporation, USA, mentioned developments on the near horizon:
- Television programming on demand via the Internet: "TV is still broadcast," he said. "As you get the TV to the Internet, you can see what you want when you want."
- High-quality three-dimensional content
- A viable micro-payments system
- A digital rights model to protect content producers: "Because there is no digital rights model, content creators are hesitant to dive in," Gates said.
Reding also picked up on the digital copyrights issue. "All of the rules are for old media," she said. "We must have a new model for IPR [intellectual property rights] and content production."
Schwartz and Challenger Dennis Kneale, Managing Editor, Forbes Magazine, USA, asked the panellists to examine what, if anything, these new technologies mean for business and society. "We are changing the world because everybody has a voice," reported Hurley. Gates noted that, "These are tools of empowerment. They are not changing society but we’re letting people express themselves." He did, however, add that "there is incredible promise in the areas that most interest me, education and healthcare."
On the business side, Parker warned companies to not ignore Web 2.0. "If you don’t embrace this you are at risk," he said. "I think it could be deadly."
If today is the age of Web 2.0, that leaves an obvious question about the future: will there be a Web 3.0? "If the next buzzword is Web 3.0, I think we have a lack of creativity in buzzwords," quipped Gates. "But as we get 3D and speech, and as we decide that things like textbooks do not have to be on paper, we’ll have enough developments in the next 10 years for four new buzzwords," he said.
Web 2.0 Social Networks: Cool but marginal and unprofitable?
Is Cisco making the RIGHT BET on Social Networks?
Not according to Om Malik who offers a "News flash" for Cisco:
This social software thing – it is too marginal, doesn’t make money and can’t make you cool.
Really? Apparently Rupert Murdoch, News Corp. CEO and proud corporate owner of MySpace, didn’t get the memo!
Why did News Corp. bring MySpace into its space?
Murdoch shared his strategic thought process on the acquisition and its now far from marginal financial impact on News Corp. at the Media Summit in New York City last month:
Two to two and a half years ago we were living in a booming economy…but print advertising and television advertising was not growing at the rate they would have in the past…we looked at where the money was going, a lot of money was going to the Internet…it was time to move there seriously.
MySpace is now growing faster than we expected, we had to almost put the brakes on it physically, handling the traffic, we needed a lot more hardware, a lot more servers…not a vast amount of money needed to support, though…
The advertising revenues have gone from basically nothing to $25 million a month, growing monthly, 30% every quarter, next year search revenue from Google kicks in…we are looking at a billion dollars in revenue…
Murdoch projected that revenues from MySpace and other Fox Interactive Media sites such as IGN could represent as much as 10 percent of News Corp.'s total revenue within the next five years.
Malik's “news flash” for Cisco is headlined “Cisco’s wrong bet on Social Networks.”
Perhaps it was Murdoch’s News Corp. MySpace $1 billion revenues “news flash” in NYC that is reinforcing Cisco’s RIGHT BET on Social Networks!
Web 3.0: When Web Sites Become Web Services
Written by Alex Iskold / March 19, 2007 12:11 PM
Today's Web has terabytes of information available to humans, but hidden from computers. It is a paradox that information is stuck inside HTML pages, formatted in esoteric ways that are difficult for machines to process. The so-called Web 3.0, which is likely to be a precursor of the real semantic web, is going to change this. What we mean by 'Web 3.0' is that major web sites are going to be transformed into web services - and will effectively expose their information to the world.
The transformation will happen in one of two ways. Some web sites will follow the example of Amazon, del.icio.us and Flickr and will offer their information via a REST API. Others will try to keep their information proprietary, but it will be opened via mashups created using services like Dapper, Teqlo and Yahoo! Pipes. The net effect will be that unstructured information will give way to structured information - paving the road to more intelligent computing. In this post we will look at how this important transformation is taking place already and how it is likely to evolve.
The Amazon E-Commerce API - open access to Amazon's catalog
We have written here before about Amazon's visionary WebOS strategy. The Seattle web giant is reinventing itself by exposing its own infrastructure via a set of elegant APIs. One of the first web services opened up by Amazon was the E-Commerce service. This service opens access to the majority of items in Amazon's product catalog. The API is quite rich, allowing manipulation of users, wish lists and shopping carts. However, its essence is the ability to look up Amazon's products.
Why has Amazon offered this service completely free? Because most applications built on top of this service drive traffic back to Amazon (each item returned by the service contains the Amazon URL). In other words, with the E-Commerce service Amazon enabled others to build ways to access Amazon's inventory. As a result many companies have come up with creative ways of leveraging Amazon's information - you can read about these successes in one of our previous posts.
The rise of the API culture
The web 2.0 poster child, del.icio.us, is also famous as one of the first companies to open a subset of its web site functionality via an API. Many services followed, giving rise to a true API culture. John Musser over at programmableweb has been tirelessly cataloging APIs and Mashups that use them. This page shows almost 400 APIs organized by category, which is an impressive number. However, only a fraction of those APIs are opening up information - most focus on manipulating the service itself. This is an important distinction to understand in the context of this article.
The del.icio.us API offering today is different from Amazon's, because it does not open the del.icio.us database to the world. What it does do is allow authorized mashups to manipulate the user information stored in del.icio.us. For example, an application may add a post, or update a tag, programmatically. However, there is no way to ask del.icio.us, via the API, what URLs have been posted to it or what has been tagged with the tag web 2.0 across the entire del.icio.us database. These questions are easy to answer via the web site, but not via the current API.
Standardized URLs - the API without an API
Despite the fact that there is no direct API (into the database), many companies have managed to leverage the information stored in del.icio.us. Here are some examples...
Delexa is an interesting and useful mashup that uses del.icio.us to categorize Alexa sites. For example, here are the popular sites tagged with the word book:

Another web site called similicio.us uses del.icio.us to recommend similar sites. For example, here are the sites that it thinks are related to Read/WriteWeb.
So how do these services get around the fact that there is no API? The answer is that they leverage standardized URLs and a technique called Web scraping. Let's understand how this works. In del.icio.us, for example, all URLs that have the tag book can be found under the URL http://del.icio.us/tag/book; all URLs tagged with the tag movie are at http://del.icio.us/tag/movie; and so on. The structure of this URL is always the same: http://del.icio.us/tag/[TAG]. So given any tag, a computer program can fetch the page that contains the list of sites tagged with it. Once the page is fetched, the program can perform the scraping - the extraction of the necessary information from the page.
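The "API without an API" amounts to little more than string building. Here is a minimal sketch in Python that turns a tag into the page URL following the pattern described above. The function name is made up, and percent-encoding the tag is an assumption for safety rather than something the site is known to require.

```python
from urllib.parse import quote

# Base of the tag URL pattern described in the text above.
BASE = "http://del.icio.us/tag/"


def tag_url(tag):
    """Return the URL of the page listing bookmarks for `tag`."""
    # Percent-encode the tag so unusual characters cannot break the URL (an assumption).
    return BASE + quote(tag, safe="")


for tag in ("book", "movie"):
    print(tag_url(tag))
```

A program would then fetch each of these URLs over HTTP and hand the returned HTML to a scraper.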
How Web Scraping Works
Web Scraping is essentially reverse engineering of HTML pages. It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information. The actual data is mingled with layout and rendering information and is not readily available to a computer. Scrapers are the programs that "know" how to get the data back from a given HTML page. They work by learning the details of the particular markup and figuring out where the actual data is. For example, in the illustration below the scraper extracts URLs from the del.icio.us page. By applying such a scraper, it is possible to discover what URLs are tagged with any given tag.
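A very small scraper can be sketched with Python's built-in HTML parser. The sample page and the "bookmark" class name below are invented for illustration; a real scraper would first have to inspect the actual markup of the target page, which is exactly the "learning the details" step described above.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real page's structure would have to be inspected first.
SAMPLE_PAGE = """
<html><body>
  <div class="nav"><a href="/help">help</a></div>
  <ul>
    <li class="post"><a class="bookmark" href="http://example.com/a">Book A</a></li>
    <li class="post"><a class="bookmark" href="http://example.org/b">Book B</a></li>
  </ul>
</body></html>
"""


class BookmarkScraper(HTMLParser):
    """Collect href values from <a> tags carrying the assumed 'bookmark' class."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Skip navigation and layout links; keep only the data we came for.
        if tag == "a" and "bookmark" in classes and attributes.get("href"):
            self.urls.append(attributes["href"])


scraper = BookmarkScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.urls)  # prints ['http://example.com/a', 'http://example.org/b']
```

Notice how fragile this is: if the site renames the class or restructures the list, the scraper silently returns nothing, which is one reason a real API is preferable when one exists.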

Dapper, Teqlo, Yahoo! Pipes - the upcoming scraping technologies
We recently covered Yahoo! Pipes, a new app from Yahoo! focused on remixing RSS feeds. A similar technology, Teqlo, has recently launched. It focuses on letting people create mashups and widgets from web services and RSS. Before both of these, Dapper launched a generic scraping service for any web site. Dapper is an interesting technology that facilitates the scraping of web pages using a visual interface.
It works by letting the developer define a few sample pages and then helping her denote similar information using a marker. This looks simple, but behind the scenes Dapper uses a non-trivial tree-matching algorithm to accomplish this task. Once the user defines similar pieces of information on the page, Dapper allows the user to make it into a field. By repeating the process with other information on the page, the developer is able to effectively define a query that turns an unstructured page into a set of structured records.
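The end result, a page turned into structured records, can be sketched as follows. This is not Dapper's tree-matching algorithm: here the field paths are written by hand, whereas Dapper infers them from the samples the developer marks. The markup, field names and the extract_records helper are all hypothetical, and the sample is deliberately well-formed so that the standard xml.etree module can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a listing page.
SAMPLE_PAGE = """
<div>
  <div class="item"><h3>Alpha</h3><span class="price">10</span></div>
  <div class="item"><h3>Beta</h3><span class="price">25</span></div>
</div>
"""

# Each field name maps to a path relative to one repeated record element.
FIELDS = {"title": "h3", "price": "span[@class='price']"}

def extract_records(page, record_path, fields):
    """Turn every element matching record_path into a dict of named fields."""
    root = ET.fromstring(page.strip())
    records = []
    for node in root.findall(record_path):
        record = {}
        for name, path in fields.items():
            found = node.find(path)
            record[name] = found.text if found is not None else None
        records.append(record)
    return records

records = extract_records(SAMPLE_PAGE, "div[@class='item']", FIELDS)
for record in records:
    print(record)
```

Each repeated block on the page becomes one record, and each marked piece of information becomes one named field.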
The net effect - Web Sites become Web Services
Here is an illustration of the net effect of apps like Dapper and Teqlo:

So bringing together Open APIs (like the Amazon E-Commerce service) and scraping/mashup technologies gives us a way to treat any web site as a web service that exposes its information. The information, or to be more exact the data, becomes open. In turn, this enables software to take advantage of this information collectively. With that, the Web truly becomes a database that can be queried and remixed.
This sounds great, but is this legal?
Scraping technologies are legally questionable. In a way, they can be perceived as stealing the information owned by a web site. The whole issue is complicated because it is unclear where copy/paste ends and scraping begins. It is okay for people to copy and save the information from web pages, but it might not be legal to have software do this automatically. And scraping a page and then offering a service that leverages the information without crediting the original source is unlikely to be legal.
But it does not seem that scraping is going to stop, just as legal issues with Napster did not stop people from writing peer-to-peer sharing software, and the more recent YouTube lawsuit is not likely to stop people from posting copyrighted videos. Information that seems to be free is perceived as being free.
The opportunities that will come after the web has been turned into a database are just too exciting to pass up. So if conversion is going to take place anyway, would it not be better to rethink how to do this in a consistent way?
Why Web Sites should offer Web Services
There are several good reasons why Web Sites (online retailers in particular) should think about offering an API. The most important reason is control. Having an API will not only make scrapers unnecessary, it will also allow tracking of who is using the data - as well as how and why. Like Amazon, sites can do this in a way that fosters affiliates and drives the traffic back to their sites.
The old perception is that closed data is a competitive advantage. The new reality is that open data is a competitive advantage. The likely solution then is to stop worrying about protecting information and instead start charging for it, by offering an API. Having a small fee per API call (think Amazon Web Services) is likely to be acceptable, since the cost for any given subscriber of the service is not going to be high. But there is a big opportunity to make money on volume. This is what Amazon is betting on with their Web Services strategy and it is probably a good bet.
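The two ideas above, tracking who uses the data and charging a small fee per call, can be sketched together. Everything here is hypothetical: the MeteredApi class, the API keys and the fee are invented for illustration and do not describe how Amazon Web Services meters or bills usage.

```python
from collections import Counter
from decimal import Decimal

class MeteredApi:
    """Toy API front end that counts calls per API key and bills per call."""

    def __init__(self, fee_per_call: str):
        # Decimal avoids floating-point rounding errors in money amounts.
        self.fee_per_call = Decimal(fee_per_call)
        self.calls = Counter()

    def handle(self, api_key: str, query: str) -> dict:
        """Record the call against the key, then return a placeholder payload."""
        self.calls[api_key] += 1
        return {"query": query, "results": []}

    def invoice(self, api_key: str) -> Decimal:
        """Amount owed by one subscriber."""
        return self.calls[api_key] * self.fee_per_call

    def revenue(self) -> Decimal:
        """Total owed across all subscribers."""
        return sum(self.calls.values()) * self.fee_per_call

api = MeteredApi("0.001")
for _ in range(3):
    api.handle("mashup-a", "books about scraping")
api.handle("mashup-b", "movies")

print(api.invoice("mashup-a"), api.revenue())
```

Each subscriber's bill stays small, while the call log tells the site who is using its data and how often, which is the control argument made above.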

Conclusion
As more and more of the Web is becoming remixable, the entire system is turning into both a platform and a database. Yet, such transformations are never smooth. For one, scalability is a big issue. And of course legal aspects are never simple.
But it is not a question of if web sites become web services, but when and how. APIs are a more controlled, cleaner and altogether preferred way of becoming a web service. However, when APIs are not available or sufficient, scraping is bound to continue and expand. As always, time will be the best judge; but in the meanwhile we turn to you for feedback and stories about how your businesses are preparing for 'web 3.0'.
Web 3.0 debates
There is considerable debate as to what the term Web 3.0 means, and what a suitable definition might be.
Transforming the Web into a database
The first step towards a "Web 3.0" is the emergence of "The Data Web" as structured data records are published to the Web in reusable and remotely queryable formats, such as XML, RDF and microformats. The recent growth of SPARQL technology provides a standardized query language and API for searching across distributed RDF databases on the Web. The Data Web enables a new level of data integration and application interoperability, making data as openly accessible and linkable as Web pages. The Data Web is the first step on the path towards the full Semantic Web. In the Data Web phase, the focus is principally on making structured data available using RDF. The full Semantic Web stage will widen the scope such that both structured data and even what is traditionally thought of as unstructured or semi-structured content (such as Web pages, documents, etc.) will be widely available in RDF and OWL semantic formats.
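As a rough illustration of what "queryable structured data" means, the sketch below stores facts as subject-predicate-object triples, the shape RDF uses, and answers a pattern query over them. It is a toy stand-in written in plain Python, not SPARQL and not an RDF library; the ex: identifiers and the match helper are invented for the example.

```python
# Each fact is a (subject, predicate, object) triple, the shape RDF uses.
# All identifiers below are made up for illustration.
TRIPLES = [
    ("ex:readwriteweb", "ex:taggedWith", "ex:web2.0"),
    ("ex:readwriteweb", "ex:linksTo", "ex:delicious"),
    ("ex:delicious", "ex:taggedWith", "ex:bookmarks"),
    ("ex:delicious", "ex:taggedWith", "ex:web2.0"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in triples
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]

# "Which resources are tagged with web2.0?"
tagged_web20 = [
    s for s, _, _ in match(TRIPLES, predicate="ex:taggedWith", obj="ex:web2.0")
]
print(tagged_web20)
```

A SPARQL endpoint answers the same kind of pattern question, but across distributed data sets and with a standardized syntax.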
An evolutionary path to artificial intelligence
Web 3.0 has also been used to describe an evolutionary path for the Web that leads to artificial intelligence that can reason about the Web in a quasi-human fashion. Some skeptics regard this as an unobtainable vision. However, companies such as IBM and Google are implementing new technologies that are yielding surprising results, such as predicting hit songs by mining information on college music Web sites. There is also debate over whether the driving force behind Web 3.0 will be intelligent systems, or whether intelligence will emerge in a more organic fashion, from systems of intelligent people, such as via collaborative filtering services like del.icio.us, Flickr and Digg that extract meaning and order from the existing Web and how people interact with it.
The realization of the Semantic Web and SOA
Related to the artificial intelligence direction, Web 3.0 could be the realization and extension of the Semantic web concept. Academic research is being conducted to develop software for reasoning, based on description logic and intelligent agents. Such applications can perform logical reasoning operations using sets of rules that express logical relationships between concepts and data on the Web.
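A very small example of rule-based reasoning is shown below. It applies a single rule, "if A is a subclass of B and B is a subclass of C, then A is a subclass of C", repeatedly until no new facts appear. This is a simplified sketch of forward chaining, far short of a description-logic reasoner, and the class names are made up.

```python
def infer_subclass_closure(facts):
    """Apply the transitivity rule to (sub, super) pairs until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loop.
        for a, b in list(known):
            for c, d in list(known):
                if b == c and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known

# Hypothetical class hierarchy stated as explicit facts.
FACTS = {("Blog", "WebSite"), ("WebSite", "Resource")}

inferred = infer_subclass_closure(FACTS) - FACTS
print(inferred)
```

The fact that a Blog is a Resource was never stated; the program derives it from the rule, which is the kind of inference the research described above aims to perform at Web scale.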
Sramana Mitra disagrees with the view that the Semantic Web will be the essence of the next generation of the Internet and proposes a formula to encapsulate Web 3.0.
Web 3.0 has also been linked to a possible convergence of Service-oriented architecture and the Semantic web.
Web 3.0 is also called the "Internet of Services", i.e. besides the human-readable part of the web there will be machine-accessible SOA services which can be combined and orchestrated into higher-level services.
Evolution towards 3D
Another possible path for Web 3.0 is towards the 3-dimensional vision championed by the Web3D Consortium. This would involve the Web transforming into a series of 3D spaces, taking the concept realised by Second Life further. This could open up new ways to connect and collaborate using 3D shared spaces.
Web 3.0 as an "Executable" Web Abstraction Layer
Where Web 1.0 was a "read-only" web, with content being produced by and large by the organizations backing any given site, and Web 2.0 was an extension into the "read-write" web that engaged users in an active role, Web 3.0 could extend this one step further by allowing people to modify the site or resource itself. With the still exponential growth of computer power, it is not inconceivable that the next generation of sites will be equipped with the resources to run user-contributed code on them. The "executable web" can morph online applications into Omni Functional Platforms that deliver a single interface rather than multiple nodes of functionality.
Web 3.0 as it relates to socio-technological values
The inclusion of the concept of a Web 0.0 as a pre-existing "real-world" sensual web has been proposed. In that context Web 3.0 is the end of a loop in which the integration of technologies for digital networking and processing is digested and becomes inseparable from the new "real-world". In this definition, Web 3.0 is "the biological, digital analog web where information is made of a plethora of digital values coalesced for sense and linked to the real-world by analog interfaces."
Today's Web 3.0 Nonsense Blogstorm
Thu Oct 4 2007
Tim O'Reilly
If Web 2.0 was so hot, how about Web 3.0? This has been a recurrent theme of would-be meme-engineers who want to position their startup as the next big thing. Nova Spivack started it by describing the as-yet-to-be-revealed Radar Networks as Web 3.0, but now Jason Calacanis has his competing definition, neatly tailored to fit his own mahalo.com. The resulting storm of derision is entirely to be expected.
Now, I of all people should be hesitant to say "Web 3.0 is a stupid idea" because of course, that same criticism was leveled at "Web 2.0." But there are a couple of important distinctions:
- Web 2.0 started out as the name of a conference! And that name had a very specific purpose: to signify that the web was roaring back after the dot com bust! The 2.0 bit wasn't about the technology, but about the resurgence of interest in the web. When we came up with the idea back in 2003, a lot of programmers were out of work, and there was a general lack of interest in web applications. But we saw a resurgence coming, and designed a conference to tell the story of what was going to be different this time.
- I then spent some serious time trying to identify the characteristics of companies that had survived the dotcom bust and the best of the new companies and sites I saw coming up. That paper, What is Web 2.0?, was a retrospective description based on a broad swath of successful companies, not tailor-made for a single company or project that has yet to make its mark.
So for starters, I'd say that for "Web 3.0" to be meaningful we'll need to see a serious discontinuity from the previous generation of technology. That might be another bust and resurgence, or more likely, it will be something qualitatively different. I like Stowe Boyd's musings on the subject:
Personally, I feel the vague lineaments of something beyond Web 2.0, and they involve some fairly radical steps. Imagine a Web without browsers. Imagine breaking completely away from the document metaphor, or a true blurring of application and information. That's what Web 3.0 will be, but I bet we will call it something else.
I'm with Stowe. There's definitely something new brewing, but I bet we will call it something other than Web 3.0. And it's increasingly likely that it will be far broader and more pervasive than the web, as mobile technology, sensors, speech recognition, and many other new technologies make computing far more ambient than it is today.
But in any event, the next meme to take hold will be broad based, with many proof points, each showing another aspect of the discontinuity. Anyone who says his startup is the sign of this next revolution is just out of touch.
I find myself particularly irritated by definitions of "Web 3.0" that are basically descriptions of Web 2.0 (i.e. new forms of collective intelligence applications) that justify themselves as breakthroughs only by pretending that Web 2.0 is somehow about ajax, mashups, and other client side technologies. For example, see Nova Spivack's post today in response to Jason's:
Web 3.0, in my opinion is best defined as the third-decade of the Web (2010 - 2020), during which time several key technologies will become widely used. Chief among them will be RDF and the technologies of the emerging Semantic Web. While Web 3.0 is not synonymous with the Semantic Web (there will be several other important technology shifts in that period), it will be largely characterized by semantics in general.
Web 3.0 is an era in which we will upgrade the back-end of the Web, after a decade of focus on the front-end (Web 2.0 has mainly been about AJAX, tagging, and other front-end user-experience innovations.)
I have some sympathy with Nova's attempt to rescue the Web 3.0 term by tying it to a timeline rather than to any particular technology (Windows 95 anyone?), but I find the idea that Web 2.0 is about "front end" technologies to be so ridiculous as to discredit the whole idea. Google is the pre-eminent Web 2.0 success story, and it's all back-end! Every major web 2.0 play is a back-end story. It's all about building applications that harness network effects to get better the more people use them--and you can only do that with a richer back end. Nova is right that Semantic Web technologies may come increasingly into play in some sites, but I don't think that's a given.
As I wrote in a comment on Nova's blog:
Alas, I find the Web 3.0 arguments as clear evidence that the proponents don't understand Web 2.0 at all. Web 2.0 is not about front end technologies. It's precisely about back-end, and it's about meaning and intelligence in the back end.
The real difference between Web 2.0 and the semantic web is that the Semantic Web seems to think we need to add new kinds of markup to data in order to make it more meaningful to computers, while Web 2.0 seeks to identify areas where the meaning is already encoded, albeit in hidden ways. E.g. Google found meaning in link structure (a natural RDF triple); Wesabe is finding it in spending patterns.
There are sites (geni.com comes to mind) that create narrow-purpose cases where people add structured meaning, and I think we'll find lots more of these. But I think that the big difference is in the amount of noise you accept in your meaningful data, and whether you think grammar evolves from data or is imposed upon it. Web 2.0 applications are fundamentally statistical in nature, collective intelligence as derived from lots and lots of input at global scale.
See my various posts on Web 2.0 vs. the Semantic Web.
Meanwhile, Web 2.0 was a pretty crappy name for what's happening (Microsoft's name, Live Software, is probably the best term I've seen), so I don't see why we'd want to increment it to Web 3.0. But when people ask me what I think Web 3.0 will be, I don't think of the semantic web at all.
What are things that will give a qualitative leap beyond what we experience today?
I think it's the breaking of the keyboard/screen paradigm, and the world in which collective intelligence emerges not from people typing on keyboards but from the instrumentation of our activities.
In this sense, I'd say that Wesabe and Mint, which turn our credit card into a sensor telling us about tracks we're leaving in the real world, or Jaiku, which turns our phone into a sensor for a smart address book, or Norwich Union's "Pay as you drive" insurance, are more early signals of something I'd call "Web 3.0" than Semantic Web applications are.
Let's just call the Semantic Web the Semantic Web, and not muddy the water by trying to call it Web 3.0, especially when the points of contrast are actually the same points that I used to distinguish Web 2.0 from Web 1.5. (I've always said that Web 2.0 = Web 1.0, with the dot com bust being a side trip that got it wrong.)
Nova did have a great response to this comment, which he sent to me in email, and which I reproduce here with his permission:
I would actually say that I agree with much of what you state in your comment on my post. EXCEPT for one thing. The Semantic Web is completely orthogonal to the issue of collective intelligence. It can in fact be used as a better backend for existing "Web 2.0" folksonomies, or it could be used for expert systems -- it is not just a top-down framework.
It would not be technically correct to say that Semantic Web is not about statistics or that it is not about deriving structure from what is already there in the data -- The Semantic Web is just a way of encoding whatever it is that you know (it could have been derived, or not).
So you could use statistics, or mining, or the wisdom of crowds, to markup data -- but then where do you store and share what you have learned about that data? The Semantic Web proposes a richer framework for storing and publishing that metadata. It is completely independent of how the metadata is generated. It's just a better way to share that metadata.
Using string tags and microformats, or XML tags for that matter, are just different ways of marking up data. RDF and OWL are also just different ways of marking up data -- but they are BETTER ways of doing it. They have much more power, they are more open, they are more extensible, they support bottom-up collective intelligence better in fact.
This is why I propose that if we MUST use ridiculous terms like Web 1.0, Web 2.0, Web 3.0, then let's not tie them to a particular technology. Let's just tie them to decades, in which many technologies happen together.
Let's face it, the world is not as cut-and-dried as people would like to make it seem. RDF started in Web 1.0 in fact!!!
I think that there is a distinct difference in the structure of the Web over time however. RDF enables us to move the Web from a file-server to something more like a database. It enables a web of data. It does for data what hypertext does for text -- I call that hyperdata. This is certainly something new and very useful, but it will depend on what people ultimately do with it.
At Radar we are taking a Web 2.0 approach to Web 3.0. Essentially we are making use of user-generated content and the wisdom of crowds, as well as statistical analysis, mining and machine learning. Combined we have something much more powerful than either on its own: a true platform for collective intelligence. The fact that we happen to store the data using the Semantic Web is a convenience -- it makes our data more extensible and reusable by others. But ultimately the data itself comes from users.
Some of this makes sense to me. He's certainly right that the Semantic Web may prove very useful for many classes of intelligent applications. But the proof of the pudding is in the eating, as my mother used to say.