Reflections: OmegleBot

Omegle.com allows people from around the world to converse with each other “anonymously”. It is one of those sites that let you start a text or video chat with someone online and consequently make you doubt the intelligence of mankind. On sites like this, text chat follows the famous “Greater Internet Fuckwad Theory” and the video chat is… well, it’s probably a phallus. Omegle started as a way for strangers to connect and talk with each other but has since devolved, and the chance of finding a meaningful conversation on it is minuscule. That is a shame, because random chatting is a fun concept. I would add a premium feature that administers an IQ test and matches you with someone accordingly, but that is an idea for a different time.

A typical Omegle chat

When I first discovered Omegle I quickly got tired of trying to find someone to talk to. The idea of Omegle is not new or revolutionary. IRC and chat rooms were there before, but Omegle made it as easy as can be. Since I already spent a ton of time in online social communities with people who share my interests, I dismissed it as a cost-efficient way of communicating. I did find a use for Omegle though – there was nothing preventing me from spying on a random conversation and recording it. It looked like a nice challenge and it seemed fun. This was years before Omegle itself introduced “Spy mode”, so I guess there is something there. The concept of Spy Mode might look like something “evil” to do – spying on other people’s conversations is an ethical gray area in the real world, but is it online?
The answer to this question depends on how much you know about privacy online. In theory everything you do online can be (and is) monitored by a number of entities including your ISP (which can read all of your online activity), your operating system and other programs on your machine, a lot of routers on the Internet and at least a few governments. That is all beside the point – I thought about ethics for a second, but to be honest I don’t see this as anything but a technological challenge (also, it isn’t illegal per se). My goal wasn’t to spy on people but to hack a bot together, and as such I probably only ran the finished script once – to make sure all the bugs were solved. I call it the Hacker’s Mindset :).

This is how OmegleBot was born: a simple, quick and dirty, unrefactored Python script. Before that script I didn’t know much about HTTP, httplib and urllib because in the past I had used raw sockets to talk HTTP (poorly). This was a perfect project to help me understand the Python libraries relating to HTTP and JSON. The bot opens two simultaneous connections to Omegle and sends them both a simple greeting, “asl?”, which is the way most conversations in chat channels start. It then proceeds to proxy their conversation and also record it into a text file.

The most interesting part is the post function. It started as a simple call to connection.request and evolved to include a variety of HTTP headers, including a faked user-agent and referer, needed to defeat some of Omegle’s “security checks”. Usually services will have more server-side security checks (“never trust user input”), but unfortunately Omegle doesn’t have much of a choice here. Because the service is open and allows anonymous chatting, there are only so many ways for it to verify that I’m a real client, and I masqueraded as one well. Omegle uses JSON to pass data about events such as whether the other user is typing, the message the user sent and of course when a user disconnects. Reverse engineering it was the hardest part of this project (and it wasn’t all that hard). I think the only challenge I faced was understanding why Omegle blocked the first iterations of the bot and adding various headers until I passed for a client in their book.
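To make the moving parts a bit more concrete, here is a minimal sketch of the three pieces described above: a post helper with browser-like headers, a parser for JSON events, and the relay step that forwards and logs messages. It is not the original script. It uses Python 3's urllib.request instead of httplib, and the host, the paths, the header values and the event names ("gotMessage", "strangerDisconnected") are placeholders I made up for illustration, since the real protocol details are not spelled out here. Nothing in it touches the network.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host and headers; the real values are not given in the post.
BASE_URL = "http://chat.example.invalid"
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Referer": BASE_URL + "/",
    "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
}


def build_post(path, fields):
    """Build (but do not send) a POST request that looks like it came from a browser."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=BROWSER_HEADERS, method="POST"
    )


def parse_events(raw):
    """Decode a JSON array of [name, arg, ...] events into (name, args) pairs.

    An empty body or a JSON null is treated as "no events".
    """
    events = json.loads(raw) if raw else None
    return [(event[0], event[1:]) for event in events or []]


def relay(events, speaker, log):
    """Collect the messages to forward to the other stranger and record them in log.

    Returns (messages_to_forward, still_connected).
    """
    outgoing = []
    for name, args in events:
        if name == "gotMessage":  # hypothetical event name
            outgoing.append(args[0])
            log.append(f"{speaker}: {args[0]}")
        elif name == "strangerDisconnected":  # hypothetical event name
            log.append(f"{speaker} disconnected")
            return outgoing, False
    return outgoing, True
```

In a real bot each stranger would get a loop that polls for events, runs them through relay and then sends every returned message to the other side with something like urllib.request.urlopen(build_post(...)). Keeping the request building separate from the sending also makes it easy to inspect which headers actually go out when the server starts rejecting you.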

I also attached a sample output file with a few conversations. There is nothing interesting there nor did I capture anything interesting. All the conversations are very short which is definitely a symptom of Omegle – long and meaningful conversations are few and far between. I even sent “typing” statuses every few iterations to encourage people to converse and it didn’t help.
What can we learn from this? Masquerading as a browser is easy. Writing bots is easy. As a person on the internet you should take from this that bots are everywhere on the web. You should be aware of that because a lot of spam and fraud is done by bots – you can trivially change this bot to spam on Omegle (although ChatRoulette, a similar site, has a “spam” button that might be useful against that). Radiolab even had a podcast on a bot that had an online relationship with a human. It is a fact that bots are becoming better and better at passing for human beings. Soon they might even be good enough to write a programming blog, and then what will I do?

South Park's "dey took er jerbs" guy

(program them, probably)

P.S. Unfortunately the bot stopped working. It could be that Omegle changed the protocol a bit or added some more security, or that I have a bug. Feel free to fork it and bring it back to life!

Practical Programming: Technical Debt

This is part of the “Practical Programming” series in which I try to say something profound about the craft of programming.

Technical debt is a very important factor in any software project yet you might never even encounter it until you start working in the field and having long-term projects. Even then, technical debt is something that creeps up. It’s not a bug, nor a feature you forgot or a meeting you dozed off in. It is more of a feeling, a state, a cup that slowly fills up until it overflows.

My definition of technical debt is the difference between the code as it is right now and the code as it would be if it were written perfectly by a thousand coders with unlimited time. This debt includes missing documentation, no tests of any kind, no release process, no design, no consistency in the code and other such issues that are usually regarded as secondary to actual working code. The metaphor is apt because the debt also carries interest – the more the technical debt grows, the harder it is to scale development, be it adding new features, adding new developers to the team or solving bugs efficiently. In the end you go broke and declare code bankruptcy – a rewrite.
(Some great resources that explain it in more depth: Martin Fowler’s article, c2 wiki page)

Let’s unwind the stack of metaphors and get back to why technical debt is a problem – to create a functional program all you need is code that compiles, but to create a sustainable development environment you need code, design, documentation, tests, deployment procedures, etc. This difference is the technical debt of your project. If a lot of those basic things are missing from your project it’ll get tougher and tougher to make even the smallest change without breaking something. As with any debt, the interest will keep on climbing until it is obvious that a rewrite is easier than fighting the code. This step always comes. You can ignore it and expect developers to work around these problems, but eventually this will stop being cost-efficient – bugs will pop up everywhere, taking time from developing new features, and new features will be buggy and produce unexplained behavior (the old “compile it and let’s see what happens”).

There are several ways to lessen the burden of technical debt. The easiest is just slowing down the development process and allowing time for basic software development practices. This is definitely the preferred way to do your projects (hehe). At the other extreme is rewriting the whole code base, adding 1 to the version number and starting a marketing campaign – “new packaging”.

I’ve recently had to think about it and I’m advocating a hybrid solution – allocating time in each development sprint to pay the technical debt of a specific class or package in your code base. I have summarized my approach in five easy steps:

  1. Documentation – create a page for the package using whatever documentation repository you use. Document the following about the code:
    1. How should it work? This means a general overview of the correct working procedure for the code and comments, if there are any, about how the current implementation differs from this ideal.
    2. Class level documentation – a few words about each class’s responsibilities and its interactions with other classes. If there are more than 3 classes a diagram might be needed.
    3. Other things that will need to be documented: XML and other data exchange formats, client server interactions, security measures, relevant databases, etc.
  2. Code Documentation – Document the code itself:
    1. Documentation for classes and all public functions. Use your platform’s most popular documentation framework.
    2. Document algorithms and private functions as needed.
    3. Add a header to the file. A header should contain who is responsible for the code (usually the author), a date and a short description of the file’s content. Some places will also want the license added to the code.
  3. Unit-test the code – Write at least one unit test for every public function in the class. Try to hit the major code paths. This will make refactoring easier because you’ll have the safety net of knowing the tests pass, and having one unit test makes the mental barrier of adding more way smaller.
  4. Make easy and safe refactorings:
    1. Magic values into global constants. Strings to external files for easy L10N.
    2. Don’t Repeat Yourself (DRY) problems – if some code is copy pasted make it into a helper function / its own module / whatever. Sometimes this needs to be said… I know.
    3. Run Lint on your code and see what is easy to fix.
  5. Code Review – Review the code and document the major changes you’ll need to do to make it more robust. Create tasks, put them in the backlog and allocate time for them to be fixed alongside bug fixes and features. This might seem like cheating because you’re still deferring the problem, but knowing what needs to be done is half the battle. If you have it among your other tasks it is easy to schedule it and consider it part of development instead of some concept that doesn’t contribute to the push forward.
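As a rough illustration of steps 2 to 4, here is what a tiny module might look like after one such pass: a file header, magic values pulled into named constants, a documented public function and a unit test per major code path. The module, its names and the numbers are all invented for this example, and the header placeholders are left for you to fill in.

```python
"""shipping.py: shipping fee calculation (hypothetical example module).

Owner: <author name>
Date: <date>
"""
import unittest

# Step 4: magic values become named constants (amounts are in cents).
FREE_SHIPPING_THRESHOLD_CENTS = 5000
FLAT_SHIPPING_FEE_CENTS = 499


def shipping_fee(order_total_cents):
    """Return the shipping fee in cents for an order total given in cents.

    Orders at or above FREE_SHIPPING_THRESHOLD_CENTS ship for free.
    Raises ValueError if the total is negative.
    """
    if order_total_cents < 0:
        raise ValueError("order total cannot be negative")
    if order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_SHIPPING_FEE_CENTS


class ShippingFeeTest(unittest.TestCase):
    """Step 3: at least one test per public function, covering the major paths."""

    def test_small_order_pays_flat_fee(self):
        self.assertEqual(shipping_fee(1000), FLAT_SHIPPING_FEE_CENTS)

    def test_order_at_threshold_ships_free(self):
        self.assertEqual(shipping_fee(FREE_SHIPPING_THRESHOLD_CENTS), 0)

    def test_negative_total_is_rejected(self):
        with self.assertRaises(ValueError):
            shipping_fee(-1)
```

Saved as shipping.py, the tests run with python -m unittest shipping. The point is less this particular function and more that once the first test case exists, adding the next one is a two-line job.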

Is there a way to write software without getting into debt? Probably, but it might not be practical. Let’s not have perfect get in the way of good enough and ask instead: is there a way to write software without getting into much debt? Of course. The best way is identifying those moments where you are deciding between a “quick and dirty” solution and a slower but better solution, and understanding that the “quick” in “quick and dirty” is only short-term and that it might be slower in the long run because of the effects of technical debt.