Archive for February, 2009

Finding deals is always a hot topic, especially under current economic conditions. dealsea, bensbargains, and dealnews are just a few of the hundreds of deal websites that collect sale and deal information online. But the nature of the collection process means that you are always going to be a step behind the deal publishers. Is there a way to act before the deal is even published?

Let’s take a look at how Neiman Marcus releases its sale items: admins first update the inventory, then the website releases and links these items onto the sales page. A few lucky web surfers catch a glimpse of the 60%-off Prada bags while refreshing the sales page, and a second later, those deals/steals are gone.

It turns out that, with some engineering effort, it’s possible to grab deals before they even get published on the sales page.

Imagine you are one of those Prada fans who are desperate for Prada handbags. Here is what you need to do to get the bags you want:

  1. Scrape the entire inventory of Neiman Marcus.
    The Neiman Marcus website is designed in such a way that the inventory entry for each product can be accessed via: http://www.neimanmarcus.com/store/catalog/prod.jhtml?itemId=prodXXXXXXXX
    So if you can enumerate all possible product IDs, you can scrape the entire inventory of Neiman Marcus. With cloud computing, it took me less than 8 hours to collect the inventory using 16 nodes.
  2. Filter out all items but Prada bags.
    Given millions of item pages, you can classify the pages by brand and item type. If you are only interested in Prada handbags, that narrows the item list down to a couple of hundred items. Note that you want to store unavailable items as well as available ones, as they might become available later with a sale tag and a deeply cut price.
  3. Monitor all Prada bags online.
    Now that you have collected the item list, all you need to do is have a background process constantly check whether a deal has popped up on any of these items. An email service comes in handy to alert you as soon as your web search routine finds a bag in the inventory that has been updated with a sale tag.
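
The three steps above can be sketched roughly as follows. This is only an illustration, not the system I actually built: the `Item` record, the function names, and the assumption that product IDs are zero-padded 8-digit numbers are all hypothetical, and the fetching and page-parsing parts are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List

# URL pattern from step 1. Treating XXXXXXXX as a zero-padded 8-digit
# number is an assumption made for this sketch.
BASE_URL = "http://www.neimanmarcus.com/store/catalog/prod.jhtml?itemId=prod{:08d}"


@dataclass(frozen=True)
class Item:
    """Hypothetical record for what was parsed from one product page."""
    item_id: str
    brand: str
    category: str
    available: bool
    on_sale: bool
    price: float


def candidate_urls(start: int, stop: int) -> Iterator[str]:
    """Step 1: enumerate product page URLs for IDs in [start, stop)."""
    for n in range(start, stop):
        yield BASE_URL.format(n)


def watch_list(items: Iterable[Item], brand: str = "Prada",
               category: str = "handbag") -> Dict[str, Item]:
    """Step 2: keep only one brand and item type.

    Unavailable items stay on the list, since they may come back later
    with a sale tag.
    """
    return {
        it.item_id: it
        for it in items
        if it.brand.lower() == brand.lower()
        and it.category.lower() == category.lower()
    }


def new_deals(previous: Dict[str, Item], current: Dict[str, Item]) -> List[Item]:
    """Step 3: items that are available and on sale now but were not before."""
    deals = []
    for item_id, now in current.items():
        before = previous.get(item_id)
        already_seen = before is not None and before.available and before.on_sale
        if now.available and now.on_sale and not already_seen:
            deals.append(now)
    return deals
```

A background job would rebuild the watch list from freshly fetched pages every few minutes, call `new_deals` with the previous and current snapshots, and send an email for whatever it returns.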

I know you would ask: how effective is this and how soon do I have to act upon the deals?

After watching all the discounted Prada bags, here is what I found: it usually took websites over 20 minutes to publish their inventory changes, since they have millions of products in stock. But even on a single PC, it took only 5 minutes to check the entire Prada bag list. So on average you win by about 15 minutes, which is just about enough time to call your friends, look up reviews, and place your order!

I actually built a system to do exactly what is described above, not because I love Prada bags (although I did buy a few after seeing the discounts and the prices). I was looking to build some interesting internet search applications, and this turned out to be perfect given all the technical elements in it: web crawling, cloud computing, text processing, and web programming.

*ps: Again, I don’t recommend or encourage anyone to do this for commercial purposes. It’s all about making your friends happy and convincing them IT can do good deeds!


BootCamp provides Mac users with a close-to-perfect solution for dual-booting their Macs. You can easily access the Windows partition from the Mac side with BootCamp, but not the other way around. 🙁

I would like to access my Mac partition while running Windows XP, and for that, there are a bunch of third-party solutions: MacDrive, TransMac, and HFSExplorer. I tried all three, and I’m reporting some first-hand experience with these products. It’s by no means a thorough analysis with detailed side-by-side comparisons. Even worse, some of the glitches I ran into might not even be due to any of these tools, but at the end of the day, it’s your gut feeling that really makes the call, isn’t it? 🙂

  • MacDrive provides the best interface and functionality. The Mac drive looks, feels, and runs exactly like a Windows drive. I fell in love with it for a few days, but after endless failed reboots, I finally tracked down some serious pitfalls with MacDrive. For one, I noticed some weird behavior after enabling write access on the Mac drive: it somehow modifies the Mac indexing every time you modify the Mac drive from Windows XP, so once you boot into Mac, the Mac’s lookup has to re-index the drive (time-consuming). Moreover, if you have System Restore turned on and you disable write access to your Mac drive, the system gets very fragile. My guess is that System Restore tries to make backup copies on the disk and fails because the access is read-only. However, when I tried to disable System Restore, I wasn’t able to do so for some reason. I ended up uninstalling MacDrive, and all the problems seem to be gone.
  • TransMac is very stable and simple to use, with its file-explorer-like interface. One note: you need to make sure you have administrator rights in order to access your Mac drive. Access to the Mac drive is implicitly indirect, meaning that if you want to open a file on your Mac drive, TransMac makes a local copy on your Windows disk and opens it from there.
  • HFSExplorer has the smallest feature set among the three, but it’s completely free. It has an interface very similar to TransMac’s, and data access is explicitly indirect: you have to extract a file to a local Windows directory before doing anything with it.

So here is the take-away message: if you don’t want to pay anything and you know what you are doing, HFSExplorer is all you need. MacDrive and TransMac cost about the same ($50). I see MacDrive as a solution for dummies (in a nice yet unreliable way), while TransMac seems to strike a nice balance among the three.
