Design a distributed web crawler that can efficiently crawl billions of web pages while respecting robots.txt rules and site policies. Focus on the crawler's architecture, how to manage the URL frontier, and strategies for handling duplicate content and distributed crawling.
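
To make the frontier and politeness requirements concrete, here is a minimal single-process sketch of a URL frontier with per-host queues, robots.txt checks, and duplicate-URL filtering. The names (`Frontier`, `add_url`, `next_url`, `USER_AGENT`, `CRAWL_DELAY_SECONDS`) are illustrative assumptions, not a standard API; a real distributed design would shard hosts across workers, persist the queues, and keep the seen-URL set in a Bloom filter or shared key-value store rather than in memory.

```python
# Minimal frontier sketch: per-host FIFO queues, a politeness delay,
# robots.txt caching, and URL-hash deduplication. Single-process only.
import hashlib
import time
from collections import deque, defaultdict
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-crawler"   # assumed crawler identity
CRAWL_DELAY_SECONDS = 1.0        # assumed default politeness delay per host


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different forms hash identically."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


class Frontier:
    def __init__(self) -> None:
        self.host_queues = defaultdict(deque)    # host -> FIFO of pending URLs
        self.next_allowed = defaultdict(float)   # host -> earliest next fetch time
        self.seen_urls = set()                   # hashes of URLs already enqueued
        self.robots_cache = {}                   # host -> RobotFileParser

    def _robots(self, host: str, scheme: str) -> RobotFileParser:
        """Fetch and cache robots.txt once per host."""
        if host not in self.robots_cache:
            rp = RobotFileParser()
            rp.set_url(f"{scheme}://{host}/robots.txt")
            try:
                rp.read()                        # network fetch, cached afterwards
            except OSError:
                pass                             # fetch failed: parser stays empty,
                                                 # so can_fetch() errs toward not crawling
            self.robots_cache[host] = rp
        return self.robots_cache[host]

    def add_url(self, url: str) -> None:
        """Enqueue a URL unless it is a duplicate or disallowed by robots.txt."""
        url = normalize(url)
        url_hash = hashlib.sha256(url.encode()).hexdigest()
        if url_hash in self.seen_urls:
            return                               # already enqueued or crawled
        parts = urlsplit(url)
        if not self._robots(parts.netloc, parts.scheme).can_fetch(USER_AGENT, url):
            return                               # disallowed by site policy
        self.seen_urls.add(url_hash)
        self.host_queues[parts.netloc].append(url)

    def next_url(self) -> str | None:
        """Return a URL whose host has passed its politeness delay, if any."""
        now = time.monotonic()
        for host, queue in self.host_queues.items():
            if queue and now >= self.next_allowed[host]:
                self.next_allowed[host] = now + CRAWL_DELAY_SECONDS
                return queue.popleft()
        return None
```

The same hashing idea extends to duplicate content: fingerprint fetched page bodies (exact hashes, or near-duplicate schemes such as SimHash) and skip re-indexing pages whose fingerprint has already been seen.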