Interview Compass: Interview Question Answers

In the stock data crawling project, what techniques did you use to ensure the accuracy and efficiency of your data collection methods?

"The interviewer asked what techniques I used to keep data collection both accurate and efficient in the stock data crawling project. Here's how I approached it:

First, let me break down my answer into key points.

1. **Data Source Planning**: I started by identifying reliable data sources, such as East Money and other financial data sites. Verifying that these sources published consistent, accurate figures was critical before building anything on top of them.

2. **Crawling Strategy**: I designed a systematic crawling pipeline using Python's `requests` library, backed by MongoDB and Redis. Scheduled daily runs captured the data we needed without overloading the source servers.

3. **Multi-threading**: I implemented multi-threading to parallelize the network requests. Because crawling is I/O-bound, threads spend most of their time waiting on responses, so running them concurrently cut total collection time substantially and let us gather large volumes of data efficiently.

4. **Data Validation**: Finally, I used pandas to run integrity and consistency checks on the collected data, which ensured the data feeding our frontend displays and analytics was accurate.
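
Points 2 and 3 above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the endpoint URL is a placeholder, the Redis/MongoDB plumbing is omitted, and an in-process set stands in for Redis-based deduplication.

```python
import concurrent.futures as cf

try:
    import requests  # needed only for live fetching
except ImportError:  # pragma: no cover
    requests = None

# Placeholder endpoint -- the real data-source API is not shown here.
QUOTE_URL = "https://example.com/api/quote/{symbol}"


def fetch_quote(symbol, timeout=5):
    """Fetch one symbol's quote as JSON; returns None on any failure."""
    if requests is None:
        return None
    try:
        resp = requests.get(QUOTE_URL.format(symbol=symbol), timeout=timeout)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None


def crawl(symbols, fetch=fetch_quote, max_workers=8):
    """Fetch many symbols in parallel, skipping duplicates and failures."""
    # Order-preserving dedup; in a real pipeline Redis could hold this set
    # so repeated or resumed runs do not re-request the same symbol.
    todo = list(dict.fromkeys(symbols))
    results = {}
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, s): s for s in todo}
        for fut in cf.as_completed(futures):
            data = fut.result()
            if data is not None:  # drop failed fetches instead of storing gaps
                results[futures[fut]] = data
    return results
```

Capping `max_workers` is also how the sketch avoids overloading the source servers: it bounds the number of in-flight requests at any moment.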

In summary, through careful planning, efficient crawling strategies, parallel processing, and thorough validation, I ensured our stock data collection was both accurate and efficient."
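
The validation step (point 4) might look something like the sketch below. The column names (`symbol`, `date`, `close`, `volume`) and the specific checks are illustrative assumptions, not taken from the original project.

```python
import pandas as pd


def validate_quotes(df):
    """Run basic integrity checks on a crawled quote table.

    Returns the cleaned frame plus a dict of issue counts, so bad
    rows can be both removed and reported.
    """
    issues = {
        # A symbol should appear at most once per trading day.
        "duplicate_rows": int(df.duplicated(subset=["symbol", "date"]).sum()),
        # Missing closing prices make a row unusable downstream.
        "missing_close": int(df["close"].isna().sum()),
        # Sanity check: traded volume can never be negative.
        "negative_volume": int((df["volume"] < 0).sum()),
    }
    clean = (
        df.drop_duplicates(subset=["symbol", "date"])
          .dropna(subset=["close"])
          .query("volume >= 0")
    )
    return clean, issues
```

Reporting the issue counts alongside the cleaned data makes it easy to spot when a source suddenly degrades, rather than silently dropping rows.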