Lots of good rules for feature development. In general, be prepared for change: ML is an iterative discipline, and your codebase should always be in a state where you can modify it and be confident you've had a positive impact not only on statistical metrics but also on business goals. Rule #3 is one of my favorites. Some people think that rule-based systems are somehow simpler than ML solutions. This is not the case: at a certain level of complexity, it is easier to fit models than to untangle convoluted relationships between rules. If you want heuristics, you can implement them through feature engineering. You'll avoid many of the issues in Google's rules if you're forced to face the technical and data challenges right away. [1] sites.google.com/site/wildml2016nips/SculleyPaper1.p. Sometimes, though, "most views" is a poor proxy for "content our potential customers find interesting." Instead of designing a large number of business rules, you should build a machine learning model. The rules focus on "traditional" machine learning and data science. For example, an entire chapter is devoted to feature engineering, a topic that is less relevant now that deep learning models require less of it.

Usually, the problems that machine learning is applied to are not entirely new. There is often an existing ranking or classification system for the problem you are trying to solve, which means there is already a set of rules and heuristics. These same heuristics can give you a lift when tuned with machine learning. They should be mined for whatever information they contain, for two reasons. First, the transition to a machine learning system will be smoother. Second, these rules usually encode a lot of intuition about the system that you don't want to throw away. There are four ways to use an existing heuristic: preprocess using it, turn it into a feature, mine its raw inputs, or use it to modify the label. Most of Google's rules involve investments in data-processing infrastructure. Make sure your training data doesn't leak into your test data. Make sure you have good test coverage for your data pipelines, and get notified when something changes. Quality ranking is an art, but spam filtering is a war. The signals you use to identify high-quality posts will become obvious to the people using your system, and they will tweak their messages to have those characteristics.
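A minimal sketch of two of those options, in Python. The heuristic, its inputs, and all names (`spam_heuristic`, `make_features`) are hypothetical, invented here for illustration:

```python
# Sketch: reusing a legacy hand-tuned spam heuristic when moving to ML.
# The heuristic itself and its weights are made up for this example.

def spam_heuristic(num_spam_votes: int, sender_reputation: float) -> float:
    """The legacy rule: higher means more spammy."""
    return num_spam_votes * 0.5 + (1.0 - sender_reputation)

def make_features(num_spam_votes: int, sender_reputation: float) -> dict:
    # Option A: feed the heuristic's output to the model as one feature,
    # so none of its encoded intuition is thrown away.
    heuristic_score = spam_heuristic(num_spam_votes, sender_reputation)
    # Option B: also expose the heuristic's raw inputs as features,
    # so the model can learn a better combination than the hand-tuned one.
    return {
        "heuristic_score": heuristic_score,
        "num_spam_votes": float(num_spam_votes),
        "sender_reputation": sender_reputation,
    }
```

Giving the model both the heuristic score and its raw inputs usually makes the transition smoother: the model can start by leaning on the old rule and gradually learn to outperform it.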

Therefore, your quality ranking should focus on ranking content published in good faith. You should not penalize the quality ranker heavily for ranking spam highly. Similarly, "racy" content should be handled separately from quality ranking. Spam filtering is another story: you should expect the features you need to change constantly. Often there are obvious rules you put into the system (if a message has more than three spam votes, don't retrieve it, and so on). Any learned model must be updated daily, if not faster. The reputation of the content creator will play a big role. Machine learning is useful for building models that succinctly express complex business rules. You don't have to (and don't want to) use machine learning for simple situations. Nine rules on one topic, most of them remarkably wordy.
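A sketch of such an obvious rule sitting in a policy layer in front of a learned model. The threshold comes from the example above; the function names, the score threshold, and the message schema are assumptions for illustration:

```python
# Sketch: a hard policy rule applied before a learned spam model.
# Messages over the vote limit are filtered outright; everything else
# falls through to the (hypothetical) model score.

SPAM_VOTE_LIMIT = 3  # "more than three spam votes" rule from the text

def should_show(message: dict, model_score) -> bool:
    # Hard rule: enforced in the policy layer, never learned.
    if message.get("spam_votes", 0) > SPAM_VOTE_LIMIT:
        return False
    # The learned model handles the rest; 0.5 is an assumed threshold.
    return model_score(message) >= 0.5
```

Keeping such rules outside the model means they stay enforceable even when the model is retrained daily.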

I think that shows how important the issue is. There will be signs that the second phase is coming to an end. First, your monthly gains will start to diminish. You'll start to see trade-offs between metrics: in some experiments, some metrics go up while others go down. This is where it gets interesting. As gains become harder to achieve, the machine learning has to become more sophisticated. One caveat: this section has more blue-sky rules than the previous ones. We've seen many teams go through the happy Phase I and Phase II days of machine learning; once Phase III is reached, teams must find their own way. What latency is allowed? Can the features be computed in advance in the background, or in JavaScript while the user waits? For example, in linear, logistic, or Poisson regression, there are subsets of the data where the average predicted expectation equals the average label (it is 1-moment calibrated, or just calibrated). This holds assuming you have no regularization and your algorithm has converged, and it is approximately true in general. If you have a feature that is either 1 or 0 for each example, then the set of examples where that feature is 1 is calibrated.
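This calibration property can be checked directly: average the model's predictions and the labels over the slice where the feature is 1 and compare. A minimal sketch, with a hypothetical example schema (`features` dict plus `label`) and a stand-in `predict` function:

```python
# Sketch: checking 1-moment calibration on a feature slice.
# For a converged, unregularized logistic model, the mean prediction
# on the slice where a binary feature is 1 should match the mean label.

def slice_calibration(examples, feature_name, predict):
    """Return (mean prediction, mean label) where feature_name == 1."""
    sl = [ex for ex in examples if ex["features"].get(feature_name) == 1]
    mean_pred = sum(predict(ex["features"]) for ex in sl) / len(sl)
    mean_label = sum(ex["label"] for ex in sl) / len(sl)
    return mean_pred, mean_label
```

A large gap between the two averages on some slice is a useful debugging signal: it points at regularization, non-convergence, or a training/serving mismatch.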

If you have a feature that is 1 for every example, then the set of all examples is calibrated. There are different ways to combine and modify features. Machine learning systems such as TensorFlow allow you to preprocess your data through transformations. The two most common approaches are "discretization" and "crosses." Rule #15 addresses a case of convoluted ML where different ML problems are mixed into a single product: they interact with each other, have different contexts, and need to be trained and used differently. On the other hand, it's very Google-centric, so I'll just link it here so readers can find analogues in their own work: [link]. If you really want user feedback, use user-experience methods. Create user personas (described in Bill Buxton's Sketching User Experiences) early in the process and run usability tests later (described in Steve Krug's Don't Make Me Think). A user persona is a hypothetical user. For example, if your team is all male, it might help to design a persona for a 35-year-old female user (complete with user features) and look at the results it generates, rather than at ten results for 25- to 40-year-old men.
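The two transformations, discretization and crosses, can be sketched in plain Python. The bucket boundaries and feature names here are illustrative assumptions, not anything from the rules:

```python
# Sketch: discretization (continuous value -> bucket feature) and
# crosses (conjunction of categorical features). Boundaries are made up.

import bisect

AGE_BOUNDARIES = [18, 25, 35, 50]  # defines 5 buckets

def discretize(age: float) -> str:
    """Turn a continuous value into a categorical bucket feature."""
    return f"age_bucket_{bisect.bisect_right(AGE_BOUNDARIES, age)}"

def cross(*features: str) -> str:
    """Combine categorical features into a single conjunction feature."""
    return "_x_".join(features)

# e.g. cross(discretize(29), "country_DE") -> "age_bucket_2_x_country_DE"
```

Discretization lets a linear model learn a separate weight per bucket; a cross lets it learn a weight for a specific combination of values, which is exactly the kind of human-understandable derived feature Rule #20 recommends.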

Bringing in real people to observe their reactions to your site (locally or remotely) in a usability test can also give you a fresh perspective. The main problem with factored models and deep models is that they are non-convex. There is therefore no guarantee that an optimal solution can be approximated or found, and the local minima found on each run may differ. This variation makes it hard to judge whether the impact of a change to your system is meaningful or random. By building a model without deep features, you can establish an excellent baseline. Once this baseline is reached, you can try more esoteric approaches. Rule #20: Combine and modify existing features to create new features in human-understandable ways. Before moving on to the third phase of machine learning, it's important to focus on something that isn't taught in any machine learning course: how to examine and improve an existing model. This is more art than science, but there are several anti-patterns to avoid.
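One practical way to handle that run-to-run variation is to measure it before trusting any experiment delta. A minimal sketch: `metric_spread` and the fake trainer below are hypothetical, with the seed-dependent result standing in for the different local minima a non-convex trainer can land in:

```python
# Sketch: quantifying run-to-run variation of a non-convex trainer.
# `fake_train_and_eval` stands in for a real train-and-evaluate call
# whose result varies with the random seed (different local minima).

import random

def metric_spread(train_and_eval, seeds) -> float:
    """Train with several seeds and report the spread of the metric."""
    scores = [train_and_eval(seed) for seed in seeds]
    return max(scores) - min(scores)

def fake_train_and_eval(seed: int) -> float:
    rng = random.Random(seed)
    return 0.80 + rng.uniform(0.0, 0.02)  # metric jitters across runs
```

If an experiment's improvement is smaller than this spread, it may just be noise from a different local minimum rather than a real win, which is exactly why a deterministic non-deep baseline is so valuable for comparison.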

Prolego is an elite consulting team of AI engineers who guide the world's largest companies through AI transformation. Prolego helps its clients reach AI abundance: the inflection point where a company uses AI to unlock trapped value for exponential growth. Rule #24 (measuring the delta between models) is an excellent point. Combine it with evaluation over different segments, as in point III.6 of [here], and with a clear architecture-driven iteration loop, and you have an extremely powerful workflow. Rule #29: The best way to make sure you train like you serve is to save the features used at serving time and then pipe those features to a log to use at training time. A simple heuristic can get your product out the door; a complex heuristic is unmaintainable. Once you have data and a basic idea of what you want to achieve, move on to machine learning. As with most software engineering tasks, you need to keep updating your approach, whether it's a heuristic or a machine learning model, and you'll find the machine learning model easier to update and maintain (see Rule #16).
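Rule #29 can be sketched in a few lines. The log path, JSONL schema, and function names are illustrative assumptions; the point is only that training reads the exact features that were served, not a re-derivation of them:

```python
# Sketch of Rule #29: log the features used at serving time and read
# them back for training, so training and serving see identical inputs.

import json
import os
import tempfile

def serve(request, compute_features, score, log_path):
    features = compute_features(request)
    # Log the features actually served, not the raw request, so that
    # training sees exactly what the model saw (no skew from re-computation).
    with open(log_path, "a") as f:
        f.write(json.dumps({"features": features}) + "\n")
    return score(features)

def load_training_features(log_path):
    with open(log_path) as f:
        return [json.loads(line)["features"] for line in f]

# Usage sketch: serve two requests, then read the log back for training.
log_path = os.path.join(tempfile.mkdtemp(), "features.jsonl")
serve({"id": 1}, lambda r: {"len": 3}, lambda f: f["len"] * 0.1, log_path)
serve({"id": 2}, lambda r: {"len": 5}, lambda f: f["len"] * 0.1, log_path)
train_rows = load_training_features(log_path)
```

If feature computation changes between serving and training (different code paths, different data freshness), the model trains on inputs it never sees in production; logging at serve time removes that class of skew by construction.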

Some members of your team will be frustrated by properties of the system they don't like that aren't captured by the existing loss function.