
     In this section, I will talk about topics such as signal processing and system identification, optimization, and machine learning. As I mentioned on the home page, all these subjects are actually connected because, simply put, they all use the same main logic. Below, I will describe how these topics are connected. Then, I will briefly explain the main methods and techniques (the ones I encountered the most) within each topic.

 

      I will also talk about the books that I consider the most useful for each topic.

     Of course, probability and math knowledge play a significant role. I will also point out which books provide good explanations of the math and probability needed for signal processing and system identification, optimization, and machine learning.

     I hope you will enjoy :)

Introduction

     From what I have understood so far, I would say that all of signal processing/system identification and machine learning is based on the following simple relation: an input interacts with a system and creates an output (see the figure below). In most cases, optimization methods come into play to find the parameters of the system that produces the desired outputs.

 

[Figure: input → system → output]

     For example, the goal of signal processing and machine learning is to design the filter coefficients, or the parameters of a neural network or another machine learning method, to generate the desired outputs. So, optimization methods come into play to find the optimal parameters that generate the desired outputs. The goal of system identification is to find the parameters of a system using only the output, or both the input and the output. So, optimization methods come into play again, this time to find the optimal parameters that represent the system. Accordingly, in signal processing, the system can be considered to be the filter coefficients; in machine learning, the system can be considered to be a neural network (e.g., one that performs image classification) or the parameters of another machine learning method; and in system identification, the system can be considered to be the dynamic characteristics of a structure or an engine (below, I will talk more about what the system can represent in different applications).

     Now the tricky part: system identification and signal processing can be very difficult to differentiate. This is because, most of the time, signals have specific features (e.g., specific frequencies, shapes, decay rates, phases…), unless the signal is purely random. Signals can have such features for two reasons:

     1)  The signals can interact with a system, which changes their features. As an example, wind can cause a building to vibrate, and buildings have specific resonance frequencies, so a signal recorded from the building will carry the characteristics of the building’s vibrational behavior.

     2)  The signals can be generated with specific features in the first place. For example, each word you say creates acoustic waves with different frequencies.

     Therefore, system identification can be applied to signals that are produced as a result of their interaction with systems (which represent real mechanisms). However, it is also possible to model signals that are created with specific features as the outputs of hypothetical systems. The book “Digital Signal Processing and Spectral Analysis for Scientists” explains this approach clearly. Consequently, many signal processing books contain chapters on estimating the parameters of signals, which gives the parameters of real or hypothetical systems.

     Therefore, in my opinion, signal processing can be considered system identification for many applications, and I will talk about signal processing and system identification together. Then, I will explain the methods and techniques that are commonly used in different topics and applications. In addition, I will explain in more detail what the system represents in different topics and applications, and where/when the optimization algorithms are used.

     One more note: even the training of neural networks and machine learning methods can be considered system identification because, in supervised learning, the inputs and outputs are used to determine the system that produces the outputs, and then new sets of inputs are fed to that system to determine their outputs. I will also explain this in more detail.

 

      Signal Processing and System Identification


    The common uses of signal processing are to filter signals at specific frequencies, obtain the frequency content of signals, enhance specific features in signals, reduce the noise (the unwanted signal component in the measurements), and estimate the parameters of signals (which brings in the system identification side of signal processing).

    For filtering a signal at specific frequencies using low/high/bandpass filters or wavelets, or enhancing particular features with wavelets: the signal is convolved with the impulse response of the filter (equivalently, multiplied with its frequency response in the frequency domain) or correlated with the mother wavelet. So, it can be considered that an input is processed with a system (where the system is the impulse response of the filter or the wavelet) to create the desired output.
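
     To make the input-system-output picture concrete, here is a minimal sketch (using NumPy and SciPy; the 5 Hz signal, the sampling rate, and the 10 Hz cutoff are all made-up values for illustration) where a noisy input is processed by a low-pass FIR filter, i.e., the "system":

```python
import numpy as np
from scipy.signal import firwin, lfilter

rng = np.random.default_rng(0)

# Input: a 5 Hz sine buried in noise, sampled at 100 Hz
fs = 100.0
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.normal(0.0, 1.0, t.size)

# System: the impulse response of a low-pass FIR filter (10 Hz cutoff)
h = firwin(numtaps=51, cutoff=10.0, fs=fs)

# Output: the input processed by the system (a convolution with h)
y = lfilter(h, [1.0], x)
```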

     However, if we look at the features of an output signal (e.g., dominant frequencies) that were generated by the interaction of a signal with a system, we are performing system identification. For example, if you have a random signal which interacted with a machine vibrating at a specific frequency and created an output signal (assume you do not know the vibrational frequency of the machine), you can take the Fast Fourier Transform of the output signal and clearly see a peak in the spectrum, which gives you the vibrational frequency of the machine. In reality, system identification cases are not that straightforward, and they require input-output methods (to eliminate the effect of the input if the input is not random) or output-only methods (to eliminate the effect of the random input). But I wanted to show once more how every application can be cast as the input-system-output relation, and how system identification and signal processing are connected.
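
     A quick sketch of that FFT example (NumPy; the 30 Hz "machine frequency" and the noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 200.0                                 # sampling rate [Hz]
t = np.arange(0, 5.0, 1.0 / fs)
# Recorded output: unknown 30 Hz machine vibration buried in noise
y = np.sin(2 * np.pi * 30 * t) + rng.normal(0.0, 1.0, t.size)

# FFT of the output; the spectral peak reveals the machine's frequency
Y = np.fft.rfft(y)
freqs = np.fft.rfftfreq(y.size, d=1.0 / fs)
print("Estimated frequency: %.1f Hz" % freqs[np.argmax(np.abs(Y))])
```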

     You can also read or hear about methods called template matching and matched filtering, whose goal is to enhance (or find) particular signals within noisy signals. They operate as follows: the template signal is correlated with the unknown (noisy) signal. Therefore, the template signal can be considered the system while the unknown signal is the input; the output is the noise-reduced signal of interest.
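
     Here is a rough matched-filter sketch along those lines (NumPy; the template shape, record length, and burial position are all made up):

```python
import numpy as np

rng = np.random.default_rng(2)

template = np.sin(2 * np.pi * np.linspace(0, 4, 80))   # known signal of interest

# Bury the template at sample 300 inside a long noisy record
x = rng.normal(0.0, 1.0, 1000)
x[300:380] += template

# Matched filter: correlate the noisy record with the template;
# the correlation peaks where the template is hiding
score = np.correlate(x, template, mode="valid")
print("Template found near sample", np.argmax(score))   # should be near 300
```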

      To learn about filters and about parametric and non-parametric methods (which I will explain below), and to grasp the idea behind wavelets, “Digital Signal Processing and Spectral Analysis for Scientists” is a very good book. For learning more about wavelets, “Wavelets and Wavelet Transformation – A Primer” is a very good introductory book (in my opinion).

     About parametric and non-parametric approaches: I am sure that anyone who starts to get into signal processing and system identification hears about these approaches all the time. In non-parametric methods, the goal is to estimate the power spectrum of signals; cross-correlation power spectra are one example of what these methods estimate. However, the signals are usually finite in length, causing leakage (the signal power at some frequencies leaks into neighboring frequencies). In parametric methods, the signals are considered to be generated by a model with a known functional form, and the parameters of the model are then estimated. Autoregressive (AR) modeling is probably the most basic example of time-domain (parametric) approaches. Parametric methods can also be used to estimate missing signal segments or future values of signals.
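
     As a small illustration of the parametric idea, the following sketch (NumPy only; the AR(2) coefficients are invented) estimates the parameters of an AR model from a signal via the Yule-Walker equations:

```python
import numpy as np

rng = np.random.default_rng(3)

# Generate a signal from a known AR(2) model:
# x[n] = a1*x[n-1] + a2*x[n-2] + e[n]
a_true = np.array([1.5, -0.9])
x = np.zeros(5000)
e = rng.normal(0.0, 1.0, x.size)
for n in range(2, x.size):
    x[n] = a_true[0] * x[n - 1] + a_true[1] * x[n - 2] + e[n]

# Yule-Walker: estimate autocorrelations, then solve for the AR parameters
def autocorr(sig, lag):
    return np.dot(sig[: sig.size - lag], sig[lag:]) / sig.size

r = np.array([autocorr(x, k) for k in range(3)])
R = np.array([[r[0], r[1]],
              [r[1], r[0]]])
a_hat = np.linalg.solve(R, r[1:])
print("Estimated AR parameters:", a_hat)   # should be close to [1.5, -0.9]
```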

     Before I started learning these methods, I thought they were very complex and all very different. However, once the concept behind them is understood, learning more about parametric methods is easy and fun. “Digital Signal Processing and Spectral Analysis for Scientists” is a really good book to get into these approaches.

     About adaptive filters: Adaptive filters are mainly used for two purposes (as far as I have seen). The first purpose is noise reduction, while the second is system identification. Adaptive filters require both the input and the output signals.

     For noise reduction, a signal that consists of only the expected noise component in the recorded signal is fed into the adaptive filter, along with the recorded signal (which contains both the desired signal and the noise component that we need to get rid of). The filter coefficients are updated using optimization until the error between the filter output and the desired signal is minimized (this minimization is possible because the desired signal is not correlated with the noise component). Adaptive filtering for noise reduction is applicable only when a reference noise signal (one that is correlated with the noise component in the recorded signal) is available. This method is mainly used in acoustic noise reduction applications (however, in this case, the desired signal component is itself a real sound that you can hear, and the unwanted component can be background noise or noise caused by the instrumentation; therefore, it is also important to be clear about what you call noise).

     For system identification, the input (of a system) is processed with the filter, and the filter coefficients are updated until the output of the filter is close to the recorded output produced by the system. The filter’s coefficients then represent the characteristics of the system. Therefore, adaptive filters are (in my opinion) the basis of input-output system identification and machine learning (as I mentioned above).
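
     Here is a bare-bones LMS sketch of this idea (NumPy; the 3-tap "unknown system" and the step size are invented): the filter weights are nudged until the filter output matches the recorded output, at which point the weights approximate the system.

```python
import numpy as np

rng = np.random.default_rng(4)

# Unknown system: a short FIR filter whose coefficients we want to identify
h_true = np.array([0.8, -0.4, 0.2])
x = rng.normal(0.0, 1.0, 20000)            # input fed into the system
d = np.convolve(x, h_true)[: x.size]       # recorded output of the system

# LMS adaptive filter: update the weights until the filter mimics the system
w = np.zeros(3)
mu = 0.01                                  # step size
for n in range(3, x.size):
    u = x[n : n - 3 : -1]                  # last 3 input samples, newest first
    y = np.dot(w, u)                       # filter output
    w += mu * (d[n] - y) * u               # gradient step on the squared error
print("Identified coefficients:", w)       # should be close to h_true
```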

The book “Adaptive Filter Theory” is really a very good book to get into adaptive filters.

     About estimation theory: When I started to get into estimation classes and books, I thought they would be about estimating the parameters of systems, such as specific frequencies of signals, phases, and shapes. However, signals can also be modeled using probability distributions (e.g., consider a normal distribution with a sample mean and sample standard deviation). This is what confused me the most when I started to get into books and classes with the word “estimation” in their titles. So, when you start to read about estimation methods, do not expect them to be only about estimating the parameters of a system; they are also about estimating the statistical parameters of distributions.

     A very simple example (which took me a while to understand) is that the standard deviation formula for the normal distribution is actually an estimator that estimates the dispersion of the distribution. There are also other estimators that can calculate the dispersion of the distribution. So, I was surprised and confused when the first few homework assignments of my estimation class were about estimating statistical parameters through estimators.
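
     A tiny sketch of that point (NumPy; the distribution parameters are arbitrary): the biased and unbiased standard deviation formulas are simply two different estimators of the same dispersion.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # samples from N(10, 2^2)

# Two estimators of the same quantity (the dispersion of the distribution)
biased = np.sqrt(np.mean((x - x.mean()) ** 2))                   # divide by N
unbiased = np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1))   # divide by N-1
print(biased, unbiased)   # both estimate the true standard deviation, 2.0
```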

     There are many different estimators for statistical properties, as well as for the parameters of systems. This Wikipedia page does a good job of listing the most commonly used estimators: https://en.wikipedia.org/wiki/Estimation_theory

You will see that the estimators listed on the Wikipedia page are explained in many signal processing, optimization, and machine learning books as well as estimation books.

     Kalman filtering (something you will also hear about a lot, since it is widely used) is a very commonly used optimal state estimator; therefore, it fits under “estimation theory”. One main advantage of Kalman filtering is that it is a very good estimator in noisy environments (where unwanted signal components exist). Therefore, it is used in many applications for estimation purposes.
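
     To give a flavor, here is a deliberately minimal sketch (NumPy; a static state with invented noise levels) of the Kalman update estimating a constant from noisy measurements; real applications add system dynamics and process noise:

```python
import numpy as np

rng = np.random.default_rng(6)

# True state: a constant value observed through noisy measurements
true_value = 5.0
z = true_value + rng.normal(0.0, 1.0, 100)   # measurements, noise variance R = 1

x_hat, P = 0.0, 100.0    # initial estimate and its (large) variance
R = 1.0                  # measurement noise variance
for zk in z:
    K = P / (P + R)                    # Kalman gain
    x_hat = x_hat + K * (zk - x_hat)   # correct the estimate with the measurement
    P = (1 - K) * P                    # the estimate's uncertainty shrinks
print("Kalman estimate: %.2f (true value %.1f)" % (x_hat, true_value))
```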

     The book “Stochastic Models, Estimation and Control” gives a very good introduction to stochastic models, the basics of estimation theory and probability, and Kalman filters. This book helped me a lot to understand these topics and to see how they differ and how they are actually related.

     About inverse problems and regularization: Inverse problems are about finding the system that creates a set of signals. This can be considered system identification, and it is indeed used to solve system identification problems. Since it is about finding the system from a set of observations, optimization comes into play again as the optimal system producing the observations is sought. Inverse problems are also used in X-ray-based tomography and acoustic source reconstruction, among other applications.

     Regularization is a method used so that a model is not overfitted, or to solve an ill-posed problem (kind of the same thing actually 😊). It is easy to understand and visualize regularization. This link is very good for understanding and visualizing it: https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-learning-and-how-to-deal-with-it-6803a989c76
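
     As a rough sketch of both ideas together (NumPy; the matrix, noise level, and regularization weight are invented), here is Tikhonov (ridge) regularization stabilizing an ill-posed least-squares inverse problem:

```python
import numpy as np

rng = np.random.default_rng(7)

# An ill-posed inverse problem: two columns of A are almost identical,
# so the plain least-squares solution is unstable
A = rng.normal(0.0, 1.0, (50, 20))
A[:, 1] = A[:, 0] + 1e-6 * rng.normal(0.0, 1.0, 50)
x_true = rng.normal(0.0, 1.0, 20)
b = A @ x_true + 0.01 * rng.normal(0.0, 1.0, 50)

# Tikhonov (ridge) regularization: minimize ||Ax - b||^2 + lam * ||x||^2
lam = 0.1
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(20), A.T @ b)
print(np.linalg.norm(x_reg))   # stays bounded despite the near-singular A
```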

The best introductory book on inverse problems and regularization is “Parameter Estimation and Inverse Problems”. It helps you understand these topics with good explanations and visualizations.

     About system identification: As a civil engineer, I worked on identifying structural systems, so my focus was on identifying the frequencies, shapes, and damping ratios of structures. I explained above how system identification is a big part of signal processing. Therefore, knowing signal processing methods, the basics of optimization, and some probability is pretty much required before getting into system identification topics. I particularly suggest understanding the following very well: digital filters, wavelets, parametric and non-parametric signal processing, Bayesian methods (which I did not mention above, but they are widely used for statistical inference), some optimization methods such as gradient descent, the basics of estimation theory, adaptive filters, and the basics of inverse problems. Then you can start to learn about system identification methods from books or research papers, and it will be very easy.

     To learn probability, Bayesian methods, and stochastic processes more deeply, I think the following books are very useful: “Advanced Digital Signal Processing”, “Stochastic Process”, “Random Data”. The commonly used system identification books are “System Identification – Theory for the User” and “Linear System Theory”.

     Machine Learning

 

     In machine learning, most of the time, inputs and outputs are used to train a model. This model can be considered to be the system in the figure at the top of this page. Using the input-output data and optimization methods, the optimal model (i.e., system) that creates the observed outputs is obtained. Then, new inputs can be fed to the optimized model to estimate new outputs. This way of operating is actually very similar to that of adaptive filters, as well as of system identification methods. Therefore, with a good knowledge of signal processing/system identification, adaptive filtering, and optimization, learning machine learning is not difficult.
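
     A minimal train-then-predict sketch of that loop (NumPy; the "unknown system" coefficients are invented), in the same spirit as the adaptive-filter example above:

```python
import numpy as np

rng = np.random.default_rng(8)

# Training data: outputs generated by an unknown "system" y = 2*x1 - 3*x2 + noise
X = rng.normal(0.0, 1.0, (200, 2))
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(0.0, 1.0, 200)

# Training: find the model parameters that best reproduce the observed outputs
w = np.linalg.lstsq(X, y, rcond=None)[0]

# Prediction: feed new inputs to the identified system to estimate new outputs
X_new = rng.normal(0.0, 1.0, (5, 2))
print(X_new @ w)
```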

 

   When you start to get into machine learning, you will hear a lot about supervised learning, unsupervised learning, deep learning, reinforcement learning, clustering, and classification. I will not get into the details of these here, as you can simply google them and read some articles that explain them simply and clearly.

     In my opinion, the book called “Neural Networks and Learning Machines” is a very good book to get into machine learning. This book actually has the same author as “Adaptive Filter Theory”; in my opinion, machine learning is a continuation of adaptive filtering (as I mentioned before).

       Optimization

     Optimization problems are usually characterized along different axes, such as (i) convex vs. nonconvex objective functions, (ii) linear vs. nonlinear objective functions and constraints (if there are any), (iii) constrained vs. unconstrained, and (iv) local vs. global (i.e., having more than one local minimum).

     For example, convex problems are usually nonlinear optimization problems, and they can be constrained or unconstrained. Due to the convex shape of their objective function, they have only one local minimum, which is also the global minimum. However, not all nonlinear optimization problems are convex; general nonlinear optimization can be used, for example, when you do not know the shape of the objective function. Linear problems (with a linear objective and linear constraints) are actually convex too: there can be more than one optimal point, but all of them attain the same global minimum value. Both linear and nonlinear optimization problems can be constrained or unconstrained. The following webpage gives a nice visualization of linear and nonlinear optimization problems: https://pediaa.com/what-is-the-difference-between-linear-and-nonlinear-programming/#Linear%20Programming

     Accordingly, the solutions to optimization problems can be grouped into local and global optimization algorithms. The former are generally gradient-based (some non-gradient methods are also available); they simply use gradient information to find a local optimum. To use local algorithms when there are constraints, an unconstrained objective function is created (in most cases by embedding the constraints into the objective function using penalties). The most commonly used local optimization algorithms that you will see in signal processing and machine learning books are Newton’s method, Gradient Descent, Conjugate Gradient Descent, and Stochastic Gradient Descent.
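
     Here is a small gradient descent sketch (NumPy; the quadratic objective, the linear constraint, the penalty weight, and the step size are all invented) showing the penalty trick mentioned above:

```python
import numpy as np

# Minimize f(x) = (x1 - 3)^2 + (x2 + 1)^2 subject to x1 + x2 = 1,
# with the constraint embedded in the objective as a quadratic penalty
def grad(x, rho=100.0):
    g_obj = 2 * (x - np.array([3.0, -1.0]))          # gradient of the objective
    g_pen = 2 * rho * (x.sum() - 1.0) * np.ones(2)   # gradient of the penalty
    return g_obj + g_pen

x = np.zeros(2)
for _ in range(5000):
    x -= 0.001 * grad(x)   # plain gradient descent step
print(x)                   # close to the constrained minimum [2.5, -1.5]
```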

     Global optimization algorithms are used when there are multiple local minima and the goal is to find the global minimum. These algorithms are grouped into evolutionary and deterministic algorithms. Usually the first type is used; they are based on phenomena that can be observed in nature, which is why they are called evolutionary algorithms (and why they sometimes have funny names :) ). In these methods, a set of initial points converges toward the global minimum. The most commonly used algorithms that you will see in books and papers are Genetic Algorithms, Particle Swarm Optimization, Simulated Annealing, and Ant Colony Optimization.
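
     As one rough example, here is a bare-bones simulated annealing sketch (the multimodal objective, cooling schedule, and step size are all made up); occasionally accepting worse points is what lets it escape local minima:

```python
import numpy as np

rng = np.random.default_rng(9)

# A multimodal objective: many local minima, global minimum at x = 0
f = lambda x: x ** 2 + 10 * np.sin(3 * x) ** 2

x, T = 8.0, 5.0            # starting point and initial "temperature"
for _ in range(20000):
    x_new = x + rng.normal(0.0, 0.5)
    # Always accept improvements; sometimes accept worse points (escape hatch)
    if f(x_new) < f(x) or rng.random() < np.exp((f(x) - f(x_new)) / T):
        x = x_new
    T *= 0.9995            # cooling schedule
print(x)                   # should end up near the global minimum at 0
```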

     For nice and brief information about the local and global optimization algorithms, you can refer to the paper called “Review of Optimization Techniques”. A detailed review of gradient-based local optimization algorithms can be found here: https://ruder.io/optimizing-gradient-descent/

     In addition, optimization problems can have more than one objective function, leading to multi-objective optimization. Please see my blog on this webpage for more information about multi-objective optimization. This type of optimization has more than one optimal solution (a set of trade-off, i.e., Pareto, solutions), so it can be more challenging and also more fun to solve.

      The Books

     Below, I grouped the books into signal processing, optimization, wavelets, probability/stochastic processes, system identification, and machine learning. I made this classification based on the main content of the books. But, as I mentioned above, it is not so easy to classify the books cleanly, since many approaches/methods are used within other approaches/methods. When you check the contents of these books, you will see that many topics are explained across different books, especially the ones related to probability, some math, signal processing, estimation methods, and also some optimization methods (almost half of the commonly used ones, though 😊).

     Therefore, it is important to read 1-2 books from each group to understand how signal processing/system identification, optimization, and machine learning work. I sorted the books in each group so that the first book is usually the best introductory book for that group. The books listed after the first one are very good follow-ups, since they will help you get deeper into the topics with more theory.

     I hope that the books I listed, along with my explanations, will give you a simple overall picture of the worlds of signal processing/system identification, optimization, and machine learning.

      Note: It took me 8 years of grad school to understand these topics in the way I presented on this page, so lean back and try to enjoy the process of learning 😊

     Note 2: So, everything can be considered as Ax=b, where A is your system, x is the input, and b is the output (as Stephen Boyd says in one of his lectures). If you can write any problem in this form and manage to solve it (and minimize the noise if the measurements have noise), you can solve any problem. Therefore, we can consider everything as Ax=b 😊
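
     In that spirit, one last tiny sketch (NumPy; A, x, and the noise level are invented): write the problem as Ax=b and solve it by least squares, which also minimizes the effect of the measurement noise:

```python
import numpy as np

rng = np.random.default_rng(10)

# Everything as Ax = b: A is the system, x is the input, b is the output
A = rng.normal(0.0, 1.0, (100, 3))                 # known system
x_true = np.array([1.0, -2.0, 0.5])                # unknown input
b = A @ x_true + 0.05 * rng.normal(0.0, 1.0, 100)  # noisy measured output

# Least squares recovers x while minimizing the effect of the noise
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_hat)   # should be close to x_true
```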

[Book cover gallery, grouped by topic: Signal Processing (3 books) · Optimization (3 books) · Wavelets (4 books) · Stochastic Process / Random Data (3 books) · System Identification (2 books) · Inverse Problems (3 books) · Machine Learning (3 books) · Blind Source Separation (1 book)]