A Review on Automated Music Transcription

By automated music transcription I mean taking a recording of music and producing written music in standard notation from it. The purpose would be for a musician to play the piece while reading the music, or perhaps to study it for educational purposes.



Occasionally, I receive messages from people saying they have solved, or are about to solve, the automatic transcription problem. When I ask for more information, I usually find that they have written a program to do what I call "note detection" on simple material. This means taking a recording of music and detecting which notes are being played. That is not so difficult as long as the material is simple, such as solo piano, and existing transcription software already does it: take a look at a piano roll view. What these people usually don't appreciate is that writing the result out in standard notation is not a simple task, and is in fact much harder than basic note detection. And when the musical material is more complex, with more and more instruments playing, even detecting the notes becomes difficult.

This note is for those people, and for anyone else who assumes that a computer should be able to transcribe music automatically from a recording.

Many people, especially if they do not have much experience of transcribing music, imagine that it is a process of identifying the notes being played and translating them into notation. This is not entirely wrong, but it is a dangerously simplistic picture. Written music is a set of instructions for the performer, which the performer will interpret as they see fit. The real purpose of transcription, therefore, is to listen to the music and decide which set of notated instructions would lead a performer to produce it, and then to write those instructions down. Going from sound to notation is not really translation; it is more like reverse engineering.

To illustrate this, imagine that someone is sitting on a chair and has a piece of paper with some instructions. The instructions say:

Stand up, turn around in a full circle, and then sit down.
Raise your right hand to your head.
Clap three times.
Smile.

The person follows the instructions, and the computer's job is to watch a video of this and try to work out what the instructions were. Without going into details, I'm sure you can see that this is not easy. If the computer produces instructions that say "Move this leg in this direction, raise this hand", those instructions will be almost useless. We can imagine a person struggling to follow them and finally saying, "Oh! I think you want me to turn around." For a successful transcription, the computer must identify (and write down) the person's intention.

Suppose a pianist plays a run, but some notes overlap because the next note starts before the previous one is released. Should we notate that overlap? If we do, the result will be a mess that is difficult to read. We should probably notate the run without overlaps. We should write down the intention, not the execution.

A good musician does not play rhythms with strict adherence to the written note values. If they did, the result would sound mechanical and unmusical. To notate a played rhythm, we must find the intention, that is, the meaning of the rhythm. This is really quantization, and it is not easy to get right.

Generally, knowledge of the musical style and of the specific instrument is required to produce a useful transcription. For example, if a guitar is strumming, a full transcription of every note played may not be appropriate. Chord symbols with the strumming rhythm would be more useful. A complete note-by-note transcription would be very confusing for a guitarist (because the notes in a strummed chord are not played at exactly the same time), and we can imagine the guitarist finally saying, "Oh, you mean you want me to strum a G chord."
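To make the quantization and overlap points concrete, here is a minimal Python sketch. It assumes the tempo is already known and that the line is monophonic, which is exactly the easy case; the function names `quantize_onsets` and `trim_overlaps` are made up for this illustration and do not come from any particular program.

```python
from fractions import Fraction


def quantize_onsets(onsets_sec, tempo_bpm, grid=Fraction(1, 4)):
    """Snap onset times (in seconds) to the nearest grid point, in beats.

    grid is the grid spacing in beats. Assumption: the tempo is known and
    constant, and the first beat falls at time 0.
    """
    beat_sec = 60.0 / tempo_bpm
    quantized = []
    for t in onsets_sec:
        beats = t / beat_sec                 # position in (fractional) beats
        steps = round(beats / float(grid))   # nearest whole number of grid steps
        quantized.append(steps * grid)       # exact position as a Fraction
    return quantized


def trim_overlaps(notes):
    """Shorten each (start, end) note so it ends no later than the next one starts.

    Assumption: notes are sorted by start time and form a single melodic line.
    """
    trimmed = []
    for i, (start, end) in enumerate(notes):
        if i + 1 < len(notes):
            end = min(end, notes[i + 1][0])
        trimmed.append((start, end))
    return trimmed


# Four slightly uneven notes at 120 bpm, snapped to a half-beat grid.
played = [0.02, 0.27, 0.49, 0.77]
print(quantize_onsets(played, tempo_bpm=120, grid=Fraction(1, 2)))

# A legato run where each note is held into the next one.
print(trim_overlaps([(0.0, 0.6), (0.5, 1.1), (1.0, 1.5)]))
```

Even this toy version has to be told the tempo and the grid in advance. Choosing them from the audio alone, and deciding when an uneven rhythm is intended rather than expressive, is the hard part that the sketch does not attempt.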



All rhythms are interpreted relative to the beat, so before you can even start thinking about interpreting a rhythm, you have to decide where the beats and the bar lines are. This can be easy for some material, for example if a thumping drum plays on the first beat of every bar, but in general it is not easy for a computer. Even musicians can find it difficult if they do not know the style of music. And if you get it wrong, your transcription won't make much sense.

Standard music notation has been in use for hundreds of years and has evolved so that everything played in Western music has a conventional representation on the page. For example, writing a chord consisting of B, Eb, and F# is almost certainly wrong, even though the notes are correct. It should be B, D#, and F#, or Cb, Eb, and Gb. Similar considerations apply to clefs, key signatures, time signatures, and indeed every aspect of written music. There are always a few ways of writing something that are readable, and infinitely many ways of getting it wrong, which will cause musicians to panic when you ask them to play it.
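The chord example can be checked mechanically with a small Python sketch. It encodes only one narrow convention, that a root-position triad is spelled with every other letter name (B-D-F, C-E-G); the helper names are hypothetical, and real spelling decisions also depend on key and context.

```python
LETTERS = "CDEFGAB"


def letter_steps(lower, upper):
    """Count letter names from one note up to another (C up to E is 2)."""
    return (LETTERS.index(upper[0]) - LETTERS.index(lower[0])) % 7


def is_stacked_thirds(spelling):
    """True if each note's letter is two letter names above the previous one.

    Assumption: the chord is listed from the root upwards, in root position.
    """
    return all(letter_steps(a, b) == 2 for a, b in zip(spelling, spelling[1:]))


print(is_stacked_thirds(["B", "Eb", "F#"]))   # False: B up to E is a fourth
print(is_stacked_thirds(["B", "D#", "F#"]))   # True
print(is_stacked_thirds(["Cb", "Eb", "Gb"]))  # True
```

All three spellings describe the same keys on a piano, which is why a list of detected notes cannot tell you which spelling to write.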

And now here's a sobering thought: all the problems I've talked about so far apply even if the computer starts with a perfectly accurate list of the notes played. But when the computer starts with only an audio recording, then for anything but the simplest material, that list of notes will contain many errors. This multiplies the problems several times over. I'm not saying that it's impossible; after all, a well-trained musician can do it. But I am saying that it cannot be done easily.

Then you need to ask what the output of your program will be used for. On anything but the simplest material, there will be many errors, or notation that is technically correct but unreadable. The first thing a user needs to do, therefore, is fix them. This means the user must be able to transcribe, so your program does not replace the skills of a human transcriber; the best it can do is save them some time. That could be useful if your program manages to produce something close enough to be used as a starting point. There are already several programs that attempt this, see here, but I haven't heard of anyone calling them useful.


Of course, if your goal is to create MIDI, then the problem is smaller. MIDI doesn't distinguish between D# and Eb and has no notion of clefs or of how anything should be notated. The main problems are identifying the notes in the audio, detecting the beat, and deciding where the bar lines (measures) fall. That is not easy either, though it is possible for some material. However, if you are producing MIDI, you should ask yourself whether this is useful, and if so, for whom. Note that if you do not correctly identify where the beats are and how many beats are in each bar, you will get a MIDI file that may play back well but is hard to make sense of when loaded into a MIDI editor. The rhythm problem goes away if we are talking about a performance that was played to a MIDI click track, and in that case some material can yield useful results. But then we're no longer talking about the general problem of transcribing music from existing audio recordings.
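The point that MIDI cannot tell D# from Eb is easy to see in a few lines of Python. This sketch assumes the common convention that middle C (C4) is note number 60; some software labels octaves differently, and the function name `midi_number` is invented for this example.

```python
NATURAL_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(name, octave):
    """MIDI note number for a spelled pitch such as ("D#", 4).

    Assumption: middle C is written C4 and has note number 60.
    """
    semitone = NATURAL_SEMITONES[name[0]]
    for accidental in name[1:]:
        if accidental == "#":
            semitone += 1
        elif accidental == "b":
            semitone -= 1
        else:
            raise ValueError(f"unknown accidental: {accidental!r}")
    return 12 * (octave + 1) + semitone


print(midi_number("D#", 4), midi_number("Eb", 4))  # 63 63
```

Both spellings collapse to the same number, so going back from MIDI to notation means making the spelling decision again from context.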