Overlay with image detection translation program

tiagocc0 · Jan 7, 2015

I tried looking for an "overlay with image detection translation program" but couldn't find anything conclusive*. Maybe someone here has some knowledge on this.

*I did find that Google bought the company behind Word Lens, which did a real-time translation project that shows the translation right on the spot, but this is overkill for what I want.

Anyway, let me explain it in detail:

The application would find the target window (let's say a Japanese game) and then fix itself on it by copying its position and size. It would not appear on the taskbar and would have a transparent background (so it could show, for example, just a small image over the target window). The application would not be clickable; mouse and keyboard input would go to the target window only.

Then, using image detection (recognition), it could for example find the Japanese word for Menu (like a button) by using an image cut from a screenshot of the game window itself. It would then put a new image of the same size with the translated text over it, which would make it look like the game was translated (for that word/button).

Does a program like that exist?

EDIT: Anyway, I was playing with some code today and I made most of it. Only the image detection part is missing, and for that I could probably use AutoIt. I will look more into it later.

SCO · Jan 7, 2015

This exists, but i don't remember the name. Used by desperate degenerate basement dwellers playing eroge games, not that i would know anything about it.

tiagocc0 · Jan 7, 2015

I will look more into it then, google time!

Yeah, but it could be used for changing static images in any game, even console games played on emulators or old games on DOSBox, for example.
It would probably be slow, but for old games and mostly static games it could be great.

EDIT: Using hentai in the Google search yielded better results:
http://www.hongfire.com/forum/showt...e-galge-eroge-visual-novel-translation-helper
But that isn't exactly what I had in mind. There must be other applications closer to what I described.
This one looks more like a text-grabber-on-the-go-translator-overlay-text-changer.

tiagocc0 · Jan 8, 2015

I was looking for image detection C++ libraries to use, but they all look like overkill for what I want to do.

I guess I would have three types of detection:
1. Using pixels, the simplest one. I could detect windows/scenes/screens by checking a few selected pixels on the screen, as long as the window keeps the same size between when it was analysed and when playing.
So I could know, for example, that the game is currently on the menu screen, and from there I could lay out the translated buttons or change any graphics.

2. Using a fixed position and a small image (without opacity for now), I could check whether the image matches what's on screen at that position. This would be slower, but I would have more guarantees that I'm where I need to be.

3. Using a small image without a position, so the application scans the whole area for the image. This is where image detection would be useful, but as long as I don't use transparency I think I can make do with a simple function:

Code:

#include <QDebug>
#include <QImage>
#include <QPixmap>
#include <QRect>

// Returns the rectangle in big where small first appears (exact match),
// or a null QRect if it is not found.
QRect compareImages(const QPixmap &small, const QPixmap &big)
{
 // Convert once; copying 1x1 pixmaps for every test is very slow.
 // Both images use the same format so pixel values compare consistently.
 const QImage s = small.toImage().convertToFormat(QImage::Format_ARGB32);
 const QImage b = big.toImage().convertToFormat(QImage::Format_ARGB32);
 if (s.isNull() || s.width() > b.width() || s.height() > b.height())
  return QRect();
 const int w = s.width(), h = s.height();

 // <= so the last valid column and row are checked too
 for (int i = 0; i <= b.width() - w; i++)
  for (int j = 0; j <= b.height() - h; j++)
  {
   // Cheap rejection: compare the four corners first
   if (s.pixel(0, 0) != b.pixel(i, j)) continue;
   if (s.pixel(w - 1, 0) != b.pixel(i + w - 1, j)) continue;
   if (s.pixel(0, h - 1) != b.pixel(i, j + h - 1)) continue;
   if (s.pixel(w - 1, h - 1) != b.pixel(i + w - 1, j + h - 1)) continue;
   // Full comparison only when all four corners match
   const QRect rect(i, j, w, h);
   if (s == b.copy(rect))
   {
    qWarning() << "Found" << rect;
    return rect;
   }
  }

 return QRect();
}

It's very simple: first it tries to find the first corner of the small image on the larger image, without letting the small image go beyond the larger image's boundaries.
After that it verifies the other three corners and then verifies the whole small image at that spot.

Now I will try to put a new image on top.
I have to be careful though, because if the image I put on screen covers the part that I use to identify the situation, then I won't be able to identify it again on the next iteration.
But for the third method this matters, since I would be using the returned rectangle to position the new image on top.

I could make it into two phases, one to identify the position and another to maintain it and use the position found to overlay a new image.

tiagocc0 · Jan 9, 2015

So it works, slow as hell but works:

https://purpleorangegames.tinytake.com/sf/MzQxODhfMjkxODgw

You can watch on the site above and there is also an option to download the video, it's 3.8MB.
Maybe checking just a few pixels might be a better solution.
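
As a rough illustration of the "few pixels" idea, here is a minimal sketch in plain C++. The `Image` struct is a hypothetical stand-in for a captured frame (in the Qt program it would be a QImage), and `PixelProbe` and `screenMatches` are made-up names, not part of any library. It assumes the window size has not changed since the probes were recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a captured frame: 32-bit pixels, row-major.
struct Image {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // size == width * height

    std::uint32_t at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// One probe: "the pixel at (x, y) must have exactly this color".
struct PixelProbe {
    int x;
    int y;
    std::uint32_t color;
};

// True when every probe matches the frame. A probe outside the frame
// counts as a mismatch, and an empty probe list identifies nothing.
bool screenMatches(const Image& frame, const std::vector<PixelProbe>& probes) {
    if (probes.empty()) return false;
    for (const PixelProbe& p : probes) {
        if (p.x < 0 || p.x >= frame.width || p.y < 0 || p.y >= frame.height)
            return false;
        if (frame.at(p.x, p.y) != p.color) return false;
    }
    return true;
}
```

Each screen (menu, inventory, and so on) would get its own probe list, so the cost per frame is a handful of comparisons instead of a full scan.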

EDIT: Here is another:
https://purpleorangegames.tinytake.com/sf/MzQxOTFfMjkxOTIz

Changed the code to accept a point. If I pass a point, it checks whether the small image exists just there instead of scanning the entire window.
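
The actual changed code was not posted, so the following is only a sketch of how an optional point could work, written in plain C++ rather than Qt. `Image`, `Point`, `Match`, `matchesAt` and `findImage` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a captured frame or a reference image.
struct Image {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // row-major, size == width * height

    std::uint32_t at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

struct Point { int x; int y; };
struct Match { bool found; int x; int y; };

// Exact comparison of small against the region of big starting at (x, y).
bool matchesAt(const Image& small, const Image& big, int x, int y) {
    if (x < 0 || y < 0 || x + small.width > big.width || y + small.height > big.height)
        return false;
    for (int sy = 0; sy < small.height; ++sy)
        for (int sx = 0; sx < small.width; ++sx)
            if (small.at(sx, sy) != big.at(x + sx, y + sy)) return false;
    return true;
}

// With a hint, test only that position; without one, scan the whole frame.
Match findImage(const Image& small, const Image& big, const Point* hint = nullptr) {
    const Match notFound = {false, 0, 0};
    if (small.width <= 0 || small.height <= 0) return notFound;
    if (hint != nullptr) {
        if (matchesAt(small, big, hint->x, hint->y)) return Match{true, hint->x, hint->y};
        return notFound;
    }
    for (int y = 0; y + small.height <= big.height; ++y)
        for (int x = 0; x + small.width <= big.width; ++x)
            if (matchesAt(small, big, x, y)) return Match{true, x, y};
    return notFound;
}
```

One possible use: run the slow scan once, then pass the position it returned as the hint on later frames, falling back to the scan only when the hinted check fails.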

EDIT2: A direct link to the second video:
http://purpleorangegames.com/wp-content/uploads/TinyTake09-01-2015-01-02-12.mp4

J1M · Jan 9, 2015

I have no personal interest in your project, but it is nice to see you moving forward with it and getting things done.

Have you considered a slightly different approach where instead of trying to constantly perform image recognition, you instead allowed for someone to specify the region they want translated?

ie. Hold down ALT and input stops going to the game and instead goes to your overlay frame (which can now be the entire screen and doesn't need to track a window) and the user drags a rectangular region to translate. The input would be similar to selecting a region in a paint program.

Unless you require real-time translation for some reason?

I also think the approach of replacing the detected image with a cropped image is misguided, it sounds like way more work than rendering the translation to a bitmap and showing that.

tiagocc0 · Jan 9, 2015

There is already a lot of translation software like that. Usually you have a translator program running in the background; the application constantly tries to capture the game's text and copy it to the clipboard, and the translator program translates everything that lands on the clipboard.

The problem is that I don't want to capture the text or perform OCR. The first requires specific knowledge of the game and how it was coded/compiled. The second would cause too many errors, and with real-time translation people would usually settle for whatever a Google Translate-like program automatically showed rather than actually translating the whole thing manually.

From what I have thought about, this program would be ideal for translating menus in games in languages other than English. There wouldn't be a need to translate the whole thing; there are tons of games that I would play if they had at least the menu and buttons translated.
Even more so because most games use images for those, making it impossible to capture the text and also difficult for OCR, depending on the art used to make the button.

The bonus part is that I would also be able to change any kind of image that I wanted in any game.

I'm struggling right now with the part where the image that I show gets in the way of my screenshots. So if text appears in the same area showing different phrases, this type of recognition wouldn't work unless I went with a complex set of pixel checks.

tiagocc0 · Jan 9, 2015

It seems I can use three methods:
1. BitBlt
2. PrintWindow
3. Qt's entire-screen screenshot (which I'm using)

It seems that if I use PrintWindow I can get the actual window even if there is something on top of it. But it is very limited in which kinds of programs it works with, and it can even mess up the program depending on how it was developed.
I think I read somewhere that BitBlt cannot capture transparent windows. If true, I could set my window to be slightly transparent, which means I could at least capture the target window without mine showing on top in the resulting image. It also seems there are some cases where it doesn't work.
I think I'm going to implement all three methods and let the user choose which one to use.

zeitgeist · Jan 9, 2015

You can do this in AHK, if you're going with the initial concept of having specific images replaced with other images with no smart OCR or anything. I'm not sure what the point would be though, since menu option positions are usually extremely easy to memorize.

One problem you might encounter is if there's something fully or partially obscuring the menu option graphics, such as various animated highlights that change in realtime, this will only work well if there's a static portion of the graphic you can focus on.

If you know the fonts used for text and you can set up an OCR program in the background so that it actually produces useful translations, you can do snapshot-based text replacement too.

tiagocc0 · Jan 9, 2015

Thanks, I will check AHK (downloading it now). I have always heard about it but never tried it; I would usually use AutoIt.

I can't memorize stuff. I'm thirty years old and I have yet to memorize a single song.
But I will use it mainly for menus; others could use it for a full translation.

Let's imagine one guy wants to translate one game entirely, a full translation, so I'm not talking about Google Translate-like applications.
So the guy needs to hack the code, find where the texts are stored, find where the images containing text are stored, and probably replace or insert new fonts if the game only has, say, Japanese fonts.
He could make tools to make the translation work go faster. Let's say he did: all those tools will only work on this game, maybe on some other game developed on the same engine.
It took him a lot of time to understand the game and to make the tools, and he has yet to translate it. So we need at least one hacker.

Now, let's say the application that I'm making works as simply as I have imagined it, just putting simple images on top of the game as an overlay. The most complex part of it is recognizing when to show those images.
Now any kid can take some screenshots, edit the image in Photoshop and have my program show it under a certain condition by checking a few pixels or checking an image.
Of course I still need to make it moddable and resolve some issues, but the possibilities (for those who would otherwise hack a game to translate it) are great!

Yes, if there is animation involved and I'm using the button, for example, as the recognition image, then it would be a problem.
I'm thinking about using a third party library to have the option of matching similar images and not only equal images like my initial solution.
But for buttons that are usually in the same place, I could use any part of the screen that indicates that the button exists to fire the condition.

In the video I showed above, the image that fires the condition is the paper on the table.
Since I'm taking screenshots and on DOSBox the mouse is captured, if I hover the mouse over the paper the head image disappears, since the program can no longer find the paper.
In this case what I could do is use two or three images that identify the screen, so even if one is obscured the other two could still indicate it's in the right place.
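
A minimal sketch of that "two or three images" idea in plain C++: the screen counts as recognized when at least a chosen number of anchors match. `Image`, `Anchor`, `anchorVisible` and `screenRecognized` are hypothetical names, and the exact-match comparison is an assumption carried over from the earlier function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a captured frame or a reference image.
struct Image {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // row-major, size == width * height

    std::uint32_t at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// A reference image plus the position where it is expected on screen.
struct Anchor {
    Image image;
    int x;
    int y;
};

// Exact comparison of the anchor against the frame at the anchor's position.
bool anchorVisible(const Image& frame, const Anchor& a) {
    if (a.x < 0 || a.y < 0 ||
        a.x + a.image.width > frame.width || a.y + a.image.height > frame.height)
        return false;
    for (int y = 0; y < a.image.height; ++y)
        for (int x = 0; x < a.image.width; ++x)
            if (a.image.at(x, y) != frame.at(a.x + x, a.y + y)) return false;
    return true;
}

// The screen counts as recognized when at least `required` anchors match,
// so one anchor hidden by the mouse cursor does not break detection.
bool screenRecognized(const Image& frame, const std::vector<Anchor>& anchors, int required) {
    if (required <= 0) return false;
    int hits = 0;
    for (const Anchor& a : anchors)
        if (anchorVisible(frame, a)) ++hits;
    return hits >= required;
}
```

With three anchors and `required` set to 2, the overlay would stay up while the cursor covers any single anchor.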

tiagocc0 · Jan 9, 2015

Testing BitBlt first on Notepad. The good news is that even when showing an image on top of the window (on the left) I can still see the original picture (mirrored on the right).

EDIT: The bad news is that it seems BitBlt and PrintWindow don't work on DOSBox.

adrix89 · Jan 9, 2015

I wish this functionality was put in VNR: http://sakuradite.com/wiki/en/VNR
VNR already has all types of text extraction, including OCR, and the only thing missing is image replacement.
I think it's open source, if you want to try.

A couple of things to keep in mind for your program.
If all you want is menu translations, it's best to think in terms of how a GUI works.
Use customizable templates for all screens. Even if you scale the window you can still find the relevant positions, and you can check them by searching around the relative position until you match the exact group of pixels.
With templates, in the worst case you should not need to sample more than the horizontal and diagonal lines of any element.
Let the images that you replace carry additional information, like a tooltip. For example, explaining what a particular skill does.
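
One way to read the template suggestion above, as a sketch in plain C++: store each element's rectangle for a reference window size, scale it to the current size, then try nearby positions until the pixels match. `Rect`, `Hit`, `scaleRect` and `searchNear` are hypothetical names, and the actual pixel comparison is left to a caller-supplied callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <functional>

struct Rect { int x; int y; int w; int h; };
struct Hit { bool found; int x; int y; };

// Map a rectangle defined for the reference window size (refW x refH)
// to the current window size, rounding to the nearest pixel.
// Assumes non-negative coordinates and sizes.
Rect scaleRect(const Rect& r, int refW, int refH, int curW, int curH) {
    assert(refW > 0 && refH > 0);
    auto scale = [](int v, int cur, int ref) {
        return static_cast<int>((static_cast<long long>(v) * cur + ref / 2) / ref);
    };
    return Rect{scale(r.x, curW, refW), scale(r.y, curH, refH),
                scale(r.w, curW, refW), scale(r.h, curH, refH)};
}

// Try (x, y) first, then positions on growing square rings around it,
// up to `radius` pixels away. matchesAt decides whether a position fits.
Hit searchNear(int x, int y, int radius,
               const std::function<bool(int, int)>& matchesAt) {
    for (int r = 0; r <= radius; ++r)
        for (int dy = -r; dy <= r; ++dy)
            for (int dx = -r; dx <= r; ++dx) {
                if (std::max(std::abs(dx), std::abs(dy)) != r) continue;  // ring r only
                if (matchesAt(x + dx, y + dy)) return Hit{true, x + dx, y + dy};
            }
    return Hit{false, 0, 0};
}
```

Note that an exact pixel match will generally fail once the game's graphics are scaled, so this pairs naturally with some form of approximate comparison.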

tiagocc0 said:
I'm thinking about using a third party library to have the option of matching similar images and not only equal images like my initial solution.

They usually do that with downsampling and thresholds.
To be fair, you need fast solutions rather than approximate ones.
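
For reference, "downsampling and thresholds" could look roughly like the sketch below, in plain C++ on grayscale data. It is not how any particular library does it: the `Gray` struct, the 2x2 averaging, the mean absolute difference metric and the threshold value are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical grayscale image: one byte per pixel, row-major.
struct Gray {
    int width;
    int height;
    std::vector<std::uint8_t> px;  // size == width * height
};

// Halve the resolution by averaging 2x2 blocks.
// An odd trailing row or column is dropped.
Gray downsample2x(const Gray& in) {
    Gray out{in.width / 2, in.height / 2, {}};
    out.px.reserve(static_cast<std::size_t>(out.width) * out.height);
    for (int y = 0; y < out.height; ++y)
        for (int x = 0; x < out.width; ++x) {
            const std::size_t top = static_cast<std::size_t>(2 * y) * in.width + 2 * x;
            const std::size_t bottom = top + static_cast<std::size_t>(in.width);
            const int sum = in.px[top] + in.px[top + 1] + in.px[bottom] + in.px[bottom + 1];
            out.px.push_back(static_cast<std::uint8_t>(sum / 4));
        }
    return out;
}

// Mean absolute difference between two same-sized images, in the range 0..255.
double meanAbsDiff(const Gray& a, const Gray& b) {
    assert(a.width == b.width && a.height == b.height && !a.px.empty());
    long long total = 0;
    for (std::size_t i = 0; i < a.px.size(); ++i)
        total += std::abs(static_cast<int>(a.px[i]) - static_cast<int>(b.px[i]));
    return static_cast<double>(total) / static_cast<double>(a.px.size());
}

// Images count as similar when their average per-pixel difference
// does not exceed the threshold.
bool similar(const Gray& a, const Gray& b, double threshold) {
    if (a.width != b.width || a.height != b.height || a.px.empty()) return false;
    return meanAbsDiff(a, b) <= threshold;
}
```

Downsampling both images before comparing cuts the work per comparison to a quarter and blurs away small differences such as cursor edges or dithering, at the cost of some precision.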

tiagocc0 · Jan 9, 2015

adrix89 Thanks, very interesting software, I will take a look at it. It might not be entirely possible to add this functionality there since all the overlay is done using Qt, unless I port VNR to Qt or figure out how to do the overlay on whatever engine they use.
Anyway, I will look into it.

Yeah, I'm studying OpenCV for this purpose: http://opencv.org/
Very interesting idea about the tooltips, thanks! With those we could also use transparent rectangles to just show tooltips on mouse over, one more feature for the program!

crufty · Jan 10, 2015

why not just translate the artifacts directly--wouldn't that be simpler?

tiagocc0 · Jan 10, 2015

I didn't get your reply, what artifacts?

crufty · Jan 10, 2015

tiagocc0 said:
I didn't get your reply, what artifacts?

my bad; maybe instead of artifacts, resources.

button labels / menu items are text or image resources--why not change the resource at the source and avoid the complexity of image overlay?

tiagocc0 · Jan 10, 2015

Some games may let you modify those resources in easy-to-access places or by providing tools.
If not, one needs a hacker to find those resources, which are usually compiled into the executable or stored in enigmatic resource files.
